KNIME: A World-Class Analytics Platform That Doesn’t Require Coding

1) Introduction

To develop a Machine Learning model, one needs to understand Linear Algebra, Statistics, and other important areas of Mathematics. Even if you are already comfortable with these subjects, you still need to learn not only ‘HOW TO CODE’ but also several important Computer Science concepts, such as algorithms and databases. Additionally, to start a Machine Learning project, you need to learn how to install and set up a coding environment and how to use the command line. Having to learn all of these at the same time is tedious, and it is one of the biggest challenges for newcomers who want to get into Data Science, Data Analysis, and Machine Learning. Unfortunately, some people give up and never come back to Machine Learning.

However, there is some GOOD NEWS! With the rapid development of GUI-based applications, KNIME is a major game changer for people who do not consider themselves programmers. Its major benefit is that no programming knowledge is required: if you know how to use Microsoft Excel, KNIME will feel familiar. All you need to do is go to the KNIME website and download the application (https://www.knime.com/downloads/download-knime). Once it is installed, you are ready to start with Machine Learning without any further setup. To load data, just drag and drop your file in! To clean and pre-process data, just click the functions! To select, train, and test Machine Learning models, just drag and drop the model you want! To visualize your interesting findings, just drag and drop the kind of graph you want! This lets you focus on applying Machine Learning algorithms and techniques to the problems and subjects you are interested in from your very first day!

In summary, to use KNIME, all you need to do is define a Workflow that connects predefined nodes from its repository. This is very convenient, because KNIME already provides predefined components, called “Nodes”, for many different tasks such as reading data, cleaning data, applying ML algorithms, visualizing data in different formats, and analyzing results.
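For readers who think in code, a workflow can be pictured as a directed graph in which each node runs only after all of its upstream nodes have produced their tables. The sketch below is a hypothetical Python model of that idea, not how KNIME is implemented: the node functions `read_data`, `drop_missing`, and `sort_by_age` and the tiny table are invented for illustration, and `graphlib.TopologicalSorter` (Python 3.9+) supplies the execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical node functions: each takes the list of upstream output tables
# and returns a new table (a list of row dicts).
def read_data(_inputs):
    return [{"name": "Asha", "age": 34}, {"name": "Ravi", "age": None}, {"name": "Meera", "age": 28}]

def drop_missing(inputs):
    (table,) = inputs
    return [row for row in table if row["age"] is not None]

def sort_by_age(inputs):
    (table,) = inputs
    return sorted(table, key=lambda row: row["age"])

# The workflow: node name -> (node function, names of upstream nodes).
workflow = {
    "reader": (read_data, []),
    "missing": (drop_missing, ["reader"]),
    "sorter": (sort_by_age, ["missing"]),
}

def run_workflow(workflow):
    """Execute each node only after all of its upstream nodes have run."""
    predecessors = {name: set(upstream) for name, (_, upstream) in workflow.items()}
    results = {}
    for name in TopologicalSorter(predecessors).static_order():
        func, upstream = workflow[name]
        results[name] = func([results[u] for u in upstream])
    return results

outputs = run_workflow(workflow)
print(outputs["sorter"])  # [{'name': 'Meera', 'age': 28}, {'name': 'Asha', 'age': 34}]
```

In KNIME itself, drawing the arrows between nodes is what defines this graph.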

1.1) What is KNIME?

KNIME stands for “Konstanz Information Miner”. It was developed at the University of Konstanz in Germany, starting around January 2004. It is open-source software written in Java on the Eclipse SDK platform. The KNIME platform relies on predefined components called “Nodes” for building and executing “Workflows”. Its core functionality covers tasks such as machine learning, data mining, analysis, and data manipulation. Additional features and functionality are available through various extensions provided by community groups and vendors.

1.2) Why use KNIME?

KNIME is a GUI-driven analytics platform. This means that coding knowledge is not a requirement (although a small amount of code is sometimes needed if you want to add more complexity to your workflow). In addition, as stated before, KNIME is open source, meaning that it is free to use. It is also a powerful, fully functional GUI-based application that helps us understand the complex Machine Learning process from start to finish by creating, editing, annotating, visualizing, and sharing workflows. Furthermore, it allows us to integrate data from many sources (files, databases, web services) and to perform essential Machine Learning related operations ranging from basic I/O to data manipulation, data transformation, and data mining. In summary, KNIME consolidates many different processes into one single, understandable Workflow.


1.3) KNIME Workbench

I). Workflow Project:

It contains the LOCAL workspace, which holds all the workflows you have created on your own machine; the KNIME Hub, where you can connect to the KNIME online server and community; and the EXAMPLES workspace, where you can find example workflows that have already been created by the KNIME community and are ready to use.

II). Recommended Nodes or Workflow Coach:

It lists recommended nodes based on the workflows built by the wide community of KNIME users.

III). Main Tab:

We can also call it the ‘Toolbar’. It contains basic functions for operating KNIME, such as executing and cancelling the selected nodes.

IV). Project Tabs:

It shows your currently open projects, since you can create and execute several projects at the same time.

V). Node Repository:

It contains all the nodes available in the core KNIME Analytics Platform and in the extensions you have installed. The nodes are organized into categories according to their function. Under each main category, you can expand the tree and select the specific node you need.

Tip: you can also use the search box at the top of the Node Repository to find specific nodes.

5.1). Nodes: A node can be in one of 3 states.

5.1.1). Red: “Not Ready/Idle” state, which means that the node is not yet configured and cannot be executed with its current settings.

5.1.2). Yellow: “Ready/Configured” state, which means that the node has been set up correctly and can be executed at any time.

5.1.3). Green: “Executed” state, which means that the node has been successfully executed and its results can be inspected and passed on to downstream nodes.
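The traffic-light states behave like a small state machine. Below is a minimal Python model of it, assuming a hypothetical `Node` class (KNIME exposes no such API): configuring moves a node from red to yellow, executing moves it from yellow to green, and, as a simplification, reconfiguring an executed node drops it back to yellow.

```python
from enum import Enum

class NodeState(Enum):
    RED = "not configured"   # cannot be executed with the current settings
    YELLOW = "configured"    # ready to be executed at any time
    GREEN = "executed"       # results available to downstream nodes

class Node:
    """Toy model of a node's traffic light (hypothetical, not a KNIME API)."""

    def __init__(self, name):
        self.name = name
        self.settings = None
        self.state = NodeState.RED

    def configure(self, **settings):
        # Valid settings make the node ready; reconfiguring an executed node
        # discards its result, so it also ends up YELLOW (a simplification).
        self.settings = settings
        self.state = NodeState.YELLOW

    def execute(self):
        if self.state is NodeState.RED:
            raise RuntimeError(f"{self.name} is not configured and cannot be executed")
        self.state = NodeState.GREEN

reader = Node("CSV Reader")
print(reader.state.name)  # RED
reader.configure(path="data.csv")  # hypothetical setting name
print(reader.state.name)  # YELLOW
reader.execute()
print(reader.state.name)  # GREEN
```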

VI). Node Description:

It shows the description of the currently active workflow or of the node selected in the Workflow Editor or Node Repository.

Tip: it is very useful in the early stages of learning, when you are new to KNIME, do not know much about Machine Learning yet, or forget the purpose of a node in the workspace or the Node Repository.

VII). Outline:

It is an overview of the currently active workflow.

Tip: it is very useful when your workflow becomes very big; the Outline works as a map that gives you the big picture of your whole workflow.

VIII). Console:

It shows execution messages and status information that indicate what is happening in the current workflow, such as a successful operation, an error in a file, and so on.

Tip: it is very useful for diagnosing problems in the workflow and examining the analytics results.

IX). Public Server:

This tab connects you to the KNIME server in case you want to search for something on the KNIME online hub.

2) Basic Process of Data Analysis with KNIME

I). Data Reading:

Usually, the first step in analyzing data is reading it. In the ‘Node Repository’, we can find all kinds of Reader nodes, such as the CSV Reader node, Excel Reader node, Table Reader node, and so on. All we need to do is drag and drop the node we want into the ‘Workflow Editor’.

By right-clicking the node and choosing ‘Configure’, we can change the node’s configuration; for example, we can select the path of the file we want to read the data from. Then we execute the node. If it executes successfully, the node’s status light turns green. We can then look at the loaded data in the executed node’s output table.
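Under the hood, a reader node turns a file into a table with typed columns. Here is a rough, stdlib-only Python equivalent. It assumes an in-memory CSV string in place of a real file path and makes a naive per-cell type guess; KNIME’s CSV Reader detects types per column and offers many more options.

```python
import csv
import io

def convert(value):
    """Naive per-cell type guess: empty -> None (missing), else int, float, or str."""
    if value == "":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def read_csv(file_obj):
    """Read CSV text into a list of row dicts keyed by the header names."""
    return [{col: convert(val) for col, val in row.items()} for row in csv.DictReader(file_obj)]

# An in-memory stand-in for a file; a real reader would open a path instead.
raw = io.StringIO("name,age,city\nAsha,34,Ahmedabad\nRavi,,Surat\nMeera,28,Ahmedabad\n")
table = read_csv(raw)
print(table[1])  # {'name': 'Ravi', 'age': None, 'city': 'Surat'}
```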

II). Data Pre-processing:

2.1) Filtering:

Most of the time, we do not need all the information in our dataset. The ‘Row Filter’ node and the ‘Column Filter’ node help us select the rows and columns we want to use. We do this by configuring each node to keep only the specific rows and columns we intend to use.
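Conceptually, a Row Filter keeps the rows that satisfy a condition and a Column Filter keeps a chosen list of columns. Here is a small Python sketch of both, with hypothetical helper names `row_filter` and `column_filter` and made-up sample data:

```python
table = [
    {"name": "Asha", "age": 34, "city": "Ahmedabad", "income": 52000},
    {"name": "Ravi", "age": 19, "city": "Surat", "income": 18000},
    {"name": "Meera", "age": 28, "city": "Ahmedabad", "income": 41000},
]

def row_filter(rows, predicate):
    """Keep only the rows for which predicate(row) is True."""
    return [row for row in rows if predicate(row)]

def column_filter(rows, include):
    """Keep only the listed columns, in the listed order."""
    return [{col: row[col] for col in include} for row in rows]

selected = row_filter(table, lambda row: row["city"] == "Ahmedabad" and row["age"] >= 21)
result = column_filter(selected, ["name", "income"])
print(result)  # [{'name': 'Asha', 'income': 52000}, {'name': 'Meera', 'income': 41000}]
```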

2.2) Obtaining Description:

After selecting the columns, we may want to see a description of the data; for example, the minimum, maximum, mean, and standard deviation of our numeric columns. All we need to do is find the ‘Data Explorer’ node in the ‘Node Repository’ and drag it into the ‘Workflow Editor’. Then we connect the previous node (‘node 2’) to the newly added node (‘node 3’) by dragging the black arrow from the output port of the first node to the input port of the second. After that, we can execute the new node by right-clicking ‘node 3’ and choosing the ‘Execute and Open Views’ option.

Now we can see the description of our data.

Additionally, KNIME gives us even more information: in this step we can also see the distribution of the data in each column as a bar chart.
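The basic statistics shown by the Data Explorer node can be reproduced for a single numeric column with Python’s `statistics` module. This is a sketch with invented data. It uses the sample standard deviation (dividing by n - 1), and whether the node uses the same formula is an assumption not checked here.

```python
import statistics

rows = [
    {"age": 34, "income": 52000},
    {"age": 19, "income": 18000},
    {"age": 28, "income": None},  # a missing cell is skipped
    {"age": 45, "income": 67000},
]

def describe(rows, column):
    """Min, max, mean and sample standard deviation of one numeric column."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # divides by n - 1
    }

for column in ("age", "income"):
    print(column, describe(rows, column))
```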

2.3) Combining or Joining:

Sometimes, we need to combine datasets from different sources into one single dataset to get all the information we want to use. With the ‘Joiner’ node, we can join two datasets into one using different join modes, such as an inner join, a left join, or a right join.
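The join modes differ only in what happens to rows without a match. Below is a minimal Python sketch of an equi-join on one key column, using a hypothetical `join` helper. Unlike the Joiner node, it fills unmatched cells with `None` and simply lets the right table’s value win when both tables share a non-key column name.

```python
def join(left, right, key, how="inner"):
    """Equi-join two tables (lists of row dicts) on one key column.

    how="inner" keeps only rows whose key appears in both tables,
    how="left" also keeps unmatched left rows, and how="right" also keeps
    unmatched right rows; cells with no partner are filled with None.
    """
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}

    result, matched_keys = [], set()
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        for rrow in matches:
            result.append({**lrow, **rrow})  # on a column-name clash, the right value wins
        if matches:
            matched_keys.add(lrow[key])
        elif how == "left":
            result.append({**dict.fromkeys(right_cols), **lrow})
    if how == "right":
        for rrow in right:
            if rrow[key] not in matched_keys:
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result

customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}, {"id": 3, "name": "Meera"}]
orders = [{"id": 1, "amount": 250}, {"id": 3, "amount": 90}, {"id": 4, "amount": 40}]
for how in ("inner", "left", "right"):
    print(how, join(customers, orders, "id", how))
```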

2.4) Handling missing values:

The ‘Missing Value’ node handles missing values found in the cells of the input table. For example, we can replace missing values in a numeric column with the mean value of that column. Similarly, missing values in a string column can be replaced with the most frequent value in that column.
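The two replacement strategies described above (column mean for numbers, most frequent value for strings) can be sketched in Python as follows. The `impute` helper is hypothetical. It assumes every row has the same columns and represents a missing cell as `None`.

```python
from collections import Counter
from statistics import mean

def impute(rows):
    """Fill None cells: numeric columns get the column mean, other columns
    get the most frequent value. Assumes every row has the same columns."""
    fill = {}
    for col in rows[0]:
        present = [row[col] for row in rows if row[col] is not None]
        if not present:
            continue  # nothing to learn from; the cells stay None
        if all(isinstance(v, (int, float)) for v in present):
            fill[col] = mean(present)
        else:
            fill[col] = Counter(present).most_common(1)[0][0]
    return [
        {col: fill.get(col) if value is None else value for col, value in row.items()}
        for row in rows
    ]

rows = [
    {"age": 30, "city": "Surat"},
    {"age": None, "city": "Ahmedabad"},
    {"age": 40, "city": None},
    {"age": 20, "city": "Ahmedabad"},
]
print(impute(rows))
```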

2.5) Sorting:

The ‘Sorter’ node sorts the rows according to user-defined criteria. In the configuration dialog, we can select the columns by which our data should be sorted, and whether each of them should be sorted in ascending or descending order.
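Sorting by several columns, each with its own direction, can be done with repeated stable sorts, going from the least important column to the most important one. Here is a sketch with a hypothetical `sort_rows` helper, where each criterion is a `(column, ascending)` pair:

```python
def sort_rows(rows, criteria):
    """Sort by several columns; criteria is a list of (column, ascending) pairs,
    most important column first. Python's sort is stable, so sorting by the
    least important column first and the most important column last yields
    the combined order."""
    result = list(rows)
    for column, ascending in reversed(criteria):
        result.sort(key=lambda row: row[column], reverse=not ascending)
    return result

people = [
    {"name": "Asha", "city": "Surat", "age": 34},
    {"name": "Ravi", "city": "Ahmedabad", "age": 19},
    {"name": "Meera", "city": "Ahmedabad", "age": 28},
]
ordered = sort_rows(people, [("city", True), ("age", False)])
print([row["name"] for row in ordered])  # ['Meera', 'Ravi', 'Asha']
```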

III). Model Selection and Data Analysis:

KNIME offers many analytic methods. In this example, we use a Machine Learning algorithm called Random Forest. We drag the ‘Random Forest Learner’ node from the ‘Node Repository’ and drop it into the ‘Workflow Editor’, and then set the model’s configuration, such as the number of trees. We can now execute the node to train the model. To make predictions, we drag the ‘Random Forest Predictor’ node into the ‘Workflow Editor’, connect it to the learner’s model output and to the data we want to predict, and execute it. We can now see the prediction results.
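To make the Random Forest idea concrete, here is a toy pure-Python version of its three ingredients: each tree is trained on a bootstrap sample of the rows, each split considers only a random subset of the features, and the forest predicts by majority vote. It is a simplified illustration, not the algorithm inside KNIME’s Random Forest nodes. The function names, the depth limit, and the `n_trees` parameter (standing in for the node’s number-of-trees setting) are assumptions, and class labels must be plain values such as ints or strings.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (feature, threshold) with the lowest weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth, n_features, rng):
    """Grow a tree; every split looks at a random subset of n_features features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority  # leaf
    split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return majority
    f, t = split
    mask = [row[f] <= t for row in X]
    subtree = {}
    for side in (True, False):
        sub_X = [row for row, m in zip(X, mask) if m == side]
        sub_y = [lab for lab, m in zip(y, mask) if m == side]
        subtree[side] = build_tree(sub_X, sub_y, depth - 1, n_features, rng)
    return (f, t, subtree[True], subtree[False])

def predict_tree(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(X, y, n_trees=25, depth=3, n_features=1, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, n_features, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Made-up data: [age, income in thousands] -> 1 if the customer bought, else 0.
X = [[20 + i, 10 + 5 * i] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
forest = train_forest(X, y, n_trees=25)
print(predict_forest(forest, [18, 5]), predict_forest(forest, [45, 120]))  # 0 1
```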

IV). Visualization:

KNIME has many different kinds of plot nodes. For example, we can combine the ‘Color Manager’ node and the ‘Scatter Plot’ node to customize colors and draw a scatter plot showing the distribution of age. In the configuration dialog, we can select the colors and choose which column goes on the x-axis and which goes on the y-axis.
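A Color Manager essentially maps each distinct value of a nominal column to a color, and the scatter plot then draws one point per row from the chosen x and y columns. Here is a Python sketch of that mapping, assuming a hypothetical four-color palette and invented helper names. The resulting (x, y, color) triples could be handed to any plotting library.

```python
from itertools import cycle

def color_manager(rows, column, palette=("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")):
    """Map each distinct value of a nominal column to a color, in order of first
    appearance; the palette repeats if there are more values than colors."""
    colors = {}
    next_color = cycle(palette)
    for row in rows:
        if row[column] not in colors:
            colors[row[column]] = next(next_color)
    return colors

def scatter_points(rows, x, y, color_column, colors):
    """One (x, y, color) triple per row, ready to hand to a plotting library."""
    return [(row[x], row[y], colors[row[color_column]]) for row in rows]

people = [
    {"age": 34, "income": 52000, "city": "Ahmedabad"},
    {"age": 19, "income": 18000, "city": "Surat"},
    {"age": 28, "income": 41000, "city": "Ahmedabad"},
]
colors = color_manager(people, "city")
points = scatter_points(people, "age", "income", "city", colors)
print(colors)  # {'Ahmedabad': '#1f77b4', 'Surat': '#ff7f0e'}
```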

Conclusion

KNIME is a powerful platform that is very easy to learn and use. The Data Science life cycle covers data collection, data cleaning, data integration, analysis/modeling, and visualization. KNIME is very useful because users can easily complete all of these steps on a single platform. Furthermore, it is easy to learn, because KNIME users do not need any programming background. It makes data analysis accessible to everyone, especially to people who need to analyze data only occasionally. We believe that KNIME is beneficial to the overall Data Science community, as it introduces a powerful analytics platform to newcomers and non-programmers.

Just try it out, and ping me if you have any questions:

You can DM me on LinkedIn.

I will be posting more articles on KNIME and Data Science in the future.


Data Science Intern at Cilans Systems | Pursuing MSc Big Data Analytics at St. Xavier's College, Ahmedabad
