Data Science: Orange Tool Basic Overview
Data mining is used to build prediction models from historical data. These models can help in making decisions and predicting future trends. Orange is a framework for data visualization, machine learning, and data mining, with a visual programming front-end. It is a very helpful tool for analyzing large data sets.
In this article, I briefly describe the basic functionality of the Orange tool.
Orange Widgets: Widgets are the graphical user interface to Orange’s data mining and machine learning techniques. They are the components that make up an Orange workflow, and they are grouped into categories such as Data, Visualize, Model, and Evaluate.
Widgets offer essential functionality, like:
- Displaying a data table and allowing feature selection
- Reading data
- Training predictors and comparing learning algorithms
- Visualizing data elements, etc.
How to use workflows in Orange?
Orange workflows consist of components that read, process, and visualize data. Widgets communicate by sending information along a communication channel: the output of one widget is used as the input to another. A set of widgets linked in this way makes a workflow.
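The idea of widgets passing data along channels can be sketched in a few lines of plain Python. This is only an illustration of the concept and not Orange's API: the `Widget` class, its `connect` and `send` methods, the "Row Count" widget, and the sample rows are all hypothetical.

```python
from typing import Any, Callable, List


class Widget:
    """Hypothetical stand-in for an Orange widget (not Orange's real API)."""

    def __init__(self, name: str, process: Callable[[Any], Any]) -> None:
        self.name = name
        self.process = process             # what this widget does with its input
        self.outputs: List["Widget"] = []  # widgets linked to this one's output
        self.last_result: Any = None

    def connect(self, other: "Widget") -> "Widget":
        """Create a link (channel) from this widget's output to another's input."""
        self.outputs.append(other)
        return other

    def send(self, data: Any) -> None:
        """Process incoming data, then pass the result along every outgoing link."""
        self.last_result = self.process(data)
        for widget in self.outputs:
            widget.send(self.last_result)


# Made-up rows standing in for a loaded dataset.
rows = [{"age": 63, "heart_disease": 1}, {"age": 41, "heart_disease": 0}]

file_widget = Widget("File", lambda _: rows)                   # reads data
table_widget = Widget("Data Table", lambda data: list(data))   # keeps rows as-is
count_widget = Widget("Row Count", lambda data: len(data))     # a toy summary

# One output can feed several inputs.
file_widget.connect(table_widget)
file_widget.connect(count_widget)

file_widget.send(None)  # trigger the workflow
print(table_widget.last_result)
print(count_widget.last_result)
```

In Orange itself you never write this code; drawing a link between two widgets on the canvas plays the role of `connect`.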
Orange provides a few built-in datasets; you can use one of those or import one of your choice. Here, I have used the Heart Disease dataset, which records the presence of heart disease in patients.
The workflow sends the data from the File widget to Data Table to view it in tabular form, to Distributions to plot how the values are distributed, and to Scatter Plot to plot the data points.
To create this simple workflow in Orange:
Step 1: Load the dataset using the File widget.
Step 2: Create links from File to Data Info, Data Table, Distributions, and Scatter Plot.
To load data: Drag the File widget from the left pane and drop it on the canvas. Double-click the File widget and select the desired dataset.
How to do basic data exploration (like data distribution, data information):
To get information about the loaded data, we use the Data Info widget. It shows the dataset name, size, and description, along with the row count, column count, features, and targets.
To view your data in tabular form, use the Data Table widget: drag and drop the widget onto the canvas and create a link from the File widget to the Data Table widget.
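To make the kind of summary that Data Info gives more concrete, here is a minimal plain-Python sketch that computes a few of the same facts for a small CSV table. It does not use Orange: the `summarize` function, the sample values, and the column names are assumptions made up for illustration and are not taken from the real Heart Disease dataset.

```python
import csv
import io

# Tiny made-up sample; the column names are assumptions, not the real dataset's.
SAMPLE_CSV = """age,cholesterol,heart_disease
63,233,1
41,204,0
57,192,1
"""


def summarize(csv_text: str, target: str) -> dict:
    """Return basic facts about a CSV table: rows, columns, features, target."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "rows": len(rows),
        "columns": len(columns),
        # Every column except the target is treated as a feature.
        "features": [name for name in columns if name != target],
        "target": target,
    }


info = summarize(SAMPLE_CSV, target="heart_disease")
for key, value in info.items():
    print(f"{key}: {value}")
```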
In the Feature Statistics widget, you can see each feature's name, distribution, mean, median, etc.
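For readers who want to see what the mean and median columns represent, the following sketch computes them with Python's standard `statistics` module. The feature names and values are made up for illustration, and `feature_statistics` is a hypothetical helper, not part of Orange.

```python
import statistics

# Made-up values for two numeric features; not taken from the real dataset.
features = {
    "age": [63, 41, 57, 49],
    "cholesterol": [233, 204, 192, 266],
}


def feature_statistics(columns: dict) -> dict:
    """Compute the mean and median of every numeric column."""
    return {
        name: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for name, values in columns.items()
    }


stats = feature_statistics(features)
for name, values in stats.items():
    print(f"{name}: mean={values['mean']}, median={values['median']}")
```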
If you want to load external data, you can select the URL option in the File widget and paste the link to the external dataset there.
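Outside the Orange GUI, loading a table from a link can be sketched with Python's standard library alone. This is not how Orange implements the URL option; `load_csv_from_url` is a hypothetical helper, and the demo link is a made-up `data:` URL that carries its contents inline so that the example runs without network access.

```python
import csv
import io
from urllib.request import urlopen


def load_csv_from_url(url: str) -> list:
    """Fetch a CSV file from a URL and return its rows as a list of dicts."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# A data: URL embeds the file contents in the link itself (%0A is a newline).
# With a real dataset you would pass an http(s) link instead.
demo_url = "data:text/csv,age,heart_disease%0A63,1%0A41,0"
rows = load_csv_from_url(demo_url)
print(rows)
```

Note that every value comes back as a string; converting columns to numbers is left out to keep the sketch short.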
That’s it for the introduction to the Orange tool. We will explore this tool in detail in the next part of the Data Science series. You can explore more about the Orange tool here.