Machine Learning Process
The machine learning process involves building a predictive model that can be used to find a solution for a problem statement. In order to solve any problem in machine learning, there are a number of steps that we need to follow. Those are
- Define Objective
- Data Gathering
- Preparing Data
- Data Exploration
- Building a Model
- Model Evaluation
- Predictions
In order to understand the machine learning process, let us assume that you have been given a problem that needs to be solved by using machine learning.
Step 1: Define the objective of the problem:
The problem we need to solve is to predict the occurrence of rain in our local area by using machine learning.
Basically, we need to predict the possibility of rain by studying the weather conditions. Stating this is what defining the objective of the problem means. At this stage we should also ask whether the output we are trying to predict is going to be a continuous variable or a discrete variable.
We need to understand which variable is the target variable and which predictor variables we need in order to predict the outcome. Here the target variable is a variable that tells us whether it is going to rain or not.
As input we need data such as the temperature on a particular day, the humidity level, the precipitation and so on. Basically, we have to form a clear idea of the problem at this stage.
We also need to know the kind of problem we are solving. Is this a binary classification problem, a clustering problem or a regression problem? Since the target here is a yes/no answer, predicting whether it will rain is a binary classification problem.
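To make the idea of a target variable and predictor variables concrete, here is a minimal sketch. The column names, the values and the `problem_type` helper are all made up for illustration, and the helper is only a rough heuristic for supervised problems (clustering has no target variable at all, so it is not covered).

```python
# Hypothetical weather records; every name and value is invented for illustration.
records = [
    {"temperature_c": 31.0, "humidity_pct": 40.0, "pressure_hpa": 1015.0, "rain": "no"},
    {"temperature_c": 22.5, "humidity_pct": 85.0, "pressure_hpa": 1002.0, "rain": "yes"},
    {"temperature_c": 19.0, "humidity_pct": 90.0, "pressure_hpa": 998.0, "rain": "yes"},
]

TARGET = "rain"  # the variable we want to predict
PREDICTORS = [name for name in records[0] if name != TARGET]  # everything else


def problem_type(rows, target):
    """Rough heuristic: guess the kind of supervised problem from the target values."""
    values = {row[target] for row in rows}
    if all(isinstance(v, float) for v in values):
        return "regression"  # continuous target
    if len(values) == 2:
        return "binary classification"  # discrete target with two classes
    return "multiclass classification"


print("Predictors:", PREDICTORS)
print("Problem type:", problem_type(records, TARGET))
```

If the objective were to predict the temperature instead, calling `problem_type(records, "temperature_c")` would report a regression problem, because that target is continuous.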
Step 2: Data gathering:
In this step, we have to answer questions such as: what kind of data is needed to solve this problem? Is this data available? And if it is available, where can we get it and how?
Data gathering is one of the most time-consuming steps in the machine learning process. If we have to collect the data manually, it is going to take a lot of time. Fortunately, there are a lot of resources online that provide data sets. We can either scrape the data from the web or simply download a ready-made data set from websites like Kaggle.
Coming back to predicting the weather, the data needed for weather forecasting includes measures like the humidity level, the temperature, the pressure and the locality (for example, whether or not we live in a hill station). Such data has to be collected and stored for analysis. By the end of the data gathering stage, we have all of this data in hand.
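Once a data set has been downloaded, the first practical task is simply to load it. The sketch below assumes the data arrives as a CSV file. The column names and values are invented, and an in-memory string stands in for the real file so that the example runs on its own.

```python
import csv
import io

# Stand-in for a downloaded CSV file; the columns and values are invented.
RAW_CSV = """date,temperature_c,humidity_pct,pressure_hpa,rain
2024-01-01,31.0,40,1015,no
2024-01-02,22.5,85,1002,yes
2024-01-03,,90,998,yes
2024-01-03,,90,998,yes
"""


def load_rows(source):
    """Read CSV text from a file-like object into a list of dictionaries."""
    return list(csv.DictReader(source))


rows = load_rows(io.StringIO(RAW_CSV))
print(len(rows), "rows with columns", list(rows[0].keys()))
```

With a real file you would pass `open("some_file.csv", newline="")` instead of the `io.StringIO` object. Note that the raw data already contains a missing temperature and a repeated row, which is typical of gathered data.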
Step 3: Preparing Data:
This step is also known as data cleaning. Data collected from online resources or websites will almost always require cleaning and preparation, because it is rarely in the right format. We have to do some preparation and cleaning in order to make the data ready for analysis.
While cleaning the data we will encounter a lot of inconsistencies in the data set, like missing values, redundant variables, duplicate values, etc. Removing such inconsistencies is very important, because they can lead to wrong computations and predictions. Data cleaning is one of the hardest steps in the machine learning process.
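The three kinds of inconsistency mentioned above can be handled as in the following sketch. The data and the `clean` helper are hypothetical, and filling a missing value with the column mean is just one possible strategy (dropping the row is another).

```python
from statistics import mean

# Invented raw rows: one duplicate, one missing temperature (None), and a
# redundant column (temperature in Fahrenheit repeats the Celsius value).
raw = [
    {"temperature_c": 31.0, "temperature_f": 87.8, "humidity_pct": 40.0, "rain": "no"},
    {"temperature_c": 22.5, "temperature_f": 72.5, "humidity_pct": 85.0, "rain": "yes"},
    {"temperature_c": None, "temperature_f": None, "humidity_pct": 90.0, "rain": "yes"},
    {"temperature_c": 22.5, "temperature_f": 72.5, "humidity_pct": 85.0, "rain": "yes"},
]


def clean(rows, redundant=("temperature_f",)):
    """Return cleaned copies of the rows; the input list is left unchanged."""
    # 1. Drop redundant variables.
    rows = [{k: v for k, v in r.items() if k not in redundant} for r in rows]

    # 2. Drop duplicate rows, keeping the first occurrence.
    #    (All rows share the same column order, so the item tuple is a valid key.)
    seen, unique = set(), []
    for r in rows:
        key = tuple(r.items())
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Fill missing numeric values with the mean of the known values.
    for col in unique[0]:
        known = [r[col] for r in unique if isinstance(r[col], float)]
        if known:
            fill = mean(known)
            for r in unique:
                if r[col] is None:
                    r[col] = fill
    return unique


cleaned = clean(raw)
for row in cleaned:
    print(row)
```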
Step 4: Exploratory Data Analysis:
Data exploration involves understanding the patterns and trends in the data. At this stage useful insights are drawn and correlations between the variables are understood.
For example, suppose we have to predict whether it will rain or not. We know that there is a strong possibility of rain if the temperature has fallen, and that our output will depend on variables such as temperature, humidity, and so on. What we have to find out is how strongly it depends on each of these variables, that is, the patterns and correlations between them. Such patterns and trends have to be understood and mapped at this stage.
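One simple way to measure how strongly the output depends on a variable is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand on invented readings, with rain encoded as 1 and no rain as 0. It is only an illustration of the idea, not a full exploratory analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented daily readings; rain is encoded as 1 (rain) or 0 (no rain).
temperature = [31.0, 29.5, 24.0, 22.5, 19.0, 18.0]
humidity = [40.0, 45.0, 70.0, 85.0, 90.0, 92.0]
rain = [0, 0, 1, 1, 1, 1]

print("temperature vs rain:", round(pearson(temperature, rain), 2))
print("humidity vs rain:   ", round(pearson(humidity, rain), 2))
```

On this toy data the temperature correlation comes out negative (rain goes with lower temperatures) and the humidity correlation positive, which matches the intuition described above.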
Step 5: Building a machine learning model:
At this stage a predictive model is built by using a machine learning algorithm. All the insights and patterns that we derived during data exploration are used to build the machine learning model. This stage usually begins by splitting the data set into two parts: training data and testing data.
The training data is what we use to build the model. The testing data is held back and used only later to evaluate it. In both cases we are feeding input data to the machine; the difference is what the data is used for.
A common choice is to randomly pick 80% of the data for training and 20% for testing. The training set is larger than the testing set because we need to train the machine, and in general the more data we feed it during the training phase, the better it tends to perform during the testing phase.
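A random 80/20 split can be sketched in a few lines. The `split_train_test` helper and its parameters are hypothetical names chosen for this example; the fixed seed only makes the shuffle repeatable.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


days = list(range(10))  # stand-in for ten rows of weather data
train, test = split_train_test(days)
print(len(train), "training rows,", len(test), "testing rows")
```

In practice a library helper such as scikit-learn's `train_test_split` does the same job; the point here is only that every row ends up in exactly one of the two sets.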
So the model is essentially the machine learning algorithm that predicts the output by using the data fed to it.
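As an illustration of what "building a model" means, the sketch below trains a nearest-centroid classifier, one of the simplest classification algorithms: training computes the average feature values of each class, and prediction picks the class whose average is closest. The choice of algorithm, the function names and the data are all assumptions made for this example; a real project would more likely use a library implementation.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(X, y):
    """Training: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Prediction: return the class whose centroid is nearest to the features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Invented training data: [temperature in Celsius, humidity in percent].
X_train = [[31.0, 40.0], [29.5, 45.0], [24.0, 70.0], [22.5, 85.0], [19.0, 90.0]]
y_train = ["no", "no", "yes", "yes", "yes"]

model = fit_centroids(X_train, y_train)
print("Rain on a cool, humid day?", predict(model, [20.0, 88.0]))
```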
Step 6: Model evaluation & Optimization:
After we are done building the model by using the training data set, it is finally time to put the model to the test. The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome.
Once the accuracy is calculated, any further improvements to the model can be implemented during this stage. Various methods can help us improve the performance of the model, such as parameter tuning and cross-validation. The main thing to remember in this step is that model evaluation is nothing but testing how well our model can predict the outcome, using the testing data set. After testing is done we calculate the accuracy, and if the output is not accurate enough we go back and improve the model.
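Accuracy and cross-validation can be sketched as follows. The model here is a deliberately tiny one (a single humidity cut-off learned from the training folds), and all names and data are invented. The folds are formed by taking every k-th row rather than by shuffling, which keeps the example deterministic.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(threshold, humidity):
    """Toy model: predict rain when humidity reaches the threshold."""
    return ["yes" if h >= threshold else "no" for h in humidity]


def fit_threshold(humidity, labels):
    """Training: pick the humidity cut-off with the best training accuracy."""
    candidates = sorted(set(humidity))
    return max(candidates, key=lambda t: accuracy(labels, predict(t, humidity)))


def cross_validate(humidity, labels, k=3):
    """k-fold cross-validation: train on k-1 folds, score on the held-out fold."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(labels), k))  # every k-th row is held out
        train_h = [h for i, h in enumerate(humidity) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        test_h = [humidity[i] for i in sorted(test_idx)]
        test_y = [labels[i] for i in sorted(test_idx)]
        threshold = fit_threshold(train_h, train_y)
        scores.append(accuracy(test_y, predict(threshold, test_h)))
    return sum(scores) / k


# Invented data: humidity in percent and whether it rained.
humidity = [40.0, 45.0, 50.0, 70.0, 85.0, 90.0, 92.0, 95.0, 55.0]
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes", "no"]

print("Mean cross-validated accuracy:", round(cross_validate(humidity, labels), 3))
```

Parameter tuning builds on the same idea: the cross-validated score is computed for each candidate setting of the model, and the setting with the best score is kept.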
Step 7: Predictions:
After the model has been evaluated and improved, it is finally used to make predictions. Prediction is the last step in the machine learning process.