For predictive analytics, Excel? Really Really?
Usually, that is the first reaction I get when I bring the subject up.
When I show how we can explore Excel's versatile nature to create predictive models for our data science and analytics ventures, this is accompanied by an incredulous look.
If the stores around you started gathering consumer data, should they follow a data-based approach to sell their goods?
Let me ask you a question. Can their revenue/sales be predicted or the number of goods estimated?
Now you have to wonder how they are going to construct a complex mathematical model in the world that can predict these things?
And it may be beyond their reach to study analytics or recruit an analyst. The good news is here-they don't need to.
Without having to write complicated code that flies over most people's heads, Microsoft Excel gives us the opportunity to conjure predictive models.
In MS Excel, we can easily construct a simple model such as linear regression that can help us perform analysis in a few simplesteps.
What is Linear Regression?
It is a linear approach to statistically model the relationship between the dependent variable (the variable you want to predict) and the independent variables (the factors used for predicting).Linear regression gives us an equation like this:
Y=M1*X1+M2*X2+.............+MnXn+C
Here, we have Y as our dependent variable, X’s are the independent variables and all M’s are the coefficients. Coefficients are basically the weights assigned to the features, based on their importance and C is the constant which is basically the intercept.
Getting the All-Important Analysis ToolPack Excel Add-in
To perform a regression analysis in Excel, we first need to enable Excel’sAnalysis ToolPak Add-in.The Analysis ToolPak in Excel is an add-in program that provides data analysis tools for statistical and engineering analysis.
To add it in your workbook, follow these steps.
Step 1 – Excel Options
Go toFiles->Options:
Step 2 – Locate Analytics ToolPak
Go toAdd-inson the left panel ->Manage Excel Add-ins->Go:
Step 3 – Add Analytics ToolPak
Select the “Analysis ToolPak” and pressOK:
You have successfully added the Analysis ToolPak in Excel! You can check it by going to theDatabar in the Ribbon.
Let’s start building our predictive model in Excel!
Implementing Linear Regression in Excel
Here is the problem statement we will be working with:
"The company Apple wants to predict the price of I-Pad by considering the following factors Screen(type), Storage capacity, Connectivity(type) and Gen"
->Encode the data in order to perform the regression analysis: Assigning numeric value to categorical data set.
Step 1 – Select Regression
Go toData->Data Analysis:
Go to Data Analysis in the Data ToolPak, select Regression and pressOK:
Step 2 – Select Options
In this step, we will select some of the options necessary for our analysis, such as :
- Input y range – The range of independent factor
- Input x range – The range of dependent factors
- Output range – The range of cells where you want to display the results
Analyzing our Predictive Model’s Results in Excel
Theeasypartwasapplyingthelinearregressionmodel.Nowcomesthetrickypartofour study,interpretingtheoutcomesofthepredictivemodelinExcel.
In the summary, we have 3 types of output and we will cover them one-by-one:
- Regression statistics table
- Regression coefficients table
- Residual Table
Regression Statistics Table
The regression statistics table tells us how well the line of best fit defines the linear relationship between the independent and dependent variables. Two of the most important measures are the R squared and Adjusted R squared values.
TheR-squaredstatistic is the indicator ofgoodness of fitwhich tells us how much variance is explained by the line of best fit. R-squared value ranges from 0 to 1. In our case, we have the R-squared value of 0.93 which means that our line is able to explain 93% of the variance - a good sign.
But there is a problem - as we keep adding more variables, our R squared value will keep increasing even though the variable might not be having any effect.Adjusted R-squaredsolves this problem and is a much more reliable metric.
Regression Coefficient Table
The Coefficient table breaks down the components of the regression line in the form of coefficients.
Residual Table
The residual table reflects how much the predicted value varies from the actual value. It consists of the values predicted by our model:
Making Predictions in Excel!
--> The RMSE is 8% that indicates the variability in our predicted model is 8% when compared with the original model.
--> Our model has predicted the price range as per the specifications that was given as input. The detailed model will be reflected in the video attached.
That’s the power of linear regression done simply in Microsoft Excel.
#BusinessToys #DataScience #DataAnalytics #ExcelAnalytics #PassionForData #DataScientist