Linear Regression

Sandeep Venkat
3 min read · Jun 11, 2021

Linear regression is one of the most commonly used predictive modeling techniques. It models the relationship between a dependent variable and an independent variable using a straight line. Linear regression aims to determine the slope m and intercept b that define the line y = mx + b, minimizing the regression error by drawing the line that lies closest to the data, as shown in Fig. 1.

Fig. 1: Linear Regression
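As a quick sketch of what this looks like in code, NumPy's polyfit performs exactly this kind of least-squares line fit. The data below is hypothetical, chosen only to illustrate the call (it is not the sample from Fig. 2):

```python
import numpy as np

# Hypothetical data points (not the article's Fig. 2 sample)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.9, 3.5, 3.6, 3.7, 4.3])

# Degree-1 polyfit returns the least-squares slope m and intercept b
m, b = np.polyfit(x, y, 1)
print(f"y = {m:.1f}x + {b:.1f}")  # y = 0.3x + 2.7 for this data
```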

Finding the Best-fit Line:

Let us understand linear regression by taking a 2-D sample with x and y variables and plotting the data points, as shown in Fig. 2. The point (3, 3.6) marks the mean of x and the mean of y respectively.

Fig. 2

Now our goal is to find the line that lies nearest to all the data points (the best-fit line). Although there are multiple techniques for finding the line of best fit, we use the Least Squares method. First, we need to find the slope m and the y-intercept b that fit the data.

Fig. 3
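For reference, these are the standard least-squares formulas, which match the numbers substituted below (x̄ and ȳ denote the means of x and y):

$$m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\bar{x}$$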

Substituting the corresponding values calculated in Fig. 3 into the formulas, we get the slope m = 3/10 = 0.3, and then compute the y-intercept b as b = 3.6 - (0.3)(3) = 3.6 - 0.9 = 2.7. Finally, we get the line of best fit as y = 0.3x + 2.7, as shown in Fig. 4.

Fig. 4
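A minimal sketch of the same computation in Python, implementing the formulas directly. It reuses the hypothetical data from the first snippet, whose means and slope happen to match the values above (the article's actual points appear only in Fig. 2):

```python
import numpy as np

# Hypothetical data (means and slope chosen to match x̄ = 3, ȳ = 3.6,
# m = 3/10; these are not the article's Fig. 2 points, so their residuals
# will not reproduce the article's MSE figure later on)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.9, 3.5, 3.6, 3.7, 4.3])

x_mean, y_mean = x.mean(), y.mean()  # 3.0 and 3.6

# Slope: sum of cross-deviations over sum of squared x-deviations
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean  # intercept: b = ȳ - m·x̄

print(round(m, 2), round(b, 2))  # 0.3 2.7 -> y = 0.3x + 2.7
```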

Calculating the Loss Function:

So, for the given x in {1, 2, 3, 4, 5}, the predicted values of y (the points in red) lie on the regression line, as shown in Fig. 5.

Fig. 5
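These predicted values come straight from evaluating the fitted line at each x, as a one-line check:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y_pred = 0.3 * x + 2.7  # evaluate y = 0.3x + 2.7 at each x
print(y_pred)           # [3.  3.3 3.6 3.9 4.2]
```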

Mean Squared Error (MSE) is the most commonly used regression loss function. In a nutshell, the MSE tells you how close the set of points is to the regression line. Because squaring gives more weight to larger errors, it is a preferred estimator; it is also popular because it computes the average of the set of squared errors. This implies that the smaller the MSE of a model or regression line, the better the model performs. For our example, the Mean Squared Error is

MSE = (1/5)(0.36 + 0.16 + 2.56 + 1.96 + 0.16) = 5.2/5 = 1.04.
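The same arithmetic in Python, taking the article's squared errors as given (the underlying y-values are shown only in Fig. 2, so we start from the squared errors rather than recomputing them):

```python
import numpy as np

# Squared errors (y_i - ŷ_i)² for x = 1..5, as listed in the article
squared_errors = np.array([0.36, 0.16, 2.56, 1.96, 0.16])

mse = squared_errors.mean()  # (1/5) * 5.2
print(mse)                   # 1.04 (up to float rounding)
```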
