A lesson in regression (Linear vs Logistic)

Introduction

Linear Regression is a commonly used supervised Machine Learning algorithm that predicts continuous values. Linear Regression assumes that there is a linear relationship present between dependent and independent variables. In simple words, it finds the best fitting line/plane that describes two or more variables.

Steps of Linear Regression

As the name suggests, the idea behind performing Linear Regression is that we should come up with a linear equation that describes the relationship between the dependent and independent variables.

Step 1

Let’s assume that we have a dataset where x is the independent variable and Y is a function of x (Y=f(x)). Thus, by using Linear Regression we can form the following equation (equation for the best-fitted line):

Y = mx + c

This is an equation of a straight line where m is the slope of the line and c is the intercept.

Step 2

Now, to derive the best-fitted line, we first assign random values to m and c and calculate the corresponding value of Y for a given x. This calculated value is our predicted output.
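As a rough sketch of this step in Python/NumPy (the x values and the random seed below are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical values of the independent variable x
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

    # Step 2: assign random starting values to the slope m and the intercept c
    m = rng.normal()
    c = rng.normal()

    # Calculate the corresponding output value for each x
    y_hat = m * x + c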

Step 3

As Linear Regression is a supervised Machine Learning algorithm, we already know the actual value of Y (the dependent variable). Now that we have our calculated output value (let's represent it as ŷ), we can verify whether our prediction is accurate by measuring the loss, i.e. the mean squared error between the actual and predicted values:

L = (1/n) ∑ (y − ŷ)^2

Where n is the number of observations.
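Continuing the sketch above (the actual y values are again invented for illustration), the loss is simply the average of the squared differences between the actual and predicted values:

    # Hypothetical actual values of the dependent variable Y
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Mean squared error: L = (1/n) * sum((y - y_hat)^2)
    loss = np.mean((y - y_hat) ** 2)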

Step 4

To achieve the best-fitted line, we have to minimize the value of the loss function.

Gradient Descent

If we look at the formula for the loss function, it is the mean squared error, which means the error is expressed in second-order (squared) terms. Gradient descent minimizes this loss iteratively: starting from the random m and c, it repeatedly updates them in the direction of the negative gradient of the loss until the loss stops decreasing.

Fig 1: Gradient Descent
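A minimal gradient descent loop for this loss could look like the following sketch; the learning rate and the number of iterations are arbitrary choices, not values prescribed by the method:

    learning_rate = 0.01

    for _ in range(1000):
        y_hat = m * x + c          # current predictions
        error = y_hat - y          # prediction error

        # Partial derivatives of the mean squared error with respect to m and c
        dm = 2 * np.mean(error * x)
        dc = 2 * np.mean(error)

        # Move m and c a small step in the direction that reduces the loss
        m -= learning_rate * dm
        c -= learning_rate * dc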

Step 5

Once the loss function is minimized, we get the final equation for the best-fitted line, and we can predict the value of Y for any given x.
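With the fitted m and c from the loop above, predicting Y for a new, unseen x is a single line (the value 6.0 is just an example input):

    new_x = 6.0
    predicted_y = m * new_x + c
    print(f"Predicted Y for x={new_x}: {predicted_y:.2f}")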

Logistic Regression

Fundamentally, Logistic Regression is used to classify elements of a set into two groups (binary classification) by calculating the probability that each element belongs to a particular group.

Steps of Logistic Regression

In Logistic Regression, we decide on a probability threshold. If the predicted probability of a particular element is higher than this threshold, we classify that element into one group; otherwise, we classify it into the other.

Step 1

To perform the binary separation, we first determine the best-fitted line by following the Linear Regression steps described above.

Step 2

However, the regression line we get from Linear Regression is highly susceptible to outliers, so on its own it will not do a good job of separating two classes. To handle this, we pass the output of the linear equation through the sigmoid function, which maps any real value to a probability between 0 and 1.
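A small sketch of the sigmoid transformation, reusing the m, c and x from the earlier snippets purely for illustration:

    def sigmoid(z):
        # Maps any real-valued input into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    # Probability that each observation belongs to the positive class
    probabilities = sigmoid(m * x + c)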

Step 3

Finally, the output value of the sigmoid function gets converted into 0 or 1 (discrete values) based on the threshold value, which we usually set to 0.5. In this way, we get the binary classification.
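Applying the usual 0.5 threshold to those probabilities gives the discrete class labels:

    threshold = 0.5

    # Convert each probability into a class label, 0 or 1
    classes = (probabilities >= threshold).astype(int)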

Example

Let us consider a problem where we are given a dataset containing Height and Weight for a group of people. Our task is to predict the Weight for new entries in the Height column; since Weight is a continuous value, this is a Linear Regression problem.
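A short sketch with scikit-learn shows how this could look in practice; the height and weight numbers below are invented purely for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical training data: heights in cm, weights in kg
    heights = np.array([[150.0], [160.0], [170.0], [180.0], [190.0]])
    weights = np.array([52.0, 60.0, 68.0, 77.0, 85.0])

    model = LinearRegression()
    model.fit(heights, weights)

    # Predict the Weight for a new entry in the Height column
    print(model.predict(np.array([[175.0]])))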

The Similarities between Linear Regression and Logistic Regression

  • Linear Regression and Logistic Regression are both supervised Machine Learning algorithms.
  • Both models are parametric, i.e. both use a linear equation of the input variables to make their predictions.

The Differences between Linear Regression and Logistic Regression

  • Linear Regression is used to handle regression problems, whereas Logistic Regression is used to handle classification problems.
  • Linear Regression provides a continuous output, but Logistic Regression provides a discrete output.
  • The purpose of Linear Regression is to find the best-fitted line, while Logistic Regression goes one step further and passes the line's output through the sigmoid curve.
  • The loss minimized in Linear Regression is the mean squared error, whereas Logistic Regression uses maximum likelihood estimation (equivalently, the log loss), as sketched below.
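As a rough sketch of that last difference (all predictions and labels below are made up), the two losses can be computed side by side; minimizing the log loss is equivalent to maximum likelihood estimation for Logistic Regression:

    import numpy as np

    # Hypothetical continuous targets and predictions (Linear Regression)
    y_true = np.array([2.0, 4.0, 6.0])
    y_pred = np.array([2.5, 3.5, 6.5])
    mse = np.mean((y_true - y_pred) ** 2)

    # Hypothetical binary labels and predicted probabilities (Logistic Regression)
    labels = np.array([1, 0, 1])
    probs = np.array([0.9, 0.2, 0.7])
    log_loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

    print(f"MSE: {mse:.3f}, Log loss: {log_loss:.3f}")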
