Logistic regression classification technique

Logistic regression is a method in which we analyze the input variables that result in the binary classification of the output variables. Even though the name suggests regression, it is a popular method to solve classification problems, for example, to detect whether an email is spam or not, or whether a transaction is a fraudulent or not. The goal of logistic regression is to find a best-fitting model that defines the class of the output variable as 0 (negative class) or 1 (positive class). As a specialized case of linear regression, logistic regression generates the coefficients of a formula to predict probability of occurrence of the dependent variable. Based on the probability, the parameters that maximize the probability of occurrence or nonoccurrence of a dependent event are selected. The probability of an event is bound between 0 and 1. However, the linear regression model cannot guarantee the probability range of 0 to 1.

The following diagram shows the difference between the linear regression and logistic regression models:

Figure 3.10 Difference between linear and logistic Regression models

There are two conditions we need to meet with regards to the probability of the intended binary outcome of the independent variable:

  • It should be positive (p >= 0): We can use an exponential function in order to ensure positivity:

  • It should be less than 1 (p <=1): We can pide the probability exponential term with the same value, + 1, in order to ensure that the outcome probability is less than: