Logistic Regression is a fundamental algorithm in the realm of machine learning and statistics. Despite its name, it’s not used for regression but rather for classification tasks. In this blog post, we will take a deep dive into the Logistic Regression algorithm, explore how it works, discuss its various applications, and provide ten code examples to demonstrate its versatility.

## Understanding Logistic Regression

**Logistic Regression**, sometimes called **Logit Regression**, is a statistical method used for binary and multiclass classification. Its primary purpose is to predict the probability that an observation belongs to a particular class or category. Although it might sound complex, the concept behind Logistic Regression is quite intuitive.

### The Core Idea

At its core, Logistic Regression operates on the premise of a linear relationship between the input features and the log-odds of an event happening. The log-odds, also known as the **logit**, is the logarithm of the odds: the probability of the event occurring divided by the probability of it not occurring.

Here’s the mathematical representation:

```
logit(p) = ln(p / (1 - p))
```

In this equation:

- `logit(p)` is the log-odds of the event.
- `p` is the probability of the event.
- `1 - p` is the probability of the event not occurring.
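
As a quick numerical illustration (a standalone sketch, not part of any library), the log-odds is symmetric around `p = 0.5`:

```
import numpy as np

def logit(p):
    # Log-odds: ln(p / (1 - p))
    return np.log(p / (1 - p))

for p in [0.1, 0.5, 0.9]:
    print(f"p = {p}: logit(p) = {logit(p):.3f}")
# p = 0.5 gives log-odds 0; p = 0.1 and p = 0.9 give -2.197 and +2.197
```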

Logistic Regression models this relationship by fitting a linear equation to the observed data. It uses the **logistic function** (also called the sigmoid function) to transform the linear combination of input features into a probability score.

### The Logistic (Sigmoid) Function

```
P(y=1) = 1 / (1 + e^(-z))
```

In this equation:

- `P(y=1)` is the probability of the positive class (i.e., the event happening).
- `e` is the base of the natural logarithm (approximately 2.71828).
- `z` is the linear combination of input features and model coefficients (weights).

The logistic function maps any real-valued number `z` to a value between 0 and 1, which can be interpreted as a probability.
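
Here is a minimal NumPy sketch of the sigmoid (just the function, independent of any fitted model) that shows this squashing behavior:

```
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

for z in [-4, -1, 0, 1, 4]:
    print(f"z = {z:+d}: P(y=1) = {sigmoid(z):.3f}")
# Large negative z gives probabilities near 0, large positive z near 1,
# and z = 0 gives exactly 0.5
```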

### Making Predictions

The model calculates the probability of an observation belonging to the positive class and then applies a threshold (usually 0.5) to classify it.

- If `P(y=1)` is greater than or equal to 0.5, the observation is classified as the positive class.
- If `P(y=1)` is less than 0.5, the observation is classified as the negative class.
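
As a minimal sketch (plain NumPy, no model fitting), the thresholding rule is a single comparison over a vector of predicted probabilities:

```
import numpy as np

# Hypothetical predicted probabilities from a fitted model
probabilities = np.array([0.12, 0.48, 0.50, 0.91])
labels = (probabilities >= 0.5).astype(int)
print(labels)  # [0 0 1 1]
```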

### How Logistic Regression Works

**Linear Combination**: The model starts with a linear combination of the input features, weighted by coefficients.

```
z = b0 + b1*x1 + b2*x2 + ... + bn*xn
```

- Where `z` is the linear combination, `b0` is the intercept, `b1` to `bn` are the coefficients, and `x1` to `xn` are the input features.

**Logistic Transformation**: The linear combination is then passed through the logistic function (also known as the sigmoid function) to produce the predicted probability.

```
P(y=1) = 1 / (1 + e^(-z))
```

- Where `P(y=1)` is the probability of the positive class, and `e` is the base of the natural logarithm.

**Thresholding**: Finally, a threshold (usually 0.5) is applied to classify the observation into one of the two classes.
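
Putting the three steps together, here is a minimal from-scratch sketch of a single prediction; the intercept and weights below are made up, standing in for a trained model:

```
import numpy as np

b0 = -1.5                 # hypothetical intercept
b = np.array([0.8, 2.0])  # hypothetical coefficients b1, b2
x = np.array([1.2, 0.7])  # one observation with two features

z = b0 + np.dot(b, x)     # linear combination
p = 1 / (1 + np.exp(-z))  # logistic transformation
label = int(p >= 0.5)     # thresholding at 0.5

print(f"z = {z:.3f}, P(y=1) = {p:.3f}, class = {label}")
```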

## Key Concepts of Logistic Regression

Before moving on to the full code examples, let’s grasp some key concepts related to Logistic Regression:

- **Binary and Multiclass Classification**: Logistic Regression is mainly used for binary classification, where there are two classes (e.g., spam or not spam), but it extends naturally to multiclass problems.
- **Log-Likelihood**: The model’s parameters (coefficients) are estimated by maximizing the log-likelihood of the observed data given the model.
- **Regularization**: To prevent overfitting, Logistic Regression can incorporate regularization techniques such as L1 (Lasso) and L2 (Ridge) penalties.
- **Interpretability**: Logistic Regression provides interpretable results. The coefficient associated with each input feature indicates its contribution to the prediction, allowing for meaningful insights, as the short sketch below illustrates.
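
To make the interpretability point concrete, here is a minimal sketch on a synthetic dataset (generated with scikit-learn’s `make_classification`, purely for illustration) that reads each fitted coefficient as an odds ratio:

```
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, for illustration only
X_demo, y_demo = make_classification(n_samples=200, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)
demo_model = LogisticRegression().fit(X_demo, y_demo)

for i, coef in enumerate(demo_model.coef_[0]):
    # exp(coef) is the multiplicative change in the odds per unit increase in the feature
    print(f"feature {i}: coefficient = {coef:.3f}, odds ratio = {np.exp(coef):.3f}")
```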

## Code Examples

Let’s explore ten code examples to illustrate Logistic Regression’s use in various scenarios:

### Importing Libraries

```
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
```

### Loading Data

```
# Replace "your_dataset.csv" and "target_column" with your own file and label column
data = pd.read_csv("your_dataset.csv")
X = data.drop("target_column", axis=1)
y = data["target_column"]
```

### Splitting Data

```
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### Creating a Logistic Regression Model

```
model = LogisticRegression()
```

### Training the Model

```
model.fit(X_train, y_train)
```

### Making Predictions

```
y_pred = model.predict(X_test)
```

### Model Evaluation

```
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.3f}")
print(conf_matrix)
print(class_report)
```

### Regularized Logistic Regression

```
from sklearn.linear_model import LogisticRegressionCV
# Create a logistic regression model with L1 regularization
model = LogisticRegressionCV(cv=5, penalty='l1', solver='liblinear')
```
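
After fitting, `LogisticRegressionCV` exposes the regularization strength it selected via cross-validation; a short usage sketch, reusing the training split from earlier:

```
model.fit(X_train, y_train)
print("Selected inverse regularization strength:", model.C_)
```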

### Multiclass Classification

```
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')
```
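
A brief follow-up sketch fits the multinomial model on the iris data and inspects the per-class probabilities for the first sample (if a convergence warning appears, increasing `max_iter` in the constructor is the usual fix):

```
model.fit(X, y)
probs = model.predict_proba(X[:1])  # one row: probability of each of the three iris classes
print(probs.round(3))
```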

### Visualizing Decision Boundary

```
import matplotlib.pyplot as plt
# Assuming a binary classification problem with two features in X
X1_min, X1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
X2_min, X2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
X1, X2 = np.meshgrid(np.linspace(X1_min, X1_max, 100), np.linspace(X2_min, X2_max, 100))
Z = model.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)
plt.contourf(X1, X2, Z, cmap=plt.cm.RdBu, alpha=0.6)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu_r)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
```

## Conclusion

Logistic Regression is a powerful algorithm for binary and multiclass classification tasks. It’s interpretable, easy to implement, and serves as a foundational building block in the world of machine learning. By understanding its inner workings and experimenting with different scenarios, you can leverage Logistic Regression effectively for a wide range of applications, from spam detection to medical diagnosis and beyond.