Regression
There are many flavours of regression, each with its own assumptions, loss function and strengths. This directory contains in-depth studies of each of the following flavours, including derivations and approximate or closed-form implementations.
"By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race."—Alfred North Whitehead
"Mathematics is the art of giving the same name to different things."—Henri Poincaré
Least Squares: \[\hat{\beta} = \arg \min_{\beta} \frac{1}{n}\|y-X \beta\|^2_2\]
LASSO: \[\hat{\beta}_{\text{LASSO}} = \arg \min_{\beta} \frac{1}{n}\|y-X \beta\|^2_2 + \lambda \| \beta \|_1\]
Ridge: \[\hat{\beta}_{\text{Ridge}} = \arg \min_{\beta} \frac{1}{n}\|y-X \beta\|^2_2 + \alpha \| \beta \|_2^2\]
Logistic: \[\hat{\beta}_{\text{Logistic}} = \arg \min_{\beta} \frac{1}{n} \sum_{i=1}^n \left[ -y_i \log(p_i) - (1 - y_i) \log(1 - p_i) \right], \text{ where } p_i = \sigma(X_i \beta) \text{ and } \sigma(z) = \frac{1}{1 + e^{-z}}\]
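Of these, only the least-squares and ridge objectives admit closed-form solutions; setting their gradients to zero (a standard derivation, stated here for reference and consistent with the \(\frac{1}{n}\) scaling above) gives
\[\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad \hat{\beta}_{\text{Ridge}} = (X^\top X + n\alpha I)^{-1} X^\top y,\]
whereas the LASSO and logistic objectives have no closed form in general and are fit iteratively.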
A cursory implementation of the first three models on a contrived regression problem gives the following results:
The Contrived Example
Imports
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Orchestration and Prediction
rs = np.random.RandomState(123)
n = 1000 # samples
p = 10 # features
noise = 0.4
n_informative = p // 2 # informative features
X, y = make_regression(n_samples=n, n_features=p, noise=noise, n_informative=n_informative, random_state=rs)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rs)
# fitting these models to the training data:
m_linear_reg = LinearRegression().fit(X_train, y_train)
m_lasso_reg = Lasso().fit(X_train, y_train)
m_ridge_reg = Ridge().fit(X_train, y_train)
# predictions on training:
ypred_train_linear_reg = m_linear_reg.predict(X_train)
ypred_train_lasso_reg = m_lasso_reg.predict(X_train)
ypred_train_ridge_reg = m_ridge_reg.predict(X_train)
# then test data:
ypred_test_linear_reg = m_linear_reg.predict(X_test)
ypred_test_lasso_reg = m_lasso_reg.predict(X_test)
ypred_test_ridge_reg = m_ridge_reg.predict(X_test)
Tabulation
loss_data = [
['linear regression',
mean_squared_error(y_train, ypred_train_linear_reg),
mean_squared_error(y_test, ypred_test_linear_reg),
mean_absolute_error(y_train, ypred_train_linear_reg),
mean_absolute_error(y_test, ypred_test_linear_reg)
],
['lasso regression',
mean_squared_error(y_train, ypred_train_lasso_reg),
mean_squared_error(y_test, ypred_test_lasso_reg),
mean_absolute_error(y_train, ypred_train_lasso_reg),
mean_absolute_error(y_test, ypred_test_lasso_reg)
],
['ridge regression',
mean_squared_error(y_train, ypred_train_ridge_reg),
mean_squared_error(y_test, ypred_test_ridge_reg),
mean_absolute_error(y_train, ypred_train_ridge_reg),
mean_absolute_error(y_test, ypred_test_ridge_reg)
]
]
loss_df = pd.DataFrame(loss_data, columns=['name', 'train mse', 'test mse', 'train mae', 'test mae'])
print(loss_df)
Results
                name  train mse  test mse  train mae  test mae
0  linear regression   0.151018  0.176519   0.312670  0.334087
1   lasso regression   4.907329  4.128131   1.795599  4.128131
2   ridge regression   0.189097  0.193945   0.353909  0.351923
Analysis
Lasso Regression
Lasso regression performs poorly here. This is because it depends on a regularisation parameter \(\lambda\) (exposed as alpha in scikit-learn), which was left at its default of 1.0 rather than being tuned to the data.
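One quick way to see the damage (a small sketch, not in the original listing, assuming the models fitted above are still in scope) is to compare the lasso coefficients with the ordinary least-squares ones; the untuned penalty shrinks the coefficients towards zero, biasing the predictions:
# compare coefficient estimates: the default penalty shrinks the
# lasso coefficients towards zero relative to plain least squares
coef_comparison = pd.DataFrame({
    'linear': m_linear_reg.coef_,
    'lasso (default alpha)': m_lasso_reg.coef_,
})
print(coef_comparison.round(3))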
Refactoring
from sklearn.linear_model import LassoCV
m_lassoCV = LassoCV().fit(X_train, y_train) # fit on the training data only; never fit to the test set
ypred_train_lassoCV = m_lassoCV.predict(X_train)
ypred_test_lassoCV = m_lassoCV.predict(X_test) # predicting on the test set is fine
lassoCV_row = {'name': 'lasso cv',
'train mse': mean_squared_error(y_train, ypred_train_lassoCV),
'test mse': mean_squared_error(y_test, ypred_test_lassoCV),
'train mae': mean_absolute_error(y_train, ypred_train_lassoCV),
'test mae': mean_absolute_error(y_test, ypred_test_lassoCV)}
loss_df.loc[len(loss_df)] = lassoCV_row
print(loss_df)
New Results
                name  train mse  test mse  train mae  test mae
0  linear regression   0.151018  0.176519   0.312670  0.334087
1   lasso regression   4.907329  4.128131   1.795599  4.128131
2   ridge regression   0.189097  0.193945   0.353909  0.351923
3           lasso cv   0.203116  0.208579   0.365991  0.362380
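The improvement comes from the cross-validation step selecting a sensible penalty. As a small check (not in the original listing), the chosen value can be read off the fitted estimator:
# regularisation strength selected by cross-validation
print('alpha chosen by LassoCV:', m_lassoCV.alpha_)
# the grid of candidate values that was searched
print('candidate alphas searched:', len(m_lassoCV.alphas_))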
Conclusion
Immediately we encounter a core machine learning paradigm: cross-validation to optimise a hyper-parameter \(\lambda\)¹. This will be a recurring theme throughout this site, and will resurface, with some emotion, in my frustrations with hyper-parameters that were prohibitively expensive to tune, c.f. the kits19 challenge. Furthermore, the posts below go deeper into each loss function, providing justifications for their form. I also delve into the fact that a closed-form solution to the regression problem is not always available, and into learning by gradient descent instead.
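To make that last point concrete, here is a minimal gradient-descent sketch for the least-squares objective (a toy illustration reusing X_train and y_train from above and checked against the normal-equations solution; it is not how the scikit-learn estimators above are fit):
# minimal batch gradient descent on (1/n) * ||y - X beta||^2
beta = np.zeros(X_train.shape[1])
lr = 0.1 # step size
for _ in range(2000):
    residual = y_train - X_train @ beta
    grad = -2.0 / len(y_train) * X_train.T @ residual # gradient of the mean squared error
    beta -= lr * grad
# closed-form solution of the normal equations, X^T X beta = X^T y
beta_closed = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)
print(np.allclose(beta, beta_closed, atol=1e-6))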
Directory
Footnotes
¹ A more sophisticated and manual analysis of linear regression hyper-parameters can be found in my attempt to model the WHO life expectancy dataset as a regression problem.