Linear Regression
This page covers closed-form and iterative approximation methods for solving the Linear Regression problem.
The derivation proceeds by assuming a normal distribution on $Y$ given $X = x$, so that the model targets the conditional expectation $\mathbb{E}[Y \mid X = x]$.
Closed Form of OLS
Taking the gradient of the objective in $$\hat{\beta} = \arg \min_{\beta} \frac{1}{n}\|y-X \beta\|^2_2$$ with respect to $\beta$ and setting it to zero gives the normal equations and, assuming $X^TX$ is invertible, the closed-form solution: $$\begin{align*}\nabla_\beta \frac{1}{n}\|y-X\beta\|^2_2 &= -\frac{2}{n}X^T(y - X\beta) = 0\\ \implies X^TX\hat{\beta} &= X^Ty\\ \implies \hat{\beta} &= (X^TX)^{-1}X^Ty\end{align*}$$
Example: Closed-Form Fit
import numpy as np
# synthetic data for the rest of the linear models:
np.random.seed(5)
n = 100 # samples
p = 5 # features
sigma = 0.2 # std
X = np.random.normal(0, 1, size=(n,p))
beta_true = np.random.randint(-4, 2, p)
noise = np.random.normal(0, sigma, size=(n))
y = X @ beta_true + noise
betahat = np.linalg.inv(X.T @ X) @ X.T @ y  # closed-form OLS estimate: (X^T X)^{-1} X^T y
print("betahat: ", betahat)
print("beta true:", beta_true)
betahat:  [-2.94946726 0.01589149 -2.004408 -3.97428268 -3.99637663]
beta true: [-3 0 -2 -4 -4]
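As a side note, explicitly inverting $X^TX$ (as above) can be numerically unstable when $X^TX$ is ill-conditioned. Here is a minimal sketch of a more stable alternative, assuming the same X and y from the example above, using NumPy's built-in least-squares solver:
# solve min ||y - X beta||_2 directly, without forming (X^T X)^{-1}
betahat_lstsq, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print("betahat (lstsq):", betahat_lstsq)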
Iterative Approach
An idea that permeates all of Machine Learning is that sometimes you cannot explicitly solve the `argmin` formulation for the parameters in closed form, and other times a closed form exists but is computationally infeasible. In those cases we fall back on iterative optimization.
Gradient Descent
Stochastic Gradient Descent:
$$\hat{\beta}^{(k+1)} = \hat{\beta}^{(k)} - \eta\nabla_\beta L_{i_k}(\hat{\beta}^{(k)}), \qquad L_i(\beta) = (y_i - x_i^T\beta)^2$$
in which each update uses only a single, randomly chosen sample $i_k$, and,
Batch Gradient Descent:
$$\hat{\beta}^{(k+1)} = \hat{\beta}^{(k)} - \eta\nabla_\beta L(\hat{\beta}^{(k)}), \qquad L(\beta) = \frac{1}{n}\|y-X\beta\|^2_2, \quad \nabla_\beta L(\beta) = -\frac{2}{n}X^T(y - X\beta)$$
in which each update uses the gradient over the entire dataset (both variants are sketched in code below).
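Below is a minimal sketch of both variants for the least-squares loss, assuming the X, y, n, p from the synthetic example above; the step sizes and iteration counts are hand-picked for illustration, not tuned values.
# full-batch gradient descent on L(beta) = (1/n) * ||y - X beta||^2
eta = 0.1                                        # step size (hand-picked)
beta_gd = np.zeros(p)
for k in range(500):
    grad = -(2 / n) * X.T @ (y - X @ beta_gd)    # gradient over the whole dataset
    beta_gd = beta_gd - eta * grad
print("beta (batch GD):", beta_gd)

# stochastic gradient descent: one randomly chosen sample per update
rng = np.random.default_rng(0)
eta_sgd = 0.01                                   # smaller step, single-sample gradients are noisy
beta_sgd = np.zeros(p)
for k in range(5000):
    i = rng.integers(n)                          # pick one sample index at random
    grad_i = -2 * (y[i] - X[i] @ beta_sgd) * X[i]  # gradient of (y_i - x_i^T beta)^2
    beta_sgd = beta_sgd - eta_sgd * grad_i
print("beta (SGD):", beta_sgd)
With a constant step size, SGD only settles into a neighborhood of the solution; decaying $\eta$ over the iterations (not shown) tightens the final estimate.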
Note, in practice you will usually choose mini-batch Gradient Descent, which is a compromise between these two approaches: $$\hat{\beta}^{(k+1)} = \hat{\beta}^{(k)} - \frac{\eta}{|B_k|}\sum_{i \in B_k}\nabla_\beta L_i(\hat{\beta}^{(k)})$$ where $B_k$ is a small, randomly sampled subset (mini-batch) of the data.
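A minimal sketch of mini-batch gradient descent, again assuming the X, y, n, p from the example above; the batch size B and step size are hypothetical choices (B = 1 recovers SGD, B = n recovers full-batch gradient descent).
rng = np.random.default_rng(1)
B = 16                                           # mini-batch size (hypothetical choice)
eta_mb = 0.05                                    # step size (hand-picked)
beta_mb = np.zeros(p)
for k in range(2000):
    idx = rng.choice(n, size=B, replace=False)   # sample a mini-batch of indices
    Xb, yb = X[idx], y[idx]
    grad = -(2 / B) * Xb.T @ (yb - Xb @ beta_mb) # gradient on the mini-batch only
    beta_mb = beta_mb - eta_mb * grad
print("beta (mini-batch GD):", beta_mb)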