\item \points{15} {\bf Convexity of Generalized Linear Models}

In this question we will explore and prove some nice properties of Generalized Linear Models, specifically those related to their use of Exponential Family distributions to model the output.

Most commonly, GLMs are trained using the negative log-likelihood (NLL) as the loss function. This is mathematically equivalent to Maximum Likelihood Estimation (\emph{i.e.,} maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood). In this problem, our goal is to show that the NLL loss of a GLM is a \textit{convex} function w.r.t.\ the model parameters. As a reminder, this is convenient because for a convex function any local minimum is also a global minimum, and there is extensive research on how to optimize convex functions efficiently with algorithms such as gradient descent or stochastic gradient descent.

To recap, an exponential family distribution is one whose probability density can be represented as
%
\begin{equation*}
p(y; \eta) = b(y)\exp(\eta^TT(y) - a(\eta)),
\end{equation*}
%
where $\eta$ is the \emph{natural parameter} of the distribution. Moreover, in a Generalized Linear Model, $\eta$ is modeled as $\theta^Tx$, where $x \in \mathbb{R}^\di$ are the input features of the example, and $\theta \in \mathbb{R}^\di$ are the learnable parameters.

In order to show that the NLL loss is convex for GLMs, we break the process down into sub-parts, and approach them one at a time. Our approach is to show that the second derivative (\emph{i.e.,} the Hessian) of the loss w.r.t.\ the model parameters is Positive Semi-Definite (PSD) at all values of the model parameters. Along the way, we will also show some nice properties of Exponential Family distributions as intermediate steps.

For the sake of convenience we restrict ourselves to the case where $\eta$ is a scalar. Assume $p(Y|X;\theta) \sim \text{ExponentialFamily}(\eta)$, where $\eta \in \mathbb{R}$ is a scalar and $T(y) = y$. This makes the exponential family representation take the form
%
\begin{equation*}
p(y; \eta) = b(y)\exp(\eta y - a(\eta)).
\end{equation*}
%
\begin{enumerate}

\input{glmconvexity/01-mean}
\ifnum\solutions=1{
\input{glmconvexity/01-mean-sol}
}\fi

\input{glmconvexity/02-var}
\ifnum\solutions=1{
\input{glmconvexity/02-var-sol}
}\fi

\input{glmconvexity/03-hessian}
\ifnum\solutions=1{
\input{glmconvexity/03-hessian-sol}
}\fi

\end{enumerate}

\textbf{Remark:} The main takeaways from this problem are:
\begin{itemize}
\item The NLL loss of any GLM is convex in its model parameters.
\item Exponential family distributions are mathematically nice. Whereas calculating the mean and variance of a general distribution involves integrals (hard), for exponential family distributions we can, perhaps surprisingly, calculate them using derivatives (easy); see the illustration below.
\end{itemize}
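As a concrete illustration of this last point (a sanity check in the scalar setting above, with $\phi$ denoting the Bernoulli mean; it is not needed for the sub-parts), the Bernoulli distribution can be written in exponential family form:
%
\begin{equation*}
p(y; \phi) = \phi^y(1-\phi)^{1-y} = \exp\left(y\log\frac{\phi}{1-\phi} + \log(1-\phi)\right),
\end{equation*}
%
so that $b(y) = 1$, $T(y) = y$, $\eta = \log\frac{\phi}{1-\phi}$, and $a(\eta) = -\log(1-\phi) = \log(1 + e^\eta)$. Differentiating $a(\eta)$ then gives
%
\begin{equation*}
a'(\eta) = \frac{e^\eta}{1+e^\eta} = \phi = \mathbb{E}[Y],
\qquad
a''(\eta) = \frac{e^\eta}{(1+e^\eta)^2} = \phi(1-\phi) = \text{Var}(Y),
\end{equation*}
%
recovering the familiar Bernoulli mean and variance by differentiation alone.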