Non-Parametric

Decision Trees

Entropy and Information Gain

Definition (Entropy)

The entropy of a dataset \(S\) with classes \(C\) is:

\[H(S) = -\sum_{c \in C} p_c \log_2(p_c)\]

where \(p_c\) is the proportion of examples belonging to class \(c\). Entropy is maximised when classes are equally distributed and zero when all examples belong to a single class.
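The definition above maps directly to code. Here is a minimal sketch of the entropy formula in plain Python (the function name and toy labels are illustrative, not from the original):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    n = len(labels)
    # Each term is -p_c * log2(p_c), summed over the classes present in S.
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())

# A 50/50 split gives the maximum entropy of 1 bit; a pure set gives 0.
print(entropy(["a", "a", "b", "b"]))  # → 1.0
print(entropy(["a", "a", "a", "a"]))  # → 0.0
```

Note that classes with zero examples contribute nothing, which is why the sum runs only over classes present in the data.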

Definition (Information Gain)

The information gain of splitting dataset \(S\) on attribute \(A\) is:

\[\text{IG}(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} H(S_v)\]

where \(S_v\) is the subset of \(S\) for which attribute \(A\) takes value \(v\). The attribute with the highest information gain is chosen as the split.
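As a minimal sketch of the formula (the function names, row layout, and toy weather data below are illustrative assumptions, not from the original):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """IG(S, A): entropy of S minus the size-weighted entropy of each subset S_v."""
    n = len(labels)
    # Partition the labels by the value the attribute takes in each row.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - weighted

# Toy example: splitting on "outlook" separates the classes perfectly,
# so the gain equals the full entropy of the labels (1 bit).
rows = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # → 1.0
```

A gain of zero would mean the split leaves the class mixture unchanged in every subset.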

Read more >

MNIST and FMNIST using KNN

Old mate Yann LeCun decided to remove the MNIST zip from his site, along with the corresponding file info.
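The KNN classifier itself needs nothing beyond a distance metric and a majority vote. A minimal plain-Python sketch, run here on tiny made-up 2-D points as a stand-in for the image vectors (the function name and data are illustrative assumptions):

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k nearest training points (Euclidean distance)."""
    # Sort training indices by squared distance to the query point.
    by_distance = sorted(
        range(len(train_points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_points[i], query)),
    )
    nearest = [train_labels[i] for i in by_distance[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Two well-separated clusters: queries near a cluster take its label.
X = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
y = [0, 0, 1, 1]
print(knn_predict(X, y, (0.2, 0.0)))  # → 0
print(knn_predict(X, y, (4.8, 5.0)))  # → 1
```

For MNIST/FMNIST the same idea applies with each flattened image as one point; the only real cost is that every prediction scans the whole training set.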

Read more >