{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression\n", "\n", "In this exercise, you will implement logistic regression and apply it to two different datasets. \n", "\n", "\n", "# Outline\n", "- [ 1 - Packages ](#1)\n", "- [ 2 - Logistic Regression](#2)\n", " - [ 2.1 Problem Statement](#2.1)\n", " - [ 2.2 Loading and visualizing the data](#2.2)\n", " - [ 2.3 Sigmoid function](#2.3)\n", " - [ 2.4 Cost function for logistic regression](#2.4)\n", " - [ 2.5 Gradient for logistic regression](#2.5)\n", " - [ 2.6 Learning parameters using gradient descent ](#2.6)\n", " - [ 2.7 Plotting the decision boundary](#2.7)\n", " - [ 2.8 Evaluating logistic regression](#2.8)\n", "- [ 3 - Regularized Logistic Regression](#3)\n", " - [ 3.1 Problem Statement](#3.1)\n", " - [ 3.2 Loading and visualizing the data](#3.2)\n", " - [ 3.3 Feature mapping](#3.3)\n", " - [ 3.4 Cost function for regularized logistic regression](#3.4)\n", " - [ 3.5 Gradient for regularized logistic regression](#3.5)\n", " - [ 3.6 Learning parameters using gradient descent](#3.6)\n", " - [ 3.7 Plotting the decision boundary](#3.7)\n", " - [ 3.8 Evaluating regularized logistic regression model](#3.8)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_**NOTE:** To prevent errors from the autograder, you are not allowed to edit or delete non-graded cells in this lab. Please also refrain from adding any new cells. 
\n", "**Once you have passed this assignment** and want to experiment with any of the non-graded code, you may follow the instructions at the bottom of this notebook._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 1 - Packages \n", "\n", "First, let's run the cell below to import all the packages that you will need during this assignment.\n", "- [numpy](https://www.numpy.org) is the fundamental package for scientific computing with Python.\n", "- [matplotlib](http://matplotlib.org) is a widely used library for plotting graphs in Python.\n", "- ``utils.py`` contains helper functions for this assignment. You do not need to modify code in this file." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from utils import *\n", "import copy\n", "import math\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2 - Logistic Regression\n", "\n", "In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.\n", "\n", "\n", "### 2.1 Problem Statement\n", "\n", "Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. \n", "* You have historical data from previous applicants that you can use as a training set for logistic regression. \n", "* For each training example, you have the applicant’s scores on two exams and the admissions decision. \n", "* Your task is to build a classification model that estimates an applicant’s probability of admission based on the scores from those two exams. \n", "\n", "\n", "### 2.2 Loading and visualizing the data\n", "\n", "You will start by loading the dataset for this task. 
\n", "- The `load_data()` function called below loads the data into variables `X_train` and `y_train`\n", " - `X_train` contains each student's scores on the two exams\n", " - `y_train` is the admission decision \n", " - `y_train = 1` if the student was admitted \n", " - `y_train = 0` if the student was not admitted \n", " - Both `X_train` and `y_train` are numpy arrays.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# load dataset\n", "X_train, y_train = load_data(\"data/ex2data1.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### View the variables\n", "Let's get more familiar with your dataset. \n", "- A good place to start is to just print out each variable and see what it contains.\n", "\n", "The code below prints the first five values of `X_train` and the type of the variable." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First five elements in X_train are:\n", " [[34.62365962 78.02469282]\n", " [30.28671077 43.89499752]\n", " [35.84740877 72.90219803]\n", " [60.18259939 86.3085521 ]\n", " [79.03273605 75.34437644]]\n", "Type of X_train: <class 'numpy.ndarray'>\n" ] } ], "source": [ "print(\"First five elements in X_train are:\\n\", X_train[:5])\n", "print(\"Type of X_train:\",type(X_train))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now print the first five values of `y_train`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First five elements in y_train are:\n", " [0. 0. 0. 1. 1.]\n", "Type of y_train: <class 'numpy.ndarray'>\n" ] } ], "source": [ "print(\"First five elements in y_train are:\\n\", y_train[:5])\n", "print(\"Type of y_train:\",type(y_train))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check the dimensions of your variables\n", "\n", "Another useful way to get familiar with your data is to view its dimensions. Let's print the shape of `X_train` and `y_train` and see how many training examples we have in our dataset." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The shape of X_train is: (100, 2)\n", "The shape of y_train is: (100,)\n", "We have m = 100 training examples\n" ] } ], "source": [ "print ('The shape of X_train is: ' + str(X_train.shape))\n", "print ('The shape of y_train is: ' + str(y_train.shape))\n", "print ('We have m = %d training examples' % (len(y_train)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Visualize your data\n", "\n", "Before starting to implement any learning algorithm, it is always good to visualize the data if possible.\n", "- The code below displays the data on a 2D plot (as shown below), where the axes are the two exam scores, and the positive and negative examples are shown with different markers.\n", "- We use a helper function in the ``utils.py`` file to generate this plot. \n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot examples\n", "plot_data(X_train, y_train[:], pos_label=\"Admitted\", neg_label=\"Not admitted\")\n", "\n", "# Set the y-axis label\n", "plt.ylabel('Exam 2 score') \n", "# Set the x-axis label\n", "plt.xlabel('Exam 1 score') \n", "plt.legend(loc=\"upper right\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your goal is to build a logistic regression model to fit this data.\n", "- With this model, you can then predict if a new student will be admitted based on their scores on the two exams." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.3 Sigmoid function\n", "\n", "Recall that for logistic regression, the model is represented as\n", "\n", "$$ f_{\\mathbf{w},b}(x) = g(\\mathbf{w}\\cdot \\mathbf{x} + b)$$\n", "where function $g$ is the sigmoid function. The sigmoid function is defined as:\n", "\n", "$$g(z) = \\frac{1}{1+e^{-z}}$$\n", "\n", "Let's implement the sigmoid function first, so it can be used by the rest of this assignment.\n", "\n", "\n", "### Exercise 1\n", "Please complete the `sigmoid` function to calculate\n", "\n", "$$g(z) = \\frac{1}{1+e^{-z}}$$\n", "\n", "Note that \n", "- `z` is not always a single number, but can also be an array of numbers. \n", "- If the input is an array of numbers, we'd like to apply the sigmoid function to each value in the input array.\n", "\n", "If you get stuck, you can check out the hints presented after the cell below to help you with the implementation." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# UNQ_C1\n", "# GRADED FUNCTION: sigmoid\n", "\n", "def sigmoid(z):\n", " \"\"\"\n", " Compute the sigmoid of z\n", "\n", " Args:\n", " z (ndarray): A scalar, numpy array of any size.\n", "\n", " Returns:\n", " g (ndarray): sigmoid(z), with the same shape as z\n", " \n", " \"\"\"\n", " \n", " ### START CODE HERE ### \n", " # np.exp is applied element-wise, so this works for scalars and arrays\n", " g = 1 / (1 + np.exp(-z))\n", " ### END CODE HERE ### \n", " \n", " return g" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
 \n", " Click for hints\n", " \n", " * `numpy` has a function called [`np.exp()`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html), which offers a convenient way to calculate the exponential ($e^{z}$) of all elements in the input array (`z`).\n", " \n", "
\n", " Click for more hints\n", " \n", " - You can translate $e^{-z}$ into code as `np.exp(-z)` \n", " \n", " - You can translate $1/e^{-z}$ into code as `1/np.exp(-z)` \n", " \n", " If you're still stuck, you can check the hints presented below to figure out how to calculate `g` \n", " \n", "
\n", " Hint to calculate g\n", " g = 1 / (1 + np.exp(-z))\n", "
\n", "\n", "\n", "
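The sketch below isolates the element-wise behaviour required above. It sits outside the graded cell and implements g(z) = 1 / (1 + e^(-z)). The name `logistic` and the `np.asarray` conversion are our own choices and not part of the assignment; the conversion lets plain Python lists work too. The last line checks the identity g(-z) = 1 - g(z), which follows directly from the formula.

```python
import numpy as np

def logistic(z):
    # Convert scalars and lists to an ndarray; np.exp then works element-wise,
    # so the result has the same shape as the input
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

values = np.array([-1, 0, 1, 2])
print(logistic(0))        # 0.5
print(logistic(values))   # approximately 0.269, 0.5, 0.731, 0.881

# Symmetry that follows from the formula: g(-z) = 1 - g(z)
print(np.allclose(logistic(-values), 1 - logistic(values)))  # True
```

The same call also works on a 2-D input. For example, `logistic([[0, 0], [0, 0]])` returns a 2x2 array of 0.5.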
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you are finished, try testing a few values by calling `sigmoid(x)` in the cell below. \n", "- For large positive values of x, the sigmoid should be close to 1, while for large negative values, the sigmoid should be close to 0. \n", "- Evaluating `sigmoid(0)` should give you exactly 0.5. \n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "deletable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sigmoid(0) = 0.5\n" ] } ], "source": [ "# Note: You can edit this value\n", "value = 0\n", "\n", "print (f\"sigmoid({value}) = {sigmoid(value)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", "
sigmoid(0) 0.5
\n", " \n", "- As mentioned before, your code should also work with vectors and matrices. For a matrix, your function should perform the sigmoid function on every element." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sigmoid([ -1, 0, 1, 2]) = [0.26894142 0.5 0.73105858 0.88079708]\n", "\u001b[92mAll tests passed!\n" ] } ], "source": [ "print (\"sigmoid([ -1, 0, 1, 2]) = \" + str(sigmoid(np.array([-1, 0, 1, 2]))))\n", "\n", "# UNIT TESTS\n", "from public_tests import *\n", "sigmoid_test(sigmoid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", " \n", "
sigmoid([-1, 0, 1, 2]) [0.26894142 0.5 0.73105858 0.88079708]
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.4 Cost function for logistic regression\n", "\n", "In this section, you will implement the cost function for logistic regression.\n", "\n", "\n", "### Exercise 2\n", "\n", "Please complete the `compute_cost` function using the equations below.\n", "\n", "Recall that for logistic regression, the cost function is of the form \n", "\n", "$$ J(\\mathbf{w},b) = \\frac{1}{m}\\sum_{i=0}^{m-1} \\left[ loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) \\right] \\tag{1}$$\n", "\n", "where\n", "* m is the number of training examples in the dataset\n", "\n", "\n", "* $loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is\n", "\n", " $$loss(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\tag{2}$$\n", " \n", " \n", "* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the actual label\n", "\n", "* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b)$ where function $g$ is the sigmoid function.\n", " * It might be helpful to first calculate an intermediate variable $z_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b = w_0x^{(i)}_0 + ... + w_{n-1}x^{(i)}_{n-1} + b$, where $n$ is the number of features, before calculating $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(z_{\\mathbf{w},b}(\\mathbf{x}^{(i)}))$\n", "\n", "Note:\n", "* As you are doing this, remember that the variables `X_train` and `y_train` are not scalar values but numpy arrays of shape ($m, n$) and ($m$,) respectively, where $n$ is the number of features and $m$ is the number of training examples.\n", "* You can use the sigmoid function that you implemented above for this part.\n", "\n", "If you get stuck, you can check out the hints presented after the cell below to help you with the implementation." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# UNQ_C2\n", "# GRADED FUNCTION: compute_cost\n", "def compute_cost(X, y, w, b, *argv):\n", " \"\"\"\n", " Computes the cost over all examples\n", " Args:\n", " X : (ndarray Shape (m,n)) data, m examples by n features\n", " y : (ndarray Shape (m,)) target value \n", " w : (ndarray Shape (n,)) values of parameters of the model \n", " b : (scalar) value of bias parameter of the model\n", " *argv : unused, for compatibility with regularized version below\n", " Returns:\n", " total_cost : (scalar) cost \n", " \"\"\"\n", "\n", " m, n = X.shape\n", " \n", " ### START CODE HERE ###\n", " loss_sum = 0\n", " for i in range(m):\n", " # model prediction f_wb = g(w . x_i + b) for example i\n", " f_wb = sigmoid(np.dot(w, X[i]) + b)\n", " loss_sum += -y[i] * np.log(f_wb) - (1 - y[i]) * np.log(1 - f_wb)\n", " total_cost = loss_sum / m\n", " ### END CODE HERE ### \n", "\n", " return total_cost" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click for hints\n", " \n", "* You can represent a summation operator eg: $h = \\sum\\limits_{i = 0}^{m-1} 2i$ in code as follows:\n", "\n", "```python\n", " h = 0\n", " for i in range(m):\n", " h = h + 2*i\n", "```\n", "
\n", "\n", "* In this case, you can iterate over all the examples in `X` using a for loop and add the `loss` from each iteration to a variable (`loss_sum`) initialized outside the loop.\n", "\n", "* Then, you can return the `total_cost` as `loss_sum` divided by `m`.\n", "\n", "* If you are new to Python, please check that your code is properly indented with consistent spaces or tabs. Otherwise, it might produce a different output or raise an `IndentationError: unexpected indent` error. You can refer to [this topic](https://community.deeplearning.ai/t/indentation-in-python-indentationerror-unexpected-indent/159398) in our community for details.\n", " \n", "
\n", " Click for more hints\n", " \n", "* Here's how you can structure the overall implementation for this function\n", " \n", "```python\n", "def compute_cost(X, y, w, b, *argv):\n", " m, n = X.shape\n", "\n", " ### START CODE HERE ###\n", " loss_sum = 0 \n", " \n", " # Loop over each training example\n", " for i in range(m): \n", " \n", " # First calculate z_wb = w[0]*X[i][0]+...+w[n-1]*X[i][n-1]+b\n", " z_wb = 0 \n", " # Loop over each feature\n", " for j in range(n): \n", " # Add the corresponding term to z_wb\n", " z_wb_ij = # Your code here to calculate w[j] * X[i][j]\n", " z_wb += z_wb_ij # equivalent to z_wb = z_wb + z_wb_ij\n", " # Add the bias term to z_wb\n", " z_wb += b # equivalent to z_wb = z_wb + b\n", " \n", " f_wb = # Your code here to calculate prediction f_wb for a training example\n", " loss = # Your code here to calculate loss for a training example\n", " \n", " loss_sum += loss # equivalent to loss_sum = loss_sum + loss\n", " \n", " total_cost = (1 / m) * loss_sum \n", " ### END CODE HERE ### \n", " \n", " return total_cost\n", "```\n", "
 \n", "\n", "If you're still stuck, you can check the hints presented below to figure out how to calculate `z_wb_ij`, `f_wb` and `loss`.\n", "\n", "
\n", "Hint to calculate z_wb_ij\n", "     z_wb_ij = w[j]*X[i][j] \n", "
\n", " \n", "
\n", " Hint to calculate f_wb\n", "     $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = g(z_{\\mathbf{w},b}(\\mathbf{x}^{(i)}))$ where $g$ is the sigmoid function. You can simply call the `sigmoid` function implemented above.\n", "
 \n", "     More hints to calculate f_wb\n", "     You can compute f_wb as f_wb = sigmoid(z_wb) \n", "
\n", "
\n", "\n", "
\n", " Hint to calculate loss\n", "     You can use the np.log function to calculate the log\n", "
\n", "     More hints to calculate loss\n", "     You can compute loss as loss = -y[i] * np.log(f_wb) - (1 - y[i]) * np.log(1 - f_wb)\n", "
\n", "
\n", " \n", "
\n", "\n", "
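The graded `compute_cost` above loops over the examples one at a time. For comparison, the sketch below evaluates equations (1) and (2) for all examples at once using a matrix-vector product. It is an illustrative alternative and not the required implementation. The names `compute_cost_vectorized`, `logistic`, `X_toy` and `y_toy` are hypothetical, and the toy data is made up.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_vectorized(X, y, w, b):
    # X: (m, n), y: (m,), w: (n,), b: scalar
    m = X.shape[0]
    f_wb = logistic(X @ w + b)     # predictions for all m examples at once, shape (m,)
    # Equation (2) evaluated element-wise; np.log returns -inf (with a warning)
    # if a prediction saturates to exactly 0 or 1
    losses = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)
    return np.sum(losses) / m      # equation (1)

# Hypothetical toy data, not the exam-score dataset
X_toy = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 3.0]])
y_toy = np.array([0.0, 1.0, 1.0])
print(compute_cost_vectorized(X_toy, y_toy, np.zeros(2), 0.0))  # about 0.693, i.e. log(2)
```

On the same inputs, this version should agree with the loop version up to floating-point rounding. Its only difference is that the sum over examples is done by numpy instead of a Python `for` loop.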
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the cells below to check your implementation of the `compute_cost` function with two different initializations of the parameters $w$ and $b$" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cost at initial w and b (zeros): 0.693\n" ] } ], "source": [ "m, n = X_train.shape\n", "\n", "# Compute and display cost with w and b initialized to zeros\n", "initial_w = np.zeros(n)\n", "initial_b = 0.\n", "cost = compute_cost(X_train, y_train, initial_w, initial_b)\n", "print('Cost at initial w and b (zeros): {:.3f}'.format(cost))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", "
Cost at initial w and b (zeros) 0.693
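The expected value 0.693 can be checked by hand. With every weight and the bias at zero, z = 0 for every example, so each prediction is g(0) = 0.5. Equation (2) then gives the same loss, -log(0.5) = log(2), whatever the label. A short sketch of that arithmetic, which assumes only numpy:

```python
import numpy as np

# With w = 0 and b = 0, z = w . x + b = 0 for every example,
# so every prediction is f = g(0) = 0.5
f = 0.5

# Loss from equation (2) for each possible label
loss_if_y1 = -np.log(f)          # y = 1: only the first term survives
loss_if_y0 = -np.log(1 - f)      # y = 0: only the second term survives

# Every example has the same loss, so the average in equation (1) is log(2)
print(round(float(loss_if_y1), 3), round(float(loss_if_y0), 3))  # 0.693 0.693
```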
" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cost at test w and b (non-zeros): 0.218\n", "\u001b[92mAll tests passed!\n" ] } ], "source": [ "# Compute and display cost with non-zero w and b\n", "test_w = np.array([0.2, 0.2])\n", "test_b = -24.\n", "cost = compute_cost(X_train, y_train, test_w, test_b)\n", "\n", "print('Cost at test w and b (non-zeros): {:.3f}'.format(cost))\n", "\n", "\n", "# UNIT TESTS\n", "compute_cost_test(compute_cost)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", "
Cost at test w and b (non-zeros): 0.218
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.5 Gradient for logistic regression\n", "\n", "In this section, you will implement the gradient for logistic regression.\n", "\n", "Recall that the gradient descent algorithm is:\n", "\n", "$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & b := b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\; & w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j := 0..n-1}\\newline & \\rbrace\\end{align*}$$\n", "\n", "where the parameters $b$ and $w_j$ are all updated simultaneously." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Exercise 3\n", "\n", "Please complete the `compute_gradient` function to compute $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w}$, $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ from equations (2) and (3) below.\n", "\n", "$$\n", "\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{2}\n", "$$\n", "$$\n", "\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{3}\n", "$$\n", "* m is the number of training examples in the dataset\n", "\n", " \n", "* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the actual label\n", "\n", "\n", "- **Note**: While this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression have different definitions of $f_{\\mathbf{w},b}(\\mathbf{x})$: here it is $g(\\mathbf{w} \\cdot \\mathbf{x} + b)$ rather than $\\mathbf{w} \\cdot \\mathbf{x} + b$.\n", "\n", "As before, you can use the sigmoid function that you implemented above. If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# UNQ_C3\n", "# GRADED FUNCTION: compute_gradient\n", "def compute_gradient(X, y, w, b, *argv): \n", " \"\"\"\n", " Computes the gradient for logistic regression \n", " \n", " Args:\n", " X : (ndarray Shape (m,n)) data, m examples by n features\n", " y : (ndarray Shape (m,)) target value \n", " w : (ndarray Shape (n,)) values of parameters of the model \n", " b : (scalar) value of bias parameter of the model\n", " *argv : unused, for compatibility with regularized version below\n", " Returns\n", " dj_dw : (ndarray Shape (n,)) The gradient of the cost w.r.t. the parameters w. \n", " dj_db : (scalar) The gradient of the cost w.r.t. the parameter b. \n", " \"\"\"\n", " m, n = X.shape\n", " dj_dw = np.zeros(w.shape)\n", " dj_db = 0.\n", "\n", " ### START CODE HERE ### \n", " for i in range(m):\n", " z_wb = 0\n", " for j in range(n): \n", " z_wb_ij = X[i, j]*w[j]\n", " z_wb += z_wb_ij\n", " z_wb += b\n", " f_wb = sigmoid(z_wb)\n", " \n", " dj_db_i = f_wb - y[i]\n", " dj_db += dj_db_i\n", " \n", " for j in range(n):\n", " dj_dw_ij = (f_wb - y[i]) * X[i][j]\n", " dj_dw[j] += dj_dw_ij\n", " \n", " dj_dw = dj_dw * 1/m\n", " dj_db = dj_db * 1/m\n", " ### END CODE HERE ###\n", "\n", " \n", " return dj_db, dj_dw" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
 \n", " Click for hints\n", " \n", " \n", "* Here's how you can structure the overall implementation for this function\n", " ```python \n", " def compute_gradient(X, y, w, b, *argv): \n", " m, n = X.shape\n", " dj_dw = np.zeros(w.shape)\n", " dj_db = 0.\n", " \n", " ### START CODE HERE ### \n", " for i in range(m):\n", " # Calculate f_wb (exactly as you did in the compute_cost function above)\n", " f_wb = \n", " \n", " # Calculate the gradient for b from this example\n", " dj_db_i = # Your code here to calculate the error\n", " \n", " # add that to dj_db\n", " dj_db += dj_db_i\n", " \n", " # get dj_dw for each attribute\n", " for j in range(n):\n", " # Your code here to calculate the gradient from the i-th example for the j-th attribute\n", " dj_dw_ij = \n", " dj_dw[j] += dj_dw_ij\n", " \n", " # divide dj_db and dj_dw by total number of examples\n", " dj_dw = dj_dw / m\n", " dj_db = dj_db / m\n", " ### END CODE HERE ###\n", " \n", " return dj_db, dj_dw\n", " ```\n", "\n", " * If you are new to Python, please check that your code is properly indented with consistent spaces or tabs. Otherwise, it might produce a different output or raise an `IndentationError: unexpected indent` error. You can refer to [this topic](https://community.deeplearning.ai/t/indentation-in-python-indentationerror-unexpected-indent/159398) in our community for details.\n", " * If you're still stuck, you can check the hints presented below to figure out how to calculate `f_wb`, `dj_db_i` and `dj_dw_ij` \n", " \n", "
\n", " Hint to calculate f_wb\n", "     Recall that you calculated f_wb in compute_cost above — for detailed hints on how to calculate each intermediate term, check out the hints section below that exercise\n", "
\n", "     More hints to calculate f_wb\n", "     You can calculate f_wb as\n", "
\n",
    "               for i in range(m):   \n",
    "                   # Calculate f_wb (exactly how you did it in the compute_cost function above)\n",
    "                   z_wb = 0\n",
    "                   # Loop over each feature\n",
    "                   for j in range(n): \n",
    "                       # Add the corresponding term to z_wb\n",
    "                       z_wb_ij = X[i, j] * w[j]\n",
    "                       z_wb += z_wb_ij\n",
    "            \n",
    "                   # Add bias term \n",
    "                   z_wb += b\n",
    "        \n",
    "                   # Calculate the prediction from the model\n",
    "                   f_wb = sigmoid(z_wb)\n",
    "    
\n", " \n", "
\n", "
\n", " Hint to calculate dj_db_i\n", "     You can calculate dj_db_i as dj_db_i = f_wb - y[i]\n", "
\n", " \n", "
\n", " Hint to calculate dj_dw_ij\n", "     You can calculate dj_dw_ij as dj_dw_ij = (f_wb - y[i])* X[i][j]\n", "
\n", "\n", "
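Equations (2) and (3) can also be written in matrix form. With the error vector e = f - y, which has one entry per example, dj_db is the mean of e and dj_dw is X^T e / m. The sketch below implements that form. As an extra sanity check that is not part of this assignment, it compares the result against a central finite-difference estimate, (J(p + eps) - J(p - eps)) / (2 eps), for each partial derivative. All names and the toy data are hypothetical.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    # Equations (1) and (2) from the cost section, averaged over all examples
    f = logistic(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def compute_gradient_vectorized(X, y, w, b):
    m = X.shape[0]
    err = logistic(X @ w + b) - y   # f_wb(x_i) - y_i for every example, shape (m,)
    dj_db = np.sum(err) / m         # equation (2)
    dj_dw = X.T @ err / m           # equation (3): column j of X weights the errors by x_j
    return dj_db, dj_dw

def numerical_gradient(X, y, w, b, eps=1e-6):
    # Central differences (J(p + eps) - J(p - eps)) / (2 * eps) for each parameter p
    dj_dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    dj_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dj_db, dj_dw

# Hypothetical toy data and parameters
X_toy = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 3.0], [0.5, 1.5]])
y_toy = np.array([0.0, 1.0, 1.0, 0.0])
w_toy = np.array([0.4, -0.3])
b_toy = 0.2

analytic = compute_gradient_vectorized(X_toy, y_toy, w_toy, b_toy)
numeric = numerical_gradient(X_toy, y_toy, w_toy, b_toy)
print(np.isclose(analytic[0], numeric[0]), np.allclose(analytic[1], numeric[1]))  # True True
```

If the two estimates disagree by much more than about 1e-6, either the analytic gradient or the cost likely has a bug. The finite-difference check needs two cost evaluations per parameter, so it is meant for debugging on small inputs rather than for training.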
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the cells below to check your implementation of the `compute_gradient` function with two different initializations of the parameters $w$ and $b$" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dj_db at initial w and b (zeros):-0.1\n", "dj_dw at initial w and b (zeros):[-12.00921658929115, -11.262842205513591]\n" ] } ], "source": [ "# Compute and display gradient with w and b initialized to zeros\n", "initial_w = np.zeros(n)\n", "initial_b = 0.\n", "\n", "dj_db, dj_dw = compute_gradient(X_train, y_train, initial_w, initial_b)\n", "print(f'dj_db at initial w and b (zeros):{dj_db}' )\n", "print(f'dj_dw at initial w and b (zeros):{dj_dw.tolist()}' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dj_db at initial w and b (zeros) -0.1
dj_dw at initial w and b (zeros): [-12.00921658929115, -11.262842205513591]
" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dj_db at test w and b: -0.5999999999991071\n", "dj_dw at test w and b: [-44.831353617873795, -44.37384124953978]\n", "\u001b[92mAll tests passed!\n" ] } ], "source": [ "# Compute and display cost and gradient with non-zero w and b\n", "test_w = np.array([ 0.2, -0.5])\n", "test_b = -24\n", "dj_db, dj_dw = compute_gradient(X_train, y_train, test_w, test_b)\n", "\n", "print('dj_db at test w and b:', dj_db)\n", "print('dj_dw at test w and b:', dj_dw.tolist())\n", "\n", "# UNIT TESTS \n", "compute_gradient_test(compute_gradient)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dj_db at test w and b (non-zeros) -0.5999999999991071
dj_dw at test w and b (non-zeros): [-44.831353617873795, -44.37384124953978]
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.6 Learning parameters using gradient descent \n", "\n", "Similar to the previous assignment, you will now find the optimal parameters of a logistic regression model by using gradient descent. \n", "- You don't need to implement anything for this part. Simply run the cells below. \n", "\n", "- A good way to verify that gradient descent is working correctly is to look\n", "at the value of $J(\\mathbf{w},b)$ and check that it is decreasing with each step. \n", "\n", "- Assuming you have implemented the gradient and computed the cost correctly, and the learning rate $\\alpha$ is small enough, your value of $J(\\mathbf{w},b)$ should never increase, and should converge to a steady value by the end of the algorithm." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters, lambda_): \n", " \"\"\"\n", " Performs batch gradient descent to learn w and b. Updates w and b by taking \n", " num_iters gradient steps with learning rate alpha\n", " \n", " Args:\n", " X : (ndarray Shape (m, n)) data, m examples by n features\n", " y : (ndarray Shape (m,)) target value \n", " w_in : (ndarray Shape (n,)) Initial values of parameters of the model\n", " b_in : (scalar) Initial value of bias parameter of the model\n", " cost_function : function to compute cost\n", " gradient_function : function to compute gradient\n", " alpha : (float) Learning rate\n", " num_iters : (int) number of iterations to run gradient descent\n", " lambda_ : (scalar, float) regularization constant\n", " \n", " Returns:\n", " w : (ndarray Shape (n,)) Updated values of parameters of the model after\n", " running gradient descent\n", " b : (scalar) Updated value of bias parameter of the model after\n", " running gradient descent\n", " J_history : (list) cost after each iteration (up to the first 100000)\n", " w_history : (list) w at each printed iteration\n", " \"\"\"\n", " \n", " # number of training examples\n", " m = len(X)\n", " \n", " # Lists to store the cost J at each iteration and w at each printed iteration, primarily for graphing later\n", " J_history = []\n", " w_history = []\n", " \n", " for i in range(num_iters):\n", "\n", " # Calculate the gradient\n", " dj_db, dj_dw = gradient_function(X, y, w_in, b_in, lambda_) \n", "\n", " # Update parameters simultaneously using w, b, alpha and the gradient\n", " w_in = w_in - alpha * dj_dw \n", " b_in = b_in - alpha * dj_db \n", " \n", " # Save cost J at each iteration\n", " if i<100000: # prevent resource exhaustion \n", " cost = cost_function(X, y, w_in, b_in, lambda_)\n", " J_history.append(cost)\n", "\n", " # Print the cost at 10 evenly spaced intervals (every iteration if num_iters < 10), plus the last iteration\n", " if i% math.ceil(num_iters/10) == 0 or i == (num_iters-1):\n", " w_history.append(w_in)\n", " print(f\"Iteration {i:4}: Cost {float(J_history[-1]):8.2f} \")\n", " \n", " return w_in, b_in, J_history, w_history #return w and J,w history for graphing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's run the gradient descent algorithm above to learn the 
parameters for our dataset.\n", "\n", "**Note**\n", "The code block below takes a couple of minutes to run, especially with a non-vectorized version. You can reduce the `iterations` to test your implementation and iterate faster. If you have time later, try running 100,000 iterations for better results." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 0: Cost 0.96 \n", "Iteration 1000: Cost 0.31 \n", "Iteration 2000: Cost 0.30 \n", "Iteration 3000: Cost 0.30 \n", "Iteration 4000: Cost 0.30 \n", "Iteration 5000: Cost 0.30 \n", "Iteration 6000: Cost 0.30 \n", "Iteration 7000: Cost 0.30 \n", "Iteration 8000: Cost 0.30 \n", "Iteration 9000: Cost 0.30 \n", "Iteration 9999: Cost 0.30 \n" ] } ], "source": [ "np.random.seed(1)\n", "initial_w = 0.01 * (np.random.rand(2) - 0.5)\n", "initial_b = -8\n", "\n", "# Some gradient descent settings\n", "iterations = 10000\n", "alpha = 0.001\n", "\n", "w,b, J_history,_ = gradient_descent(X_train ,y_train, initial_w, initial_b, \n", " compute_cost, compute_gradient, alpha, iterations, 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
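Whether J never increases depends on the learning rate. For a smooth convex cost such as this one, a small enough alpha guarantees that each step does not raise the cost. An alpha that is too large can make the cost oscillate or grow. The sketch below runs a vectorized version of the same update on made-up data and checks that the recorded costs never go up. It is hypothetical and is not the notebook's `gradient_descent` helper, though it mirrors that helper's structure: it computes both gradients from the current parameters, then updates w and b together. The data and alpha = 0.1 are arbitrary choices.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    f = logistic(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def gradient(X, y, w, b):
    err = logistic(X @ w + b) - y
    return np.mean(err), X.T @ err / X.shape[0]

def run_gradient_descent(X, y, w, b, alpha, num_iters):
    history = [cost(X, y, w, b)]
    for _ in range(num_iters):
        dj_db, dj_dw = gradient(X, y, w, b)            # both use the current w and b
        w, b = w - alpha * dj_dw, b - alpha * dj_db    # simultaneous update
        history.append(cost(X, y, w, b))
    return w, b, history

# Hypothetical toy data (two features, six examples), not the exam-score dataset
X_toy = np.array([[0.5, 1.0], [1.0, 1.5], [1.5, 0.5],
                  [2.0, 2.5], [2.5, 2.0], [1.2, 1.1]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])

w_fit, b_fit, history = run_gradient_descent(X_toy, y_toy, np.zeros(2), 0.0,
                                             alpha=0.1, num_iters=500)

# Check that the cost never went up between consecutive iterations
never_increased = all(later <= earlier + 1e-12
                      for earlier, later in zip(history, history[1:]))
print(never_increased, history[0], history[-1])
```

Raising `alpha` well beyond the cost's curvature limit is a quick way to see the opposite behaviour, where the cost starts to rise.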
 \n", "\n", " Expected Output: Cost 0.30 (click to see details):\n", "\n", "\n", " # With the following settings\n", " np.random.seed(1)\n", " initial_w = 0.01 * (np.random.rand(2) - 0.5)\n", " initial_b = -8\n", " iterations = 10000\n", " alpha = 0.001\n", " #\n", "\n", "```\n", "Iteration 0: Cost 0.96 \n", "Iteration 1000: Cost 0.31 \n", "Iteration 2000: Cost 0.30 \n", "Iteration 3000: Cost 0.30 \n", "Iteration 4000: Cost 0.30 \n", "Iteration 5000: Cost 0.30 \n", "Iteration 6000: Cost 0.30 \n", "Iteration 7000: Cost 0.30 \n", "Iteration 8000: Cost 0.30 \n", "Iteration 9000: Cost 0.30 \n", "Iteration 9999: Cost 0.30 \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.7 Plotting the decision boundary\n", "\n", "We will now use the final parameters from gradient descent to plot the decision boundary. If you implemented the previous parts correctly, you should see a plot similar to the following: \n", "\n", "\n", "We will use a helper function in the `utils.py` file to create this plot." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_decision_boundary(w, b, X_train, y_train)\n", "# Set the y-axis label\n", "plt.ylabel('Exam 2 score') \n", "# Set the x-axis label\n", "plt.xlabel('Exam 1 score') \n", "plt.legend(loc=\"upper right\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 2.8 Evaluating logistic regression\n", "\n", "We can evaluate the quality of the parameters we have found by seeing how well the learned model predicts on our training set. \n", "\n", "You will implement the `predict` function below to do this.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Exercise 4\n", "\n", "Please complete the `predict` function to produce `1` or `0` predictions given a dataset and learned parameters $w$ and $b$.\n", "- First you need to compute the prediction from the model $f(x^{(i)}) = g(w \\cdot x^{(i)} + b)$ for every example \n", " - You've implemented this before in the parts above\n", "- We interpret the output of the model ($f(x^{(i)})$) as the probability that $y^{(i)}=1$ given $x^{(i)}$ and parameterized by $w$ and $b$.\n", "- Therefore, to get a final prediction ($y^{(i)}=0$ or $y^{(i)}=1$) from the logistic regression model, you can use the following threshold rule:\n", "\n", " if $f(x^{(i)}) \\geq 0.5$, predict $y^{(i)}=1$\n", " \n", " if $f(x^{(i)}) < 0.5$, predict $y^{(i)}=0$\n", " \n", "If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# UNQ_C4\n", "# GRADED FUNCTION: predict\n", "\n", "def predict(X, w, b): \n", " \"\"\"\n", " Predict whether the label is 0 or 1 using learned logistic\n", " regression parameters w\n", " \n", " Args:\n", " X : (ndarray Shape (m,n)) data, m examples by n features\n", " w : (ndarray Shape (n,)) values of parameters of the model \n", " b : (scalar) value of bias parameter of the model\n", "\n", " Returns:\n", " p : (ndarray (m,)) The predictions for X using a threshold at 0.5\n", " \"\"\"\n", " # number of training examples\n", " m, n = X.shape \n", " p = np.zeros(m)\n", " \n", " ### START CODE HERE ### \n", " # Loop over each example\n", " for i in range(m): \n", " z_wb = 0\n", " # Loop over each feature\n", " for j in range(n): \n", " # Add the corresponding term to z_wb\n", " z_wb_ij = X[i,j]*w[j]\n", " z_wb += z_wb_ij\n", " \n", " # Add bias term \n", " z_wb += b\n", " \n", " # Calculate the prediction for this example\n", " f_wb = sigmoid(z_wb)\n", "\n", " # Apply the threshold\n", " p[i] = 1 if (f_wb >= 0.5) else 0\n", " \n", " ### END CODE HERE ### \n", " return p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Click for hints\n", " \n", " \n", "* Here's how you can structure the overall implementation for this function\n", " ```python \n", " def predict(X, w, b): \n", " # number of training examples\n", " m, n = X.shape \n", " p = np.zeros(m)\n", " \n", " ### START CODE HERE ### \n", " # Loop over each example\n", " for i in range(m): \n", " \n", " # Calculate f_wb (exactly how you did it in the compute_cost function above) \n", " # using a couple of lines of code\n", " f_wb = \n", "\n", " # Calculate the prediction for that training example \n", " p[i] = # Your code here to calculate the prediction based on f_wb\n", " \n", " ### END CODE HERE ### \n", " return p\n", " ```\n", " \n", " If you're still stuck, you can check the hints presented below to figure out how to calculate `f_wb` and `p[i]` \n", " \n", "
\n", " Hint to calculate f_wb\n", "     Recall that you calculated f_wb in compute_cost above — for detailed hints on how to calculate each intermediate term, check out the hints section below that exercise\n", "
\n", "     More hints to calculate f_wb\n", "     You can calculate f_wb as\n", "
\n",
    "               for i in range(m):   \n",
    "                   # Calculate f_wb (exactly how you did it in the compute_cost function above)\n",
    "                   z_wb = 0\n",
    "                   # Loop over each feature\n",
    "                   for j in range(n): \n",
    "                       # Add the corresponding term to z_wb\n",
    "                       z_wb_ij = X[i, j] * w[j]\n",
    "                       z_wb += z_wb_ij\n",
    "            \n",
    "                   # Add bias term \n",
    "                   z_wb += b\n",
    "        \n",
    "                   # Calculate the prediction from the model\n",
    "                   f_wb = sigmoid(z_wb)\n",
    "    
\n", " \n", "
\n", "
\n", " Hint to calculate p[i]\n", "     As an example, if you'd like to say x = 1 if y is less than 3 and 0 otherwise, you can express it in code as x = y < 3 . Now do the same for p[i] = 1 if f_wb >= 0.5 and 0 otherwise. \n", "
\n", "     More hints to calculate p[i]\n", "     You can compute p[i] as p[i] = f_wb >= 0.5\n", "
\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have completed the function `predict`, let's run the code below to test your implementation." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output of predict: shape (4,), value [0. 1. 1. 1.]\n", "\u001b[92mAll tests passed!\n" ] } ], "source": [ "# Test your predict code\n", "np.random.seed(1)\n", "tmp_w = np.random.randn(2)\n", "tmp_b = 0.3 \n", "tmp_X = np.random.randn(4, 2) - 0.5\n", "\n", "tmp_p = predict(tmp_X, tmp_w, tmp_b)\n", "print(f'Output of predict: shape {tmp_p.shape}, value {tmp_p}')\n", "\n", "# UNIT TESTS \n", "predict_test(predict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected output** \n", "\n", "\n", " \n", " \n", " \n", "
Output of predict: shape (4,), value [0. 1. 1. 1.]
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's use this to compute the accuracy on the training set, that is, the percentage of training examples the classifier labels correctly." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train Accuracy: 92.000000\n" ] } ], "source": [ "#Compute accuracy on our training set\n", "p = predict(X_train, w,b)\n", "print('Train Accuracy: %f'%(np.mean(p == y_train) * 100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", "
Train Accuracy (approx): 92.00
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3 - Regularized Logistic Regression\n", "\n", "In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. \n", "\n", "\n", "### 3.1 Problem Statement\n", "\n", "Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. \n", "- From these two tests, you would like to determine whether the microchips should be accepted or rejected. \n", "- To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.\n", "\n", "\n", "### 3.2 Loading and visualizing the data\n", "\n", "As in the previous parts of this exercise, let's start by loading the dataset for this task and visualizing it. \n", "\n", "- The `load_data()` function shown below loads the data into variables `X_train` and `y_train`\n", " - `X_train` contains the test results for the microchips from two tests\n", " - `y_train` contains the results of the QA \n", " - `y_train = 1` if the microchip was accepted \n", " - `y_train = 0` if the microchip was rejected \n", " - Both `X_train` and `y_train` are numpy arrays."
] }, { "cell_type": "code", "execution_count": 35, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "# load dataset\n", "X_train, y_train = load_data(\"data/ex2data2.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### View the variables\n", "\n", "The code below prints the first five values of `X_train` and `y_train` and the type of the variables.\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X_train: [[ 0.051267 0.69956 ]\n", " [-0.092742 0.68494 ]\n", " [-0.21371 0.69225 ]\n", " [-0.375 0.50219 ]\n", " [-0.51325 0.46564 ]]\n", "Type of X_train: \n", "y_train: [1. 1. 1. 1. 1.]\n", "Type of y_train: \n" ] } ], "source": [ "# print X_train\n", "print(\"X_train:\", X_train[:5])\n", "print(\"Type of X_train:\",type(X_train))\n", "\n", "# print y_train\n", "print(\"y_train:\", y_train[:5])\n", "print(\"Type of y_train:\",type(y_train))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check the dimensions of your variables\n", "\n", "Another useful way to get familiar with your data is to view its dimensions. Let's print the shape of `X_train` and `y_train` and see how many training examples we have in our dataset." 
] }, { "cell_type": "code", "execution_count": 37, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The shape of X_train is: (118, 2)\n", "The shape of y_train is: (118,)\n", "We have m = 118 training examples\n" ] } ], "source": [ "print ('The shape of X_train is: ' + str(X_train.shape))\n", "print ('The shape of y_train is: ' + str(y_train.shape))\n", "print ('We have m = %d training examples' % (len(y_train)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Visualize your data\n", "\n", "The helper function `plot_data` (from `utils.py`) is used to generate a figure like Figure 3, where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot examples\n", "plot_data(X_train, y_train[:], pos_label=\"Accepted\", neg_label=\"Rejected\")\n", "\n", "# Set the y-axis label\n", "plt.ylabel('Microchip Test 2') \n", "# Set the x-axis label\n", "plt.xlabel('Microchip Test 1') \n", "plt.legend(loc=\"upper right\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure 3 shows that our dataset cannot be separated into positive and negative examples by a straight line through the plot. Therefore, a straightforward application of logistic regression will not perform well on this dataset, since logistic regression can only find a linear decision boundary.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 3.3 Feature mapping\n", "\n", "One way to fit the data better is to create more features from each data point. The provided function `map_feature` maps the features to all polynomial terms of $x_1$ and $x_2$ up to the sixth power.\n", "\n", "$$\\mathrm{map\\_feature}(x) = \n", "\\left[\\begin{array}{c}\n", "x_1\\\\\n", "x_2\\\\\n", "x_1^2\\\\\n", "x_1 x_2\\\\\n", "x_2^2\\\\\n", "x_1^3\\\\\n", "\\vdots\\\\\n", "x_1 x_2^5\\\\\n", "x_2^6\\end{array}\\right]$$\n", "\n", "As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 27-dimensional vector: there are $k + 1$ monomials $x_1^a x_2^b$ of each degree $k = a + b$, so degrees 1 through 6 contribute $2 + 3 + 4 + 5 + 6 + 7 = 27$ terms. \n", "\n", "- A logistic regression classifier trained on this higher-dimensional feature vector can have a more complex decision boundary, which will appear nonlinear when drawn in our 2-dimensional plot. \n", "- We have provided the `map_feature` function for you in `utils.py`. "
" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original shape of data: (118, 2)\n", "Shape after feature mapping: (118, 27)\n" ] } ], "source": [ "print(\"Original shape of data:\", X_train.shape)\n", "\n", "mapped_X = map_feature(X_train[:, 0], X_train[:, 1])\n", "print(\"Shape after feature mapping:\", mapped_X.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also print the first elements of `X_train` and `mapped_X` to see the tranformation." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X_train[0]: [0.051267 0.69956 ]\n", "mapped X_train[0]: [5.12670000e-02 6.99560000e-01 2.62830529e-03 3.58643425e-02\n", " 4.89384194e-01 1.34745327e-04 1.83865725e-03 2.50892595e-02\n", " 3.42353606e-01 6.90798869e-06 9.42624411e-05 1.28625106e-03\n", " 1.75514423e-02 2.39496889e-01 3.54151856e-07 4.83255257e-06\n", " 6.59422333e-05 8.99809795e-04 1.22782870e-02 1.67542444e-01\n", " 1.81563032e-08 2.47750473e-07 3.38066048e-06 4.61305487e-05\n", " 6.29470940e-04 8.58939846e-03 1.17205992e-01]\n" ] } ], "source": [ "print(\"X_train[0]:\", X_train[0])\n", "print(\"mapped X_train[0]:\", mapped_X[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. 
In the next parts of the exercise, you will implement regularized logistic regression to fit the data and also see for yourself how regularization can help combat the overfitting problem.\n", "\n", "\n", "### 3.4 Cost function for regularized logistic regression\n", "\n", "In this part, you will implement the cost function for regularized logistic regression.\n", "\n", "Recall that for regularized logistic regression, the cost function is of the form\n", "$$J(\\mathbf{w},b) = \\frac{1}{m} \\sum_{i=0}^{m-1} \\left[ -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\right] + \\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$$\n", "\n", "Compare this to the cost function without regularization (which you implemented above), which is of the form \n", "\n", "$$ J(\\mathbf{w},b) = \\frac{1}{m}\\sum_{i=0}^{m-1} \\left[ -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)\\right]$$\n", "\n", "The difference is the regularization term, which is $$\\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$$ \n", "Note that the $b$ parameter is not regularized." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Exercise 5\n", "\n", "Please complete the `compute_cost_reg` function below to calculate the regularization term\n", "$$\\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2$$\n", "\n", "The starter code then adds this to the cost without regularization (which you computed above in `compute_cost`) to calculate the cost with regularization.\n", "\n", "If you get stuck, you can check out the hints presented after the cell below to help you with the implementation."
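, "\n",
 "\n",
 "As a quick check of the regularization term, here is a worked example with small, made-up values (illustrative only, not taken from the microchip dataset): $m = 2$ examples, $n = 2$ features, $\\mathbf{w} = (1, -2)$ and $\\lambda = 1$.\n",
 "\n",
 "$$\\frac{\\lambda}{2m} \\sum_{j=0}^{n-1} w_j^2 = \\frac{1}{2 \\cdot 2}\\left(1^2 + (-2)^2\\right) = \\frac{5}{4} = 1.25$$\n",
 "\n",
 "This amount is added to the unregularized cost; the bias $b$ plays no part in it, whatever its value."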
] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "# UNQ_C5\n", "def compute_cost_reg(X, y, w, b, lambda_ = 1):\n", " \"\"\"\n", " Computes the cost over all examples\n", " Args:\n", " X : (ndarray Shape (m,n)) data, m examples by n features\n", " y : (ndarray Shape (m,)) target value \n", " w : (ndarray Shape (n,)) values of parameters of the model \n", " b : (scalar) value of bias parameter of the model\n", " lambda_ : (scalar, float) Controls amount of regularization\n", " Returns:\n", " total_cost : (scalar) cost \n", " \"\"\"\n", "\n", " m, n = X.shape\n", " \n", " # Calls the compute_cost function that you implemented above\n", " cost_without_reg = compute_cost(X, y, w, b) \n", " \n", " # You need to calculate this value\n", " reg_cost = 0.\n", " \n", " ### START CODE HERE ###\n", " reg_cost = sum([(w[j])**2 for j in range(n)]) * lambda_ / (2*m)\n", " \n", " \n", " ### END CODE HERE ### \n", " \n", " # Add the regularization cost to get the total cost\n", " total_cost = cost_without_reg + reg_cost\n", "\n", " return total_cost" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Click for hints\n", " \n", " \n", "* Here's how you can structure the overall implementation for this function\n", " ```python \n", " def compute_cost_reg(X, y, w, b, lambda_ = 1):\n", " \n", " m, n = X.shape\n", " \n", " # Calls the compute_cost function that you implemented above\n", " cost_without_reg = compute_cost(X, y, w, b) \n", " \n", " # You need to calculate this value\n", " reg_cost = 0.\n", " \n", " ### START CODE HERE ###\n", " for j in range(n):\n", " reg_cost_j = # Your code here to calculate the cost from w[j]\n", " reg_cost = reg_cost + reg_cost_j\n", " reg_cost = (lambda_/(2 * m)) * reg_cost\n", " ### END CODE HERE ### \n", " \n", " # Add the regularization cost to get the total cost\n", " total_cost = cost_without_reg + reg_cost\n", "\n", " return total_cost\n", " ```\n", " \n", " If you're still stuck, you can check the hints presented below to figure out how to calculate `reg_cost_j` \n", " \n", "
\n", " Hint to calculate reg_cost_j\n", "     You can calculate reg_cost_j as reg_cost_j = w[j]**2 \n", "
\n", " \n", "
\n", "\n", "
\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the cell below to check your implementation of the `compute_cost_reg` function." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Regularized cost : 0.6618252552483948\n", "\u001b[92mAll tests passed!\n" ] } ], "source": [ "X_mapped = map_feature(X_train[:, 0], X_train[:, 1])\n", "np.random.seed(1)\n", "initial_w = np.random.rand(X_mapped.shape[1]) - 0.5\n", "initial_b = 0.5\n", "lambda_ = 0.5\n", "cost = compute_cost_reg(X_mapped, y_train, initial_w, initial_b, lambda_)\n", "\n", "print(\"Regularized cost :\", cost)\n", "\n", "# UNIT TEST \n", "compute_cost_reg_test(compute_cost_reg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", "
Regularized cost : 0.6618252552483948
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 3.5 Gradient for regularized logistic regression\n", "\n", "In this section, you will implement the gradient for regularized logistic regression.\n", "\n", "\n", "The gradient of the regularized cost function has two components. The first, $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$, is a scalar; the second is a vector with the same shape as the parameters $\\mathbf{w}$, whose $j^\\mathrm{th}$ element is defined as follows:\n", "\n", "$$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} = \\frac{1}{m} \\sum_{i=0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) $$\n", "\n", "$$\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} = \\left( \\frac{1}{m} \\sum_{i=0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) x_j^{(i)} \\right) + \\frac{\\lambda}{m} w_j \\quad\\, \\mbox{for $j=0...(n-1)$}$$\n", "\n", "Compare this to the gradient of the cost function without regularization (which you implemented above), which is of the form \n", "$$\n", "\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{2}\n", "$$\n", "$$\n", "\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{3}\n", "$$\n", "\n", "\n", "As you can see, $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ is unchanged; the only difference is the following term in $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$, which is $$\\frac{\\lambda}{m} w_j \\quad\\, \\mbox{for $j=0...(n-1)$}$$ \n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Exercise 6\n", "\n", "Please complete the `compute_gradient_reg` function below to calculate the following term\n", "\n", "$$\\frac{\\lambda}{m} w_j \\quad\\, \\mbox{for $j=0...(n-1)$}$$\n", "\n", "The starter code will 
add this term to the $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w}$ returned from `compute_gradient` above to get the gradient for the regularized cost function.\n", "\n", "\n", "If you get stuck, you can check out the hints presented after the cell below to help you with the implementation." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "# UNQ_C6\n", "def compute_gradient_reg(X, y, w, b, lambda_ = 1): \n", " \"\"\"\n", " Computes the gradient for logistic regression with regularization\n", " \n", " Args:\n", " X : (ndarray Shape (m,n)) data, m examples by n features\n", " y : (ndarray Shape (m,)) target value \n", " w : (ndarray Shape (n,)) values of parameters of the model \n", " b : (scalar) value of bias parameter of the model\n", " lambda_ : (scalar,float) regularization constant\n", " Returns\n", " dj_db : (scalar) The gradient of the cost w.r.t. the parameter b. \n", " dj_dw : (ndarray Shape (n,)) The gradient of the cost w.r.t. the parameters w. \n", "\n", " \"\"\"\n", " m, n = X.shape\n", " \n", " dj_db, dj_dw = compute_gradient(X, y, w, b)\n", "\n", " ### START CODE HERE ### \n", " for j in range(n):\n", " dj_dw[j] += w[j] * lambda_ / m \n", " ### END CODE HERE ### \n", " \n", " return dj_db, dj_dw" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Click for hints\n", " \n", " \n", "* Here's how you can structure the overall implementation for this function\n", " ```python \n", " def compute_gradient_reg(X, y, w, b, lambda_ = 1): \n", " m, n = X.shape\n", " \n", " dj_db, dj_dw = compute_gradient(X, y, w, b)\n", "\n", " ### START CODE HERE ### \n", " # Loop over the elements of w\n", " for j in range(n): \n", " \n", " dj_dw_j_reg = # Your code here to calculate the regularization term for dj_dw[j]\n", " \n", " # Add the regularization term to the corresponding element of dj_dw\n", " dj_dw[j] = dj_dw[j] + dj_dw_j_reg\n", " \n", " ### END CODE HERE ### \n", " \n", " return dj_db, dj_dw\n", " ```\n", " \n", " If you're still stuck, you can check the hints presented below to figure out how to calculate `dj_dw_j_reg` \n", " \n", "
\n", " Hint to calculate dj_dw_j_reg\n", "     You can calculate dj_dw_j_reg as dj_dw_j_reg = (lambda_ / m) * w[j] \n", "
\n", " \n", "
\n", "\n", "
\n", "\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the cell below to check your implementation of the `compute_gradient_reg` function." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "deletable": false, "editable": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dj_db: 0.07138288792343662\n", "First few elements of regularized dj_dw:\n", " [-0.010386028450548701, 0.011409852883280124, 0.0536273463274574, 0.003140278267313462]\n", "\u001b[92mAll tests passed!\n" ] } ], "source": [ "X_mapped = map_feature(X_train[:, 0], X_train[:, 1])\n", "np.random.seed(1) \n", "initial_w = np.random.rand(X_mapped.shape[1]) - 0.5 \n", "initial_b = 0.5\n", " \n", "lambda_ = 0.5\n", "dj_db, dj_dw = compute_gradient_reg(X_mapped, y_train, initial_w, initial_b, lambda_)\n", "\n", "print(f\"dj_db: {dj_db}\", )\n", "print(f\"First few elements of regularized dj_dw:\\n {dj_dw[:4].tolist()}\", )\n", "\n", "# UNIT TESTS \n", "compute_gradient_reg_test(compute_gradient_reg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dj_db: 0.07138288792343
First few elements of regularized dj_dw:
[-0.010386028450548, 0.011409852883280, 0.0536273463274, 0.003140278267313]
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 3.6 Learning parameters using gradient descent\n", "\n", "As in the previous parts, you will use the gradient descent function you implemented above to learn the optimal parameters $w$ and $b$. \n", "- If you have completed the cost and gradient for regularized logistic regression correctly, you should be able to step through the next cell to learn the parameters $w$ and $b$. \n", "- After training the parameters, we will use them to plot the decision boundary. \n", "\n", "**Note**\n", "\n", "The code block below takes quite a while to run, especially with a non-vectorized version. You can reduce the `iterations` to test your implementation and iterate faster. If you have time later, run for 100,000 iterations to see better results." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false }, "outputs": [], "source": [ "# Initialize fitting parameters\n", "np.random.seed(1)\n", "initial_w = np.random.rand(X_mapped.shape[1])-0.5\n", "initial_b = 1.\n", "\n", "# Set regularization parameter lambda_ (you can try varying this)\n", "lambda_ = 0.01 \n", "\n", "# Some gradient descent settings\n", "iterations = 10000\n", "alpha = 0.01\n", "\n", "w,b, J_history,_ = gradient_descent(X_mapped, y_train, initial_w, initial_b, \n", " compute_cost_reg, compute_gradient_reg, \n", " alpha, iterations, lambda_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", " Expected Output: Cost < 0.5 (click to see details)\n", "\n", "\n", "```\n", "# Using the following settings\n", "#np.random.seed(1)\n", "#initial_w = np.random.rand(X_mapped.shape[1])-0.5\n", "#initial_b = 1.\n", "#lambda_ = 0.01; \n", "#iterations = 10000\n", "#alpha = 0.01\n", "Iteration 0: Cost 0.72 \n", "Iteration 1000: Cost 0.59 \n", "Iteration 2000: Cost 0.56 \n", "Iteration 3000: Cost 0.53 \n", "Iteration 4000: Cost 0.51 \n", "Iteration 5000: Cost 0.50 \n", "Iteration 6000: Cost 0.48 \n", "Iteration 7000: Cost 0.47 \n", "Iteration 8000: Cost 0.46 \n", "Iteration 9000: Cost 0.45 \n", "Iteration 9999: Cost 0.45 \n", " \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 3.7 Plotting the decision boundary\n", "To help you visualize the model learned by this classifier, we will use our `plot_decision_boundary` function, which plots the (non-linear) decision boundary that separates the positive and negative examples. \n", "\n", "- The function plots the non-linear decision boundary by computing the classifier’s predictions on an evenly spaced grid and then drawing a contour plot of where the predictions change from y = 0 to y = 1.\n", "\n", "- After learning the parameters $w$ and $b$, the next step is to plot a decision boundary similar to Figure 4.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "plot_decision_boundary(w, b, X_mapped, y_train)\n", "# Set the y-axis label\n", "plt.ylabel('Microchip Test 2') \n", "# Set the x-axis label\n", "plt.xlabel('Microchip Test 1') \n", "plt.legend(loc=\"upper right\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 3.8 Evaluating regularized logistic regression model\n", "\n", "You will use the `predict` function that you implemented above to calculate the accuracy of the regularized logistic regression model on the training set." ] }, { 
"cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false }, "outputs": [], "source": [ "#Compute accuracy on the training set\n", "p = predict(X_mapped, w, b)\n", "\n", "print('Train Accuracy: %f'%(np.mean(p == y_train) * 100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Expected Output**:\n", "\n", " \n", " \n", "
Train Accuracy: ~80%
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Congratulations on completing the final lab of this course! We hope to see you in Course 2 where you will use more advanced learning algorithms such as neural networks and decision trees. Keep learning!**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Please click here if you want to experiment with any of the non-graded code.\n", "

Important Note: Please only do this when you've already passed the assignment to avoid problems with the autograder.\n", "

    \n", "
  1. On the notebook’s menu, click “View” > “Cell Toolbar” > “Edit Metadata”
  2. \n", "
  3. Hit the “Edit Metadata” button next to the code cell which you want to lock/unlock
  4. \n", "
  5. Set the attribute value for “editable” to:\n", "
      \n", "
    • “true” if you want to unlock it
    • \n", "
    • “false” if you want to lock it
    • \n", "
    \n", "
  6. \n", "
  7. On the notebook’s menu, click “View” > “Cell Toolbar” > “None”
  8. \n", "
\n", "

Here's a short demo of how to do the steps above: \n", "
\n", " \"unlock_cells.gif\"\n", "

" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }