import numpy as np

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros([dim, 1])
    b = 0
    ### END CODE HERE ###

    return w, b
The predicted value of the current model is $$ \hat{y}^{(i)} = \mathrm{sigmoid}(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}, \quad \text{where } z^{(i)} = w^T x^{(i)} + b $$
Also note that the sigmoid has a convenient derivative: $$ \frac{\partial \hat{y}}{\partial z} = \hat{y} (1 - \hat{y}) $$
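The sigmoid helper used throughout the code below is presumably defined earlier in the notebook; if not, a minimal NumPy sketch of it is:

import numpy as np

def sigmoid(z):
    """Compute the sigmoid of z element-wise; z may be a scalar or a numpy array."""
    return 1 / (1 + np.exp(-z))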
Loss Function
Now we need to measure the difference between the predicted and actual values. In logistic regression we define this loss using $\log$ terms rather than the squared difference, because with the sigmoid activation a squared-difference cost is non-convex, and gradient descent can get stuck in local optima during the optimization phase.
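For each training example the cross-entropy loss is $$ \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log (1 - \hat{y}^{(i)}) \right] $$ and the cost averaged over all $m$ examples, which the propagate function below computes, is $$ J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) $$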
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    # w.shape = [num, 1], X.shape = [num, NOE]
    A = sigmoid(np.dot(w.T, X) + b).T                                       # compute activation, A.shape = [NOE, 1]
    # Y.shape = [1, NOE]
    cost = -(np.dot(Y, np.log(A)) + np.dot((1 - Y), np.log(1 - A))) / m     # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = np.dot(X, (A.T - Y).T) / m
    db = np.dot(A.T - Y, np.ones([m])) / m
    ### END CODE HERE ###

    cost = np.squeeze(cost)
    grads = {"dw": dw, "db": db}
    return grads, cost
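A quick sanity check of propagate with small, made-up arrays (these toy values are purely illustrative, not part of the cat dataset) might look like this:

w = np.array([[1.], [2.]])
b = 2.0
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1, 0, 1]])

grads, cost = propagate(w, b, X, Y)
print("dw =", grads["dw"])
print("db =", grads["db"])
print("cost =", cost)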
In the forward pass we compute the loss for each example and average the losses into the cost function.
Backward
In order to compute the backward pass, we need to take the derivative of the loss function and propagate it backward. Here is some mathematical explanation for the gradients used above:
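Using the chain rule together with the sigmoid derivative $\frac{\partial \hat{y}}{\partial z} = \hat{y}(1 - \hat{y})$ noted earlier, the loss gradient with respect to $z^{(i)}$ simplifies to $$ \frac{\partial \mathcal{L}}{\partial z^{(i)}} = \hat{y}^{(i)} - y^{(i)} $$ and the cost gradients are $$ \frac{\partial J}{\partial w} = \frac{1}{m} X (\hat{Y} - Y)^{T} \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) $$ where $\hat{Y}$ is the row vector of predictions $\hat{y}^{(i)}$. These correspond exactly to the dw and db computed in propagate.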
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w, "b": b}

    grads = {"dw": dw, "db": db}

    return params, grads, costs
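As a rough illustration of the call signature, optimize can be run on the same kind of toy arrays used in the propagate check above (again, made-up values rather than the cat dataset):

# Illustrative toy data, same shapes as in the propagate check
w, b = np.zeros((2, 1)), 0.0
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1, 0, 1]])

params, grads, costs = optimize(w, b, X, Y, num_iterations=500, learning_rate=0.05, print_cost=True)
print("w =", params["w"])
print("b =", params["b"])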
Predict
Once we are satisfied with the trained model, the next step is to test its performance on the test data.
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)   # A.shape = [1, NOE]
    ### END CODE HERE ###

    for i in range(A.shape[1]):

        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        Y_prediction[0][i] = 1 if A[0][i] > 0.5 else 0
        ### END CODE HERE ###

    return Y_prediction
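As a side note, the per-element loop above is equivalent to a single vectorized threshold; this variant is just an alternative sketch, not the assignment template:

Y_prediction = (A > 0.5).astype(float)   # 1.0 where the probability exceeds 0.5, else 0.0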
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    ### START CODE HERE ###

    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
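    # Remaining steps to finish model(), sketched here: report accuracy and collect the
    # results into the dictionary d named in the docstring. The accuracy formula below is
    # an assumed simple measure (fraction of matching 0/1 labels), not taken from the text.
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d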