## Question 1

Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year.

Specifically, let x be equal to the number of “A” grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).

Here each row is one training example. Recall that in linear regression, our hypothesis is hθ(x) = θ0 + θ1x, and we use m to denote the number of training examples.

For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).

4

## Question 2

Consider the following training set of m=4 training examples:

Consider the linear regression model hθ(x) = θ0 + θ1x. What are the values of θ0 and θ1 that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)

• θ0=0.5,θ1=0
• θ0=0.5,θ1=0.5
• θ0=1,θ1=0.5
• θ0=0,θ1=0.5
• θ0=1,θ1=1

θ0=0,θ1=0.5

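Since linear regression fits this data perfectly, J(θ0,θ1) = 0, so y = hθ(x) = θ0 + θ1x holds for every training example. Using any two rows of the table, solve the resulting pair of equations for θ0 and θ1.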
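
The two-point solve described above can be sketched as follows. The training table is not reproduced here, so the two points in the example are hypothetical, chosen only to be consistent with the stated answer; `fit_line_through` is an illustrative helper name, not something from the quiz.

```python
def fit_line_through(p1, p2):
    """Return (theta0, theta1) for the line through two points with distinct x."""
    (x1, y1), (x2, y2) = p1, p2
    theta1 = (y2 - y1) / (x2 - x1)  # slope from the difference of the two equations
    theta0 = y1 - theta1 * x1       # intercept by substituting back into one equation
    return theta0, theta1


# Hypothetical points (assumption): both satisfy y = 0 + 0.5 * x.
theta0, theta1 = fit_line_through((1, 0.5), (2, 1.0))
print(theta0, theta1)  # 0.0 0.5
```

When the data lies exactly on a line, any pair of rows with different x values gives the same result.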

## Question 3

Suppose we set θ0=−1,θ1=0.5. What is hθ(4)?

Setting x = 4, we have hθ(x) = θ0 + θ1x = -1 + (0.5)(4) = 1.
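
As a quick check of the arithmetic, the hypothesis can be written as a one-line function. The name `h` and its argument order are illustrative choices, not part of the quiz.

```python
def h(x, theta0, theta1):
    """Linear regression hypothesis: theta0 + theta1 * x."""
    return theta0 + theta1 * x


print(h(4, theta0=-1, theta1=0.5))  # 1.0
```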

## Question 4

Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima).

Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)

• Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1).
• If the learning rate is too small, then gradient descent may take a very long time to converge.
• If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values.
• If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.
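
The statements above can be probed numerically. The sketch below is not the quiz's f, which is unknown; it assumes a simple hypothetical bowl, f(θ0,θ1) = (θ0 - 1)² + (θ1 + 2)², whose only minimum is at (1, -2). The function names and learning rates are illustrative.

```python
def f(t0, t1):
    # Hypothetical smooth function (assumption), minimized at (1, -2).
    return (t0 - 1) ** 2 + (t1 + 2) ** 2


def grad_f(t0, t1):
    # Partial derivatives of f with respect to t0 and t1.
    return 2 * (t0 - 1), 2 * (t1 + 2)


def step(t0, t1, alpha):
    # Simultaneous update: both partials are evaluated at the old point.
    g0, g1 = grad_f(t0, t1)
    return t0 - alpha * g0, t1 - alpha * g1


def steps_until_small(alpha, tol=1e-6, max_iters=100_000):
    # Count iterations from (0, 0) until f drops below tol.
    t0, t1 = 0.0, 0.0
    for i in range(max_iters):
        if f(t0, t1) < tol:
            return i
        t0, t1 = step(t0, t1, alpha)
    return max_iters


# A very large learning rate can overshoot and increase f.
print(f(0, 0), f(*step(0, 0, alpha=1.5)))

# Starting at the minimum, the gradient is zero, so nothing changes.
print(step(1, -2, alpha=0.1))

# Starting with t0 == t1 does not keep them equal after one update.
print(step(0, 0, alpha=0.1))

# A much smaller learning rate needs many more iterations.
print(steps_until_small(0.1), steps_until_small(0.001))
```

On this particular function, one oversized step raises f, a start at the minimum stays put, and equal initial values separate after a single update because the two partial derivatives differ.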


## Question 5

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ0, θ1 such that J(θ0,θ1)=0.

Which of the statements below must then be true? (Check all that apply.)

• For this to be true, we must have y^(i)=0 for every value of i=1,2,…,m.
• Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
• For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0.
• Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
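
To see what J(θ0,θ1)=0 does and does not imply, here is a small sketch. It assumes the usual squared-error cost, J = (1/(2m)) Σ (hθ(x^(i)) - y^(i))², which this page does not spell out, and it uses a made-up training set lying on the line y = 1 + 2x.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = (1 / (2m)) * sum((h(x) - y)^2) (assumed form)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical data (assumption): every point lies exactly on y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

print(cost(1, 2, xs, ys))  # 0.0
print(cost(0, 0, xs, ys))  # 10.5
```

In this example the cost reaches zero with nonzero targets and nonzero parameters; what makes it zero is that every training example lies on the fitted line.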