1) If you double the value of a given feature (i.e., a specific column of the feature matrix), what happens to the least-squares estimated coefficients for every other feature? (Assume no other feature depends on the doubled one, i.e., no interaction terms.)
It is impossible to tell from the information provided (wrong)
They stay the same
When interpreting a coefficient we hold the other features fixed; rescaling one column only rescales that column's own coefficient (here it is halved), so every other coefficient stays the same regardless of whether a particular feature's measurement got doubled.
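A quick numpy sketch (hypothetical data, nothing from the quiz) makes this concrete: doubling a column halves that column's own coefficient and leaves the rest untouched, because the fitted values do not change.

```python
import numpy as np

# Sketch with made-up data: double one column of the feature matrix and
# compare the least-squares coefficients before and after.
rng = np.random.default_rng(0)
N, D = 50, 3
H = rng.normal(size=(N, D))
y = H @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.1, size=N)

w, *_ = np.linalg.lstsq(H, y, rcond=None)

H2 = H.copy()
H2[:, 0] *= 2.0                                # double the first feature column
w2, *_ = np.linalg.lstsq(H2, y, rcond=None)

print(w)   # roughly [ 1.5, -2.0,  0.7]
print(w2)  # first coefficient halved; the other two identical to w
```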
2) Gradient descent/ascent is...
An approximation to simple linear regression (wrong)
A modeling technique in machine learning (wrong)
An algorithm for minimizing/maximizing a function
By definition: gradient descent/ascent is an iterative optimization algorithm. It is a tool used to fit a model (e.g., to minimize a cost function), not a modeling technique itself.
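To see why "algorithm, not model" is the right framing, here is a minimal sketch of gradient descent minimizing a toy one-dimensional function (step size and starting point are arbitrary illustrative choices):

```python
# Gradient descent on the toy objective f(w) = (w - 3)^2,
# whose gradient is f'(w) = 2(w - 3).
w = 0.0
step_size = 0.1
for _ in range(100):
    gradient = 2.0 * (w - 3.0)
    w -= step_size * gradient    # descent: step opposite the gradient

print(w)  # approaches the minimizer w = 3
```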
3) Let's analyze how many computations are required to fit a multiple linear regression model using the closed-form solution based on a data set with 50 observations and 10 features. In the videos, we said that computing the inverse of the 10x10 matrix (H^T)H was on the order of D^3 operations. Let's focus on forming this matrix prior to inversion. How many multiplications are required to form the matrix (H^T)H?
1000 (wrong) didn't read the question carefully: 1000 = D^3 = 10^3 is the cost of the inversion, but the question asks about forming (H^T)H, not inverting it...
N x N x D = 50 x 50 x 10 = 25000 (wrong) didn't do the linear algebra correctly: this would be the cost of forming the N x N matrix H(H^T), not the D x D matrix (H^T)H...
N x D x D = 50 x 10 x 10 = 5000 (correct): each of the D x D = 100 entries of (H^T)H is an inner product of two length-N columns of H, costing N multiplications, so the total is N x D^2 (review Linear Algebra, be patient)
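A brute-force count in numpy (random stand-in data with the quiz's shapes) confirms the N x D x D figure:

```python
import numpy as np

# Form (H^T)H entry by entry and count the multiplications,
# using stand-in data with the quiz's shapes: N = 50, D = 10.
N, D = 50, 10
H = np.random.default_rng(1).normal(size=(N, D))

mults = 0
HtH = np.zeros((D, D))
for i in range(D):
    for j in range(D):
        HtH[i, j] = H[:, i] @ H[:, j]  # inner product: N multiplications
        mults += N

print(mults)                      # 5000 = N x D x D
print(np.allclose(HtH, H.T @ H))  # True
```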
4) More generally, if you have D features and N observations what is the total complexity of computing ((H^T)H)^(-1)?
O(D^3) (wrong) see the first failure in 3): this counts only the inversion and ignores the cost of forming (H^T)H
O(N^2D + D^3) (wrong) see the second failure in 3): N^2 D would be the cost of forming H(H^T), not (H^T)H
O(ND^2 + D^3) (correct): O(ND^2) to form (H^T)H plus O(D^3) to invert it; that's how I got 3) right
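Putting 3) and 4) together, a sketch of the full closed-form solve (random stand-in data again) with each step's cost noted:

```python
import numpy as np

# Closed-form least-squares fit, annotated with per-step costs.
N, D = 50, 10
rng = np.random.default_rng(2)
H = rng.normal(size=(N, D))
y = rng.normal(size=N)

HtH = H.T @ H                 # O(N D^2): forming the D x D matrix (5000 mults here)
Hty = H.T @ y                 # O(N D): dominated by the step above
w = np.linalg.inv(HtH) @ Hty  # O(D^3): inverting the D x D matrix
# Total: O(N D^2 + D^3), matching the correct answer.

print(w.shape)  # (10,)
```

In practice np.linalg.solve(HtH, Hty) is preferred to forming the inverse explicitly, though its asymptotic cost is the same O(D^3).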