Wednesday, April 27, 2016

4.27 UW Machine Learning Regression Week 2 Assignment 1

Four quiz questions I got wrong:

1) If you double the value of a given feature (i.e. a specific column of the feature matrix), what happens to the least-squares estimated coefficients for every other feature? (assume you have no other feature that depends on the doubled feature i.e. no interaction terms).
It is impossible to tell from the information provided (wrong)
They stay the same
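When interpreting a coefficient, we assume the other features are held constant, so doubling one feature's measurements should not change the coefficients of the others. More precisely, if column j is doubled, the least-squares fit simply halves that feature's own coefficient w_j, the fitted values stay the same, and all the other coefficients are unchanged.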
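A quick numerical check of this answer, as a minimal sketch: it fits least squares with NumPy's np.linalg.lstsq on made-up random data, doubles one column, and compares the coefficients. The data, the seed, and the helper least_squares are my own illustrative assumptions, and the check assumes the feature matrix has full column rank.

```python
import numpy as np

def least_squares(H, y):
    # Hypothetical helper: solve min_w ||y - H w||^2 with NumPy's solver
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

rng = np.random.default_rng(0)
N, D = 50, 4
# Made-up data: an intercept column plus 3 random features
H = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
y = H @ np.array([1.0, 2.0, -3.0, 0.5]) + rng.normal(scale=0.1, size=N)

w = least_squares(H, y)

H_doubled = H.copy()
H_doubled[:, 2] *= 2  # double feature 2 only
w_doubled = least_squares(H_doubled, y)

# Every coefficient except feature 2's is unchanged
print(np.allclose(np.delete(w, 2), np.delete(w_doubled, 2)))  # True
# Feature 2's own coefficient is halved
print(np.isclose(w_doubled[2], w[2] / 2))  # True
```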


2) Gradient descent/ascent is...


An approximation to simple linear regression (wrong)
A modeling technique in machine learning (wrong)
An algorithm for minimizing/maximizing a function
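By definition: it is an iterative optimization algorithm that steps against the gradient (descent) or along it (ascent) until it reaches a minimum or maximum of a function. It is not a model, and not an approximation to one.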
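A minimal sketch of what "an algorithm for minimizing/maximizing a function" looks like in practice, on one-variable toy functions of my own choosing. The helper names gradient_descent and gradient_ascent, the step size, and the stopping tolerance are illustrative assumptions, not the course's implementation.

```python
def gradient_descent(grad, w0, step_size=0.1, tolerance=1e-8, max_iter=10_000):
    """Minimize a 1-D differentiable function, given its derivative `grad`."""
    w = w0
    for _ in range(max_iter):
        g = grad(w)
        if abs(g) < tolerance:  # stop once the slope is (almost) flat
            break
        w = w - step_size * g   # step downhill, against the gradient
    return w

def gradient_ascent(grad, w0, **kwargs):
    """Maximize a function by running descent on its negative."""
    return gradient_descent(lambda w: -grad(w), w0, **kwargs)

# f(w) = (w - 3)^2 has its minimum at w = 3; f'(w) = 2(w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
# h(w) = -(w + 1)^2 has its maximum at w = -1; h'(w) = -2(w + 1)
w_max = gradient_ascent(lambda w: -2 * (w + 1), w0=0.0)
print(round(w_min, 4), round(w_max, 4))  # 3.0 -1.0
```

Writing ascent as descent on the negated function keeps a single update rule, which is the sense in which descent and ascent are the same algorithm.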



3) Let's analyze how many computations are required to fit a multiple linear regression model using the closed-form solution based on a data set with 50 observations and 10 features. In the videos, we said that computing the inverse of the 10x10 matrix (H^T)H was on the order of D^3 operations. Let's focus on forming this matrix prior to inversion. How many multiplications are required to form the matrix (H^T)H?

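1000 (wrong): I didn't read the question carefully. 1000 = D^3 = 10^3 is the cost of inverting the matrix, not of forming it.
N x N x D = 50 x 50 x 10 = 25000 (wrong): I didn't do the linear algebra correctly. N x N x D would be the cost of forming H(H^T), which is N x N.
N x D x D = 50 x 10 x 10 = 5000: (H^T)H is D x D, and each of its D^2 entries is the dot product of two feature columns of length N, so it takes N x D x D multiplications. Lesson: review linear algebra, and be patient.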
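To convince myself of the count, here is a minimal sketch that forms (H^T)H with plain loops and counts every scalar multiplication. It assumes the naive computation (no shortcut for the fact that (H^T)H is symmetric), and the helper gram_matrix_with_count and the all-ones data are hypothetical, chosen only for illustration.

```python
def gram_matrix_with_count(H):
    """Form (H^T)H with explicit loops, counting scalar multiplications."""
    N, D = len(H), len(H[0])
    G = [[0.0] * D for _ in range(D)]
    multiplications = 0
    for i in range(D):          # row index of (H^T)H
        for j in range(D):      # column index of (H^T)H
            for n in range(N):  # dot product of feature columns i and j
                G[i][j] += H[n][i] * H[n][j]
                multiplications += 1
    return G, multiplications

H = [[1.0] * 10 for _ in range(50)]  # 50 observations x 10 features
_, count = gram_matrix_with_count(H)
print(count)  # 5000 = N * D * D
```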


4) More generally, if you have D features and N observations what is the total complexity of computing ((H^T)H)^(-1)?

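O(D^3) (wrong): see my first mistake in 3); this counts only the inversion and ignores the cost of forming (H^T)H.
O(N^2 D + D^3) (wrong): see my second mistake in 3).
O(ND^2 + D^3): forming (H^T)H takes O(ND^2) multiplications and inverting the D x D result takes O(D^3); working this out is how I got 3) right.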
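A minimal sketch of the closed-form solution in NumPy, with the rough cost of each step written as a comment. The helper names closed_form_least_squares and dominant_cost, and the random 50 x 10 data, are my own assumptions for illustration. The two expensive steps give O(ND^2 + D^3); the (H^T)y product (about ND) and the final matrix-vector product (about D^2) are lower order.

```python
import numpy as np

def closed_form_least_squares(H, y):
    """Closed-form least squares w = ((H^T)H)^(-1) (H^T)y, with rough costs per step."""
    HtH = H.T @ H                 # D x D: N*D^2 multiplications (the count from question 3)
    HtH_inv = np.linalg.inv(HtH)  # inverting a D x D matrix: on the order of D^3 operations
    Hty = H.T @ y                 # length-D vector: N*D multiplications (lower order)
    return HtH_inv @ Hty          # D x D times a D-vector: D^2 multiplications (lower order)

def dominant_cost(N, D):
    """Leading terms of the total cost: O(N*D^2 + D^3)."""
    return N * D**2 + D**3

rng = np.random.default_rng(1)
H = rng.normal(size=(50, 10))  # N = 50 observations, D = 10 features
y = rng.normal(size=50)

w = closed_form_least_squares(H, y)
w_ref, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.allclose(w, w_ref))   # True
print(dominant_cost(50, 10))   # 6000 (5000 to form (H^T)H + 1000 to invert it)
```

The explicit inverse is used here only to mirror the formula from the lectures; in practice np.linalg.solve(HtH, Hty), or np.linalg.lstsq on H directly, is usually preferred for numerical stability.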

Tuesday, April 19, 2016

4.19 Pick up a new language?

This is quoted from the UW Machine Learning discussion forum:

hey kai,
Need your advice, as you have more experience in data science.
Which is more used in real world application R or Python ?
I know python and I want to learn data science using python.
Would I have to learn R ?
thanks,
Savan

My response:

Hi Savan,
Regarding your question, there is no right or wrong answer to whether it is necessary to pick up a new language, and there are many arguments out there (http://insidebigdata.com/2013/12/09/data-science-wars-python-vs-r/). Since I started with R (before that I had some experience with MATLAB), I am primarily an R user, and it is my current tool for data processing and analysis. My overall experience is that R is very flexible: there are many packages available (and more all the time) that can be applied in many fields. On the other hand, I have heard people argue that R has so many tricks (e.g. shortcut functions) that code can get messy. One strength of Python, though I don't have a feel for it yet, is that it is highly scalable (able to handle both small and huge data sets), and I guess this is part of the reason Python is so popular in machine learning.
So it is useful to know another language for data science, but it is also necessary to be realistic about the time and study load it might take, since learning a language is a process of its own. My suggestion would be to stick to Python first, but feel free to explore. After all, the core of data science is to "get your hands dirty" (http://www.kdnuggets.com/2015/05/data-science-inconvenient-truth.html), so there will be many opportunities to practice; as your hands get dirtier, you may need to come up with new solutions to problems, and at that point R might come into play.
I hope this helps.
Best,
Kai