https://researchtraining.nih.gov/programs/career-development
If I can do this during my PhD, that would be good; it is better to get familiar with the process early.
Wednesday, July 6, 2016
Wednesday, June 29, 2016
6.29 Reminder on calculating degrees of freedom
http://ron.dotsch.org/degrees-of-freedom/
When reporting the F statistics, follow this format: F(df1, df2) = ...
- df1: # of levels of the factor - 1
- df2: residual; # of subjects - # of levels
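A minimal sketch in R (made-up data; the group and subject counts are hypothetical):
set.seed(1)
dat <- data.frame(gp = factor(rep(c("A", "B", "C"), each = 10)),
                  y  = rnorm(30))
summary(aov(y ~ gp, data = dat))  # one-way ANOVA: 3 levels, 30 subjects
# df1 = 3 - 1 = 2; df2 = 30 - 3 = 27
# Report as F(2, 27) = <F value>, p = <p-value>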
Thursday, June 23, 2016
6.23 Paired t-test versus ANCOVA
A paired t-test compares the difference between two means, so it does less than ANCOVA. ANCOVA can compare differences between means while controlling for covariates, and its fitted coefficients can be used to make predictions.
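A minimal sketch in R (made-up data) contrasting the two, using one common ANCOVA-style formulation on the paired differences:
set.seed(2)
n    <- 20
pre  <- rnorm(n, mean = 50, sd = 10)
post <- pre + rnorm(n, mean = 2, sd = 5)
age  <- sample(20:60, n, replace = TRUE)

# Paired t-test: only tests whether the mean difference is zero
t.test(post, pre, paired = TRUE)

# ANCOVA-style model: tests the mean difference (the intercept) while
# controlling for a covariate, and the coefficients can be used to predict
fit <- lm(I(post - pre) ~ age)
summary(fit)
predict(fit, newdata = data.frame(age = 40))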
6.23 On choosing a model
Always think about the question one needs to address; a different model may no longer fit that question.
Including a covariate does not mean it has to stay in the model forever. If it is not significant, it can be removed and the raw values reported (which is statistically equivalent to keeping it in the model).
Monday, June 20, 2016
6.20 Can we study brains cultivated from stem cells?
I was excited to read Dr. Guy McKhann's article on how iPSCs can be differentiated and developed into brain cells, which can inspire many questions, such as studying the Zika virus.
The most exciting part of this research involves creating "mini-brains." These mini-brains, about the size of the head of a pin, can be used to study the development of the human brain and how development is altered by a virus. Hopkins researchers are currently using mini-brains to study the Zika virus. This study is just the tip of the iceberg.
6.20 Better organizing skills are urgently needed
Inevitably, I write scraps of draft paper every day during daily work. If not promptly archived, they start flying all over my desk, and I simply can't get a clear mind out of these residues. Keeping the desk clean helps me keep my mind sharp.
There are usually multiple steps in the image-processing pipelines, so when I run a pipeline I need to remind myself of where I am.
Also, there are sometimes short-hand calculations, for example the adjusted p-value after multiple-comparison correction.
It also happens when I need to construct a big table whose format I should follow so that the statistical analysis runs smoothly.
There are also printed reports with figures and tables to be discussed with the PI.
It's also important to keep several notebooks: one for daily recording, one for technical details, etc.
Thursday, June 16, 2016
6.16 Notes of Statistics
- It is not appropriate to use the standard Bonferroni correction for the non-a-priori regions because, as always, our neuroimaging dependent measures will be at least moderately intercorrelated. The standard Bonferroni assumes orthogonality (independence), which is not the case for our FA measures. It would be much better to use the modified Bonferroni method that I've used forever (see Sankoh's paper), or FDR.
- Given that there are multiple ways to modify the Bonferroni, which seems like a black box to me, just use FDR for now (see the sketch after this list).
- I used Spearman's rho to investigate associations between smoking and drinking measures and FA in all groups. All of these associations must be adjusted for age because of the association between age and FA. It does not matter whether the groups differ in age; we want to know if the associations are significant after adjusting for the influence of age. Additionally, lifetime years of smoking is related to age, so age absolutely must be used as a covariate. You cannot use covariates with the standard Spearman method, so these analyses must be repeated with linear regression, using age as a covariate (also sketched below).
- In SPSS, there's something called part correlation (i.e., semi-partial correlation, which is different from partial correlation).
- Maybe scores of zero were assigned to non-smokers for lifetime years of smoking. If this did happen, it is a fatal design flaw. You can't assign a score of zero to someone who does not have the behavior, i.e., a history of smoking. A score of zero is meaningless and creates a "zero clumping" issue that will absolutely lead to spurious results for simple correlations or linear regression.
- E.g., non-smokers can't have "0" for lifetime years of smoking, which would treat them as smokers with no smoking history. This confuses the group design and messes up the analysis.
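A minimal sketch in R of two of these points; all data and variable names below are made up:
set.seed(3)
n         <- 40
Age       <- sample(25:65, n, replace = TRUE)
smk_years <- 0.8 * (Age - 25) + runif(n, 0, 10)   # smoking years track age
FA        <- 0.6 - 0.001 * Age + rnorm(n, 0, 0.02)

# Spearman's rho cannot be adjusted for covariates
cor.test(FA, smk_years, method = "spearman")

# Linear regression gives the smoking-FA association adjusted for age
summary(lm(FA ~ smk_years + Age))

# FDR (Benjamini-Hochberg) instead of standard Bonferroni for a set of
# intercorrelated tests
p <- c(0.001, 0.008, 0.020, 0.041, 0.300)  # hypothetical p-values
p.adjust(p, method = "BH")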
Monday, June 6, 2016
6.6 Reasons I unenrolled from ML Classification
I already finished the case-study course and the regression course and got an idea of what machine learning is about. When I have time, I can always come back and review the material I've learned, or pick up new knowledge without starting from nothing. The regression course is of particular importance, as I will learn most of the material again during my PhD.
I cannot have the quizzes graded unless I pay Coursera $79 to upgrade to a verified certificate. It is pointless to just watch the videos and do ungraded quizzes, and it is unfair to make learners who do not wish to pay for a verified certificate pay just to have quizzes graded. This is total nonsense.
I have more tasks on my plate: more images to process, more data to analyze, and a manuscript to work on. I am simply running out of time. The classification course won't finish until the end of July, when I will already be on a road trip. Either I study while traveling (which is weird), or I finish the course earlier to make time for the trip, which adds stress.
At this stage, completing the first two courses is sufficient for me to pick up Python. I am thinking about strengthening my understanding of data structures and algorithms so as to write better code; I should think about the best strategy to fit that in. But right now I am deeply disappointed with Coursera: the company tried to squeeze out money rather than benefit the learning community.
Monday, May 16, 2016
Tuesday, May 10, 2016
5.10 A note to myself
(May 2nd, 2016 posted on Facebook)
Thanks for everyone’s birthday wishes!
Today turned out to be a day of hard work. Although I still have a lot to work on, I am glad to see my first scientific manuscript getting better every single day.
After work, with a little spirit of celebration, I got myself a new pair of Stan Smiths, a S'mores Frappuccino, and a wonderful dinner in SOMA at the Kimchi Burrito place while watching the Spurs game.
I want to remind myself that in 2015, I wrote down several TODOs:
- Learn R programming
I am so glad that I've been picking up programming skills, which helped me accomplish my master's thesis and have now become my primary tool for daily work. Every day I learn a bit and immediately apply it to problem solving, which gives me a full sense of accomplishment.
- Keep traveling
I spent a wonderful year traveling around the US and China, as well as Okinawa for a PhD interview. Nothing is more enjoyable than reuniting with old friends and family, and meeting new friends. I will certainly keep the spirit going!
- Keep reading
I am not reading all that much; tons of fragmented daily readings don't count. Two of my current interests are “How to Be a Modern Scientist” by Jeff Leek and “The Art of Data Science” by Roger Peng; both authors are instructors of Johns Hopkins University's Data Science Specialization on Coursera. I will try reading things that are seemingly irrelevant but genuinely inspiring.
- Keep being grateful for life
Living in a different culture and country is certainly not easy. Though I did experience down moments and hard times, I always had support of all kinds that helped me get through. Life is certainly wonderful in both its explainable truths and its unexplainable beauties.
- Make more jokes about life
Have I? I will keep making more!
- Try not to get honked at by SF drivers
I've become one of them?!
At the age of 26, I want to:
- Get my first vehicle and get my kicks on Route 66
- Keep the positive attitude
- Learn machine learning with Python
- Not be discouraged by the ongoing challenges
- Try my best to be an excellent PhD student
#StillOnMyWay
Friday, May 6, 2016
5.6 Stats Questions in R
A reminder of these recent conversations about learning statistics through R.
Thanks Tim, your comments are very helpful.
1) The reason I excluded the intercept (i.e., -1) is only for the convenience of extracting the coefficients: without the intercept, lm() returns a separate estimate (the Estimate column) for each group. I did include the intercept (i.e., no -1) when running the actual analyses. Taking the intercept in or out only changes the table output; it does not affect any result of the analyses.
2) Dieter - Tim was correct. These two commands in fact do the same thing in R: both give the main effects in addition to the interaction term.
>lm(FA_308_27 ~ gp*smoker + Age - 1, data = dat3.2)
Estimate Std. Error t value Pr(>|t|)
gp1mALC 0.610268207 0.0425865499 14.330069 4.398040e-22
gp1wkALC 0.603016628 0.0469235498 12.851045 9.528076e-20
smokers 0.028498249 0.0194763982 1.463220 1.480830e-01
Age -0.001429693 0.0007651574 -1.868495 6.606585e-02
gp1wkALC:smokers -0.040359689 0.0368267023 -1.095935 2.770308e-01
>lm(FA_308_27 ~ gp + smoker + gp*smoker + Age - 1, data = dat3.2)
Estimate Std. Error t value Pr(>|t|)
gp1mALC 0.610268207 0.0425865499 14.330069 4.398040e-22
gp1wkALC 0.603016628 0.0469235498 12.851045 9.528076e-20
smokers 0.028498249 0.0194763982 1.463220 1.480830e-01
Age -0.001429693 0.0007651574 -1.868495 6.606585e-02
gp1wkALC:smokers -0.040359689 0.0368267023 -1.095935 2.770308e-01
3) I have read about pairwise comparisons in R (http://www.r-bloggers.com/r-tutorial-series-two-way-anova-with-pairwise-comparisons/) and two-way ANOVA with interactions and simple main effects (http://rtutorialseries.blogspot.com/2011/02/r-tutorial-series-two-way-anova-with.html). I will change my statistical procedure accordingly.
Sincerely,
Yukai
===
Hi Yukai,
1) I'm wondering why you do not include the intercept in the model. In your equation, you have -1. I strongly suggest you include the intercept (i.e., +1). Depending on the statistical procedure you run, it can make a huge difference. I don't remember if it matters in the standard lm model, but I always include it for form.
2) It is fine to do gp*smoking. In lm, if you use *, e.g., variable1*variable2, the main effects for each variable in the interaction term will automatically be included.
3) The reason you do not see means for the 4 groups (ns1mALC, s1mALC, ns1wkALC, and s1wkALC) is that the model you built only includes the main effects and interactions. To get the adjusted means and standard errors for each individual group, you need to do pairwise comparisons among those groups. Just google pairwise comparisons in R or pairwise t-tests in R, or try http://www.r-statistics.com/. There should be some sample code that you can adapt, and it is relatively straightforward to execute.
Tim
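A sketch of what 3) could look like in base R (made-up data that only mimics the variable names above):
set.seed(4)
n   <- 80
dat <- data.frame(
  gp     = factor(rep(c("1mALC", "1wkALC"), each = n / 2)),
  smoker = factor(rep(c("ns", "s"), times = n / 2)),
  Age    = sample(30:70, n, replace = TRUE)
)
dat$FA <- 0.6 - 0.001 * dat$Age + rnorm(n, 0, 0.02)

# With the intercept, lm reports an intercept plus contrasts; with - 1 the
# same fit is reported as one estimate per gp level instead
fit <- lm(FA ~ gp * smoker + Age, data = dat)

# Adjusted (model-based) means for the 4 gp-by-smoker cells at the mean age
newd <- expand.grid(gp = levels(dat$gp), smoker = levels(dat$smoker),
                    Age = mean(dat$Age))
cbind(newd, FA_adj = predict(fit, newdata = newd))

# Pairwise t-tests among the 4 cells (BH-adjusted, no covariate adjustment)
pairwise.t.test(dat$FA, interaction(dat$gp, dat$smoker),
                p.adjust.method = "BH")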
Wednesday, April 27, 2016
4.27 UW Machine Learning Regression Week 2 Assignment 1
4 quiz questions that I got wrong:
1) If you double the value of a given feature (i.e., a specific column of the feature matrix), what happens to the least-squares estimated coefficients for every other feature? (Assume no other feature depends on the doubled feature, i.e., no interaction terms.)
It is impossible to tell from the information provided (wrong)
They stay the same
When interpreting a parameter, we hold the other features constant: doubling one feature's column halves that feature's own coefficient, but the least-squares coefficients for every other feature stay the same (a quick numeric check follows below).
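A quick numeric check in R (made-up data):
set.seed(5)
d   <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- 1 + 2 * d$x1 + 3 * d$x2 + rnorm(20, 0, 0.1)
coef(lm(y ~ x1 + x2, data = d))

d$x1d <- 2 * d$x1                 # double one feature column
coef(lm(y ~ x1d + x2, data = d))  # x1d coefficient halves; x2 is unchanged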
2) Gradient descent/ascent is...
An approximation to simple linear regression (wrong)
A modeling technique in machine learning (wrong)
An algorithm for minimizing/maximizing a function
by definition...
3) Let's analyze how many computations are required to fit a multiple linear regression model using the closed-form solution based on a data set with 50 observations and 10 features. In the videos, we said that computing the inverse of the 10x10 matrix (H^T)H was on the order of D^3 operations. Let's focus on forming this matrix prior to inversion. How many multiplications are required to form the matrix (H^T)H?
1000 (wrong): did not read the question carefully...
N x N x D = 50 x 50 x 10 = 25000 (wrong): did not do the linear algebra correctly...
N x D x D = 50 x 10 x 10 = 5000 (review linear algebra; be patient)
4) More generally, if you have D features and N observations what is the total complexity of computing ((H^T)H)^(-1)?
O(D^3) (wrong) see first failure in 3)
O(N^2D + D^3) (wrong) see second failure in 3)
O(ND^2 + D^3) that's how I got 3) correct
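A quick sanity check in R of the counts in 3) and 4):
N <- 50; D <- 10
N * D * D  # multiplications to form t(H) %*% H: 5000
D^3        # order of operations to invert the D x D matrix: 1000
# Total complexity: O(N * D^2 + D^3)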