God, I made these too complicated. I should have talked with someone else to make it simpler; it obviously can be. Accept the fact that I am not perfect and that I need others' help and advice.
December 13, 2015 (Sunday)
December 4, 2015 (Friday)
December 2, 2015 (Wednesday)
12.2 Your Brain on Drugs: Imaging of Drug-related Changes in the Central Nervous System
This is a review article. It cites a 2003 article reporting that cocaine-dependent alcoholics show more white-matter shrinkage than alcoholics without any comorbid substance use. Why might the findings be contradictory?
Chemical structure of cocaethylene:
This is a hard argument, but volumetric loss may not correlate with a decrease in FA.
December 1, 2015 (Tuesday)
12.1 Barplot in R base plotting
Still having trouble with ggplot2.
These two websites are helpful for me to get started with barplot:
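To go with those links, here is a minimal base-R barplot sketch; the counts below are made-up data, just to show the call:

```r
# Made-up counts (hypothetical data) to demonstrate base-R barplot()
counts <- c(sunny = 12, rainy = 7, cloudy = 15)
mids <- barplot(counts,
                main = "Days by weather type",
                ylab = "Number of days",
                col  = "steelblue")
# barplot() returns the bar midpoints, handy for placing text labels
text(mids, counts, labels = counts, pos = 1)
```

Note that barplot() returns the x positions of the bars, which is the easiest way to annotate them afterwards.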
November 29, 2015 (Sunday)
11.29 Coursera Projects Feedback
Anyway, it seems that I stole these points successfully...
Reproducible Research
Nice work! Your report does a good job hitting all the basic requirements. I gave you full points. There are some questions you might want to think about to further strengthen the report. 1. Why do you add injuries and fatalities together? It seems to me they are different levels of impacts and cannot be directly compared. 2. Why do you only consider property damage and not crop damage for economic impact? 3. Did you see that the PROPDMGEXP column should have been used to change the magnitude of the property damage value? 4. Would a bar plot be a better visualization than a pie chart to compare the difference of the event types?
Data Product
peer 1 → Very good application for the assignment. The documentation should be combined with the app and attached as a separate link, but it is ok. The presentation does not work...
peer 2 → Your app is very simple. Dead simple. Seems like you skipped the machine learning class, as you are using a complicated approach to prediction by using individual lm coefficients instead of properly building the model and predicting values with appropriate functions. This is just to explain, why I did not give you the extra +1 for the course project. Apart from that, great effort and clean code. Good job!
peer 3 → It worked well! Nice sales job on presentation. Very energetic.
peer 4 → Hi, good work. The only missing thing is an html output. Anyway, great work!!
November 23, 2015 (Monday)
11.23 Reproducible Research Project 2 Sample
If I am going to enroll in Reproducible Research again, this is the sample project 2 I should refer to:
http://rpubs.com/cbreen66/rr2
November 19, 2015 (Thursday)
11.19 A Review of What Overfitting Is
(From lecture notes of Regression Models) The variance estimate is biased if we underfit the model. The variance estimate is unbiased if we correctly fit or overfit the model, i.e. include all necessary covariates and possibly unnecessary ones. When unnecessary variables are included (overfitting), the variance of the variance estimate is larger.
(From lecture notes of Practical Machine Learning) Overfitting leads to an out-of-sample error rate that is higher than the in-sample error rate.
Data have two parts: signal and noise. The goal of a predictor is to find the signal. A perfect in-sample predictor can always be designed, but it won't perform as well on new samples.
An overfit model does worse at extrapolating to unseen data. It can fit the training sample perfectly, but it generalizes poorly and is not as robust once it goes out into the real world.
Adding an extra variable is equivalent to adding a constraint in a new dimension.
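A small simulated sketch of the point above (toy data; the degree-8 polynomial stands in for an overfit model):

```r
# Simulated illustration: an overfit model wins in-sample but loses out-of-sample
set.seed(1)
x  <- runif(30); y  <- 2 * x + rnorm(30, sd = 0.3)   # signal + noise
xn <- runif(30); yn <- 2 * xn + rnorm(30, sd = 0.3)  # new sample, same process

fit_ok   <- lm(y ~ x)             # roughly correct model
fit_over <- lm(y ~ poly(x, 8))    # chases the noise

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse(y, fitted(fit_ok)); rmse(y, fitted(fit_over))   # in-sample: overfit looks better
rmse(yn, predict(fit_ok,   data.frame(x = xn)))      # out-of-sample: the simpler
rmse(yn, predict(fit_over, data.frame(x = xn)))      # model usually holds up better
```

The in-sample error of the bigger model is guaranteed to be at least as small; the out-of-sample comparison is where the overfitting shows.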
11.19 Reproducible Research Checklist
A comprehensive checklist from the "Reproducible Research" class in the Data Science Specialization.
Do:
- Start with good science
- Teach a computer
- Use some version control (e.g. github commit)
- Keep track of software environment
- Set seed for random number generators
- Think about the entire pipeline
Don't:
- Do everything by hand; if you must, document it
- Use point-and-click software
- Save output (improve efficiency)
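For the set-the-seed item, a one-liner shows why it matters (the seed value 123 is arbitrary):

```r
# Setting the seed makes "random" output reproducible across runs
set.seed(123); first  <- runif(3)
set.seed(123); second <- runif(3)
identical(first, second)   # TRUE: same seed, same numbers
```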
October 27, 2015 (Tuesday)
10.27 Week 4 Getting and Cleaning Data Note
Here's some useful functions learned from the video lectures:
tolower(): change character to lower case
sub(): substitute out certain characters
grep(), grepl() (return logical value)
stringr package: nchar(), substr(), str_trim() (remove extra space)
Principles: names of variables should avoid dots; variables with character values should be made into factors and be descriptive.
Metacharacter:
^ beginning of a line
$ end of a line
[] any one of the enclosed characters, e.g. [Bb] for upper/lower case
[-] a range of letters or numbers, e.g. [0-9] or [a-z]
^ in [] will indicate NOT in the indicated class.
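A quick demo of these functions on a toy vector (the strings are made up; the stringr line is commented out in case the package is not installed):

```r
# Toy examples of the string tools from the lecture
x <- c("Blood.Pressure", "blood.type", "  heart rate  ")
tolower(x)                    # lower-case everything
sub("\\.", "_", x)            # substitute the first "." in each string
grepl("^[Bb]lood", x)         # TRUE TRUE FALSE: [Bb] matches either case
nchar(x)                      # string lengths
trimws(x[3])                  # "heart rate": extra spaces removed (base R)
# library(stringr); str_trim(x[3])   # same idea with the stringr package
```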
October 23, 2015 (Friday)
10.23 xlcFreeMemory from XLConnect package
Wow, it has been a while since the last post, but I have been super busy these days. Lots of data analysis.
Just found a package that will be useful in the future for handling big Excel datasets in R: XLConnect.
original post: http://stackoverflow.com/questions/12476044/r-how-to-clear-memory-used-by-rjava
You need both the XLConnectJars and XLConnect packages to run the function xlcFreeMemory(), which clears the memory used by rJava.
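A hedged sketch of how this fits into a workflow (assumes the XLConnect and XLConnectJars packages and a Java runtime are installed; the file name is hypothetical):

```r
# Sketch only: read a large .xlsx sheet, then free the JVM memory rJava holds
library(XLConnect)

read_big_sheet <- function(path, sheet = 1) {
  wb  <- loadWorkbook(path)
  dat <- readWorksheet(wb, sheet = sheet)
  rm(wb)
  xlcFreeMemory()   # clears memory used by rJava
  dat
}

# dat <- read_big_sheet("big_dataset.xlsx")   # hypothetical file
```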
September 29, 2015 (Tuesday)
September 23, 2015 (Wednesday)
9.23 Stuck with left/right handed orientation
http://www.grahamwideman.com/gw/brain/orientation/orientterms.htm
Today I am stuck on this issue.
http://atonal.ucdavis.edu/lab_howto/fmri/image_orient_display.htm
has some nice diagrams showing neurological and radiological orientation.
RadiologicaL (Right is Left)
NeurologicaL (Right is Not Left)
Be careful when I need to switch labels (quoted from the FSL forum):
Hi,
Why do you want to do this?
The use of labels is to enable people to rearrange their
data into different views, as seen by FSLView. But the
RL and LR change does not change anything for FSLView,
which always shows things with a consistent "handedness"
which is based on the same one as used by the template
images, shown as "radiological". You can change the
left-right order if you really want, but there is no real need
to do that with the current version of FSL. If you do it, it will
change your storage between the "radiological" and "neurological"
conventions. We explicitly stop this happening when using
the labels with fslswapdim in order to avoid this happening
as an unintentional side-effect.
So I simply recommend that you don't do this.
If you really want to _only_ swap LR, then you need to use the
"-x y z" syntax _and_ use fslorient -swaporient, to make sure
that the relationship between the data and the labels stays
consistent. If you do not run both commands then the left
and right labels will get mixed up with respect to what is
contained in the brain image which is bad.
Hope this makes it clearer.
All the best,
Mark
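Mark's two-command recipe, written out as a hedged sketch (file names are placeholders; only do this if a left-right swap is really needed):

```shell
# Swap left-right AND fix the stored orientation label together;
# running only one of the two leaves the labels mismatched with the data
fslswapdim input.nii.gz -x y z swapped.nii.gz
fslorient -swaporient swapped.nii.gz
```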
September 21, 2015 (Monday)
9.21 Resource - Generalized Linear Mixed Effect Model (from NCSU)
I have so much to learn about generalized linear mixed effect models.
http://www4.ncsu.edu/~fisik/Fikret%20Isik%20-%20Lecture%20notes%20for%20statistics%20session%20-%20IUFRO%20Genetics%20of%20Host-Parasite%20Interactions%20in%20Forestry%20-%202011.pdf
(9.30) Found another useful post about definitions of fixed effects and random effects:
http://andrewgelman.com/2005/01/25/why_i_dont_use/
September 16, 2015 (Wednesday)
9.16 New questions
How to perform statistics on longitudinal data, for paired or unpaired test?
How to analyze the data, in the whole ROI or in clusters?
September 2, 2015 (Wednesday)
9.2 Why a paired test is better than an unpaired one
Because you always group things by observation. You assume the difference is due to the factor you observed, but there may be confounders you simply have not observed. An observation may capture only one factor while ignoring others, and factors are related: one factor may change other factors and thereby affect the subject. To control for the other factors, the best you can do is use the same subject, control its environment, and vary one factor (the one you perceive as the factor of interest) at a time.
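A simulated illustration of the point (toy numbers; the within-subject effect is small relative to the between-subject spread):

```r
# Paired vs. unpaired t-test on the same simulated subjects
set.seed(42)
baseline  <- rnorm(20, mean = 100, sd = 15)          # subjects differ a lot
treatment <- baseline + rnorm(20, mean = 3, sd = 2)  # small, consistent effect

t.test(treatment, baseline)$p.value                  # unpaired: effect buried in spread
t.test(treatment, baseline, paired = TRUE)$p.value   # paired: effect clearly visible
```

Pairing subtracts each subject's own baseline, so the between-subject variation drops out of the comparison.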
August 30, 2015 (Sunday)
8.28 Coursera Regression Models Project Comments
Apparently, I don't think I can get full credit on this project, because the uncertainty was not fully explained by giving confidence intervals. However, these comments do help me improve and remind me to keep learning regression models.
peer 1 →
I believe the student did a good job in selecting the key predictors.
Also the student provided a good explanation of how the variables
selected and how the final model was derived. I believe the student
left out some key steps but I can he did not want to have more than six
pages.
The last sentence makes me laugh. But I need to learn how to create a pdf within the page limit; my way of compiling the pdf is not the best.
peer 2 →
Strengths: Nice structure and analysis. I came up with the same
conclusions via slightly different methods.
Weaknesses: It doesn't appear that you used knitr to generate this PDF.
The code appears to be more of a word document format rather than output
of a ".Rmd" processed by knitr into a PDF as required. Additionally,
you omitted a bunch of steps on how you arrived at your optimal model.
I should include the key steps, and I probably need to take "Reproducible Research" to learn how to make a pdf using knitr.
peer 3 →
I think the final model suffered a little by focusing on r^2 and the
p-values. I think that testing the assumptions behind linear regression
should also be considered because the final model has a few violations
with unequal variance of the error term and a non-normal distribution. A
really good thing I saw is that the final model doesn't appear to have
any issues with multi-collinearity, which was a big issue not addressed
in the other 3 projects I saw.
This is the most specific comment of the three. It points out some weaknesses of my strategy, and I believe this person knows more math than I do.
8.28 Coursera Exploratory Analysis Project Comments
Motor vehicles include more than just motorcycles. As a data scientist, one should always try to answer the question correctly.
peer 1 → I see no issues. Full marks given.
peer 2 →
Nice use of colouring and graphics in general. However, the student has
assumed that "motor vehicles" are the same as "motor cycles", and this
is not the case, so I subtracted 1 point because of this.
peer 3 →
Overall, work is good, with the exception that the two plots on
motor-vehicle sources use only motorcycle data. Though discussion on the
forums made it clear that there could be several interpretations of
motor vehicles, I think it is quite clear that more than just
motorcycles should be included, especially since the motorcycle-only
emissions represent a tiny fraction of the total motor vehicle emissions
(if defined as on-road emissions). The code does appear to produce the
plots, so I gave credit for that, but deducted points on the plots since
they don't tell enough of the story to meaningfully address the motor
vehicle question in my opinion.
peer 4 → [This area was left blank by the evaluator.]
August 29, 2015 (Saturday)
8.28 Why everything crams up again
I need to archive everything. My student laptop, my email account, there are so many files!!
Life is treating me freaking well.
8.28 Coursera Regression Models Quiz 2 Question 9
Question 9 states the following:
Refer back to the mtcars data set with mpg as an outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, ∑_{i=1}^{n} (Y_i − Ŷ_i)², when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?
I am confused about the term numerator and denominator.
Michele Coleman's comment:
Yukai: the use of "numerator" and "denominator" are very specific to this question and not applicable to regression models in general.
In this question the word denominator refers to the phrase "a model with just an intercept" and the word numerator refers to the phrase "the model with the intercept and slope".
The question is asking you to compare two different models by calculating the ratio of the sum of squared errors. You are asked to find A / B, where A = the sum of squared errors for model 1. B = the sum of squared errors for model 2, where model 1 has a slope and an intercept and model 2 has just an intercept.
A/B
= (the sum of squared errors for model 1) / (the sum of squared errors for model 2)
= (the sum of squared errors for the model with an intercept and slope) / (the sum of squared errors for the model with just an intercept)
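The ratio can be computed directly in R; it is also exactly 1 − R² of the slope model, since the intercept-only SSE is the total sum of squares:

```r
# Numerator: model with intercept and slope; denominator: intercept only
fit   <- lm(mpg ~ wt, data = mtcars)
fit0  <- lm(mpg ~ 1,  data = mtcars)
ratio <- sum(resid(fit)^2) / sum(resid(fit0)^2)
ratio                                    # same as 1 - R^2 of the slope model
all.equal(ratio, 1 - summary(fit)$r.squared)
```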
August 20, 2015 (Thursday)
8.20 Some posts on FSL forum (recommended from Donna)
August 20:
AW: [FSL] AW: [FSL] FDR correction in TBSS
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1106&L=fsl&P=R40394&1=fsl&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4
Re: TFCE-randomise and number of permutations
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0803&L=fsl&P=R41695&1=fsl&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4
August 18:
Re: fslstats vs fslmeants
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1411&L=fsl&P=R16798&1=fsl&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4
Re: Extracting the actual FA values from TBSS results
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1408&L=fsl&P=R80535&1=fsl&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4
*** Re: how to get FA value from tbss fslmeants (and labels)
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1508&L=fsl&P=R78969&1=fsl&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4
Re: how to calculate FA values (** inside of a mask)
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1301&L=fsl&P=R145712&1=fsl&9=A&I=-3&J=on&K=4&d=No+Match%3BMatch%3BMatches&z=4
** How to visualize JHU labels in FSLVIEW and how to 'merge' with TBSS skeleton
AW: [FSL] Calculating the skeleton white matter mean FA
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1411&L=fsl&P=R60755&1=fsl&9=A&I=-3&J=on&K=2&d=No+Match%3BMatch%3BMatches&z=4
Re: FA values for each voxel
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=fsl;e892de49.1508
August 17:
Re: how to extract FA and MD values after running TBSS
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1308&L=fsl&P=R8552&1=fsl&9=A&J=on&d=No+Match%3BMatch%3BMatches&z=4
Re: about TBSS results
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=fsl&P=R34948&1=fsl&9=A&J=on&d=No+Match%3BMatch%3BMatches&z=4
Re: Display TBSS results
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1010&L=fsl&P=R23801&1=fsl&9=A&J=on&d=No+Match%3BMatch%3BMatches&z=4
Calling an atlas in FSL: http://brainder.org/2012/07/30/automatic-atlas-queries-in-fsl/
August 19, 2015 (Wednesday)
8.19 Reshape Matrix in R
http://www.sixhat.net/how-to-reshape-data-in-r.html
Basically, use the matrix() call specifying the dimensions you want, and the output is simply reshaped!
> a <- c(1, 2, 3, 4, 5, 6)
> matrix(a, 2, 3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> matrix(a)
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
[6,]    6
August 17, 2015 (Monday)
8.17 GGPlot Cheat Sheet
Good resource to check now and then; credit to Coursera TA Al Warren:
http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/
August 16, 2015 (Sunday)
8.16 Coursera - Multiple R-squared and Adjusted R-squared
This is part of a discussion in the Regression Models forum. I will paste the response here in case the forum is closed after the class finishes.
Michele's response:
Multiple R-squared (sometimes just called R-squared) *always* will go up when you add more variables! The model with the added variables will better match the training data. The multiple R-squared is a measure of how good the fit is on the training data. But probably your training data has some spurious signals and coincidences that are not true features of the underlying population of interest.
Since multiple R-squared always gets higher when you include a new variable you cannot use it to decide if including that new variable is a good idea. If the new variable actually isn't related to your outcome you are just adding noise and "overfitting" your model.
The adjusted R-squared attempts to compensate for this by adding a "penalty" for including the extra variable. If the adjusted R-squared goes down when you include a new variable that suggests that it is not valuable to keep that variable in your model.
I am not a statistics expert so I will not try to explain how the adjusted R-square accomplishes this magic. My explanation would probably be not wholly correct.
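Michele's point is easy to check with simulated data (the pure-noise predictor below is my own toy example, not from the course):

```r
# Multiple R-squared always rises when a variable is added;
# adjusted R-squared penalizes variables that add nothing
set.seed(7)
d  <- data.frame(y = rnorm(50), x = rnorm(50), noise = rnorm(50))
s1 <- summary(lm(y ~ x,         data = d))
s2 <- summary(lm(y ~ x + noise, data = d))
s2$r.squared     >= s1$r.squared       # always TRUE
s2$adj.r.squared <  s1$adj.r.squared   # typically TRUE for a pure-noise variable
```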
August 13, 2015 (Thursday)
August 6, 2015 (Thursday)
8.6 Programming is such an essential working skill
Since learning MATLAB and later R, my ability to work efficiently has greatly improved. But I believe what I have used so far is just a tiny portion of it. The true power is, well, I don't know yet.
The core of programming is to identify patterns that are repeatable. Though each case differs from the others a little, by forming assumptions, certain things can be grouped together. So start with a simple model first, then add extra pieces to make it more flexible and specific.
By learning to write R code and make it executable over ssh on the command line, I should be able to modify my previous code to design the matrix and generate a summary from it.
I should also code up the part where I type in a bunch of cases to put them in the mytbss folder. What I need, in essence, is just a list of scan codes; I shouldn't have to go back and forth editing the directory every time. Using the list of scan codes, with the task specified (e.g. -m cp ) and the expected rename pattern (-n s/ns/whatever), I can get a txt file with all the cases to be copied. Then all I need to do is copy and paste, and there we go!
So my focus should not be spending too much time on that part. With the help of code, I will be able to focus on more important things.
# Can't wait. I'll just draft it here.
tbsscp <- function(scancode, group) {
  # prepend the group prefix and append ".nii" to each scan code
  cbind(paste(group, scancode, ".nii", sep = ""))
}
scancode <- c("CCS001-1a11", "CCS002-1a22", "CCS003-1a33")
group <- "ns"
# Refresh: http://stackoverflow.com/questions/2470248/write-lines-of-text-to-a-file-in-r
tbsscplist <- file("tbsscplisttest.txt")
writeLines(tbsscp(scancode, group), tbsscplist)  # one filename per line
close(tbsscplist)
8.6 I should probably start thinking about investing in fitness
Standard Membership
• Access to Mission Bay or Parnassus fitness center
• Workout towel service
• Discounted parking: early morning, evening, and weekends
• Exclusive member activities
• Members-only pricing on massage, personal training, swim lessons, and other programs
• 10% discount at Sports Basement
Enrollment Fee: $50
Mission Bay: $52/month
Parnassus: $49/month
- See more at: http://www.campuslifeservices.ucsf.edu/fitnessrecreation/information/program_index/membership_ucsf_staff#sthash.NChIqh6O.dpuf
Oops. They thought I wanted some more information about the standard membership. I should have replied using my email address, haha.
8.6 Coursera Biostat Quiz 2 Question 6
Let X be a uniform random variable with support of length 1, but where we don't know where it starts, so that the density is
f(x) = 1
for x ∈ (θ, θ + 1) and 0 otherwise. We observe a random variable from this distribution, say x1. What does the likelihood look like?
Discussion Forum posts:
Eric:
You don't need . I think of the likelihood function () as a probability density function for the parameter given the data .
Often, you can use Bayes' theorem to go from to . (Still hard to see how Bayes' theorem relates to this content.) For this problem though, I think it just takes wrapping your head around the likelihood function concept. Only one of the multiple choice answers makes sense.
I'll post the answer explanation on Monday, after the deadline.
Michelle:
You don't need $$F(x)$$. You can think of the ordinary density function $$f(x)$$ as actually a function of $$f(x,\theta)$$ (This is perfect, to look at the function with an extra parameter). Usually we treat $$\theta$$ as known and then ask what the density is at different values of $$x$$. A normal density plot has $$x$$ on the horizontal axis and $$f(x,\theta)$$ on the vertical axis, for a single, constant value of $$\theta$$ .
For this problem instead think of the plot with $$\theta$$ on the horizontal axis and $$f(x,\theta)$$ on the vertical axis, for a single constant value of $$x$$.
My approach was that I just assigned a value to x, such as $$x=2.5$$. Then I made a table with different values of $$\theta$$ in one column and then $$f(x,\theta)$$ in another column. I calculated half a dozen values, then made my plot of $$f(x,\theta)$$ vs. $$\theta$$.
- A horizontal line between x1 and x1 - 1
- A parabola
- A diagonal line from x1 to x1 + 1
- A point at x1
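As a sanity check on the choices, here is a small grid evaluation of the likelihood (my own sketch, not from the course; the observation x1 = 2.5 is just an assumed value):

```r
# Likelihood of theta given one observation x1 from Uniform(theta, theta + 1):
# f(x1, theta) = 1 when theta < x1 < theta + 1, and 0 otherwise
x1 <- 2.5
theta <- seq(0, 4, by = 0.01)
lik <- ifelse(theta < x1 & x1 < theta + 1, 1, 0)
# The likelihood is flat at 1 exactly on the open interval (x1 - 1, x1):
range(theta[lik == 1])
```

So the likelihood is a horizontal line between x1 - 1 and x1, matching the first answer choice.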
2015年8月4日 星期二
8.4 Interweave in dataframe in R
# Interweave in dataframe
# Useful Link: http://stackoverflow.com/questions/14784861/interweave-two-data-frames-in-r
a <- data.frame(x = 1:5, y = 5:1)
b <- data.frame(x = 2:6, y = 4:0)
# Interweave rows: tag each row with its position, stack, then sort by position
# (order() is stable, so a's row comes before b's row at each position)
df <- rbind(data.frame(a, index = 1:nrow(a)), data.frame(b, index = 1:nrow(b)))
df <- df[order(df$index), c("x", "y")]
# Interweave columns: append a helper row of column positions, then sort columns by it
dfc <- cbind(rbind(a, 1:ncol(a)), rbind(b, 1:ncol(b)))
dfc <- dfc[order(unlist(dfc[nrow(dfc), ]))]
dfc <- dfc[-nrow(dfc), ]  # drop the helper row again
# Try: interweave columns using the column names instead
dfc2 <- cbind(a, b)
dfc2[order(names(dfc2))]  # This is faster and a single-line call!
8.4 Compare two dataframes + Only focus on the differences
# Find elements that are not identical in two dataframes
# Useful link: http://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in
library(sqldf)    # pulls in proto/gsubfn automatically as dependencies
library(compare)
x <- c(1,2,3,4,5)
y1 <- c(1,2,3,4,5)
y2 <- c(1,2,3,4,6)
d1 <- data.frame(x, y1)
d2 <- data.frame(x, y2)
compare(d1,d2)
# SQLDF solution
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])
require(sqldf)
a1NotIna2 <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')
a1Ina2 <- sqldf('SELECT * FROM a1 INTERSECT SELECT * FROM a2')
# DPLYR solution
library(dplyr)
anti_join(a1,a2)
semi_join(a1,a2)
full_join(a1,a2)
(8.6 update) To focus only on the columns that actually contain different elements, instead of scanning a bunch of identical columns (such as age and education, which you know will almost always be the same and are unlikely to differ), the trick is to add an extra row below the original dataframe holding a logical value that indicates whether the two columns are identical. R's identical() function can do that. Then it is just a matter of dataframe subsetting.
Pretty cool, huh?
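A minimal sketch of that trick on assumed toy data (my own example): compute one logical per column pair with identical(), then subset:

```r
# Two data frames that agree everywhere except the "score" column
d1 <- data.frame(age = c(30, 40), edu = c(12, 16), score = c(1, 2))
d2 <- data.frame(age = c(30, 40), edu = c(12, 16), score = c(1, 3))
same <- mapply(identical, d1, d2)  # is each column pair identical?
d1[!same]  # only the differing column(s) remain
d2[!same]
```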
2015年8月3日 星期一
8.3 Writing R that can be run in UNIX command lines
#! /usr/bin/env Rscript
# Useful website: http://stackoverflow.com/questions/2151212/how-can-i-read-command-line-parameters-from-an-r-script
options(echo=FALSE) # if you don't want to see commands in output file
args <- commandArgs(trailingOnly = TRUE)
# trailingOnly=TRUE means that only your arguments are returned, check:
# print(commandArgs(trailingOnly=FALSE))
cat("Hello", args, "\n")
Return in UNIX:
>YukaideMacBook-Air:Desktop yukaizou$ ./script.R Kai
>Hello Kai
Alternatively,
>YukaideMacBook-Air:Desktop yukaizou$ Rscript script.R Kai
>Hello Kai
Another useful website here: http://www.cureffi.org/2014/01/15/running-r-batch-mode-linux/
It also shows how to use the optparse package to make options. That guy is a genius. I might follow the blog for some other materials.
# 8.6 update: learning how to add options
library(getopt)
library(optparse)
options(echo=FALSE) # if you don't want to see commands in output file
args <- commandArgs(trailingOnly = TRUE)
# trailingOnly=TRUE means that only your arguments are returned, check:
# print(commandArgs(trailingOnly=FALSE))
option_list <- list(make_option(c("-a", "--avar"), action = "store", default = NA, type = "character",
help = "just a variable name"),
make_option(c("-v", "--verbose"), action = "store_true", default = TRUE,
help = "Should the program print the extra stuff out? [default %default]"),
make_option(c("-q", "--quiet"), action = "store_false", dest = "verbose",
help = "Make the program not verbose." )
)
opt <- parse_args(OptionParser(option_list = option_list))
if (opt$verbose) {  # parse_args() stores each option under its long name
  cat("Haha, you might think this is verbose. Quiet me by adding a little -q!", "\n")
}
if (!is.na(opt$avar) ) {
cat("You have just typed in:", "\n")
cat(opt$avar, "\n")
} else {
cat("Please specify variable a", "\n")
}
8.3 Such a low efficiency
It's August already.
Such low efficiency now. It's 3 p.m. What am I doing at the VA? Waiting for a bunch of cases to be archived... No motivation to do other things.
Done with Quiz 1, Quiz 2, and Project 1 of Exploratory Data Analysis; still working on Project 2, which is ~40% complete. I'll have more time to work on Regression Models and Biostatistics.
2015年7月28日 星期二
7.28 Learning handling dataframe by sqldf in R
http://anythingbutrbitrary.blogspot.com/2012/08/manipulating-data-frames-using-sqldf.html
One more tip:
> s01 <- sqldf("select * from data where [group.code] = 'a'")
# The bracket avoids misunderstanding of dot in sqldf. The dot has another meaning
# in SQL (e.g., separating table name from column name) and is replaced by an
# underscore before sending the data to SQLite.
# URL: http://stackoverflow.com/questions/19019883/how-to-handle-column-names-not-supported-by-sqldf-in-r
- A useful blog for learning to run SQL queries on data frames with sqldf in R.
2015年7月27日 星期一
7.27 R coding - Don't forget the cached input <<-
When using "<<-" instead of "<-", the assignment is made in the enclosing environment (ultimately the global one) rather than creating a local variable, so the value persists outside the function and across calls. This is useful especially when one wants to execute a bunch of functions within a single call and have them share state.
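A minimal sketch of the idea, the classic counter closure (my own toy example):

```r
# make_counter() returns a function with persistent state: "<<-" updates
# `count` in make_counter's environment instead of making a local copy
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
counter()  # 1
counter()  # 2
```

With a plain `<-` inside the inner function, `count` would be a new local variable on every call and the counter would never advance.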
2015年7月24日 星期五
7.24 Demean Categorical Variable, Seriously?
What if the categorical variable contains multiple levels whose codes are arbitrary and completely unrelated to each other?
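To see what "demeaning" even means for a categorical variable, here is a sketch on assumed toy data: R expands the factor into dummy indicator columns, and it is those indicators, not the arbitrary codes, that get mean-centered.

```r
grp <- factor(c("a", "b", "c", "a"))
X <- model.matrix(~ grp)[, -1]                 # indicator columns for levels b and c
Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-center each indicator
colMeans(Xc)                                   # numerically zero for every column
```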
2015年7月23日 星期四
7.23 But we need to figure out how to use GLM do to that first.
As the title indicates. This is taking forever; GLM is so slow.
And there are two websites, which is ridiculous. A few complaints here, just to release the stress.
- http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FEAT/UserGuide#Unpaired_Two-Group_Difference_.28Two-Sample_Unpaired_T-Test.29
- http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/GLM#Two-Group_Difference_Adjusted_for_Covariate
To adjust for multiple covariates, simply add more EVs to the model, one for each additional covariate, and mean-center each covariate.
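The same recipe can be sketched in plain R with lm() as a stand-in for FSL's GLM (simulated data, purely illustrative):

```r
set.seed(1)
df <- data.frame(y     = rnorm(20),
                 group = rep(0:1, each = 10),    # the two-group EV
                 age   = rnorm(20, mean = 50, sd = 8))
df$age_c <- df$age - mean(df$age)                # mean-center the covariate
fit <- lm(y ~ group + age_c, data = df)
coef(fit)["group"]                               # group difference adjusted for age
```

Each additional covariate gets its own centered column and its own term in the formula.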
7.23 Using gedit to create specific design.con and design.mat
This link from UTexas is very helpful:
http://wikis.la.utexas.edu/imagelab/book/statistical-analysis-fa-values
- The design.mat file looks like this:
/NumWaves 2
/NumPoints 4
/PPheights 1 1
/Matrix
0 1
1 0
1 0
0 1
- The design.con file will look like this:
/NumContrasts 2
/PPheights 1 1
/Matrix
1 -1
-1 1
Just open gedit in the ssh shell and modify these parameters later when a covariate needs to be added to the statistical analysis.
Another way is to use R, which requires extra steps (import the data, recombine the rows while minding the data types, then export it, which I still haven't figured out). Anyway, gedit is super fast.
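For the record, the R route is simpler than it sounds: build the header and matrix as character lines and export with writeLines() (the 4-point, 2-group design above is assumed).

```r
# Write an FSL-style design.mat from R
mat <- rbind(c(0, 1), c(1, 0), c(1, 0), c(0, 1))
lines <- c("/NumWaves 2",
           paste("/NumPoints", nrow(mat)),
           "/PPheights 1 1",
           "/Matrix",
           apply(mat, 1, paste, collapse = " "))
writeLines(lines, "design.mat")
```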
2015年7月21日 星期二
7.21 Add pagebreak in Rmd for LaTeX conversion
Very technical detail but useful to learn. Thanks for the midterm from CMU.
http://www.stat.cmu.edu/~nmv/setup/stat202/tests/midterm-student-version.Rmd
Simply adding a line "\pagebreak" introduces a page break when the file is later converted to a LaTeX pdf. Remember to leave a blank line before the content that follows.
And I just figured out that I can make system calls in R with system(). This is so cool, lazy boy.
7.21 VA Research - Add Covariate in TBSS
In the near future it is very likely that I will be referencing this website when performing the statistics for my master's thesis, so I am marking it down here.
http://white.stanford.edu/newlm/index.php/MrVista_TBSS
2015年7月20日 星期一
7.20 VA Self Study - Coursera and MRI
Today I watched the first video lectures of the third week of the Statistical Inference class, which cover statistical power. The concept is best visualized with two side-by-side Gaussian distributions.
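A quick sketch of the power calculation behind those two Gaussians, for a one-sided Z test (mu0, mua, sigma, and n are assumed toy numbers, not course answers):

```r
# Reject H0 when the sample mean exceeds the alpha cutoff under H0;
# power is the area of the H1 Gaussian beyond that cutoff
mu0 <- 30; mua <- 32; sigma <- 4; n <- 16; alpha <- 0.05
se <- sigma / sqrt(n)
crit <- qnorm(1 - alpha, mean = mu0, sd = se)              # rejection cutoff under H0
power <- pnorm(crit, mean = mua, sd = se, lower.tail = FALSE)
power  # about 0.64
```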
(Nishimura p.51) T2 relaxation: The pertinent frequency components of the fluctuating z fields are those near zero. These components contribute only to the T2 relaxation process and not to T1. Therefore T2 is largely independent of field strength.
In solids, the presence of relatively slow fluctuations creates a large resonant frequency broadening effect and thus extremely rapid T2 decay. In biological tissue, spins bound to lumbering macromolecules also experience rapid T2 decay while relatively mobile spins (e.g. those in water) exhibit much slower decay. However, because of rapid exchange of protons between these two types of environments, the effective T2 is some average of the two relaxation rates.
2015年7月19日 星期日
7.19 Coursera Statistical Inference - Frustration from Compiling
Compiling. Compiling. Compiling again! Every little imperfection in converting one file to another with the automated code just makes me annoyed and frustrated. But I learned something.
- Reducing figure sizes and fitting figures between bullets instead of letting them pop up at random spots. This proved to be very hard for several reasons:
- Rmd files are built primarily for HTML conversion and are best at adjusting figure sizes for HTML specifically. Things get messy when it comes to LaTeX pdf.
- In an Rmd file, adjusting the size of .png figures works best with the fig.width and fig.height parameters, but LaTeX prefers pdf figures.
- Letting the Rmd file do the plotting through embedded code is probably worse than inserting figures generated elsewhere.
7.19 Coursera Statistical Inference Project 1
Every day I learn something new. The Mission Hall study session on Sunday was a funny adventure.
Getting familiar with Rmd and pandoc conversion to LaTeX files is like picking up a new language.
Adding a -V flag to pandoc specifies the margin size of the output pdf, which greatly reduced the report's page count. Next is thinking about how to reduce the freaking figure sizes. They look so horrible.
Oops, and thanks to these guys: http://stackoverflow.com/questions/13515893/set-margin-size-when-converting-from-markdown-to-pdf-with-pandoc
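The two tips combine into a single system() call from R. A sketch; the file names are hypothetical and pandoc must be on the PATH, so the actual call is left commented out:

```r
# Build the pandoc command with the -V flag to shrink the pdf margins
# (file names are placeholders for whatever the Rmd was knitted to)
cmd <- "pandoc project1.md -V geometry:margin=0.7in -o project1.pdf"
# system(cmd)  # uncomment to actually run the conversion
cmd
```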
2015年7月17日 星期五
7.17 Successfully launched TBSS processing
Today I ran my first TBSS trial, with success. It was a thrill. The professor and Dr. Murray are also very excited that this is significant progress.
Once I gather a sufficient amount of data, I can start getting results right away. So now let's just be patient and wait for those terrible network and database issues to be fixed!
TBSS, master thesis, you can do it Kai!
2015年7月16日 星期四
7.16 How do T1 and T2 values vary as a function of field strength?
http://www.mri-q.com/bo-effect-on-t1--t2.html
Quick fact: over the typical range of field strengths used for clinical MRI (0.3 T - 3.0 T), T1 approximately doubles, while T2 is relatively unchanged.
Some mechanisms of T2 relaxation (such as chemical exchange and molecular diffusion) may actually be more efficient at higher fields and therefore cause a reduction in T2 values.
7.16 VA Reading - NARR_Module3E_CTSI_Diffusion_Imaging_2014
In DWI, 3 gradient directions are enough to identify CNS abnormalities such as cerebral ischemia, TBI, and tumors
With more gradient directions, identification of white matter pathology, including axonal injury and/or myelin damage, is optimized
The tensor model cannot resolve fiber orientations across intersecting axes (crossing fibers) within the same voxel; high angular resolution diffusion imaging (HARDI) allows more complex models of diffusion that resolve this issue
Models of diffusion (non-tensor, with the tensor model for comparison):
- Diffusion spectrum (DSI): full distribution of orientation and magnitude
- Orientation distribution function (Q-ball): no magnitude info, only orientation
- Ball-and-stick: orientation and magnitude for up to N anisotropic compartments
- Tensor (DTI): single orientation and magnitude