Instructor: Prof. Jaimie Kwon (Homepage)
Lecture: MW ScN207, 8:00-9:50 pm
Objectives: We will examine some modern, computer-intensive statistical methods such as the bootstrap, resampling tests/permutation tests, Monte Carlo simulation, kernel density estimation and nonparametric regression methods (splines and local regression). We will study when individual methods are applicable and how to apply them to real-world data in the statistical computing environment R/S-Plus. If time permits, more recent methodological advances like Classification and Regression Trees (CART), neural networks and support vector machines (SVM) will also be discussed. The lectures will be a mixture of overviews of the theoretical background and (hands-on) demonstrations of R implementations of the methods.
Policy on Projects (TBA)
Link: The R Project for Statistical Computing (http://www.r-project.org/)
Misc: "An Introduction to R" is under the 'Manuals' tab of the R webpage. The PDF file is usually included with an R installation. Read at your leisure.
HW: Download R and install it on your machine (if you have one). You may also use the copy installed in the computer lab. Using S-Plus is OK, but it may not be compatible with what I teach in class.
HW: Read Chapter 1 of the bootstrap book.
ANN: Wed office hours are moved to Fri 10-11:30
Some of the following only if time permits:
Sec. 2.1 - 2.4
Sec. 3.2, 3.4
Sec. 4.1 - 4.4
Sec. 5.1 - 5.3
Sec. 6.1 - 6.3
Sec. 7.1, 7.6
Sec. 8.1, 8.2
Most of Ch. 11
Some of the following only if time permits:
Ch. 1 to 4: Basics of the S-Plus/R software. These have a lot in common with 'An Introduction to R'.
Sec. 5.1 - 5.7
Sec. 6.2, 6.6
Sec. 8.1, 8.2, 8.7, 8.10
Sec. 9.1, 9.3
Sec. 12.2, 12.4, 12.5
Sec. 16.1
HW: Read V&R Chapters 1-3, Chapter 5, especially 5.1-5.3.
Lecture note for week 1 and 2 + preview of week 3 (PDF)
There will be a quiz on Wednesday.
1. Thanks for participating in the survey. Most gave correct :) answers of 'about right' and 'mostly'... but all comments were extremely helpful.
I noticed many comments on the unorganized/jumpy style of the lecture, which I couldn't agree with more. Please consider that an inevitable byproduct of my 'testing the waters'. Now that I'm done testing, I expect the remaining three quarters will be more organized and streamlined.
2. The Wednesday quiz will be in-class, open-book and open-note, and 50 minutes long. If you have gone over the course note I posted and have tried out and understood all the code in it, you won't have a problem with it.
3. I apologize once more for tonight's lock-out from the computer lab. I will try the lab again toward the end of Wednesday's class. I put the lab code and questions on the course website. Feel free to try them out at home, but that's not a must.
Problems and take-homes. (PDF)
Templates for take-home submission (DOC)
* Please respond (if you don't agree) to my survey email regarding increasing the take-home portion of Quiz #1.
Run all the code above and:
1) Cut and paste the R code and results (R output, plots) into an MS Word document named "last_name,first_name.doc" (e.g. "kwon, jaimie.doc").
2) Attach it to an email with the subject line "stat 6601, quiz 1" (nothing else please; no message body).
3) Email it to me no later than midnight next Monday (10/18). Submissions received after that won't be graded.
4) Minimize comments in the MS Word document. Typical headings (title, date, and author) and question numbers are enough. (Please keep in mind that I have to go over 50 of them!) See the template available on the course website.
Quiz 1 take-home is due Monday Midnight!
We will cover the parametric and nonparametric bootstrap for the next two weeks. The draft note is available here. (PDF) The note is the maximum we may cover. It's likely that I will talk about core concepts with little mathematical detail and many illustrations.
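As a rough illustration of the two flavors of bootstrap mentioned above, here is a minimal R sketch. It is not taken from the course note. It uses simulated exponential data rather than any course dataset, and the seed, sample size, number of replicates B, and the choice of an exponential model are all assumptions made for illustration only.

```r
# Minimal sketch on simulated data (NOT a course dataset; all settings are assumptions).
set.seed(6601)                     # arbitrary seed, for reproducibility only
x <- rexp(30, rate = 1 / 100)      # hypothetical sample: 30 exponential draws with mean 100
B <- 2000                          # number of bootstrap replicates (arbitrary choice)

# Nonparametric bootstrap: resample the observed data with replacement
# and recompute the statistic (here, the sample mean) each time.
np_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

# Parametric bootstrap: fit an assumed exponential model
# (the MLE of the rate is 1 / sample mean), then simulate from the fitted model.
rate_hat <- 1 / mean(x)
p_means <- replicate(B, mean(rexp(length(x), rate = rate_hat)))

# Bootstrap standard errors of the sample mean under each scheme.
se_np <- sd(np_means)
se_p  <- sd(p_means)

# Percentile intervals (one of several bootstrap interval types).
ci_np <- quantile(np_means, c(0.025, 0.975))
ci_p  <- quantile(p_means,  c(0.025, 0.975))

print(c(nonparametric = se_np, parametric = se_p))
print(rbind(nonparametric = ci_np, parametric = ci_p))
```

The only difference between the two schemes is where the resamples come from: the data themselves (nonparametric) or a model fitted to the data (parametric). If the assumed model is badly wrong, the parametric results can be misleading.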
Air Conditioning Data example: complete R code
Quiz #1 solution (PDF)
The Midterm Take Home will be assigned at the end of Monday's (10/25) class.
100% Take Home.
Due before midnight on the following Tuesday (11/2).
Project Assignment on Wednesday: Collaboration. Diversify. Be adventurous!
Plan on a ~30-minute talk. Maximum of 6 PowerPoint slides (excluding plots).
List of Topics: please sign up for one of:
Graphics (4.1~4.2)
Graphics (4.3~4.4)
Linear Statistical Models, An analysis of covariance example (6.1)
Linear Statistical Models, Model formulae (6.2)
Linear Statistical Models, Regression diagnostics (6.3)
Linear Statistical Models, Robust regression (6.5)
Linear Statistical Models, Bootstrapping linear models (6.6)
Non-linear and smooth regression, introductory example (8.1)
Non-linear and smooth regression, Fitting non-linear regression models (8.2)
Non-linear and smooth regression, One-dimensional curve fitting (8.7)
Non-linear and smooth regression, Neural networks (8.10)
Tree-based methods, partitioning methods (9.1)
Tree-based methods, implementation in tree (9.3)
Classification, neural networks (12.4)
Classification, support vector machines (12.5)
Midterm Take Home (PDF) : Dataset for problem #2 ('corr.dat')
Templates for take-home submission (DOC)
The updated note on bootstrap. (PDF)
On Wednesday: We will have
a) project sign-up session
b) Q&A about midterm questions
The project assignments and a tentative schedule of presentations are below. Remember to:
1. Make ~10 slides excluding plots. A template for the project presentation + tips (PPT) is available.
2. Plan on a ~30-minute talk, but adjust depending on the topic.
3. Email the slides + estimated duration of the presentation at least 2 days before the actual presentation.
4. The presentation is about basic concepts and R implementation. Don't present too much detail or math.
5. You're free to use your own data as an example. Or just use the data in the book.
6. If you're familiar with other software like SAS, feel free to do a "comparative" study. ("What advantages/disadvantages does R/S-Plus have compared to them?")
7. Don't be stressed out. If you are spending more than 6 hours on preparation, you're working too much. If your group is in that situation, let me know.
8. Seek help if you need it. Let me know by email or by coming to office hours if you or your group experience difficulty.
Good luck!
11/1
Pattern Classification slides (PPT)
11/3
Midterm Solution (PDF), completed R code (R)
Data Mining slides (PPT)
11/8
Graphics (4.1~4.2) Katherine Moore, Terri Carroll, Jackie Shaffer (PPT)
Bioinformatics Overview, Dairian Wang (PPT)
More bioinformatics slides (PPT1, PPT2)
More on XGOBI :
./xgobi.bat data_xgobi/randu
11/10
Graphics (4.3~4.4) Ying Li, Qian Zhang, Xingyan Bai (PPT)
LM, An analysis of covariance example (6.1) Myrna Moreno, Gadir Marian (PPT)
LM, Model formulae (6.2) Alvin Hsieh, Antonio Curtis, Wai Mak (PPT, reading.txt, ritalin.txt)
11/15
LM, Regression diagnostics (6.3) Kathy Fung, Anthony Britt, Kai Koo (PPT, data info.txt, lesilie data.xls, projdata.csv)
LM, Robust regression (6.5) Leila Saberi, Mi Lam, Denise Hum (PPT)
11/17
LM, Bootstrapping linear models (6.6) Winnie Li, Ke Xu, Wenlai Wang (PPT)
11/22
NLR&Smoothing, introductory example (8.1) Lin Zhang, Rommel Vives (PPT)
NLR&Smoothing, Fitting non-linear regression models (8.2) Besse Nguyen, Oymae Louie, Gina Piscitelli (PPT)
11/24
Final Take Home test (50%): DOC, collisions.txt
Both the MS Word file (via email) and a printout are due at the beginning of the in-class final.
11/29
NLR&Smoothing, One-dimensional curve fitting (8.7) Vivian Tam, Shuhong Li, Wenli Li (PPT, XLS)
Tree: partitioning methods (9.1) Demeke Kasaw, Andreas Nguyen, Mariana Alvaro (PPT)
Tree: implementation in tree (9.3) Yu Ye, Philip Wong, Bin Hu (PPT)
12/1
NLR&Smoothing, Neural networks (8.10) Xu Yang, Jing Wu, Haiou Wang (PPT)
Classification, neural networks (12.4) Mable Kong, Gary Gongwer, Madhu Iyer (PPT)
Classification, support vector machines (12.5) Joseph Rickert, Tim McKusick (PPT, PPT2, supplementary DOC)
December 6th, Monday, 8:00 pm to 9:50 pm (50% + Take-Home)
In-class final and solution (PDF)
Misc: Lab reservation:
Mondays - 7:00 PM - 7:50 PM JAIMYOUNG KWON STAT 3872-01
Mondays - 8:00 PM - 9:50 PM JAIMYOUNG KWON STAT 6601-01
Weds - 7:00 PM - 7:50 PM JAIMYOUNG KWON STAT 6872-01
Weds - 8:00 PM - 9:50 PM JAIMYOUNG KWON STAT 6601-01