Course Notes and Assignments

Fall 2015
Mondays, Wednesdays 11:35 - 12:50
60 Sachem Street (Watson Center), Rm A60

Instructor: Taylor Arnold
E-mail: taylor.arnold@yale.edu
Office Hours: Wednesdays 13:30-15:00, 24 Hillhouse, Rm 206
Teaching Assistant: Jason Klusowski
TA Session: Tuesdays, 19:00-20:30, 24 Hillhouse, Main Classroom
Date Description Resources References
2015-09-02 Course Introduction [Syllabus]
[Lecture 01]
  • Linear Transformations. [video]
2015-09-04 Simple linear models: MLEs and Gauss-Markov [Lecture 02]
[script02.R]
[Galton Pea Data]
  • RT 2.1-2.6
  • Jeffrey M. Stanton. "Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors". [article]
  • Intro to Simple Linear Models. [video]
  • Linear Transformations. [video]
  • Francis Galton (1889). Natural Inheritance. [paper]
2015-09-09 Simple linear models: hypothesis tests [Lecture 03]
[script03.R]
[Galton Heights Data]
  • RT 2.8,2.10
  • Francis Galton (1888). "Co-relations and their measurement, chiefly from anthropometric data". [paper]
  • William Sealy Gosset (1908). "The probable error of a mean". [paper]
2015-09-14 Multivariate linear models: Normal Equations [Lecture 04]
[script04.R]
  • FH 1.1,1.2
  • David Bindel. "Least squares: the big idea". [pdf slides]
2015-09-16 Geometric Properties of OLS [Lecture 05]
  • RT 3.3. FH 1.3
  • Geometry of least squares. [video]
  • Walter Sosa-Escudero. "OLS Geometry". [pdf slides]
2015-09-16 Problem Set #1 Due [Problem Set #1]
[Solution to problem #3]
[indicators.csv]
[glakes.csv]
[playbill.csv]
2015-09-21 Finite-Sample Properties of OLS [Lecture 06]
  • FH 1.3
  • Hiroki Tsurumi, "Gauss-Markov Theorem". [pdf notes]
  • Kao-Peng Chou and Jia-Chin Lin. "A Study of Cramér-Rao-Like Bounds". [pdf article]
2015-09-23 Hypothesis Testing under Normality [Lecture 07]
[script07.R]
  • FH 1.4, 1.5
  • Arne Hallam. "Some Theorems on Quadratic Forms and Normal Variables". [pdf notes]
  • Ricardo Mora. "Hypothesis Testing in Linear Regression Models". [pdf chapter]
2015-09-28 Measuring Airline On-time Performance [Lecture 08]
[2007.csv.bz2]
[airline2007_clean.Rds]
[airports.csv]
[carriers.csv]
[plane-data.csv]
[clean_asa_data.R]
[script08a.R]
[script08b.R]
  • ASA 2012 Flight Data [website]
  • David Smith. "Analysis of airline performance". [blog post]
2015-09-30 Prediction and Leverage with ASA Flight Data [Lecture 09]
[script09.R]
  • FH 1.3, RT 7.3
  • Thomas Leininger. "Confidence and Prediction Intervals for Simple Linear Models". [pdf slides]
2015-09-30 Problem Set #2 Due [Problem Set #2]
[Selected solutions]
[Latour.txt]
2015-10-05 Factor Contrasts and Hierarchical Linear Models [Lecture 10]
[script10.R]
  • UCLA Institute for Digital Research and Education. "R Library: Contrast Coding Systems for categorical variables". [html notes]
  • Heather Woltman, Andrea Feldstain, J. Christine MacKay, Meredith Rocchi. "An introduction to hierarchical linear modeling". [pdf article]
2015-10-07 Weighted Least Squares and Review [Lecture 11]
[script11.R]
  • FH 1.6
2015-10-07 Problem Set #3 Due [Problem Set #3]
[R Solution Code]
[airline2007_pset03.Rds]
[clean_script.R]
2015-10-12 Midterm #1 [Midterm Questions]
[Answers and Grading]
2015-10-14 Logistic Regression [Lecture 12]
[script12.R]
  • Germán Rodríguez. "Logit Models for Binary Data". [pdf notes]
  • UCLA Institute for Digital Research and Education. "R Data Analysis Examples: Logit Regression". [html page]
2015-10-26 Guest Lecture: Jay Emerson
  • John Emerson and Silas Meredith. "Nationalistic Judging Bias in the 2000 Olympic Diving Competition". [pdf paper]
  • Emerson, John W. and Taylor Arnold. "Statistical Sleuthing by Leveraging Human Nature: A Study of Olympic Figure Skating". [pdf article]
2015-10-28 Solving Least Squares [Lecture 13]
[script13.R]
  • GVL, Ch 5
  • Do Q. Lee. "Numerically Efficient Methods for Solving Least Squares Problems". [pdf notes]
2015-10-28 Problem Set #4 Due [Problem Set #4]
[R Solution Code]
[airline2007_pset04.Rds]
2015-11-02 Singular Value Decomposition [Lecture 14]
[script14.R]
  • GVL, 2.7
  • Ben Harris. "Computing the Singular Value Decomposition". [video]
2015-11-04 Ridge Regression and PCR [Lecture 15]
[script15.R]
  • RT, 3.14
  • Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Martin Pepeljugoski. "Columbia Photographic Images and Photorealistic Computer Graphics Dataset." [pdf paper]
2015-11-09 Solving GLMs via IRWLS [Lecture 16]
[script16.R]
  • RT 10.4.1
  • Cosma Shalizi. "Generalized Linear Models and Generalized Additive Models". [pdf chapter]
2015-11-11 Intro to Lasso Regression [Lecture 17]
[script17.R]
  • BvdG, 2.1-2.4
  • Robert Tibshirani. "Regression Shrinkage and Selection via the Lasso". [pdf paper]
  • Efron, Hastie, Johnstone, Tibshirani. "Least Angle Regression". [pdf paper]
2015-11-11 Problem Set #5 Due [Problem Set #5]
[R Solution Code]
[columbiaImages.zip]
[photoMetaData.csv]
2015-11-16 Applications to Image Classification [Lecture 18]
[script18.R]
  • BvdG, 2.1-2.4
  • Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Martin Pepeljugoski. "Columbia Photographic Images and Photorealistic Computer Graphics Dataset." [pdf paper]
  • Taylor Arnold, Lauren Tilton. "Humanities Data in R". [book; free pdf version from Yale]
2015-11-18 Tuning Ridge and Lasso Regression [Lecture 19]
[script19.R]
  • BvdG, 2.5
  • Ryan Tibshirani. "Model selection and validation 1: Cross-validation". [pdf notes]
  • Ryan Tibshirani. "Model selection and validation 2: Model assessment". [pdf notes]
2015-11-18 Problem Set #6 Due [Problem Set #6]
[questions]
2015-11-30 Theory of the Lasso I [Lecture 20]
[script20.R]
  • BvdG, 6.0-6.3
  • Meinshausen, Nicolai, and Bin Yu. "Lasso-type recovery of sparse representations for high-dimensional data." The Annals of Statistics (2009): 246-270.
2015-12-02 Theory of the Lasso II [Lecture 21]
[script21.R]
  • BvdG, 6.3
  • Bickel, Peter J., Ya'acov Ritov, and Alexandre B. Tsybakov. "Simultaneous analysis of Lasso and Dantzig selector." The Annals of Statistics (2009): 1705-1732.
  • Zhao, Peng, and Bin Yu. "On model selection consistency of Lasso." The Journal of Machine Learning Research 7 (2006): 2541-2563.
2015-12-07 Midterm #2 [Midterm Questions]
[lset.Rds] Set of lemmas
[metaData.Rds] Metadata: responses and training flag
[mmLemma.Rds] Data Matrix
[metaDataAll.Rds] Metadata, with solutions to test set
[lemma_gender.mp4] visualization of the lasso solution
[lemma_age.mp4] visualization of the lasso solution
2015-12-07 The Generalized Lasso [Lecture 22]
[trendfilter_linear.mp4]
[trendfilter_quadratic.mp4]
  • Taylor Arnold and Ryan Tibshirani. "genlasso: Path algorithm for generalized lasso problems". [R package and vignette]
  • Taylor Arnold and Ryan Tibshirani. "Efficient Implementations of the Generalized Lasso Dual Path Algorithm", Journal of Computational and Graphical Statistics. [pdf article]
  • Ryan Tibshirani and Jonathan Taylor. "The solution path of the generalized lasso", Annals of Statistics 39 (3) 1335-1371.
2015-12-09 ADMM and Course Review [Lecture 23]
  • S. Boyd et al. "Distributed optimization and statistical learning via the alternating direction method of multipliers". Foundations and Trends in Machine Learning, 3(1):1–122, 2011. [pdf article]
2015-12-16 Problem Set #7 Due [Problem Set #7]
[pset07_X.Rds]
[pset07_y.Rds]