STAT 550 - Multivariate Statistical Analysis - Winter 2018

Meeting Information

Instructor: Keshav P. Pokhrel, Ph.D.
Meeting Times: TR 4:30 PM - 5:45 PM
Email: kpokhrel(at)umich.edu
Meeting Location: 2046CB
Office: 2087CB
Office Hours:
Tuesday 2:00 PM - 3:00 PM
Wednesday 1:30 PM - 3:00 PM
Thursday 11:00 AM - 12:00 PM
and by appointment

Course Description and Objectives


Description:
An introduction to commonly encountered statistical and multivariate techniques, assuming only a limited knowledge of higher-level mathematics. Topics include multivariate analysis of variance, multivariate regression, principal component analysis, factor analysis, canonical correlation, and linear discriminant analysis. We will use computer software extensively to analyze data, with emphasis on interpreting the estimated parameters. The main computing tool for this course is the software R.

Objectives:
The primary objective of this course is to introduce applications of multivariate statistics in different disciplines. The course aims for a good middle ground between mathematical statistics and computing. In addition, it introduces supervised and unsupervised learning with ample data visualization techniques.

Student Learning Outcomes:
  • Increase students’ command of problem-solving through multivariate statistical techniques.
  • Increase students’ ability to communicate and work cooperatively.
  • Learn and analyze the applications of supervised learning techniques: multivariate linear regression, regression trees, and logistic regression.
  • Learn and analyze the applications of unsupervised learning techniques: principal component analysis, factor analysis, and k-means clustering.
  • Understand and critique dimension reduction problems.
  • Write a semester project that applies different multivariate statistical techniques to real-life data and arrives at one or more reasonable statistical models with non-technical interpretations.

    Textbook

    Applied Multivariate Statistics with R by Daniel Zelterman

    Major Reference Books
    1. G. James, D. Witten, T. Hastie, R. Tibshirani (2013) An Introduction to Statistical Learning with Applications in R. Springer.
    2. Applied Multivariate Statistical Analysis by Dean W. Wichern and Richard A. Johnson
    3. Applied Linear Regression Models, 4th Edition, by Kutner, Nachtsheim, and Neter
    4. Diez, Barr, and Cetinkaya-Rundel, OpenIntro Statistics.

    Homework


    A set of homework problems from each chapter will be assigned. The majority of the homework problems will come from the textbook, but be prepared to solve any problems that align with our course content. Good news: the lowest homework grade will be dropped. You are expected to spend an average of 3-5 hours per week on work outside of class. Late assignments are accepted with a 20% penalty per day. For better exam results, you need to master all the homework problems.

    R Labs and In-Class Assignments


    You will receive worksheets with problems in class. For the majority of in-class assignments, students may work with classmates and consult their notes to solve the problems. I encourage everyone to solve problems on the whiteboard and interpret the results for the class. I urge you to find interesting problems from your areas of interest (e.g., business, sociology, biology, sports, public health). This will help you prepare for your project, and you are also highly likely to earn a better score.

    Exams


    There will be two mid-term exams and a final project. The final project is your final exam. To answer the exam questions, you are expected to understand clearly the mathematical reasoning behind the statistical methods used to solve the problems.

    Project


    There will be two mini-projects and a semester project. For a good project, you need to describe the data, pose reasonable hypotheses, estimate parameters, select one or more appropriate regression models, and explain the results both in statistical terms and in nontechnical language. The primary objective of these projects is to apply statistical methods to real-life situations and to give logical reasoning for, and explanations of, the methods used. Late submission of a project will cost 10% of the total points for every day it is late.

    Software


    We use the software R, a programming language for statistical computing and data visualization. It can be downloaded for free from http://www.r-project.org. We will use RStudio for regular classroom activities. RStudio is an open-source integrated development environment (IDE) for R. To download R, click here for Windows and here for Mac. After installing R, click RStudio to download RStudio.

    Evaluations and Important Dates:
    Exam I (20%) Tuesday, February 13
    Exam II (20%) Tuesday, March 27
    Mini Project I (5%) Due March 09
    Mini Project II (5%) Due April 05
    In-Class Assignments/R Labs (10%) TBD
    Homework (15%) TBD
    Final Project (25%) Presentation: Thursday, April 26 (3:00-6:00 PM)
    Final paper due April 20

    Grade Distribution

    Your final grade will be based on two mid-term exams, five sets of graded homework, two mini-projects, and a final project. The lowest homework grade will be dropped. If you have any grade disputes, you need to notify me within a week after grades are posted in Canvas.
    Letter Grade   Percentage
    E              0-59
    D-             60-62
    D              63-66
    D+             67-69
    C-             70-72
    C              73-76
    C+             77-79
    B-             80-82
    B              83-86
    B+             87-89
    A-             90-92
    A              93-96
    A+             97-100
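As a rough illustration of how the weights under "Evaluations and Important Dates" and the letter-grade scale might combine, here is a minimal Python sketch. The function names and the sample scores are hypothetical. It assumes each component score is already a percentage (with the lowest homework already dropped), and it treats the lower end of each band as the cutoff, so a fractional score such as 86.5 falls in B. The syllabus does not say how fractional percentages are handled, so that part is an assumption.

```python
# Component weights (percent of final grade), copied from the evaluation list.
WEIGHTS = {
    "exam1": 20,
    "exam2": 20,
    "mini_project1": 5,
    "mini_project2": 5,
    "inclass_rlabs": 10,
    "homework": 15,
    "final_project": 25,
}

# Lower bound of each letter band, highest band first.
# Assumption: a score belongs to the highest band whose lower bound it reaches.
BANDS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
    (0, "E"),
]


def final_percentage(scores):
    """Weighted average of component scores, each given on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a final percentage to a letter using the band lower bounds."""
    for lower_bound, letter in BANDS:
        if percentage >= lower_bound:
            return letter
    return "E"


# Hypothetical student, for illustration only.
sample_scores = {
    "exam1": 85,
    "exam2": 90,
    "mini_project1": 100,
    "mini_project2": 80,
    "inclass_rlabs": 95,
    "homework": 88,
    "final_project": 92,
}

total = final_percentage(sample_scores)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

For the hypothetical scores above, the weighted total is 89.7%, which falls in the B+ band.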

    Disability Statement


    The University will make reasonable accommodations for persons with documented disabilities. Students need to register with Disability Resource Services (DRS) every semester they are enrolled in classes. DRS is located in Counseling & Support Services, 2157 UC. To be assured of having services when they are needed, students should register no later than the end of the add/drop deadline of each term. Visit the DRS website at: webapps.umd.umich.edu/aim. If you have a disability that necessitates an accommodation or adjustment to the academic requirements stated in this syllabus, you must register with DRS as directed above and notify me. Upon receipt of your notification, we will make accommodations as directed by DRS.

    Academic Integrity


    The University of Michigan-Dearborn values academic honesty and integrity. Each student has a responsibility to understand, accept, and comply with the University's standards of academic conduct as set forth by the Code of Academic Conduct (mdearborn.edu/policies_st-rights), as well as policies established by each college. Cheating, collusion, misconduct, fabrication, and plagiarism are considered serious offenses, and may be monitored using tools including but not limited to TurnItIn. Violations can result in penalties up to and including expulsion from the University. At the instructor's discretion, the penalty may range from a grade of zero on the assignment up to and including a recommendation that the student be expelled from the University. It is the sole responsibility of the student to understand and follow academic guidelines regarding plagiarism. The University of Michigan-Dearborn has an online academic integrity tutorial that can be accessed at: umdearborn.edu/umemergencyalert

    Safety


    All students are encouraged to program 911 and UM-Dearborn’s University Police phone number (313) 593-5333 into personal cell phones. In case of emergency, first dial 911 and then if the situation allows call University Police. The Emergency Alert Notification (EAN) system is the official process for notifying the campus community for emergency events. All students are strongly encouraged to register in the campus EAN, for communications during an emergency. The following link includes information on registering as well as safety and emergency procedures information: .
    If you hear a fire alarm, class will be immediately suspended, and you must evacuate the building by using the nearest exit. Please proceed outdoors to the assembly area and away from the building. Do not use elevators. It is highly recommended that you do not head to your vehicle or leave campus since it is necessary to account for all persons and to ensure that first responders can access the campus.
    If the class is notified of a shelter-in-place requirement for a tornado warning or severe weather warning, your instructor will suspend class and shelter the class in the lowest level of this building away from windows and doors. If notified of an active threat (shooter) you will Run (get out), Hide (find a safe place to stay) or Fight (with anything available). Your response will be dictated by the specific circumstances of the encounter.


    Tentative Academic Calendar


    Week Chapters/Sections Topics covered Remarks
    Week 1 (Jan 9, 11) Chapters 1-3 Introduction
    Week 2 (Jan 16, 18) Chapter 4 Basic Linear Algebra
    Week 3 (Jan 23, 25) Chapter 5 Univariate Normal Distribution
    Week 4 (Jan 30, Feb 01) Chapter 6 Bivariate Normal Distribution
    Week 5 (Feb 06, 08) Chapter 7 (sections 7.1-7.4), review Multivariate Normal Distribution
    Week 6 (Feb 13, 15) Chapter 7 (sections 7.5-7.7) Multivariate Normal Distribution
    Week 7 (Feb 20, 22) Chapter 8 Principal Component Analysis
    Week 8 (Feb 27, Mar 01) Spring recess
    Week 9 (Mar 06, 08) Chapter 8 Factor Analysis
    Week 10 (Mar 13, 15) Chapter 9 Multivariable Linear Regression
    Week 11 (Mar 20, 22) Chapter 9 Multivariable Linear Regression, review
    Week 12 (Mar 27, 29) Chapter 10, Exam II review Exam II, Multinomial Logistic Regression
    Week 13 (Apr 3, 5) Chapter 10 Support Vector Machine and Regression Trees
    Week 14 (Apr 10, 12) Chapter 11 Hierarchical Clustering, K-means Clustering
    Week 15 (Apr 17, 19) Chapter 11 Model Diagnostics and Validation


    R-labs



    Description Remarks
    Auto Data Download
    US Baby Names Download
    Salary Data Download


    Some Helpful Resources


    Data sets: Data from Johnson and Wichern's book.
    A must-read resource by Professor Harrell: A very useful collection of data, R code, and many statistical modeling strategies.
    FactoMineR: A rich resource of tutorials and datasets for performing principal component analysis, factor analysis, and classification.
    Data sets: This link has a large collection of data that can be useful for different areas of research, from linear regression to repeated measures analysis.
    Bike Data: Bike sharing data
    Machine Learning data: UCI Machine Learning Repository.
    National Survey: National Survey on Drug Use and Health, 2009 (ICPSR 29621)
    Springboard Datasets: 19 Free Public Data Sets For Your First Data Science Project
    Openintro Data: An organized source of data for class projects.
    DataQuest: 18 places to find data sets for data science projects.
    Online Statistics Education: This is a very helpful resource for introductory statistics.
    Data from textbook: Supplementary material for our textbook
    Linear Algebra: A comprehensive refresher of linear algebra
    Statistical Foundation of Machine Learning: A free online resource for machine learning with R code.
    Data: Data from Kutner's book
    Install R: Guidelines for downloading and installing R
    Try R: A good resource for learning R online
    R tutorials: Yet another collection of resources for learning R
    Regression Models in R: An online portal for learning regression
    OpenIntro: This is an excellent resource for introductory statistics. Apart from lecture notes, they also have well-explained examples with R code.
    Exploratory Data Analysis: This web page covers a wide range of statistical topics, with video lectures and other supplementary materials.
    StatSci.org: A good resource for a variety of data sets. These data sets are open to the public, and you can use them for your own projects. If you use these data, please do not forget to cite the source.
    More Stat Apps: A wonderful collection of statistics apps for data visualization
    Machine Learning repository: UCI Machine Learning Repository, a comprehensive web page with a variety of data sets.
    List of data: A rich collection of data
    Data Journalism: Open data sets from the British newspaper "theguardian".
    Markdown Themes: Appearance and style themes for creating HTML documents using RStudio.
    Shiny Apps: A comprehensive resource of Shiny apps