Stern School of Business
Statistics and Data Analysis


Professor: William Greene, Departments of Economics and IOMS

BS Ohio State University, 1972 (Operations Research); MA Wisconsin, 1974 (Economics); PhD Wisconsin, 1976 (Econometrics); History: Cornell 1976-1982; Real world, 1982-1983; Return to ivory tower at Stern (then GBA) NYU, 1983-; Robert Stansky Professor, 2010-. Permanent affiliations: University of Lugano, American University, University of Sydney, Monash University, University of Queensland, Curtin University, Queensland University of Technology, University College London. Publications: Articles - see vita on home page; Books: Modeling Ordered Choices (2010); Econometric Analysis, 7th Ed (2012); Applied Choice Analysis (2006, 2015); Software: NLOGIT. Editor in Chief, Foundations and Trends in Econometrics; Editor in Chief, Journal of Productivity Analysis; Associate Editor, Journal of Economic Education, Journal of Choice Modeling, Economics Letters. Research interests: econometric methodology, discrete choice modeling, efficiency and productivity analysis, health economics, transportation, nonlinear estimation, entertainment and media.
Office:  MEC 7-90, Ph. 998-0876, Fax. 995-4218

Home Page:


This course has two broad objectives: (1) This course will provide students with an understanding of fundamental notions of data presentation and analysis. We will develop tools to enable students to use statistical thinking in the context of business problems. The course deals with modern methods of data exploration (partly to reveal unusual or problematic aspects of data sets), the uses and abuses of the basic techniques of statistical inference, and the use of linear regression as a tool for management and financial analysis. (2) There is randomness everywhere in life and in the environment. We will develop models of probability and random variables that help to understand the randomness of everyday life and the business environment.


I will assume that students are familiar with routine algebra, exponents and logarithms, as well as graphical tools such as the slope and intercept of a straight line. Algebra will be used freely throughout the course. We may occasionally use calculus, but only sparingly.


Course Requirements and Course Grades


Final grades for the course will be determined on the basis of the following components and weights:


*   Mid-term exam: 30% (2012 Midterm Exam with Solutions) (2013 Midterm Exam with Solutions)

*   Final exam: 30% (2012 Final Exam with Solutions)

*   In-class short (10-minute) quizzes: (5 @ 3%) 15% (in aggregate)

*   Homework assignments (details below): (5 @ 3%) 15% (in aggregate). Students may work in groups of up to 4 and submit a single report for the group.

*   Model development project (details below) 10%. Students may work in groups of up to 4 and submit a single report for the group.
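As a rough illustration of how the weights above combine into a single course score, here is a minimal sketch. The 0-100 scale for each component, the function name, and the sample scores are assumptions made for the example, not part of the course policy; letter grades also depend on the grade distribution policy.

```python
# Component weights taken from the list above (they sum to 100%).
WEIGHTS = {
    "midterm": 0.30,
    "final": 0.30,
    "quizzes": 0.15,   # 5 quizzes at 3% each
    "homework": 0.15,  # 5 assignments at 3% each
    "project": 0.10,
}


def course_score(scores):
    """Return the weighted course score.

    Assumes each component score is already expressed on a 0-100 scale.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example = {"midterm": 80, "final": 90, "quizzes": 100, "homework": 100, "project": 70}
print(round(course_score(example), 2))
```

For this hypothetical student the two exams contribute 24 and 27 points, the quizzes and homework 15 each, and the project 7.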


Official policy at Stern mandates that grades in core classes follow a distribution in which no more than 35% of students receive A or A-. Homework assignments are mandatory. Late submissions will be accepted only with a persuasive justification, but not after the solutions have been posted. All examinations are open book, open notes, closed telephone, closed PDA, closed iPhone, closed iPad, closed Droid, closed laptop, closed tablet, open mind. Do bring a conventional hand calculator (not a cell phone that includes a calculator) to both exams. Links to copies of past exams appear below.


Honor Code:  Of course.


Course Materials


*    The text for this course is Statistics for Business: Decision Making and Analysis (Pearson, 2nd Edition) by Robert Stine and Dean Foster.


*    Software for the course will be Minitab, Release 17. You can rent a copy for about $30 for the semester from the website; at the site, click on e-Store. (Introduction to Minitab) (Useful notes on Minitab) Minitab 17 can also be run on the Stern Citrix server. We will discuss this in class.

Other Stuff

*    Please remember to turn off your cell phone before you come to class.

*    Please try to arrive early. Late entrances are disruptive.

*    As a general rule, laptops are an annoyance during class, particularly when you are checking your email, playing with Facebook, tweeting or watching YouTube videos while others are studying statistics. If you absolutely must use your laptop to take notes, please be respectful of the interests of your colleagues.

*    There will be no makeups for the quizzes.

Course Outline and Schedule

Materials: (Introductory notes for the course - Notes 0: Introduction - Right click to download)

Session 1: Introduction to statistics; data description and presentation; types of data; Minitab.
Reading: Text, Chapters 1 and 2, Sections 3.1-3.3, Chapter 4. (Notes 1: Data Presentation.)

Session 2: Sampling, Descriptive statistics: mean, median, mode, standard deviation, correlation.
Reading: Text, Chapters 4 and 6. (Notes 2: Descriptive Statistics.)

Session 3: Probability, conditional and unconditional probability, independence, joint probability, Bayes' Theorem.
Reading: Text, Chapters 7, 8. Probabilities and the Gulf oil spill. (Notes 3: Probability.)   

Session 4: Expected value, applications of expected value. 
Reading: Text, Sections 3.4, 4.3, Chapter 9. (Notes 4: Expected Value.)

Session 5: Random variables.
Reading: Text, Chapter 9. (Notes 5: Random Variables.)

Session 6: Covariance and correlation.
Reading: Text Chapter 10
(Notes 6: Covariance and Correlation.)

Session 7: Discrete distributions, Bernoulli, binomial.
Reading: Text Sections 11.1 - 11.3. (Notes 7: Bernoulli and Binomial Distributions)

Session 8: Discrete distributions, Poisson Model.
Reading: Text Sections 11.4, 18.4.
(Notes 8: Poisson Processes)

Session 9: The normal distribution.
Readings: Text Chapter 12.
(Notes 9: The Normal Distribution.) (Sample Problems)


Session 10: Samples and sampling distributions, normal distribution, large samples, law of large numbers, central limit theorem.
Reading: Text Pages 155-157, Sections 10.3-10.5, Chapter 12, pp. 302-307, Chapter 13, Sections 14.1-14.3, (DataStor case)
Materials: (Notes 10: The Central Limit Theorem and the Law of Large Numbers.) (Random Walk Models for Stock Prices)

Session 11: Central Limit Theorem, normal approximations, lognormality, random walk.
Reading: Text Chapter 13
(Notes 11: Normal Approximation and Random Walks.) (Lognormal Random Walks for Stock Prices)

Session 12: Statistical inference, point estimates and confidence intervals
Reading: Text Chapter 15. (Notes 12: Statistical Inference.)

Session 13: Statistical tests - 1
Reading: Text Chapters 15, 16. (Notes 13: Testing Hypotheses Part 1)

Session 14: Statistical tests - 2
Reading: Text Chapters 15,17.
(Notes 14: Testing Hypotheses Part 2.)

Session 15: Hypothesis testing.
Reading: Text Chapters 5, 18
(Notes 15: Hypothesis Testing)

Session 16: Linear regression.
Reading: Text Chapter 19, 20, 21, 22.
(Notes 16: Linear Regression.)
(A controversial regression study) (Slides for an application of modeling) (Handout for application) (Regression Analysis by WHO)

Session 17: Linear regression model, sample and population.
Reading: Text Chapter 19, 21.
(Notes 17: Regression Modeling)

Session 18: Least squares linear regression, residual analysis, analysis of variance.
Reading: Text Chapter 19, 20
(Notes 18: Regression Analysis.)

Session 19: Correlation and covariation.
Reading: Text Chapter 20, Section 22.2.
(Notes 19: Regression and Correlation.)

Session 20:  Aspects of regression to the mean, measurement error, truncation, selection.  
Reading: Section 21.4, 22. (Notes 20: Specifying the Regression Model.)

Session 21: Multiple regression - 1
Reading: Text Sections 23.1, 23.2, 24, 25.
(Notes 21: Multiple Regression Part 1)

Session 22: Multiple regression - 2
Reading: Text Chapter 24.
(Notes 22: Multiple Regression Part 2)

Session 23: Multiple regression - 3
Reading: Text Section 22.2, 24.2, 24.3.
(Notes 23: Multiple Regression Part 3)

Session 24: Multiple regression - 4
Reading: Text Section 23.4, Chapter 25. (Notes 24: Multiple Regression Part 4)

Session 25: Modeling Qualitative Data 
Reading: None (Notes 25: Analyzing Qualitative Data.) (The Netflix Prize)

Individual Problem Sets and Assignments

Students may work in groups of up to four on these homework assignments and submit a single assignment for the group. All data sets needed for the assignments are linked below. You can left click to open them in Minitab, or right click to download them to your own computer.

Assignment 1.  Data Description and basic probability.  Problem set 1  Problem set 1 solutions
Data sets: (HOG-Ex0201.mpj) (HOG-Ex0202.mpj) (HOG-Ex0218.mpj) (HOG-Ex0222.mpj) (97employ.mpj) (WHO-HealthStudy.mpj)

Assignment 2.  Probability and Random Variables, Expected Value, Poisson and Hypergeometric Distributions, Normal Distribution. Problem set 2  Problem set 2 solutions
Data sets: (WHO-HealthStudy.mpj) (Easton.mpj) (salary.mpj) (Movies9OCT2003.mpj)  

Assignment 3.  Statistical Inference. Problem set 3  Problem set 3 solutions
Data sets: (German Health Survey Data) (Sale Prices for Monet Paintings)

Assignment 4.  Basic Regression. Problem set 4   Problem set 4 solutions  
Data sets: (WHO-HealthStudy.mpj) (EconGrades.mpj) (heating.mpj) (KansasCtyPopn.mpj) (WSJ-Height-Income.mpj)

Assignment 5.  Multiple Regression. Problem set 5  Problem set 5 solutions
Data sets: (UKElectronics.mpj) (GermanHealth.mpj) (MoreMoviemadness data.mpj) (Credit Application data)

Model Development Project

Notes on the model development project Data for Model Development (Minitab) (Excel)