Microeconometric Modeling and Discrete Choice
Analysis with Cross Section and Panel Data
Professor William Greene
Home Page: http://people.stern.nyu.edu/wgreene
This course will survey techniques used in modeling cross section and panel data. Emphasis will be on discrete data, though results and techniques are mostly generic and will extend to other modeling frameworks. Discrete choice models have become an essential tool in modeling individual behavior. The techniques are used in all social sciences, health economics, medical research, marketing research, transport research, and in a constellation of other disciplines. This course will examine a large number of models and techniques used in these studies. We will begin with a brief review of regression modeling concepts, then turn to the fundamental building block in discrete choice modeling, the binary choice model. Several variants and extensions will be discussed before we turn attention to multiple equation binary choice models, ordered choice models and models for counts. The second half of the course will be devoted to multinomial outcome models of the sort used, e.g., in modeling brand choice in marketing, travel mode choice in transport, and a huge variety of applications in the social and behavioral sciences.
The course will include lectures that develop the relevant theory and extensive practical, laboratory applications. Emphasis in the laboratory sessions will be on estimation of discrete choice models and using them to describe behavior and to predict discrete outcomes. Course participants will apply the techniques on their own computers using the NLOGIT computer program and several real data sets that have been used in applications already in the literature.
This website provides materials that will be used in the course as well as additional resources that students might want to use to explore on their own the material presented in the course. We will only use a few of the data sets. The articles are provided for self study. We will not be working through them in detail, though some of them will be superficially examined as applications of the methods discussed in class.
Prior knowledge is assumed to include calculus at the level assumed in the first year of a Ph.D. program in economics and a course in econometrics at the beginning Ph.D. level out of a textbook such as Greene, W., Econometric Analysis, 7th edition. Familiarity with NLOGIT will be helpful, but is not necessary. Two useful reference books for the course are the primer Applied Choice Analysis by David Hensher, John Rose and William Greene (Cambridge University Press, 2005) and the survey monograph, Modeling Ordered Choices by William Greene and David Hensher (Cambridge University Press, 2010). An up to date and comprehensive survey of econometric theory that is a bit more advanced than needed for our purposes is Wooldridge, J., Econometric Analysis of Cross Section and Panel Data, 2nd ed., MIT Press, 2010.
Students in this course will obtain background in both the theory and methods of estimation for discrete choice modeling. This course will provide a gateway to the professional literature as well as practical applications of the methods at the level of contemporary research in the field. Emphasis in the course is on applications of methods. Derivations of, e.g., asymptotic properties of estimators, and theoretical fine points, such as the implications of different types of independence assumptions in panel data models, are left for more advanced treatments.
Background Material and Related Topics
No specific textbook is assigned for the course. Some of the presentation will be based on Econometric Analysis, 7th ed., by Greene, W. (Prentice Hall, 2012). Selected chapters are included below:
Courses in Econometrics. These are semester-long courses in econometrics.
Applied Econometrics (Basic Econometrics; Text is Greene (2012))
Analysis of Panel Data (Panel Data Methods; Text is Greene (2012), Wooldridge (2010), Baltagi (2014), Cameron and Trivedi (2005))
Left click to open. Right click to download. (These are PDF and pptx files.)
Greene, W., Econometric Analysis, 7th Ed.
Greene 14-Maximum likelihood estimation
Greene 15-Simulation based estimation and inference
Greene 17-Discrete choice models
Greene 19-Censoring and truncation
Correcting estimated standard errors in the presence of clustering (Wooldridge, 2003)
Modeling endogeneity in nonlinear models (Terza, Basu, Rathouz, 2008)
Interaction effects in nonlinear models (Greene, 2010)
Modeling dynamic effects in nonlinear models (Wooldridge, 2005)
Estimating willingness to pay with mixed logit models (Czajkowski and Carson, 2014)
These are class notes on some specific topics related to modeling individual behavior.
Microeconometrics Topics 1. Descriptive Statistics and Linear Regression (PDF)
Microeconometrics Topics 2. Endogeneity (PDF)
Microeconometrics Topics 3. Linear Regression with Panel Data (PDF)
Microeconometrics Topics 4. Quantile Regression and Bootstrapping (PDF)
Microeconometrics Topics 5. Bayesian Analysis (PDF)
Microeconometrics Topics 6. Nonlinear Models (PDF)
Microeconometrics Topics 7. Sample Selection in Nonlinear Models (PDF)
Microeconometrics Topics 8. Random Parameter and Hierarchical Linear Models (PDF)
Microeconometrics Topics 9. Some Latent Class Models (PDF)
Microeconometrics Topics 10. Censoring and Truncation (PDF)
Microeconometrics Topics 11. Duration Models (PDF)
The following are two descriptive papers that introduce modern forms of discrete choice models and some survey papers on specific topics in discrete choice modeling.
Economic Choices, American Economic Review, McFadden, D. (2001). McFadden’s Nobel Prize lecture.
Mixed MNL Models for Discrete Response, McFadden, D. and Train, K., Journal of Applied Econometrics, 2000.
The Behaviour of the Maximum Likelihood Estimator of Limited Dependent Variable Models in the Presence of Fixed Effects, Greene, W., Econometrics Journal, 2004.
Discrete Choice Models, W. Greene, (a survey of discrete choice models), Palgrave Handbook of Applied Econometrics, 2009.
Functional Form and Heterogeneity in Models for Count Data, W. Greene (a survey of models for count data); Foundations and Trends in Econometrics, 2007.
Modeling Ordered Choices, Greene, W. and Hensher, D., Cambridge University Press, 2010.
Applications: Health Econometrics
Left click to open. Right click to download. (These are PDF files.)
Askildsen, J., Baltagi, B., and T. Holmas (2003): Wage Policy in the Health Care Sector: A Panel Data Analysis of Nurses Labour Supply, Health Economics, 12, 705-719. (semiparametric
Au and Lorgelly: Anchoring Vignettes
Bago d'Uva and Jones: Latent Class Health Care Models
Christensen and Kallestrup-Lamb: Duration Models for Retirement
Contoyannis, Rice, Jones: Dynamic Ordered Choice Model of Health Satisfaction
Evans and Schwab: Recursive Bivariate Probit Model and an Exploration of Selectivity
Finkelstein et al.: Oregon Health Insurance Experiment
Frolich: Survey of Nonparametric Binary Choice Models
Gannon: Dynamic Probit Model
Greene: Convenient Estimators for a Panel Data Probit Model
Gregory and Deb: Does SNAP Improve Your Health?
Kerkhofs and Lindeboom: Dynamic Panel Data Model Health and Labor Market
Lagarde: Latent Class Logit Analysis of Infant Care
Lindeboom, M., Portrait, F., and G. van den Berg (2002): An Econometric Analysis of the Mental-Health Effects of Major Events in the Life of Older Individuals, Health Economics, 11, 505-520. (Model for attrition)
Laporte: Quantile Regression
Johnston, Schurer and Shields: Dynamic Ordered Choice Model
Jones and Schurer: Dynamic Ordered Choice Model
Petrin and Train: A Control Function Approach to Endogeneity in Consumer Choice Models
Riphahn, Wambach, Million: Mixed Poisson Models for Health Care Utilization
Scott et al.: Recursive Bivariate Probit Analysis of Quality of Diabetes Care
Tamm, M., Tauchmann, H., Wasem, J., Gress, S. (2007): Elasticities of Market Shares and Social Health Insurance Choice in Germany: A Dynamic Panel Data Approach, Health Economics, 16, 243-246. (Logit model for aggregate shares. GMM estimation)
Van Ophem: Extension of Winkelmann Hurdle Model
Winkelmann: Econometric Exploration of Count Models of Health Care
Applications: Environment and Energy
Left click to open. Right click to download. (These are PDF files.)
C. Di Maria, S. Ferreira, and E. Lazarova. Shedding light on the light bulb puzzle: the role of attitudes and perceptions
in the adoption of energy efficient light bulbs. Scottish Journal of Political Economy, 57(1):48-67, 2010.
[CFL light bulb adoption. Binary choice modeling.] (PDF)
Carl Christian Michelsen and Reinhard Madlener. Homeowners' preferences for adopting innovative residential heating system:
A discrete choice analysis for Germany. Energy Economics, 34:1271-1283, 2012.
[Multinomial logit model for heating system adoption.] (PDF)
Faust, A. and A. Baranzini, The Economic Performance of Swiss Drinking Water Utilities, Journal of Productivity Analysis,
41, 2014, pp. 383-397. (PDF)
J.C. Whitehead and T.L. Cherry. Willingness to pay for a green energy program: a comparison of ex-ante and ex-post
hypothetical bias mitigation approaches. Resource and Energy Economics, 29(4):247-261, 2007.
[Willingness to pay for green energy. Multinomial logit.] (PDF)
R. Scarpa and K. Willis. Willingness-to-pay for renewable energy: Primary and discretionary choice of British households'
for micro-generation technologies. Energy Economics, 32(1):129-136, 2010.
[Mixed logit, demand for renewable energy.] (PDF)
Nada Wasi and Richard T. Carson. The influence of rebate programs on the demand for water heaters: The case of New South
Wales. Energy Economics, 40:645-656, 2013. [Mixed logit, willingness to pay, stated preference, water heating system choice.] (PDF)
Tom Ndebele and Dan Marsh. Environmental attitude and the demand for green electricity in the context of supplier choice:
A case study of the New Zealand retail electricity. Working paper, Department of Economics, Waikato Management School,
University of Waikato, 2014. [Latent class MNL, willingness to pay for green energy.] (PDF)
To set the stage for the main subjects of the course, we will briefly review some descriptive tools and the linear regression model. We will lay the groundwork for looking at nonlinear panel data models by examining the fixed and random effects linear models. We will then study four models that comprise the foundation for discrete choice modeling:
1. The fundamental model of binary choice (and a number of variants);
2. Models for ordered choices;
3. The Poisson regression model for count data;
4. The fundamental model for multinomial choice, the multinomial logit model.
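For orientation, the four models can be written in their usual textbook forms. The notation here is an assumption of this sketch rather than the notation of the class notes: x denotes covariates, beta the coefficients, F a distribution function (the normal CDF for probit, the logistic CDF for logit), and mu_j the threshold parameters of the ordered model, with mu_{-1} = -infinity and mu_J = +infinity.

```latex
\begin{align*}
\text{Binary choice:}\quad
  & \Pr(y_i = 1 \mid \mathbf{x}_i) = F(\boldsymbol{\beta}'\mathbf{x}_i),
    \qquad F = \Phi \text{ (probit) or } \Lambda \text{ (logit)} \\
\text{Ordered choice:}\quad
  & \Pr(y_i = j \mid \mathbf{x}_i)
    = F(\mu_j - \boldsymbol{\beta}'\mathbf{x}_i)
    - F(\mu_{j-1} - \boldsymbol{\beta}'\mathbf{x}_i),
    \qquad j = 0, 1, \dots, J \\
\text{Poisson regression:}\quad
  & \Pr(y_i = k \mid \mathbf{x}_i)
    = \frac{e^{-\lambda_i}\lambda_i^{k}}{k!},
    \qquad \lambda_i = \exp(\boldsymbol{\beta}'\mathbf{x}_i),
    \quad k = 0, 1, 2, \dots \\
\text{Multinomial logit:}\quad
  & \Pr(y_i = j \mid \mathbf{x}_{i1}, \dots, \mathbf{x}_{iJ})
    = \frac{\exp(\boldsymbol{\beta}'\mathbf{x}_{ij})}
           {\sum_{m=1}^{J} \exp(\boldsymbol{\beta}'\mathbf{x}_{im})}
\end{align*}
```

In each case the parameters are typically estimated by maximum likelihood, which is why the binary choice model serves as the building block for the others.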
The course will consist of discussions of theoretical material and examinations of empirical studies that appear in the modern literature. Laboratory sessions will apply the techniques to live data sets. In some cases, time will be devoted to topics, discussions and laboratory work on student projects. Discussions will cover the topics listed below. Lab sessions will apply the techniques discussed in the preceding sessions. Practicals will consist of directed exercises and student assignments to be completed singly or in groups.
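As a rough illustration of the kind of estimation and prediction exercise done in the laboratory sessions, the following minimal sketch fits a binary logit model by maximum likelihood using Newton's method and then predicts outcomes. It is written in Python rather than NLOGIT, uses simulated data rather than any of the course data sets, and all names in it (fit_logit, TRUE_A, TRUE_B, and so on) are hypothetical choices made for this example.

```python
import math
import random


def logistic(z):
    """Logistic CDF, Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood for Pr(y=1|x) = Lambda(a + b*x) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            p = logistic(a + b * xi)
            w = p * (1.0 - p)
            g_a += yi - p            # score with respect to a
            g_b += (yi - p) * xi     # score with respect to b
            h_aa += w                # entries of the negative Hessian
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve (negative Hessian) * step = score for two parameters.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


# Simulated data; the "true" values below are illustrative assumptions.
random.seed(12345)
TRUE_A, TRUE_B = -0.5, 1.0
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if random.random() < logistic(TRUE_A + TRUE_B * xi) else 0 for xi in x]

a_hat, b_hat = fit_logit(x, y)

# Predict y = 1 when the fitted probability exceeds 0.5, then score the predictions.
predicted = [1 if logistic(a_hat + b_hat * xi) > 0.5 else 0 for xi in x]
hit_rate = sum(p == yi for p, yi in zip(predicted, y)) / len(y)

print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}, hit rate = {hit_rate:.3f}")
```

The 0.5 cutoff used for prediction is a common convention, not a requirement of the model; with unbalanced outcomes a different threshold may be more informative.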
For the materials below, left click to open, right click to download.
I. Class Notes: These are Powerpoint slide presentations for use during the class sessions.
(Abbreviated versions of these notes appear here: (UAM-Madrid) (SSPH-Lugano) (GCEP-Georgetown))
Part ME-1-1: Tools and Regression
Part ME-1-2: Bootstrap, Quantile Regression, Stochastic Frontier
Part ME-1-3: Panel Data Models
Part ME-2-2: Nonlinear Effects Models, Endogeneity
Part ME-2-3: Panel Data Models for Binary Choice
Part ME-3-2: Models for Count Data
Part ME-3-3: Multinomial Logit
Part ME-4-1: Nested Logit, Multinomial Probit
Part ME-4-2: Latent Class Models
Part ME-5-1: Stated Preference Data, Hybrid Choice Models
Part ME-5-2: Discrete Choice Models for Spatially Correlated Data
Part ME-5-3: Multinomial Choice Modeling with Aggregate Share Data
II. Exercises: Exercises and Practicals for Discrete Choice Modeling.
A. Scripted Exercises: These are exercises for the student to do with the instructor. Note there are two parts to each, the .pdf file for the assignment and the .lim file that contains the NLOGIT commands.
Tutorial: Executing the Scripts for the Assignments (pptx file)
Assignment 1: Basic Regression (NLOGIT Commands for Assignment 1)
Assignment 1A: Binary Choice (NLOGIT Commands for Assignment 1A)
Assignment 2: Binary Choice: Estimation and Testing, Panel Data (NLOGIT Commands for Assignment 2)
Assignment 3: Binary Choice Modeling with Heterogeneity (NLOGIT Commands for Assignment 3)
Assignment 4: Ordered Choice and Count Data Models (NLOGIT Commands for Assignment 4)
Assignment 5: Multinomial Choice Models (NLOGIT Commands for Assignment 5)
Assignment 6: Stated Preference Model, Student Project: Model for Moral Hazard (NLOGIT Commands for Assignment 6)
Assignment 7: The Linear Regression Model (NLOGIT Commands for Assignment 7)
Assignment 8: Stochastic Frontier Models (NLOGIT Commands for Assignment 8)
[Stata versions] (Quick start guide)
Assignment 1: Basic Regression (Stata Commands for Assignment 1)
Assignment 2: Binary Choice: Estimation and Testing, Panel Data (Stata Commands for Assignment 2)
Assignment 3: Binary Choice Modeling with Heterogeneity (Stata Commands for Assignment 3)
B. Additional Exercises.
1. Binary Choice Models: (Exercise 1-pdf), (Commands-lim), (Data-lpj)
2. Multinomial Choice: (Exercise 2-pdf), (Commands-Part1-lim), (Commands-Part2-lim), (Data-lpj)
3. Advanced Multinomial Choice (Exercise 3 (was exercise 5)-pdf), (Commands-lim), (Multinomial Choice Data-lpj), (SPRP Data-lpj), (Description of SPRP Data)
4. Stated Preference Multinomial Choice (Exercise 4-pdf), (Commands-lim), (Data-lpj)
5. Panel Data Binary Choice (Exercise 5-pdf), (Commands-lim), (Data-lpj)
6. Count Data (Exercise 6-pdf), (Commands-lim), (Data-lpj)
III. Data Sets: The lpj files are NLOGIT project (.lpj) files. Each data set is also provided in the portable comma separated values (csv) format.
American Express Credit Data, 13444 observations (Filename=AmexData) (lpj) (csv)
Small Demonstration Income/Education Data, 14 observations (Filename=IncomeData) (lpj) (csv)
Travel Mode Choice Data, 840 observations (Filename=clogit) (lpj) (csv)
Brand Choices Data, 12800 observations (Filename=brandchoicesSP) (lpj) (csv)
Combined Travel Mode and Brand Choices Data, 12800 observations (Filename=mnc) (lpj) (csv)
Health Economics, Small Subset of GSOEP Data, 2039 observations (Filename=HealthData) (lpj) (csv)
Manufacturing Innovation Data, 6350 observations (Filename=panelprobit) (lpj) (csv)
Multinomial Choice Stated Preference Experiment, 9408 observations (Filename=sprp) (lpj) (csv)
Labor Supply Data, 753 observations (Filename=labor) (lpj) (csv)
Health Care Panel Data, 27326 observations (Filename=healthcare) (lpj) (csv)
Southern California Fishing Data (long form), 4728 observations (Filename=fishing_data(NL)) (lpj) (csv)
Rand Study Data, 19339 observations (Filename=rand_data_2012) (lpj) (csv)
General Practitioner Visits Data, 342 observations (Filename=gpvisits_panel_shp_data_2011) (lpj) (csv)
Kenneth Train California Public Utility Survey, 17232 rows, 4308 obs. (Filename=TrainCalUtilitySurvey) (lpj) (csv)
Stated Preference Data on Commuting Choices (lpj) (csv)
Panel Data from NLS (Cornwell and Rupert) (lpj) (csv)
Panel Data for Production Modeling (Spanish Dairy Data) (lpj) (csv)
Panel Data for Costs and Production (Swiss Railroads) (lpj) (csv)
Panel Data for Production and Cost (US Airlines Data) (lpj) (csv)
IV. NLOGIT Software: This section contains a brief introduction and two manuals. The short introduction is a getting-started guide. The LIMDEP manual explains the basics of using LIMDEP and NLOGIT (LIMDEP is embedded in NLOGIT). The NLOGIT manual describes how to use the special features for discrete choice modeling with NLOGIT (as well as some additional material on other discrete choice models that are also contained in LIMDEP). The setup file contains an installation kit for installing a copy of NLOGIT made specially for this course on your own computer. You should download the setup file to your own computer and execute it there, rather than launching it from your web browser.
These are Powerpoint slide presentations that explain using NLOGIT and how to do the assignments.
Lab 5: Useful Tools: Simulation, Partial Effects, Bootstrapping
Lab 6: Random Parameters and Latent Class Models
Lab 7: Multinomial Choice Models
These are short manuals that document how to use the program:
Quickstart Introduction to NLOGIT (Command script file to use with Quickstart)
This is the installation kit for installing the program. Download this file to your computer before you use it to install NLOGIT on your computer.
NLOGIT Software Setup for Installing NLOGIT on Your Computer