How to use this book

In this casebook you will find examples, drawn from many fields, in which statistical analysis is needed to answer a particular question. Almost all introductory statistics textbooks concentrate on explaining the methodology, without paying much attention to applications. At the end of such a course a student comes away with a set of tools (techniques), but without a precise idea of how and when they are to be applied. As a consequence, the introductory course often appears to be dull, irrelevant, and not worth remembering. This casebook is an attempt to remedy this problem.

The most effective way to use these cases is to study them concurrently with the statistical methodology being learned. To make this easier, we have grouped the cases by broad statistical topic, and we note the statistical concepts and techniques relevant to each group. The topics are arranged in the sequence that we follow at NYU, which is fairly typical of introductory courses.

The cases are also classified by how much additional analysis the reader must supply to resolve the case and complete the study. Some cases are analyzed fully and are meant to serve as examples, or paradigms, of effective analysis; these are marked below by an F. In other cases guidance is provided, in that the appropriate analysis is indicated but not carried out; these are marked below by a G, and the reader is expected to carry out the analysis. In the third type of case, marked below by an O, no analysis is provided: only a description of the data is given, along with background information on the context of the problem and the nature of the question at issue. The data themselves are provided in a corresponding computer file, and the reader is asked to carry out the relevant analysis in an open-ended fashion. We believe that after working through the F and G cases, a reader will be able to analyze the O cases successfully.

The best way to use the F cases is to carry out the analysis described in the text interactively on a computer, to verify statements and conclusions made during the analysis. This direct involvement with the analysis will generate the ability and confidence to analyze cases marked G and O.

The data for each of the cases are contained on the diskette accompanying this book. The files are in text (ASCII) format and can be imported and analyzed using virtually any standard software package. A brief description of each file is given in the appendix "Descriptions of the data files" and should be consulted as a preliminary step in each analysis.
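
As an illustration of how little is needed to import a plain-text data file, here is a minimal sketch in Python. It is not part of the book's materials. It assumes a whitespace-delimited layout with one observation per line and an optional first line of column names; the actual layout of each file should be checked against its description in the appendix. The function name read_ascii_table, the column names, and the sample values are all hypothetical.

```python
import io
import statistics


def read_ascii_table(source, has_header=True):
    """Read a whitespace-delimited text table into a dict of columns.

    Assumption: one observation per line, fields separated by spaces or
    tabs, and (optionally) a first line of column names. Check the
    description of each file before relying on this layout.
    """
    rows = [line.split() for line in source if line.strip()]
    if not rows:
        return {}
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        # No header line: invent generic column names V1, V2, ...
        names = [f"V{i + 1}" for i in range(len(rows[0]))]
    columns = {name: [] for name in names}
    for row in rows:
        for name, field in zip(names, row):
            try:
                columns[name].append(float(field))
            except ValueError:
                columns[name].append(field)  # keep non-numeric fields as text
    return columns


# Made-up values standing in for a data file; not taken from the book.
sample = io.StringIO("duration interval\n4.4 78\n3.9 74\n1.7 50\n")
data = read_ascii_table(sample)
print(statistics.mean(data["interval"]))
```

To read a file copied from the diskette, pass an open file object in place of the in-memory sample, for example with open(path) as f: data = read_ascii_table(f), where path is wherever the file was saved.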

The cases are listed below, grouped by their relevant statistical concepts and classified by the degree of analysis already presented in the case. Cases marked with the symbol "->" are F cases that are listed officially under a more advanced area (e.g., Statistical Inference) but that also contain a good deal of material appropriate for the area in which the symbol appears (e.g., Data Analysis). For example, the case "The flight of the space shuttle Challenger" appears in the "Data Analysis" section with the symbol "->" because it contains a good deal of material appropriate for "Data Analysis," although it is "officially" an F case under "Applied Probability."

DATA ANALYSIS

Association between two variables; Comparison of location and variation in sub-groups; Data summary; Examination of univariate data; Identifying subgroups; Transformations; Variation over time
   F Eruptions of the "Old Faithful" geyser                              5
   F International adoption rates                                       13
   G The performance of stock mutual funds                              21
   O Predicting the sales and airplay of popular music                  23
   O Another look at the "Old Faithful" geyser and adoption visas       24
   F Productivity versus quality in the assembly plant                  25
   O Health care spending in the United States                          32

Applied Probability

  -> The flight of the space shuttle Challenger                         33

Statistical Inference

  -> Volume and weight from a vineyard harvest                          86
  -> Baseball free agency: do teams get what they pay for?             101
  -> Reporting of sexual partners by men and women                     113
  -> The comparative volatility of stock exchanges                     123

Analysis involving Regression

  -> Emergency calls to the New York Auto Club                         145
  -> Purchasing power parity: is it true?                              153
  -> PCB contamination of U.S. bays and estuaries                      164
  -> Electricity usage, temperature and occupancy                      177
  -> The effectiveness of National Basketball Association guards       201
  -> The possibility of voting fraud in an election                    213
  -> The birth of a beluga whale                                       256

APPLIED PROBABILITY

Binomial random variable; Central Limit Theorem; Conditional probability; Definitions of probability; Extra-Binomial variation; Gaussian (normal) distribution
   F The flight of the space shuttle Challenger                         33
   F Random drug and disease testing                                    37
   G Amniocentesis, blood tests, and Down's syndrome                    41
   G Perceptions of the New York City subway system: 
     safety and cleanliness                                             43
   O Perceptions of the New York City subway system: other issues       47
   F The Central Limit Theorem for census data                          48
   F The sampling distribution for the median                           59
   F The sampling distribution for the standard deviation               67
   F Racial imbalance in Nassau County public schools                   72

Statistical Inference

  -> Air bags and types of automobiles                                 134
  -> A further look at the reporting of sexual partners 
     by men and women                                                  137

STATISTICAL INFERENCE

Comparison of Binomial proportions; Confidence interval; Contingency tables; Hypothesis testing; Paired sample comparisons; Prediction interval; t-statistic; Tests for difference in location; Tests of independence; Two-sample (unpaired sample) comparisons; z-statistic
   F The return on stocks in the Over the Counter market                82
   F Volume and weight from a vineyard harvest                          86
   O The performance of stock mutual funds compared to the 
     market as a whole                                                  99
   O The return on stocks in the New York Stock Exchange               100
   F Baseball free agency: do teams get what they pay for?             101
   O Baseball free agency: another look at player effectiveness        112
   F Reporting of sexual partners by men and women                     113
   F The comparative volatility of stock exchanges                     123
   G Validity of t-testing for the sexual partners reported 
     by men and women                                                  129
   O Subgroups in the "Old Faithful" geyser data                       130
   O Mortgage rates for different types of mortgages                   131
   F Condom use and the prevention of AIDS                             132
   F Air bags and types of automobiles                                 134
   F A further look at the reporting of sexual partners 
     by men and women                                                  137
   O Betting on professional football -- can you beat the bookies?     142
   O Formal investigation of the perceptions of the New York City
     subway system                                                     144

Analysis involving Regression

  -> Emergency calls to the New York Auto Club                         145
  -> PCB contamination of U.S. bays and estuaries                      164

ANALYSIS INVOLVING REGRESSION

Indicator variables; Lagged variables; Model selection; Multiple regression; Prediction; Regression coefficients; Regression diagnostics; Residuals; Semilog model; Simple regression; Time series data; Transformation; Unusual observations; Weighted regression
   F Emergency calls to the New York Auto Club                         145
   F Purchasing power parity: is it true?                              153
   F PCB contamination of U.S. bays and estuaries                      164
   F Electricity usage, temperature and occupancy                      177
   G Long and short term performance of stock mutual funds             185
   F Estimating rates of return of investments                         186
   O Predicting international adoption visas from earlier years        189
   O The pattern in employment rates over time                         190
   F Estimating a demand function                                      191
   F The effectiveness of National Basketball Association guards       201
   F The possibility of voting fraud in an election                    213
   G Purchasing power parity and high inflation countries              222
   G Prediction of the time interval between "Old Faithful" eruptions  224
   G The success of teams in the National Hockey League                227
   G The effectiveness of NBA guards: assists per minute               230
   O The effectiveness of NBA guards: beyond points and assists        232
   O Another look at emergency calls to the New York Auto Club         233
   O Subgroups in the electricity consumption data                     234
   O A closer look at productivity and quality in the assembly plant   235
   O Predicting mortgage rates for different types of mortgages        236
   F Better prediction of the yield in a small vineyard                237
   G Incomes of Long Island communities                                247
   G Looking more deeply into the average weight per lug 
     from a vineyard                                                   250
   O Predicting incomes of Long Island communities                     252
   G Prediction of yields from partial harvesting in a small vineyard  253
   F The birth of a beluga whale                                       256
   F Estimating a demand function -- it's about time                   276
   G A comparison of the Dow Jones Industrial Average and 
     the S & P 500 index                                               283
   O Further investigation of the DJIA and the S & P 500 index         285

The preceding material consists of excerpts from A CASEBOOK FOR A FIRST COURSE IN STATISTICS AND DATA ANALYSIS by Chatterjee, Handcock and Simonoff, copyright 1995 John Wiley & Sons, Inc., and is transmitted by permission of John Wiley & Sons, Inc. You may download this material for use in connection with the CASEBOOK by you or your students but you may not otherwise reproduce, republish or re-transmit it. Please bear in mind that this material, like any information transmitted over the Internet, is subject to change and/or tampering during the transmission process. The authors and John Wiley & Sons, Inc. expressly disclaim any liability for any use or application of this material.