We have attempted in this casebook to present cases representing situations and contexts from a diverse set of fields, in which statistical analysis is required to arrive at a meaningful conclusion. Topics covered include eruptions of the "Old Faithful" geyser, the issuance of international adoption visas, the space shuttle Challenger tragedy, patterns in the Dow Jones Industrial Average and Standard and Poor's index, health expenditures of states, random drug and disease testing, baseball free agency, performance of NBA guards, energy consumption of a household, grape yields in a vineyard, and the birth and nursing of a beluga whale calf. All of the data sets are real and complete.
Each case is motivated by a question that needs to be answered, and full background material is presented. The statistical analysis flows naturally from the question. The discussion given in the cases attempts to demonstrate the logic of the analysis and emphasize the interactive and iterative nature of the task. The aim of these cases is to show the reader by example that statistical analysis clarifies and throws light on a complex situation. It enables one to draw useful conclusions. Besides the final conclusion, much is learned about the problem during the analysis. The journey, as well as the arrival, matters.
In addition to investigating the specific questions raised by a particular case, we hope that the reader will also develop a feel for the kind of approach to data analysis that is likely to be fruitful in general. As statistical software has become generally available, the opportunities for superficial and inadequate analysis of data have increased correspondingly. However, if a data analyst is trained to develop a system of widely applicable general principles for performing a data analysis, it is much more likely that she will analyze future data sets in a reasonable way. It is our hope that this casebook can be helpful in highlighting the kinds of questions that need to be answered when such a system is used.
The cases in the book fall into three categories. The first category consists of cases that are completely analyzed, with the analysis described in detail. The second category consists of cases that are only partially analyzed: no analysis is actually performed, but suggestions are made that guide the reader along the path to an effective analysis. This will get the reader gradually involved in direct analysis of the data. In the third category of cases, we present the problem, pose questions, and provide the data. No analysis is suggested; the analyst is on her own. Readers who have worked through the first two categories of cases should find the cases in the third category within their capabilities. Our hope is that, having worked through the cases in the book, the reader will be well prepared to analyze her own data in a real-life setting.
Each case is organized in the same format: it describes the background of the data and poses the questions that are to be answered by statistical analysis. In the cases that are completely analyzed, the analysis is described in detail, and the actual sequence of steps followed is noted. The cases are written in an informal style, so that the reader can have the sense of examining the data (using a statistical package) on a computer, with a statistically knowledgeable person looking over her shoulder. Finally, the conclusions of the analysis are summarized in a clear, readable, and non-technical form. Presentation of statistical findings with clarity is essential if they are to be taken seriously. The authors feel that the importance of presenting the conclusions of a statistical analysis effectively is not often emphasized, owing to the lack of instructive examples accessible to the student. In reality, the inability to communicate effectively often leads to the unfortunate circumstance that little or no attention is paid to the statistical analysis and the findings.
The casebook probably contains more material than can be covered in a single course. This is particularly true of the material on regression analysis. We have provided the extra material in the hope that it will be used in the teaching of a course in applied regression analysis (the second course). Availability of this case material should considerably ease the learning and understanding of the concepts in regression analysis.
Besides students of an introductory statistics course, this book is likely to prove useful to anyone who is interested in learning how to apply statistical analysis to data encountered in practice. For people who are well versed only in mathematical statistics, the book will provide a useful supplement to theory. Statistics is an applied discipline and justifies itself only in useful application. This casebook is intended to be a pioneer in the line of statistical texts, in that it is designed for the beginner, with this vision and motivation. We would be interested to hear from our readers concerning how well we have succeeded in this enterprise. The authors can be reached electronically on the Internet at the addresses schatter@stern.nyu.edu, handcock@stat.washington.edu, and jsimonof@stern.nyu.edu, respectively.
We are always on the lookout for novel applications and interesting data and welcome any submissions in this area from our readers. To foster the exchange of ideas and material we have provided Internet access to a growing archive of supplementary cases similar in style to those in this book. The archive can be accessed by gopher or the World Wide Web (WWW) using a WWW browser (e.g., Netscape). We also will maintain a list of useful suggestions sent to us and updated information about the cases in the book. Information on how to access this archive is given in the Appendix.
We would like to thank the many hundreds of introductory statistics students at New York University who have been (unwitting) guinea pigs in the development of these cases. One of the joys of discussing real data with students is that they bring their own backgrounds and experiences to the discussion, with the result that the teacher learns as much as the students do. We would like to thank David Ahlstrom, Orley Ashenfelter, Mark and Barbi Barnhill, Charlie Himmelberg, Jeanne McLaughlin-Russell, Martina Morris, Sundar Polavaram, Tom Pugel, and Brooke Squire for providing data sets that are used in the cases. We would also like to thank our colleagues at New York University for teaching from a draft version of this book and contributing to the final version. Special thanks go to Halina Frydman, Joel Owen, and Donald Richter.
SAMPRIT CHATTERJEE
MARK S. HANDCOCK
JEFFREY S. SIMONOFF