# Preface

The most effective way to learn statistics is to engage actively in statistical analysis. This idea drives this casebook. An introductory course in statistics often fails to give students an idea of the excitement of statistics and its relevance in the present-day world. A considerable amount of material has to be covered, with no time left over for discussion of real-life examples. Students often come away with a blurred impression of formulas and a few words such as "mean," "standard deviation," and "regression." The point that statistical analysis is vital for arriving at conclusions in a sensible and rational manner is often neglected. This casebook is an attempt to remedy this deficiency by providing an active resource for classroom use. The book is based on cases that we have developed through almost fifty cumulative years of teaching the introductory statistics course at New York University.

We have attempted in this casebook to present cases representing situations and contexts from a diverse set of fields, where statistical analysis is required to arrive at a meaningful conclusion. Topics covered include eruptions of the "Old Faithful" geyser, the issuance of international adoption visas, the space shuttle Challenger tragedy, patterns in the Dow Jones Industrial Average and Standard and Poor's index, health expenditures of states, random drug and disease testing, baseball free agency, performance of NBA guards, energy consumption of a household, grape yields in a vineyard, and the birth and nursing of a beluga whale calf. All of the data sets are real and complete.

Each case is motivated by a question that needs to be answered, and full background material is presented. The statistical analysis flows naturally from the question. The discussion given in the cases attempts to demonstrate the logic of the analysis and emphasize the interactive and iterative nature of the task. The aim of these cases is to show the reader by example that statistical analysis clarifies and throws light on a complex situation. It enables one to draw useful conclusions. Besides the final conclusion, much is learned about the problem during the analysis. The journey, as well as the arrival, matters.

We hope that the reader, in addition to investigating the specific questions raised by a particular case, will also develop a feel for the kind of approach to data analysis that is likely to be fruitful in general. As statistical software has become generally available, the possibilities for superficial and inadequate analysis of data have increased correspondingly. However, if a data analyst is trained to develop a system of widely applicable general principles for performing a data analysis, it is much more likely that she will analyze future data sets in a reasonable way. It is our hope that this casebook can be helpful in highlighting the kinds of questions that need to be answered if such a system is being used.

### The casebook and your introductory course topics

The material presented in this book can be analyzed using techniques that are almost always taught in an introductory course. The cases should be used concurrently with the statistical methods discussed in a course (or an independent study of introductory statistics). The cases are grouped by broad statistical topic and arranged in the sequence that is conventionally followed in a beginning course. The cases can be done in or out of sequence. There is, however, a certain sense of progression in the material. Concepts from data analysis are used in cases dealing with applied probability and statistical inference. Cases on regression analysis draw heavily on material covered in data analysis and inference (indeed, general issues of data analysis pervade almost all of the cases).

### Using the different kinds of cases

The cases presented in the book fall into three categories. Cases in the first category are analyzed completely. These are meant to be models or paradigms for the system of general principles mentioned earlier. We have presented the approach that, to us, appears most simple and direct. There may be other ways of analyzing the same data. We feel confident, however, that the conclusions reached by a valid alternative analysis will be very similar to the ones that we reach. There are many paths to the summit! We would like to hear from any reader who finds a discrepancy between our conclusions and theirs and also welcome suggestions for effective analyses different from the ones that we have presented.

The second category of cases in the book consists of ones that are partially analyzed, in that no analysis is actually performed, but suggestions are made that guide the reader along the path to an effective analysis. This will get the reader gradually involved in direct analysis of the data. In the third category of cases, we present the problem, pose questions, and provide the data. No analysis is suggested; the analyst is on her own. Readers who have worked through the first two categories of cases should find the cases in the third category within their capabilities. The hope of the authors is that, having worked through the cases in the book, the reader will be well prepared to analyze her own data in a real-life setting.

All cases are organized in the same format. Each case describes the background of the data and poses the questions that are to be answered by statistical analysis. In the cases that are completely analyzed, the analysis is described in detail, and the actual sequence of steps followed is noted. The cases are written in an informal style, so that the reader can have the sense of examining the data (using a statistical package) on a computer, with a statistically knowledgeable person looking over their shoulder. Finally, the conclusions of the analysis are summarized in a clear, readable, and non-technical form. Presentation of statistical findings with clarity is essential if they are to be taken seriously. The authors feel that the importance of presenting the conclusions of a statistical analysis effectively is seldom emphasized, due to the lack of instructive examples accessible to the student. In reality, the inability to communicate effectively often leads to the unfortunate circumstance that little or no attention is paid to the statistical analysis and the findings.

### Using the computer

The output, and most plots that appear in the cases (with some editing), were generated using the PC package STATISTIX 4.0 (a product of Analytical Software). Virtually all of the analyses that we present, however, can be performed using commonly available statistical software. We believe that a statistical package that is not capable of performing the kinds of analyses we describe has serious shortcomings and should be abandoned for a more adequate one.

The casebook probably contains more material than can be covered in a single course. This is particularly true of the material on regression analysis. We have provided the extra material in the hope that it will be used in teaching a course in applied regression analysis (the second course). The availability of this case material should considerably ease the learning and understanding of the concepts in regression analysis.

Besides students of an introductory statistics course, this book is likely to prove useful to anyone who is interested in learning how to apply statistical analysis to data encountered in practice. For people who are well versed only in mathematical statistics, the book will provide a useful supplement to theory. Statistics is an applied discipline and justifies itself only in useful application. This casebook is intended to be a pioneer in the line of statistical texts, in that it is designed for the beginner, with this vision and motivation. We would be interested to hear from our readers concerning how well we have succeeded in this enterprise. The authors can be reached electronically on the Internet at the addresses schatter@stern.nyu.edu, handcock@stat.washington.edu, and jsimonof@stern.nyu.edu, respectively.

We are always on the lookout for novel applications and interesting data and welcome any submissions in this area from our readers. To foster the exchange of ideas and material we have provided Internet access to a growing archive of supplementary cases similar in style to those in this book. The archive can be accessed by gopher or the World Wide Web (WWW) using a WWW browser (e.g., Netscape). We also will maintain a list of useful suggestions sent to us and updated information about the cases in the book. Information on how to access this archive is given in the Appendix.

We would like to thank the many hundreds of introductory statistics students at New York University who have been (unwitting) guinea pigs in the development of these cases. One of the joys of discussing real data with students is that they bring their own backgrounds and experiences to the discussion, with the result that the teacher learns as much as the students do. We would like to thank David Ahlstrom, Orley Ashenfelter, Mark and Barbi Barnhill, Charlie Himmelberg, Jeanne McLaughlin-Russell, Martina Morris, Sundar Polavaram, Tom Pugel and Brooke Squire for providing data sets that are used in the cases. We would also like to thank our colleagues at New York University for teaching from a draft version of this book, and contributing to the final version. Special thanks go to Halina Frydman, Joel Owen and Donald Richter.

SAMPRIT CHATTERJEE
MARK S. HANDCOCK
JEFFREY S. SIMONOFF

Almaty, Kazakhstan
Croton-on-Hudson, New York