Comment on the case:

Prediction of the time interval between 'Old Faithful' eruptions

Subject: Year-to-year stability of the eruption process
Keywords: lagged variables
Date: Friday, 25 November, 1994
From: Donald Richter (drichter@stern.nyu.edu)
Organization: Department of Statistics and Operations Research, New York University
The 1978 and 1979 data differ enough that, if you were trying to predict eruptions during September 1979, you would do well not to pool the two years. This in turn raises a question about the year-to-year stability of the eruption process.

In this Comment, let Y denote time interval to the next eruption, and X denote duration of an eruption. We separate out or unstack the two years of data, letting Y1 denote the 107 values of Y from 1978, Y2 the 115 values of Y from 1979, deriving X1 and X2 from X in a similar manner.
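The unstacking step can be sketched in a few lines of Python. This is only an illustration: the record layout (year, X, Y) and the name `unstack` are assumptions, and the numbers are made-up placeholders rather than the actual 1978 and 1979 measurements.

```python
# Toy records of (year, duration X, interval Y).
# The values are illustrative placeholders, NOT the real Old Faithful data.
records = [
    (1978, 4.4, 78), (1978, 3.9, 74), (1978, 1.7, 47),
    (1979, 4.3, 80), (1979, 1.9, 51),
]

def unstack(rows, year):
    """Return the X and Y columns for one year, preserving the original order."""
    xs = [x for (yr, x, _) in rows if yr == year]
    ys = [y for (yr, _, y) in rows if yr == year]
    return xs, ys

X1, Y1 = unstack(records, 1978)  # first year
X2, Y2 = unstack(records, 1979)  # second year
print(len(Y1), len(Y2))
```

With the real data, the two lengths printed would be the 107 and 115 values mentioned above.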

A scatter plot of Y versus X appears in the case, "Eruptions of the 'Old Faithful' Geyser". Plots of Y1 versus X1 and Y2 versus X2, as shown in the Appendix below, are similar to the plot of Y versus X, but the two new plots display a difference: namely, the vertical spread of points is smaller in the second plot than in the first. That is, the relationship between Y2 and X2 is stronger than that between Y1 and X1.

This can be seen another way. If we regress Y1 on X1, we find R-square = 73.7% and S = 6.68. But if we regress Y2 on X2, we find R-square = 81.8% and S = 5.44, a substantial improvement.
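For readers who want to reproduce the two summary figures, here is a minimal sketch of a one-predictor least-squares fit that returns R-square and S. The function name `simple_ols` and the three-point data set are assumptions made for illustration; the package used for the Appendix output is not shown here.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_square, s)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1.0 - sse / sst         # share of variation explained
    s = math.sqrt(sse / (n - 2))       # standard error of estimate
    return a, b, r_square, s

# Tiny made-up example, only to show the call; not geyser data.
a, b, r_square, s = simple_ols([1, 2, 3], [1, 2, 2])
print(a, b, r_square, s)
```

S is the square root of the residual mean square, which agrees with the Appendix: the MSE of 44.6573 for 1978 corresponds to S = 6.68, and the MSE of 29.6118 for 1979 to S = 5.44.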

Suppose now, as in the case, we introduce lagged variables and drop the first observation of each day. Let the first-order lag of Y2 be denoted by Y2_1, and other lags denoted similarly. Using the first year of data only, regressing Y1 on X1, Y1_1, and X1_1 gives a good model with R-square = 79.7% and S = 6.07. Using the data from both years along with the dummy variable YEAR, we can get a better model with R-square = 81.0% and S = 5.67. But using the second year of data only, regressing Y2 on X2 and Y2_1 gives a still better model with R-square = 83.5% and S = 5.18.
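The construction of the lagged variables can be sketched as follows. The sketch assumes each record carries a day label and that records are in time order. The first observation of each day is dropped because it has no same-day predecessor to supply a lag. The name `add_first_order_lags` and the sample rows are hypothetical.

```python
def add_first_order_lags(rows):
    """rows: (day, x, y) tuples in time order.

    Returns (x, y, x_lag1, y_lag1) tuples. The first observation of each
    day is dropped, so a lag never reaches back across a day boundary.
    """
    lagged = []
    prev = None
    for day, x, y in rows:
        if prev is not None and prev[0] == day:
            lagged.append((x, y, prev[1], prev[2]))
        prev = (day, x, y)
    return lagged

# Made-up rows for two days; not the real measurements.
rows = [
    (1, 4.4, 78), (1, 3.9, 74), (1, 4.0, 68),
    (2, 1.7, 47), (2, 4.5, 83),
]
lagged = add_first_order_lags(rows)
print(len(rows), len(lagged))
```

Each day loses exactly one row, so the lagged regressions are fit to fewer observations than the unlagged ones.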

In the case, a model is derived from the 1978 and 1979 data and applied to the year 1985. The tacit assumption is made that the data are stable from year to year. This assumption may be true only approximately.
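One rough way to see how far the two years disagree is to evaluate the two single-predictor fitted lines from the Appendix at the same duration. The coefficients below are copied from the regressions of Y1 on X1 and Y2 on X2; the function names and the chosen durations are assumptions for illustration only.

```python
# Coefficients copied from the Appendix regressions of Y1 on X1 and Y2 on X2.
FIT_1978 = (33.8282, 10.7410)   # (constant, slope)
FIT_1979 = (33.2572, 10.2512)

def predict(fit, duration):
    """Predicted interval to the next eruption for a given duration."""
    constant, slope = fit
    return constant + slope * duration

def year_gap(duration):
    """1978 prediction minus 1979 prediction at the same duration."""
    return predict(FIT_1978, duration) - predict(FIT_1979, duration)

# Durations chosen arbitrarily for illustration.
for duration in (2.0, 4.0):
    print(duration, round(year_gap(duration), 2))
```

The gap is positive at both durations, meaning the 1978 line predicts longer intervals. This is in line with the negative YEAR coefficient in the pooled model, assuming the dummy marks 1979.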

APPENDIX


                        Scatter Plot of Y1 vs X1
Y1        +---------------------------------------------------+
      100 +                                                   |
          |                                                   |
          |                                           ++      |
          |                                  + +  +           |
          |                          +  + +    +    +  +      |
          |                           +  +  ++ 2  2++ 3       |
       80 +                       +   +  +  +  +   ++ 2  +    |
          |                   +    +     +   + 2  +   +   +   |
          |                          ++  23 2+  + ++  2+      |
          |                             ++   + 2              |
          |          +  +                +   +    +           |
          |                                                   |
       60 +   ++           +              +                   |
          | 2  +              + +                             |
          | + 2+ 2   +                                        |
          | 2 23 2   +                                        |
          |      +                                            |
          |   3                                               |
       40 +                                                   |
          ++---------+---------+---------+---------+---------++
          1.6       2.3       3.0       3.7       4.4       5.1
                                   X1



                        Scatter Plot of Y2 vs X2
Y2        +---------------------------------------------------+
      100 +                                                   |
          |                                                   |
          |                                         +         |
          |                                   +    2    +     |
          |                                  +    +2          |
          |                              +    2+ 2+22    +    |
       80 +                                  2+2 + 23   2     |
          |                               2 222  +2+   +      |
          |                             2 2  +35  +2          |
          |                               3   +  2+           |
          |              +        +         +                 |
          |     ++  +             +                           |
       60 +   +    +    2    2                                |
          |    ++   2                                         |
          |    22  3            +                             |
          |    ++2  3+                                        |
          |      + +                                          |
          |    + +                                            |
       40 +                                                   |
          ++---------+---------+---------+---------+---------++
          1.5       2.3       3.1       3.9       4.7       5.5
                                   X2



LINEAR REGRESSION OF Y1  

PREDICTOR
VARIABLES    COEFFICIENT    STD ERROR     STUDENT'S T       P
---------    -----------    ---------     -----------    ------
CONSTANT        33.8282       2.26182        14.96       0.0000
X1              10.7410       0.62634        17.15       0.0000

R-SQUARED           0.7369      RESIDUAL MEAN SQUARE (MSE)    44.6573
ADJUSTED R-SQUARED  0.7344      STANDARD ERROR OF ESTIMATE    6.68261



LINEAR REGRESSION OF Y2  

PREDICTOR
VARIABLES    COEFFICIENT    STD ERROR     STUDENT'S T       P
---------    -----------    ---------     -----------    ------
CONSTANT        33.2572       1.75087        18.99       0.0000
X2              10.2512       0.45493        22.53       0.0000

R-SQUARED           0.8180      RESIDUAL MEAN SQUARE (MSE)    29.6118
ADJUSTED R-SQUARED  0.8164      STANDARD ERROR OF ESTIMATE    5.44167



LINEAR REGRESSION OF Y1  

PREDICTOR
VARIABLES    COEFFICIENT    STD ERROR     STUDENT'S T       P       VIF
---------    -----------    ---------     -----------    ------    -----
CONSTANT        58.1498       7.12580         8.16       0.0000
X1              8.82446       0.84058        10.50       0.0000      1.8
Y1_1           -0.24920       0.06800        -3.66       0.0004      1.8

R-SQUARED           0.7711      RESIDUAL MEAN SQUARE (MSE)    41.1431
ADJUSTED R-SQUARED  0.7663      STANDARD ERROR OF ESTIMATE    6.41429



LINEAR REGRESSION OF Y1  

PREDICTOR
VARIABLES    COEFFICIENT    STD ERROR     STUDENT'S T       P       VIF
---------    -----------    ---------     -----------    ------    -----
CONSTANT        64.1548       6.96458         9.21       0.0000
X1              8.85425       0.79593        11.12       0.0000      1.8
Y1_1           -0.53582       0.10461        -5.12       0.0000      4.9
X1_1            4.07700       1.17284         3.48       0.0008      4.1

R-SQUARED           0.7969      RESIDUAL MEAN SQUARE (MSE)    36.8845
ADJUSTED R-SQUARED  0.7905      STANDARD ERROR OF ESTIMATE    6.07326



LINEAR REGRESSION OF Y  

PREDICTOR
VARIABLES    COEFFICIENT    STD ERROR     STUDENT'S T       P       VIF
---------    -----------    ---------     -----------    ------    -----
CONSTANT        55.8712       4.22566        13.22       0.0000
X               9.31683       0.47029        19.81       0.0000      1.7
YEAR           -2.99949       0.81605        -3.68       0.0003      1.1
Y_1            -0.37154       0.07021        -5.29       0.0000      5.2
X_1             2.65669       0.79853         3.33       0.0010      4.8

R-SQUARED           0.8103      RESIDUAL MEAN SQUARE (MSE)    32.2007
ADJUSTED R-SQUARED  0.8065      STANDARD ERROR OF ESTIMATE    5.67457



LINEAR REGRESSION OF Y2  

PREDICTOR
VARIABLES    COEFFICIENT    STD ERROR     STUDENT'S T       P       VIF
---------    -----------    ---------     -----------    ------    -----
CONSTANT        44.5381       4.92005         9.05       0.0000
X2              9.39219       0.55085        17.05       0.0000      1.5
Y2_1           -0.12033       0.04826        -2.49       0.0142      1.5

R-SQUARED           0.8348      RESIDUAL MEAN SQUARE (MSE)    26.7848
ADJUSTED R-SQUARED  0.8316      STANDARD ERROR OF ESTIMATE    5.17541