? *******************************************************************
? Part 1.  Modifications of least squares to estimate inefficiency
? *******************************************************************
? a.  Define a production function. We omit the Materials variable
?     for reasons explored in part 5 below.
?
NAMELIST ; X = ONE,LF,LE,LL,LP $
?
?     Fit the production function by least squares, then shift the constant
?     term up so that all points are on or below it.  Then change the sign
?     to make it easy to interpret the residuals as the inefficiencies.
?     The mean inefficiency is 0.418, or about 42%, which seems high. What
?     are the minimum and maximum?
?
REGRESS ; Lhs = LQ ; Rhs = X ; Res = e $
CALC    ; Maxe = Max(e) $
CREATE  ; uicols = Maxe - e $
DSTAT   ; Rhs = uicols $

FRONTIER ; Lhs = LQ ; Rhs = X ; Model = COLS 
         ; Techeff = eucols $
KERNEL   ; Rhs = eucols ; Grid $
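? A check on the COLS computation, sketched under the assumption that
? the technical efficiency FRONTIER saves with Techeff is exp(-u(i)).
? TECOLS and DCOLS are new (hypothetical) names. If the assumption
? holds, DCOLS should be zero, up to rounding, for every firm. The
? minimum and maximum of TECOLS then answer the question above in
? efficiency terms.
CREATE  ; tecols = Exp(-uicols) ; dcols = tecols - eucols $
DSTAT   ; Rhs = tecols,dcols $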

?
?
? b.  Now assume a distribution for u(i) and shift the function up
?     by an estimate of the mean of u(i) instead of by the largest
?     residual.  This is an alternative way to estimate u(i), and it
?     manipulates the same OLS residuals.  It lowers the mean
?     inefficiency to a more plausible 18.7%, but it leaves some of the
?     estimates negative, which is difficult to reconcile with the
?     theory.  This is not a very favorable way to fit the model.
?
?     Modified OLS.  For the exponential model, the mean and the
?     standard deviation of u(i) are both 1/theta, so the standard
?     deviation of the OLS residuals is an estimator of the shift.
?     Because the OLS residuals have mean zero, the mean of UIMOLS is
?     just this estimate, about 18.7%.

CALC    ; thetainv = sdv(e) $
CREATE  ; uimols = thetainv - e $
DSTAT   ; rhs=uimols $
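? Two quick checks on MOLS, as a sketch. Because the OLS residuals have
? mean zero, MEANMOLS should equal SHIFT (= THETAINV). FRACNEG is the
? share of firms with a negative "inefficiency". This assumes that a
? relational expression in CREATE evaluates to 1 when true and 0 when
? false. NEGMOLS and the scalar names are new.
CREATE  ; negmols = uimols < 0 $
CALC    ; List ; Shift = thetainv ; MeanMols = Xbr(uimols)
                ; FracNeg = Xbr(negmols) $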

? *******************************************************************
? Part 2.  Stochastic frontier model
? *******************************************************************
?
?     Now fit the normal - half normal stochastic frontier model.
?
FRONTIER ; Lhs = LQ ; Rhs = X ; Techeff = euihn ; Eff=uihn$
KERNEL   ; Rhs = euihn, eucols 
         ; Grid
         ; Title=Half Normal Model vs. Corrected OLS $

?     Coefficients and variance parameters look mostly ok.
?     Negative output elasticity (also significant) for labor is
?     a problem, however.  From the output for the model, we have the
?     estimates of the variance parameters,
?|            Sigma(v)        =       .14824   |
?|            Sigma(u)        =       .18661   |
?     so the "u" part appears to be larger. But be careful: sigma(u) is
?     not the standard deviation of u.  The standard deviation of u is
?     sqr[(pi - 2)/pi] * sigma(u) = 0.11249, which is smaller than sigma(v).
?
CALC    ; List ; SDU = Sqr((pi - 2)/pi) * .18661 $
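? The same estimates give lambda = sigma(u)/sigma(v) and the share of
? the composed error variance, Var[v] + Var[u], that is due to u. This
? is a sketch that uses the scalar SDU just computed. LAMHAT and SHAREU
? are new names.
CALC    ; List ; LamHat = .18661 / .14824
                ; ShareU = SDU^2 / (SDU^2 + .14824^2) $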
?
?     To test the hypothesis of the frontier model, we use the log
?     likelihoods for the SF model and for OLS.  From the results already
?     computed,
?
?     logL(frontier) = 68.54675 and logL(OLS) = 67.01375.
?     2*difference is 2*1.533 = 3.066 < 3.84.  So it looks like the
?     stochastic part of the frontier model is not significant.  But in
?     the results for the SF model, the t ratio for lambda is far greater
?     than 2.  The only way lambda can be nonzero is if sigma(u) is nonzero,
?     so on this basis the null hypothesis of the regression model (no
?     inefficiency) is rejected in favor of the frontier model.  This is a
?     contradictory result that, we assume, results from having a finite
?     sample.  There is one more statistic, the LM statistic, which equals
?     4.882 > 3.84.  This is consistent with rejection of the null
?     hypothesis of no inefficiency.
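? A caveat on the LR test, with the arithmetic. Under the null,
? sigma(u) = 0 lies on the boundary of the parameter space, so the LR
? statistic is not distributed as chi-squared(1). Its limiting null
? distribution is a 50:50 mixture of chi-squared(0) and chi-squared(1),
? and the 5% critical value of that mixture is 2.706 (Kodde and Palm,
? 1986). Against that value, 3.066 does reject, in line with the t ratio
? for lambda and the LM statistic. CRIT is just that constant typed in.
CALC    ; List ; LRStat = 2*(68.54675 - 67.01375) ; Crit = 2.706 $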


? *******************************************************************
? Part 3.  Exponential and Rayleigh Models
? *******************************************************************
?
? To fit the exponential model, we just add ; Model = E (written out as
? Exponential below) to the command.

FRONTIER ; Lhs = lq ; Rhs = one,lf,le,ll,lp ; Model=Exponential 
         ; Techeff=euiexp ; Eff = uiexp $
?
? The Rayleigh model is done likewise.
?
FRONTIER ; Lhs = lq ; Rhs = one,lf,le,ll,lp ; Model=Rayleigh 
         ; Techeff=euiray ; Eff = uiray $
?
? Now examine the distributions of the estimated technical efficiencies
? and of the estimates of E[u(i)|e(i)] from the several models.
?
KERNEL ; Rhs = euihn,euiexp,eucols,euiray 
       ; Grid
       ; Title=Estimated Technical Efficiency from Several Models $
DSTAT  ; Rhs = uihn,uiexp,uicols,uimols,uiray $

FRONTIER ; lhs = lq ; rhs=one,lf,le,ll,lp ; model=r ; techeff=euiray $
?
?     For the exponential model, the sample mean of the estimates UIEXP
?     is .098 and the standard deviation is .064.  Since theta = 10.2, the
?     implied mean of u(i) is 1/theta = .098, and the implied standard
?     deviation is also 1/theta = .098.  The standard deviation of the
?     computed values is only .064, but this is because they estimate not
?     u(i) but E[u(i)|epsilon(i)].  A conditional mean has a smaller
?     variance than the variable itself, so this is what we should expect.
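? The comparison above as a sketch: the moments implied by the
? exponential model are shown next to those of the estimates UIEXP.
? The names are new. The inequality between the two standard
? deviations follows from
? Var[u] = Var[E(u|e)] + E[Var(u|e)] >= Var[E(u|e)].
CALC    ; List ; MeanExp = 1/10.2 ; SdExp = 1/10.2
                ; MeanHat = Xbr(uiexp) ; SdHat = Sdv(uiexp) $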


? *******************************************************************
? Part 4.  Observed heterogeneity in the model. Explaining u(i)
? *******************************************************************
?
? The three extra variables are included in the model to attempt
? to account for the inefficiency based on known factors.
? We first repeat the earlier estimation to get the efficiency
? estimates, then regress them on the three variables.
?
FRONTIER ; Lhs = lq ; rhs = x ; techeff = euihn $
REGRESS  ; Lhs = euihn ; rhs = one,loadfctr,lstage,points $
?
? Now add the observed heterogeneity to the model and recompute the
? inefficiencies.  The correlation is very high, but the differences
? are visible in a plot.  One airline in particular seems to stand out.
?
FRONTIER ; Lhs = lq ; rhs = x,loadfctr,lstage,points 
         ; techeff=euihnz $
CALC     ; List ; Cor (euihn, euihnz) $
?
? Plot EUIHN against EUIHNZ.  Including EUIHN again as Rh2 adds a 45
? degree line to the figure.  Now it is clear that the estimated
? inefficiency with heterogeneity is always smaller than without it
? (equivalently, EUIHNZ is always larger than EUIHN).  The explanation
? is that the heterogeneity (load factor, etc.) has accounted for some
? of what we called inefficiency.
?
PLOT     ; Lhs = euihn ; Rhs = euihnz 
         ; Rh2 = euihn 
         ; Grid
         ; Title=Estimated Efficiency with Heterogeneity vs. No Heterogeneity $
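? To check the claim directly, look at the firm-by-firm differences in
? estimated efficiency. If accounting for the z variables always reduces
? estimated inefficiency, the minimum of DTEFF (a new name) in the DSTAT
? output is nonnegative.
CREATE   ; dteff = euihnz - euihn $
DSTAT    ; Rhs = dteff $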
?
? How do the 'z' variables affect efficiency?  The following plots the
? average efficiency for the sample against a range of values of
? load factor, from .40 to .95 in steps of .05.
?
SIMULATE ; Scenario: & loadfctr = .4(.05).95 ; Plot(ci) $


? *******************************************************************
? Part 5.  A vexing problem. The dreaded error 315.
? *******************************************************************
?
? We now try to complete the production function by including the 5th
? factor, materials (LM).  Notice that the model "doesn't work" any
? more: the MLE is OLS with zero inefficiency.  At this point, one
? should question the model.  What do you think we should do next?
?
FRONTIER ; Lhs = lq ; rhs = x,lm,loadfctr,lstage,points $
REGRESS  ; Quietly
         ; Lhs = lq ; Rhs = x,lm,loadfctr,lstage,points
         ; Res = u315$
KERNEL   ; Rhs = u315;normal $
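? Why the model collapses, as a sketch. For a production frontier,
? e(i) = v(i) - u(i), so the OLS residuals should be negatively skewed.
? The sample skewness below relies on the fact that OLS residuals from a
? model with a constant have mean zero. E2M, E3M and SKEW315 are new
? names. A positive value is the "wrong skew" that appears to lie behind
? the 315 diagnostic.
CREATE   ; e2m = u315^2 ; e3m = u315^3 $
CALC     ; List ; Skew315 = Xbr(e3m) / Xbr(e2m)^1.5 $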
?
? One thing we might do is add a restriction to the model. Note, we did
? this earlier, implicitly, by omitting LM from it.  Now, instead, we
? impose constant returns to scale. Now it works.  Is constant returns
? to scale a reasonable restriction?  Keep in mind, the labor coefficient
? in our results is persistently negative, so in any event, the whole
? model is suspect. Note the 315 warning is given based on the unconstrained
? OLS residuals.
?
FRONTIER ; Lhs = lq ; rhs = x,lm,loadfctr,lstage,points 
         ; cml: lf+lm+le+ll+lp = 1 $
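? To judge whether CRTS is a reasonable restriction, it helps to look
? at the unrestricted estimate of returns to scale. This sketch assumes
? that REGRESS stores its coefficients in B, so that b(2),...,b(6) are
? the coefficients on LF, LE, LL, LP and LM (the order of X followed by
? LM). RTS is a new name.
REGRESS  ; Quietly ; Lhs = lq ; Rhs = x,lm,loadfctr,lstage,points $
CALC     ; List ; RTS = b(2)+b(3)+b(4)+b(5)+b(6) $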

? Here we do an experiment. The OLS residuals produce the 315 error.
? When the CRTS restriction is imposed, the residuals are negatively
? skewed, which is what we need. The following examines the two sets
? of residuals.

REGRESS  ; Quietly ; Lhs = lq ; rhs = x,lm,loadfctr,lstage,points
         ; Res = OLS $
REGRESS  ; Quietly ; Lhs = lq ; rhs = x,lm,loadfctr,lstage,points 
         ; cls: lf+lm+le+ll+lp = 1 ; Res=CNS_OLS$
DSTAT    ; Rhs = OLS,CNS_OLS ; Normality $
KERNEL   ; Rhs = OLS,CNS_OLS $



? *******************************************************************
? Part 6.  Comparing stochastic frontier and DEA.
? *******************************************************************
?
? We now use DEA as an alternative method of examining inefficiency.
? To begin, we recompute the normal-half normal model and its estimates
? of the inefficiency terms. These are then translated into estimates
? of efficiency.
?
FRONTIER  ; Lhs = LQ ; Rhs = X ; techeff= euisf $
?
? We use data envelopment analysis (DEA) rather than SF to compute
? efficiency firm by firm.  Note that for DEA we use levels, not logs,
? and of course there is no constant term.  (There is no "function."
? LHS and RHS here just tell LIMDEP which variable is the output and
? which variables are the inputs.)  We pick up the input oriented
? efficiency from the computation.
?
FRONTIER ; Lhs = output ; Rhs = fuel,eqpt,labor,prop ; alg=dea $
CREATE   ; euidea=deaeff_o$
?
DSTAT    ; Rhs = euisf,euidea $
?
? Additional ways to compare the two.  The two sets of results are
? not all that different; they are closer than in many other studies.
?
PLOT     ; Lhs=euidea ; Rhs=euisf ; Rh2 = euidea $
CALC     ; List  ; Cor(euidea,euisf)$
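? Two more summaries, as a sketch: the firm-by-firm gap between the SF
? and DEA scores, and the share of firms that DEA places on the
? frontier (a score of 1; the .9999 cutoff allows for rounding). This
? assumes that a relational expression in CREATE evaluates to 1 or 0.
? DSFDEA, ONFRONT and SHFRONT are new names.
CREATE   ; dsfdea = euisf - euidea ; onfront = euidea > .9999 $
DSTAT    ; Rhs = dsfdea $
CALC     ; List ; ShFront = Xbr(onfront) $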
KERNEL   ; Rhs=euisf,euidea
         ; Grid ; Title=Densities for SF and DEA Efficiencies $
?
? The DEA computation has no direct way to incorporate heterogeneity.
? Some researchers carry out a second-step analysis by regressing the
? estimated efficiencies on the variables of interest.  Since some of
? the efficiency values are 1.0 by construction, some have used a tobit
? model to account for this.  (This is not necessarily a good idea, as
? the data are not at all generated by a tobit model.  But it has been
? done.)  As a compromise, others have used a truncated regression.
? That is not necessarily a better idea, but it has been done too.
? What do you find?

REGRESS  ; lhs = euidea;rhs=one,loadfctr,lstage,points$
?
?  A follow-up exercise:  The DEA computation has produced a second
?  efficiency measure, DEAEFF_I.  We analyzed the input based measure,
?  DEAEFF_O, above.  Repeat the exercise with the output oriented
?  measure, DEAEFF_I.  Do you get the same results?



?********************************************************************
?
? This exercise uses the Spanish Dairy Data.  We compare SFA and
? DEA.  The similarity of the predictions is striking.
?
?********************************************************************
?
SETPANEL ; Group = farm ; Pds=T $
NAMELIST ; (new) ; means=cowsbar,landbar,laborbar,feedbar $
NAMELIST ; factors = cows,land,labor,feed $
CREATE   ; means = Group Mean (factors,pds=t)$
CREATE   ; milkbar=Group Mean (milk,pds=t) $
CREATE   ; yb=log(milkbar) 
         ; x1b=log(cowsbar)  ;x2b=log(landbar)
         ; x3b=log(laborbar) ;x4b=log(feedbar)$
CREATE   ; Output = Milkbar/10000 ; Food = feedbar/10000 $
FRONTIER ; If [ year = 98] ; Lhs = output
         ; Rhs = cowsbar,landbar,laborbar,food
         ; Alg=DEA ; List $
FRONTIER ; If[year=98] ;Lhs = yb ; Rhs = one,x1b,x2b,x3b,x4b 
         ; techeff = eusf $
DSTAT    ; if[year=98] ; Rhs = eusf,deaeff_o $
PLOT     ; if[year=98] ; Lhs=eusf ; Rhs = deaeff_o ; Rh2=eusf
         ; Title=Stochastic Frontier Efficiency vs. DEA
         ; Grid $
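? The similarity can be summarized by a correlation for the 1998 cross
? section. This sketch assumes that INCLUDE ; New restricts the sample
? to the observations that meet the condition and that SAMPLE ; All
? restores the full sample afterwards.
INCLUDE  ; New ; year = 98 $
CALC     ; List ; Cor(eusf,deaeff_o) $
SAMPLE   ; All $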