
Panel Data Econometrics
Panel Data Sets
Professor W. Greene
Department of Economics
Office: MEC 7-78, Ph. 998-0876, Fax. 995-4218
E-mail: wgreene@stern.nyu.edu
Home Page: http://stern.nyu.edu/~wgreene
Return to course home page.
Notes: The following list points to a series of data sets. We will use
some of these in our class discussions. A number of others are provided
for students to analyze as part of their study of the topic. Note that
two major cross-country databases online provide a wealth of interesting
data:
The Penn World Tables
Barro's Cross Country Data
There are many other sources of data on the
web. One that is particularly rich is the archive of the Journal of
Applied Econometrics.
Data below are provided in three formats: (1)
The 'Text format' is a plain ASCII text file containing the variable
names at the top of the file followed by the values of the variables, arranged
neatly in the file. (2) The .XLS format is the closest thing we have right now
to a generic file format. Most econometric programs can import an Excel
spreadsheet file. If yours cannot, you can use Excel to write the data in
another format or, perhaps, use the ASCII text file. (3) If you are using
LIMDEP or NLOGIT, the project file can be imported directly into the program,
as is.
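As a minimal sketch of reading the text format: the reader below assumes the file is whitespace-delimited with a single header row of variable names and purely numeric columns. The delimiter is an assumption, and the three data rows are made-up placeholders, not values from any file listed here. The function name `read_text_panel` is hypothetical.

```python
import io

def read_text_panel(handle):
    """Read a whitespace-delimited text file whose first row holds variable names.

    Returns a list of dicts, one per observation, with values parsed as floats.
    """
    lines = [line for line in handle if line.strip()]
    names = lines[0].split()
    rows = []
    for line in lines[1:]:
        values = [float(v) for v in line.split()]
        if len(values) != len(names):
            raise ValueError("row length does not match the header")
        rows.append(dict(zip(names, values)))
    return rows

# Made-up placeholder rows, NOT taken from the actual Grunfeld file.
sample = io.StringIO(
    "Firm Year I F C\n"
    "1 1935 10.0 100.0 5.0\n"
    "1 1936 12.0 110.0 6.0\n"
    "2 1935 20.0 200.0 9.0\n"
)
data = read_text_panel(sample)
print(len(data), sorted(data[0]))
```

With a real file, an open file handle would be passed in place of `sample`. Files that carry text columns (a country or state name, for example) would need those columns handled separately, since this sketch parses every field as a number.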
- Grunfeld Investment Data, 10 Firms, 20
Years (1935-1954)
Variables in the file are
Firm = Firm ID, 1,...,10
Year = 1935,...,1954
I = Investment
F = Real Value of the Firm
C = Real Value of the Firm's Capital Stock
Data are from the Ph.D. dissertation of Y. Grunfeld (Univ. of Chicago,
1958). See, e.g., Zellner, A., "An Efficient Method of Estimating
Seemingly Unrelated Regression Equations and Tests for Aggregation
Bias," Journal of the American Statistical Association, 57,
1962, pp. 348-368 for analyses of these data.
- Spanish Dairy Farm Production, N = 247, T = 6
Variables in the file are
FARM = Farm ID
YEAR = year, 93, 94, ..., 98
Inputs
COWS, X1 = input and its log, the log in deviations from the mean (of the logs)
LAND, X2 = same
LABOR, X3 = same
FEED, X4 = same
Translog terms = squares and cross
products: X11, X22, X33, X44, X12, X13, X14, X23, X24, X34
YEAR93,...,YEAR98 = year dummy variables
Output
MILK = farm output
YIT = log of MILK production
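The translog terms listed above can be rebuilt from the four log inputs. The sketch below is illustrative only: it uses plain squares and products, which is an assumption (the listing does not say whether the squared terms in the file carry a factor of 1/2), and the input values are made up. The function name `translog_terms` is hypothetical.

```python
from itertools import combinations_with_replacement

def translog_terms(x):
    """Build second-order translog terms from a dict of log inputs.

    x maps names such as "X1" to a log input in deviation-from-mean form.
    Returns keys X11, X12, ..., X44. Plain squares and products are an
    assumption; some files scale the squared terms by 1/2.
    """
    names = sorted(x)
    terms = {}
    for a, b in combinations_with_replacement(names, 2):
        key = "X" + a[1:] + b[1:]  # e.g. "X1" and "X4" give "X14"
        terms[key] = x[a] * x[b]
    return terms

# Made-up values for one farm-year, for illustration only.
row = {"X1": 0.2, "X2": -0.1, "X3": 0.0, "X4": 0.5}
terms = translog_terms(row)
print(sorted(terms))
```

Four inputs give four squares and six cross products, matching the ten names in the list.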
- Bank Cost Data, 500 Banks, 5 Years:
Variables in the file are
Cit = total cost of transformation of financial and physical
resources into loans and
investments = the sum of
the five cost items described below;
Y1it = installment loans to individuals for personal and
household expenses;
Y2it = real estate loans;
Y3it = business loans;
Y4it = federal funds sold and securities purchased under
agreements to resell;
Y5it = other assets;
W1it = price of labor, average wage per employee;
W2it = price of capital = expenses on premises and fixed
assets divided by the dollar value of
premises and fixed assets;
W3it = price of purchased funds = interest expense on money
market deposits plus expense of
federal funds purchased and securities sold
under agreements to repurchase plus interest
expense on demand notes issued by the U.S.
Treasury divided by the dollar value of
purchased funds;
W4it = price of interest-bearing deposits in total
transaction accounts = interest expense on
interest-bearing categories of total
transaction accounts;
W5it = price of interest-bearing deposits in total nontransaction
accounts = interest expense on
total deposits minus interest expense on money
market deposit accounts divided by the
dollar value of interest-bearing deposits in
total nontransaction accounts;
T = trend variable, t = 1,2,3,4,5 for years 1996, 1997, 1998,
1999, 2000
The data in the file are for a translog cost function, linearly homogeneous in
the input prices. Specifically,
C = log(Cost/W5), W1,W2,W3,W4 = log(Wj/W5), Q1,...,Q5 = log(Ym), and the
squared and cross product
terms are W11, W12,..., Q11,Q12,..., W1Q1,...,W4Q5, T, T2, TW1,...,TW4,
TQ1,...,TQ5.
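The normalization described above can be sketched as follows. Dividing cost and the first four prices by W5 is what imposes linear homogeneity in the input prices: scaling cost and every price by the same factor leaves the transformed variables unchanged. The function name and the numbers are made up for illustration and are not drawn from the bank file.

```python
import math

def normalize_translog_cost(cost, prices, outputs):
    """Impose linear homogeneity in input prices by normalizing on W5.

    cost: total cost; prices: [W1..W5]; outputs: [Y1..Y5] (positive levels).
    Returns C = log(cost/W5), W1..W4 = log(Wj/W5), Q1..Q5 = log(Ym).
    """
    w5 = prices[4]
    out = {"C": math.log(cost / w5)}
    for j in range(4):
        out[f"W{j + 1}"] = math.log(prices[j] / w5)
    for m, y in enumerate(outputs, start=1):
        out[f"Q{m}"] = math.log(y)
    return out

# Made-up levels for one bank-year.
outputs = [1.0, 2.0, 3.0, 4.0, 5.0]
base = normalize_translog_cost(100.0, [2.0, 4.0, 1.0, 8.0, 2.0], outputs)

# Tripling cost and all five prices should change nothing after normalization.
scaled = normalize_translog_cost(300.0, [6.0, 12.0, 3.0, 24.0, 6.0], outputs)
unchanged = all(abs(base[k] - scaled[k]) < 1e-12 for k in base)
print(unchanged)
```

The squared and cross product terms (W11, W1Q1, TW1, and so on) would then be formed from these normalized variables.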
- Dahlberg and Johansson Municipal
Expenditure Data, 265 Swedish Municipalities, 9 years
Variables in the file are
ID = Identification, 1,..., 265
YEAR = year, 1979,...,1987
EXPEND = Expenditures
REVENUE = Receipts, taxes and Fees
GRANTS = Government grants and shared tax revenues
See Greene (2003, p. 551 and elsewhere) for analysis of these data. The
article on which the analysis is based is Dahlberg, M. and Johansson,
E., "An Examination of the Dynamic Behavior of Local Governments
using GMM Bootstrapping Methods," Journal of Applied Econometrics,
15, 2000, pp. 401-416. (These data were downloaded from the JAE data
archive.)
- World Gasoline Demand Data, 18 OECD
Countries, 19 years
Variables in the file are
COUNTRY = name of country (Does not appear in the LIMDEP project file)
YEAR = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG = log of real price of gasoline
LCARPCAP = log of per capita number of cars
See Baltagi (2001, p. 24) for analysis of these data. The article on which the
analysis is based is Baltagi, B. and Griffin, J., "Gasoline Demand in the
OECD: An Application of Pooling and Testing Procedures," European Economic
Review, 22, 1983, pp. 117-137. The data were downloaded from the website
for Baltagi's text.
- Statewide Capital Productivity Data,
lower 48 states, 17 years
Variables in the file are
STATE = state name
ST_ABB = state abbreviation (not in project file)
YR = year, 1970,...,1986
P_CAP = public capital
HWY = highway capital
WATER = water utility capital
UTIL = utility capital
PC = private capital
GSP = gross state product
EMP = employment
UNEMP = unemployment rate
See Baltagi (2001, p. 25) for analysis of these data. The article on which
the analysis is based is Munnell, A., "Why has Productivity Declined?
Productivity and Public Investment," New England Economic Review,
1990, pp. 3-22. The data were downloaded from the website for
Baltagi's text.
- Cornwell and Rupert Returns to Schooling
Data, 595 Individuals, 7 Years
Variables in the file are
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
BLK = 1 if individual is black
LWAGE = log of wage
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient
Estimation with Panel Data: An Empirical Comparison of Instrumental
Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp.
149-155. See Baltagi, p. 122 for further analysis. The data
were downloaded from the website for Baltagi's text.
- German Manufacturing Innovation Data,
1,270 Firms, 5 years
Variables in the file are
YEAR = year, 1994-1998
FIRM = 1,...,1,270
IP = Product or innovation occurred, 0/1 variable
EMPL = employment
IM = Imports
IMUM = import share in industry
FDIUM = FDI share in industry
PROD = productivity measure
LOGSALES = log of industry sales
RAWMTL = dummy for firm in raw materials industry
INVGOOD = dummy for firm in investment goods industry
CONSGOOD = dummy for firm in consumer goods industry
FOOD = dummy for firm in food industry
These data were analyzed in Bertschek, I. and M. Lechner, "Convenient
Estimators for the Panel Probit Model," Journal of Econometrics, 87, 2,
1998, pp. 329-372. See also Greene, Econometric Analysis, 5th ed.
(2003) for various analyses, and Greene, W., "Convenient Estimators for the
Panel Probit Model: Further Results," Empirical Economics, 2004. These
data are not publicly available. Extracts from the data set will be
provided in class.
- German Health Care Usage Data, 7,293
Individuals, Varying Numbers of Periods
Data downloaded from the Journal of Applied Econometrics Archive. This is an
unbalanced panel with 7,293 individuals. The data can be used for regression,
count models, binary choice, ordered choice, and bivariate binary choice.
This is a large data set. There are altogether 27,326
observations. The number of observations per individual ranges from 1 to 7.
(Frequencies are: 1=1525, 2=1079, 3=825, 4=926, 5=1051, 6=1000,
7=887.) Note, the variable NUMOBS below tells how many observations
there are for each person. This variable is repeated in each row of
the data for the person.
Variables in the file are
ID = person - identification number
FEMALE = female = 1; male = 0
YEAR = calendar year of the observation
AGE = age in years
HSAT = health satisfaction, coded 0 (low) - 10 (high) Note,
this variable has 40 coding errors. Variable NEWHSAT below fixes them.
HANDDUM = handicapped = 1; otherwise = 0
HANDPER = degree of handicap in percent (0 - 100)
HHNINC = household nominal monthly net income in German marks /
10000
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
MARRIED = married = 1; otherwise = 0
HAUPTS = highest schooling degree is Hauptschul degree = 1;
otherwise = 0
REALS = highest schooling degree is Realschul degree = 1; otherwise
= 0
FACHHS = highest schooling degree is Polytechnical degree = 1; otherwise =
0
ABITUR = highest schooling degree is Abitur = 1; otherwise = 0
UNIV = highest schooling degree is university degree = 1; otherwise
= 0
WORKING = employed = 1; otherwise = 0
BLUEC = blue collar employee = 1; otherwise = 0
WHITEC = white collar employee = 1; otherwise = 0
SELF = self employed = 1; otherwise = 0
BEAMT = civil servant = 1; otherwise = 0
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
NUMOBS = number of observations for this person. Repeated in each
row of data.
NEWHSAT = recoded value of HSAT with coding errors corrected.
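Because the panel is unbalanced, it can be useful to confirm that NUMOBS agrees with the actual number of rows per ID. The sketch below recomputes NUMOBS by counting rows and tabulates how many individuals have each group size. The toy rows are made up for illustration and the function name `add_numobs` is hypothetical.

```python
from collections import Counter

def add_numobs(rows):
    """Attach NUMOBS (rows per ID) to every row and tabulate group sizes."""
    counts = Counter(r["ID"] for r in rows)
    for r in rows:
        r["NUMOBS"] = counts[r["ID"]]
    # Frequency of each group size: how many individuals have 1, 2, ... rows.
    size_freq = Counter(counts.values())
    return rows, dict(size_freq)

# Made-up toy panel: three people observed 1, 2 and 3 times.
toy = [
    {"ID": 1, "YEAR": 1984},
    {"ID": 2, "YEAR": 1984}, {"ID": 2, "YEAR": 1985},
    {"ID": 3, "YEAR": 1984}, {"ID": 3, "YEAR": 1985}, {"ID": 3, "YEAR": 1986},
]
rows, freq = add_numobs(toy)

# Two consistency checks that apply to any such frequency table.
n_individuals = sum(freq.values())
n_rows = sum(size * n for size, n in freq.items())
print(n_individuals, n_rows)
```

Applied to the full file, the frequencies should sum to the number of individuals, and the sum of group size times frequency should equal the total number of observations.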
- World Health Organization Panel Data on
Health Care Attainment: 191 Countries, 5 Years (Some countries
fewer)
These data have been used by many researchers to study the Health Care
Survey assembled by WHO as part of the Year 2000 World Health Report. On
the course bibliography, see, for example, Greene (2004a). Note,
variables marked * were updated with more recent sources in Greene
(2004a). Missing values for some of the variables in this data set are
filled by using fitted values from a linear regression. To set the
proper sample for panel data analysis, use observations for which SMALL =
0. To obtain the balanced panel, then use only observations with
GROUPTI = 5.
COMP = composite measure of health care attainment; LCOMP = log(COMP)
DALE = Disability adjusted life expectancy (other measure); LDALE = log(DALE)
YEAR = 1993,...,1997; TIME = 1,2,3,4,5; T93, T94, T95, T96, T97 = year dummy
variables
HEXP = per capita health expenditure; LHEXP = log(HEXP); LHEXP2 = log(HEXP)
squared
HC3 = educational attainment; LHC = log(HC3); LHC2 = log(HC3) squared; LHEXPHC
= log(HEXP) * log(HC3)
SMALL = indicator for states, provinces, etc. SMALL > 0 implies
internal political unit, = 0 implies country observation
COUNTRY = number assigned to country
STRATUM = another country indicator
GROUPTI = number of observations when SMALL = 0. Usually 5, some = 1, one
country = 4.
OECD = dummy variable for OECD country (30 countries)
GINI = Gini coefficient for income inequality
GEFF = World Bank measure of government effectiveness*
VOICE = World Bank measure of democratization of the political process*
TROPICS = dummy variable for tropical location
POPDEN = population density*
PUBTHE = proportion of health expenditure paid by public authorities
GDPC = normalized per capita GDP; LGDPC = log(GDPC); LGDPC2 = log(GDPC)
squared
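The sample selection rules for the WHO data (SMALL = 0 for country observations, then GROUPTI = 5 for the balanced panel) can be sketched as a simple filter. The toy rows are made up, the GROUPTI value shown for the sub-national row is a placeholder, and the function name `panel_sample` is hypothetical.

```python
def panel_sample(rows, balanced=False):
    """Keep country-level rows (SMALL == 0); optionally only balanced groups."""
    keep = [r for r in rows if r["SMALL"] == 0]
    if balanced:
        # Balanced panel: countries observed in all five years.
        keep = [r for r in keep if r["GROUPTI"] == 5]
    return keep

# Made-up toy rows: one country with 5 years, one with a single year,
# and one internal political unit (SMALL > 0; its GROUPTI is a placeholder).
toy = [
    {"COUNTRY": 1, "YEAR": y, "SMALL": 0, "GROUPTI": 5}
    for y in range(1993, 1998)
]
toy.append({"COUNTRY": 2, "YEAR": 1997, "SMALL": 0, "GROUPTI": 1})
toy.append({"COUNTRY": 3, "YEAR": 1997, "SMALL": 2, "GROUPTI": 0})

unbalanced = panel_sample(toy)
balanced = panel_sample(toy, balanced=True)
print(len(unbalanced), len(balanced))
```

The filter on SMALL is applied first so that the balanced subset never includes sub-national units.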