Data quality issues in SDC
In “IPO pricing in the dot-com bubble” (joint with Bill Wilhelm), I use hand-gathered data for 2,399 IPOs in 1996-2000 because the quality of SDC’s data for some key variables is questionable. The following summarizes the main problems for shares outstanding, venture backing, and syndicate size. For problems with SDC’s overallotment exercise data, see Ellis, Michaely, and O’Hara (Journal of Finance, 2000). For specific corrections to SDC’s classification of unit offerings and some accounting numbers, see Jay Ritter’s web site.
A comparison of shares outstanding as reported by SDC (variables OUT and OUTPF) with the figures hand-collected from prospectuses for the 2,399 IPOs in 1996-2000 reveals significant and widespread reporting errors in SDC. In dollar terms, SDC understates pre-IPO capitalization by $18.6 million on average and overstates post-IPO capitalization by $34.1 million on average.
These errors in SDC have the potential to affect empirical work that
i) uses firm capitalization to control for differences in size (say, to proxy for risk)
ii) matches IPO firms by size and book-to-market to non-issuers (long-run performance studies)
iii) computes insider retention or selling ratios or ‘share overhang’ measures
iv) controls for ‘free float’ in the after-market
v) uses per-share data such as EPS, book value per share, etc.
The following provides an overview of the data problems in SDC.
1. Shares outstanding before the IPO (SDC variable OUT)
a. SDC reports shares outstanding for 2,197 of the 2,399 firms, missing 202 firms.
b. Only 584 (26.6%) of those covered are correct.
c. The average error (= SDC / truth – 1) is 12.5% (median: 0).
d. In 954 cases, SDC’s number is too small. The average error is –36.3% (median: –29%).
e. In 659 cases, SDC’s number is too large. The average error is +94.3% (median: +32.2%).
2. Shares outstanding after the IPO (SDC variable OUTPF)
a. SDC reports shares outstanding for 2,277 of the 2,399 firms, missing 122 firms.
b. Only 537 (23.6%) of those covered are correct.
c. The average error (= SDC / truth – 1) is 14.9% (median: 0).
d. In 522 cases, SDC’s number is too small. The average error is –30.2% (median: –20.6%).
e. In 1,218 cases, SDC’s number is too large. The average error is +40.8% (median: +4%).
3. Frequent errors
a. Shares outstanding after the IPO should equal shares outstanding before plus primary shares issued. This basic identity is violated in 1,423 cases in SDC.
b. Shares outstanding pre-IPO should be smaller than or equal to shares outstanding post-IPO. This condition is violated in 115 cases.
c. In 95 cases, SDC reports as shares outstanding pre-IPO what is in fact the number of shares outstanding post-IPO.
d. In 29 cases, SDC reports as shares outstanding post-IPO what is in fact the number of shares outstanding pre-IPO.
e. Sampling indicates that SDC collects information regarding shares outstanding from early S-1 filings rather than IPO prospectuses. This can lead to two types of errors. First, shares outstanding will change between the S-1 and the final prospectus as options and warrants are exercised and preferred shares are converted at their holders’ request. This will normally lead to relatively small differences in shares outstanding. Second, and more importantly, companies often appear to reverse-split their shares between the S-1 and the final prospectus, leading SDC to underreport final shares outstanding by factors like 2-to-1 or worse.
4. Cross-section
a. The percentage error in reported shares outstanding increases in the offer price and decreases in true shares outstanding. This holds for both pre- and post-IPO shares outstanding.
b. The percentage error in reported shares outstanding post-IPO increases significantly in underpricing.
c. It also increases in Loughran and Ritter’s (2001b) underwriter reputation measure.
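The error measure used in items 1 and 2 (SDC / truth – 1) and the split into correct, too-small, and too-large cases can be sketched as follows. This is only a minimal illustration, not the procedure used to produce the figures above: the function names, the optional tolerance, and the share counts in the example are all hypothetical.

```python
from statistics import mean, median


def pct_error(sdc, truth):
    """Relative error of the SDC figure, defined as SDC / truth - 1."""
    return sdc / truth - 1.0


def summarize_errors(pairs, tol=0.0):
    """Summarize errors for (sdc, truth) pairs of shares outstanding.

    `tol` is a hypothetical tolerance: errors within +/- tol count as correct.
    """
    errors = [pct_error(sdc, truth) for sdc, truth in pairs]
    too_small = [e for e in errors if e < -tol]
    too_large = [e for e in errors if e > tol]

    def stats(xs):
        # Mean and median are undefined for an empty group.
        if not xs:
            return {"n": 0, "mean": None, "median": None}
        return {"n": len(xs), "mean": mean(xs), "median": median(xs)}

    return {
        "all": stats(errors),
        "n_correct": len(errors) - len(too_small) - len(too_large),
        "too_small": stats(too_small),
        "too_large": stats(too_large),
    }


# Made-up share counts (SDC figure, prospectus figure), for illustration only.
example = [
    (10_000_000, 10_000_000),
    (5_000_000, 10_000_000),
    (30_000_000, 10_000_000),
    (12_000_000, 12_000_000),
]
summary = summarize_errors(example)
```

In this made-up example the median error is zero while the mean is positive, which is the same pattern as in items 1c and 2c: a few large overstatements pull the mean up even when the typical record is correct.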
The error rate in the 1996-2000 sample is many times higher than that in a reference sample that we compiled consisting of all IPOs in the 4th quarter of 1993. In the reference sample, SDC’s figures for shares outstanding are correct for 154 of the 185 offerings (83.2%), and the 31 incorrect cases are, with three exceptions, minor (possibly rounding) errors.
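The internal-consistency conditions in items 3a and 3b need no hand-collected data, so they could serve as a first screen on an SDC extract. The sketch below assumes each record supplies OUT, OUTPF, and the number of primary shares issued. The function name, the argument names, and the flag labels are hypothetical, and the tolerance is an assumption rather than something specified in this note. The swapped-field cases in items 3c and 3d are not covered, because detecting them requires the prospectus figures.

```python
def consistency_flags(out, outpf, primary, tol=0):
    """Return the internal-consistency checks that one SDC record fails.

    out     -- shares outstanding before the IPO (SDC variable OUT)
    outpf   -- shares outstanding after the IPO (SDC variable OUTPF)
    primary -- primary shares issued in the IPO (hypothetical argument name)
    tol     -- allowed absolute discrepancy in shares (an assumption)
    """
    if out is None or outpf is None:
        # Nothing can be checked if either share count is missing.
        return ["missing"]

    flags = []
    if out > outpf:
        # Item 3b: pre-IPO shares should not exceed post-IPO shares.
        flags.append("pre_exceeds_post")
    if primary is not None and abs(outpf - (out + primary)) > tol:
        # Item 3a: post-IPO shares should equal pre-IPO plus primary shares.
        flags.append("identity_violated")
    return flags


# Made-up records, for illustration only.
clean = consistency_flags(10_000_000, 12_500_000, 2_500_000)
swapped = consistency_flags(12_500_000, 10_000_000, 2_500_000)
```

A record that passes both checks can still be wrong, for example when OUT and OUTPF are both taken from a pre-split S-1, so a screen of this kind finds only some of the errors.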
We identify venture-backed IPOs based on a reading of the “Principal shareholders” and “Recent transactions” sections contained in the IPO prospectuses. Comparing our identification to SDC’s variable VE, we notice three sources of error:
i) False negatives: SDC occasionally misses offerings that are backed by well-known VC funds, such as Golder, Thoma, Cressey and Rauner.
ii) False positives: SDC occasionally identifies as venture capitalists limited partnerships that are in fact owned by an executive director or his/her family.
iii) Inconsistent treatment: SDC’s flag identifying venture-backed IPOs is inconsistent with respect to firms backed by private equity funds, such as buy-out funds managed by Warburg Pincus, KKR, Blackstone, or Hicks Muse. We include all such firms in our definition of venture-backed issuers.
SDC’s variable NUMAMGR is defined as the “number of managers including international co-managers”. From 1999 onwards, SDC counts every syndicate member, so NUMAMGR equals syndicate size. Prior to 1999, SDC appears to include only lead and co-managers in NUMAMGR. This definitional change means that there is no consistent measure of syndicate size in SDC. We therefore hand-collect syndicate size for 1996-1998. In so doing, we include banks in the international syndicate but avoid double-counting banks that are included in both the international and the U.S. syndicates. We also correct 61 errors in SDC’s NUMAMGR variable for 1999-2000.
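The counting rule described above, which includes the international syndicate but counts a bank only once if it appears in both syndicates, amounts to taking the size of a set union. The sketch below is a simplified illustration with a hypothetical function name and placeholder bank names. It matches banks on a whitespace- and case-normalized name, which is an assumption: in practice the same bank may appear under different affiliate names in the two syndicates, and such matches would have to be resolved by hand.

```python
def syndicate_size(us_syndicate, intl_syndicate):
    """Count distinct banks across the U.S. and international syndicates."""

    def normalize(name):
        # Collapse whitespace and ignore case so trivially different
        # spellings of the same name are treated as one bank.
        return " ".join(name.lower().split())

    us = {normalize(bank) for bank in us_syndicate}
    intl = {normalize(bank) for bank in intl_syndicate}
    # The set union counts a bank once even if it is in both syndicates.
    return len(us | intl)


# Placeholder bank names, for illustration only.
us_banks = ["Bank A", "Bank B", "Bank C"]
intl_banks = ["bank a", "Bank D"]
size = syndicate_size(us_banks, intl_banks)
```

Here "Bank A" appears in both lists, so the combined syndicate has four members rather than five.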
Alexander Ljungqvist
New York University
Stern School of Business
Phone 212-998-0304
Fax 212-995-4233
E-mail aljungqv@stern.nyu.edu
1 May 2002