Data quality issues in SDC
In “IPO pricing in the dot-com bubble” (joint with Bill Wilhelm), I use hand-gathered data for 2,399 IPOs in 1996-2000 because the quality of SDC’s data for some key variables is questionable. The following summarizes the main problems for shares outstanding, venture backing, and syndicate size. For problems with SDC’s overallotment exercise data, see Ellis, Michaely, and O’Hara (Journal of Finance, 2000). For specific corrections to SDC’s classification of unit offerings and some accounting numbers, see Jay Ritter’s web site.
A comparison of shares outstanding as reported by SDC (variables OUT and OUTPF) and hand-collected from prospectuses for the 2,399 IPOs in 1996-2000 reveals significant and widespread reporting errors in SDC. In dollar terms, SDC understates pre-IPO capitalization by $18.6 million on average. SDC overstates post-IPO capitalization by $34.1 million on average.
These errors in SDC have the potential to affect empirical work that:
i) uses firm capitalization to control for differences in size (say, to proxy for risk)
ii) matches IPO firms by size and book-to-market to non-issuers (long-run performance studies)
iii) computes insider retention or selling ratios or ‘share overhang’ measures
iv) controls for ‘free float’ in the after-market
v) uses per-share data such as EPS, book value per share, etc.
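To illustrate how an error in shares outstanding carries over into derived measures such as those listed above, the following is a minimal sketch with made-up numbers. The function names are hypothetical. The overhang definition used here (retained shares divided by shares offered) is one common convention and is an assumption, not a definition taken from this note.

```python
def eps(net_income: float, shares_outstanding: float) -> float:
    """Earnings per share for a given share count."""
    return net_income / shares_outstanding


def share_overhang(post_ipo_shares: float, shares_offered: float) -> float:
    """Retained shares relative to shares offered (assumed definition)."""
    return (post_ipo_shares - shares_offered) / shares_offered


# Hypothetical firm: 10 million true post-IPO shares, 2 million offered,
# net income of $5 million. Suppose the database reports 20 million shares.
true_shares = 10_000_000
reported_shares = 20_000_000
shares_offered = 2_000_000
net_income = 5_000_000

eps_true = eps(net_income, true_shares)
eps_reported = eps(net_income, reported_shares)
overhang_true = share_overhang(true_shares, shares_offered)
overhang_reported = share_overhang(reported_shares, shares_offered)

print(eps_true, eps_reported)
print(overhang_true, overhang_reported)
```

In this invented case, a share count that is twice the true figure halves EPS and more than doubles the overhang measure, so the distortion in a derived variable need not be proportional to the error in the share count.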
The following provides an overview of the data problems in SDC.
1. Shares outstanding before the IPO (SDC variable OUT)
a. SDC reports shares outstanding for 2,197 of the 2,399 firms, missing 202 firms.
b. Only 584 (26.6%) of those covered are correct.
c. The average error (= SDC / truth – 1) is 12.5% (median: 0).
d. In 954 cases, SDC’s number is too small. The average error is –36.3% (median: –29%).
e. In 659 cases, SDC’s number is too large. The average error is +94.3% (median: +32.2%).
2. Shares outstanding after the IPO (SDC variable OUTPF)
a. SDC reports shares outstanding for 2,277 of the 2,399 firms, missing 122 firms.
b. Only 537 (23.6%) of those covered are correct.
c. The average error (= SDC / truth – 1) is 14.9% (median: 0).
d. In 522 cases, SDC’s number is too small. The average error is –30.2% (median: –20.6%).
e. In 1,218 cases, SDC’s number is too large. The average error is +40.8% (median: +4%).
3. Frequent errors
a. Shares outstanding after the IPO should equal shares outstanding before plus primary shares issued. This basic identity is violated in 1,423 cases in SDC.
b. Shares outstanding pre-IPO should be smaller than or equal to shares outstanding post-IPO. This condition is violated in 115 cases.
c. In 95 cases, SDC reports as shares outstanding pre-IPO what is in fact the number of shares outstanding post-IPO.
d. In 29 cases, SDC reports as shares outstanding post-IPO what is in fact the number of shares outstanding pre-IPO.
e. Sampling indicates that SDC collects information regarding shares outstanding from early S-1 filings rather than IPO prospectuses. This can lead to two types of errors. First, shares outstanding will change between the S-1 and the final prospectus as options and warrants are exercised and preferred shares are converted at their holders’ request. This will normally lead to relatively small differences in shares outstanding. Second, and more importantly, companies often appear to reverse-split their shares between their S-1 and the final prospectus, leading SDC to underreport final shares outstanding by factors like 2-to-1 or worse.
a. The percentage error in reported shares outstanding increases in the offer price and decreases in true shares outstanding. This holds for both pre-IPO and post-IPO shares outstanding.
b. The percentage error in reported shares outstanding post-IPO increases significantly in underpricing.
c. It also increases in Loughran and Ritter’s (2001b) underwriter reputation measure.
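The consistency checks described under "Frequent errors" (items 3a to 3d) can be sketched as follows. This is an illustration only: the function and argument names are hypothetical, the source of the primary-shares figure is left open, and the tolerance is an assumption.

```python
from typing import List, Optional


def check_shares(
    out_sdc: Optional[float],    # SDC's OUT: shares outstanding before the IPO
    outpf_sdc: Optional[float],  # SDC's OUTPF: shares outstanding after the IPO
    primary_shares: float,       # primary shares issued in the IPO
    out_true: float,             # pre-IPO shares from the prospectus
    outpf_true: float,           # post-IPO shares from the prospectus
    tol: float = 0.5,            # assumed tolerance, in shares
) -> List[str]:
    """Return the list of consistency problems found for one IPO."""
    if out_sdc is None or outpf_sdc is None:
        return ["missing"]
    flags = []
    # 3a: post-IPO shares should equal pre-IPO shares plus primary shares.
    if abs(out_sdc + primary_shares - outpf_sdc) > tol:
        flags.append("identity_violated")
    # 3b: pre-IPO shares should not exceed post-IPO shares.
    if out_sdc > outpf_sdc:
        flags.append("pre_exceeds_post")
    # 3c and 3d are only meaningful when the true values differ.
    truth_differs = abs(out_true - outpf_true) > tol
    # 3c: the reported pre-IPO figure is in fact the true post-IPO figure.
    if truth_differs and abs(out_sdc - outpf_true) <= tol:
        flags.append("pre_is_actually_post")
    # 3d: the reported post-IPO figure is in fact the true pre-IPO figure.
    if truth_differs and abs(outpf_sdc - out_true) <= tol:
        flags.append("post_is_actually_pre")
    return flags


# Made-up example: the pre-IPO field holds the true post-IPO number.
print(check_shares(12_000_000, 12_000_000, 2_000_000, 10_000_000, 12_000_000))
```

Checks 3a and 3b need only the database's own fields, so they can be run without any hand-collected data. Checks 3c and 3d require the prospectus figures.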
The error rate in the 1996-2000 sample is many times higher than that in a reference sample that we compiled consisting of all IPOs in the 4th quarter of 1993. In the reference sample, SDC’s figures for shares outstanding are correct for 154 of the 185 offerings (83.2%), and the 31 incorrect cases are, with three exceptions, minor (possibly rounding) errors.
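For readers who wish to repeat this kind of comparison on their own sample, the summary statistics reported above (coverage, share of correct values, and the average error defined as SDC / truth – 1, split by sign) could be computed along the following lines. This is a minimal sketch; the function name, the input layout, and the use of exact equality to define "correct" are assumptions.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Tuple


def summarize_errors(
    pairs: List[Tuple[Optional[float], float]]
) -> Dict[str, Optional[float]]:
    """Summarize reporting errors.

    Each pair is (database value or None if missing, true value).
    True values are assumed to be strictly positive.
    """
    covered = [(sdc, truth) for sdc, truth in pairs if sdc is not None]
    errors = [sdc / truth - 1 for sdc, truth in covered]
    too_small = [sdc / truth - 1 for sdc, truth in covered if sdc < truth]
    too_large = [sdc / truth - 1 for sdc, truth in covered if sdc > truth]
    n_correct = sum(1 for sdc, truth in covered if sdc == truth)
    return {
        "n_total": len(pairs),
        "n_missing": len(pairs) - len(covered),
        "n_correct": n_correct,
        "share_correct": n_correct / len(covered) if covered else None,
        "mean_error": mean(errors) if errors else None,
        "median_error": median(errors) if errors else None,
        "n_too_small": len(too_small),
        "mean_too_small": mean(too_small) if too_small else None,
        "n_too_large": len(too_large),
        "mean_too_large": mean(too_large) if too_large else None,
    }


# Made-up sample: one correct, one too small, one too large, one missing.
sample = [(100.0, 100.0), (50.0, 100.0), (200.0, 100.0), (None, 100.0)]
summary = summarize_errors(sample)
print(summary)
```

Because the error is a ratio minus one, overstatements are unbounded while understatements cannot fall below –100%, which is one reason to report the median alongside the mean.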
We identify venture-backed IPOs based on a reading of the “Principal shareholders” and “Recent transactions” sections contained in the IPO prospectuses. Comparing our identification to SDC’s variable VE, we notice three sources of error:
i) False negatives: SDC occasionally misses offerings that are backed by well-known VC funds, such as Golder, Thoma, Cressey and Rauner.
ii) False positives: SDC occasionally identifies as venture capitalists limited partnerships that are in fact owned by an executive director or his/her family.
iii) Inconsistent treatment: SDC’s flag identifying venture-backed IPOs is inconsistent with respect to firms backed by private equity funds, such as buy-out funds managed by Warburg Pincus, KKR, Blackstone, or Hicks Muse. We include all such firms in our definition of venture-backed issuers.
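A comparison of this kind between a database flag and a hand-collected classification amounts to a two-by-two tabulation. The sketch below shows one way to count agreements, false negatives, and false positives. The function name and the input layout are hypothetical, and the sample values are invented.

```python
from collections import Counter
from typing import Iterable, Tuple


def compare_flags(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    """Tabulate (database flag, hand-collected flag) pairs.

    The hand-collected flag is treated as the truth.
    """
    counts: Counter = Counter()
    for db_flag, hand_flag in pairs:
        if db_flag and hand_flag:
            counts["agree_backed"] += 1
        elif not db_flag and not hand_flag:
            counts["agree_not_backed"] += 1
        elif hand_flag:
            # Backed according to the prospectus, but not flagged.
            counts["false_negative"] += 1
        else:
            # Flagged, but not backed according to the prospectus.
            counts["false_positive"] += 1
    return counts


# Invented sample of five offerings.
sample_flags = [(True, True), (False, True), (True, False),
                (False, False), (True, True)]
flag_counts = compare_flags(sample_flags)
print(dict(flag_counts))
```

Note that the counts depend on the definition behind the hand-collected flag. Under a definition that includes buy-out funds, some apparent false negatives reflect a difference in definition rather than a recording error.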
SDC’s variable NUMAMGR is defined as the “number of managers including international co-managers”. From 1999 onwards, SDC counts every syndicate member, so NUMAMGR equals syndicate size. Prior to 1999, SDC appears to include in NUMAMGR only lead managers and co-managers. This definitional change means that there is no consistent measure of syndicate size in SDC. We therefore hand-collect syndicate size for 1996-1998. In so doing, we include banks in the international syndicate but avoid double-counting banks that are included in both the international and the U.S. syndicates. We also correct 61 errors in SDC’s NUMAMGR variable for 1999-2000.
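The counting rule for syndicate size described above (include the international syndicate, but count a bank that appears in both syndicates only once) corresponds to the size of a set union. The following is a minimal sketch. The function name is hypothetical, the bank names are placeholders, and matching banks by a normalized name string is an assumption that would need care with real prospectus data, where the same bank may appear under different names.

```python
from typing import Iterable


def _normalize(name: str) -> str:
    """Collapse whitespace and ignore case so simple variants match."""
    return " ".join(name.split()).casefold()


def syndicate_size(us_syndicate: Iterable[str],
                   international_syndicate: Iterable[str]) -> int:
    """Number of distinct banks across the U.S. and international syndicates."""
    us = {_normalize(name) for name in us_syndicate}
    international = {_normalize(name) for name in international_syndicate}
    return len(us | international)


# Placeholder names: "Bank A" sits in both syndicates and is counted once.
us_banks = ["Bank A", "Bank B", "Bank C"]
international_banks = ["bank a", "Bank D"]
size = syndicate_size(us_banks, international_banks)
print(size)
```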
New York University
Stern School of Business
1 May 2002