Data Caveats

I am flattered that there are so many people who are interested in my data and that it is getting used in so many contexts, but I am also a little nervous about how it is being used. So, here are some things that you may want to consider before you use the data.

1. History

I started putting my datasets online in the early 1990s. At the time, I did not expect them to be used by anyone other than the students in my classes. In fact, for the first few years, the only datasets I had were for US companies and I had only a handful of data items: average betas by sector, averages of PE, Price to Book and EV to EBITDA multiples, and a few dividend/debt ratio statistics. I did not provide the individual company data for download. The only data source I used was Value Line, which included data on 1700 US companies. Over the last decade, two interrelated things have happened.

The first is that the number of datasets that I approach each year has exploded: I had 80 sector average datasets to download this year and I have expanded the coverage to include non-US companies. As a consequence, I have had to use data from other data services such as Capital IQ and Bloomberg instead of the Value Line data.
Simultaneously, the number of people using the datasets has also increased exponentially, as has the number of different environments in which they are being used.

2. Data Sources

I am dependent upon my data sources for my data, which gives rise to three issues.

The first is that if the data service makes a mistake, I do as well. I cannot check the raw data on 45,000+ individual companies. To be honest, the effect on sector averages will be small, given the size of the sample, but for individual companies, that is not the case.
The second is that I have to be sensitive to the data service's commercial interest (which is to sell the data to subscribers). Consequently, I try my absolute best (though I sometimes slip up) not to reproduce data on individual companies that the service has provided. That is why I compute all of my multiples (PE, EV/EBITDA etc.) and other statistics, rather than reporting the values from the service. It is also the reason that I don't report Capital IQ's industry classifications or betas, but create my own (from Capital IQ's raw data) for companies. It may make your life a little more difficult, but it is the only way I can keep this dance going.
The third is that I have to navigate the different data definitions that the different services use for the same item and make my own judgments. So, it is possible that my estimate of invested capital for a company may not match up to what you obtain from a different service.
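
To make the second and third points concrete, here is a minimal sketch in Python of what computing multiples from raw data, rather than copying a service's reported values, might look like. Everything in it is an assumption for illustration: the field names, the company records and their numbers are invented, the enterprise value definition (market cap plus debt minus cash) is just one common convention, and the simple average across companies is only one of several ways to aggregate. It is not a description of how the datasets on this site are actually built.

```python
from statistics import mean

# Hypothetical raw records: names, sectors and numbers are invented.
companies = [
    {"name": "A", "sector": "Retail", "market_cap": 1200.0,
     "net_income": 100.0, "debt": 300.0, "cash": 100.0, "ebitda": 200.0},
    {"name": "B", "sector": "Retail", "market_cap": 800.0,
     "net_income": 50.0, "debt": 100.0, "cash": 100.0, "ebitda": 100.0},
    {"name": "C", "sector": "Retail", "market_cap": 500.0,
     "net_income": -20.0, "debt": 200.0, "cash": 50.0, "ebitda": 50.0},
]


def pe_ratio(company):
    """PE from raw fields; None when earnings are not positive."""
    if company["net_income"] <= 0:
        return None
    return company["market_cap"] / company["net_income"]


def ev_to_ebitda(company):
    """EV/EBITDA with EV assumed to be market cap + debt - cash."""
    if company["ebitda"] <= 0:
        return None
    enterprise_value = company["market_cap"] + company["debt"] - company["cash"]
    return enterprise_value / company["ebitda"]


def sector_average(records, sector, metric):
    """Simple average of a metric across a sector, skipping undefined values."""
    values = [metric(c) for c in records if c["sector"] == sector]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


retail_pe = sector_average(companies, "Retail", pe_ratio)
retail_ev_ebitda = sector_average(companies, "Retail", ev_to_ebitda)
print(retail_pe, retail_ev_ebitda)
```

Each judgment call in the sketch (dropping money-losing companies from the PE average, netting cash out of enterprise value, averaging company ratios instead of aggregating the underlying totals) is a place where two sources working from the same raw data can legitimately report different numbers.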

3. Sector averages versus Individual company data

My focus has always been on the industry average data for two reasons. The first is that it is the data that I most use in valuations and corporate finance. The second is that you can get much more detailed individual company data from the company's own financial reports. I am not a data service (and I do not have the resources to be one) and the individual company data was never meant to be used as a research database. So, if it is missing items you wish it had, there is a reason.

4. Cross sectional versus Time Series Data

My primary objective each year is to provide updated data for sector averages on different statistics that year. Thus, the 2023 update has the industry averages using the most recent market price data (end of 2022) and the most recent financial statements (for annual data, that may be 2021). I never intended to provide a time series of data. I do provide the archived data sets from prior years, and while I have tried to be consistent in my industry groupings and data definitions, the raw data sources have changed at least five times in the last 25 years, making comparisons dangerous. I often do change my views on how to compute a statistic and will try to go back over time and change my historical numbers. Thus, do not be surprised, if you go back and look at the 2004 data, to see an average beta for the telecom services businesses that is different from the value you looked up in 2004.

5. Use for data

I am a valuation/corporate finance person. When looking at a company, I am less interested in where it has been and more in where it is going. I look at data as raw material that I can use in making better estimates for the future. Consequently, this valuation mission drives how I come up with my numbers and what I report. For instance, when estimating beta, my primary concern is to get as good a beta estimate as I can for the future, not to get the best estimate I can for the past. This remains the best use for my data.
If you are interested in assessing the past (doing a post-mortem), you will be better served using a service that focuses on providing just that: historical time series information. You can try to string together my sector average datasets over time, but it may not serve your purposes as well.
I do know that this data ends up in the legal arena more often than it should. If you are using my data from a prior year to back up your position or repudiate your opponent's in a court of law, please leave me (personally) out of that food fight. While I stand behind my data, it was never my intent for it to be used for that purpose. In fact, I don't put much weight on two factors that the legal system values, precedent and consistency. Put differently, if I feel that I have been computing a ratio incorrectly for ten years, I have no qualms about changing the way I do it in year 11.
Finally, this data was also never meant for public policy debates. In 2011, for instance, the New York Times used the tax rates that I had computed by sector to make a case that the tax code in the US was unfair. That may very well be, but I computed tax rates for a prosaic purpose, which is to value companies. It was not to make judgments on whether companies pay enough in taxes.

6. Description

I know that I have not been very good about providing enough background on how I compute some ratios (say the return on invested capital), expecting users to be familiar with my writings or books. That is not fair and I will try to remedy it over time, by augmenting my variable description section and providing supporting YouTube videos for some of my datasets. It may take me a while to get it done. So, please have patience.

7. Fixing errors

Are there errors in the data? I am sure that there are, just as there are in any large dataset. Some of those errors may come from the data service and some are mine. If you do find an error, please let me know. Remember, though, that this site has a staff of one (me) and I may not get the fix done or get back to you as soon as you would like me to. But I promise that I will, sooner rather than later.