Bundling Information Goods:
Pricing, Profits and Efficiency

Yannis Bakos* and Erik Brynjolfsson**

Current Draft: December, 1996
(First Draft: December, 1995)

ABSTRACT

We analyze pricing strategies for digital information goods, such as those increasingly available via the Internet. Because perfect copies of such goods can be created and distributed almost costlessly, any single positive price for copies is likely to be socially inefficient. However, we show that, under certain conditions, a monopolist selling information goods in large bundles instead of individually may nearly eliminate this inefficiency. In addition, the bundling strategy can extract as profits an arbitrarily large fraction of the area under the demand curve for the individual goods while commensurately reducing consumers' surplus.

The bundling strategy is particularly attractive when the marginal costs of the goods are very low, when the correlation in the demand for different goods is low, and when consumer valuations for the individual goods are of comparable magnitude. We also describe the optimal pricing strategies when these conditions do not hold; show how private incentives for bundling can diverge from social incentives; and describe a mechanism to recover information about the underlying demand for each individual good. The predictions of our analysis appear to be consistent with empirical observations of the markets for Internet and on-line content, cable television programming, and copyrighted music.

________________________________________

We thank Timothy Bresnahan, Frank Fisher, Michael Harrison, Paul Kleindorfer, Thomas Malone, Robert Pindyck, Nancy Rose, Richard Schmalensee, John Tsitsiklis, Hal Varian, Albert Wenger, Birger Wernerfelt, Robert Wilson and seminar participants at the University of California at Berkeley, MIT, New York University, Stanford University, University of Rochester, the Wharton School and the 1995 Workshop on Information Systems and Economics for many helpful suggestions, although we have not been able to implement all of them.

1. Introduction

1.1 Overview

The pricing of information presents many difficulties for conventional markets. In particular, digital copies of information goods are indistinguishable from the originals and can be created and distributed almost costlessly via the emerging information infrastructure. What is the optimal price for each copy? Free (or nearly free) information would assure that all consumers whose marginal benefit is greater than the marginal cost would have access to the good. However, a zero price would not generate revenues to defray development costs and provide incentives for innovation.

Existing theory and practice fail to provide clear guidance on how digital information goods should be priced (Varian 1995), an issue of increasing importance as the Internet provides the infrastructure for a major marketplace for electronic information. Providers of on-line content have adopted varied and contradictory pricing strategies. Some firms charge users incrementally each time they access information, invoke a software subroutine, or download an image, as Quote.com is currently doing for certain types of investment information. Other firms, such as America Online, bundle large collections of content together and offer the bundle for a flat fee. Still other firms, such as Infoseek, have tried both strategies at various times and even concurrently. As of 1996, most information content on the Internet is offered at zero price to the user, and information providers hope to build a user base which they can eventually target for subscription charges, as the Wall Street Journal has done; to recover their production costs by selling advertising; or both.

This paper focuses on the strategy of bundling a large number of information goods for a fixed price, no matter how many goods are actually used by the buyer. We find that in a variety of circumstances, a multiproduct monopolist will extract substantially higher profits by offering a single bundle of information goods than by offering the same goods separately. As the number of information goods in the bundle increases, the seller may be able to appropriate nearly the entire value created by the provision of these goods. Furthermore, bundling can increase economic efficiency by reducing the deadweight loss created when goods are priced above their marginal costs.

The key intuition behind these results is that in many situations, consumers' valuation for a collection of goods has a probability distribution with a lower standard deviation per good compared to the valuations for the individual goods. For instance, consumer valuations for a stock quotation service, an on-line sports scoreboard, a news service, or a piece of software will vary. A monopolist selling these goods separately may maximize profits by charging a high price for each good, thereby excluding consumers with low valuations, rather than charging a low price and selling to most consumers. Alternatively, the seller could offer all the information goods as a bundle. Under reasonable assumptions about the distribution of valuations, the law of large numbers guarantees that the distribution of valuations for the bundle has proportionately more mass near the mean. The more goods included in the bundle, the less likely it is that any given consumer's valuation for the entire bundle will be very low or very high. As Schmalensee (1984) has argued, such a reduction in "buyer diversity" typically helps sellers extract higher profits while reducing the deadweight loss from non-zero prices, as more units are sold than if the goods were offered separately. The benefits of bundling are greatest when the marginal cost of the goods is very low, when the correlation in the demand for different goods is low, and when the valuations for individual goods are of comparable magnitude.
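The variance-reduction intuition can be illustrated with a small simulation. This is a minimal sketch, not part of the formal analysis: it assumes that each consumer's valuation for each good is an independent draw from Uniform[0, 1], and the function name `per_good_bundle_std` is hypothetical.

```python
import random
import statistics

def per_good_bundle_std(n_goods, n_consumers=20000, seed=0):
    """Estimate the standard deviation of the per-good valuation of a bundle.

    Assumption (illustrative only): valuations are i.i.d. Uniform[0, 1].
    """
    rng = random.Random(seed)
    per_good_values = [
        sum(rng.random() for _ in range(n_goods)) / n_goods
        for _ in range(n_consumers)
    ]
    return statistics.pstdev(per_good_values)

if __name__ == "__main__":
    # The estimates should shrink roughly like 1 / sqrt(n_goods).
    for n in (1, 2, 20):
        print(n, round(per_good_bundle_std(n), 4))
```

Under this assumption the per-good standard deviation is sqrt(1/(12n)) in theory, so the simulated values should fall as the bundle grows.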

Our analysis of bundling contrasts with the conventional wisdom that the pricing of on-line information will be increasingly fine-grained because new technologies enable metering information in units as small as the article, datum, or bit. Charging very low prices for very small quantities of information seems to create more price discrimination opportunities for sellers, presumably leading to a more profitable and efficient allocation of information goods in the economy. Furthermore, many consumers seem to believe that it is wasteful to pay for access to goods that are not used and assume that a more fine-grained price system would result in savings. However, while such reasoning provides useful heuristics in markets for physical goods, it is misleading for information goods that have zero marginal cost. On the contrary, our analysis demonstrates that a single-price bundling strategy may be optimal, even if the costs of administering multiple prices or delivering separate information goods are zero.

1.2 The bundling literature

There are many potential benefits of bundling, including cost savings in production and transaction costs, complementarities among the bundle components, and sorting consumers according to their valuations (Eppen, Hanson, et al. 1991). We focus on this last benefit of bundling, which was first discussed by Stigler (1963) in a paper showing how bundling could increase sellers' profits when consumer valuations for two goods were negatively correlated. Adams and Yellen (1976) introduced a two-dimensional graphical framework for analyzing bundling as a device for price discrimination. By considering a setting with a multiproduct monopolist, two goods, no reselling, independent and additive consumer valuations, and linear "unit demands" (i.e., consumers buy either zero or one unit) for these two goods, they compare unbundled sales to pure bundling (offering only the complete bundle) and mixed bundling (offering both the complete bundle and subsets of the bundle). Using stylized examples, they illustrate that the relative profitability and efficiency of these pricing strategies depends on the marginal costs and on the distribution of customers' reservation values.

The formal analyses by Schmalensee (1984), McAfee, McMillan and Whinston (1989) and Salinger (1995) also focused on bundles of two goods. Schmalensee assumes a bivariate Gaussian distribution of reservation prices, and through a combination of analytic derivation and numerical techniques extends the results of Stigler and Adams and Yellen. He finds that pure bundling reduces the diversity of the population of consumers because the standard deviation of the consumer valuations for the bundle is less than the sum of the standard deviations of valuations for its components, thus enabling sellers to extract more consumers' surplus. He demonstrates that this is true if the valuations of the two goods are negatively correlated (as suggested by Stigler and Adams and Yellen), but can also be true if the valuations are independent, or even positively but not perfectly correlated. Schmalensee also derives conditions under which bundling goods with Gaussian demand will be profitable and socially efficient in the sense of reducing deadweight loss.
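The reduction in "buyer diversity" follows from the standard formula for the standard deviation of a sum of two random variables. The sketch below is illustrative only; the function name `bundle_std` and the numerical inputs are hypothetical.

```python
from math import sqrt

def bundle_std(sd1, sd2, rho):
    """Standard deviation of v1 + v2 given each std. dev. and their correlation rho."""
    variance = sd1 ** 2 + sd2 ** 2 + 2 * rho * sd1 * sd2
    return sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors

if __name__ == "__main__":
    for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
        # Compare the bundle's std. dev. with the sum of the components' std. devs.
        print(rho, round(bundle_std(1.0, 1.0, rho), 4), "vs sum", 2.0)
```

The standard deviation of the bundle equals the sum of the component standard deviations only when the correlation is exactly 1; for any smaller correlation it is strictly less.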

McAfee, McMillan and Whinston analyzed a setting with a multiproduct monopolist and a continuum of consumer valuations, similar to the one employed by Adams and Yellen. They show that mixed bundling will almost always strictly increase the seller's profits when the seller can enforce a price for the bundle that may exceed the sum of the prices of its components. They also derive a general condition under which mixed bundling dominates unbundled sales when the price of the bundle cannot exceed the sum of the prices of its components; this condition is always satisfied when the valuations for the two goods are independently distributed. Salinger develops a graphical framework to analyze the profitability and welfare implications of bundling two goods, primarily in the context of independent linear demand functions. He finds that bundling two goods tends to be profitable when consumer valuations are negatively correlated and high relative to marginal costs.

More recently, Armstrong (1996) shows that for a special class of cases, the optimal tariff in the multiproduct case can be determined using the techniques typically used in the single-product case. However, he does not explore the implications of increasing the number of goods and, because he focuses on heterogeneous consumers, concludes that optimal bundle pricing will almost always inefficiently exclude some low-demand consumers.

1.3 Approach in this paper

Information goods may be bundled together because of technological complementarities in their production, consumption, search, or distribution. Our analysis suggests that it is often desirable to bundle information goods simply to take advantage of bundling as a pricing strategy. Therefore, for simplicity of exposition, we assume that there are no technological advantages to bundling; such technological complementarities would only strengthen our results. In this paper we define an "information good" as the smallest logical unit of information that does not exhibit technological complementarities, such as a news story, a photograph, or a song.

In contrast to previous work, we concentrate on bundling strategies that may involve a large number of goods with very low marginal costs of production. This approach is particularly suitable to information goods, which typically have virtually zero marginal cost of reproduction and can be sold in large bundles, with components delivered on demand via the developing telecommunications infrastructure, or distributed via mass storage media such as CD-ROM devices. We focus on pure bundling, which is the typical pricing strategy for bundles of information goods: in the rest of this paper, unless otherwise specified, the term bundling refers to pure bundling. As shown by Hanson and Martin (1990), price-setting for mixed bundling of many goods is an NP-complete problem, requiring the seller to determine a number of prices and quantities that grows exponentially as the size of the bundle increases.

We find that some of the results in the literature for bundles of two goods do not generalize to this setting. For instance, Salinger (1995) shows that when consumers have independent linear demands, bundling two goods increases consumers' surplus if marginal costs are low enough for bundling to be profitable; we find that bundles of more than two goods will always reduce consumers' surplus when the goods have zero marginal cost and independent linear demands. Other results from the bundling literature are strengthened when the number of goods is large: bundling is profitable for a broader set of conditions, and its effects on profitability, consumers' surplus, and efficiency can be dramatically enhanced.

By focusing on large numbers of goods with low marginal costs we can formally model a multiproduct setting. This allows us to use well-developed techniques from the statistics literature and derive strong results without making strong assumptions about the initial distribution of consumer valuations. In addition, while valuations for individual goods do not typically conform to a Gaussian distribution, the central limit theorem guarantees that under relatively weak assumptions, the distribution of valuations for bundles of large numbers of goods does converge to a Gaussian distribution. As a result, Schmalensee's (1984) analytical apparatus and some of his results can be invoked to study large bundles of information goods, especially in providing criteria for evaluating the profitability and efficiency of "bundles of bundles."

Section 2 analyzes a simplified setting in which the marginal cost of all information goods is zero; buyers consume either zero or one unit of each information good; and buyers' valuations for all goods are independently and identically distributed. In sections 3 through 5, we relax these conditions. Section 3 shows that while our results are fairly robust for information goods, they are less likely to apply to physical goods because as the marginal cost of the component goods increases, the benefits of bundling eventually vanish. Thus, our model predicts that large bundles of unrelated physical goods should rarely be observed. In section 4, we allow the means, variances and covariance of the valuations for the components to vary, and investigate when adding a new product to a bundle will increase its profitability.

Section 5 considers more closely several types of covariance among components. Like earlier analyses of bundling, our model indicates that the increase in profits from bundling goods is greatest when the correlation in the valuations for the separate goods is small or negative. We also identify a special type of positive correlation for which the seller can capture nearly the entire value created by a set of goods, as long as the number of goods in the bundle is large enough and the correlation is less than perfect. In addition, we present discriminating mechanisms that significantly increase the benefits of bundling for goods with other types of correlated demands, provided the source of the underlying correlation can be identified, either directly, or indirectly through consumers' behavior. In particular, we show that mixed bundling can be more profitable than pure bundling when consumer valuations are not drawn from the same distribution, as it induces consumers to self-select.

Section 6 examines several extensions and implications. Because bundling destroys information about consumer valuations of individual goods, we describe a mechanism for recovering this information to an arbitrary degree of precision, while avoiding most of the deadweight loss associated with conventional single-good pricing. We also briefly characterize some of the ramifications of our analysis for market structure, including the potential for a "winner-take-all" equilibrium, and compare the implications of the model with some empirical evidence. Section 7 provides some concluding remarks.

2. Independent, identically distributed valuations

We begin by considering a setting with a single seller providing n information goods. For each n, let consumer valuations for the goods be denoted by random variables $v_{n1}, v_{n2}, \ldots, v_{nn}$ (such a collection is sometimes referred to as a triangular array of random variables and can be denoted by $[v_{ni}]$), and let $x_n = \frac{1}{n}\sum_{k=1}^{n} v_{nk}$ be the per-good valuation of the bundle of n information goods. Let $p_n^*$, $q_n^*$, and $\pi_n^*$ denote the profit-maximizing price per good for a bundle of n goods, the corresponding sales as a fraction of the population, and the seller's resulting profits per good. Assume the following conditions hold:

A1: The marginal cost for copies of all information goods is zero to the seller.

A2: Each buyer can consume either 0 or 1 units of each information good.

A3: For all n, buyer valuations $v_{ni}$ are independent, identically distributed (i.i.d.) with continuous density functions, non-negative support, and finite mean $\mu$ and variance $\sigma^2$.

A4: Resale is not permitted (or is unprofitable for buyers).

Under these conditions, we find that selling a bundle of all n information goods can be remarkably superior to selling the n goods separately. For the distributions of valuations underlying many common demand functions, bundling substantially reduces average deadweight loss and leads to higher profits for the seller. As n increases, the seller captures an increasing fraction of the total area under the demand curve, correspondingly reducing both the deadweight loss and consumers' surplus relative to selling the goods separately. More formally:

Proposition 1
Given assumptions A1, A2, A3, and A4, as n increases, the deadweight loss per good and the consumers' surplus per good for a bundle of n information goods converge to zero, and the seller's profit per good is maximized.

Proof: All proofs are in Appendix 1.

The intuition behind Proposition 1 is that as the number of information goods in the bundle increases, the distribution for the valuation of the bundle has more consumers with "moderate" valuations near the mean of the underlying distribution. Since the demand curve is derived from the cumulative distribution function for consumer valuations, it is more elastic near the mean, and less elastic away from the mean (Figure 1).

Figure 1: Demand for bundles of 1, 2, and 20 information goods with i.i.d. valuations uniformly distributed in [0,1] (linear demand case).
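The demand curves in Figure 1 can be computed exactly for the uniform case, because the sum of n independent Uniform[0, 1] valuations follows the Irwin-Hall distribution. The following is a minimal sketch under that assumption; the function names `irwin_hall_cdf` and `bundle_demand` are hypothetical.

```python
from math import comb, factorial, floor

def irwin_hall_cdf(n, x):
    """P(U_1 + ... + U_n <= x) for independent Uniform[0, 1] draws."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    if x > n / 2:
        # Use symmetry around n/2 so the alternating sum stays well conditioned.
        return 1.0 - irwin_hall_cdf(n, n - x)
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)

def bundle_demand(n, price_per_good):
    """Fraction of consumers whose per-good bundle valuation is at least the price."""
    return 1.0 - irwin_hall_cdf(n, n * price_per_good)

if __name__ == "__main__":
    for n in (1, 2, 20):
        # Quantity demanded at a few per-good prices for each bundle size.
        print(n, [round(bundle_demand(n, p / 10), 3) for p in range(0, 11, 2)])
```

For n = 1 the output traces the linear demand curve; for n = 20 the quantities should be close to 1 at prices well below the mean of 0.5 and close to 0 at prices well above it.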

While Proposition 1 shows that for a sufficiently large n, selling goods as a bundle can be significantly more profitable than unbundled sales, and while McAfee, McMillan and Whinston (1989) find that mixed bundling of two goods always dominates unbundled sales when consumer valuations are independent, pure bundling does not necessarily increase profits for small n. For a large number of goods and under the conditions for Proposition 1, however, pure bundling (using a single price) captures nearly the entire value created by the information goods, so mixed bundling cannot do substantially better.

The weak law of large numbers provides an upper bound for the number of goods in the bundle that are needed to enable the seller to capture a given fraction of the total area under the demand curve. Specifically, Corollary 1 follows from the weak law of large numbers as used in the proof of Proposition 1 by choosing :

Corollary 1
Given assumptions A1, A2, A3, and A4, bundling n goods where allows the seller to capture as profits at least a fraction of the area under the demand curve.
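One standard way to make a weak-law argument of this kind concrete is Chebyshev's inequality. The sketch below is an illustration under that assumption and is not necessarily the exact bound stated in Corollary 1. If the seller sets a per-good price of mean minus eps, Chebyshev's inequality implies that at least a fraction 1 - variance/(n eps^2) of consumers buy, so with zero marginal cost the share of the area under the demand curve captured as profit is at least (1 - eps/mean)(1 - variance/(n eps^2)). The function name and grid search are hypothetical.

```python
def chebyshev_profit_share_bound(n, mean, variance, steps=1000):
    """Lower bound on the share of the area under the demand curve earned as profit.

    Assumes zero marginal cost and i.i.d. non-negative valuations, so that the
    area under the per-good demand curve equals the mean valuation.
    """
    best = 0.0
    for k in range(1, steps):
        eps = mean * k / steps                        # price is set at mean - eps
        share_sold = 1.0 - variance / (n * eps ** 2)  # Chebyshev bound on sales
        if share_sold <= 0:
            continue
        best = max(best, (1.0 - eps / mean) * share_sold)
    return best

if __name__ == "__main__":
    # Uniform[0, 1] valuations have mean 1/2 and variance 1/12.
    for n in (20, 100, 10_000, 1_000_000):
        print(n, round(chebyshev_profit_share_bound(n, 0.5, 1 / 12), 4))
```

Because Chebyshev's inequality is loose, these figures understate what the seller can actually capture; they only show that the guaranteed share approaches one as n grows.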

For slightly stronger assumptions about the distribution of consumer valuations, the theory of large deviations (e.g., Chernoff's theorem or Lyapounov's theorem for bounded sequences) provides better estimates of the number of goods needed for a seller to extract as profits a given fraction of the area under the demand curve. Thus, a useful heuristic for the desirability of bundling for a seller is to consider the seller's ability to convert potential surplus to profits; if this can be effectively accomplished by selling the good separately, bundling is less likely to increase profits, especially for a small number of goods.

To further study the behavior of bundles, we assume the following condition, which implies a kind of "single crossing" property for the per-good demands :

A5: The distribution of valuations is such that $\Pr[|x_n - \mu| < \varepsilon] \le \Pr[|x_{n+1} - \mu| < \varepsilon]$ for all n and $\varepsilon$.

In this case, if it is more profitable to bundle a certain number of goods, say $n^*$, than to sell them separately, and if the optimal price per good for the bundle is less than the mean valuation $\mu$, then bundling any number of goods greater than $n^*$ will further increase profits. More formally:

Proposition 2
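Given assumptions A1, A2, A3, A4, and A5, if the profit per good from a bundle of $n^*$ goods exceeds that from separate sales ($\pi^*_{n^*} > \pi^*_1$) and the optimal bundle price per good is below the mean valuation ($p^*_{n^*} < \mu$), then bundling any number of goods $n > n^*$ will monotonically increase the seller's profits.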

Since bundling two goods with independent linear demands is profit maximizing for the seller (Salinger 1995), and the uniform distribution of valuations underlying linear demand satisfies Assumption A5, the following corollary follows from Proposition 2:

Corollary 2a: With independent linear demands for the individual goods, bundling any number of goods with zero marginal cost increases the seller's profits.

Propositions 1 and 2 show that sellers may increase profits and efficiency by reducing their strategy set: a seller sets a single price and sells equal quantities for all goods, instead of using n prices and quantities. In the limit, this achieves nearly perfect price discrimination. This is because the area under the demand curve for the bundle equals the sum of the areas under the demand curves for the individual goods (Salinger, 1995), but its shape is different, allowing the seller to capture more of the potential surplus created by the goods (Figure 2).

Figure 2: As n increases, the area of the inscribed rectangle p*q* that maximizes revenue and profits (normalized for n) increases, and for n>2, the mean deadweight loss and mean consumers' surplus also decrease.
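The division of the area under the demand curve among profit, consumers' surplus, and deadweight loss can be approximated by simulation. This is a minimal sketch assuming i.i.d. Uniform[0, 1] valuations and zero marginal cost; the function name `surplus_split` is hypothetical, and the figures it prints are Monte Carlo estimates. For a single good under these assumptions the exact values are a price of 0.5, profit of 0.25, consumers' surplus of 0.125, and deadweight loss of 0.125.

```python
import random

def surplus_split(n_goods, n_consumers=20000, seed=0):
    """Return (price, profit, consumer_surplus, deadweight_loss), all per good."""
    rng = random.Random(seed)
    values = sorted(
        sum(rng.random() for _ in range(n_goods)) / n_goods
        for _ in range(n_consumers)
    )
    # Candidate prices are the sampled valuations: at price values[i],
    # the consumers at sorted positions i..end buy the bundle.
    best_i = max(range(n_consumers), key=lambda i: values[i] * (n_consumers - i))
    price = values[best_i]
    profit = price * (n_consumers - best_i) / n_consumers
    consumer_surplus = sum(v - price for v in values[best_i:]) / n_consumers
    # With zero marginal cost, the value forgone by excluded consumers is lost.
    deadweight_loss = sum(values[:best_i]) / n_consumers
    return price, profit, consumer_surplus, deadweight_loss

if __name__ == "__main__":
    for n in (1, 2, 20):
        print(n, [round(x, 3) for x in surplus_split(n)])
```

The three per-good components sum to the sample mean valuation, which is the simulated counterpart of the area under the per-good demand curve.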

As the number of i.i.d. goods in the bundle increases, total profit and profit per good increase. The profit-maximizing price per good for the bundle steadily increases, gradually approaching the per-good expected value of the bundle to the consumers, as shown in Figure 3 for the case of linear demand. The number of goods necessary to make bundling desirable, and the speed at which deadweight loss and profit converge to their limiting values, depend on the distribution of consumer valuations.

Figure 3: Profit as a function of price per good for bundles of varying number of goods n (steeper curves reflect larger n). The profit-maximizing price is the point at the maximum of each curve. In the limit, the price per good approaches the mean valuation 0.5.
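For large bundles, the profit curve can be approximated by treating the per-good bundle valuation as Gaussian, as the central limit theorem suggests. The sketch below assumes i.i.d. valuations with mean 0.5 and variance 1/12 (the Uniform[0, 1] case) and zero marginal cost; the approximation is crude for small n, and the function name `gaussian_bundle_optimum` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_bundle_optimum(n, mean=0.5, variance=1 / 12, grid=1000):
    """Grid-search the per-good price maximizing profit under a normal approximation."""
    per_good = NormalDist(mu=mean, sigma=sqrt(variance / n))
    best_price, best_profit = 0.0, 0.0
    for k in range(1, grid):
        price = k / grid                              # candidate prices in (0, 1)
        profit = price * (1.0 - per_good.cdf(price))  # price times share buying
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price, best_profit

if __name__ == "__main__":
    for n in (20, 100, 1000, 10_000):
        price, profit = gaussian_bundle_optimum(n)
        print(n, round(price, 3), round(profit, 4))
```

As n grows, both the approximate optimal price and the profit per good should move toward the mean valuation of 0.5 from below.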

The potential efficiency gains from bundling a large number of goods that we identify contrast with the more limited benefits identified in previous work, principally as a result of our focus on bundles of more than two goods and on goods with zero marginal costs, conditions that favor bundling. In fact, an important implication of our analysis is that the benefits of bundling grow as the number of goods in the bundle increases. This implies an aspect of superadditivity to bundling: bigger bundles will be more profitable than smaller bundles, even when the goods involved are identical:

Corollary 2b: Assuming that bundles of $n_1$ goods and $n_2$ goods are profitable (as per Proposition 2), then selling a bundle of $n_1 + n_2$ goods is more profitable than selling two separate bundles of $n_1$ and $n_2$ goods respectively.

When $n_1$ and $n_2$ are sufficiently large, the central limit theorem guarantees that A5 will hold for almost any initial demand function for the individual goods, making Corollary 2b fairly general.

Corollary 2b shows that bundling can create significant economies of scope distinct from economies in production, distribution, or consumption. Strikingly, profits under the bundling strategy can be an arbitrary multiple of the maximum profits obtainable when the same information goods are sold separately. To see this, assume that demand for the individual goods is approximated by a log-log (constant elasticity) function. For a sufficiently large number of goods, Proposition 1 shows that bundling can convert a large fraction of the area under the demand curve into profits. In contrast, if such goods are sold separately, total profits become an arbitrarily small fraction of the area under the demand curve as elasticity increases. An implication is that a monopolist selling an inferior good (one with lower mean valuation) as part of a bundle may enjoy higher profits and a greater market share than could be obtained by selling a superior good separately.

3. The Role of Marginal Costs

3.1 Marginal costs reduce the benefits of bundling

The attractiveness of selling large bundles of information goods depends critically on the assumption that their marginal cost is very low. For ordinary goods and services, whose marginal costs are non-trivial relative to consumer valuations, bundling is less likely to be attractive. Since consumers must buy all the goods in a bundle, the probability that a consumer will value any of the components of the bundle at less than their marginal cost is reduced if individual marginal costs are low. In general, if the marginal cost for certain goods is less than or equal to the lowest possible valuation for them, then Propositions 1 and 2 apply, and bundling these goods can increase profits and social surplus. Proposition 3 shows that, as expected, bundling goods with sufficiently high marginal costs is neither profitable nor socially efficient.

Proposition 3
Under assumptions A2, A3, and A4, there is a marginal cost for each information good that renders bundling less profitable than selling the goods separately.

Whenever consumers can freely dispose of goods or avoid consuming them, then their valuations will not be less than zero. In such cases, bundling a large number of goods is efficient and profitable if the goods have zero marginal cost. For information goods, disposal costs are typically insignificant and digital reproduction and transmission are bringing marginal costs close to zero as well. At mid-1996 prices for hard disk storage, a typical one-page news story can be stored indefinitely for a cost of approximately $0.0004, and can be transmitted over digital fiber in about one ten-thousandth of a second (20Kbits at 200Mbits/sec).

In contrast, if marginal costs are large, the seller will want to increase, rather than decrease, the dispersion of valuations. For example, if the marginal cost is greater than the mean valuation, bundling will decrease profits because it decreases the fraction of buyers with valuations far from the mean. Schmalensee (1984) used numerical techniques to demonstrate that for a Gaussian distribution of consumer valuations, bundling two goods will be less profitable as long as $c > \mu - 1.253\sigma$, where $c$ denotes the marginal costs, $\mu$ is the mean valuation for each good, and $\sigma$ is the standard deviation of the distribution of valuations. In general, the threshold at which bundling becomes less profitable than unbundled sales depends on the form of the demand for the individual goods and, in the absence of other factors such as economies of distribution, never exceeds the mean valuation of a homogeneous population. Proposition 4 derives the threshold for uniformly distributed i.i.d. consumer valuations:

Proposition 4
If consumer valuations for information goods are i.i.d. and uniformly distributed in , and if the marginal cost is c, bundling is less profitable than selling the goods separately if .
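The effect of marginal cost on the bundling decision can be explored numerically for the simplest case. This is a minimal sketch and does not reproduce the threshold of Proposition 4: it assumes two goods with i.i.d. Uniform[0, 1] valuations and a common marginal cost c per good, and the function names are hypothetical.

```python
def separate_profit(c):
    """Maximum per-good profit from selling one Uniform[0, 1] good on its own."""
    if c >= 1:
        return 0.0
    price = (1 + c) / 2               # maximizes (p - c) * (1 - p)
    return (price - c) * (1 - price)

def two_good_bundle_profit(c, grid=2000):
    """Maximum per-good profit from a pure bundle of two such goods (grid search)."""
    best = 0.0
    for k in range(grid + 1):
        p = k / grid                  # price per good; the bundle price is 2p
        s = 2 * p
        # Share of consumers with v1 + v2 >= s for two independent uniforms.
        share = 1 - s * s / 2 if s <= 1 else (2 - s) ** 2 / 2
        best = max(best, (p - c) * share)
    return best

if __name__ == "__main__":
    for c in (0.0, 0.1, 0.3, 0.6):
        print(c, round(separate_profit(c), 4), round(two_good_bundle_profit(c), 4))
```

With zero marginal cost the bundle earns more per good than separate sales, while at a high marginal cost such as 0.6 the ranking reverses, in line with Proposition 3.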

Even with zero marginal costs, bundling may still be less profitable if the valuations of some goods are negative for some consumers. For instance, the availability of some information goods, such as pornography or articles espousing certain political views, may have negative valuations for some consumers. In addition, while technology is rapidly reducing the marginal costs of reproduction and transmission, the time and energy a user must spend to identify an information good can present a barrier to the limiting result of Proposition 1. Adding items to a bundle can make it more difficult for consumers to locate the items of value to them, thereby decreasing the value obtainable from all items. This will reduce the expected valuation of the bundle, may create inefficiencies by inducing consumers to settle for second best goods in the bundle, and eventually can make the bundle unusable (Bakos, in press). For example, the dominant cost of using a new software program or information service is often the cognitive cost of learning a new set of commands; the value of the specific features may actually be less important to the purchase decision than this cost (Brynjolfsson and Kemerer, 1996).

When any of the above conditions apply, the benefits from bundling will be limited. Consequently, the availability of increasingly sophisticated search and filtering mechanisms will increase the profitability of bundles of information goods both directly and indirectly. Such mechanisms will create value directly by allowing consumers of large bundles of information goods to find the goods they desire and eliminate the goods they wish to avoid; indirectly, they will reduce the marginal cognitive cost for coping with additional information goods in the bundle, thus increasing the optimal size of bundles and further reducing deadweight loss.

3.2 Congestion costs and network externalities

Bundling tends to increase the fraction of the population that purchases most goods (Proposition 1). If congestion costs increase with the number of consumers of the good, then the benefits of bundling could be tempered. Congestion is probably not an important concern, however, since bundling only requires access to, rather than the physical distribution of, all goods in the bundle.

Goods with positive network externalities could be modeled as having negative congestion costs, so welfare maximization might require a subsidy to increase adoption by consumers, some of whom might prefer not to purchase the goods even at a price of zero (Farrell & Saloner, 1985). Consequently, bundling can be especially beneficial for such goods: if the goods are provided separately and consumer valuations are private information, then universal adoption can be guaranteed only if the subsidy is based on the lowest possible valuation of all consumers for each such good. Bundling can reduce the cost of achieving nearly universal adoption. With a large number of goods in the bundle, nearly universal adoption can be achieved as long as the bundle price approximates consumers' mean (private) valuation for the goods with network externalities, which can reduce or eliminate the need to subsidize consumers with low valuations for individual goods. Profit-maximizing sellers might therefore find bundling a cost-effective way to build a network of users for goods such as Internet browser software and browser extensions.

Similarly, sellers of goods with high switching costs, or sellers of "experience goods" for which the consumer's valuation is only known after he or she tries the good, can use bundling to introduce their products to a broader set of consumers. A strategy of periodically updating the composition of the bundle so that any given consumer would face a stochastically attractive mix of both new and old goods could be more effective in overcoming buyer resistance than offering the goods separately. As Varian notes (1996a, p.11), information is fundamentally such an experience good, and this will also tend to favor bundling over unbundled sales as a way for sellers to leverage reputation effects.

3.3 Private and social incentives for bundling

When marginal costs are zero and consumers have non-negative valuations (or, equivalently, free disposal), bundling increases efficiency when it increases the fraction of consumers purchasing the bundle, since each purchase creates some benefit at no additional cost. As shown earlier, a monopolist who bundles a sufficiently large number of goods with i.i.d. valuations can capture an arbitrarily large fraction of the area under the demand curve, reducing mean deadweight loss and converting consumers' surplus into producers' surplus in the process.
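The limiting claim above can be illustrated with a small Monte Carlo sketch. It is not part of the formal analysis: it assumes zero marginal cost and valuations that are i.i.d. uniform on [0, 1], and the function name `bundle_profit_share`, the sample size, and the seed are hypothetical choices made for illustration. For each bundle size it finds the best single bundle price in the simulated sample and reports profit as a share of the total area under the demand curve.

```python
import random


def bundle_profit_share(n_goods, n_consumers=2000, seed=0):
    """Estimate the share of potential surplus captured by one bundle price.

    Illustrative assumptions: zero marginal cost, and valuations that are
    i.i.d. uniform on [0, 1] for every consumer and every good.
    """
    rng = random.Random(seed)
    # Per-good valuation of the whole bundle for each simulated consumer.
    per_good = [
        sum(rng.random() for _ in range(n_goods)) / n_goods
        for _ in range(n_consumers)
    ]
    per_good.sort(reverse=True)
    # Pricing at the k-th highest valuation sells to the top k consumers.
    best_profit = max(
        price * k / n_consumers for k, price in enumerate(per_good, start=1)
    )
    # Mean valuation per good equals the area under the demand curve.
    total_surplus = sum(per_good) / n_consumers
    return best_profit / total_surplus


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(bundle_profit_share(n), 3))
```

Under these assumptions the share should be close to one half for a single good and should rise toward one as the bundle grows. Because the price is optimized on the simulated sample itself, the estimates are slightly optimistic.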

However, even when bundling increases the fraction of the population served, socially inefficient bundling may occur if there are positive marginal costs associated with provision of each good. Figure 4 shows why. Consider a collection of monopolistically provided goods with independent linear demands and the same marginal cost c, leading to a profit-maximizing price P* and quantity Q* for each good when sold separately. When marginal costs are positive, some consumers will value some of the goods at less than their marginal costs. In particular, consumers that fall between Q' and Q" for a given good derive a total benefit from that good equal to area F, but cost the seller G+F to service, resulting in a net social cost of G.

Figure 4: Private incentives for bundling will be socially inefficient if D < G < A+D

A multiproduct monopolist will pursue a bundling strategy when it is more profitable than selling the goods separately. For a sufficiently large number of goods n, the monopolist can capture nearly the entire area under the demand curve by bundling and selling to almost all consumers, but will also incur the marginal costs of serving the entire population, so the net profit per good is approximately equal to A+B+D-G. (Recall that the sum of the areas under the unbundled goods' demand curves exactly equals the area under the demand curve for the bundle, although the shapes will differ.) The profit from selling the goods individually is B. Thus, the monopolist will choose to bundle only if A+B+D-G > B. In contrast, a social planner will consider consumers' surplus as well as producers' surplus and thus will prefer bundling only if A+B+D-G > A+B. Therefore, when D < G < A+D, the seller's private incentives will induce bundling that is socially inefficient.

Example: With linear demand (i.e. consumer valuations i.i.d. and uniformly distributed in , D is smaller than G if and only if the marginal cost c is greater than . In this case, according to Proposition 4 the seller will benefit from bundling if . Thus when , bundling will be beneficial to the seller but socially undesirable.
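The comparison of private and social incentives can be checked numerically with the following sketch. It rests on assumptions that are not confirmed by the text as reproduced here: valuations are taken to be uniform on [0, 1] (so demand is Q = 1 - P), and the area labels are read as A = consumers' surplus under separate sale, B = separate-sale profit, D = deadweight loss under separate sale, and G = net social cost of serving consumers who value the good below c. The function names are hypothetical.

```python
def figure4_areas(c):
    """Per-good areas for demand Q = 1 - P and marginal cost c in [0, 1).

    Assumed reading of the labels: A is consumers' surplus and B is profit
    at the monopoly price, D is the deadweight loss of separate sale, and
    G is the cost in excess of benefit for consumers who value the good
    below c.
    """
    a = (1 - c) ** 2 / 8   # consumers' surplus at P* = (1 + c) / 2
    b = (1 - c) ** 2 / 4   # monopoly profit (P* - c) * Q*
    d = (1 - c) ** 2 / 8   # lost surplus on valuations between c and P*
    g = c ** 2 / 2         # cost minus benefit on valuations below c
    return a, b, d, g


def bundling_incentives(c):
    """Return (seller prefers bundling, planner prefers bundling)."""
    a, b, d, g = figure4_areas(c)
    bundle_value = a + b + d - g             # equals 0.5 - c in this setting
    seller_bundles = bundle_value > b        # private test: G < A + D
    planner_bundles = bundle_value > a + b   # social test:  G < D
    return seller_bundles, planner_bundles


if __name__ == "__main__":
    for c in (0.20, 0.37, 0.45):
        print(c, bundling_incentives(c))
```

Under these assumptions the two tests disagree for marginal costs between 1/3 and sqrt(2) - 1 (about 0.414): in that range the seller prefers to bundle although a social planner would not.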

When the goods are not i.i.d., the private incentives for bundling may diverge even further from the social incentives. For example, a multiproduct monopolist can increase profits by adding a new good to the bundle if that good has even a slight positive valuation for the marginal consumer of the bundle. If the new good has a significantly negative value to all inframarginal consumers, this will not diminish the private incentive to bundle as long as their reservation values do not drop below that of the marginal consumer. Thus, in principle, the loss of consumers' surplus from bundling can exceed the increase in seller's profits by an arbitrarily large amount.

4. Asymmetric Bundling

In Proposition 1, all information goods are assumed to have identically distributed valuations. In practice, information goods will have different means or variances. Even the same information good may have different valuations at different times: a movie or a news story is likely to command higher valuations when first released than a year later. Because of the generality of the weak law of large numbers, relaxing the assumption of identically distributed in does not affect the results of Proposition 1, although may converge to more slowly. The following more general proposition directly follows from the proof of Proposition 1 and Lyapounov's theorem on stochastic convergence to the mean for bounded sequences:

Proposition 1A
The results of Proposition 1 hold if Assumptions A1, A2, and A4 are satisfied, and buyer valuations are independent and uniformly bounded with continuous density functions and non-negative support.

Schmalensee (1984) points out that bundling increases a seller's profits "by reducing buyer diversity, thus facilitating the capture of consumers' surplus." In his paper, buyer diversity is indexed by the coefficient of variation of buyer valuations. As long as all goods are drawn from the same distribution, the weak law of large numbers guarantees that adding more goods to the bundle will reduce this coefficient of variation, while the central limit theorem implies that the distribution of valuations for the bundle will converge to the Gaussian distribution to which Schmalensee originally applied this criterion.

Although Proposition 1A implies that bundling generally increases seller's profits for large numbers of goods with zero marginal cost, it is not always optimal to add an additional information good to a bundle. Adding a good to a bundle can increase the sales and resulting profits from this good, especially if the demand curve for the individual good makes it difficult to extract a significant fraction of the potential surplus as profits, as is the case for goods with high and constant elasticity of demand. Conversely, if potential surplus can be effectively extracted as profits when a good is sold separately, there is little to be gained by adding it to a bundle, as is the case for goods with only two possible valuations, 0 and (see footnote 5).

Even when adding a good to a bundle does not affect the good's own profitability, it may affect the seller's ability to earn profits on the other goods in the bundle. For example, when goods are asymmetric, the coefficient of variation of a bundle does not necessarily decrease when an additional good is included in the bundle. If a good with high variance is added to a bundle, this may decrease the profitability of the bundle. Adding a new information good i to an existing bundle B will decrease the expected diversity of demand, as indexed by the coefficient of variation, if and only if .

Example: If the valuations of i and B are uncorrelated and , the coefficient of variation will decrease if .
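The effect of adding a good on the bundle's coefficient of variation can be computed directly from means, variances, and the covariance. The sketch below is illustrative only; the function names and the numerical values are hypothetical, and the condition it evaluates is simply the definition of the coefficient of variation applied before and after the good is added.

```python
import math


def coefficient_of_variation(mean, variance):
    """Standard deviation divided by the mean (mean assumed positive)."""
    return math.sqrt(variance) / mean


def cv_after_adding(mean_b, var_b, mean_i, var_i, cov=0.0):
    """Coefficient of variation of the valuation of bundle B plus good i.

    cov is the covariance between the valuations of B and i; the default
    of zero corresponds to uncorrelated valuations.
    """
    return coefficient_of_variation(
        mean_b + mean_i, var_b + var_i + 2 * cov
    )


if __name__ == "__main__":
    base = coefficient_of_variation(10.0, 4.0)
    low_var = cv_after_adding(10.0, 4.0, 1.0, 0.25)   # modest-variance good
    high_var = cv_after_adding(10.0, 4.0, 1.0, 9.0)   # high-variance good
    print(base, low_var, high_var)
```

In this hypothetical case the modest-variance good lowers the bundle's coefficient of variation, while the high-variance good raises it, which is the situation in which excluding the good from the bundle may be preferable.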

The above discussion may explain why a typical cable TV bundle from providers like HBO or Cinemax offers access to hundreds of movies, but prize fights and other "special events" are typically offered on a "pay-per-view" basis. The cable companies may have established that valuations for the prize fight are concentrated among a small fraction of consumers willing to pay very high prices to watch the fight; thus, the potential surplus of these consumers can be effectively extracted by selling the prize fight outside the bundle, while including the fight in the regular bundle might increase the bundle's coefficient of variation.

5. Correlated Demands and Price Discrimination

While Proposition 1 assumes that valuations of information goods are independent, in practice they may be positively or negatively correlated. This section explores how such correlation affects the profit-maximizing strategy of a monopolist who bundles information goods.

5.1 Negative correlations, implicit budgets and product category aggregation

There is evidence that consumers have implicit budget constraints for different categories of expenses (Thaler, 1990). For instance, while it is exceedingly difficult to predict which games, on-line services, and articles a particular consumer will purchase, one can predict that consumers are typically willing to spend about $30 per month on all types of on-line entertainment. As the cost of goods purchased approaches this "budget constraint," it becomes less likely that any additional goods will be purchased. Similarly, because human information processing capacity is finite, a time-budget constraint may prevent the consumption of a very large number of goods; even the most ardent football fan cannot watch all the games played on any given Sunday, and the most dedicated academic cannot read all the on-line publications that might be relevant. Such budget constraints create a negative correlation in the valuations of successive information purchases.

When there are explicit or implicit budget constraints, the average variance of valuations for the bundle declines more rapidly as new goods are added to the bundle. As a result, it is easier for the seller to predict demand for the bundle, and the expected size of the deadweight loss declines more rapidly. If the budget constraint is "hard," the full efficiency benefits of bundling may be achieved with a finite number of goods n, which are selected to collectively exhaust a given budget category.

5.2 Positive correlations

Two distinct types of positive correlation need to be analyzed because they have different effects on the profitability and efficiency of bundling. In the first case, valuations for the information goods are positively correlated, but not to the same underlying variables. For example, a consumer with a high valuation for an article about the Boston Red Sox may also put high value on subsequent articles about baseball or sports in general. Similarly, a trader's valuations for a sequence of stock quotations may be serially correlated over time or across industries. If these correlations become lower the more distant one gets from the initial topic or item, eventually converging to zero, then the law of large numbers and the central limit theorem apply, and the limiting results obtained in earlier sections hold. For example, the following more general proposition directly follows from the proof of Proposition 1 and the law of large numbers for stationary (in the wide sense) sequences:

Proposition 1B
The results of Proposition 1 hold if Assumptions A1, A2, and A4 are satisfied, and the sequences of buyer valuations are identically distributed, not perfectly correlated, and stationary in the wide sense for all n, with continuous density functions, non-negative support, and finite mean and variance .

Thus, bundling of information goods can substantially increase profits even when the valuations of individual goods are highly correlated, but not to the same underlying variables. However, the distribution for the bundle converges more slowly to a Gaussian distribution, and the number of goods required to achieve a given level of profits and efficiency gains generally increases.

In the second type of positive correlation, the valuations for all goods are correlated to one or more underlying variables. For instance, if business users have higher valuations than home users for both a financial news story and a research report, they will also have a higher valuation for a bundle of both these goods. In this case, the distribution of consumer valuations for the bundle does not converge to a Gaussian distribution as more goods are added. Instead, the limiting distribution of valuations reflects the distribution of the underlying variable, in this example the probability that a consumer uses the computer for fun or profit. No matter how many goods are added to the bundle, the demand curve always reflects the difference in valuations by home and business users, preventing the seller from capturing the entire surplus with a single price. In general, when valuations are correlated with underlying variables, bundling does not necessarily reduce deadweight loss even for very large bundles, and a simple bundling strategy may not be the profit-maximizing strategy for sellers of information goods.
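The difference between the two types of positive correlation can be seen in the variance of the per-good valuation of a bundle. The sketch below assumes, purely for illustration, that all goods share a common variance and that the correlation between goods i and j depends only on the distance |i - j| (as in a wide-sense stationary sequence). The function name and the two correlation patterns are hypothetical: a geometrically decaying correlation stands in for the first type, and a constant correlation stands in for dependence on a common underlying variable.

```python
def per_good_bundle_variance(n, sigma2, corr):
    """Variance of the mean valuation of n goods.

    sigma2 is the common variance of each good's valuation, and corr(k)
    is the assumed correlation between goods that are k positions apart.
    """
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += 1.0 if i == j else corr(abs(i - j))
    return sigma2 * total / n ** 2


def decaying(k):
    """Correlation that fades with distance (first type)."""
    return 0.8 ** k


def common_factor(k):
    """Correlation that never fades (second type)."""
    return 0.5


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(
            n,
            round(per_good_bundle_variance(n, 1.0, decaying), 4),
            round(per_good_bundle_variance(n, 1.0, common_factor), 4),
        )
```

With decaying correlation the per-good variance shrinks toward zero as goods are added, only more slowly than in the independent case. With constant correlation it levels off at the correlation times the common variance, so dispersion across consumers persists however large the bundle becomes.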

Example: Suppose that consumers are equally divided between home and business users, and that both types have either a high or a low valuation for each information good, respectively denoted by and (). Home users value each good at with probability and at with probability , while business users value each good at with probability and at with probability . Marginal costs are negligible. In this setting, consumer valuations are positively correlated with consumer type ("business" or "home" user). Without bundling, if the seller will set a price equal to , and sell to the consumers for a profit per consumer per good. If , the seller will price at , and sell to all consumers for a profit of per consumer per good. Bundling a large number of goods results in average valuations of for business users and for home users. The seller can price the bundle either for the business users, resulting in a maximum profit of per consumer per good, or sell to everybody for a maximum profit of per consumer per good. Thus, when , bundling is strictly less profitable than unbundled sales.

Similarly, when consumer valuations are correlated to the same underlying variable, mean deadweight loss may not be eliminated by bundling and may even increase, depending on the actual distribution of the valuations. Since consumers' preferences are often correlated with underlying variables, the strong results of Propositions 1 and 2 may be limited in practice.

5.3 Market segmentation: Third-degree price discrimination

The results of Proposition 1 can be restored if the market can be segmented according to the underlying variable. The strategy is to create submarkets defined by different values of the underlying variable, so that consumers' demands are i.i.d. conditional to a given value of the underlying variable.

For instance, while home and business users may have different valuations for a bundle, valuations for individual information goods may be i.i.d. within each category of users (as in the previous example). By identifying a given consumer's market segment ex ante, a seller can predict that consumer's expected value for the bundle. The seller can maximize profits by offering an appropriately priced bundle for each type of consumer, that is, by third-degree price discrimination. For instance, it may be optimal to offer an identical bundle of all goods to both types of users while providing a rebate to home users or imposing a surcharge on business users. In the previous example, the seller could charge business users a price of per bundled good, and home users a price of , allowing the seller to maximize profits while eliminating deadweight loss and consumers' surplus.
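The home and business user example can be made concrete with hypothetical numbers. In the sketch below, the high and low valuations (10 and 1) and the probabilities of a high valuation (0.1 for home users, 0.9 for business users) are illustrative assumptions, not values taken from the example itself; the equal split between the two types and the negligible marginal cost follow the text. The function name `per_good_profits` is likewise hypothetical.

```python
def per_good_profits(v_high, v_low, p_home, p_business, share_business=0.5):
    """Profit per consumer per good under three selling strategies.

    p_home and p_business are the probabilities that a home or business
    user values a good at v_high (otherwise v_low). Marginal cost is zero
    and the bundle is assumed large enough that each type's per-good
    valuation equals its mean.
    """
    share_home = 1 - share_business
    # Unbundled: price at v_high (high draws only) or at v_low (everyone).
    frac_high = share_home * p_home + share_business * p_business
    unbundled = max(v_high * frac_high, v_low)
    # Mean per-good valuation of a large bundle for each type.
    mean_home = p_home * v_high + (1 - p_home) * v_low
    mean_business = p_business * v_high + (1 - p_business) * v_low
    # Single bundle price: serve only the higher-mean type, or everyone.
    if mean_business >= mean_home:
        top_mean, top_share, low_mean = mean_business, share_business, mean_home
    else:
        top_mean, top_share, low_mean = mean_home, share_home, mean_business
    single_price = max(top_share * top_mean, low_mean)
    # Third-degree discrimination: charge each type its own mean valuation.
    segmented = share_home * mean_home + share_business * mean_business
    return unbundled, single_price, segmented


if __name__ == "__main__":
    print(per_good_profits(10.0, 1.0, 0.1, 0.9))
```

With these hypothetical values, a single-price bundle earns less than unbundled sales, while a bundle with a separate price for each segment earns more than either, in line with the argument that segmentation restores the advantages of bundling.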

Such price discrimination is common among software and information vendors (Varian, 1996b). For example, McAfee Associates, Inc. has separate price schedules for home and business users for identical collections of anti-virus software and updates of information about new computer viruses. Similarly, journals commonly charge different prices for collections of articles depending on the organizational affiliation of the subscriber (Varian 1996c).

Figure 5: Bundling with third-degree price discrimination

In principle, demand might be segmented into an arbitrary number of subcategories, with separate prices for each subcategory as illustrated in Figure 5 and by Proposition 5. To demonstrate this approach, we assume that each consumer is characterized by a type parameter w, such as appetite for information goods, computer literacy, or income, uniformly distributed in . More formally, assume:

A6: Consumers are characterized by a type w. Given w, the valuations for all information goods are i.i.d. and uniformly distributed in . Consumer types w are i.i.d. and uniformly distributed in .

If consumer valuations for individual goods are correlated to a common underlying variable such as consumer type, but are i.i.d. conditional on this variable, then bundling increases profits, reduces deadweight loss, and reduces consumers' surplus if the seller can segment the market through third-degree price discrimination. Proposition 5 illustrates this in the setting introduced above:

Proposition 5 (Third-degree price discrimination in one variable)
Given assumptions A1, A2, A4, and A6, bundling does not eliminate deadweight loss in the limit as . However if third-degree price discrimination is possible, then the results of Proposition 1 still hold in the limit, and the seller charges each consumer of type w a price for the bundle of per good.

The third-degree price discrimination strategy can be generalized to multiple underlying variables. If a seller segments consumers using one variable, and then finds that consumer valuations remain correlated to a different common variable, the process can be repeated to remove this residual correlation. With a sufficiently large collection of variables, the distribution of valuations may become i.i.d., or nearly so, conditional to a given value for the set of underlying variables. For instance, it might be possible to segment consumers by business vs. home use, zip code, educational background, age, sex, credit rating, etc. Databases providing such demographic information are readily available, although legal and ethical issues may limit the use of some of this data for price discrimination. Third-degree price discrimination strategies will be facilitated by widespread computer networking and public key encryption and authentication technologies that enable the cost-effective delivery of non-transferable rebate coupons to individual consumers. The rebate amount can be a function of the underlying variables that are correlated with the targeted consumers' expected valuation for the bundle.

5.4 Using mixed bundling to reveal consumer types

Third-degree price discrimination requires that any underlying variables correlated with consumer valuations be observable, so that the seller can segment the market based on these variables; this is often infeasible. However, if a consumer's type is correlated with an observable behavior, such as time spent on-line or willingness to wait for "stale" information, then this behavior can be used to segment the market. This enables a form of second-degree price discrimination (Varian, 1996b), in which consumers self-select by purchasing different versions of the bundle. In particular, the monopolist can pursue a mixed bundling strategy of offering several bundles, each including a subset of the available information goods; this menu of bundles will screen consumers by type.

To model this type of price discrimination, in the setting above assume there is a feature of the bundle (or an array of goods in the bundle) which is costless to the seller, and which has a value that is monotonically increasing with consumer type. More formally, assume that:

A7: There is a feature d of the bundle, costless to the seller, that, without loss of generality, can take any value in .

A8: A consumer of type would prefer to "consume" of d; a lower level of consumption would result in a linear utility loss of , while no benefit is derived from a consumption level higher than .

In this setting, the seller can use a bundling strategy to increase profits and reduce the total deadweight loss, but the seller must provide incentives to prevent consumers with high valuations from mimicking low-valuation consumers. This need to maintain incentive compatibility typically reduces the efficiency benefits of bundling, because some low-valuation consumers are inefficiently excluded from some goods, and introduces some rent spillover, because surplus is not completely extracted from some high-valuation consumers (Wilson 1993 ch.10). As Armstrong (1996) shows, the inefficient exclusion of low-demand consumers is also common in multidimensional mechanism design, when consumers' private information cannot be captured in a single scalar variable. In the above setting, where consumer valuations are correlated with the underlying variable w, the following proposition applies:

Proposition 6:
Given assumptions A1, A2, A4, A6, A7, and A8, for a large enough , bundling results in higher seller's profits, lower consumers' surplus, and a smaller deadweight loss. The optimal price schedule is for and otherwise.

Proposition 6 implies that a seller can price a bundle contingent on the level of feature d chosen by each consumer (and the corresponding implied type w), thereby making the bundling strategy profitable even when consumers are not homogeneous. The seller's strategy is similar to the third-degree price discrimination strategy, except that the seller must satisfy incentive compatibility constraints in setting the price schedule, because consumers can strategically modify their behavior.

Strategies that may lead consumers to reveal their types include charging a lower price for delayed stock quotations, news stories or movies; for images with lower resolution; for less comprehensive search results; or for having access restricted to certain hours. For instance, Lexis/Nexis offers lower prices for access to a standard bundle of electronic data to users who do not need access during regular business hours. Another way to degrade the bundle is to leave certain items out; for example, to sell a "basic" bundle that is a subset of the "premium" bundle. Such a mixed bundling strategy forces consumers to signal their valuations by their choice of bundles. While the degraded bundles need not be any less expensive to create or provide, offering them can increase profits by reducing the rents that the seller does not capture from high-valuation consumers to induce them not to disguise their types (Deneckere and McAfee, 1994).

In summary, sellers of information goods will often find it advantageous to segment their markets based on observable characteristics or revealed behavior to reduce or eliminate the correlation of values across products. In practice, this may involve offering different bundles to different groups, a strategy that can be interpreted as mixed bundling. In this context, Proposition 6 can be interpreted as showing how mixed bundling can dominate pure bundling when consumer valuations are correlated to an underlying type (and thus consumers are heterogeneous), even if marginal costs are zero.

6. Extensions and Implications

6.1 Effects of technical complementarities or distribution costs

An interesting class of extensions would be to relax the assumption that the value (or production cost) of the bundle is equal to the sum of the values of the component goods. Not only may the goods be complements or substitutes, but there may also be costs and benefits associated with producing, distributing or consuming the bundle as a whole, such as economies of scale in creating a distribution channel, administering prices, and making consumers aware of each product's existence. For instance, technological complementarities affect the collective valuation of the millions of parts flying in close formation that comprise a Boeing 777, and economies of scale make it cheaper to distribute newspaper or journal articles in groups rather than individually. One implication is that, for certain types of dependencies, the distribution for the valuation of the bundle will not converge to a Gaussian. For instance, if adding a good (or feature) to the bundle has a multiplicative effect on the value of other goods in the bundle, as is commonly assumed in hedonic models of product valuation (Fisher and Shell, 1971), then the value of the bundle will approach a log-normal distribution.

Complementarities and economies of scale in distribution can obviously create additional incentives for bundling, and if the savings are sufficiently large, can lead monopolists to bundle goods that would not otherwise be bundled (Eppen, Hanson, et al., 1991). Such economies underlie most large "bundles" of physical goods, and would tend to add to the advantages of bundling we have already identified. However, one of the effects of the emerging information infrastructure is to dramatically decrease distribution costs for goods that can be digitally encoded. As noted by Metcalfe (1995) and others, this may be enough to make it profitable to unbundle certain goods, such as magazine and journal articles, packaged software and songs, to the extent they were formerly bundled simply to reduce distribution costs.

6.2 A mechanism for recovering information about the value of individual goods

A drawback of bundling is that information about the demand for individual components is lost, because only a single price and quantity are observed. This loss of information could impose a substantial cost: if total revenues are divided among the producers of the individual information goods without regard to how much value they each contributed, there will be significant underincentive for the development of new goods because of the "free rider" problem (Holmstrom, 1982). A multiproduct monopolist who makes the investment decision for all the components will face a similar problem: where should resources be allocated within the firm? One frequently mentioned benefit of the traditional price system is its ability to provide the information required for optimal production incentives.

To address this problem, information about the valuations for individual goods can be recovered by offering relatively small random samples of consumers access to selected information goods separately, rather than as part of a bundle. Under realistic conditions, this mechanism (bundling together information goods and statistically sampling consumers) can preserve most of the informational benefits of having separate prices for each good, while substantially increasing the profits and total surplus generated from a given set of information goods. Details are outlined in Appendix 2.

This approach decouples the two basic functions of prices: to allocate goods among consumers, and to allocate investment among producers. The price at which most consumers decide to buy a given information good need not be the same as the price that guides production decisions.

6.3 Implications for market structure

Our analysis shows that a multiproduct monopolist of information goods can often achieve higher profits and greater efficiency by using a bundling strategy than by selling the goods separately. If it would be difficult (or illegal) for a collection of single-good monopolists to coordinate on a unified bundling strategy and price, our analysis suggests that they may benefit from merging or from selling their information goods to a single firm. Thus, bundling creates non-technological economies of scope; for instance, an information good that is unprofitable (net of development costs) if sold separately could become profitable when sold as part of a larger bundle. These effects of bundling have implications for at least four different market structures.

First, the dynamics of bundling could create a winner-take-all market. Multiproduct firms that successfully sell a suite of information goods may find it more profitable to introduce new information goods than will single-product firms. Even if an information good introduced as part of a bundle by the multiproduct firm is intrinsically less valuable to consumers than a similar product that might be sold by single-product firms, our analysis shows that the multiproduct firm may, in equilibrium, achieve a higher market share and earn higher profits from that good. Bundling may therefore enable a multiproduct firm to charge lower prices while remaining profitable. Furthermore, a single-product firm may find it profitable to sell all rights to its product to the multiproduct firm to reap a share of the benefits from bundling. This suggests that an equilibrium with a single multiproduct monopolist will be stable in the face of the introduction of new information goods or even small bundles of new information goods. This winner-take-all effect from bundling is distinct from technological economies of scope or scale or learning (e.g. Spence, 1981), network externalities (e.g. Farrell and Saloner, 1985), or financial market imperfections (e.g. Bolton and Scharfstein, 1987).

A variety of alternative market structures might also emerge. Bundling could be implemented by a broker that remarkets goods produced by information "content" producers. This is essentially the strategy of on-line services like America Online. Alternatively, a consortium or club of consumers could purchase access to a variety of information goods and make them available to all members for a fixed fee. Some user groups or certain site licensing arrangements for software resemble this approach. Finally, the government could fund the creation and distribution of information goods through taxes that do not depend on which individual goods are consumed, but only on access to the whole set, as is done for some television programming in some countries. For instance, the United Kingdom funds public television programming via a use tax on television sets. Each of these institutional approaches is likely to have somewhat different welfare consequences, and the analysis becomes even more complex when multiple brokers, consortia, and producers simultaneously compete.

6.4 Empirical evidence

Our models for bundling information goods can help explain some empirical phenomena. For instance, a sharp contrast in pricing and bundling strategies is evident at two commercial sites on the World Wide Web: the Internet Shopping Network (http://internet.net) and E-library (http://www.elibrary.com). At first glance, these two sites look similar: each has colorful icons representing a variety of products for sale. However, at Internet Shopping Network, which sells physical goods like computer accessories, each item is associated with a distinct price; at E-library, all of the items displayed are available when the consumer pays a single price for access to the bundle. The goods sold by E-library are information goods with nearly zero marginal costs of reproduction. Since both companies market their products over the Internet, it is reasonable to assume that they face similar transaction costs; our theory of bundling as a pricing strategy for information goods provides a clear explanation for the difference in pricing strategies.

Many sellers of on-line information, such as America Online, CompuServe, and Lexis/Nexis, sell their goods in large bundles. For instance, when Reuters sells information about prices for various financial securities, the standard contract involves a large bundle of quotations for different securities over an extended period. While transaction costs encourage some degree of "bundling" simply to reduce administrative overhead, these considerations are not central to the Reuters strategy (Dhebar, 1995).

Cable and direct satellite broadcast television firms each sell goods with nearly zero marginal costs of reproduction. In general, pay-per-view has been less common than bundling-oriented pricing schemes. Typically, a few standard bundles are offered, as predicted by our theory, in an attempt to achieve some degree of price discrimination. For example, these firms typically offer a "basic" bundle from which certain goods are excluded. The pay-per-view approach has been used mainly for unusual special events such as boxing matches; this can be explained as a strategy of excluding "big" goods from the bundle and charging for them separately if some aspects of the nature of consumers' demand for these goods are known a priori.

Interestingly, Microsoft has often incorporated into its operating systems applications and functionality that were developed by other firms and previously sold separately; this may be consistent with our model. In 1992, Microsoft's Windows operating system incorporated most of the capabilities of Artisoft's LANtastic; in 1993, it incorporated memory management similar to Quarterdeck's QEMM product, disk compression like Stac's Stacker, and faxing like Delrina's WinFax product; and in 1995, email like Lotus's cc:Mail. Current versions of Windows 95 include web-browsing software similar to Netscape's Navigator. WordPerfect and Lotus have likewise sought to compete by bundling their products with applications that previously were sold separately.

Finally, while the bundling and statistical sampling mechanism we propose in section 6.2 may at first seem unlike any practice in existing markets, it resembles the mechanism that has evolved for determining how royalties should be apportioned to composers and songwriters from the revenues paid by nightclubs, restaurants, and other venues. ASCAP and BMI, two music associations, charge flat rates to organizations based on factors such as the number of seats in the establishment, but do not consider which songs are played. They then sample radio play lists and other sources to estimate how popular each song is, and divide the total revenues earned among the composers and songwriters in proportion to the estimated current popularity of their songs.
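The apportionment step can be illustrated with a minimal Python sketch. It assumes a flat-fee revenue pool and a sample of logged plays; the function name, song labels, and dollar amount are hypothetical and are not taken from ASCAP's or BMI's actual procedures.

```python
from collections import Counter

def apportion_royalties(total_revenue, sampled_plays):
    """Split a flat-fee revenue pool in proportion to sampled play counts."""
    counts = Counter(sampled_plays)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing sampled, nothing to apportion
    return {song: total_revenue * k / total for song, k in counts.items()}

# Hypothetical sample of plays drawn from radio play lists
sample = ["song_a", "song_b", "song_a", "song_c", "song_a", "song_b"]
payouts = apportion_royalties(1200.0, sample)
for song, amount in sorted(payouts.items()):
    print(song, amount)
```

The venues pay a price that does not depend on which songs they use, so no consumption is discouraged at the margin; only the sampled play counts carry information about the relative value of each song.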

7. Conclusion

A strategy of selling a bundle of many distinct information goods for a single price often yields higher profits and greater efficiency than selling the same goods separately. The bundling strategy takes advantage of the law of large numbers to "average out" unusually high and low values for goods, and can therefore result in a demand curve that is more elastic near the mean valuation of the population and more inelastic away from the mean. As a result, profits can be increased, even as inefficiency (deadweight loss) is reduced. While the profitability and efficiency benefits of bundling are most apparent when the consumer valuations are identically distributed and not closely correlated for different products, a bundling strategy can be profitable in a variety of situations.

Our analysis implies that optimal pricing strategies for information goods with a marginal cost of reproduction close to zero are likely to be quite different from strategies for goods and services with non-zero marginal costs. This suggests that further analysis of the pricing of such goods is desirable, as many long-standing results and intuitions about the costs and benefits of various pricing strategies may not apply to information goods.

References

Armstrong, M. "Multiproduct Nonlinear Pricing," Econometrica, Vol. 64, No. 1, January 1996, pp. 51-75.

Adams, W.J. and Yellen, J.L. "Commodity bundling and the burden of monopoly," Quarterly Journal of Economics 90 (August): 1976. 475-98.

Avery, A., Resnick P. & Zeckhauser, R. "The Market for Evaluations," Working Paper, Harvard Kennedy School of Government, 1996.

Bakos, Y. "Reducing Buyer Search Costs: Implications for Electronic Marketplaces," University of California, Irvine working paper, December 1995 (forthcoming in Management Science).

Balderston, J., "Online Cash: A Penny for Your Thoughts?" Infoworld, 3 (July 8, 1996).

Bolton, P. and D. Scharfstein, "Long-Term Financial Contracts and the Theory of Predation," Harvard University, mimeo (1987).

Brogden, S. L., "Letter to the Editor," Infoworld (February 19, 1996).

Brynjolfsson, E. and C. F. Kemerer, "Network Externalities in Microcomputer Software: An Econometric Analysis of the Spreadsheet Market," Management Science, in press.

Deneckere, R. J. and McAfee, R. P. (1994). "Damaged goods," University of Texas at Austin.

Dhebar, A., "Reuters Holdings PLC. 1850-1987: A (selective) history," Harvard Business School, HBS Case 9-595-113, (May 1995).

Eppen, G. D., W. A. Hanson, et al. (1991). "Bundling-New Products, New Markets, Low Risk." Sloan Management Review 32(4): pp. 7-14.

Farrell, J. and G. Saloner, "Standardization, compatibility, and innovation," Rand Journal of Economics, 16 (1): 442-455, (1985).

Fisher, F. M. and K. Shell, "Taste and Quality Change in the Pure Theory of the True Cost-of-Living Index", pp. 16-54 in Price Indexes and Quality Change, Griliches, Z. (ed.) Harvard University Press, Cambridge, MA, (1971).

Gal-Or, E., "First Mover Disadvantages with Private Information," Review of Economic Studies, 54 279-292, (1987).

Hanson, W. and Martin, K., "Optimal Bundle Pricing," Management Science, 32(2) (February 1990).

Harmon, S., "Media Comparables Uncovered: What's An Access Subscriber Worth Anyway?", iWORLD, (http://netday.iworld.com/stocks/index.shtml) (August 20, 1996).

Holmstrom, B., "Moral Hazard in Teams," Bell Journal of Economics, 13 324-340, (1982).

McAfee, R.P., McMillan, J., and Whinston, M.D. "Multiproduct monopoly, commodity bundling, and correlation of values." Quarterly Journal of Economics 114 (May): 1989. 371-84.

Metcalfe, R., "On-line Services for Small Change on the Next Generation Internet," Infoworld, (December 25, 1995).

Metcalfe, R., "A penny for my thoughts is more than I could hope for on the next Internet", Infoworld, (January 22, 1996).

Ross, Stephen A. "The Arbitrage Theory of Capital Asset Pricing". Journal of Economic Theory, 13 (3) December, (1976) 22-531

Salinger, M. A., "A Graphical Analysis of Bundling", Journal of Business, 68 (1): 85-98, (1995).

Schmalensee, R.L. "Gaussian demand and commodity bundling." Journal of Business 57 (January): 1984. S211-S230.

Spence, M., "Nonlinear Prices and Welfare", Journal of Public Economics, 8 1-18, (1977).

Spence, A. M., "The Learning Curve and Competition", Bell Journal of Economics, 12 49-70, (1981).

Stigler, G.J. "United States v. Loew's, Inc.: A note on block booking." Supreme Court Review, 1963, pp. 152-157.

Thaler, R. H., "Saving, Fungibility, and Mental Accounts," Journal of Economic Perspectives, 4 (1): 193-205, (1990).

Urban, G. L., B. D. Weinberg and J. R. Hauser, "Premarket Forecasting of Really-New Products", Journal of Marketing, 60 (January): 47-60, (1996).

Varian, H. "Pricing Information Goods." Proceedings of Scholarship in the New Information Environment Symposium. Harvard Law School. May 1995.

Varian, H. "Economic Issues Facing the Internet", SIMS working paper, Berkeley, September, 1996a.

Varian, H. "Differential Pricing and Efficiency", SIMS working paper, Berkeley, June, 1996b.

Varian, H. "Pricing Electronic Journals", D-Lib Magazine, June, 1996c.

Weitzman, M. "Recombinant Growth", mimeo, Harvard University, 1995.

Wilcox, J., "Pricing of Content on the Internet: The Aggregator Model," unpublished MIT Masters thesis, (June, 1996).

Wilson, R. Nonlinear Pricing. Oxford University Press. New York, 1993.

Appendix 1: Proofs of Propositions

Proposition 1

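Consider a bundle of $n$ zero marginal cost goods, each with i.i.d. valuations with mean $\mu$ and standard deviation $\sigma$. Let $f_n$ be the probability density function for a consumer's valuation for this bundle, and let $\mu_n$ and $\sigma_n$ be the mean and standard deviation for the valuation of the bundle adjusted for $n$; i.e., $\mu_n = \mu$ and $\sigma_n = \sigma/\sqrt{n}$. Denote by $p_n^*$, $q_n^*$ the optimal mean price for the bundle (adjusted for $n$) and the corresponding quantity ($q_n^* = \int_{n p_n^*}^{\infty} f_n(x)\,dx$), and let $\pi_n^*$ be the resulting profits per good, $\pi_n^* = p_n^* q_n^*$. Let $P = \lim_{n \to \infty} p_n^*$ and $Q = \lim_{n \to \infty} q_n^*$. We show that $P = \mu$ and $Q = 1$. (If these limits do not exist, the same reasoning can be applied to convergent subsequences of $p_n^*$ and $q_n^*$, as $q_n^*$ is bounded, and so is $p_n^*$ because of the finite variance assumption.)

If $P > \mu$, there exists some $\epsilon > 0$ such that for all large enough $n$, $p_n^* > \mu + \epsilon$. By the weak law of large numbers, $\Pr[|x_n - \mu| < \epsilon] \ge 1 - \sigma^2/(n\epsilon^2)$, where $x_n$ denotes a consumer's valuation for the bundle divided by $n$, or $\Pr[x_n \ge \mu + \epsilon] \le \sigma^2/(n\epsilon^2)$. Thus if $P > \mu$, $\lim_{n \to \infty} q_n^* = 0$, and since $p_n^*$ is bounded, $\lim_{n \to \infty} \pi_n^* = 0$, which contradicts the optimality of $p_n^*$ and $q_n^*$, because a fixed price below $\mu$ earns strictly positive profits per good in the limit.

If $P < \mu$, there exists some $\epsilon > 0$ such that for all large enough $n$, $p_n^* < \mu - \epsilon$. Let $p'_n = \mu - \epsilon/2$, and $q'_n$ the corresponding quantity. The weak law of large numbers implies that $\lim_{n \to \infty} q'_n = 1$, and $\lim_{n \to \infty} p'_n q'_n = \mu - \epsilon/2$. Since for large enough $n$, $\pi_n^* = p_n^* q_n^* \le p_n^* < \mu - \epsilon$, it follows that $p'_n q'_n > \pi_n^*$, which again contradicts the optimality of $p_n^*$ and $q_n^*$. Thus $P = \mu$.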

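If $Q < 1$, let $D = 1 - Q$ and $\delta = D/3$, so that $Q + \delta < 1 - \delta$. Since $q_n^*$ converges to $Q$ and $\delta > 0$, there exists some $n_1$ such that $q_n^* < Q + \delta$ for all $n > n_1$. Choose $\epsilon > 0$ such that $(\mu - \epsilon)(1 - \delta) > (\mu + \epsilon)(Q + \delta)$, which is satisfied for $\epsilon < \mu\delta/(1 + Q)$, and let $\hat{q}_n$ be the quantity sold at price $\mu - \epsilon$. By the weak law of large numbers, $\lim_{n \to \infty} \hat{q}_n = 1$, and thus there exists some $n_2$ such that $\hat{q}_n > 1 - \delta$ for all $n > n_2$. Finally, since $p_n^*$ converges to $\mu$ as shown above, there exists some $n_3$ such that $p_n^* < \mu + \epsilon$ for $n > n_3$. Let $\hat{n} = \max\{n_1, n_2, n_3\}$. Then for $n > \hat{n}$, setting a price $\mu - \epsilon$ yields corresponding sales $\hat{q}_n > 1 - \delta$ and revenues $\hat{\pi}_n = (\mu - \epsilon)\hat{q}_n > (\mu - \epsilon)(1 - \delta)$. Since $\epsilon$ was chosen so that $(\mu - \epsilon)(1 - \delta) > (\mu + \epsilon)(Q + \delta)$, we get $\hat{\pi}_n > (\mu + \epsilon)(Q + \delta) > p_n^* q_n^* = \pi_n^*$, contradicting the optimality of $p_n^*$ and $q_n^*$.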
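As a rough numerical illustration of the limit argument for Proposition 1, the following Python sketch simulates consumers whose valuations for each good are i.i.d. uniform on [0, 1]. The uniform distribution, the sample of 2,000 consumers, and the function name are assumptions made only for this example. The sketch finds the in-sample profit-maximizing price per good for a single good and for a bundle of 100 goods.

```python
import random

def bundle_outcome(n, consumers=2000, seed=0):
    """Return (price per good, share buying, profit per good) for a bundle of n goods."""
    rng = random.Random(seed)
    # Each simulated consumer's mean valuation per good for the bundle
    means = sorted(sum(rng.random() for _ in range(n)) / n for _ in range(consumers))
    best_price, best_share, best_profit = 0.0, 0.0, 0.0
    # Try each consumer's valuation as the price: consumers k..end then buy
    for k, price in enumerate(means):
        share = (consumers - k) / consumers
        profit = price * share  # zero marginal cost, so profit equals revenue
        if profit > best_profit:
            best_price, best_share, best_profit = price, share, profit
    return best_price, best_share, best_profit

single = bundle_outcome(1)
bundled = bundle_outcome(100)
print("n=1:   price %.3f, share %.3f, profit per good %.3f" % single)
print("n=100: price %.3f, share %.3f, profit per good %.3f" % bundled)
```

Under these assumptions the single good is sold to roughly half of the consumers, while the bundle is priced somewhat below the mean valuation of 0.5 per good and sold to nearly all of them, so profit per good moves toward the mean valuation as the bundle grows.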

Proposition 2

Using the same notation as in Proposition 1, we assume that, for all integer and all , .

This assumption implies that the quantity of the bundle of goods sold at price per good will increase compared to the bundle of goods, i.e., . This guarantees that . Adding the th good to the bundle is desirable for the seller, because a bundle of goods is more profitable than a bundle of goods plus a single good sold separately, since .

Assumption A5 also implies that (otherwise would not be optimal), allowing the reasoning above to be applied inductively, which proves the proposition for all .

Proposition 3

If the marginal cost is higher than the mean valuation, it is easily seen that bundling is unprofitable in the limit as $n \to \infty$. Separate sales are still profitable as long as some consumers' valuations are higher than the marginal cost.
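A short worked example may make the contrast concrete. It assumes valuations uniformly distributed on [0, 1], so that the mean valuation is 1/2, and a marginal cost $c$ with $1/2 < c < 1$; the uniform distribution and the numerical value $c = 0.6$ are illustrative assumptions only.

```latex
% Illustrative assumption: v ~ U[0,1], so \mu = 1/2, and 1/2 < c < 1.
\begin{align*}
\text{Separate sales:}\quad
  & \max_{p}\;(p-c)(1-p)
    \;\Rightarrow\; p^{*}=\frac{1+c}{2},
    \qquad \pi^{*}=\frac{(1-c)^{2}}{4} > 0, \\
\text{e.g. } c = 0.6:\quad
  & p^{*}=0.8, \qquad \pi^{*}=0.04 \text{ per good}, \\
\text{Bundle, } n\to\infty:\quad
  & \text{mean valuation per good} \;\to\; \mu=\tfrac{1}{2} < c .
\end{align*}
```

In this example any bundle price per good that covers the marginal cost attracts a vanishing share of consumers as the bundle grows, while any price low enough to sell to most consumers lies below cost, whereas separate sales continue to earn a positive margin from the consumers who value a good above $c$.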

Proposition 4

Without bundling, the seller faces a downward sloping demand function for each individual good, resulting in a monopolistic equilibrium price of and corresponding profit of . Bundling allows the seller to capture the entire consumer valuations, thus resulting in average profit . Bundling becomes unattractive when , or . As , this condition is met when .

Proposition 5

Given a consumer's type w, valuations are i.i.d. for all goods and uniformly distributed in , i.e., , where .

The probability that a consumer of type w will value any particular good at x is for . Thus the sum of valuations for consumers with valuation at level x equals , and consequently the unbundled demand at price p is

and thus .

As n increases, the mean valuation for a bundle of n goods by a consumer of type converges stochastically to . Thus at a price per good for the bundle , the seller will sell to a fraction of the consumers; i.e., those with type . The resulting demand curve is , and thus the profit-maximizing bundle price is per good, and the corresponding average profits are and the deadweight loss is . If third-degree price discrimination is feasible, however, the seller will set , resulting in profits of , no deadweight loss, and full extraction of consumer surplus.

Proposition 6

If a consumer of type with preferred consumption level for the discriminating feature chooses a lower consumption level , which is the preferred level for type , the resulting utility loss is . That consumer values the bundle at , and it can be shown that the optimal price schedule is linear: if a consumer's consumption level for the discriminating feature implies type w, the seller charges that consumer price , which results in a truth-telling equilibrium in which each consumer selects the level d implied by his or her type w.

The resulting demand function is , with sales at price . The seller realizes profit , where w* characterizes the marginal consumer that will purchase the bundle. This profit can be calculated to be , and solving yields . Substituting dW for w in the price schedule above yields the result in the Proposition.

Thus, unless , the optimal pricing strategy for the bundle involves taking advantage of the feature d to price-discriminate. If , then the seller is able to achieve third-degree price discrimination, charging each consumer their reservation value for the bundle, and extracting the entire consumer surplus, resulting in higher profits and no deadweight loss.

Appendix 2: A mechanism for recovering information about the valuations of individual goods

The proposed mechanism works as follows:

1. For each good $i$, expose a random subsample of $s_i$ potential consumers to prices that make them reveal their demand for this good. These consumers will not have access to good $i$, which is normally in the bundle, unless they pay an additional price, $p_i$.

2. Extrapolate the information from the subsamples to the rest of the population. If these $s_i$ consumers are sufficiently representative, then their choices will provide a (noisy) signal of what the demand of the whole population, $S$, would have been for good $i$.

This mechanism requires preventing arbitrage among consumers, a condition that can be enforced through technical means, such as public key encryption and authentication; legal means, such as copyrights and patents; social sanctions, such as norms against piracy; or combinations of the three. This mechanism will lead to a deadweight loss for those $s_i$ consumers who are included in the sample, since some of them may choose to forgo consumption of the good. If $s_i = S$, then the mechanism provides exactly the same information as the conventional price system at exactly the same cost. However, it is likely that for most purposes a sufficiently accurate estimate of demand can be calculated for $s_i \ll S$, because of the rapidly declining informativeness of additional draws from the sample (the sampling error of the estimate falls only as $O(1/\sqrt{s_i})$), as shown in Figure 6.
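The diminishing value of larger samples can be sketched in Python. The example assumes a hypothetical population of 100,000 consumers with valuations for good i uniform on [0, 1] and a test price of 0.7; these numbers, and the function name, are illustrative assumptions rather than part of the mechanism itself.

```python
import math
import random

def estimate_demand_share(valuations, price, sample_size, rng):
    """Estimate the share of the population willing to pay `price` from a random subsample."""
    sample = rng.sample(valuations, sample_size)
    share = sum(v >= price for v in sample) / sample_size
    # Binomial standard error of the estimated share
    std_err = math.sqrt(share * (1 - share) / sample_size)
    return share, std_err

rng = random.Random(1)
population = [rng.random() for _ in range(100_000)]  # hypothetical valuations
results = {s: estimate_demand_share(population, 0.7, s, rng)
           for s in (100, 400, 1600, 6400)}
for s, (share, se) in results.items():
    print(s, round(share, 3), round(se, 4))
```

Each fourfold increase in the subsample roughly halves the standard error, so a subsample that is a small fraction of the population already gives a usable estimate, and only those sampled consumers bear any deadweight loss.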

While the conventional price system provides only a binary signal of whether a given consumer's valuation is greater than or less than the market price, by offering different prices to different consumers one could estimate the shape of the entire demand curve, rather than just the portion near the market price. It may be too costly to experiment with prices far from the equilibrium price if all consumers must be offered the same price (Gal-Or, 1987), but if only a few consumers face off-equilibrium prices, then the costs can be kept manageable. Moreover, the shape of demand far from the equilibrium price is an important determinant of the total social surplus created by a good, and therefore the optimal investment policy regarding which types of goods should be created. For these reasons, our mechanism is likely to provide information about consumers' demand at a significantly lower social cost than the conventional price system, and it will never do worse.
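A minimal Python sketch of this idea follows. It assumes a hypothetical population with valuations uniform on [0, 1] and offers each of five test prices to its own small random group; the price grid, group size, and function name are illustrative assumptions.

```python
import random

def trace_demand_curve(valuations, prices, per_price, rng):
    """Offer each candidate price to its own small random group; return estimated demand at each price."""
    # Draw disjoint groups so that no consumer faces more than one test price
    chosen = rng.sample(valuations, per_price * len(prices))
    curve = {}
    for j, p in enumerate(prices):
        group = chosen[j * per_price:(j + 1) * per_price]
        curve[p] = sum(v >= p for v in group) / per_price
    return curve

rng = random.Random(7)
population = [rng.random() for _ in range(50_000)]  # hypothetical valuations
prices = [0.1, 0.3, 0.5, 0.7, 0.9]
curve = trace_demand_curve(population, prices, per_price=500, rng=rng)
for p in prices:
    print(p, curve[p])
```

Only 2,500 of the 50,000 simulated consumers face a test price here, yet the estimates cover points on the demand curve far from any single market price, which is the information a uniform price cannot reveal.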

Figure 6: Declining marginal benefits of larger samples

This statistical mechanism resembles the way investment decisions about certain information goods are actually made. For instance, information about consumers' valuations of individual television programs is rarely obtained by forcing them to pay for particular programs. Instead, television content producers provide a bundle of the goods for free (broadcast TV) or for a fixed price (cable or direct satellite TV) and rely on statistical sampling by firms like Nielsen and Arbitron to estimate audience size and quality. Advertising rates are based on these estimates, and indirectly determine which types of new television content will be produced. As discussed in section 6.4, this mechanism also resembles how royalties are apportioned to composers and songwriters from the revenues paid by nightclubs, restaurants, and other venues.

Finally, test-marketing of new products using focus groups also has similarities with the mechanism we describe. In fact, any signal that is reliably correlated with consumers' expected valuation for a good can serve as a substitute for the information provided by the conventional price system. These indicators could include prices from related product markets or populations, time spent visiting a site on the World Wide Web, the number of keystrokes made while in a particular application, survey answers on what users say they like, the expert opinion of product specialists, or ratings generated by collaborative filtering mechanisms (see, e.g., Avery, Resnick & Zeckhauser, 1995; Urban, Weinberg & Hauser, 1996).