On Behalf Of Naïveté

by William H. Starbuck

New York University

(Published in J. A. C. Baum and J. V. Singh (eds.), Evolutionary Dynamics of Organizations; Oxford University Press, 1994, pp. 205-220)

PEERING INTO MIRRORS

The phenomena we see reflect ourselves. When we report what we see, our reports tell much about ourselves-they may even tell more about us than about the phenomena we claim to be observing (Starbuck and Milliken, 1988).

In research, the phenomena we see reflect our analytic procedures. When we report research findings, our reports tell much about our analytic procedures. Our reports may tell more about our analytic procedures than about the phenomena we claim to be analyzing (Starbuck, 1981; Webster and Starbuck, 1988).

This chapter advocates changes in our theorizing and our testing of theories. These changes would help us to formulate more meaningful theories and to evaluate theories more rigorously. Although the issues and prescriptions apply generally, the discussion emphasizes time series because these are central in studies of evolutionary dynamics. A time series is a sequence of observations collected over time-for example, annual counts of steel mills over 30 years.

This first section explains why time series so readily support multiple interpretations, including spurious or deceptive inferences. This ambiguity implies that we should use tough criteria to test theories about series. This section also points out that sustaining a null hypothesis is often more useful than rejecting one, but journals favor studies that do the opposite, and they encourage scientists to lie. The subsequent section reviews six reasons organizational scientists should pay serious attention to null or naïve hypotheses that describe behaviors as having large random components. We are trying too hard to invent and show the superiority of causal theories, often complex ones; and we too quickly reject simple hypotheses that are more parsimonious. The third section points out how often social scientists test their theories against null hypotheses that they can always reject. Statistical tests would have more import if scientists would test their theories against null models or against naïve hypotheses.

Ambiguous Reflections of Time

"While the past entertains, ennobles, and expands quite readily, it enlightens only with delicate coaxing." (Fischhoff, 1982: 335)

Those who analyze time series are especially likely to see what their methods dictate. Most time series have high autocorrelations, and autocorrelations foster spurious relations between series.

Ames and Reiter (1961) studied autocorrelations in actual socioeconomic series. They plucked one hundred series at random from Historical Statistics for the United States. Each series spanned the 25 years from 1929 to 1953.

Five sixths of the series had autocorrelations above .8 for a one-year lag, and the mean autocorrelation was .837 for a one-year lag. Even after Ames and Reiter removed linear trends from the series, the mean autocorrelation was .675 for a one-year lag.

Social scientists often calculate correlations between series-for example, the correlation between the number of operating steel mills and Gross National Product. But, high autocorrelations produce high correlations between series that have nothing to do with causal links between those series.

This explains why social scientists find it easy to discover high correlations between series even when series have no direct causal relations (Lovell, 1983; Peach and Webb, 1983). Ames and Reiter correlated randomly selected series. They found a mean (absolute value) correlation of .571 between two series. For 46 percent of the pairs, there existed a time lag of zero to five years that made the two series correlate at least .7.

Ames and Reiter simulated the widespread practice of searching for highly correlated pairs of series. They picked a target series at random, and then compared this series with other series that they also picked randomly. On average, they needed only three trials to draw a second series that appeared to "explain" at least half the variance in a target series. Even after they corrected series for linear trends, they needed only five trials on average to draw a series that seemed to "explain" at least half the variance in a target series.
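
The ease of this discovery process can be verified with a small computational exercise in the spirit of Ames and Reiter's, sketched below. The sketch is illustrative only: the weight on past values, the trend, and the noise level are arbitrary assumptions rather than estimates from Historical Statistics. Each artificial series is built so that a new value is a weighted average of the previous value plus a linear trend and a purely random increment, and the paired series are causally unrelated by construction.

    import numpy as np

    rng = np.random.default_rng(0)

    def artificial_series(n=25, weight=0.8, trend=0.5):
        """A new value is a weighted average of the previous value plus a
        linear trend and a purely random increment."""
        values = [rng.normal()]
        for t in range(1, n):
            values.append(weight * values[-1] + trend * t + rng.normal())
        return np.array(values)

    trials, high = 1000, 0
    for _ in range(trials):
        x = artificial_series()
        w = artificial_series()   # causally unrelated to x by construction
        if abs(np.corrcoef(x, w)[0, 1]) >= 0.7:
            high += 1

    print(f"{high / trials:.0%} of causally unrelated pairs correlate at least .7")

Because both series share an upward drift and heavy weights on past values, far more than Ames and Reiter's 46 percent of these unrelated pairs clear the .7 threshold under these assumptions; removing the linear trends would lower the figure, but Ames and Reiter's detrended results suggest it would remain uncomfortably high.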

Ecologists often decompose series conceptually into repeated trials and then try to draw inferences about the processes that generate these series. However, series provide very treacherous grounds for such inferences.

A series that depends causally on its own past values amplifies and perpetuates random disturbances. The process that generates a self-dependent series does not forget random disturbances instantly. Instead, it reacts to random disturbances when generating future outcomes. Replications of such a process produce very different series that diverge erratically from their expected values. An implication is that observed series provide poor evidence about the processes that generated them. A single series or a few replications are very likely to suggest incorrect inferences (Pant and Starbuck, 1990). Gould (1989, 1991) has repeatedly argued that biologists have drawn erroneous inferences about evolution on the basis of improbable, non-recurring, non-replicatable events.

Wold (1965) used computer simulations to show how far self-dependent series diverge from their expected values. He assumed three very simple models, generated series, and then tried to infer which model had generated the series. When he looked at a single instance of a series, inference was hopeless. He could, however, make fairly accurate estimates of central tendencies when he made 200 replications with 100 events in each series-20,000 observations.

Wold used very simple one-variable equations to generate series. Such simple processes are uncommon in socioeconomic analyses. To simulate the kinds of statistical inferences that social scientists usually try to draw from longitudinal data, I have extended Wold's work by generating autocorrelated series with the properties noted by Ames and Reiter: Each series included a linear time trend and was second-order autocorrelated with an autocorrelation coefficient above .6. Each series included 25 events, a length typical of published studies. Each analysis involved three series: Series Y depended causally on series X; but series W was causally independent of both X and Y.

Using accepted procedures for time-series, I then estimated the coefficients of an equation that erroneously hypothesized that Y depended upon both X and W.

The coefficient estimates nearly always showed a statistically significant correlation between Y and W-a reminder of the often ignored injunction that correlation does not equal causation.

The modal coefficient estimates were reasonably accurate; most of the errors fell between 10 percent and 50 percent. But estimates of absolutely small coefficients had errors as high as 4,000,000 percent.

Because Wold had shown that replications led to better estimates of central tendencies, I expected replications to allow better estimates of the coefficients. I wanted to find out how many replications one might need to distinguish causal dependence from independence with a misspecified model. To my surprise, replications proved harmful almost as often as helpful: Nearly forty percent of the time, the very first series analyzed produced more-accurate-than-average coefficient estimates, and so replications made the average errors worse. Thus, replication often fostered confidence in the coefficient estimates without increasing the estimates' average accuracy.
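
A rough sketch of an experiment of this kind appears below. It is not the original simulation: the autoregressive weight, trend, noise levels, and causal coefficient are arbitrary assumptions. Y is generated to depend on its own past and on X, W is generated independently, and an ordinary least-squares fit of the misspecified static equation Y = b0 + b1·X + b2·W is computed from 25 observations. Because all three series trend and autocorrelate, and because the fitted equation ignores Y's dependence on its own past, the conventional t-test tends to declare W "significant" far more often than the nominal 5 percent.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 25

    def trended_ar(weight=0.7, trend=0.3):
        s = [rng.normal()]
        for t in range(1, n):
            s.append(weight * s[-1] + trend * t + rng.normal())
        return np.array(s)

    spurious, trials = 0, 2000
    for _ in range(trials):
        x = trended_ar()
        w = trended_ar()                      # causally independent of X and Y
        y = np.empty(n)                       # Y depends on its own past and on X
        y[0] = rng.normal()
        for t in range(1, n):
            y[t] = 0.7 * y[t - 1] + 2.0 * x[t] + rng.normal()
        design = np.column_stack([np.ones(n), x, w])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        sigma2 = resid @ resid / (n - 3)
        cov = sigma2 * np.linalg.inv(design.T @ design)
        t_w = beta[2] / np.sqrt(cov[2, 2])    # conventional t-statistic for W
        if abs(t_w) > 2.07:                   # two-tailed .05 cutoff, 22 d.f.
            spurious += 1

    print(f"W looks 'significant' in {spurious / trials:.0%} of trials")

The exact rejection rate depends on the arbitrary parameters, but the qualitative lesson matches the text: with short, trending, autocorrelated series, a causally irrelevant variable readily passes conventional significance tests.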

These challenges imply that analysts of series should consider alternative theories and draw conservative inferences. Carroll and Delacroix set an excellent example in this respect when they analyzed newspapers' mortality. They considered several alternative explanations for the observed death rates-ecological, economic, political, and idiosyncratic. Then they (1982: 191) warned readers: "On the one hand, our analysis clearly demonstrates that organizational mortality rates vary across a wide range of environmental dimensions, including industry age, economic development, and political turbulence. On the other hand, as in many historical explorations, data often did not allow us to choose among alternative explanations of these findings."

Nearly all socioeconomic series look like artificial series that have three properties (Pant and Starbuck, 1990): First, each new value of a series is a weighted average of the series' previous value and an increment. Second, some series give past values little weight, but most series give past values much weight. Third, the increments are utterly random. Because autocorrelation makes it easy to discover high correlations between such series, it is especially important to use tough criteria for concluding that relations exist.

Then, should not every analysis take as a benchmark the hypothesis that observed events arise primarily from inertia and random perturbation? Should not scientists discard hypotheses that fit the data no better than this naïve one?

In this regard, Administrative Science Quarterly deserves praise for publishing an article in which Levinthal (1991) showed that a type of random walk can generate data in which new organizations have higher failure rates than old ones. Such a random walk does not explain all aspects of organizational survival, but it makes a parsimonious benchmark. When espousing more complex theories, organizational scientists should prove them superior to a naïve model such as Levinthal's.
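
A simulation in the spirit of Levinthal's argument is sketched below; the particulars (a resource stock that starts at 1, Gaussian steps, an absorbing barrier at zero) are illustrative assumptions, not his specification. Each organization's resources follow a driftless random walk, and the organization dies the first time its resources hit zero. Although the step process never changes with age, the failure rate among survivors declines with age, because the survivors tend to be those whose walks have drifted away from the barrier.

    import numpy as np

    rng = np.random.default_rng(2)
    n_orgs, periods = 20000, 50
    capital = np.full(n_orgs, 1.0)            # arbitrary initial resource stock
    alive = np.ones(n_orgs, dtype=bool)
    failure_rate = []

    for age in range(1, periods + 1):
        capital[alive] += rng.normal(size=alive.sum())   # driftless random walk
        died_now = alive & (capital <= 0.0)              # absorbing barrier at zero
        failure_rate.append(died_now.sum() / alive.sum())
        alive &= ~died_now

    print("failure rate among survivors at ages 1, 2, 5, 10, 25, 50:")
    print([round(failure_rate[a - 1], 3) for a in (1, 2, 5, 10, 25, 50)])

The printed rates decline with age, mimicking the "liability of newness" even though age plays no causal role in the model.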

Warped Reflections in Print

Francis Bacon, Platt (1964), and Popper (1959) have argued persuasively that science makes progress mainly by showing some hypotheses to be incorrect, not by showing that other hypotheses might be correct. Observations may be consistent with many, many hypotheses, only a few of which have been stated. Showing a specific hypothesis to be consistent with the observations only indicates that it is one of the many plausible hypotheses. This should do little to reduce ambiguity, but it is likely to create a premature belief that the tested hypothesis is the best hypothesis.

More dependable progress comes from eliminating poor hypotheses than from sustaining plausible ones. For scientists, it is better to rule out what is not true than to find support for what might be true.

Translated to the domain of conventional statistical tests, this reasoning implies that rejecting a null hypothesis is a weak contribution. Indeed, rejecting a null hypothesis is often trivial, especially in studies with many observations (Webster and Starbuck, 1988). Further, in the social and economic sciences, theories are often so vague and open to so many interpretations that it may be impossible to identify implications of a rejected null hypothesis. A stronger contribution comes from failing to reject a null hypothesis insofar as this rules out some hypotheses.

Of course, in the social and economic sciences, journals show bias in the opposite direction. Journals regularly refuse to print studies that fail to reject null hypotheses, and there is reason to believe many published articles reject null hypotheses that are true (Blaug, 1980; Greenwald, 1975). The only effective way to expose such errors is by failing to replicate prior findings, but journals also decline to publish replications.

Even worse, editors and reviewers regularly urge authors to misrepresent their actual research processes by inventing "hypotheses" after-the-fact, and to portray these "hypotheses" falsely as having been invented beforehand. There is, of course, nothing wrong with inventing hypotheses a posteriori. There would be no point in conducting research if every scientist could formulate all possible true hypotheses a priori. What is wrong is the falsehood. For others to evaluate their work properly, scientists must speak honestly.

It is as if journals were striving to impede scientific progress.

I know a man who has made two studies that failed to reject null hypotheses. In both studies, he devoted much effort to formulating a priori theories about the phenomena. In the second case, this man revised his two a priori theories to accommodate critiques by many colleagues, so the stated hypotheses had rather general endorsement. In both studies, he felt strong commitments to his a priori theories, he tried very hard to confirm them, and he ended up rejecting them only after months of reanalysis.

In the first case, he also made reanalyses to "test" a posteriori hypotheses that journal reviewers proposed: The reviewers advised him to portray these hypotheses falsely as having been formulated a priori; and they told him to portray the null hypothesis falsely as an alternative a priori hypothesis.

Because his first study had met such resistance, he did not even attempt to describe the second study forthrightly as a test of two alternative theories against a null hypothesis: Instead, convinced that journals do not want honest reports, he wrote his report as if he had entertained three alternative theories from the outset.

The man has so far not succeeded in publishing either study, although one prominent journal asked for three sets of revisions before finally rejecting the manuscript. Reviewers have complained that the studies failed to reject the null hypotheses, not because the alternative hypotheses are wrong, but because the basic data are too noisy, because the researcher used poor measures, or because the stated hypotheses poorly represent valid general theories.

These complaints are not credible, however. In the first case, the researcher reprocessed the raw data several times, both to improve the accuracy of measures and to meet the objections of reviewers. He also tested, but did not confirm, hypotheses that the reviewers themselves had proposed. In the second case, before gathering data, the researcher had sought extensive input from colleagues so as to make his hypotheses as defensible as possible. Thus, the reviewers seem to be giving methodological reasons for rejecting manuscripts that contradict their substantive beliefs (Mahoney, 1977).

In both studies, after-the-fact reflection suggests that the null hypotheses make very significant statements about the phenomena. That is, after one accepts (albeit reluctantly) the idea that the null hypotheses describe the phenomena well, one sees the phenomena quite differently than past research has done, and one sees opportunities for innovative research in the future. Thus, the reviewers have rejected innovative works having profound implications.

WILLY-NILLY MOVES

There are many reasons to expect organizations' behaviors to appear somewhat random. Hannan and Freeman (1984: 150) remarked that organizational changes may be "random with respect to future value." Changes may also look random with respect to their nature. This section surveys reasons for this apparent randomness, and thus, reasons why null or naïve hypotheses often fit data well.

The Red Queen's Hypothesis

"Ultimately, evolutionary success for each competitor comes from acquiring tricks, skills, and strategies to evolve faster and more effectively than the competition." (Campbell, 1985:139)

An organization has advantages only in comparison to other organizations. Communication and imitation destroy advantages. When an innovative property spreads throughout a population of organizations, none of the individual organizations has gained an advantage over the others, and no organization has a higher probability of survival. In fact, making organizations more alike would likely lower their survival probabilities by intensifying competition. Similarly, competitors' responses to innovation destroy advantages. When one type of organization adopts an innovative property, competing types adapt to this property so as to neutralize its advantages.

Van Valen (1973) labeled this aspect of biological evolution the Red Queen's Hypothesis. In Lewis Carroll's Through the Looking-Glass, the Red Queen tells Alice that she should not expect to have gone anywhere even though she was running just as fast as she could. In Looking Glass Land, said the Red Queen, "you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"

The analogy of Looking Glass Land applies even more aptly to organizations than to biology: The more visible a property's advantages, the more organizations that can perceive these advantages and the stronger their motivations to adopt similar properties. For instance, Mansfield (1963) found that more profitable innovations get adopted more quickly by more firms. Therefore, the clearer the advantages of an innovative property, the more rapidly it will attract imitators and the more rapidly it will cease offering advantages. The properties most likely to confer persistent advantages are those having highly debatable advantages. When choosing properties to imitate, organizations face tradeoffs of risk versus expected return that resemble the tradeoffs with financial investments.

Organizations may react to the Red Queen's Hypothesis either by imitating other organizations or by trying to innovate. Both reactions make the behavior and performance differences among organizations look more random.

As Carroll (1984: 72) observed: "Recent ecological theory . . . emphasizes the multilineal probabilistic nature of evolution. Thinking has shifted so much in this direction that, as with bioecology, evolution is no longer equated with progress, but simply with change over time." However, ecologists have been trying to achieve this reorientation by treating survival-neutral and survival-degrading changes as random errors (Carroll, 1984: 72-73). Since survival-neutral and survival-degrading changes may themselves be systematic, ecologists' current approach confounds these changes with survival-enhancing ones.

It may prove helpful to classify changes as either systematic, temporary, or random. Besides adjusting to systematic, long-term opportunities and threats, organizations adapt to temporary fads, transient jolts, and accidental disturbances. For instance, Delacroix and Swaminathan (1991) concluded that nearly all organizational changes in the California wine industry are cosmetic, pointless, or speculative.

Organizations' members and environmental evaluators may be unable to distinguish fads and fashions from significant ideas and innovations: These too compete for adherents, and they originate and spread through much the same processes. Just as clothing buyers choose different colors and styles from year to year, so may organizations try new ideas or alter properties for the sake of change, and so may the evaluators of organizations pursue fleeting opportunities or espouse new myths about organizational effectiveness (Abrahamson, 1990).

Pursuing Illusory Opportunities

From the viewpoint of a single organization, the Red Queen's Hypothesis turns opportunities into transient illusions. The organization perceives an opportunity and moves to exploit it. If only one organization were to act alone, the opportunity might be real. However, communication and imitation convert the opportunity into an illusion, and the organizations that try to exploit the opportunity end up no better off-perhaps worse off.

Two theories of firm growth have emphasized illusory opportunities. Andrews (1949) pointed out that firms may expand to obtain short-run cost savings that never become real. From a short-run perspective, managers see some costs as "fixed", meaning that they will not increase if the amount of output goes up incrementally. These fixed costs seem to create opportunities to produce somewhat more output without incurring proportional costs, so managers expect the average cost per unit to decrease as output rises. Yet over the long run, all costs do vary with output. The long-run cost per unit might stay constant or increase as output goes up. Thus, managers might endlessly expand output because they expect growth to decrease average cost, while growth is actually yielding the opposite result.

Penrose (1959: 2) similarly contrasted short-run and long-run perspectives, but she argued, "There may be advantages in moving from one position to another quite apart from the advantages of being in a different position." She (1959: 103) wrote: "The growth of firms may be consistent with the most efficient use of society's resources; the result of past growth-the size attained at any time-may have no corresponding advantages. Each successive step in its growth may be profitable to the firm and, if otherwise under-utilized resources are used, advantageous to society. But once any expansion is completed, the original justification for the expansion may fade into insignificance as new opportunities for growth develop and are acted upon."

Because organizations cannot foretell the distant future, a change that promises benefits today often proves damaging the day after tomorrow. Thus, a change today will likely stimulate still other changes to correct its unexpected results, but these may in turn produce more unexpected results (Pant and Starbuck, 1990).

Solving Unsolvable Problems

One interesting and prevalent type of change is an effort to solve an unsolvable problem. Unsolvable problems exist because societies espouse values that are mutually inconsistent. Since organizations embody societal values, they are trying perpetually to satisfy inconsistent demands. Organizational properties that uphold some values conflict with properties that uphold contrary values.

Hierarchical dominance affords an example. Western societies advocate democracy and equality, but they also advocate hierarchical control, unity, and efficiency. People in these societies expect organizations to adopt hierarchical structures, and to use these structures to coordinate their actions and to eliminate waste. But of course, hierarchical control is undemocratic and unequal. Everyone understands why subordinates do not do as their superiors dictate, and everyone also understands why organizations have to eliminate this inefficient disunity.

So, organizations try to solve the "problem" of resistance to hierarchical control-by making hierarchical control less visible or by aligning subordinates' goals with superiors' goals. In the late 1940s, the solution was for managers to manage "democratically." But after a while, most subordinates inferred that their superiors' democracy was insincere, and this solution failed. In the early 1950s, the solution was for managers to exhibit "consideration" while nevertheless controlling task activities. But after a while, most subordinates inferred that their superiors' consideration was insincere, and this solution failed. In the late 1950s, the solution was Management-By-Objectives, in which superiors and subordinates were to meet periodically and to formulate mutually agreed goals for the subordinates. But after a while, most subordinates inferred that their superiors were using these meetings to dictate goals, and this solution failed. In the 1960s, the solution was "participative management," in which workers' representatives were to participate in managerial boards that made diverse decisions about methods, investments, staffing, strategies, and so on. But after a while, most workers inferred (a) that managers were exerting great influence upon these boards' decisions and (b) that the workers' representatives were benefiting personally from their memberships in these boards, and this solution failed. In the early 1980s, the solution was "organizational culture," by which organizations were to produce unity of goals and methods. But after a while, most managers learned that general solidarity did not translate into operational goals and methods, and employees resisted homogenization, and this solution failed. In the late 1980s, the solution became "quality circles," which broadened into "total quality management." But after a while, . . .

Thus, one fad has followed another. From a short-run perspective, many organizations adopted very similar "solutions"; and from a long-run perspective, many organizations have adopted loosely similar "solutions". Although the various solutions have affected superior-subordinate relations, these effects have been negative as often as positive; and the fundamental "problem" persists. Long-run changes in the fundamental problem and in the various solutions seem to have arisen from economics, education, social structure, and technologies rather than from intraorganizational practices. So the fads' effects look weak and largely random. From a very-long-run perspective, organizations seem to have tried a series of unsuccessful practices.

Unsolvable problems also exist because organizations' overall goals encompass inconsistent subgoals: To achieve more general goals, organizations must pursue subgoals that call for conflicting actions.

An example is profit maximization. Firms try both to obtain as much revenue as possible and to keep costs as low as possible. To maximize revenues, the marketing personnel want their firms to produce customized products that are just what the customers want and to have these available whenever the customers want them. To minimize costs, the production personnel want to minimize inventories and machine downtime, so they want to deliver the same product to every customer, or at least to produce different products in optimal quantities on efficient schedules. One outcome is that the marketing personnel and the production personnel often disagree about what to do and when to do it. Conflicts are unpleasant, however, so firms attempt "conflict resolution". Conflict resolution can at best improve short-run symptoms because these conflicts are intrinsic to profit maximization.

Such intraorganizational conflicts tend to generate solutions over time that vary around long-run equilibria. Deviations from equilibrium occur as first one side and then the other scores a point. Should one side score repeatedly, pushing the joint solution well off-center, higher management has to intervene and rebalance power. Although there is a need for long-run balance, the short-run moves around that balance point may look erratic and make no sense from a long-run perspective. Thus, a random walk might describe the short-run moves well. A random walk would be even more likely to describe the moves well if these were aggregated across several firms: Even if all the firms were dealing with the same technologies and selling in the same market, the intraorganizational conflicts in different firms would generate different solutions at each time.

Multiple Forces

A random walk may also accurately and efficiently describe moves produced by the interactions of multiple forces that act independently. Such moves resemble Brownian motion-the erratic moves of a dust particle in the air, which one can see in a sunbeam. Molecules of air collide with a dust particle constantly and from all sides: During an instant, unequal numbers of molecules hit the particle from different directions, and the dust particle responds by moving in one direction or another. The dust particle's moves are not literally random. One might, in principle, explain them with a complex model that accounts for a multitude of air molecules. It is more practical, however, to describe the moves as a random walk about a slowly drifting equilibrium. Randomness serves as a concise summary of complexity.

To generate moves that generally resemble a random walk, there need not be a multitude of forces. A few forces can have the effect of many if each force varies in intensity from time to time. Moves in one dimension-say, higher or lower-might look random if just a few forces act in each direction. Causal processes of this sort pervade the behavioral and social sciences.
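
The point can be made concrete with a toy simulation, sketched below, in which a position is pushed each period by only four independent forces, two upward and two downward, whose intensities vary randomly from period to period. The number of forces and their intensity distribution are arbitrary assumptions. The dispersion of the resulting positions grows roughly with the square root of elapsed time, which is the signature of a random walk.

    import numpy as np

    rng = np.random.default_rng(3)
    periods, replications = 200, 2000

    # Two forces push up and two push down; each force's intensity varies
    # from period to period (exponentially distributed with mean 1).
    up = rng.exponential(1.0, size=(replications, periods, 2)).sum(axis=2)
    down = rng.exponential(1.0, size=(replications, periods, 2)).sum(axis=2)
    position = np.cumsum(up - down, axis=1)   # the net push accumulates over time

    spread = position.std(axis=0)             # dispersion across replications
    print("std of position at t = 10, 50, 200:",
          [round(float(spread[t - 1]), 1) for t in (10, 50, 200)])
    print("ratio of std to sqrt(t):           ",
          [round(float(spread[t - 1] / np.sqrt(t)), 2) for t in (10, 50, 200)])

The nearly constant ratio in the second line shows the aggregate of a few varying forces spreading like a random walk, even though every individual push is, in principle, identifiable.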

Self-dependent Iteration

Self-dependent, inertial causal processes propagate random perturbations, and may even amplify them. A random event does not merely affect a single period; it becomes part of the foundation for future periods. The effects of random perturbations may accumulate over time until they dominate the behavior of a causal process. Thus, one random fleeting perturbation may instigate a persistent series of consequences that fabricate the appearance of a systematic pattern over a fairly long period. The more inertial a process, the longer each random perturbation can affect its actions; and Ames and Reiter found that socioeconomic series have a mean autocorrelation of .837 for a one-year lag.

Iterative processes can also produce the appearance of randomness even though no random events affect them. One property that can yield this result is nonlinear feedback. For example, computers use completely deterministic, simple calculations to generate pseudo-random numbers. These numbers are not actually random; each repetition of a calculation produces precisely the same pseudo-random numbers. Yet these impostors are what we use to create the appearance of randomness. Indeed, in nature, nonlinear deterministic processes may generate many of the phenomena that appear to be stochastic (Mandelbrot, 1983).
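
A standard illustration of this point, not drawn from the chapter, is the logistic map: iterating the deterministic rule x(t+1) = 4·x(t)·(1 - x(t)) produces values that look random by casual tests, yet rerunning the iteration from the same starting value reproduces exactly the same numbers. The sketch below checks both properties.

    # Deterministic nonlinear feedback that looks random: the logistic map.
    def logistic_series(x0=0.2, n=1000, r=4.0):
        xs = [x0]
        for _ in range(n - 1):
            xs.append(r * xs[-1] * (1.0 - xs[-1]))
        return xs

    series = logistic_series()
    # Same starting value, same "random-looking" output: nothing is stochastic.
    assert series == logistic_series()

    # Casual symptoms of randomness: values spread over (0, 1) and successive
    # values are nearly uncorrelated.
    mean = sum(series) / len(series)
    lag1 = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    var = sum((a - mean) ** 2 for a in series)
    print("mean:", round(mean, 3), "  lag-1 autocorrelation:", round(lag1 / var, 3))

Pseudo-random number generators exploit exactly this trick: deterministic feedback masquerading as chance.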

Action Generators

Many organizational actions do not reflect current, identified needs or goals. Although some actions arise from problem solving, other actions arise from action generating. Indeed, action generating is probably much more prevalent than problem solving. In their action generating mode, organizations follow programs thoughtlessly. The programs may be triggered by calendars, clocks, job assignments, or unimportant information (Starbuck, 1983).

For instance, strategic planning departments gather and distribute information, make forecasts, hold meetings, set goals, and discuss alternatives whether or not they face specific strategic problems, whether or not their current strategies seem to be succeeding or failing, whether or not strategic planning is likely to prove useful to them. They probably do these things according to an annual calendar that is independent of the timing or appearance of strategic issues. Although one might expect to observe relations between the efforts devoted to strategic planning and the contexts in which it occurs, such relations in the short-run might be quite independent of such relations over the long run. That is, in the long run, firms might discard planning practices that appear useless or harmful, so practices might reflect properties of industries, markets, or technologies. Yet, such selection would be quite noisy because of the long delays between strategic actions and their results, and because of the loose connections between planning practices and strategic actions. In the short run, firms might try ideas experimentally, so their practices would be influenced by interpersonal networks, consultants, or the press. Thus, short-run changes in planning practices ought to have insignificant long-run effects and to be independent of their long-run value.

Summary

Organizations regularly make changes that look partially random. In some cases, it is short-run changes that look random; in other instances, systematic short-run patterns seem to produce random long-run changes. In some cases, performance outcomes seem to be random; in other instances, it is behaviors that look random.

Because organizations' environments react in ways that counteract any gains one organization makes at the expense of others, organizations' behaviors ought to appear random when interpreted in frameworks that emphasize competitive advantages. Because organizations cannot forecast accurately far into the future, their behaviors ought to appear random when interpreted in frameworks that emphasize long-run goals. Because problem analyses assume illusory gains and because some problems have no solutions, organizations repeatedly take actions that have no long-run results. Because actions reflect multiple independent forces, short-run actions may look erratic and inexplicable. Because causal processes feed back into themselves, random events have long-lasting effects and nonrandom processes may produce the appearance of randomness. Because some actions arise from action generators, they have no bases in immediate problems or goals. Because organizations attend to different issues, see different environments, and act at different times, aggregating across several organizations makes behaviors look more random, at least when interpreted in frameworks that emphasize short-run causes or effects.

Thus, organizational scientists should take seriously null or naïve hypotheses that describe behaviors as having large random components. Such hypotheses do not imply that organizations see their actions as random. Hypotheses that emphasize randomness may be parsimonious even when complete, deterministic descriptions would be possible. If simple hypotheses can describe behaviors rather accurately, one must then decide whether the gains from complex, causal descriptions are worthwhile.

Scientists should also take seriously the possibility that complex and seemingly causal relations occur by accident. Because so few empirical "findings" are replicated successfully, scientists need to remain aware of the high probability that observed patterns result from idiosyncratic data. Null or naïve hypotheses that describe behaviors as having large random components provide insurance against idiosyncratic data because they make weak assumptions about statistical properties such as symmetry, independence, and uncorrelated residuals.

WHY PLAY CROQUET WHEN YOU CAN'T LOSE?

Very often, social scientists "test" their theories against ritualistic null hypotheses (a) that the scientists can always reject by collecting sufficient data and (b) that the scientists would not believe no matter what data they collect and analyze. As proofs of knowledge, such statistical significance tests look ridiculous. Such tests are not merely silly, however; they are harmful, because they turn research into a noise generator that fills journals, textbooks, and courses with substantively meaningless "findings."

Scientists can be certain of rejecting any point null hypothesis that defines an infinitesimal point on a continuum. The hypothesis that two population means are exactly equal is a point hypothesis. Other examples include these null hypotheses:

correlation = 0

frequency = 0

rate = 0

regression coefficient = 0

variance of population 1 = variance of population 2.

All two-tailed hypotheses about continuous variables are tested against point null hypotheses.

The key property of a point hypothesis is that the probability of rejecting it goes to 1 as observations accumulate. If a point hypothesis has not already been rejected, the scientist has only to make more observations (or to make more simulation runs). Thus, passing a "hypothesis test" against such a null hypothesis tells little about the alternative hypothesis, but much about either the scientist's ability to formulate meaningful statements or the scientist's perseverance and effort.
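
The arithmetic behind this claim is easy to display, as in the sketch below, which uses illustrative numbers only. Suppose the true correlation between two variables is a substantively trivial .02 rather than exactly zero. The expected z-statistic from Fisher's transformation grows with the square root of the sample size, so a scientist who keeps adding observations is guaranteed, eventually, to reject the point null hypothesis that the correlation equals zero.

    import math

    true_r = 0.02   # substantively trivial, but not exactly zero

    for n in (100, 10_000, 1_000_000, 100_000_000):
        # Expected z-statistic from Fisher's r-to-z transformation when the
        # sample correlation sits near its true value.
        z = math.atanh(true_r) * math.sqrt(n - 3)
        verdict = "reject the point null" if z > 1.96 else "not yet"
        print(f"n = {n:>11,d}   expected z = {z:7.1f}   {verdict}")

The same logic applies to frequencies, rates, regression coefficients, and variance comparisons: the only question is when the point null will fall, never whether.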

Also, point null hypotheses usually look implausible if one treats them as genuine descriptions of phenomena (Gilpin and Diamond, 1984). For instance, some contingency theorists have assumed that randomly different organizational structures are the only alternative to structures that vary with environmental uncertainty. Do these contingency theorists really believe that no other factors-such as technology-influence organizational structures nonrandomly?

Ritualistic hypothesis tests resemble the croquet game in Wonderland: The Queen of Hearts, said Alice, "is so extremely likely to win that it is hardly worthwhile finishing the game." If a theory can only win a ritualistic competition, it would be better to leave the theory unpublished.

A Proposal from Bioecology

Bioecologists have been debating whether to replace null hypotheses. Connor and Simberloff (1983, 1986) argued that interactions within ecological communities make statistical tests based upon simple null hypotheses too unrealistic. They proposed that bioecologists replace null hypotheses with "null models." Connor and Simberloff (1986: 160) defined: "A null model is an attempt to generate the distribution of values for the variable of interest in the absence of a putative causal process." That is, one uses a "null model" to generate a statistical distribution, and then one asks whether the observed data have high or low probabilities according to this distribution.

For example, different islands in the Galapagos hold different numbers of species of land birds: These numbers might reflect competition between species, physical differences among the islands, or vegetation differences. Using a "null model" that ignored competition between species, Connor and Simberloff (1983) estimated the statistical distributions of the numbers of species pairs on islands in the Galapagos: Their estimates assumed that each island held the number of species observed on it and that each species inhabited as many islands as observed, but that species were otherwise distributed randomly and independently. All the observed numbers of species pairs fell within two standard deviations of the means in the distributions implied by this null model, and most observations were close to the expected values. So, Connor and Simberloff inferred that competition between species had little effect on the numbers of species of Galapagos land birds.
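
The logic of such a null model can be sketched with a small randomization exercise, given below. The presence-absence matrix is invented for illustration; it is not the Galapagos data. Each row is a species and each column an island, and random "checkerboard swaps" shuffle the matrix while preserving how many islands each species occupies and how many species each island holds, the constraints Connor and Simberloff retained. Comparing the observed number of co-occurring species pairs with the distribution generated this way shows whether the observation would be surprising in the absence of competition.

    import numpy as np

    rng = np.random.default_rng(4)

    # Invented presence-absence matrix: rows are species, columns are islands.
    observed = np.array([[1, 1, 0, 0, 1],
                         [1, 0, 1, 0, 0],
                         [0, 1, 1, 1, 0],
                         [0, 0, 1, 1, 1],
                         [1, 0, 0, 1, 0],
                         [0, 1, 0, 0, 1]])

    def cooccurring_pairs(m):
        """Number of species pairs that share at least one island."""
        shared = m @ m.T
        return int((np.triu(shared, k=1) > 0).sum())

    def randomize(m, swaps=500):
        """Checkerboard swaps preserve every row sum and every column sum."""
        m = m.copy()
        n_rows, n_cols = m.shape
        for _ in range(swaps):
            r1, r2 = rng.choice(n_rows, size=2, replace=False)
            c1, c2 = rng.choice(n_cols, size=2, replace=False)
            sub = m[np.ix_([r1, r2], [c1, c2])]
            if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
                m[np.ix_([r1, r2], [c1, c2])] = 1 - sub   # flip the checkerboard
        return m

    null_counts = [cooccurring_pairs(randomize(observed)) for _ in range(200)]
    print("observed co-occurring pairs:", cooccurring_pairs(observed))
    print("null-model mean:", round(float(np.mean(null_counts)), 1),
          " null-model std:", round(float(np.std(null_counts)), 2))

If the observed count falls near the middle of the null distribution, as Connor and Simberloff reported for the Galapagos data, the data give no reason to invoke interspecific competition.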

Not surprisingly, some bioecologists have voiced strong reservations about "null models" (Harvey et al., 1983). Among several points of contention, Gilpin and Diamond (1984) argued (a) that "null models" are not truly null because they make implicit assumptions and (b) that they are difficult to reject because fitting coefficients to data removes randomness. Gilpin and Diamond do have a point, in that describing such models as "null" might create false expectations. Connor and Simberloff's "null model", for instance, took as premises the observed numbers of species on each island and of islands inhabited by each species. These numbers allow for some physical and vegetation differences among the islands, and Gilpin and Diamond noted that these numbers might reflect competition between species as well. On the other hand, Connor and Simberloff (1986: 161) pointed out that scientists can choose null models that virtually guarantee their own rejection: "For this null model, and for null models in general, if one is unwilling to make assumptions to account for structure in the data that can reasonably be attributed to causal processes not under investigation, then posing and rejecting null hypotheses will be trivially easy and uninteresting."

Computers play a key role in this debate. The distributions computed by Connor and Simberloff would have required superhuman effort before 1950. One of the original reasons for using point null hypotheses was algebraic feasibility. Because statisticians had to manipulate statistical distributions algebraically, they built analytic rationales around algebraically amenable distributions. Computers, however, give scientists means to generate statistical distributions that represent more complicated assumptions. It is no longer necessary to use the distributions published in textbooks.

Rather than compare the data with two-standard-deviation confidence limits, as Connor and Simberloff did, however, it is more sensible to compute a likelihood ratio of the kind described below. Although Connor and Simberloff's approach has the advantage of looking like a traditional hypothesis test, it entails the parallel disadvantage of treating truth as a binary variable. A model is either true or false. Likelihood ratios allow one to treat truth as a continuous variable-one model may be more true than another, and yet both of the compared models may be unlikely.

Naïve Hypotheses

An alternative and related proposal derives from forecasting. Because they regularly confront autocorrelated time series, forecasting researchers usually disdain null hypotheses and instead compare their forecasts with "naïve forecasts".

For example, a naïve person might advance either of two hypotheses about a series. One naïve hypothesis-the no-change hypothesis-says that the next value will be the same as the current value. This hypothesis makes no specific assertions about the causal processes that generate the series. It merely expresses the idea that most causal processes are inertial: What happens tomorrow will resemble what is happening today. The second naïve hypothesis-the linear-trend hypothesis-says the trend observed since yesterday will continue until tomorrow: The next value will differ from the current value by the same amount that the current value differs from the previous value. This hypothesis expresses the idea that most causal processes are inertial in trend as well as in state.

Neither of these naïve hypotheses says anything profound. Either could come from a young child who has no understanding of the causal processes that generate a series. So, one should expect more accurate predictions from a complicated forecasting technique-or from a scientific theory that supposedly does say something profound.
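
In code, the two naïve hypotheses reduce to one-line forecasting rules. The sketch below applies both to an invented, arbitrary series and reports their mean absolute errors; a substantive theory earns credibility only if its forecasts beat benchmarks this cheap.

    def no_change_forecast(series):
        """Naive hypothesis 1: the next value will equal the current value."""
        return series[-1]

    def linear_trend_forecast(series):
        """Naive hypothesis 2: the most recent change will repeat itself."""
        return series[-1] + (series[-1] - series[-2])

    # Invented illustrative series, e.g. annual counts of some organizational form.
    history = [100, 104, 109, 113, 118, 121, 127, 130]

    errors_nc, errors_lt = [], []
    for t in range(2, len(history)):
        past, actual = history[:t], history[t]
        errors_nc.append(abs(no_change_forecast(past) - actual))
        errors_lt.append(abs(linear_trend_forecast(past) - actual))

    print("mean absolute error, no-change:   ", round(sum(errors_nc) / len(errors_nc), 2))
    print("mean absolute error, linear-trend:", round(sum(errors_lt) / len(errors_lt), 2))

On this nearly linear series, the linear-trend rule beats the no-change rule; neither rule knows anything about the process behind the numbers.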

Comparing focal hypotheses with naïve hypotheses instead of null hypotheses gives the comparisons more credibility and more substantive significance. In this volume, Ginsberg and Baum (1994) compare their theory to the foregoing naïve hypotheses. Had they stopped after merely testing the null hypotheses that acquisition rates do not vary with diverse variables, we would know only that their theory is better than nothing. However, they also show that their theory fits the data distinctly better than naïve statements about inertia. I find this impressive.

Note that Ginsberg and Baum might have made an even more useful contribution if it had turned out that their theory was no more accurate than the naïve models. Such an outcome would have shown that a simple explanation-inertia-works very well. As it happens, of course, this is not the case. Inertia is not a powerful explanation for these data. But that is the main inference we ought to draw from the superiority of Ginsberg and Baum's theory. We should not infer that their theory is correct. Not only may their theory not be correct in detail, it may not even take account of the most important causal factors. There may be several other explanations that would be even more effective.

Also note that some naïve hypotheses, including the two above, are point hypotheses. If one uses them like null hypotheses, conventional statistical tests will inevitably disconfirm them too. So instead, one calculates the likelihood ratio (Jeffreys and Berger, 1992):

Probability (Data if the focal hypotheses are true) / Probability (Data if the naïve hypothesis is true)

If the focal hypotheses work better than the naïve hypothesis, the ratio will be substantially greater than one. One must then ask whether the ratio is large enough to justify the greater complexity associated with the focal hypotheses.

Such comparisons usually have more meaning if one states the likelihood ratios on a per-trial or per-period basis. For example, one can use the nth root of

Probability (Data if the focal hypotheses are true) / Probability (Data if the naïve hypothesis is true)

where n denotes the number of time periods.
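
As a concrete sketch, the calculation below compares a focal theory's one-step-ahead predictions with no-change naïve forecasts on an invented series, using Gaussian likelihoods with maximum-likelihood variances as a simplifying assumption. The data and the focal predictions are fabricated for illustration; only the arithmetic of the ratio and its nth root is the point.

    import math

    def max_gaussian_log_likelihood(residuals):
        """Maximized log likelihood of residuals under a normal model."""
        n = len(residuals)
        sigma2 = sum(e * e for e in residuals) / n
        return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

    # Invented series and two sets of one-step-ahead predictions.
    actual = [10.0, 11.5, 12.1, 14.0, 15.2, 16.9, 18.1, 20.3]
    focal = [10.2, 11.3, 12.5, 13.8, 15.5, 16.7, 18.4, 19.9]   # the focal theory
    naive = [actual[0]] + actual[:-1]                          # no-change forecasts

    ll_focal = max_gaussian_log_likelihood([a - f for a, f in zip(actual, focal)])
    ll_naive = max_gaussian_log_likelihood([a - f for a, f in zip(actual, naive)])

    n = len(actual)
    ratio = math.exp(ll_focal - ll_naive)              # the likelihood ratio
    per_period = math.exp((ll_focal - ll_naive) / n)   # its nth root
    print("likelihood ratio: ", round(ratio, 1))
    print("per-period ratio: ", round(per_period, 2))

A per-period ratio well above one suggests the focal theory's extra complexity is earning its keep; a ratio near one says the naïve hypothesis describes the data about as well.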

Simple Competitors May Win in the Future

Ginsberg and Baum's comparisons with naïve hypotheses make their theory look impressive because improving on naïve hypotheses is difficult. Substantive theories often turn out to be no better than naïve hypotheses.

Ginsberg and Baum do not, however, test their theory with predictions. Naïve hypotheses generally look much better when used to make genuine predictions about the future than when used to rationalize the past (Pant and Starbuck, 1990; Starbuck 1983).

Since the 1950s, macroeconomists have invested enormous resources in trying to create complex, mathematical, statistically estimated theories that predict short-run phenomena well. The teams that developed these models included some of the world's most respected economists, and they spent hundreds of man-years. They used elegant statistical methods. They did not lack financial or computational resources, for the U. S. Government has spent many millions of dollars for data gathering and research grants. Major industrial firms pay large sums for the predictions generated by these models. So, these models represent the very best in economic or social forecasting.

Elliott (1973) tested the predictive accuracies of four of the best-known macroeconomic models. Of course, all these models had been published originally with evidence of their predictive accuracies, but these demonstrations had involved postdicting the very data from which the models' coefficients had been estimated, and each model had been fitted against data from different periods. Elliott fitted all four models to data from the same period, then measured their accuracies in predicting subsequent events. Three of the models turned out to be no more accurate than the no-change hypothesis. The simplest of the models, which was the most accurate, was only as accurate as the linear-trend hypothesis.

The findings of Makridakis and colleagues (1979; 1982) resemble those of Elliott. They compared 24 statistical forecasting methods by forecasting 1001 series. They found that no-change hypotheses beat others 38 to 64 percent of the time. Also, no-change hypotheses were less likely to make large errors than any other method. Yet, the most accurate forecasts came from exponential smoothing, which beat every other method at least 50 percent of the time. Exponential smoothing is a version of the linear-trend hypothesis; it assumes data include random noise and it filters this noise by averaging. The averaging usually gives more weight to newer data.
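
A minimal version of the smoothing idea appears below. It implements exponential smoothing with a trend term (Holt's method in its simplest form): noisy observations are averaged, newer observations receive more weight, and the most recent estimated trend is projected one period ahead. The smoothing constants and the test series are arbitrary choices, not those used in the forecasting competitions.

    def holt_forecast(series, alpha=0.5, beta=0.3):
        """One-step-ahead forecast from exponential smoothing with a trend term."""
        level, trend = series[0], series[1] - series[0]
        for y in series[1:]:
            previous_level = level
            level = alpha * y + (1 - alpha) * (level + trend)              # smooth the level
            trend = beta * (level - previous_level) + (1 - beta) * trend   # smooth the trend
        return level + trend

    # Invented trending, noisy series.
    history = [120, 126, 125, 133, 138, 136, 144, 150, 149, 157]

    errors_smooth, errors_nochange = [], []
    for t in range(3, len(history)):
        past, actual = history[:t], history[t]
        errors_smooth.append(abs(holt_forecast(past) - actual))
        errors_nochange.append(abs(past[-1] - actual))

    print("mean absolute error, smoothing:", round(sum(errors_smooth) / len(errors_smooth), 2))
    print("mean absolute error, no-change:", round(sum(errors_nochange) / len(errors_nochange), 2))

On a trending but noisy series of this kind, the smoothed forecasts usually edge out the no-change rule, echoing the pattern Makridakis and colleagues observed.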

METAPHYSICO-THEOLOGICO-COSMOLO-NIGOLOGY

Synopsis

Research findings may tell more about analytic procedures than about the phenomena being studied. We need to state theories more meaningfully and to evaluate theories more rigorously.

Series are central in studies of evolutionary dynamics, and we should use especially tough criteria to test theories about them. Observed series provide poor evidence about the processes that generated them, and they offer many opportunities for spurious or deceptive inferences. A self-dependent series does not forget random disturbances instantly; it reacts to random disturbances when generating future outcomes. Replications produce very different series that diverge erratically from their expected values.

Sustaining a null hypothesis is more useful than rejecting a null hypothesis. Rejecting a null hypothesis does little to reduce ambiguity, and it is often a trivial achievement. It does not prove the value of the alternative hypothesis. A stronger contribution comes from failing to reject a null hypothesis insofar as this rules out at least one ineffective alternative hypothesis. Yet, journals work in the opposite direction. They reject studies that do not reject null hypotheses, and they do not publish replications. Editors and reviewers urge authors to lie by portraying after-the-fact "hypotheses" as having been invented beforehand.

Null or naïve hypotheses often fit organizational data well because organizations make changes having insignificant long-run effects. Organizations' behaviors should appear random with respect to competitive advantages because organizations' environments try to counteract the gains one organization makes at the expense of others. Organizations' behaviors should appear random with respect to long-run goals because organizations cannot forecast far into the future. Short-run actions may look erratic and inexplicable because actions reflect multiple independent forces. Random events may have long-lasting effects and nonrandom processes may produce apparent randomness because causal processes feed back into themselves. Organizations repeatedly take actions having no long-run results because problem analyses seek illusory gains and because some problems have no solutions. Many actions have no bases in immediate problems or goals because they come from action generators. Aggregating across organizations makes behaviors look more random because organizations attend to different issues, see different environments, and act at different times.

Much too often, social scientists "test" their theories against null hypotheses that the scientists can always reject by collecting enough data. Such tests turn research into a generator of substantively meaningless "findings."

Scientists who are willing to gather sufficient data can reject any null hypothesis that specifies an infinitesimal point on a continuum. The probability of rejecting a point hypothesis goes to 1 as the observations grow numerous. Also, most point null hypotheses look implausible if one treats them as genuine descriptions of phenomena.

One response to this situation is to formulate "null models" that incorporate some simple assumptions about the data. One uses the null models to generate statistical distributions and compares the data with these distributions.

An alternative response is to compare focal hypotheses with naïve hypotheses. A naïve hypothesis represents one or two basic ideas of the sort that a naïve person might advance. Theories often turn out to be no better than naïve hypotheses-especially when both are used to predict future events. However, some naïve hypotheses are also point hypotheses. In such cases, rather than formulate the analysis as a significance test, scientists should use likelihood ratios to compare the alternative hypotheses.

Living in the Best of All Possible Worlds

We are trying too hard to show the superiority of complex causal theories, while too quickly rejecting simple null or naïve hypotheses that say behavior has large random components. We are, indeed, so eager to discern causality that we embrace a hollow statistical methodology, we spurn replication, and we refuse to publish articles that interpret events simply. Although social construction of reality is a pervasive phenomenon, it may not be a useful foundation for scientific research.

"Master Pangloss taught the metaphysico-theologico-cosmolo-nigology. He could prove to admiration that there is no effect without a cause; and, that in this best of all possible worlds, the Baron's castle was the most magnificent of all castles, and My Lady the best of all possible baronesses.

"It is demonstrable, said he, that things cannot be otherwise than they are; for as all things have been created to some end, they must necessarily be created for the best end. Observe, for instance, the nose is formed for spectacles, therefore we wear spectacles. The legs are visibly designed for stockings, accordingly we wear stockings. Stones were made to be hewn, and to construct castles, therefore My Lord has a magnificent castle; for the greatest baron in the province ought to be the best lodged. Swine were intended to be eaten, therefore we eat pork all the year round; and they, who assert that everything is right, do not express themselves correctly; they should say that everything is best.

"Candide listened attentively, and believed implicitly. . . ." (Voltaire, Chapter I, Part I of Candide; or, the Optimist, 1756)

FOOTNOTE

This manuscript has benefited from the useful suggestions of Eric Abrahamson, Joel Baum, Jacques Delacroix, Charles Fombrun, Theresa Lant, Jim March, and John Mezias.

REFERENCES

Abrahamson, E. (1990). Fads and Fashions in Administrative Technologies. Doctoral dissertation, New York University.

Ames, E., & Reiter, S. (1961). Distributions of correlation coefficients in economic time series. Journal of the American Statistical Association, 56: 637-656.

Andrews, P. W. S. (1949). A reconsideration of the theory of the individual business. Oxford Economic Papers, 1: 54-89.

Blaug, M. (1980). The Methodology of Economics: Or How Economists Explain. Cambridge: Cambridge University Press.

Campbell, J. H. (1985). An organizational interpretation of evolution. In D. J. Depew and B. H. Weber (Eds.), Evolution at a Crossroads: The New Biology and the New Philosophy of Science: 133-168. Cambridge, MA: MIT Press.

Carroll, G. R. (1984). Organizational ecology. Annual Review of Sociology, 10: 71-93.

Carroll, G. R., & Delacroix, J. (1982). Organizational mortality in the newspaper industries of Argentina and Ireland: An ecological approach. Administrative Science Quarterly, 27: 169-198.

Connor, E. F., & Simberloff, D. (1983). Interspecific competition and species co-occurrence patterns on islands: Null models and the evaluation of evidence. Oikos, 41: 455-465.

Connor, E. F., & Simberloff, D. (1986). Competition, scientific method, and null models in ecology. American Scientist, 74: 155-162.

Delacroix, J., & Swaminathan, A. (1991). Cosmetic, speculative, and adaptive organizational change in the wine industry: A longitudinal study. Administrative Science Quarterly, 36: 631-661.

Elliott, J. W. (1973). A direct comparison of short-run GNP forecasting models. Journal of Business, 46: 33-60.

Fischhoff, B. (1982). For those condemned to study the past: Heuristics and biases in hindsight. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment Under Uncertainty: Heuristics and Biases: 335-351. Cambridge: Cambridge University Press.

Gilpin, M. E., & Diamond, J. M. (1984). Are species co-occurrences on islands non-random, and are null hypotheses useful in community ecology? In D. R. Strong et al. (Eds.), Ecological Communities: Conceptual Issues and the Evidence: 297-315. Princeton: Princeton University Press.

Ginsberg, A., & Baum, J. A. C. (1994). Evolutionary processes and patterns of core business change. In J. A. C. Baum and J. V. Singh (Eds.), Evolutionary Dynamics of Organizations. New York: Oxford University Press. (This volume)

Gould, S. J. (1989). Wonderful Life: The Burgess Shale and the Nature of History. New York: W. W. Norton.

Gould, S. J. (1991). Bully for Brontosaurus: Reflections on Natural History. New York: W. W. Norton.

Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82: 1-20.

Hannan, M. T., & Freeman, J. H. (1984). Structural inertia and organizational change. American Sociological Review, 49: 149-164.

Harvey, P. H., Colwell, R. K., Silvertown, J. W., & May, R. M. (1983). Null models in ecology. Annual Review of Ecology and Systematics, 14: 189-211.

Jeffreys, W. H., & Berger, J. O. (1992). Ockham's Razor and Bayesian analysis. American Scientist, 80: 64-72.

Levinthal, D. (1991). Random walks and organizational mortality. Administrative Science Quarterly, 36: 397-420.

Lovell, M. C. (1983). Data mining. Review of Economics and Statistics, 65: 1-12.

Mahoney, M. J. (1977). Publication prejudices: An experimental study of confirmatory bias in the peer review system. Cognitive Therapy and Research, 1: 161-175.

Makridakis, S., & Hibon, M. (1979). Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society, Series A, 142: 97-145. (Reprinted in S. Makridakis, A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, & R. Winkler, The Forecasting Accuracy of Major Time Series Methods: 35-101. Chichester: Wiley, 1984.)

Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., & Winkler, R. L. (1982). The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting, 1: 111-153. (Reprinted in S. Makridakis, A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, & R. Winkler, The Forecasting Accuracy of Major Time Series Methods: 103-165. Chichester: Wiley, 1984.)

Mandelbrot, B. B. (1983). The Fractal Geometry of Nature. New York: W. H. Freeman.

Mansfield, E. (1963). The speed of response of firms to new techniques. Quarterly Journal of Economics, 77: 290-311.

Pant, P. N., & Starbuck, W. H. (1990). Innocents in the forest: Forecasting and research methods. Journal of Management, 16: 433-460.

Peach, J. T., & Webb, J. L. (1983). Randomly specified macroeconomic models: Some implications for model selection. Journal of Economic Issues, 17: 697-720.

Penrose, E. T. (1959). The Theory of the Growth of the Firm. New York: Wiley.

Platt, J. R. (1964). Strong inference. Science, 146: 347-353.

Popper, K. R. (1959). The Logic of Scientific Discovery. New York: Basic Books.

Starbuck, W. H. (1981). A trip to view the elephants and rattlesnakes in the garden of Aston. In Perspectives on Organization Design and Behavior, A. H. Van de Ven and W. F. Joyce (Eds.): 167-198. New York: Wiley-Interscience.

Starbuck, W. H. (1983). Organizations as action generators. American Sociological Review, 48: 91-102.

Starbuck, W. H., & Milliken, F. J. (1988). Executives' perceptual filters: What they notice and how they make sense. In D. C. Hambrick (Ed.) The Executive Effect: Concepts and Methods for Studying Top Managers: 35-65. Greenwich, CT: JAI Press.

Van Valen, L. (1973). A new evolutionary law. Evolutionary Theory, 1: 1-30.

Webster, J., & Starbuck, W. H. (1988). Theory building in industrial and organizational psychology. In C. L. Cooper, & I. Robertson (Eds.) International Review of Industrial and Organizational Psychology 1988: 93-138. Chichester: Wiley.

Wold, H. O. A. (1965). A graphic introduction to stochastic processes. In H.O.A. Wold (Ed.) Bibliography on Time Series and Stochastic Processes: 7-76. Edinburgh: Oliver & Boyd.