THEORY BUILDING IN INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY

 

Jane Webster

and

William H. Starbuck

New York University

Pages 93-138 in C. L. Cooper and I. T. Robertson (eds.), International Review of Industrial and Organizational Psychology 1988; Wiley, 1988.

SUMMARY

I/O psychology has been progressing slowly. This slowness arises partly from a three-way imbalance: a lack of substantive consensus, insufficient use of theory to explain observations, and excessive confidence in induction from empirical evidence. I/O psychologists could accelerate progress by adopting and enforcing a substantive paradigm.

Staw (1984: 658) observed:

The micro side of organizational behavior historically has not been strong on theory. Organizational psychologists have been more concerned ‘with research methodology, perhaps because of the emphasis upon measurement issues in personnel selection and evaluation. As an example of this methodological bent, the I/O Psychology division of the American Psychological Association, when confronted recently with the task of improving the field’s research, formulated the problem as one of deficiency in methodology rather than theory construction.... It is now time to provide equal consideration to theory formulation.

This chapter explores the state of theory in I/O psychology and micro-Organizational Behavior (OB).[1] The chapter argues that these fields have progressed very slowly, and that progress has occurred so slowly partly because of a three-way imbalance: a lack of theoretical consensus, inadequate attention to using theory to explain observations, coupled with excessive confidence in induction from empirical evidence. As a physicist, J. W. N. Sullivan (1928; quoted by Weber, 1982, p. 54), remarked: ‘It is much easier to make measurements than to know exactly what you are measuring.’

Well-informed people hold widely divergent opinions about the centrality and influence of theory. Consider Dubin’s (1976, p. 23) observation that managers use theories as moral justifications, that managers may endorse job enlargement, for example, because it permits more complete delegation of responsibilities, raises morale and commitment, induces greater effort, and implies a moral imperative to seek enlarged jobs and increased responsibilities. Have these consequences anything to do with theory? Job enlargement is not a theory, but a category of action. Not only do these actions produce diverse consequences, but the value of any single consequence is frequently debatable. Is it exploitative to elicit greater effort, because workers contribute more but receive no more pay? Or is it efficient, because workers contribute more but receive no more pay? Or is it humane, because workers enjoy their jobs more? Or is it uplifting, because work is virtuous and laziness despicable? Nothing compels managers to use job enlargement; they adopt it voluntarily. Theory only describes the probable consequences if they do use it. Furthermore, there are numerous theories about work redesigns such as job enlargement and job enrichment, so managers can choose the theories they prefer to espouse. Some theories emphasize the consequences of work redesign for job satisfaction; others highlight its consequences for efficiency, and still others its effects on accident rates or workers’ health (Campion and Thayer, 1985).

We hold that theories do make a difference, to non-scientists as well as to scientists, and that theories often have powerful effects. Theories are not neutral descriptions of facts. Both prospective and retrospective theories shape facts. Indeed, the consequences of actions may depend more strongly on the actors’ theories than on the overt actions. King’s (1974) field experiment illustrates this point. On the surface, the study aimed at comparing two types of job redesign: a company enlarged jobs in two plants, and began rotating jobs in two similar plants. But the study had a 2 x 2 design. Their boss told two of the plant managers that the redesigns ought to raise productivity but have no effects on industrial relations; and he told the other two plant managers that the redesigns ought to improve industrial relations and have no effects on productivity. The observed changes in productivity and absenteeism matched these predictions: productivity rose significantly while absenteeism remained stable in those two plants, and absenteeism dropped while productivity remained constant in the other two plants. Job rotation and job enlargement, however, yielded the same levels of productivity and absenteeism. Thus, the differences in actual ways of working produced no differences in productivity or absenteeism, but the different rationales did induce different outcomes.

Theories shape facts by guiding thinking. They tell people what to expect, where to look, what to ignore, what actions are feasible, what values to hold. These expectations and beliefs then influence actions and retrospective interpretations, perhaps unconsciously (Rosenthal, 1966). Kuhn (1970) argued that scientific collectivities develop consensus around coherent theoretical positions- paradigms. Because paradigms serve as frameworks for interpreting evidence, for legitimating findings, and for deciding what studies to conduct, they steer research into paradigm-confirming channels, and so they reinforce themselves and remain stable for long periods. For instance, in 1909, Finkelstein reported in his doctoral dissertation that he had synthesized benzocyclobutene (Jones, 1966). Finkelstein’s dissertation was rejected for publication because chemists believed, at that time, such chemicals could not exist, and so his finding had to be erroneous. Theorists elaborated the reasons for the impossibility of these chemicals for another 46 years, until Finkelstein’s thesis was accidentally discovered in 1955.

Although various observers have argued that the physical sciences have stronger consensus about paradigms than do the social sciences, the social science findings may be even more strongly influenced by expectations and beliefs. Because these expectations and beliefs do not win consensus, they may amplify the inconsistencies across studies. Among others, Chapman and Chapman (1969), Mahoney and DeMonbreun (1977) and Snyder (1981) have presented evidence that people holding prior beliefs emphasize confirmatory strategies of investigation, they rarely use disconfirmatory strategies, and they discount disconfirming observations: these confirmatory strategies turn theories into self-fulfilling prophecies in situations where investigators’ behaviors can elicit diverse responses or where investigators can interpret their observations in many ways (Tweney et al., 1981). Mahoney (1977) demonstrated that journal reviewers tend strongly to recommend publication of manuscripts that confirm their beliefs and to give these manuscripts high ratings for methodology, whereas reviewers tend strongly to recommend rejection of manuscripts that contradict their beliefs and to give these manuscripts low ratings for methodology. Faust (1984) extrapolated these ideas to theory evaluation and to review articles, such as those in this volume, but he did not take the obvious next step of gathering data to confirm his hypotheses.

Thus, theories may have negative consequences. Ineffective theories sustain themselves and tend to stabilize a science in a state of incompetence, just as effective theories may suggest insightful experiments that make a science more powerful. Theories about which scientists disagree foster divergent findings and incomparable studies that claim to be comparable. So scientists could be better off with no theories at all than with theories that lead them nowhere or in incompatible directions. On the other hand, scientists may have to reach consensus on some base-line theoretical propositions in order to evaluate adequately the effectiveness of these base-line propositions and the effectiveness of newer theories that build on these propositions. Consensus on base-line theoretical propositions, even ones that are somewhat erroneous, may also be an essential prerequisite to the accumulation of knowledge because such consensus leads scientists to view their studies in a communal frame of reference (Kuhn, 1970). Thus, it is an interesting question whether the existing theories or the existing degrees of theoretical consensus have been aiding or impeding scientific progress in I/O psychology.

Consequently and paradoxically, this chapter addresses theory building empirically, and the chapter’s outline matches the sequence in which we pose questions and seek answers for them.

First we ask: How much progress has occurred in I/O psychology? If theories are becoming more and more effective over time, they should explain higher and higher percentages of variance. Observing the effect sizes for some major variables, we surmise that I/O theories have not been improving.

Second, hunting an explanation for no progress or negative progress, we examine indicators of paradigm consensus. To our surprise, I/O psychology does not look so different from chemistry and physics, fields that are perceived as having high paradigm consensus and as making rapid progress. However, physical science paradigms embrace both substance and methodology, whereas I/O psychology paradigms strongly emphasize methodology and pay little attention to substance.

Third, we hypothesize that I/O psychology’s methodological emphasis is a response to a real problem, the problem of detecting meaningful research findings against a background of small, theoretically meaningless, but statistically significant relationships. Correlations published in the Journal of Applied Psychology seem to support this conjecture. Thus, I/O psychologists may be de-emphasizing substance because they do not trust their inferences from empirical evidence.

In the final section, we propose that I/O psychologists accelerate the field’s progress by adopting and enforcing a substantive paradigm. We believe that I/O psychologists could embrace some base-line theoretical propositions that are as sound as Newton’s laws, and using base-line propositions would project findings into shared perceptual frameworks that would reinforce the collective nature of research.

 

PROGRESS IN EXPLAINING VARIANCE

Theories may be evaluated in many ways. Webb (l961) said good theories exhibit knowledge, skepticism and generalizability. Lave and March (1975) said good theories are metaphors that embody truth, beauty and justice; whereas unattractive theories are inaccurate, immoral or unaesthetic. Daft and Wiginton (1979) said that influential theories provide metaphors, images and concepts that shape scientists’ definitions of their worlds. McGuire (1983) noted that people may appraise theories according to internal criteria, such as their logical consistency, or according to external criteria, such as the statuses of their authors. Miner (1984) tried to rate theories’ scientific validity and usefulness in application. Landy and Vasey (1984) pointed out tradeoffs between parsimony and elegance and between literal and figurative modeling.

Effect sizes measure theories’ effectiveness in explaining empirical observations or predicting them. Nelson et al. (1986) found that psychologists’ confidence in research depends primarily on significance levels and secondarily on effect sizes. But investigators can directly control significance levels by making more or fewer observations, so effect sizes afford more robust measures of effectiveness.

According to the usual assumptions about empirical research, theoretical progress should produce rising effect sizes-for example, correlations should get larger and larger over time. Kaplan (1963: 351-5) identified eight ways in which explanations may be open to further development; his arguments imply that theories can be improved by:

1.         taking account of more determining factors,

2.         spelling out the conditions under which theories should be true,

3.         making theories more accurate by refining measures or by specifying more precisely the relations among variables,

4.         decomposing general categories into more precise subclasses, or aggregating complementary subclasses into general categories,

5.         extending theories to more instances,

6.         building up evidence for or against theories’ assumptions or predictions,

7.         embedding theories in theoretical hierarchies, and

8.         augmenting theories with explanations for other variables or situations.

 

The first four of these actions should increase effect sizes if the theories are fundamentally correct, but not if the theories are incorrect. Unless it is combined with the first four actions, action (5) might decrease effect sizes even for approximately correct theories. Action (6) could produce low effect sizes if theories are incorrect.

Social scientists commonly use coefficients of determination, r2, to measure effect sizes. Some methodologists have been advocating that the absolute value of r affords a more dependable metric than r2 in some instances (Ozer, 1985; Nelson et al., 1986). For the purposes of this chapter, these distinctions make no difference because r2 and the absolute value of r increase and decrease together. We do, however, want to recognize the differences between positive and negative relationships, so we use r.

Of the nine effect measures we use, six are bivariate correlations. One can argue that, to capture the total import of a stream of research, one has to examine the simultaneous effects of several independent variables. Various researchers have advocated multivariate research as a solution to low correlations (Tinsley and Heesacker, 1984; Hackett and Guion, 1985). However, in practice, multivariate research in I/O psychology has not fulfilled these expectations, and the articles reviewing I/O research have not noted any dramatic results from the use of multivariate analyses. For instance, McEvoy and Cascio (1985) observed that the effect sizes for turnover models have remained small despite research incorporating many more variables. One reason is that investigators deal with simultaneous effects in more than one way: they can observe several independent variables that are varying freely; they can control for moderating variables statistically; and they can control for contingency variables by selecting sites or subjects or situations. It is far from obvious that multivariate correlations obtained in uncontrolled situations should be higher than bivariate correlations obtained in controlled situations. Indeed, the rather small gains yielded by multivariate analyses suggest that careful selection and control of sites or subjects or situations may be much more important than we have generally recognized.

Scientists’ own characteristics afford another reason for measuring progress with bivariate correlations. To be useful, scientific explanations have to be understandable by scientists; and scientists nearly always describe their findings in bivariate terms, or occasionally trivariate terms. Even those scientists who advocate multivariate analyses most fervently fall back upon bivariate and trivariate interpretations when they try to explain what their analyses really mean. This brings to mind a practical lesson that Box and Draper (1969) extracted from their efforts to use experiments to discover more effective ways to run factories: Box and Draper concluded that practical experiments should manipulate only two or three variables at a time because the people who interpret the experimental findings have too much difficulty making sense of interactions among four or more variables. Speaking directly of the inferences drawn during scientific research, Faust (1984) too pointed out the difficulties that scientists have in understanding four-way interactions (Meehl, 1954; Goldberg, 1970). He noted that the great theoretical contributions to the physical sciences have been distinguished by their parsimony and simplicity rather than by their articulation of complexity. Thus, creating theories that psychologists themselves will find satisfying probably requires the finding of strong relationships among two or three variables.

To track progress in I/O theory building, we gathered data on effect sizes for five variables that I/O psychologists have often studied. Staw (1984) identified four heavily researched variables: job satisfaction, absenteeism, turnover and job performance. I/O psychologists also regard leadership as an important topic: three of the five annual reviews of organizational behavior have included it (Mitchell, 1979; Schneider, 1985; House and Singh, 1987).

Other evidence supports the centrality of these five variables for I/O psychologists. De Meuse (1986) made a census of dependent variables in I/O psychology, and identified job satisfaction as one of the most frequently used measures; it had been the focus of over 3000 studies by 1976 (Locke, 1976). Psychologists have correlated job satisfaction with numerous variables: Here, we examine its correlations with job performance and with absenteeism. Researchers have made job performance I/O psychology’s most important dependent variable, and absenteeism has attracted research attention because of its costs (Hackett and Guion, 1985). We look at correlations of job satisfaction with absenteeism because researchers have viewed absenteeism as a consequence of employees’ negative attitudes (Staw, 1984).

Investigators have produced over 1000 studies on turnover (Steers and Mowday, 1981). Recent research falls into one of two categories: turnover as the dependent variable when assessing a new work procedure, and correlations between turnover and stated intentions to quit (Staw, 1984).

Although researchers have correlated job performance with job satisfaction for over fifty years, more consistent performance differences have emerged in studies of behavior modification and goal setting (Staw, 1984). Miner (1984) surveyed organizational scientists, who nominated behavior modification and goal setting as the two of the most respected theories in the field. Although these two theories overlap (Locke, 1977; Miner, 1980), they do have somewhat different traditions, and so we present them separately here.

Next to job performance, investigators have studied leadership most often (Mitchell, 1979; De Meuse, 1986). Leadership research may be divided roughly into two groups: theories about the causes of leaders’ behaviors, and theories about contingencies influencing the effectiveness of leadership styles. Research outside these two groups has generated too few studies for us to trace effect sizes over time (Van Fleet and Yukl, 1986).

Many years ago, psychologists seeking ways to identify effective leaders focused their research on inherent traits. This work, however, turned up very weak relationships, and no set of traits correlated consistently with leaders’ effectiveness. Traits also offended Americans’ ideology espousing equality of opportunity (Van Fleet and Yukl, 1986). Criticisms of trait approaches directed research towards contingency theories (Lord et al., 1986). But these studies too turned up very weak relationships, so renewed interest in traits has surfaced (Kenny and Zaccaro, 1983; Schneider, 1985). As an example of the trait theories, we examine the correlations of intelligence with perceptions of leadership, because these have demonstrated the highest and most consistent relationships.

It is impossible to summarize the effect sizes of contingency theories of leadership in general. First, even though leadership theorists have proposed many contingency theories, little research has resulted (Schriesheim and Kerr, 1977), possibly because some of the contingency theories may be too unclear to suggest definitive empirical studies (Van Fleet and Yukl, 1986). Second, different theories emphasize different dependent variables (Campbell, 1977; Schriesheim and Kerr, 1977; Bass, 1981). Therefore, one must focus on a particular contingency theory. We examine Fiedler’s (1967) theory because Miner (1984) reported that organizational scientists respect it highly.

 

Sources

A manual search of thirteen journals[2] turned up recent review articles concerning the five variables of interest; Borgen et al. (1985) identified several of these review articles as exemplary works. We took data from articles that reported both the effect sizes and the publication dates of individual studies. Since recent review articles did not cover older studies well, we supplemented these data by examining older reviews, in books as well as journals. In all, data came from the twelve sources listed in Table 1; these articles reviewed 261 studies.

Table 1 – Review Article Sources

Job satisfaction

Iaffaldano and Muchinsky (1985)

 

Vroom (1964)

 

Brayfield and Crockett (1955)

 

 

Absenteeism

Hackett and Guion (1985)

 

Vroom (1964)

 

Brayfield and Crockett (1955)

 

 

Turnover

McEvoy and Cascio (1985)

 

Steel and Ovalle (1984)

 

 

Job Performance

Hopkins and Sears (1982)

 

Locke et al. (1980)

 

 

Leadership

Lord et al. (1986)

 

Peters et al. (1985)

 

Mann (1959)

 

Stogdill (1948)

 

Measures

Each research study is represented by a single measure of effect: for a study that measured the concepts in more than one way, we averaged the reported effect sizes.

To trace changes in effect sizes over time, we divided time into three equal periods. For instance, for studies from 1944 to 1983, we compare the effect sizes for 1944-57, 1958-70 and 1971-83.

 

Results

Figures 1-4 present the minimum, maximum and average effect sizes for the five variables of interest. Three figures (1(a), 3(b) and 4) seem to show that no progress has occurred over time; and four figures (1(b), 2(a), 2(b) and 3(a)) seem to indicate that effect sizes have gradually declined toward zero over time. The largest of these correlations is only .22 in the most recent time period, so all of these effects account for less than five per cent of the variance.

 

Moreover, four of these relationships (2(a), 2(b), 3(a) and 3(b)) probably incorporate Hawthorne effects: They measure the effects of interventions. Because all interventions should yield some effects, the differential impacts of specific interventions would be less than these effect measures suggest. That is, the effects of behavior modification, for example, should not be compared with inaction, but compared with those of an alternative intervention, such as goal setting.

Figure 2(c) is the only one suggesting significant progress. Almost all of this progress, however, occurred between the first two time periods: Because only one study was conducted during the first of these periods, the apparent progress might be no more than a statement about the characteristics of that single study. This relationship is also stronger than the others, although not strong enough to suggest a close causal relationship: The average correlation in the most recent time period is .40. What this correlation says is that some of the people who say in private that they intend to quit actually do quit.

Progress with respect to Fiedler’s contingency theory of leadership is not graphed. Peters et al. (1985) computed the average correlations (corrected for sampling error) of leadership effectiveness with the predictions of this theory. The absolute values of the correlations averaged .38 for the studies from which Fiedler derived this theory (approximately 1954-65); but for the studies conducted to validate this theory (approximately 1966-78), the absolute values of the correlations averaged .26. Thus, these correlations too have declined toward zero over time.

I/O psychologists have often argued that effects do not have to be absolutely large in order to produce meaningful economic consequences. (Zedeck and Cascjo, 1984; Schneider, 1985). For example, goal setting produced an average performance improvement of 21.6 per cent in the seventeen studies conducted from 1969 to 1979. If performance has a high economic value and goal setting costs very little, then goal setting would be well worth doing on the average. And because the smallest performance improvement was 2 per cent, the risk that goal setting would actually reduce performance seems very low (Cascjo, 1984; Schneider, 1985). For example, goal setting produced an average performance improvement of 21.6 per cent in the seventeen studies conducted from 1969 to 1979. If performance has a high economic value and goal setting costs very little, then goal setting would be well worth doing on the average. And because the smallest performance improvement was 2 per cent, the risk that goal setting would actually reduce performance seems very low.

This chapter, however, concerns theoretical development; and so the economic benefits of relations take secondary positions to identifying controllable moderators, to clarifying causal links, and to increasing effect sizes. In terms of theoretical development, it is striking that none of these effect sizes rose noticeably after the first years. This may have happened for any of five reasons, or more likely a combination of them:

(a)                Researchers may be clinging to incorrect theories despite disconfirming evidence (Staw, 1976). This would be more likely to happen where studies’ findings can be interpreted in diverse ways. Absolutely small correlations nurture such equivocality, by making it appear that random noise dominates any systematic relationships and that undiscovered or uninteresting influences exert much more effect than the known ones.

(b)               Researchers may be continuing to elaborate traditional methods of information gathering after these stop generating additional knowledge. For example, researchers developed very good leadership questionnaires during the early 1950s. Perhaps these early questionnaires picked up all the information about leadership that can be gathered via questionnaires. Thus, subsequent questionnaires may not have represented robust improvements; they may merely have mistaken sampling variations for generalities.

(c)                Most studies may fail to take advantage of the knowledge produced by the very best studies. As a sole explanation, this would be unlikely even in a world that does not reward exact replication, because research journals receive wide distribution and researchers can easily read reports of others’ projects. However, retrospective interpretations of random variations may obscure real knowledge in clouds of ad hoc rationalizations, so the consumers of research may have difficulty distinguishing real knowledge from false.

Because we wanted to examine as many studies as possible and studies of several kinds of relationships, we did not attempt to evaluate the methodological qualities of studies. Thus, we are using time as an implicit measure of improvement in methodology. But time may be a poor indicator of methodological quality if new studies do not learn much from the best studies. Reviewing studies of the relationship between formal planning and profitability, Starbuck (1985) remarked that the lowest correlations came in the studies that assessed planning and profitability most carefully and that obtained data from the most representative samples of firms.

(d)               Those studies obtaining the maximum effect sizes may do so for idiosyncratic or unknown reasons, and thus produce no generalizable knowledge. Researchers who provide too little information about studied sites, subjects, or situations make it difficult for others to build upon their findings (Orwin and Cordray, 1985); several authors have remarked that many studies report too little information to support meta-analyses (Steel and Ovalle, 1984; Iaffaldano and Muchinsky, 1985; Scott and Taylor, 1985). The tendencies of people, including scientists, to use confirmatory strategies mean that they attribute as much of the observed phenomena as possible to the relationships they expect to see (Snyder, 1981; Faust, 1984; Klayman and Ha, 1987). Very few studies report correlations above .5, so almost all studies leave much scope for misattribution and misinterpretation.

(e)                Humans’ characteristics and behaviors may actually change faster than psychologists’ theories or measures improve. Stagner (1982) argued that the context of I/O psychology has changed considerably over the years: the economy has shifted from production to service industries, jobs have evolved from heavy labor to cognitive functions, employees’ education levels have risen, and legal requirements have multiplied and changed, especially with respect to discrimination. For instance, Haire et al. (1966) found that managers’ years of education correlate with their ideas about proper leadership, and education alters subordinates’ concepts of proper leadership (Dreeben, 1968; Kunda, 1987). In the US, median educational levels have risen considerably, from 9.3 years in 1950 to 12.6 years in 1985 (Bureau of the Census, 1987). Haire et al. also attributed 25 per cent of the variance in managers’ leadership beliefs to national differences: so, as people move around, either between countries or within a large country, they break down the differences between regions and create new beliefs that intermingle beliefs that used to be distinct. Cummings and Schmidt (1972) conjectured plausibly that beliefs about proper leadership vary with industrialization; thus, the ongoing industrialization of the American south-east and southwest and the concomitant deindustrialization of the north-east are altering Americans’ responses to leadership questionnaires.

Whatever the reasons, the theories of I/O psychology explain very small fractions of observed phenomena, I/O psychology is making little positive progress, and it may actually be making some negative progress. Are these the kinds of results that science is supposed to produce?

 

PARADIGM CONSENSUS

Kuhn (1970) characterized scientific progress as a sequence of cycles, in which occasional brief spurts of innovation disrupt long periods of gradual incremental development. During the periods of incremental development, researchers employ generally accepted methods to explore the implications of widely accepted theories. The researchers supposedly see themselves as contributing small but lasting increments to accumulated stores of well-founded knowledge; they choose their fields because they accept the existing methods, substantive beliefs and values, and consequently they find satisfaction in incremental development within the existing frames of reference. Kuhn used the term paradigm to denote one of the models that guide such incremental developments. Paradigms, he (1970, p. 10) said, provide ‘models from which spring particular coherent traditions of scientific research’.

Thus, Kuhn defined paradigms, not by their common properties, but by their common effects. His book actually talks about 22 different kinds of paradigm (Masterman, 1970), which Kuhn placed into two broad categories: (a) a constellation of beliefs, values and techniques shared by a specific scientific community; and (b) an example of effective problem-solving that becomes an object of imitation by a specific scientific community.

I/O psychologists have traditionally focused on a particular set of variables: the nucleus of this set would be those examined in the previous section-job satisfaction, absenteeism, turnover, job performance and leadership. Also, we believe that substantial majorities of I/O psychologists would agree with some base-line propositions about human behavior. However, Campbell et al. (1982) found a lack of consensus among American I/O psychologists concerning substantive research goals. They asked them to suggest ‘the major research needs that should occupy us during the next 10-15 years (p. 155): 105 respondents contributed 146 suggestions, of which 106 were unique. Campbell et al. (1982, p. 71) inferred: ‘The field does not have very well worked out ideas about what it wants to do. There was relatively little consensus about the relative importance of substantive issues.’

 

Shared Beliefs, Values and Techniques

I/O psychologists do seem to have a paradigm of type (a)-shared beliefs, values, and techniques, but it would seem to be a methodological paradigm rather than a substantive one. For instance, Watkins et al.’s (1986) analysis of the 1984-85 citations in three I/O journals revealed that a methodologist, Frank L. Schmidt, has been by far the most cited author. In this methodological orientation, I/O psychology fits a general pattern: numerous authors have remarked on psychology’s methodological emphasis (Deese, 1972; Koch, 1981; Sanford, 1982). For instance, Brackbill and Korten (1970, p. 939) observed that psychological ‘reviewers tend to accept studies that are methodologically sound but uninteresting, while rejecting research problems that are of significance for science or society but for which faultless methodology can only be approximated.’ Bakan (1974) called psychology ‘methodolatrous’. Contrasting psychology’s development with that of physics, Kendler (1984, p. 9) argued that ‘Psychological revolutions have been primarily methodological in nature.’ Shames (1987, p. 264) characterized psychology as ‘the most fastidiously committed, among the scientific disciplines, to a socially dominated disciplinary matrix which is almost exclusively centred on method.’

I/O psychologists not only emphasize methodology, they exhibit strong consensus about methodology. Specifically, I/O psychologists speak and act as if they believe they should use questionnaires, emphasize statistical hypothesis tests, and raise the validity and reliability of measures. Among others, Campbell (1982, p. 699) expressed the opinion that 110 psychologists have been relying too much on ‘the self-report questionnaire, statistical hypothesis testing, and multivariate analytic methods at the expense of problem generation and sound measurement’. As Campbell implied, talk about reliability and especially validity tends to be lip-service: almost always, measurements of reliability are self-reflexive facades and no direct means even exist to assess validity. I/O psychologists are so enamored of statistical hypothesis tests that they often make them when they are inappropriate, for instance when the data are not samples but entire sub-populations, such as all the employees of one firm, or all of the members of two departments. Webb et al. (1966) deplored an overdependence on interviews and questionnaires, but I/O psychologists use interviews much less often than questionnaires (Stone, 1978).

An emphasis on methodology also characterizes the social sciences at large. Garvey et al. (1970) discovered that editorial processes in the social sciences place greater emphasis on statistical procedures and on methodology in general than do those in the physical sciences; and Lindsey and Lindsey (1978) factor analysed social science editors’ criteria for evaluating manuscripts and found that a quantitative-methodological orientation arose as the first factor. Yet, other social sciences may place somewhat less emphasis on methodology than does I/O psychology. For instance, Kerr et al. (1977) found little evidence that methodological criteria strongly influence the editorial decisions by management and social science journals. According to Kerr et al., the most influential methodological criterion is statistical insignificance, and the editors of three psychological journals express much stronger negative reactions to insignificant findings than do editors of other journals.

Mitchell et al. (1985) surveyed 139 members of the editorial boards of five journals that publish work related to organizational behavior, and received responses from 99 editors. Table 2 summarizes some of these editors’ responses.[3] The average editor said that ‘importance’ received more weight than other criteria; that methodology and logic were given nearly equal weights, and that presentation carried much less weight. When asked to assign weights among three aspects of ‘importance’, most editors said that scientific contribution received much more weight than practical utility or readers’ probable interest in the topic. Also, they assigned nearly equal weights among three aspects of methodology, but gave somewhat more weight to design.

Table 2 compares the editors of two specialized I/O journals-Journal of Applied Psychology (JAP) and Organizational Behavior and Human Decision Processes (OBHDP)-with the editors of three more general management journals- Academy of Management Journal (AMJ), Academy of Management Review (AMR) and Administrative Science Quarterly (ASQ). Contrary to our expectations, the average editor of the two I/O journals said that he or she allotted more weight to ‘importance’ and less weight to methodology than did the average editor of the three management journals. It did not surprise us that the average editor of the I/O journals gave less weight to the presentation than did the average editor of the management journals. Among aspects of methodology, the average I/O editor placed slightly more weight on design and less on measurement than did the average management editor. When assessing ‘importance’, the average I/O editor said that he or she gave distinctly less weight to readers’ probable interest in a topic and more weight to practical utility than did the average management editor. Thus, the editors of I/O journals may be using practical utility to make up for I/O psychologists’ lack of consensus concerning substantive research goals: if readers disagree about what is interesting, it makes no sense to take account of their preferences (Campbell et al., 1982).

Table 2 – Review Article Sources

 

 

 

 

Relative weights among four general criteria

 

All five journals

JAP and OBHDP

AMJ, AMR, and ASQ

‘Importance’

35

38

34

Methodology

26

25

27

Logic

24

24

24

Presentation

15

13

16

 

 

 

 

Relative weights among three aspects of importance

 

All five journals

JAP and OBHDP

AMJ, AMR, and ASQ

Scientific contribution

53

54

53

Practical utility

28

31

26

Readers’ interest in topic

19

14

21

 

 

 

 

Relative weights among three aspects of methodology

 

All five journals

JAP and OBHDP

AMJ, AMR, and ASQ

Design

38

39

37

Measurement

31

30

32

Analysis

31

31

31

 

 

Editors’ stated priority of ‘importance’ over methodology contrasts with the widespread perception that psychology journals emphasize methodology at the expense of substantive importance. Does this contrast imply that the actual behaviors of journal editors diverge from their espoused values? Not necessarily. If nearly all of the manuscripts submitted to journals use accepted methods, editors would have little need to emphasize methodology. And if, like I/O psychologists in general, editors disagree about the substantive goals of I/O research, editors’ efforts to emphasize ‘importance’ would work at cross-purposes and have little net effect. Furthermore, editors would have restricted opportunities to express their opinions about what constitutes scientific contribution or practical utility if most of the submitted manuscripts pursue traditional topics and few manuscripts actually address ‘research problems that are of significance for science or society’.

 

Objects of Imitation

I/O psychology may also have a few methodological and substantive paradigms of type (b) examples that become objects of imitation. For instance, Griffin (1987, pp. 82-3) observed:

The [Hackman and Oldham] job characteristics theory was one of the most widely studied and debated models in the entire field during the late 1970s. Perhaps the reasons behind its widespread popularity are that it provided an academically sound model, a packaged and easily used diagnostic instrument, a set of practitioner-oriented implementation guidelines, and an initial body of empirical support, all within a relatively narrow span of time. Interpretations of the empirical research pertaining to the theory have ranged from inferring positive to mixed to little support for its validity. (References omitted.)

Watkins et al. (1986) too found evidence of interest in Hackman and Oldham’s (1980) job-characteristics theory: five of the twelve articles that were most frequently cited by I/O psychologists during 1984-85 were writings about this theory, including Roberts and Glick’s (1981) critique of its validity. Although its validity evokes controversy, Hackman and Oldham’s theory seems to be the most prominent current model for imitation. As well, the citation frequencies obtained by Watkins et al. (1986), together with nominations of important theories collected by Miner (1984), suggest that two additional theories attract considerable admiration: Katz and Kahn’s (1978) open-systems theory and Locke’s (1968) goal-setting theory. It is hard to see what is common among these three theories that would explain their roles as paradigms; open-systems theory, in particular, is much less operational than job-characteristics theory, and it is more a point of view than a set of propositions that could be confirmed or disconfirmed.

To evaluate more concretely the paradigm consensus among I/O psychologists, we obtained several indicators that others have claimed relate to paradigm consensus.

 

Measures

As indicators of paradigm consensus, investigators have used: the ages of references, the percentages of references to the same journal, the numbers of references per article, and the rejection rates of journals.

Kuhn proposed that paradigm consensus can be evaluated through literature references. He hypothesized that during normal-science periods, references focus upon older, seminal works; and so the numbers and types of references indicate connectedness to previous research (Moravcsik and Murgesan, 1975). First, in a field with high paradigm consensus, writers should cite the key works forming the basis for that field (Small, 1980). Alternatively, a field with a high proportion of recent references exhibits a high degree of updating, and so has little paradigm consensus. One measure of this concept is the Citing Half-Life, which shows the median age of the references in a journal. Second, referencing to the same journal should reflect an interaction with research in the same domain, so higher referencing to the same journal should imply higher paradigm consensus. Third, since references reflect awareness of previous research, a field with high paradigm consensus should have a high average number of references per article (Summers, 1979).

Journals constitute the accepted communication networks for transmitting knowledge in psychology (Price, 1970; Pinski and Narin, 1979), and high paradigm consensus means agreement about what research deserves publication. Zuckerman and Merton (1971) said that the humanities demonstrate their pre-paradigm states through very high rejection rates by journals, whereas the social sciences exhibit their low paradigm consensus through high rejection rates, and the physical sciences show their high paradigm consensus through low rejection rates. That is, paradigm consensus supposedly enables physical scientists to predict reliably whether their manuscripts are likely to be accepted for publication, and so they simply do not submit manuscripts that have little chance of publication.

 

Results

Based partly on work by Sharplin and Mabry (1985), Salancik (1986) identified 24 ‘organizational social science journals’. He divided these into five groups that cite one another frequently; the group that Salancik labeled Applied corresponds closely to I/O psychology.[4] Figure 5 compares these groups with respect to citing half-lives, references to the same journal, and numbers of references per article. The SSCI Journal Citation Reports (Social Science Citation Index, Garfield, 198 1-84b) provided these three measures, although a few data were missing. We use four-year averages in order to smooth the effects of changing editorial policies and of the publication of seminal works (Blackburn and Mitchell, 1981). Figure 5 also includes comparable data for three fields that do not qualify as ‘organizational social science’-chemistry, physics, and management information systems (MIS). [5] Data concerning chemistry, physics and MIS hold special interest because they are generally believed to be making rapid progress; MIS may indeed be in a pre-paradigm state.

 

Seven of the eight groups of journals have average citing half-lives longer than five years, the figure that Line and Sandison (1974) proposed as signaling a high degree of updating. Only MIS journals have a citing half-life below five years; this field is both quite new and changing with extreme rapidity. I/O psychologists update references at the same pace as chemists and physicists, and only slightly faster than other psychologists and OB researchers.

Garfield (1972) found that referencing to the same journal averages around 20 per cent across diverse fields, and chemists and physicists match this average. All five groups of ‘organizational social science’ journals average below 20 per cent references to the same journal, so these social scientists do not focus publications in specific journals to the same degree as physical scientists, although the OB researchers come close to the physical-science pattern. The I/O psychologists, however, average less than 10 per cent references to the same journal, so they focus publications even less than most social scientists. MIS again has a much lower percentage than the other fields.

Years ago, Price (1965) and Line and Sandison (1974) said 15-20 references per article indicated strong interaction with previous research. Because the numbers of references have been increasing in all fields (Summers, 1979), strong interaction probably implies 25-35 references per article today. I/O psychologists use numbers of references that fall within this range, and that look much like the numbers for chemists, physicists and other psychologists.

We could not find rejection rates for management, organizations and sociology journals, but Jackson (1986) and the American Psychological Association (1986) published rejection rates for psychology journals during 1985. In that year, I/O psychology journals rejected 82.5 per cent of the manuscripts, which is near the 84.3 per cent average for other psychology journals. By contrast, Zuckerman and Merton (1971) reported that the rejection rates for chemistry and physics journals were 31 and 24 per cent respectively. Similarly, Garvey et al. (1970) observed higher rejection rates and higher rates of multiple rejections in the social sciences than in the physical sciences. However, these differences in rejection rates may reflect the funding and organization of research more than its quality or substance: American physical scientists receive much more financial support than do social scientists, most grants for physical science research go to rather large teams, and physical scientists normally replicate each others’ findings. Thus, most physical science research is evaluated in the process of awarding grants as well as in the editorial process, teams evaluate and revise their research reports internally before submitting them to journals, and researchers have incentives to replicate their own findings before they publish them. The conciseness of physical science articles reduces the costs of publishing them. Also, since the mid-1950s, physical science journals have asked authors to pay voluntary page charges, and authors have characteristically drawn upon research grants to pay these charges.

Peters and Ceci (1982) demonstrated for psychology in general that a lack of substantive consensus shows up in review criteria. They chose twelve articles that had been published in psychology journals, changed the authors’ names, and resubmitted the articles to the same journals that had published them: The resubmissions were evaluated by 38 reviewers. Eight per cent of the reviewers detected that they had received resubmissions, which terminated review of three of the articles. The remaining nine articles completed the review process, and eight of these were rejected. The reviewers stated mainly methodological reasons rather than substantive ones for rejecting articles, but Mahoney’s (1977) study suggests that reviewers use methodological reasons to justify rejection of manuscripts that violate the reviewers’ substantive beliefs.

Figure 6 graphs changes in four indicators from 1957 to 1984 for the Journal of Applied Psychology and, where possible, for other I/O psychology journals.[6] Two of the indicators in Figure 6 have remained quite constant; one indicator has risen noticeably; and one has dropped noticeably. According to the writers on paradigm consensus, all four of these indicators should rise

as consensus increases. If these indicators actually do measure paradigm consensus, I/O psychology has not been developing distinctly more paradigm consensus over the last three decades.