In general, the role of WFI as a mediator has been suggested by many studies (Frone et al., 1992; Geurts et al., 1999; Kinnunen and Mauno, 1998; Montgomery et al., 2003; Parasuraman et al., 1996; Stephens et al., 1997). However, the link between emotions, WFI and burnout has not been clearly established to date. This can be explained in part by the fact that, while there is broad acceptance that employers have the right to ask employees to perform physical behaviors or engage in cognitive activities, emotional behavior might be outside what employers can reasonably demand (Briner and Totterdell, 2002). Therefore, it seems more likely that emotional demands will spill over from work to family. Empirically, this is indicated in the work of Schulz et al. (2004), which demonstrated negative emotional spillover on a daily basis. Consistent with this, Maslach (1982) pays special attention to the transition from work to home by introducing the notion of “decompression”. Maslach argues that people working in an emotional and demanding environment need to “decompress” before moving into the normal pressure of their private life. Additionally, Grandey (2000) views emotional labor and display rules as a proximal predictor of stress, which is consistent with the idea that people bring emotional stress from work to home. Finally, the idea that WFI may serve as a mediator between emotional labor and burnout is indicated by the study of Wharton and Erickson (1995), who found that women’s well-being on the job was threatened more by their involvement in family emotion work than by their actual performance of emotional labor.

In the present research, it follows logically that WFI is an important variable that mediates the ability of individuals to “decompress” from the work domain to the home domain.

H3a. WFI will mediate the relationship between showing positive emotions and burnout/psychosomatic complaints.

H3b. WFI will mediate the relationship between hiding negative emotions and burnout/psychosomatic complaints.

H3c. WFI will mediate the relationship between surface acting and burnout/psychosomatic complaints.

H3d. WFI will mediate the relationship between deep acting and cynicism/psychosomatic complaints.

All the aforementioned analyses were carried out controlling for sex, age and having children. Traditionally, no systematic differences have been found with regard to WFI and sex (Geurts and Demerouti, 2003), but there is a need to control for demographic variables such as sex and age with regard to burnout and psychosomatic complaints. Finally, controlling for a more structural variable such as the presence of children allows us to evaluate whether WFI adds any variance to the prediction of both burnout and psychosomatic complaints, beyond the need to attend to parenting demands.

Method

Participants and procedures

The study sample consisted of employees of a Dutch governmental organization. A total of 500 employees were contacted to participate in this study, and 174 returned completed questionnaires. Given the nature of WFI, only employees who lived with someone and/or had children living at home were included in the study (N = 155, response rate = 31 percent). This is lower than the average response rate for published research in the managerial and behavioural sciences (55.6 percent overall), but close to the 36.1 percent reported for studies concerning top managers or representatives (see Baruch, 1999 for a review). Participants ranged in age from 22 to 64 years (M = 44, SD = 9.9), and 53 percent were men. The majority of people (83 percent) lived with a partner, and 47 percent held a supervisory position. A total of 63 percent of respondents reported having a partner in paid employment, and 50 percent of the respondents had children living at home. No statistical differences were found between men and women with regard to any of the study variables, even when respondents with working spouses were compared with respondents without working spouses. Confidentiality requirements prevented a direct comparison of the age and gender breakdown of the sample with that of all employees contacted, which would have allowed demographic differences between responders and non-responders to be assessed. However, the organization being studied carried out this analysis itself and concluded that no statistically significant differences existed.

Measures

Work-family interference (WFI). WFI was measured using the scale recommended by the Sloan Work-Family Researchers Electronic Network (MacDermid, 2000). The WFI scale is a nine-item work-home interference measure developed by a virtual think tank comprising recognised experts in the field of work/life. The items in the scale represent a synopsis of the best published scales in the field, such as the scales of Netemeyer et al. (1996) and Gutek et al. (1991). All items are scored on a five-point frequency scale ranging from “1” (never) to “5” (always). The scale has been used in previous research with Greek health care professionals (Montgomery et al., 2005; α = 0.90 for both 180 doctors and 84 nurses). In the present study, internal consistency was good (α = 0.89).

Burnout. The Dutch version of the Maslach Burnout Inventory: General Survey (MBI-GS) was used to assess burnout (Schaufeli et al., 1996). Two sub-scales of the MBI-GS were assessed: Exhaustion (five items; e.g. “I feel used up at the end of the workday”, α = 0.87) and Cynicism (four items; e.g. “I have become less enthusiastic about my work”, α = 0.79). All items are scored on a seven-point frequency rating scale ranging from “0” (never) to “6” (daily). High scores on the exhaustion and cynicism sub-scales are indicative of burnout.

Psychosomatic health. Psychosomatic health complaints were measured with a Dutch questionnaire on subjective health (VOEG: Vragenlijst Onderzoek Ervaren Gezondheid (Questionnaire on Experienced Health)) developed by Dirken (1969). In this study, the 13-item version was used (Jansen and Sikkel, 1981), which explains 95 percent of the variance in the 21-item version. All items were scored on a four-point scale ranging from “1” (seldom or never) to “4” (often). The VOEG consists of items asking whether one suffers from a range of psychosomatic complaints, such as headaches, backache, an upset stomach, fatigue, dizziness, and pain in the chest or heart area (α = 0.84). The 13-item VOEG is used by the Dutch census office for monitoring psychosomatic health in the Dutch population.

Job-focused emotional labor: perceived display rules. The Emotion Work Requirements Scale (Best et al., 1997), scored on a five-point scale (1 = not at all, 5 = always required), tapped the degree to which employees reported that their emotional displays were controlled by their jobs. Items ask the extent to which the employee is required to show (or hide) emotion in order to be effective on the job. These items form two factors (Grandey, 1998; Jones and Best, 1995), which tap the requirement to display positive emotions (four items, α = 0.78) and to hide negative emotions (two items, α = 0.54). The alpha coefficient for the second scale was poor, but the scale was computed and used on theoretical grounds.

Employee-focused emotional labor. Items measuring surface and deep acting came from the Emotional Labor Scale (Brotheridge and Lee, 1998; Brotheridge and Grandey, 2002). These items tapped the ideas of regulating emotions by hiding feelings, faking feelings, and modifying feelings as part of the work role. Three items measured surface acting (sample item: “Resist expressing my true feelings”) and three items measured deep acting (sample item: “Make an effort to actually feel the emotions that I need to display to others”). Items were scored on a five-point scale (1 = not at all, 5 = always required) and the alphas were acceptable (α = 0.74 and α = 0.90, respectively).
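The internal-consistency coefficients (α) reported for the scales above are Cronbach's alpha values. As a minimal sketch of how such a coefficient can be computed from item-level responses, the following Python function applies the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the example responses are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of k lists; each inner list holds one item's scores,
    ordered by respondent (all inner lists must have the same length).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one score per respondent")

    # Total scale score per respondent.
    totals = [sum(item[r] for item in items) for r in range(n)]

    # Sample variances (n - 1 denominator) of each item and of the totals.
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance(totals)

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of four respondents to a three-item scale.
example_items = [
    [2, 3, 4, 5],
    [1, 3, 3, 5],
    [2, 2, 4, 4],
]
alpha = cronbach_alpha(example_items)
print(round(alpha, 2))
```

Because alpha depends on the number of items as well as on their intercorrelations, very short scales, such as a two-item measure, tend to yield lower values.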

Results

Confirmatory factor analyses and analysis strategy

As a prerequisite to addressing the central hypotheses in this study, we examined the factor structures of the WFI and emotional labor scales using confirmatory factor analysis. To examine the appropriateness of computing uni-dimensional scores for each of the major constructs included in the study, each scale was submitted to a principal components analysis (results can be obtained from the first author). Examination of both the number of eigenvalues greater than 1 and the factor loadings supported a decision to treat the hypothesized scales as postulated.

Descriptive statistics

Table I provides the means, standard deviations and correlation coefficients of the study variables. As expected, WFI is positively correlated with exhaustion (r = 0.64, p < 0.01). The four emotional labor variables were intercorrelated (0.26 ≤ r ≤ 0.57, p < 0.01).

Table I. Means, standard deviations and correlations
Variables: 1 WFI; 2 Surface acting; 3 Deep acting; 4 Display rules: show positive; 5 Display rules: hide negative; 6 Exhaustion; 7 Cynicism; 8 Psychosomatic complaints
Notes: * p < 0.01; ** p < 0.05

H1. Emotional display rules

H1a and H1b proposed relationships between emotional display rules and burnout and psychosomatic complaints. Table I provides the zero-order correlations. With regard to burnout, the display rule to hide negative emotions correlated significantly with exhaustion (r = 0.38, p < 0.01) and cynicism (r = 0.36, p < 0.01), while the display rule to show positive emotions was significantly related to exhaustion (r = 0.18, p < 0.05), but not to cynicism. With regard to psychosomatic complaints, both hiding negative emotions (r = 0.29, p < 0.01) and showing positive emotions (r = 0.17, p < 0.05) were related.

H2. Emotion focused labor

Table I provides the zero-order correlations. With regard to the second set of hypotheses, only H2a and H2c were supported. Surface acting was significantly related to exhaustion (r = 0.28, p < 0.01), cynicism (r = 0.37, p < 0.01) and psychosomatic complaints (r = 0.33, p < 0.01). H2b and H2d were not supported.

H3. WFI as a mediator

Table II shows the results of the mediation analyses, carried out in line with the methodology suggested by Baron and Kenny (1986). Accordingly, a prerequisite for mediation is that the predictor, mediator and dependent variables must be significantly related. Mediation is demonstrated by a reduction in the impact of the predictor on the dependent measure after controlling for the mediator (see column βt in Table II). In addition, the statistical significance of the mediation was calculated using the Sobel test (Preacher and Leonardelli, 2001).
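As an illustration of the Sobel test mentioned above, the sketch below computes the z statistic for an indirect effect from two unstandardized regression coefficients and their standard errors: a (predictor to mediator) and b (mediator to outcome, controlling for the predictor). The function names and the input values are hypothetical; the study's own coefficients and standard errors are not reproduced here, and the code is not the Preacher and Leonardelli (2001) tool itself.

```python
import math


def sobel_z(a, se_a, b, se_b):
    """Sobel z for the indirect effect a * b.

    a, se_a: coefficient and standard error for predictor -> mediator.
    b, se_b: coefficient and standard error for mediator -> outcome,
             with the predictor also in the model.
    """
    standard_error = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return (a * b) / standard_error


def two_tailed_p(z):
    """Two-tailed p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical inputs, for illustration only.
z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"z = {z:.2f}, p = {two_tailed_p(z):.4f}")
```

The test treats the product a × b as approximately normally distributed, which is a reasonable approximation mainly in larger samples.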

Sex, age and having children were entered as control variables. Following the methodology recommended by Eckenrode et al. (1995), reduction of the coefficient to zero equals full mediation, and a reduction of the coefficient that stops short of zero is equivalent to partial mediation. This is consistent with the view of Baron and Kenny (1986), who suggest that, because most phenomena in psychology have multiple causes, a more realistic goal is to seek mediators that significantly reduce the relationship between the predictor and the dependent measure.

Table II. Mediational analysis of WFI between emotional labor and burnout/psychosomatic complaints (N = 155)
Variables: sex (male = 1, female = 0); having children (1 = yes, 0 = no); surface acting; deep acting; display: show positive; display: hide negative. Reported statistics: R², R² change, F change.
Notes: * p < 0.01; ** p < 0.05; βi, initial beta weight when first entered; βt, final beta weight after WFI entered
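The full versus partial mediation rule described in this section can be sketched as follows, under simplifying assumptions that differ from the analyses reported here: one predictor, one mediator, standardized variables, and no control variables, so that the beta weights can be derived directly from zero-order correlations. The function names, the tolerance used to treat a coefficient as zero, and the example correlations are all hypothetical.

```python
def beta_after_mediator(r_xy, r_xm, r_my):
    """Standardized weight of predictor x on outcome y with mediator m in the model.

    r_xy: correlation between predictor and outcome (the initial beta weight
          when x is the only predictor).
    r_xm: correlation between predictor and mediator.
    r_my: correlation between mediator and outcome.
    """
    return (r_xy - r_xm * r_my) / (1 - r_xm ** 2)


def classify_mediation(beta_initial, beta_final, tol=0.01):
    """Label the change in the predictor's weight once the mediator is entered.

    `tol` is an arbitrary threshold (an assumption of this sketch) below
    which the final weight is treated as zero.
    """
    if abs(beta_final) <= tol:
        return "full"
    if abs(beta_final) < abs(beta_initial):
        return "partial"
    return "none"


# Hypothetical correlations, for illustration only.
beta_i = 0.40
beta_t = beta_after_mediator(r_xy=0.40, r_xm=0.50, r_my=0.60)
print(round(beta_t, 2), classify_mediation(beta_i, beta_t))
```

This classification only describes the size of the reduction; whether the reduction is statistically significant is a separate question, addressed in the present study with the Sobel test.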

No support was found for H3a. With regard to H3b, Table II indicates that WFI partially mediated the relationship between hiding negative emotions and exhaustion (from β = 0.35 to β = 0.32, Sobel test, z = 3.01, p < 0.01), and the relationship between hiding negative emotions and cynicism (from β = 0.31 to β = 0.29, Sobel test, z = 2.28, p < 0.01). For H3c, WFI was found to partially mediate between surface acting and both cynicism (from β = 0.27 to β = 0.22, Sobel test, z = 2.33, p < 0.05) and psychosomatic complaints (from β = 0.29 to β = 0.19, Sobel test, z = 3.02, p < 0.01). No support was found for H3d.

Discussion

The current research made the following contributions:

it expanded our knowledge of the relationship between emotional labor, WFI and burnout;

it examined the mediational role of WFI between emotional labor and burnout/psychosomatic complaints; and

it provided data on a rarely studied phenomenon among governmental workers.

In terms of the first hypothesis, the need for employees to hide negative emotions was consistently related to all three outcomes. The results of the present study contradict those of the Brotheridge and Grandey (2002) study, which found that the hiding of negative emotions was only related to the personal accomplishment dimension of the MBI. This difference in findings may reflect the different samples studied, with Brotheridge and Grandey (2002) focusing on a variety of different occupations and the present study looking at “ceremonial” workers, who operate in a highly idiosyncratic and highly formalized environment. Additionally, in the present study, showing positive emotions was related to exhaustion and psychosomatic complaints, but the correlations were weak. Taken together, these results indicate that the need to regulate the display of emotions was most probably experienced at a proximal level.

In terms of the second hypothesis, surface acting was significantly related to all three outcomes. Such a result is consistent with the research of Brotheridge and Grandey (2002), who found surface acting significantly related to emotional exhaustion. Additionally, this result is consistent with the idea that suppressing anger can be costly to physiological and immune functioning (Gross and Levenson, 1997; Pennebaker and Beall, 1986). The consistent relationship with all three outcomes supports the idea that surface acting represents a way of detaching from others while at work. In contrast, deep acting was not related to any of the outcomes. This is consistent with the work of Brotheridge and Grandey (2002), who found that surface acting was related to emotional exhaustion over and above deep acting. In terms of the occupational group studied, this result makes sense: compared with other occupational groups (e.g. police officers), the need to use impression management in a formal setting (e.g. government offices) is more important than the need to internalize emotionally difficult situations (e.g. a police officer who needs to internalize difficult interactions).

Limited support was found for the third hypothesis. This result is consistent with the considerable body of literature denoting the role of WFI as a mediator (see Geurts and Demerouti, 2003 for a review). The mediating role found for both surface acting and hiding negative emotions provides a new perspective on the way that work can spill over into non-work. It suggests that employees' inhibited emotional states are more likely to be decompressed within the family environment, which presents more opportunities for emotional expression than the workplace. This is consistent with recent research that has found a positive association between emotional labor and WFI (Montgomery et al., 2005), and with the work of Wharton and Erickson (1995), who suggest that emotionally demanding environments (at both work and home) contribute to WFI.

Emotional labor as a demand

Until recently, most studies concerning the relationship between job demands and job stress have focused on quantitative demands (e.g. workload). One of the most prominent models in this area, Karasek’s (1979) demand-control model, has received critical attention with regard to the possible multifaceted nature of job demands. Different types of job demands have rarely been examined within the framework of the model (with some exceptions; De Jonge et al., 1999; Söderfeldt et al., 1996, 1997). The aforementioned evidence is consistent with recent reviews in the WFI literature that have called on researchers to identify more specific antecedents of WFI (Geurts and Demerouti, 2003). The need to evaluate a range of demands is prompted by the fact that the nature of work is changing and some professions have a specifically emotional component. Such a contention is supported by researchers who have identified three types of emotional displays required by jobs: integrative, differentiating and masking (Jones and Best, 1995; Wharton and Erickson, 1993). Such work prompts us to consider emotional labor more carefully as a significant job demand.

Limitations

With regard to the assessment of mediation, in the present study all variables measured (emotional labor, WFI, burnout and psychosomatic complaints) are in fact appraised, and thus measured subjectively. Therefore, we should keep in mind that it is difficult to demonstrate a mediational effect in time, as suggested by the S-O-R model of Woodworth (1928). Additionally, cross-sectional data prevent us from assessing causal relationships. For example, it is possible that surface acting causes individuals to suffer from burnout and increased psychosomatic complaints, and that these outcomes are in turn associated with higher levels of WFI.

The present study did not assess affectivity, which has been identified as an important component of emotional regulation (Morris and Feldman, 1996). However, the relationship between emotional labor and affectivity is not entirely clear, with Brotheridge and Grandey (2002) finding significant (but weak) relationships between negative affectivity (NA) and both hiding negative emotions and surface acting, but not between NA and either displaying positive emotions or deep acting. Within the wider literature on the stressor-strain relationship, some researchers call for the inclusion of NA (e.g. Watson and Clark, 1984), while some are opposed to it (e.g. Moyle, 1995; Schonfield, 1996). Indeed, Dollard and Winefield (1998) even warn against its inclusion, as this may lead to an under-estimation of the impact of the work environment on strain.

Ashforth and Humphrey (1993) have suggested that employees who strongly identify with their job role will feel more authentic in complying with display rules and hence are likely to find it less effortful to display the required emotion. The present study did not assess level of identification with the job, which may have been an important individual difference.

Finally, the present study is cross-sectional and thus the postulated relationships cannot be interpreted causally. Longitudinal studies and/or quasi-experimental research designs are needed to further validate the hypothesised causality of the relationships.

Practical implications for both employees and managers

The practical importance of WFI is highlighted by a national US study (Bond et al., 1998), which found that 85 percent of employees have some day-to-day family responsibility, and that virtually identical proportions of men and women report WFI problems. Additionally, the present research suggests that emotional labor is an important antecedent of both WFI and burnout. In terms of the literature on job demands, the results highlight the need to recognise that emotions and job-related emotional regulation (e.g. customer interaction or management) are an increasingly important part of work cultures. In terms of training and/or interventions, the need for employees to decompress from their job before going home is particularly important in jobs with high demands for emotional displays (e.g. civil servants in jobs that are highly ceremonial). Such decompression can be aided by training, and indeed Smith (1999) argues that the skill involved in displaying emotions should be valued in exactly the same way as any other skill. Finally, Briner and Totterdell (2002) point out that knowing how employees specifically feel (e.g. angry or contemptuous) is likely to focus interventions more precisely than knowing only that they are “stressed”.

There is considerable evidence demonstrating that managers and supervisors can influence the emotional experiences of their employees (Ashkanasy, 2003; McColl-Kennedy and Anderson, 2002). Managers can help employees to internalize their roles and reduce the need for employees to feel compelled to fake or hide genuine emotion (Ashforth and Humphrey, 1993). This is consistent with research showing that leaders who express a clear vision and positive expectations for performance affect employees’ identification with their work (Bono and Judge, 2003). A recent paper by Vinson et al. (2005) indicates that employees experience fewer positive emotions when interacting with their supervisors, except when interacting with supervisors rated high on transformational leadership style. All of the aforementioned evidence points to the fact that managerial and supervisory training should include awareness of the influence that managers and supervisors have on employees’ emotional experiences.

References

Aryee, S. (1992), “Antecedents and outcomes of work-family conflict among married professional women: evidence from Singapore”, Human Relations, Vol. 45, pp. 813-37.

Ashforth, B.E. and Humphrey, R.H. (1993), “Emotional labor in service roles: the influence of identity”, Academy of Management Review, Vol. 18, pp. 88-115.

Ashkanasy, N. (2003), “Emotions in organizations: a multilevel perspective”, in Dansereau, F. and Yammarino, F.J. (Eds), Research in Multilevel Issues, 2: Multi-level Issues in Organizational Behavior and Strategy, Elsevier Science, Oxford, pp. 9-54.

Baron, R.M. and Kenny, D.A. (1986), “The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations”, Journal of Personality and Social Psychology, Vol. 51, pp. 1173-82.

Barsade, S.G., Brief, A.P. and Spataro, S.E. (2003), “The affective revolution in organizational behavior: the emergence of a paradigm”, in Greenberg, J. (Ed.), Organizational Behavior: The State of the Science, Lawrence Erlbaum Publishers, Hillsdale, NJ, pp. 3-52.

Baruch, Y. (1999), “Response rate in academic studies: a comparative analysis”, Human Relations, Vol. 52, pp. 421-38.

Best, R.G., Downey, R.G. and Jones, R.G. (1997), “Incumbent perceptions of emotional work requirements”, paper presented at the 12th annual conference of the Society for Industrial Organizational Psychology, St Louis, MO, April.

Bond, J.T., Galinsky, E. and Swanberg, J.E. (1998), The 1997 National Study of the Changing Workforce, Families and Work Institute, New York, NY.

Bono, J.E. and Judge, T.A. (2003), “Self-concordance at work: toward understanding the motivational effects of transformational leader”, Academy of Management Journal, Vol. 46, pp. 554-71.

Bono, J.E. and Vey, M.A. (2004), “Toward understanding emotional management at work: a quantitative review of emotional labor research”, in Ashkanasy, N. and Hartel, C. (Eds), Understanding Emotions in Organizational Behavior, Erlbaum, Mahwah, NJ, pp. 212-33.

Brief, A.P. and Weiss, H.M. (2002), “Organizational behavior: affect in the workplace”, Annual Review of Psychology, Vol. 53, pp. 279-307.

Briner, R.B. and Totterdell, P. (2002), “The experience, expression and management of emotion at work”, in Warr, P. (Ed.), Psychology at Work, Penguin Books, Harmondsworth, pp. 229-52.

Brotheridge, C.M. (1999), “Unwrapping the black box: a test of why emotional labor may lead to emotional exhaustion”, in Miller, D. (Ed.), Proceedings of the Administrative Sciences Association of Canada (Organizational Behaviour Division) Saint John, New Brunswick, pp. 11-20.

Brotheridge, C.M. and Grandey, A.A. (2002), “Emotional labor and burnout: comparing two perspectives of ‘people work’”, Journal of Vocational Behavior, Vol. 60, pp. 17-39.

Brotheridge, C.M. and Lee, R.T. (1998), “On the dimensionality of emotional labor: development and validation of the emotional labor scale”, paper presented at the first conference on Emotions in Organizational Life, San Diego, CA.

Cordes, C.L. and Dougherty, T.W. (1993), “A review and integration of research on job burnout”, Academy of Management Review, Vol. 18, pp. 621-56.

De Jonge, J., Mulder, M.J.G.P. and Nijhuis, F.J.N. (1999), “The incorporation of different demand concepts in the job demand-control model: effects on health care professionals”, Social Science and Medicine, Vol. 48, pp. 1149-60.

Demerouti, E., Bakker, A.B., Nachreiner, F. and Schaufeli, W.B. (2001), “The job demands-resources model of burnout”, Journal of Applied Psychology, Vol. 86, pp. 499-512.

Dirken, J.M. (1969), Arbeid en Stress (Work and Stress), Wolters Noordhoff, Groningen.

Dollard, M.F. and Winefield, A.H. (1998), “A test of the demands-control/support model of work stress in correctional officers”, Journal of Occupational Health Psychology, Vol. 3, pp. 243-64.

Eckenrode, J., Rowe, E., Laird, M. and Brathwaite, J. (1995), “Mobility as a mediator of the effects of child maltreatment on academic performance”, Child Development, Vol. 66, pp. 1130-42.

Erickson, R.J. and Wharton, A.S. (1997), “Inauthenticity and depression: assessing the consequences of interactive service work”, Work and Occupations, Vol. 24, pp. 188-213.

Frone, M.R., Russell, M. and Cooper, M.L. (1992), “Antecedents and outcomes of work-family conflict: testing a model of the work-family interface”, Journal of Applied Psychology, Vol. 77, pp. 65-78.

Geurts, S. and Demerouti, E. (2003), “Work/non work interface: a review of theories and findings”, in Schabracq, M.J., Winnubst, J.A.M. and Cooper, C.L. (Eds), The Handbook of Work and Health Psychology, John Wiley & Sons Ltd, New York, NY, pp. 279-312.

Geurts, S., Rutte, C. and Peeters, M. (1999), “Antecedents and consequences of work-home interference among medical residents”, Social Science & Medicine, Vol. 48, pp. 1135-48.

Grandey, A. (1998), “Emotional labor: a concept and its correlates”, paper presented at the First Conference on Emotions in Organizational Life, San Diego, CA, August.

Grandey, A. (2000), “Emotion regulation in the workplace: a new way to conceptualise emotional labor”, Journal of Occupational Health Psychology, Vol. 5, pp. 95-110.

Green, D.E., Walkey, F.H. and Taylor, A.J.W. (1991), “The three-factor structure of the Maslach Burnout Inventory”, Journal of Social Behavior & Personality, Vol. 6, pp. 453-72.

Greenhaus, J.H. and Beutell, N.J. (1985), “Sources of conflict between work and family roles”, Academy of Management Review, Vol. 10, pp. 76-88.

Gross, J. (1998a), “Antecedent- and response-focused emotion regulation: divergent consequences for experience, expression and physiology”, Journal of Personality and Social Psychology, Vol. 74, pp. 224-37.

Gross, J. (1998b), “The emerging field of emotion regulation: an integrative review”, Review of General Psychology, Vol. 2, pp. 271-99.

Gross, J. and Levenson, R. (1997), “Hiding feelings: the acute effects of inhibiting negative and positive emotions”, Journal of Abnormal Psychology, Vol. 106, pp. 95-103.

Gutek, B.A., Searle, S. and Klepa, L. (1991), “Rational versus gender role explanations for work-family conflict”, Journal of Applied Psychology, Vol. 76, pp. 560-8.

Hochschild, A.R. (1979), “Emotion work, feeling rules, and social structure”, American Journal of Sociology, Vol. 85, pp. 551-75.

Hochschild, A.R. (1983), The Managed Heart, University of California Press, Berkeley, CA.

Holmbeck, G.N. (1997), “Toward terminological, conceptual, and statistical clarity in the study of mediators and moderators: examples from child-clinical and paediatric psychology literatures”, Journal of Consulting and Clinical Psychology, Vol. 65, pp. 599-610.

James, N. (1989), “Emotional labor: skill and work in the social regulation of feelings”, Sociological Review, Vol. 37, pp. 15-42.

Jansen, M.E. and Sikkel, D. (1981), “Verkorte versies van de VOEG-schaal (Shortened version of the VOEG questionnaire)”, Gedrag & Samenleving, Vol. 2, pp. 78-82.

Jones, R.G. and Best, R.G. (1995), “An examination of the impact of emotional work requirements on individual and organizations”, paper presented at the Annual Convention of the Academy of Management, Vancouver, August.

Karasek, R.A. (1979), “Job demands, job decision latitude and mental strain: implications for job redesign”, Administrative Science Quarterly, Vol. 24, pp. 285-308.

Kinnunen, U. and Mauno, S. (1998), “Antecedents and outcomes of work-family conflict among employed women and men in Finland”, Human Relations, Vol. 51, pp. 157-77.

Lee, R.T. and Ashforth, B.T. (1996), “A meta-analytic examination of the correlates of the three dimensions of job burnout”, Journal of Applied Psychology, Vol. 81, pp. 123-33.

Leiter, M.P. (1993), “Burnout as a developmental process: consideration of models”, in Schaufeli, W.B., Maslach, C. and Marek, T. (Eds), Professional Burnout, Taylor & Francis, Washington DC, pp. 237-50.

MacDermid, S. (2000), “The measurement of work/life tension: recommendations of a virtual think tank”, Sloan work-family researchers electronic network, available at: www.bc.edu/bc_org/avp/wfnetwork/loppr/measure_tension.pdf

McColl-Kennedy, J.R. and Anderson, R.D. (2002), “Impact of leadership style on emotions on subordinate performance”, Leadership Quarterly, Vol. 13, pp. 545-59.

Mann, S. (1999), “Emotion at work: to what extent are we expressing, suppressing, or faking it?”, European Journal of Work and Organizational Psychology, Vol. 8, pp. 347-69.

Maslach, C. (1982), “Burnout: a social psychological analysis”, in Jones, J.W. (Ed.), The Burnout Syndrome, London House, Park Ridge, IL, pp. 30-53.

Montgomery, A.J., Panagopoulou, E. and Benos, A. (2005), “Emotional labor at work and home among Greek health professionals”, Journal of Health Management, Vol. 19, pp. 395-408.

Montgomery, A.J., Peeters, M.C.W., Schaufeli, W.B. and Den Ouden, M. (2003), “Work-home interference among newspaper managers: its relationship with burnout and engagement”, Anxiety, Stress & Coping, Vol. 16, pp. 195-211.

Morris, J.A. and Feldman, D.C. (1996), “The dimensions, antecedents, and consequences of emotional labor”, Academy of Management Review, Vol. 21, pp. 986-1010.

Moyle, P. (1995), “The role of negative affectivity in the stress process: tests of alternative models”, Journal of Organizational Behavior, Vol. 16, pp. 647-68.

Netemeyer, R.G., Boles, J.S. and McMurrian, R. (1996), “Development and validation of work-family conflict and family-work conflict scales”, Journal of Applied Psychology, Vol. 81, pp. 400-10.

Panagopoulou, E., Kersbergen, B. and Maes, S. (2002), “The effects of emotional (non-)expression in (chronic) disease: a meta-analytic review”, Psychology & Health, Vol. 17, pp. 529-45.

Parasuraman, S., Purohit, Y.S. and Godshalk, V.M. (1996), “Work and family variables, entrepreneurial career success and psychological well-being”, Journal of Vocational Behaviour, Vol. 48, pp. 275-300.

Pennebaker, J. (1985), “Traumatic experience and psychosomatic disease: exploring the roles of behavioural inhibition, obsession, and confiding”, Canadian Psychology, Vol. 26, pp. 82-95.

Pennebaker, J. and Beall, S. (1986), “Confronting a traumatic event: toward an understanding of inhibition and disease”, Journal of Abnormal Psychology, Vol. 58, pp. 528-37.

Preacher, K.J. and Leonardelli, G.J. (2001), Calculation for the Sobel test: an interactive calculation tool for mediation tests (Computer software), available at: www.unc.edu/~preacher/sobel/sobel.htm

Pugliesi, K. (1999), “The consequences of emotional labor in a complex organization”, Motivation and Emotion, Vol. 23, pp. 125-54.

Pugliesi, K. and Shook, S. (1997), “Gender, jobs and emotional labor in a complex organization”, Social Perspectives on Emotion, Vol. 4, pp. 283-316.

Rafaeli, A. and Sutton, R.I. (1989), “The expression of emotion in organizational life”, in Cummings, L.L. and Staw, B.M. (Eds), Research in Organizational Behavior, Vol. 11, JAI Press, Greenwich, CT, pp. 1-42.

Rafaeli, A. and Sutton, R.I. (1990), “Busy stores and demanding customers: how do they affect the display of positive emotion?”, Academy of Management Journal, Vol. 33, pp. 623-37.

Redford, W. and Barefoot, J. (1988), “Coronary-prone behavior: the emerging role of the hostility complex”, in Houston, B.K. and Snyder, C.R. (Eds), Type A Behaviour Pattern: Research, Theory and Intervention, Wiley, New York, NY, pp. 189-211.

Schaubroeck, J. and Jones, J.R. (2000), “Antecedents of workplace emotional labor dimensions and moderators of their effects on physical symptoms”, Journal of Organizational Behaviour, Vol. 21, pp. 163-83.

Schaufeli, W.B. and Enzmann, D. (1998), The Burnout Companion to Study and Research: A Critical Analysis, Taylor & Francis, London.

Schaufeli, W.B., Leiter, M.P., Maslach, C. and Jackson, S.E. (1996), “MBI-General survey”, in Maslach, C., Jackson, S.E. and Leiter, M.P. (Eds), Maslach Burnout Inventory, 3rd ed., Consulting Psychologist Press, Palo Alto, CA.

Schonfeld, I.S. (1996), “Relation of negative affectivity to self-reports of job stressors and psychological outcomes”, Journal of Occupational Health Psychology, Vol. 1, pp. 397-412.

Schulz, M.S., Cowan, P.A., Pape Cowan, C. and Brennan, R.T. (2004), “Coming home upset: gender, marital satisfaction and the daily spillover of workday experience into couple interactions”, Journal of Family Psychology, Vol. 18, pp. 250-63.

Schwartz, G.E. and Kline, J.P. (1995), “Repression, emotional disclosure and health: theoretical, empirical and clinical considerations”, in Pennebaker, J.W. (Ed.), Emotional Disclosure and Health, American Psychological Association, Washington, DC, pp. 177-95.

Shirom, A. (1989), “Burnout in work organizations”, in Cooper, C.L. and Robertson, I. (Eds), International Review of Industrial and Organizational Psychology, John Wiley, Chichester, pp. 25-48.

Smith, A.C. and Kleinman, S. (1989), “Managing emotions in medical school: students’ contacts with the living and the dead”, Social Psychology Quarterly, Vol. 52, pp. 56-69.

Smith, S.L. (1999), “The theology of emotion”, Soundings, Vol. 11, pp. 152-8.

Söderfeldt, B., Söderfeldt, M., Muntaner, C., O’Campo, P., Warg, L. and Ohlson, C. (1996), “Psychosocial work environment in human service organisations: a conceptual analysis and development of the demand-control model”, Social Science & Medicine, Vol. 42, pp. 1217-26.

Söderfeldt, B., Söderfeldt, M., Jones, K., O’Campo, P., Muntaner, C., Ohlson, C.G. and Warg, L. (1997), “Does organisation matter? A multilevel analysis of the demands control model applied to human services”, Social Science and Medicine, Vol. 44, pp. 527-34.

Stenross, B. and Kleinmann, S. (1989), “The highs and lows of emotional labor”, Journal of Contemporary Ethnography, Vol. 17, pp. 435-52.

Stephens, M.A.P., Franks, M.M. and Atienza, A.A. (1997), “Where two roles intersect: spillover between parent care and employment”, Psychology and Aging, Vol. 12, pp. 30-7.

Sutton, R.I. (1991), “Maintaining norms about expressed emotions: the case of bill collectors”, Administrative Science Quarterly, Vol. 36, pp. 245-68.

Temoshok, L. (1985), “Biophysical studies on cutaneous malignant melanoma”, Journal of Psychosomatic Research, Vol. 29, pp. 139-53.

Temoshok, L., Heller, B.W., Sagebiel, R.W., Blois, M.S., Sweet, D.M., DiClemente, R.J. and Gold, M.L. (1985), “The relationship of psychological factors to prognostic indicators in cutaneous malignant melanoma”, Journal of Psychosomatic Research, Vol. 29, pp. 139-53.

Tolich, M.B. (1993), “Alienating and liberating emotions at work: supermarket clerks’ performance of customer service”, Journal of Contemporary Ethnography, Vol. 22, pp. 361-81.

Vinson, G., Jackson, H., Bono, J. and Muros, J. (2005), “Felt and expressed emotions at work: examining the role of interaction partners”, in Ilies, R. and Johnson, M. (Co-Chairs), Work-Related Social Interactions and Mood: Tests of Affective Events Theory, Symposium presentation at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA.

Voydanoff, P. (1988), “Work role characteristics, family structure demands and the work/family conflict”, Journal of Marriage and the Family, Vol. 50, pp. 749-61.

Wallace, J.E. (1999), “Work-to-nonwork conflict among married male and female lawyers”, Journal of Organizational Behavior, Vol. 20, pp. 797-816.

Watson, D. and Clark, L.A. (1984), “Negative affectivity: the disposition to experience aversive emotional states”, Psychological Bulletin, Vol. 96, pp. 465-90.

Weidner, G., Istvan, J. and McKnight, J.D. (1989), “Clusters of behavioral coronary risk factors in employed women and men”, Journal of Applied Social Psychology, Vol. 19, pp. 468-80.

Wharton, A.S. (1993), “The affective consequences of service work: managing emotions on the job”, Work and Occupations, Vol. 20, pp. 205-32.

Wharton, A.S. (1999), “The psychological consequences of emotional labor”, ANNALS, Annals of the American Academy of Political and Social Sciences, Vol. 561, pp. 158-76.

Wharton, A.S. and Erickson, R.J. (1993), “Managing emotions on the job and at home: understanding the consequences of multiple emotional roles”, Academy of Management Review, Vol. 18, pp. 457-86.

Wharton, A.S. and Erickson, R.J. (1995), “The consequences of caring: exploring the links between women’s job and family emotion work”, Sociological Quarterly, Vol. 36, pp. 273-96.

Woodworth, R.S. (1928), “Dynamic psychology”, in Murchison, C. (Ed.), Psychologies of 1925, Clark University Press, Worcester, MA.

Zapf, D., Vogt, C., Seifert, C., Mertini, H. and Isic, A. (1999), “Emotion work as a source of stress: the concept and development of an instrument”, European Journal of Work and Organizational Psychology, Vol. 8, pp. 370-400.

Corresponding author Anthony J. Montgomery can be contacted at: amontgomery@rcsi-mub.com

The current issue and full text archive of this journal is available at

www.emeraldinsight.com/0268-3946.htm

JMP 21,1

Assessing ethical behavior: the impact of outcomes on judgment bias

Robert L. Cardy

Department of Management, W.P. Carey College of Business, Arizona State University, Tempe, Arizona, USA, and

T.T. Selvarajan

University of Houston-Victoria, School of Business, Sugar Land, Texas, USA

Received April 2005. Revised October 2005. Revised November 2005. Accepted November 2005.

Abstract

Purpose – The objective of this empirical study is to apply the methodology commonly used in performance appraisal and examine whether outcomes achieved by ratees bias raters’ judgments of ratee ethical behavior.

Design/methodology/approach – Two studies were conducted: in study 1 the participants were undergraduate business students, and in study 2 the participants were MBA students who were also full-time employees. In both studies, participants read vignettes and rated ratee performance using a behavior observation scale.

Findings – Both studies found support for the main hypothesis that outcomes achieved by the ratees influenced judgment of ethical behavior. The hypothesis that ethical beliefs of raters would moderate the biasing influence of outcomes on ethical judgments was not supported.

Research limitations/implications – If outcomes achieved by employees influence judgment of ethical behavior, future research has to examine how the biasing influence of outcomes on ethical judgments can be mitigated or eliminated.

Practical implications – If managers are influenced by outcomes achieved by their employees in judging ethical behavior, it can lead to a “success breeds acceptance” culture. If organizations place undue emphasis on outcomes at the cost of ethical standards, unethical behavior of individuals could be condoned or justified, which would lead to a worsening of the ethical climate in these organizations.

Originality/value – This study demonstrated that outcomes achieved by employees bias judgment of their ethical behavior, and this finding has important implications for designing effective appraisal systems for assessing ethical behavior of employees.

Keywords Ethics, Performance appraisal

Paper type Research paper

Ethics at the organizational and individual levels has been of considerable interest to researchers over the past four decades (e.g. Baumhart, 1961; Brenner and Molander, 1977) and has been asserted to be an important problem facing American companies today (Baehr et al., 1993). The issue of ethics has recently become the focus of media attention in the wake of corporate scandals created by the actions of executives in companies such as Enron, WorldCom, Global Crossing, and Arthur Andersen. In addition to being a current issue in the media domain, unethical behavior in organizations has also been identified as a relevant social issue demanding the attention of researchers (Trevino and Youngblood, 1990). Ethical decision making and behavior has become a focus of interest in areas such as accounting (e.g. Brief et al., 1996), marketing (Akaah and Lund, 1994), and management (Gatewood and Carroll, 1991).

Journal of Managerial Psychology, Vol. 21 No. 1, 2006, pp. 52-72. © Emerald Group Publishing Limited, 0268-3946. DOI 10.1108/02683940610643215

Most of the research in the ethical domain has not directly addressed ethical performance and its measurement. Rather, research has either been prescriptive or focused on surveys regarding perceptions or opinions of ethical performance. Further, theoretical work has consisted of the development of models of the determinants of ethical behavior (e.g. Trevino, 1986). In general, these models all propose that personal and organizational variables influence ethical behavior (Akaah and Lund, 1994). Understanding the psychological and situational determinants of ethical performance is important, but, as suggested by Gatewood and Carroll (1991) in their conceptual consideration of the assessment of ethical performance, accurate measurement of ethical behavior is important as well.

Explicit incorporation of ethical behavior into performance appraisal has been recently recommended by researchers (Buckley, 2001; Weaver, 2001; Weaver and Trevino, 1999). Their argument is that inclusion of ethical dimensions into regular performance appraisal systems integrates ethics expectations into employees’ formal role identities and makes ethical behavior at work relevant and rewarding for employees. However, the extent to which ethical performance judgments might be subject to systematic biases has not yet been empirically addressed. The purpose of the present study is to provide an initial exploratory examination of possible influences on the accuracy of judgments of the ethics of performance.

Unethical behavior among organizational members can take a variety of forms, ranging from convenient disregard of company policies to breaking civil or criminal law. Some instances of unethical behavior may result in objective evidence that unethical behavior occurred. However, ethical behavior is often seen as a social reality (Payne and Giacalone, 1990) rather than as an objective fact. Thus, just as with the assessment of job performance (Cardy and Dobbins, 1994), determination of the level of ethical performance requires subjective judgment. Performance ratings based on subjective judgment raise concerns regarding their accuracy. Certainly, facets such as user acceptance and satisfaction are important criteria for assessing appraisal effectiveness, but accuracy continues to be a critical criterion of appraisal effectiveness (Jelley and Goffin, 2001; Uggerslev and Sulsky, 2002; Borman et al., 2001; Viswesvaran et al., 2002; Cardy and Dobbins, 1994). Recently, in a study of 360 degree feedback, Brett and Atwater (2002) found that less favorable ratings were related to the belief that ratings were less accurate. Further, as observed by Weaver (2001), accurate appraisal of ethical behavior is important from a fairness/justice perspective. Given the importance of accurate judgment of ethical performance in organizations, investigation of influences on the accuracy of these judgments is needed. The objective of this study is to apply the methodology commonly used in the appraisal literature in an empirical examination of selected possible influences on the ethical judgment process.

Performance appraisal is a rater decision making process which is subject to biases due to cognitive limitations of the rater (Cardy and Dobbins, 1994). Since biases degrade decision outcomes (Doerr and Mitchell, 1998), a fundamental issue of interest would be to determine the nature of rater biases in the ethical judgment process. The performance appraisal literature suggests that performance judgments can be biased by outcomes achieved by the ratee, characteristics of the rater, and trait inferences made by the rater (e.g. Cardy et al., 1991; Krzystofiak et al., 1988). Thus, an interesting research question would be to determine if outcomes (success or failure in job performance) bias judgments of ethical behavior. In other words, a central question of importance is, “Does success breed acceptance?” The issue of outcomes affecting ratings is all the more important in appraising ethical performance since it is an important facet of the “means versus ends” dilemma in judging ethical behavior (Brady, 1990). Research suggests that outcomes influence performance ratings (DeNisi and Stevens, 1981; Cardy et al., 1991). Likewise, an outcome schema may also have an influence on evaluating the ethical nature of a performer’s behaviors. From a schematic perspective (e.g. Neisser, 1967), a worker who achieves excellent work outcomes may be placed in a successful-performance cognitive category by the rater. Unethical behavior that the worker engaged in may be ignored, discounted or reinterpreted to be consistent with the schema of a high performer. In other words, people may tend to rate the ethical nature of a successful employee more favorably than that of an unsuccessful employee. The above discussion leads to the following hypothesis:

H1a. Outcomes (success/failure) will bias the ethical judgment of raters such that the ethical nature of successful ratees will be judged with more positive bias than that of unsuccessful ratees.

The level of ethical behavior may also influence the accuracy of ethical judgments. The accuracy of observational judgments of performance has been found to be influenced by the level of ratee performance (Gordon, 1970). Specifically, high performers have been found to be rated more accurately than low performers. While the underlying basis for this effect is unclear, it appears that we may give low performers the “benefit of doubt” (Krzystofiak and Cardy, 1987) and report the occurrence of effective behaviors even though they did not occur. While an exploratory notion, it is possible that a similar phenomenon occurs with ethical performance. The following exploratory hypothesis is offered:

H1b. Ethical judgments will be more accurate (i.e. less biased) for ethical than for unethical ratees.

Characteristics of raters have been recognized and researched as potential influences on performance ratings (Cardy and Dobbins, 1994). For example, intelligence, schemas for ratings scales, schemas for workers, and rater personality have influenced performance rating accuracy. To the extent that rater characteristics influence judgments of performance, the evaluations can indicate more about the rater than the ratee. While ratee performance is typically found to be a dominant influence on performance ratings, many studies have found that rater characteristics also have significant effects on performance judgments (Cardy and Dobbins, 1994).

Rater characteristics, such as personality and ability, have been recognized in the appraisal literature as potentially important influences on ratings (e.g. Lee, 1988; Cardy and Kehoe, 1984). In addition to a direct influence, rater characteristics have been suggested to moderate the effect of external factors on ratings (e.g. Landy and Farr, 1980; Daniel et al., 1997). Likewise, external influences on ethical judgments may be moderated by rater differences, particularly in regard to ethical beliefs. Raters may differ in their internal standards for identifying behavior as ethical/unethical based on their personal beliefs about ethics. The strength of these beliefs may moderate the [...] In other words, individual differences in ethical beliefs of raters can be considered as an individual dispositional influence that may affect the relationship between ethical behavior and ethical judgments. Thus:

H2. Individual differences in ethical beliefs will moderate the effect of outcomes on the accuracy of ethical performance judgments.

Though trait judgments have fallen into disfavor in appraisal (e.g. Bernardin and Beatty, 1984), research has shown that raters make trait inferences even when only behavioral information is available (Krzystofiak et al., 1988; Cardy et al., 1991). Traits are considered cognitively superordinate to behaviors (Hoffman et al., 1981) and inferences concerning personality traits guide further information processing such as additional encoding of behaviors and their recall and evaluation (Krzystofiak et al., 1988).

In the present study, we are interested to see if the trait judgments raters make about their ratees influence judgments of ethical behavior of ratees even when the raters are presented no information about the ratee’s personality or demographic factors such as race and gender. For example, when raters are only presented information about success, raters could make inferences on ratee personality (e.g. conscientiousness) and this inference could influence the rater’s ethical judgment of ratees. Thus, it may be expected that raters make ratee trait inferences from the performance of ratees, and these trait inferences may have an independent effect on ethical judgments. This expectation leads to the following hypothesis:

H3. Trait inferences will have an influence on ethical judgments that cannot be accounted for by outcomes.

We conducted two studies to test the hypotheses in this research. Study 1 was conducted with undergraduate students and tested all three hypotheses. Study 2 was conducted with full-time employees who were also MBA students. The purpose of study 2 was to test whether the results of study 1 (H1 and H2) generalized to a sample of raters with real-world working experience. In addition, as explained in the section on study 2, the research design was slightly modified to make the stimulus materials more realistic.

Study 1

Method

Sample and materials. A total of 132 (77 male and 55 female) students enrolled in a junior level management course at a large south-western university participated in the study for extra credit. Mean age for the sample was 24.1 years.

The basic experimental material for this study consisted of written vignettes describing the behavior of four fictitious salesperson ratees. Each vignette contained ten critical incidents describing a salesperson’s ethical behavior. The critical incidents represented five of the six dimensions (two incidents for each dimension) of ethical behavior. The six dimensions of ethical behavior (personal use, bribery, deception, padding expense accounts, passing blame and falsification) were based on the research by Newstrom and Ruch (1975) and Akaah and Lund (1994). For each ratee vignette, no information was provided on one of the six ethical dimensions (there were two lure items representing this dimension) since items from this dimension served as “lure” items on the behavioral observation scale. Lure items consist of descriptions not included in the vignette but included as part of the behavioral observation scale. In a signal detection framework (Lord, 1985), lure items are required for calculation of judgment accuracy. Lure items are useful to determine whether raters make judgments based on actual memory of the ratee’s behaviors or on a schema for the ratee.

Ethical behavior was manipulated by providing ethical or unethical incidents for each of the four ratees. The “ethical” ratees had ethical incidents and “unethical” ratees had unethical incidents for all the dimensions of ethical performance. A list of ethical and unethical behavioral incidents used in this study is presented as an Appendix. The dimensions of ethical behavior used in this study roughly correspond to the types of misconduct observed most frequently in organizations. In a National Business Ethics Survey of 1,500 employees, the Ethics Resource Center (2003) found that the most often observed misconducts were lying, withholding needed information, abusive or intimidative behavior toward employees, and mis-reporting actual hours worked.

Scale development work (Cardy and Selvarajan, 2004) confirmed the dimensionality and effectiveness of the ethical and unethical incidents. In brief, the development work generated a six-dimension behavioral scale for assessing ethical judgment using the Behaviorally Anchored Rating Scale (BARS) procedure outlined by Bernardin and Beatty (1984). For each of the six dimensions, the authors generated critical incidents representing ineffective, average, and effective ethical behaviors. This is a variation from the BARS procedure outlined by Bernardin and Beatty (1984), in which students were used to generate critical incidents. In this study, due to familiarity with the domain and efficiency concerns, the authors generated the incidents. Retranslation was conducted by a group of 47 undergraduate student raters who indicated the dimension to which each of the critical incidents belonged. Finally, the effectiveness of the items surviving the retranslation process was evaluated by a separate group of 84 student raters. Based on the effectiveness levels of items, behaviorally anchored rating scales were constructed for the six dimensions of ethical behavior. Each scale had approximately five behavioral anchors spanning the range of each scale. Further details can be found in Cardy and Selvarajan (2004).

In addition to critical incidents of ethical behavior, the vignettes also contained summary statements regarding sales performance outcomes. The summary statements were drawn from the dimensions of sales performance (salesmanship, product knowledge, and ability to initiate/utilize sales innovations) identified from research in marketing (Bush et al., 1990; Lucas, 1985). These statements summarized the salespersons’ outcomes achieved for each of these dimensions. For example, a ratee with poor outcomes was described as failing to close sales on the salesmanship dimension. In addition to these summary statements, the overall performance of each ratee was described as successful or unsuccessful, as appropriate; a successful ratee was described as a star performer who consistently exceeded all performance targets and an unsuccessful ratee was described as a dismal performer who never achieved performance targets.

In summary, ethical behavior was manipulated by providing unethical or ethical critical incidents for each of the four ratees. Outcomes were manipulated by providing performance outcome descriptions for the sales performance dimensions and by [...] (ethical or unethical) with the two levels of outcomes (success or failure).

Measures. The rating scales included:

a 12-item behavioral observation scale (BOS);

a 50-item bipolar trait rating scale measure of the big five dimensions (Goldberg, 1992);

one seven-point global ethical rating scale;

one seven-point global performance rating scale; and

a ten-item seven-point Likert scale for measuring individual differences in ethical beliefs (Froelich and Kottke, 1991).

A Latin square design was used to balance the order of the presentation of scales to the participants. This eliminates any order effect due to the order of presentation of scales to the rater.
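One way to picture this counterbalancing is a cyclic Latin square, in which each scale appears exactly once in every presentation position. The sketch below is only illustrative: the paper does not say which Latin square was used or which scales were rotated, and the short scale labels are hypothetical names for the five instruments listed above.

```python
def cyclic_latin_square(items):
    """Return a cyclic Latin square: row i is `items` rotated left by i places."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical short labels for the five rating scales (assumed, not from the paper).
SCALES = ["BOS", "traits", "global_ethics", "global_performance", "ethical_beliefs"]

# Each row is one presentation order; participants would be spread evenly over rows.
square = cyclic_latin_square(SCALES)
for order in square:
    print(" -> ".join(order))
```

A cyclic square balances the position of each scale but not which scale immediately precedes which, so it controls position effects rather than every possible carryover effect.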

The behavioral observation scale contained a 12-item checklist on which participants were asked to print Y (yes) or N (no) depending on whether the sales person had exhibited the specific behavior. Ten of these items represented the ten critical incidents provided in the vignette. Two items served as “lures” to calculate false alarm rates. This scale is used for calculating the dependent measure “bias”.

The trait scale was used to measure rater trait inferences regarding ratee behaviors. The 50-item bipolar scale was developed by Goldberg (1990, 1992) to measure the big five personality traits. This scale was used since it is shorter than other instruments (e.g. NEO-PI or Hogan Personality Inventory) and exhibits consistent reliability. Internal consistency estimates of reliability have ranged from 0.85 to 0.93 and extensive factor analyses support the construct validity of the scale (Goldberg, 1990, 1992). The alpha reliabilities for the five factors in this study ranged from 0.86 to 0.92.

For measuring individual differences in ethical perception, the ten-item scale developed by Froelich and Kottke (1991) was used. The scale has been shown to have a factor structure with two factors (“company support” and “lie to protect the company”). Froelich and Kottke (1991) reported an alpha coefficient of 0.89 for this scale. The alpha reliabilities for the two factors were 0.86 and 0.89. Finally, two global rating scales were used as manipulation checks for the ethical and outcome manipulations.

Dependent measure. Measuring rating accuracy invariably requires the use of a laboratory approach so that “true” ratee performance can be determined. The estimates of “true” ratee performance serve as the standards against which ratings can be compared. The process of “true” score estimation was developed by Borman (1977), and further details of the approach can be found in his work. Accuracy can be assessed for evaluative or observational judgments, with observational rating accuracy typically being assessed using measures based on signal detection theory (e.g. Lord, 1985). Observational ratings are the judgments of the raters as to whether a ratee behavior occurred. Signal detection measures utilize hit rates and false alarm rates as a means of indicating the accuracy of rater judgments. The observational accuracy construct is interesting since it indicates whether the actual observations reported by the rater are veridical, not just whether evaluations about the quality of performance are shifted in an upward or downward fashion.

The signal detection measure of “bias” was used as the dependent measure for this study. Signal detection measures have been used extensively in determining judgment accuracy (e.g. Landy and Farr, 1980; Lord, 1985; Snodgrass and Corwin, 1988; Sulsky and Day, 1992). While several sensitivity indices based on signal detection theories are available, the bias index introduced by Snodgrass and Corwin (1988) is most relevant for this study since the concern is with directional error. That is, we are interested in knowing whether the rater tended to err toward judging the ratee as ethical or unethical.

Sulsky and Day (1992, p. 502) define bias as “the probability of saying ‘yes’ to an item when faced with a recognition task under conditions of uncertainty”. That is, bias is a function of the probability of saying “yes” to lure items (items that are not presented in the ratee description vignette but appear in the behavioral observation scale). Bias refers to a subject’s tendency to over- or under-attribute behaviors to the target person. In other words, bias is a measure of the leniency-stringency of decision criteria. For example, raters who use an overall positive schema to categorize ratee performance may exhibit more leniency on a recognition task for prototypical behaviors, and raters who do not use an overly positive schema for performance will exhibit less leniency or more stringency in identifying prototypical behaviors. A high bias score indicates a lenient criterion and a low bias score indicates a stringent decision criterion (Lord, 1985). The numerical limits for bias are 0 to +1, with scores above 0.5 indicating a more liberal bias and scores below 0.5 indicating a conservative bias (Borman and Hallam, 1991). In the present case, a liberal bias would be reflected in identifying more ethical behaviors as characteristic of the ratee than would be warranted by the ratee’s actual behavior; a conservative bias would be reflected in identifying fewer ethical behaviors as characteristic of the ratee than would be warranted by the ratee’s actual behavior. Bias is measured by using the following formula recommended by Snodgrass and Corwin (1988):
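In the notation commonly used for Snodgrass and Corwin’s two-high-threshold model, with $H$ the hit rate and $FA$ the false-alarm rate (the symbol names are assumed here for illustration), the bias index is usually written as:

```latex
B_r = \frac{FA}{1 - (H - FA)}
```

This form is consistent with the limits described above: $B_r$ lies between 0 and 1, and equals 0.5 when $H = FA$.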

Higher scores on this index are associated with more lenient decision criteria. A hit rate is the proportion of items correctly identified as observed, and a false alarm rate is the proportion of items incorrectly identified as observed. Since bias is undefined for hit rates of 1.0 and corresponding false-alarm rates of 0, the following corrections recommended by Snodgrass and Corwin (1988) were used.
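As commonly stated, the correction adds 0.5 to each frequency and 1 to each item count. Here $N_{\text{old}}$ is the number of scale items that were presented in the vignette and $N_{\text{new}}$ the number of lure items; these symbol names are assumptions made for illustration.

```latex
H = \frac{\text{hits} + 0.5}{N_{\text{old}} + 1},
\qquad
FA = \frac{\text{false alarms} + 0.5}{N_{\text{new}} + 1}
```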

Snodgrass and Corwin (1988) recommend the routine use of this correction in analyses using signal detection theory.
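A minimal sketch of how the corrected bias score could be computed for one rater is given below. It assumes the usual two-high-threshold form of the index, FA / (1 - (H - FA)), and the usual correction of adding 0.5 to each count and 1 to each item total. The function names, argument names, and defaults (ten presented items, two lures, taken from the 12-item behavioral observation scale) are illustrative and are not the authors' scoring code.

```python
def corrected_rate(count, n_items):
    """Corrected proportion: add 0.5 to the count and 1 to the number of items."""
    return (count + 0.5) / (n_items + 1)


def bias_index(hits, false_alarms, n_old=10, n_new=2):
    """Bias index FA / (1 - (H - FA)) computed on corrected rates.

    hits          "yes" responses to the n_old items that appeared in the vignette
    false_alarms  "yes" responses to the n_new lure items
    Names and defaults are assumptions for illustration only.
    """
    h = corrected_rate(hits, n_old)
    fa = corrected_rate(false_alarms, n_new)
    return fa / (1.0 - (h - fa))


# A rater who endorses all ten presented items and neither lure.
print(bias_index(hits=10, false_alarms=0))
```

With these defaults, a rater with 5 hits and 1 false alarm has equal corrected hit and false-alarm rates and therefore scores exactly 0.5, the neutral point. Because there are only two lure items, the corrected false-alarm rate can take just three values, so individual scores are coarse.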

Procedure. The participants were randomly assigned to one of the four experimental conditions (i.e. success, unethical; success, ethical; failure, unethical; failure, ethical). The packet of materials given to the participants included one ratee vignette, one set of rating forms, the ethical belief scale, and a demographics data form. Participants were instructed to read each vignette carefully and fill out the various rating forms.

Participants were specifically asked not to look back at the description of the salesperson while they were filling out the rating forms.

Analysis. The data included manipulated behaviors and outcomes, trait ratings, the bias measure, and overall ratings. This data was available for 132 participants (33 participants for each of the four manipulation conditions).

To test hypotheses H1a and H1b, a 2 × 2 factorial ANOVA (two levels of ethics × two levels of outcomes) was run with bias as the dependent measure. For H2, a hierarchical regression analysis (Baron and Kenny, 1986) was used to determine the moderating effect of ethical beliefs. The outcome variable was entered into the regression equation in the first step, and belief variables were entered in the second step. In the third step, the interaction term (belief × outcomes) was entered. A moderation effect is confirmed if the R-squared change attributed to the interaction term is significant. Before doing the regression, a factor analysis of the beliefs scale was performed to determine its factor structure.
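The three-step procedure can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it uses simulated data, ordinary least squares via NumPy, and the standard F statistic for an R-squared change, F = (ΔR² / k_added) / ((1 - R²_full) / (n - k_full - 1)). All variable names and simulated values are hypothetical.

```python
import numpy as np


def r_squared(y, predictors):
    """R-squared of an OLS fit of y on the given predictors plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def f_for_r2_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the increase in R-squared when k_added predictors are added.

    k_full counts the predictors in the larger model, excluding the intercept.
    """
    df_resid = n - k_full - 1
    return ((r2_full - r2_reduced) / k_added) / ((1.0 - r2_full) / df_resid)


# Simulated data only (not the study's data): 132 raters, two outcome conditions.
rng = np.random.default_rng(0)
n = 132
outcome = np.repeat([0.0, 1.0], n // 2)   # 0 = failure, 1 = success
belief = rng.normal(size=n)               # hypothetical belief factor score
bias = 0.69 + 0.15 * outcome + rng.normal(scale=0.15, size=n)

r2_step1 = r_squared(bias, [outcome])                            # step 1: outcome
r2_step2 = r_squared(bias, [outcome, belief])                    # step 2: + belief
r2_step3 = r_squared(bias, [outcome, belief, outcome * belief])  # step 3: + interaction

f_interaction = f_for_r2_change(r2_step2, r2_step3, n, k_full=3, k_added=1)
print(f"R-squared change for interaction: {r2_step3 - r2_step2:.4f}")
print(f"F for R-squared change (df = 1, {n - 3 - 1}): {f_interaction:.3f}")
```

Moderation would be inferred when the F for the step 3 R-squared change exceeds the critical value of the F distribution with (k_added, n - k_full - 1) degrees of freedom.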

The third hypothesis regarding the influence of trait inferences was tested by running a series of regression analyses. Before doing the regression, a factor analysis of the big five trait scale was performed. In the regression model, the outcome manipulation was entered into the equation first. At the second step, the first trait factor was introduced. The increase in the squared multiple correlation that results from the introduction of this factor represents the unique addition attributable to this factor. This process was repeated for all of the trait factors.

Results

Analysis of the manipulation checks revealed that both the outcome and the ethical manipulations were effective (F(1, 131) = 197, p < 0.01 and F(1, 131) = 322, p < 0.01, respectively). The means and standard deviations of the bias measure for the four experimental conditions are presented in Table I.

H1a, that outcomes will bias ethical behavior judgments, was tested using analysis of variance. The results of the ANOVA showed a significant main effect for the influence of ethics (F(1, 128) = 40.24, p < 0.01) and no significant interaction between the ethics and outcome effects. As can be seen from Table I, the mean judgment bias scores were higher when the ratee exhibited successful outcomes, with a mean bias score in the success condition of 0.839 and a mean bias score in the failure condition of 0.687. Thus, H1a was supported. The mean judgment bias scores in the ethical and unethical behavior conditions were 0.842 and 0.684, respectively. Thus, H1b, that ethical behavior judgments would be more accurate for ethical than for unethical ratees, was not supported.

Table I. Cell means and standard deviation for outcome and ethical manipulations (outcome: success, failure; n = 33 per cell). Note: Higher score indicates more bias.

Before running the hierarchical regression analysis, a factor analysis was performed to determine the factor structure of the ethical beliefs scale. Using principal components factor analysis, a two-factor solution was obtained. Varimax rotation of the two factors showed that the first six items loaded on factor 1 and the last four items loaded on factor 2. This pattern is similar to the results obtained by Froelich and Kottke (1991). The two factors accounted for 46.3 percent and 12.4 percent of the variance respectively.

H2 predicted that ethical beliefs would moderate the effect of outcomes on the accuracy of ethical behavior judgments. The results of the hierarchical regression in regard to H2 are shown in Table II. Outcomes alone accounted for 17.8 percent of the variance in the bias measure. The factor COSUP (company support) was not significant and accounted for only a 1 percent increment in variance. The increase in R-squared due to the inclusion of the interaction factor (outcome × COSUP) was 0.001 and was not significant. Likewise, the influence due to the factor LIE_PROTECT (lie to protect the company) was not significant and accounted for only a 0.3 percent increase in variance. The R-squared change due to the inclusion of the interaction factor (outcome × LIE_PROTECT) was zero.

H3 predicted trait inferences to have an influence on the accuracy of ethical behavior judgments over and above the influence due to performance outcomes. Before running the regression analysis, a factor analysis of the trait scale was conducted. A six-factor solution was found but the sixth factor had an eigenvalue of 1.01 and accounted for only 2 percent of variance in the trait construct. Thus, the theoretically meaningful five-factor solution was used with a principal components analysis. Varimax rotation of the factors showed that 44 out of 50 items loaded high (loading > 0.5) on only one of the five factors and six items loaded high (loading > 0.5) on more than one factor. These six items (five items representing conscientiousness and one item representing intellect) were excluded from further analysis. Average scores for the remaining items were computed for each of the five trait dimensions.

Table III shows the regression analysis results for H3. The trait pleasantness accounted for an 11.5 percent increase in variance (p < 0.01), the trait conscientiousness accounted for an 8.3 percent increase in variance (p < 0.01) and [...] bias scores.

Table II. Hierarchical regression analysis to test for moderating effects of ethical beliefs
Columns: Steps and variables; Beta; R-squared change; F for R-squared change
Factor: lie to protect the company. Steps: 1. Outcomes (O); 2. Lie to protect company (LIE_PROTECT); 3. O × LIE_PROTECT
Factor: company support. Steps: 1. Outcome (O); 2. Company support (COSUP); 3. O × COSUP
Note: * p < 0.01

Discussion

The results confirm H1a that successful ratees would be judged with more positive bias than unsuccessful ratees. That is, successful ratees were judged as exhibiting more ethical behavior than were unsuccessful ratees. It appears that “success breeds acceptance” in that ethical behavior judgments were more positively biased when the ratee achieved successful outcomes than when the ratee was unsuccessful. The pattern of accuracy measures also indicates that behavioral ethical judgments were as positive for a worker who was successful but employed unethical behaviors in obtaining the outcome as for a worker who was not successful but engaged in ethical behaviors (it can be observed from Table I that the bias scores for the success-unethical condition and the failure-ethical condition are almost the same).

For H1b, we found that positive bias was greater for ethical ratees than for unethical ratees. That is, ethical ratees were judged less accurately than unethical ratees. This finding contradicts Gordon’s (1970) empirical differential accuracy finding of high performers being rated more accurately than low performers (in this context, high and low ethical performance). This contradiction may be because Gordon employed a unique measure of hit rates and correct rejection rates. Indeed, when Baker and Schuck (1975) reanalyzed Gordon’s data from a signal detection perspective, they found no consistent evidence for the differential accuracy phenomenon. The apparent inconsistency with the differential accuracy phenomenon may be a function of the accuracy measures employed. However, the present finding seems to be generally consistent with the underlying explanation for the differential accuracy phenomenon. Specifically, the differential accuracy phenomenon may be due to the tendency to give the benefit of the doubt (e.g. Krzystofiak and Cardy, 1987) and, if there is uncertainty, to err toward guessing good rather than guessing bad. In the present case, this tendency may be exacerbated when raters are confronted with a ratee who exhibits ethical performance. If raters form an overall positive schema for a ratee who is observed to perform ethically on some dimensions, this schema would lead the rater to “guess good” when asked about ethical performance on a dimension that was not observed. This effect would be expected to be particularly pronounced when a ratee description consists of entirely ethical or unethical behaviors.

Table III. Hierarchical regression analysis to test for trait inference effects
Columns: Variables; Beta; R-squared change; F for R-squared change
Rows: Outcome; Emotional stability
Notes: * p < 0.01; ** p < 0.05. Outcome was entered first in the regression equation. R-squared change for each of the five trait variables indicates incremental change over and above the variance explained by the outcome variable

H2 suggested a moderating effect for ethical beliefs in regard to the influence of outcomes on bias. This hypothesis was not supported. That is, the strength of ethical beliefs did not reduce the influence of outcomes on judgment bias. Stronger ethical beliefs, as measured in this study, did not reduce outcome-based judgment bias.

H3 stated that raters may draw trait inferences based on the behaviors of ratees and that these inferences may bias ethical judgments over and above the outcome-influenced bias. There was partial support for the hypothesis. The traits pleasantness, emotional stability and conscientiousness independently accounted for variance in bias over and above the variance accounted for by the outcome manipulation. The confirmation of the trait inference hypothesis is in line with findings of earlier studies (e.g. Krzystofiak et al., 1988; Cardy et al., 1991). Raters seem to draw trait inferences even when the only information provided is ratee behavior and outcomes. There was no information provided on personality characteristics, race, or gender (the outcomes and summary statements were described in neutral terms), yet raters went beyond the given information and drew trait inferences. These trait inferences, in turn, influenced rater judgments concerning the ethical behavior exhibited by the ratee.

Study 2

Study 1 provided initial findings regarding judgment accuracy when assessing ethical performance. Study 2 was conducted to examine the stability of findings from study 1 and to test the hypotheses with more realistic material and more experienced raters. The ethical manipulation in study 1 consisted of ratee vignettes with all critical incidents for the “ethical ratee” depicting ethical behavior and all incidents for the “unethical ratee” depicting unethical behavior. This portrayal of ratees may be extreme and, thus, may not reflect actual ratee behaviors found in real life. A mixture of both ethical and unethical behavioral incidents would represent a more realistic ethical/unethical behavior. Thus, a major objective of study 2 was to again examine the effect of outcomes on ethical judgment bias using ratee vignettes consisting of both ethical and unethical ratee behaviors.

Study 1 was conducted with undergraduate students participating as raters. While the use of students as participants is often not a problem for studies of judgment processes (e.g. Mook, 1983; Cardy, 1991), the question of generalizability of findings with a more applied sample always remains. Study 2 employed participants with more real world experience.

Finally, the individual differences in the ethical belief scale used in study 1 (Froelich and Kottke, 1991) may measure a narrow domain of ethical beliefs. Froelich and Kottke’s (1991) scale measured two dimensions representing “company support” and “lie to protect the company.” A more comprehensive individual beliefs scale may more effectively tease out the moderating influence of ethical beliefs on ethical judgments. Study 2 again examined the possible moderating role of ethical beliefs, but utilized a more comprehensive measure of ethical beliefs.

Method

Sample and materials. A total of 48 full time employees enrolled in a night MBA program at a large south-western university volunteered to participate in the study. Of these, 44 participants (18 women; 26 men) returned questionnaires in usable form. The average age of the participants was 32 years with an average of ten years of work [...] experience in appraising others” to “7 = Great deal of experience in appraising others”, the average appraising experience for these participants was 4.6.

As in study 1, the basic experimental material for this study consisted of written vignettes describing the behavior of four fictitious salesperson ratees. The critical incidents used in study 1 (presented in the Appendix) were used for study 2. Each vignette contained ten critical incidents describing a salesperson’s ethical behavior. The behavioral incidents represented five of the six dimensions (two incidents for each dimension) of ethical behavior. For each ratee vignette, no information was provided on one of the six ethical dimensions. The “ethical” ratees had ethical behavioral incidents for four of the five dimensions of ethical performance and unethical behavioral incidents for the fifth dimension (“deception”). Similarly, “unethical” ratees had unethical behavioral incidents for four of the five dimensions of ethical performance and ethical behavioral incidents for the fifth dimension (“deception”). As in study 1, in addition to the incidents of ethical behavior, the vignettes also contained eight summary statements regarding sales performance outcomes; further, the overall performance of each ratee was described as successful or unsuccessful, as appropriate.

A total of four ratee vignettes were formed by crossing the two levels of ethical behavior (ethical or unethical) with two levels of outcomes (success or failure).

Measures. The rating scales included the 12-item behavioral observation scale used in study 1. A scale developed by Daniel et al. (1997) was used for measuring individual differences in ethical beliefs. This scale includes five dimensions of ethical beliefs: personal integrity issues, corporate integrity issues, individual rights issues, environmental issues, and international issues. For this study, the items representing the three most relevant dimensions, namely, personal integrity issues, corporate integrity issues, and individual rights issues, were used. Sample items for this scale include: “It’s acceptable to use investment resources from questionable resources” and “It’s acceptable to restrict legal actions by damaged customers”. The alpha reliability for this 16-item scale in the present study was 0.76. The two global rating scales were used as manipulation checks for the ethical and outcome manipulations. A demographic data form was used to collect information about age, number of years of experience, current job title, gender, and performance appraisal experience. As in study 1, the dependent measure was bias.

Procedure. The participants were randomly assigned to one of the four experimental conditions (i.e. success, unethical; success, ethical; failure, unethical; failure, ethical). The packet of materials given to the participants included one ratee vignette, one set of rating forms, an ethical belief scale, and a demographics data form. Participants were instructed to carefully read each vignette and fill out various forms. Participants were specifically asked not to look back at the description of the salesperson while they were filling out the rating forms.

Results

Usable questionnaires were completed by 44 of 48 participants. The possible influence of demographic characteristics of respondents on the bias measure was the focus of an initial analysis. Results of the analysis showed that age, gender, years of experience, and appraisal experience did not significantly affect the level of bias in the ethical performance ratings. Analysis of manipulation checks revealed that both the outcome and ethical manipulations were effective (F(1, 42) = 62.45, p < 0.001; F(1, 42) = 42.41, p < 0.001, respectively). The corresponding effect sizes were 0.602 and 0.504.

The means and standard deviations of the bias measure for the four experimental conditions are presented in Table IV. The results of the ANOVA showed significant main effects for ethics (F(1, 40) = 11.71, p < 0.01) and no significant interaction effects between ethics and outcomes. The mean level of bias for the success condition was 0.791 and for the failure condition was 0.567.

Before running the hierarchical regression analysis, a factor analysis was performed to determine the factor structure of the scale. Using principal components factor analysis and inspection of a scree plot, a five-factor solution was obtained in which all eigenvalues were greater than 1. However, a three-factor solution was expected based on the scale development work (Daniel et al., 1997). Given the theoretical meaningfulness of three dimensions, a three-factor solution was forced using principal components analysis. Varimax rotation indicated that two items did not load high (> 0.5) on any of the three factors. These items were dropped from further analysis.

The results of the hierarchical regression for the moderating influence of ethical beliefs (H2 of study 1) are shown in Table V. As evident from the table, the interactions between outcomes and the three ethical beliefs dimensions were not significant. None of the three interactions resulted in practical or significant increases in R-squared at p < 0.05.

Discussion

Study 2 reexamined H1a and H2 of study 1 using participants with more real-world experience as raters and more realistic vignettes. Consistent with study 1, the results of study 2 indicate that outcomes bias judgments. Ethical ratings were again found to be more positively biased for successful than for unsuccessful ratees. While not a significant contrast, the pattern of results for this more experienced group of raters indicates that ethics was rated even more positively for a worker who was unethical but successful than for a worker who was ethical but unsuccessful. The moderating influence of individual beliefs was not found to be significant, although a more comprehensive scale was employed to measure individual differences in ethical beliefs.

Table IV. Cell means and standard deviation for outcome and ethical manipulations (outcome: success, failure; ethical manipulation: unethical, ethical). Note: Higher score indicates more bias

Table V. Hierarchical regression analysis to test for moderating effects of ethical beliefs
Columns: Steps and variables; Beta; R-squared change; F for R-squared change
Factor: personal integrity. Steps: 1. Outcomes (O); 2. Personal_Integrity; 3. O × Personal_Integrity
Factor: Corporate_Integrity. Steps: 1. Outcome (O); 2. Corporate_Integrity; 3. O × Corporate_Integrity
Factor: Individual_Rights. Steps: 1. Outcome (O); 2. Individual_Rights; 3. O × Individual_Rights
Note: * p < 0.01

Both studies generated identical results despite changes in materials and different samples. The consistency of results across two samples may reflect the findings by Locke (1986) that students and employees respond similarly to experimental stimuli. More importantly, the identical pattern of findings across the two studies supports the generalizability of the effects of outcomes on ethical performance judgments.

General discussion

The purpose of this research was to examine some of the important processes underlying the assessment of the ethical nature of performance. Specifically, two studies examined whether outcomes achieved by ratees bias rater judgments of their ethical performance. The results from both studies indicate that outcomes bias ethical judgments. That is, successful ratees were judged to have exhibited more ethical behaviors than did unsuccessful ratees. This finding extends similar findings in the domain of performance appraisal (e.g. Cardy et al., 1991; DeNisi and Stevens, 1981) to judgments of ethical performance. It is important to note here that the dependent measure of judgment bias has to do with observation, not with evaluation of the quality of the ratee’s performance. The raters in these studies were more likely to report the occurrence of ethical behavior that did not actually occur when the ratee achieved successful outcomes than when the ratee’s performance was unsuccessful. While determining the exact locus of this effect requires additional research, it appears that rater perception and/or recall of ethical behavior is biased by the outcomes achieved by ratees. It is also interesting to note that these results are consistent with the results of the National Business Ethics Survey conducted by the Ethics Resource Center (2000). In this survey of 1,500 employees, nearly one-third of respondents said their co-workers disregard questionable ethics practices by showing respect for those who achieve success using them.

The present finding regarding the influence of outcomes on ethical performance judgments is provocative and, given that it reflects common cognitive processing characteristics, it has important implications for organizations. Success may serve to excuse unethical behaviors. This phenomenon is sometimes referred to in the sports domain where it has been noted that winning is everything and that nothing succeeds like success. For the business world, which generally places strong emphasis on results, this finding means that others’ perception of ethical behavior may be significantly colored by the outcome levels. If ethical judgments are colored by outcomes, then employees may quickly learn that achieving positive ends can excuse less than positive means.

Both studies found that raters’ ethical beliefs did not moderate the influence of outcomes on ethical judgment bias, contrary to the hypothesis (H2). This finding further underscores the importance of the biasing influence of outcomes on ethical judgment bias. That is, this outcome-influenced bias held true even for those raters who held relatively stronger ethical beliefs in their lives. In this study, we tested for one rater characteristic. Other rater characteristics, such as rater personality characteristics, rater’s cognitive moral development, and rater ability, could potentially moderate the relationship between outcome and ethical judgment bias.

Practical implications and directions for future research

The major finding that outcomes bias ethical judgments has important implications for practicing managers. Clearly, if managers take a relatively more lenient view of unethical behaviors for successful subordinates, it could perpetuate a “success breeds acceptance” culture. This effect would likely be more pronounced in organizations which place a stronger emphasis on outcomes. Managers, in particular, may naturally place an emphasis on outcomes since they do not always have the opportunity to observe the behavior of subordinates closely. One way to counter the influence of outcomes on ethical judgment may be to collect ethical ratings from multiple sources (e.g. peers and customers). It may be that non-managerial sources of appraisal may be more sensitive to the process of ratee performance rather than to its outcomes. Customers, in particular, may be much more focused on process rather than outcomes. Peers may also be much more focused on the work process engaged in by a team mate since it is directly observable. Whether the source of appraisal may moderate the influence of outcomes on the accuracy of ethical judgments must await further research.

Another area that might be investigated is the potential for rater training to reduce the influence of outcomes on judgments of the ethical nature of performance. It may be that a process of norm development, instruction, and evaluation practice, similar to frame-of-reference training (Bernardin and Buckley, 1981), could reduce the influence of outcomes. While it is worthwhile exploring the possibility of reducing the influence of outcomes on ethical judgments, it is important to note that there was direct and adequate information provided in the present studies for clear and independent judgments of ethical behavior. Nonetheless, the ethical judgments were biased by the outcomes achieved by the ratees.

Our research design considered performance at two levels (high and low performance). However, in real world situations, a vast percentage of employees tend to have average levels of performance. Thus, future research design needs to consider the biasing influence of outcomes of ratees whose performance ratings are at average levels.

Another practical implication is that biased ethical judgments may result in suboptimal payoffs for organizations in the long run. Performance judgments can be considered a form of investment in employees, and managers may want to reward those employees who achieve the outcomes they expect by ignoring/discounting ethics. In the short run, such a decision may not have any adverse impact for organizations. However, investment decisions based on subjective biases of decision makers tend to have a greater dissipating effect on the value of the outcomes in the long run (Chi and Fan, 1997). Thus, organizations, such as those exemplified by Enron, that condone unethical behavior of employees may appear highly successful in the short term but may become bankrupt in the long term. While this study examined outcome biases from a cognitive psychology and performance appraisal perspective, future researchers may also look into literature on decision making for further explanation of ethical biases. Decision making literature (e.g. Mowen and Stone, 1992) has looked at how outcomes influence the quality of the inputs and decision making process.

Results from study 1 indicated that raters make trait inferences from ethical behavior observations. As argued by Krzystofiak et al. (1988), this trait inference phenomenon may be more pronounced in a field setting where raters have an opportunity to construct implicit theories based on various observations, stereotypes and so on. One implication of the present finding is that using a behavior based scale to measure ethical performance may not reduce the influence of personality inferences on ethical judgments.

Limitations

Potential limitations of the present research need to be recognized. The studies were of a laboratory nature and the generalizability of the findings can be an issue. However, as with many laboratory studies, it is the model or theory tested, and not the representativeness of the study characteristics, that should provide meaningfulness and relevance to the findings (Cardy, 1991; Mook, 1983; Sackett and Larson, 1990). The purpose of the study was to test the possibility that outcomes bias ethical judgments. It is this conclusion, and not the characteristics of the study per se, that has important implications for appraisal. Moreover, the use of paper people and artificial stimuli does not necessarily preclude the generalizability of research findings (Berkowitz and Donnerstein, 1982; Cleveland, 1991). Finally, as mentioned previously, the same pattern of findings was obtained across the two studies employing different subject samples. This similarity in results is consistent with the findings that judgment biases generalize from naïve participants to real-world employees (e.g. Shanteau and Stewart, 1992). Nonetheless, replication and extension of present findings in field settings is called for, especially since the influence of political factors (e.g. Longnecker et al., 1987) on the appraisal of ethical behavior can only be studied in a field setting.

Conclusion

In this study, we have looked at some of the important issues involved in the ethical performance judgment process. Specifically, it was found that outcomes influence the judgment of ethical performance such that successful people were judged more favorably than were unsuccessful people. These results were not moderated by the strength of individual ethical beliefs. It was also found that raters drew trait inferences even when they were presented with only behavioral and outcome information. These trait inferences were found to have independent effects on the accuracy of ethical judgments. These findings have important applied implications for measuring ethical performance.

References