
Getting Doctors to Do Their Best
The Roles of Ability and Motivation
in Health Care Quality
Kenneth L. Leonard
Melkiory C. Masatu, MD
Alexandre Vialou
abstract
Adherence to medical protocol (quality) is low in most developing countries.
We show that, although the differences in knowledge of protocol among
doctors in Arusha region of Tanzania are explained by years of training, the
differences in actual adherence to protocol and the gap between knowledge
and actual adherence are best understood by examining the types of organizations in which these doctors work. These results suggest that some
organizations are better at getting doctors to perform at capacity and that
understanding the link between organizational structure and protocol
adherence is important in any attempt to increase the quality of care.

Kenneth Leonard is a professor of economics at the University of Maryland College Park. Melkiory Masatu is a professor of economics at the Centre for Educational Development in Health, Arusha (CEDHA), Tanzania. Alexandre Vialou is a professor of economics at the University of Maryland College Park. The data used in the article can be obtained beginning January 2008 through December 2010 from Kenneth Leonard, 2200 Symons Hall, College Park, MD 20783, kleonard@arec.umd.edu. This work was funded by NSF Grant 00-95235 and The World Bank, and was completed with the assistance of R. Darabe, M. Kyande, S. Masanja, H. M. Mvungi and J. Msolla. The authors are solely responsible for the data contained herein. We extend our appreciation to the Commission for Science and Technology (COSTECH) for granting permission to perform this research. The paper has benefited from the comments of the audiences at the World Bank and NEUDC and the input of Sonia Lazlo, Ester Duflo and Erik Lichtenberg. The design of the vignettes benefited greatly from extensive discussion with Jishnu Das.
[Submitted October 2005; accepted January 2007]
ISSN 0022-166X E-ISSN 1548-8004 © 2007 by the Board of Regents of the University of Wisconsin System
THE JOURNAL OF HUMAN RESOURCES • XLII • 3

I. Introduction
Poor access to health care is considered one of the major impediments to balanced growth in poor countries (World Health Organization and Sachs 2001). The argument underlying this assertion is that if poor, underserved populations had access to quality health care they would be healthier (Gertler and Gruber 2002), which would allow them to spend more time in income-generating activities
(Abel-Smith and Leiserson 1978; Pitt and Rosenzweig 1986; Schultz and Tansel
1997; Townsend 1994). This, in turn, would lead to such populations pursuing a
wider range of profitable pursuits (Laxminarayan and Moeltner 2003; Philipson
1996), and would allow them to invest more resources in their children (Abel-Smith
and Leiserson 1978) at the same time that their children are better able to learn
(Bhargava et al. 2001; Miguel and Kremer 2004). There has been an unprecedented
response in attention and funding to this logic and to the health targets of the Millennium Development Goals (MDGs). The current focus on improving health care
has been a capacity-based approach and the majority of attention and resources is
focused on training doctors, building health facilities and improving access to existing and new medical equipment and drugs. However, these elements are only necessary, not sufficient to save lives. What is not clear is the extent to which
improvements in ability will lead to improvements in performance.
In this paper, we examine the gap between the ability of doctors (as measured by
knowledge of protocol) and their performance (as measured by actual adherence to
protocol) for a sample of 80 clinicians in Arusha region of Tanzania working in 40
facilities under seven different types of organizations. These two measures of process
quality are gaining increasing attention in the literature on health disparities in developing countries. Das and Hammer (2005) undertake a similar exercise using information on both ability and performance in Delhi, India. That paper examines the
differences across qualification and location as well as whether the doctor works in the private sector. The contribution of this paper is to examine the determinants
of the differences between ability and performance for both diagnostic and communication quality across a greater spectrum of organizations. Leonard and Masatu
(2005) examine the overall correlation between ability and performance using the
same data as used in this paper and show that, although ability and performance
are correlated, there are significant gaps between what doctors know how to do
and what they do. We examine two aspects of performance or practice quality: adherence to steps required to diagnose the patient’s illness properly (diagnostic quality) and adherence to steps required to communicate the diagnosis and treatment to
the patient properly (communication quality).
We find that the behavior of doctors differs by years of training (cadre) and the
type of organization in which doctors work. For ability, the most important differences between doctors are those of training, not type of organization. However, for both
diagnostic and communication quality, the type of organization is an important factor. Similarly, cadre (training) explains less of the difference between ability and
practice than the type of organization.
One possible source of differences in performance among doctors after controlling
for ability is that doctors have different motivations to exert effort and perform at a
level closer to their ability. This is particularly important because the motivation of
doctors can be affected by the organization for which they work. We use data on the
decentralization of decision-making authority within each organization type in order
to explain some of the variation in performance between organizations. We look at
the location of the authority (national/regional/local) to hire and fire individual doctors, to choose the staff composition, set salaries, and determine fees, and we use
these data to create an index of decentralization. We find that, after controlling for


ability, both diagnostic and communication quality increase with decentralization. In
addition, we examine the single-doctor private practice, which is different from other
decentralized facilities because there is no external authority or stakeholder. We find
that doctors in single-doctor private practices are better at communicating with their
patients and that after controlling for this type of practice, there are no differences in
communication quality by the degree of decentralization. However, doctors in single-doctor practices are not significantly better in diagnostic quality and decentralization
remains an important determinant of this type of practice quality.
Our findings show that some types of organizations can achieve results similar to
those of private practices for diagnostic quality, though not necessarily for communication quality. Many of these same organizations are nongovernmental charities
dedicated to delivering affordable care to the poor. The fact that such organizations
are on par with more expensive private practices is evidence that lives can be saved
through policies to improve management even at highly constrained levels of
abilities and resources. Although decentralization is only one of many possible differences between the organizations in this sample, our findings suggest that the structure of an organization can improve the motivation of doctors and therefore their
performance.


II. Data and Instruments
K. L. Leonard and M. C. Masatu collected data from Tanzania on
doctor ability and performance over a period of two years from October 2001 to
March 2003. Forty health facilities in the rural and urban areas of Arusha region
were visited at least two times each. We used vignettes to measure ability and direct
clinician observation (DCO) to measure performance. Each instrument measures
compliance with Tanzanian protocol, which, in turn, is designed to be sensitive to
the limited resources available in the facilities we survey. Every doctor we visited
was trained in protocol and had the resources at his or her disposal to follow it. Protocol requires history-taking (such as asking the patient the duration of the illness or
whether diarrhea is accompanied by vomiting), physical examination (such as taking
the patient’s temperature or auscultating the chest) and health education and communication (such as telling the patient her diagnosis and whether to return for a followup).1 The full sample includes 100 practitioners; we were able to evaluate
quality using both vignettes and direct clinician observation (DCO) for 80 clinicians.
Tanzanian doctors administered both the vignette and DCO instruments (see Leonard
and Masatu 2005 for further details of the study). Our research focuses on a subset of
outpatient conditions that are both prevalent and potentially serious: illnesses presenting with symptoms of fever, cough, or diarrhea.
The facilities studied in this project represent those available to rural residents of Monduli and Arumeru districts in Arusha region. In these rural areas, 78 percent of residents are within five kilometers (kms) of a functioning health facility and 93 percent are within 10 kms (Klemick et al. 2006).2 By most traditional measures of access, this area is well served.

1. All of the required items on both instruments are described in Tables 4, 5, and 6. These appendix tables are available on the JHR website, www.ssc.wisc.edu/jhr/, associated with the listing for this article.

The clinicians in our sample include nurses of various specializations, clinical
assistants (assistants), clinical officers (officers), assistant medical officers (AMOs),
and medical officers (MOs). Clinical assistants have an elementary school education
and three years of medical training. Clinical officers traditionally have O level education and two years of medical training. AMOs are clinical officers with two additional years of training. MOs have both an A level education and five years of
university-level medical training. Nurses are not supposed to diagnose but in the rural areas they are frequently the only health personnel present and they do diagnose
patients in these circumstances.
Most doctors in the sample, as in Tanzania, work in the public service in government-run health facilities. In addition, there are five other organizations delivering
care in the area and five single-doctor private practices that collectively represent
a seventh type of organization. Four of the organizations delivering health care in
the area are nongovernmental organizations (NGOs), three of which operate multiple
facilities in this area. The fourth is a local Islamic charity that operates one large facility. There is also one autonomous parastatal hospital included in our survey.
In this paper, we focus on the relationship between ability and practice for those
conditions that are both common and important. Significant morbidity and mortality,
particularly among children, can be explained by illnesses whose symptoms include
a fever, cough or diarrhea. We recognize that the patterns we find in this paper may
not hold for inpatient care or very serious illnesses. However, the gaps between ability and practice found in this paper represent severe shortfalls with deadly consequences.

A. Vignettes
Vignettes are purposefully designed case study patients presented to doctors and
paired with a measurement tool that examines adherence to protocol for that case.
Vignettes are valuable for comparing protocol adherence across providers because
they control for case mix.3 Our vignettes use two researchers: a "patient" and an examiner. After introductions, the examiner never speaks; he only observes. The "patient" presents herself as a patient would, entering the room from outside and leaving after the consultation. She describes her symptoms and answers questions as a patient would. It is explained to the doctor that he must conduct the physical examination by posing questions, which the "patient" then answers verbally. For instance, if the doctor says, "I would take the patient's temperature," the "patient" would say, "the temperature is 38.5." The examiner then fills in a protocol checklist including history-taking questions and physical examination items. Each doctor was examined for three case studies (malaria, pneumonia and diarrhea). Because the format of vignettes is different from the format of regular patients and more closely resembles the use of case studies for training, doctors view this exercise as a test and it is a useful measure of ability.

2. This figure includes only facilities with present clinicians and adequate drug supplies, and has therefore already taken into account problems such as absenteeism (see Banerjee et al. 2004a,b; Chaudhury and Hammer 2004).
3. There is an extensive and growing literature on the use of vignettes (Das and Hammer 2005; Leonard and Masatu 2005; McLeod et al. 1997; Murata et al. 1992, 1994; O'Flaherty et al. 2002; Peabody et al. 2002, 2003; Peabody, Luck, Glassman, Jain, Hansen, Spell, and Lee 2004; Peabody, Tozija, Munoz, Nordyke, and Luck 2004; Peabody et al. 1994, 2000).

B. Direct Clinician Observation (DCO)
With DCO, a doctor who is a member of the research team sits in on the examined
doctor’s consultations. For each consultation, the observer fills a protocol checklist
designed to match patients presenting with fever, cough, or diarrhea. For other conditions, there is a more general history-taking protocol, one physical examination
protocol item, and the full set of communication protocol items. Although most clinicians react to the arrival of the research team by improving their adherence to protocol, this Hawthorne effect is short-lived and doctors rapidly return to their regular
behavior (see Leonard and Masatu 2006, for discussion of the Hawthorne effect and
evidence that it is short-lived). The quick return to regular behavior allows us to use
DCO as a measure of practice quality.
C. Creating Aggregate Scores for the DCO and Vignette
Protocol requires that doctors implement about 10 to 15 unique procedures for each
consultation. We examined each doctor for three vignette case studies, each of which
required about 10 protocol items. Therefore, aggregating these three case studies,
there are 32 required protocol items, which each doctor implemented either correctly
or incorrectly. Thus, $a_j \in \{0, 1\}$ for $j \in \{1, 2, \ldots, 32\}$. Over three possible illness categories on the DCO instrument, doctors face 41 unique diagnostic protocol items and five unique communication protocol items: $d_j \in \{0, 1\}$ for $j \in \{1, 2, \ldots, 41\}$ and $c_j \in \{0, 1\}$ for $j \in \{1, 2, \ldots, 5\}$. The average doctor faced 152 diagnostic protocol items and 68 communication protocol items over 18 real consultations. From the ability, diagnostic quality, and communication quality adherence data $(a, d, c)$, we created three doctor-quality scores $(A_i, D_i, C_i)$ using the survey information about the rate of adherence as well as the pattern of adherence across different items.
To generate an aggregate score for each doctor for ability ($A_i$), we apply a methodology widely used in education research to construct and analyze tests (Birnbaum 1967; Bock and Leiberman 1970; Das and Hammer 2005). We assume that the probability that doctor $i$, with ability $A_i$, will correctly implement item $j$ ($a_{ij} = 1$) is a function of (i) $A_i$, (ii) an item-specific discrimination factor ($\alpha_j$), and (iii) an item-specific difficulty factor ($\beta_j$). Discrimination is the degree to which better doctors are more likely to implement an item properly and difficulty is the degree to which all doctors are unlikely to implement an item. We estimate $\alpha_j$, $\beta_j$, and $A_i$ in a maximum likelihood logit model where the probability that $a_{ij} = 1$ is modeled as follows:

$$\mathrm{prob}(a_{ij} = 1) = \frac{\exp(\hat{\alpha}_j \hat{A}_i - \hat{\beta}_j)}{1 + \exp(\hat{\alpha}_j \hat{A}_i - \hat{\beta}_j)}. \tag{1}$$

The probability of getting any given item correct decreases with its difficulty, and increases with its discrimination if the doctor has high ability. $A_i$ is fixed for each doctor across all items from the vignette and $\alpha_j$ and $\beta_j$ are fixed for each item on the vignette across all clinicians. $D_i$ and $C_i$ are estimated in a similar manner across all diagnostic quality and communication quality items in the DCO, respectively.4 These scores are referred to as latent scores because they are derived by assuming there are three latent characteristics for each doctor that lead to the adherence patterns we observe in the data. Each latent score is derived in independent estimations, the details of which are outlined in Appendix 1. $\hat{\alpha}_j$ and $\hat{\beta}_j$, average adherence, and the description of each item are shown in Tables
4, 5, and 6 in Appendix 1.
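As an illustration of Equation 1, the following minimal Python sketch evaluates the logit probability for two items. The function name `adherence_probability` and all parameter values are hypothetical, chosen only to show how difficulty and discrimination shape the probability; they are not estimates from the study.

```python
import math


def adherence_probability(ability, discrimination, difficulty):
    """Logit probability that a doctor implements an item, as in Equation 1."""
    z = discrimination * ability - difficulty
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical item parameters (illustration only, not values from the paper).
easy_item = {"discrimination": 0.2, "difficulty": -2.0}  # most doctors comply
hard_item = {"discrimination": 1.5, "difficulty": 2.5}   # separates doctors

for ability in (-1.0, 0.0, 1.0):
    p_easy = adherence_probability(ability, **easy_item)
    p_hard = adherence_probability(ability, **hard_item)
    print(f"ability={ability:+.1f}  easy item={p_easy:.2f}  hard item={p_hard:.2f}")
```

Under these assumed values, the easy item is implemented with high probability at every ability level, while the probability for the hard, highly discriminating item rises steeply with ability.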
The role of the difficulty and discrimination parameters is illustrated in the following examples. For the fever vignette, 88 percent of doctors checked the patient's temperature whereas only 6 percent asked if the patient had experienced any convulsions. For the diarrhea vignette 8 percent asked if the patient had experienced projectile vomiting. The latter two items are more difficult than the first and $\hat{\beta}_j$ (difficulty) for these items is significantly higher than for the average item and $\hat{\beta}_j$ for checking a patient's temperature is significantly lower. In addition, whether a doctor asks about convulsions is highly correlated with the overall rate of protocol adherence, whereas neither checking the patient's temperature nor asking about projectile vomiting is well correlated with overall adherence. The coefficient for $\hat{\alpha}_j$ (discrimination) is large and significant for the convulsion question, small and insignificant for taking the patient's temperature and negative (though insignificant) for the projectile vomiting question. Rather than questioning the value of protocol, this estimation procedure suggests that we can improve our knowledge of the latent characteristics of doctors by examining the pattern as well as the rate of adherence. Estimating ability, diagnostic quality, and communication quality using item-specific weights allows us
to incorporate more survey information into our analysis.
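The value of using the pattern of adherence can be sketched with a simplified version of the estimation, in which item parameters are held fixed and only one doctor's latent score is estimated. This is an assumption made for brevity: the procedure described in the text estimates item parameters and doctor scores together. All item parameters and response patterns below are hypothetical.

```python
import math


def prob(ability, alpha, beta):
    """Logit probability of implementing an item with parameters (alpha, beta)."""
    return 1.0 / (1.0 + math.exp(-(alpha * ability - beta)))


def estimate_ability(responses, alphas, betas, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximum likelihood latent score for one doctor, item parameters fixed.

    Solves sum_j alpha_j * (a_j - p_j(A)) = 0 by bisection. With positive
    alphas the left-hand side is decreasing in A, so the root is unique.
    """
    def score(ability):
        return sum(al * (a - prob(ability, al, be))
                   for a, al, be in zip(responses, alphas, betas))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # likelihood still rising: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Four hypothetical items, ordered from easy/uninformative to hard/informative.
alphas = [0.2, 0.3, 1.0, 1.5]    # discrimination
betas = [-2.0, -1.0, 0.5, 2.5]   # difficulty

doctor_x = [1, 1, 0, 0]  # implements only the two easy items
doctor_y = [1, 0, 0, 1]  # implements one easy item and the hardest item

ability_x = estimate_ability(doctor_x, alphas, betas)
ability_y = estimate_ability(doctor_y, alphas, betas)
print(f"doctor X: rate={sum(doctor_x) / 4:.0%}, latent score={ability_x:+.2f}")
print(f"doctor Y: rate={sum(doctor_y) / 4:.0%}, latent score={ability_y:+.2f}")
```

Both hypothetical doctors adhere to half of the items, yet the doctor who implements the hard, highly discriminating item receives the higher latent score. A raw percentage would not distinguish them.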

III. Analysis
Table 1 reports the distribution of ability, diagnostic quality, communication quality and the gap between diagnostic quality and ability and the gap between communication quality and ability by training and type of organization. The
gap is the residual from regressing both diagnostic quality and communication quality on ability and a positive residual corresponds to a smaller gap.5 Each column of
Table 1 contains the results of two separate regressions on cadre (years of training)
and organization. There are 11 different organizations delivering care, but five of
these are single-doctor private practices that we group into one type of organization.
Column 1 of Table 1 examines the patterns of ability by cadre and ownership separately. Analysis of variance with both categories suggests that cadre accounts for 61
percent of the explained variance and the organization accounts for 20 percent. The regression of ability on cadre shows that training leads to greater ability at all levels of training except for AMOs, who appear slightly less qualified than clinical officers (though the difference is not significant). Doctors who work for Organizations 4 and 5 (both NGOs) and in single-doctor private practice have higher ability than other doctors have.

4. Because there are only five communication quality items we cannot reliably estimate item-specific discrimination parameters ($\alpha_j$) and we hold this variable fixed across all five items when solving for $C_i$.
5. In item-to-item comparison of vignettes and DCO, Leonard and Masatu (2005) find that the average clinician adheres to about 50 percent of protocol items on the vignette and 30 percent in practice. Thus, in Columns 4 and 5 of Table 1, a zero residual corresponds, loosely, to a gap of between 10 and 20 percentage points of protocol adherence.

Table 1
Patterns of Ability, Diagnostic Quality and Communication Quality

                                                                                     Ability-Practice Gap in
                            Number of                  Diagnostic     Communication  Diagnostic     Communication
                            clinicians  Ability        Quality        Quality        Quality        Quality
                                        (1)            (2)            (3)            (4)            (5)
Cadre
Medical officer             3           1.495          0.871          1.536          0.514          0.944
                                        [0.750]*       [0.341]*       [0.632]*       [0.264]*       [0.320]*
Assistant medical officer   7           0.09           0.303          0.136          0.322          0.128
                                        [0.307]        [0.086]*       [0.291]        [0.138]*       [0.257]
Clinical officer            42          0.364          0.095          -0.029         0.042          -0.151
                                        [0.141]*       [0.135]        [0.208]        [0.129]        [0.217]
Clinical assistant          16          -0.482         -0.259         -0.088         -0.087         0.141
                                        [0.266]*       [0.251]        [0.261]        [0.262]        [0.184]
Nurse                       12          -1.119         -0.755         -0.48          -0.412         0.014
                                        [0.268]*       [0.267]*       [0.144]*       [0.304]        [0.191]
R-squared                               0.36           0.16           0.12           0.05           0.06
Owner
1 (Govt)                    52          -0.079         -0.157         -0.168         -0.093         -0.106
                                        [0.152]        [0.125]        [0.159]        [0.121]        [0.156]
2 (NGO)                     9           -0.739         -0.314         0.14           -0.072         0.476
                                        [0.605]        [0.362]        [0.197]        [0.370]        [0.265]*
3 (NGO)                     6           -0.175         0.316          -0.637         0.407          -0.536
                                        [0.479]        [0.197]        [0.214]*       [0.235]*       [0.069]*
4 (NGO)                     3           0.907          1.802          1.115          1.603          0.767
                                        [0.157]*       [1.381]        [0.461]*       [1.398]        [0.398]*
5 (NGO)                     3           1.636          1.023          1.208          0.629          0.557
                                        [0.590]*       [0.215]*       [0.866]        [0.059]*       [0.621]
6 (Parast.)                 2           0.208          0.261          0.422          0.249          0.384
                                        [0.576]        [0.463]        [0.459]        [0.618]        [0.220]*
7 (Priv)a                   5           0.734          0.392          0.983          0.239          0.707
                                        [0.148]*       [0.278]        [0.275]*       [0.280]        [0.228]*
R-squared                               0.17           0.11           0.18           0.06           0.11

There are 80 observations for each regression. Regressions are weighted by the number of observations with the DCO instrument. Robust standard errors in brackets. Each column represents two different regressions where the top controls for cadre and the bottom controls for organization. Ability is the latent ability score derived from the vignette. Diagnostic quality is the latent diagnostic-quality score derived from the DCO instrument. Communication quality is the latent communication-quality score derived from the DCO instrument. The ability-practice gap is the residual from a regression of quality on ability and a positive residual represents a smaller gap.
*indicates significant at 10 percent.
a. Single-doctor private practices.

The second column investigates the distribution of diagnostic quality by cadre and
organization. ANOVA with both cadre and organization suggests that cadre accounts
for 41 percent of the explainable variance and organization accounts for 31 percent.
Organizations play a more prominent role in practice quality than in ability. Column
2 of Table 1 shows that higher cadres exhibit better practice quality, but the differences among cadres are smaller than they are for ability. The differences among
organizations are large, but only one coefficient is significant, suggesting within-organization diagnostic quality is also variable.
The third column investigates the distribution of communication quality by cadre
and organization separately. Cadre explains 33 percent of the explainable variance
in communication quality and type of organization explains 56 percent. The regression of communication quality on cadre shows that MOs are better and nurses are
worse, but that the other cadres are not significantly different from each other.
One of the NGOs has worse communication quality than either the government or
the overall average, and private facilities and one NGO have higher communication
quality.
Column 4 examines the gap between diagnostic quality and ability, where the gap
is equal to the residual of a regression of quality on ability. Thirty-three percent of
variance in the gap is explained by differences in cadre and 50 percent of the variance is explained by differences in organizations. Cadre is very important in explaining ability and more important than organization in explaining diagnostic quality.
However, the type of organization is more important than cadre in explaining the
gap between practice and ability. Both medical officers and assistant medical officers
perform at levels that are closer to their ability. Almost all of the private and NGO
organizations are better at closing this gap, though only two NGOs are significantly
better. Note that doctors working for organization number three have lower ability
and diagnostic quality, but the gap is smaller than it is in other organizations. Doctors
working in single-doctor practices are not significantly better than other doctors are
at closing the gap between ability and practice.
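The gap measure used in Columns 4 and 5 can be sketched as the residual from a simple regression of practice quality on ability. The sketch below is unweighted and uses hypothetical scores for six doctors, whereas the regressions reported in Table 1 are weighted; the function name `ols_residuals` is likewise illustrative.

```python
from statistics import mean


def ols_residuals(y, x):
    """Residuals from a least-squares regression of y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical latent scores for six doctors (not taken from the study data).
ability = [-1.2, -0.5, 0.0, 0.4, 0.9, 1.5]
diagnostic_quality = [-0.9, -0.8, 0.3, -0.2, 0.6, 0.4]

gap = ols_residuals(diagnostic_quality, ability)
for a, g in zip(ability, gap):
    # A positive residual means practice is better than ability predicts,
    # which corresponds to a smaller ability-practice gap.
    label = "smaller gap" if g > 0 else "larger gap"
    print(f"ability={a:+.1f}  residual={g:+.2f}  ({label})")
```

Because the residuals are orthogonal to ability by construction, differences in the gap across cadres or organizations cannot simply reflect differences in ability.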
Column 5 examines the gap between communication quality and ability, where the
gap is equal to the residual of a regression of communication quality on ability. Thirty-eight percent of the variance is explained by cadre and 66 percent is explained by organization. Among the various cadres, only medical officers have a significantly smaller
gap. Nurses are better than clinical officers by this measure, though the coefficient is not
significant. Two of the NGO facilities are better by this measure and both types of
private facilities are better. One of the NGO organizations (Number 3) is worse.
The patterns in the communication quality gap are not the same as the patterns in the
diagnostic-quality gap. Both cadre and organization play important roles in explaining whether doctors practice at levels close to their ability for both diagnostic and communication quality. However, whereas medical officers are better by both measures,
some organizations are better at ensuring diagnostic quality and others at ensuring communication quality. The third NGO organization is significantly better at closing the gap in diagnostic quality and significantly worse at closing the gap in communication quality.
There is an emerging literature suggesting that NGOs provide services that are
superior to the public sector services (Mliga 2000; Reinikka and Svenson 2004). This
literature has tended to view NGO as a discrete label. The results above, however,
suggest that the differences among NGOs may be at least as important as the difference between NGOs and the public sector. In addition, it is not clear why NGOs are
superior to the public sector. Arrow (1963) and Newhouse (1970) suggest that the
motivations of not-for-profits in the health sector may have important implications
for the services they deliver. However, in Tanzania, as in most developing countries,
the public sector and the not-for-profit sector have very similar objectives. That they
share objectives is less important than understanding how they motivate doctors who
do not necessarily share these objectives. To advance our understanding of the differences among organizations delivering health care we examine data on the organizational structure, specifically decentralization.

IV. Decentralization of Authority and Stakeholders
Doctors are motivated to deliver health care by multiple sources and
regulation and supervision are seen traditionally as one important source of this
motivation. A third party we will refer to as the stakeholder implicitly supervises
most doctors in this sample. The stakeholder can use supervision and authority to
influence the quality of care delivered. In public facilities, the stakeholder is the government and in other organizations, it is the institution that owns the facility. Supervision is a local phenomenon: The chief of post supervises doctors in his facility, is
supervised by his immediate superior, who is supervised by his superior, etc. Importantly, the technology of supervision does not differ between organizations in
this region, though implementation does differ significantly. We model the differences in implementation of supervision as a function of the differences in the decentralization of authority. Supervision is more effective when the location of authority
(local versus distant) is closest to the location of supervision, that is, when the
person with the best information is also the person with the power to act on the information.
As a summary measure of the role of stakeholders in quality assurance, we use the
location of the authority to make key management decisions. Mliga (2000) describes
the location of decision-making authority in all of the organizations examined in this
study. The variables used include: a dummy variable indicating whether the chief of
post can hire and fire personnel; the level at which salaries are set (national/regional/
local); the degree to which the chief of post can (or must) use local funds to pay salaries and buy medicines; and the level at which choices about staffing are made
(national/regional/local). We use a single measure of decentralization of authority
derived from these four variables using factor analysis as put forth in Appendix 2.
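A minimal sketch of how four authority variables could be collapsed into one index follows. It uses the first principal component of the standardized variables as a simple stand-in for the factor analysis described in Appendix 2, which is an assumption rather than the authors' procedure. The coding of each variable (0 = national, 1 = regional, 2 = local) and the five organization profiles are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to mean zero and unit (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def first_component_scores(rows, iterations=500):
    """Score each row on the first principal component of the columns."""
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, k = len(rows), len(cols)
    # Correlation matrix of the standardized columns.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n
             for j in range(k)] for i in range(k)]
    # Power iteration converges to the dominant eigenvector (the loadings).
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    if sum(w) < 0:  # orient the index so that higher means more decentralized
        w = [-v for v in w]
    return [sum(w[j] * cols[j][i] for j in range(k)) for i in range(n)]


# Hypothetical profiles: (chief can hire and fire, level at which salaries
# are set, use of local funds, level at which staffing is chosen).
profiles = [
    (0, 0, 0, 0),  # fully centralized
    (0, 1, 1, 1),
    (1, 1, 1, 2),
    (1, 2, 2, 2),  # fully local
    (1, 2, 1, 1),
]

index = first_component_scores(profiles)
for profile, value in zip(profiles, index):
    print(profile, f"{value:+.2f}")
```

A single index of this kind trades detail for parsimony: it cannot say which dimension of authority matters, only whether authority as a whole sits closer to the facility.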
Single-doctor private practices do not have organizational stakeholders. In one
sense, this is an extreme form of decentralization. However, the presence of an
external stakeholder may have an important discrete effect on the relationship between doctors and their patients. Having two clients (patients and stakeholders) is

categorically different from having only one client (only patients).6 For this reason,
we also analyze the performance of single-doctor practices.
A. Determinants of practice quality
Table 2 regresses diagnostic quality and communication quality on ability, the degree
of decentralization, and whether the practice is a single-doctor private practice ("single-doc practice"). For this analysis, we introduce a second set of aggregate quality
scores, the raw percentage adherence without item-specific weights. In addition, as
a control, we show the regressions both with and without cadre. Since the cadre
of doctor is endogenous for the single-doctor practice, and there is likely to be some
tie between higher cadres and the location of authority in other organizations, it is not
clear that cadre should be included as a control. However, the basic interpretation of
the results for decentralization remains the same. The inclusion of dummy variables
for cadre (nurse is the omitted category) has a significant effect on the coefficients for
ability because ability is closely related to cadre.
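The specification behind Table 2 can be sketched as an ordinary least squares regression of practice quality on ability, the decentralization index, and a single-doctor dummy. The sketch below is unweighted, omits the cadre dummies, and uses hypothetical data; the outcome is constructed from assumed coefficients so that the fit can be checked, and none of the numbers correspond to estimates in the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, regressors):
    """OLS coefficients for y on a constant plus the given regressor columns."""
    rows = [[1.0] + list(values) for values in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data for eight doctors.
ability = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.2, -0.2, 0.6]
decentralization = [-0.8, -0.8, 0.1, 0.1, 0.9, 0.9, 1.5, 1.5]
single_doc = [0, 0, 0, 0, 0, 0, 1, 1]

# Assumed coefficients: constant, ability, decentralization, single-doc.
true_coef = [0.1, 0.2, 0.5, 0.3]
quality = [true_coef[0] + true_coef[1] * a + true_coef[2] * d + true_coef[3] * s
           for a, d, s in zip(ability, decentralization, single_doc)]

estimated = ols(quality, [ability, decentralization, single_doc])
names = ["constant", "ability", "decentralization", "single-doc practice"]
for name, value in zip(names, estimated):
    print(f"{name:>20}: {value:+.3f}")
```

Including the single-doctor dummy alongside the index separates the discrete effect of having no external stakeholder from the graded effect of decentralization among organizations that do have one.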
1. Diagnostic Quality
Latent diagnostic quality is partly determined by ability, whether measured by performance on the vignette (as in Columns 1 and 5) or by cadre (as in Columns 2 and
6). In Columns 2 and 6, the role of cadre in latent diagnostic quality is increasing
with years of training. Percent adherence to diagnostic quality, on the other hand,
is not a function of ability, whether the regressions include cadre variables (as in Columns 4 and 8) or not (as in Columns 3 and 7). Though the dummy variables for cadre are
significant in Columns 4 and 8, the relationship is not increasing with years of training, suggesting that these variables are picking up something other than ability.
For diagnostic quality, the role of incentives remains qualitatively the same
whether we use latent or percentage quality scores and whether one controls for
cadre. In every column (except Column 6, where the result is not quite significant),
diagnostic quality is higher in more decentralized facilities. The dummy variable representing single-doctor facilities is not significant and, if anything, its inclusion
increases the importance of decentralization.
2. Communication Quality
Ability as measured by the vignette, whether by latent score or percentage adherence,
is always a significant determinant of communication quality. In addition, none of the
cadres is better or worse than any other for communication quality. Increasing decentralization is consistently associated with higher communication quality, but this
6. Leonard (2002) suggests that one of the most important differences between traditional NGOs and private, single-doctor practices is that the NGO organization can play the role of a third party, allowing for
different and potentially welfare improving contracts to be implemented. In a typical production-in-teams
setting with only two actors, the surplus must be divided between actors, reducing the incentives of either
party to exert unverifiable effort. By introducing a third party who can break the balance in payments
between the two parties, contracts can be written that give both parties full incentives to exert effort
(Hölmstrom, 1982). Practically speaking, organizations can ensure that some part of a practitioner’s compensation is independent of the fees paid by patients.

Table 2
Determinants of Practice Quality

Diagnostic quality
                      (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
Score type          Latent    Latent    Percent   Percent   Latent    Latent    Percent   Percent
Ability              0.236     0.104     0.089    -0.005     0.239     0.107     0.089    -0.012
                    [0.102]*  [0.112]   [0.061]   [0.091]   [0.104]*  [0.118]   [0.061]   [0.095]
Decentralization     0.457     0.394     0.633     0.639     0.498     0.411     0.555     0.555
                    [0.207]*  [0.225]*  [0.183]*  [0.208]*  [0.238]*  [0.287]   [0.229]*  [0.300]*
Single-doc prac.                                            -0.122    -0.04      0.225     0.194
                                                            [0.341]   [0.401]   [0.207]   [0.286]
MO                             1.037               0.462               1.018               0.55
                              [0.511]*            [0.415]             [0.566]*            [0.486]
AMO                            0.935               0.673               0.936               0.665
                              [0.322]*            [0.244]*            [0.324]*            [0.247]*
Officer                        0.664               0.462               0.663               0.462
                              [0.334]*            [0.233]*            [0.338]*            [0.237]*
Assist                         0.463               0.335               0.465               0.321
                              [0.396]             [0.248]             [0.399]             [0.254]
R-squared            0.1       0.15      0.13      0.17      0.1       0.16      0.14      0.18

Communication quality
                      (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
Score type          Latent    Latent    Percent   Percent   Latent    Latent    Percent   Percent
Ability              0.382     0.408     0.484     0.561     0.368     0.364     0.483     0.533
                    [0.108]*  [0.175]*  [0.093]*  [0.115]*  [0.116]*  [0.183]*  [0.102]*  [0.117]*
Decentralization     0.493     0.434     0.456     0.485     0.311     0.109     0.202     0.115
                    [0.268]*  [0.303]   [0.274]   [0.308]   [0.326]   [0.352]   [0.309]   [0.347]
Single-doc prac.                                             0.544     0.764     0.733     0.861
                                                            [0.371]   [0.331]*  [0.380]*  [0.373]*
MO                             0.604              -0.287               0.978               0.105
                              [0.710]             [0.496]             [0.805]             [0.540]
AMO                            0.127              -0.388               0.104              -0.423
                              [0.360]             [0.348]             [0.363]             [0.354]
Officer                       -0.188              -0.405              -0.16               -0.405
                              [0.461]             [0.292]             [0.464]             [0.304]
Assist                         0.169              -0.083               0.121              -0.143
                              [0.250]             [0.260]             [0.256]             [0.270]
R-squared            0.22      0.26      0.31      0.33      0.23      0.28      0.33      0.36

Eighty observations for each regression; constant included but not reported. Each regression weighted by the number of observations in DCO. Robust standard errors in
brackets. * indicates significant at 10 percent. Diagnostic quality and communication quality as measured by latent quality scores (Latent) and percentage of items correct (Percent) are
regressed on ability and two possible determinants of motivation. ‘‘Decentralization’’ is an index of the location of authority varying from 0 (centralized authority) to
1 (local authority). ‘‘Single-doc prac.’’ indicates a private single-doctor practice in which there is no third party capable of supervision.

Table 3
Regression of Overall Incentive Scores on Individual Incentive Contributions

Variable                                           Coefficient   Standard Error
The ability to hire and fire personnel (yes/no)          1.245          (0.030)
The level at which salary decisions are made             0.175          (0.007)
The level at which financial decisions are made          0.119          (0.019)
The level at which staffing decisions are made           0.121          (0.010)
Constant                                                22.017          (0.017)

coefficient is only significant in the first column, where latent communication quality
is regressed on latent ability and decentralization without controls for cadre. If the
dummy variable for single-doctor practices is included, the size of the coefficient
for decentralization is markedly smaller and the variable indicating single-doctor
practices is significant for three of the four specifications in which it is included.
These results suggest that the distinction between single-doctor and multiple-doctor
organizations, more than the degree of decentralization, is what matters for communication quality.
3. Latent Aggregate Scores and Raw Percentage Scores
The difference in the results for latent and percentage scores in the relationship between diagnostic quality and ability highlights the importance of the methodology
we used to create latent scores. Weighting items on the vignette and DCO by their
empirical importance does not significantly change the results we have obtained
on the role of organizations and decentralization, but it does change the relationship
between performance and quality. The weighting process reduces the noise inherent
in creating any aggregate score and increases the efficiency of the estimates.
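The regressions in Table 2 are weighted by the number of DCO observations and report robust standard errors. A minimal sketch of that kind of estimator follows, under several assumptions that are not taken from the paper: the data are invented, the weights enter as analytic weights in the normal equations, and the robust variance is the uncorrected White (HC0) form. The function names `solve` and `wls_robust` are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def wls_robust(x, y, w):
    """Weighted least squares coefficients with HC0 robust standard errors (assumed variant)."""
    k = len(x[0])
    xtwx = [[sum(wi * row[r] * row[c] for row, wi in zip(x, w)) for c in range(k)]
            for r in range(k)]
    xtwy = [sum(wi * row[r] * yi for row, yi, wi in zip(x, y, w)) for r in range(k)]
    beta = solve(xtwx, xtwy)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(x, y)]
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_row = solve(xtwx, unit)  # row j of (X'WX)^-1, which is symmetric
        # Sandwich variance for coefficient j: sum_i (w_i * e_i * [(X'WX)^-1 x_i]_j)^2
        var = sum((wi * ei * sum(g * xj for g, xj in zip(inv_row, row))) ** 2
                  for row, wi, ei in zip(x, w, resid))
        se.append(math.sqrt(var))
    return beta, se

# Hypothetical data for eight clinicians (not the study data).
ability = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.2, -0.2, 0.6]
decentralization = [0.0, 0.0, 0.5, 1.0, 0.5, 1.0, 0.0, 1.0]
weights = [3, 5, 2, 8, 4, 6, 7, 1]  # number of DCO observations per clinician
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0, 0.0]
quality = [0.2 + 0.25 * a + 0.5 * d + e
           for a, d, e in zip(ability, decentralization, noise)]

design = [[1.0, a, d] for a, d in zip(ability, decentralization)]
beta, se = wls_robust(design, quality, weights)
for name, b, s in zip(["constant", "ability", "decentralization"], beta, se):
    print(f"{name}: {b:.3f} [{s:.3f}]")
```

Statistical packages usually apply a small-sample correction to the robust variance, so standard errors from this sketch would not match package output exactly.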
B. Graphical Analysis
Figure 1 presents a similar analysis to Table 2 in graphical form. The two graphs
show the relationship of latent diagnostic and communication quality to latent ability.
Although the decentralization score analyzed in Table 2 takes on multiple values between 0 and 1, for Figure 1 it is collapsed into only two categories: ‘‘distant authority’’ (decentralization = 0) and ‘‘local authority’’ (decentralization > 0). The graphs
show doctors in single-doctor private practices as a separate category.
The locations of doctors by ability are the same for each graph in Figure 1. Measured by ability, doctors in single-doctor practices are much better than doctors under local authority
who, in turn, are only slightly better than doctors under distant authority. For diagnostic quality, doctors under local authority are better than doctors under distant authority, but doctors in single-doctor practice are no better than other doctors under
local authority. For communication quality, doctors in single-doctor practices are significantly better than the other two categories of doctors, and doctors under local authority are slightly better than doctors under distant authority.


Figure 1
Relationship between Location of Authority, Single-Doctor Private Practices, Ability, and Practice Quality
Diagnostic protocol compliance is the latent score for compliance with items necessary to diagnose a patient properly. Communication protocol compliance is the latent score for compliance with items necessary to communicate the diagnosis and treatment to
the patient. Ability is the latent score for knowledge of protocol. Distant authority is all public sector facilities. Local authority is all
nonpublic sector facilities with multidoctor practices. Single-doc practice is single-doctor private practices. Fitted values are derived from regression weighted by the number of observations in DCO. Two outlying values are omitted from the first graph: one
single-doctor practice with ability of 0.14 and diagnostic quality of -8.4, and one nonpublic multidoctor practice with ability of 0.42 and
diagnostic quality of 11.04.

The regression lines for the points shown on these graphs (weighted by number of
observations on the DCO) show that local authority is better than distant authority
over the range of ability, and that ability appears to be a more important determinant
of practice quality (both diagnostic and communication) for doctors under local
authority than for doctors under distant authority. For diagnostic quality, the regression
line for single-doctor practices suggests a completely different relationship between
ability and practice. This regression line is based on only five points, and the coefficient is not significantly different from zero, but it does suggest that ability
may be more important for single-doctor practices. On the other hand, for communication quality, single-doctor practices are very similar to doctors who work under
local authority, but the higher levels of ability translate into higher levels of communication quality.
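The grouping used for Figure 1 and the weighted regression lines fitted within each group can be sketched as follows. This is an illustration under stated assumptions: the records are invented, and the helper names `authority_group` and `weighted_line` are hypothetical. Only the category rule (decentralization = 0 versus > 0, with single-doctor practices kept separate) and the weighting by DCO observations come from the text.

```python
def authority_group(decentralization, single_doctor_practice):
    """Collapse the decentralization index into the three Figure 1 categories."""
    if single_doctor_practice:
        return "single-doc practice"
    return "distant authority" if decentralization == 0 else "local authority"

def weighted_line(x, y, w):
    """Intercept and slope minimizing sum_i w_i * (y_i - a - b * x_i)^2."""
    total = sum(w)
    x_bar = sum(wi * xi for xi, wi in zip(x, w)) / total
    y_bar = sum(wi * yi for yi, wi in zip(y, w)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for xi, yi, wi in zip(x, y, w))
    sxx = sum(wi * (xi - x_bar) ** 2 for xi, wi in zip(x, w))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical records, not the study data:
# (ability, quality, decentralization, single-doctor flag, DCO observations)
records = [
    (-0.8, -0.5, 0.0, False, 10),
    (0.1, -0.2, 0.0, False, 14),
    (0.6, 0.0, 0.0, False, 9),
    (-0.5, -0.1, 0.4, False, 12),
    (0.2, 0.5, 0.7, False, 8),
    (0.9, 1.1, 1.0, False, 11),
    (0.7, 0.3, 1.0, True, 6),
    (1.4, 0.9, 1.0, True, 7),
]

groups = {}
for ability, quality, dec, single, n_obs in records:
    groups.setdefault(authority_group(dec, single), []).append((ability, quality, n_obs))

for name, rows in sorted(groups.items()):
    xs, ys, ws = zip(*rows)
    intercept, slope = weighted_line(xs, ys, ws)
    print(f"{name}: intercept={intercept:.3f}, slope={slope:.3f}")
```

Comparing the fitted slopes across groups corresponds to asking whether ability matters more for practice quality under one form of authority than another.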

V. Conclusion
We have used two different instruments to gather information about
the practices of 80 doctors working in seven different types of organizations. By
measuring both ability and practice quality we show there is a significant gap between what doctors know how to do and what they choose to do. This gap varies
among the types of organizations. Although years of training is the most important
determinant of ability, type of organization is more important than cadre in describing the gap between ability and practice. By examining the gap as a function of decentralization and the type of practice, we show that doctors who work for
organizations with decentralized authority practice systematically superior care (by
both diagnostic and communication quality) even after controlling for ability. In addition, facilities without a stakeholder (single-doctor private practices) may provide
superior communication quality when compared to doctors who work in other decentralized facilities. The absence of a stakeholder does not affect diagnostic quality,
however, after controlling for decentralization.
We find that the existence of stakeholders combined with decentralized authority
achieves the same result as the private market: higher quality. This is despite the fact that
NGOs typically employ doctors with lower qualifications, locate in less desirable areas,
and charge significantly lower fees. On the other hand, this system of authority and supervision does not improve communication quality to the same degree. In the case of
communication quality, doctors who survive on fees from patients are responsive to
patients in the element of their practice that is the most visible to patients, whereas doctors
who are at least partly compensated by stakeholders provide fewer of these inputs, perhaps because they are less visible to stakeholders than they are to patients.
These results are obtained using only one dimension of differences among a small
group of organizations, and therefore one should be careful not to infer
a causal relationship between decentralization and doctors working closer to their
ability. Organizations that perform well are likely to be very careful about the kinds
and qualifications of the doctors they hire and the staffing mix in their facilities
among other things. However, this work does highlight the fact that there are gaps
in performance that cannot be closed by increased training. Both medical training

and the structure of organizational management are important in delivering good
health care and the effect of the latter is strikingly large.

Appendix 1
Latent Ability and Practice Scores
For ability, diagnostic quality, and communication quality we use the following rule
to derive u:
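\[
\mathrm{prob}(x_{ij} = 1) = \frac{\exp(\hat{a}_j \hat{u}_i + \hat{b}_j)}{1 + \exp(\hat{a}_j \hat{u}_i + \hat{b}_j)} \tag{2}
\]

\[
l_{ij} = \log\bigl(P_j(\hat{u}_i)\bigr)\, x_{ij} + \log\bigl(1 - P_j(\hat{u}_i)\bigr)\,(1 - x_{ij}) \tag{3}
\]

l_ij is the contribution to the likelihood function of each item in the data.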
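As a minimal sketch of Equations 2 and 3, the Python code below evaluates the item-response probability and sums the likelihood contributions for one clinician, treating P_j(u_i) as the probability in Equation 2. The item parameters, the responses, and the grid search over u are hypothetical and chosen only for illustration. The sketch holds a and b fixed and omits the controls for patient characteristics and order of consultation described in this appendix.

```python
import math

def item_probability(u, a, b):
    """Equation 2: probability that item j is completed, given latent score u."""
    z = a * u + b
    return 1.0 / (1.0 + math.exp(-z))  # same value as exp(z) / (1 + exp(z))

def log_likelihood(u, responses, a_params, b_params):
    """Sum of the Equation 3 contributions over all items for one clinician."""
    total = 0.0
    for x, a, b in zip(responses, a_params, b_params):
        p = item_probability(u, a, b)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total

def estimate_latent_score(responses, a_params, b_params, lo=-6.0, hi=6.0, steps=1201):
    """Grid search for the u that maximizes the log-likelihood (illustrative only)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda u: log_likelihood(u, responses, a_params, b_params))

# Hypothetical item parameters (a_j, b_j) and 0/1 responses x_ij, not estimates from the paper.
a_hat = [1.2, 0.8, 1.5, 0.5]
b_hat = [0.0, -0.5, 0.3, 1.0]
clinician_a = [1, 0, 1, 1]  # completes most items
clinician_b = [0, 0, 0, 1]  # completes only the easiest item

for label, responses in [("A", clinician_a), ("B", clinician_b)]:
    u_hat = estimate_latent_score(responses, a_hat, b_hat)
    print(f"clinician {label}: latent score approx. {u_hat:.2f}")
```

Items with a larger a_j discriminate more sharply between clinicians, which is the sense in which the latent score weights items by their empirical importance rather than counting them equally.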
The latent score for vignettes is based on 31 possible inputs over the three
vignettes used. A few clinicians were observed more than once and we use their
responses on both sets of vignettes but solve for only one latent score. The latent
score for diagnostic quality is based on 37 possible items over three presenting conditions. Clinicians were observed many times each and we solve for only one score
per clinician. In addition to the rule above, we control for patient characteristics and
allow for a linear fall in practice quality over order of consultation that can vary between history-taking and physical examination items. The communication quality
score is based on five items and we do not estimate a or b for this score, using only
the average percentage for each clinician allowing for a linear slope with order of
consultation.
Estimates of a and b are shown in Tables 4, 5, and 6. These appendix tables are
available on the JHR website, www.ssc.wisc.edu/jhr/, associated with the listing for
this article.

Appendix 2
Factor Analysis of Incentives
To create an incentive score we used dummy variables representing all possible levels at which control is exercised over salary decisions, financial decisions and staffing
decisions and performed a factor analysis. Values for the level of authority are derived from Mliga (2000). There are six significant factors and the first factor has
an eigenvalue that is twice the size of the second and is straightforward to interpret
as the intensity of incentives. We create our incentive score from this first factor. Table 3 demonstrates the relationship between each element of incentives and our overall incentive score. Each element is positive and significant. The greatest weight is
put on the ability to hire and fire; the three other elements have approximately similar weights.
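The idea of building a single score from the first factor can be illustrated with a simplified stand-in. The sketch below extracts the first principal component of the correlation matrix by power iteration, which is not the same procedure as the factor analysis described above, and it uses invented numeric codes for four hypothetical indicators rather than the full set of dummy variables. All names and data are assumptions for illustration.

```python
import math

def standardize(columns):
    """Rescale each column to mean 0 and population standard deviation 1."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out.append([(v - mean) / sd for v in col])
    return out

def correlation_matrix(z_columns):
    """Correlation matrix of already standardized columns."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_columns]
            for ci in z_columns]

def first_component(matrix, iterations=500):
    """Dominant eigenvalue and eigenvector by power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * vec[c] for c in range(k)) for r in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[r] * matrix[r][c] * vec[c] for r in range(k) for c in range(k))
    return eigenvalue, vec

def incentive_scores(columns):
    """Score each facility on the first component of the standardized indicators."""
    z = standardize(columns)
    eigenvalue, loadings = first_component(correlation_matrix(z))
    n = len(columns[0])
    scores = [sum(loadings[j] * z[j][i] for j in range(len(z))) for i in range(n)]
    return eigenvalue, loadings, scores

# Hypothetical indicators for six facilities (higher = more local control).
hire_fire = [0, 0, 1, 1, 1, 0]
salary = [0, 1, 1, 2, 2, 0]
finance = [0, 0, 1, 2, 1, 1]
staffing = [0, 1, 2, 2, 2, 0]

eigenvalue, loadings, scores = incentive_scores([hire_fire, salary, finance, staffing])
print("first eigenvalue:", round(eigenvalue, 3))
print("loadings:", [round(v, 3) for v in loadings])
print("scores:", [round(v, 3) for v in scores])
```

Regressing such a score on the individual indicators, as in Table 3, is one way to check which elements carry the most weight in it.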


References
Abel-Smith, Brian, and Arcila Leiserson. 1978. Poverty, Development, and Health Policy.
Geneva: World Health Organization.
Arrow, Kenneth. 1963. ‘‘Uncertainty and the Welfare Economics of Medical Care.’’ American
Economic Review 53(5):941–73.
Banerjee, Abhijit, Angus Deaton, and Esther Duflo. 2004a. ‘‘Health Care Delivery in Rural
Rajasthan.’’ Economic and Political Weekly 39(9):944–50.
———. 2004b. ‘‘Wealth, Health and Health Services in Rural Rajasthan.’’ American
Economic Review 94(2):326–30.
Bennett, Sara, Barbara McPake, and Anne Mills, eds. 1997. Private Health Providers in
Developing Countries: Serving the Public Interest? London and New Jersey: Zed Books.
Bhargava, Alok, Dean Jamison, Lawrence Lau, and Christopher Murray. 2001. ‘‘Modeling the
Effects of Health on Economic Growth.’’ Journal of Health Economics 20:423–40.
Birnbaum, Allan. 1968. ‘‘Some Latent Trait Models and their Use in Inferring an Examinee’s
Ability.’’ In Statistical Theories of Mental Test Score, ed. Frederic M. Lord and Melvin R.
Novick, 397–479. London: Addison-Wesley.
Bock, Richard, and Maury Leiberman. 1970. ‘‘Fitting a Response Model for Dichotomously
Scored Items.’’ Psychometrika 33:179–97.
Chaudhury, Nazmul, and Jeffrey Hammer. 2004. ‘‘Ghost Doctors: Absenteeism in
Bangladeshi Health Facilities.’’ World Bank Economic Review 18(3):423–41.
Das, Jishnu, and Jeffrey Hammer. 2005. ‘‘Which Doctor? Combining Vignettes and
Item-Response to Measure Doctor Quality.’’ Journal of Development Economics
78:348–83.
———. 2005. ‘‘Money for Nothing: The Dire Straits of Medical Practice in Delhi, India.’’
World Bank Policy Research Working Paper No. 3669.
Gertler, Paul, and Jonathan Gruber. 2002. ‘‘Insuring Consumption Against Illness.’’ American
Economic Review 92(1):51–76.
Hölmstrom, Bengt. 1982. ‘‘Moral Hazard in Teams.’’ Bell Journal of Economics 13:324–40.
Klemick, Heather, Kenneth Leonard, and Melkiory Masatu. 2006. ‘‘Present Doctors,
Vanishing Care: Access to Quality Health Care in Rural Tanzania.’’ University of
Maryland. Unpublished.
Laxminarayan, Ramanan, and Klaus Moeltner. 2003. ‘‘Malaria, Adaptation and Crop
Choice.’’ Washington, D.C.: Resources for the Future, Presentation at NEUDC.
Leonard, Kenneth. 2002. ‘‘When States and Markets Fail: Asymmetric Information and the
Role of NGOs in African Health Care.’’ International Review of Law and Economics
22(1):61–80.
Leonard, Kenneth, and Melkiory Masatu. 2005. ‘‘Comparing Vignettes and Direct Clinician
Observation in a Developing Country Context.’’ Social Science and Medicine 61(9):
1944–51.
———. 2006. ‘‘Outpatient Process Quality Evaluation and the Hawthorne effect.’’ Social
Science and Medicine 63(9):2330–40.
McLeod, Peter et al. 1997. ‘‘Use of Standardized Patients to Assess Between-Physician
Variations in Resource Utilization.’’ Journal of the American Medical Association
278:1164–68.
Miguel, Edward, and Michael Kremer. 2004. ‘‘Worms: Identifying Impacts on Education and
Health in the Presence of Treatment Externalities.’’ Econometrica 72(1):159–217.
Mliga, Gilbert. 2000. ‘‘Decentralization and the Quality of Health Care.’’ In Africa’s
Changing Markets for Human and Animal Health Services, ed. David K. Leonard, Chapter
8. London: Macmillan. Also available at
http://repositories.cdlib.org/uciaspubs/editedvolumes/5/, chapter 8.

Murata, Paul et al. 1992. Prenatal Care: A Literature Review and Quality Assessment
Criteria. Santa Monica, Calif.: Rand Publications.
Murata, Paul et al. 1994. ‘‘Quality Measures for Prenatal Care.’’ Archives of Family Medicine
3(1):41–49.
Newhouse, Joseph. 1970. ‘‘Toward a Theory of Nonprofit Institutions.’’ American Economic
Review 60:64–74.
O’Flaherty, Martin et al. 2002. ‘‘Low Agreement for Assessing the Risk of Postoperative
Deep Venous Thrombosis When Deciding Prophylaxis Strategies: a Study Using Clinical
Vignettes.’’ BMC Health Services Research 2(16):1–3.
Peabody, John W., Jeff Luck et al. 2003. ‘‘Vignettes: An Innovative Technology for
Measuring Quality in Diverse Populations.’’ Journal of General Internal Medicine
18:292.
Peabody, John W., Jeff Luck et al. 2002. ‘‘Measuring What We Want to Measure: Using
Vignettes in Clinical Education.’’ Journal of General Internal Medicine 17:232–32.
Peabody, John W., Jeff Luck, Peter Glassman, Sharad Jain, Joyce Hansen, Maureen Spell, and
Martin Lee. 2004. ‘‘Measuring the Quality of Physician Practice by Using Clinical
Vignettes: A Prospective Validation Study.’’ Annals of Internal Medicine 141:771–80.
Peabody, John W., Fimka Tozija, Jorge A. Munoz, Robert J. Nordyke, and Jeff Luck. 2004.
‘‘Using Vignettes to Compare the Quality of Clinical Care Variation in Economically
Divergent Countries.’’ Health Services Research 39:1951–70.
Peabody, John W., O. Rahman, K. Fox K, and P. Gertler. 1994, ‘‘Quality of Care in Public a