The Program and Data

household in Cambodia, and is almost exactly equal to the direct cost of schooling, including fees, uniforms, supplies, and transportation, but excluding the opportunity cost of going to school (Ferreira, Filmer, and Schady 2009).

B. Data

We make use of two sources of data in this paper. The first data set includes the composite dropout-risk score, as well as the individual characteristics that make up the score, for all 26,537 scholarship applicants. The second data set is based on a household survey we fielded, which collected information on 3,020 applicants and their families.

The sample for the survey was constructed as follows. First, we purposefully selected five provinces in different parts of the country where the program was operating, and where there were a reasonably large number of program schools. In total, there were 57 program schools in these five provinces: Battambang (9), Kampong Thom (14), Kratie (4), Prey Veng (25), and Takeo (5). Second, within each school, the survey sample included all children who had been offered scholarships (30 or 50 children, depending on whether a school had been designated as “small” or “large”) and 20 children who had been turned down for scholarships, beginning with the “first” child turned down for a scholarship (the child whose score was just below the cutoff for eligibility) and up to the 20th child.9

Data were collected between February and April of 2010, almost five years after children filled out the application forms.10 For applicants enrolled in school who had not repeated grades, the household survey therefore refers to school attendance in the second half of grade 11. The median age of children at the time of the household survey was 19.

The household survey collected information on a large number of child outcomes. Children were asked about the highest grade of schooling they had completed to date, whether they were enrolled in school in each academic year between 2005 (when they completed the applications) and 2010 (at the time of the survey), and, if so, in what grade.
They were also asked whether they were currently working (separate questions for work for pay and work without pay), their earnings over the last pay period, and how many hours they worked in the last week.

We administered three tests during the household survey. The first is a math test, which included 20 multiple choice items. Areas covered included algebra, geometry, and several questions that required using mathematical tools to answer simulated real world situations, including reading a simple graph or interpreting a bar chart. The test included mathematical concepts that students would have been exposed to over the lower secondary school cycle.11 The second test was a vocabulary test based on picture recognition. This test asked respondents to identify the picture corresponding to a word which the enumerator read out loud. For each word the respondent was then asked to select from a choice of four pictures. While the initial words are relatively easy to identify (“shoulder,” “arrow,” “hut”), the test is structured such that items become increasingly difficult. Later words in the test we administered included, for example, “speed,” “selecting,” and “adjustable.” There were in total 72 words that each applicant was asked to identify. The third test is a test of puzzles and shapes loosely based on the Raven's Progressive Matrices.12 These tests are not linked to the curriculum taught in Cambodia’s lower secondary schools. We normalize the scores on all three tests by subtracting the mean and dividing by the standard deviation of nonrecipients. The Cronbach alpha values for all of our tests are reasonably high: 0.68 for the math test, 0.90 for the vocabulary test, and 0.65 for the test of puzzles and shapes.13

The household survey also collected data on adolescent mental health using an adaptation of the Center for Epidemiological Studies Depression scale (CESD), a widely used measure of depression (Radloff 1977).14 Subjective social status was assessed using the “MacArthur ladders.”15 Adolescents were shown a picture of a ladder with ten rungs, and were told that higher rungs correspond to higher socioeconomic status. They were asked to place themselves on the ladder in relation to everyone in their communities, and in relation to everyone in Cambodia. Finally, we asked respondents if they were married or had children.

9. Between October and December 2006, we fielded a “midline” survey that is the basis of our earlier work on the CSP scholarship program (Filmer and Schady 2011). The sample for this midline survey was constructed in the same way as described above for the “endline” survey we use in the current paper. During the midline, if the survey firm could not contact an applicant after several attempts, they were provided a list of “replacements.” These were selected from those with the next lower dropout-risk scores below those already on the list (for example, if either a scholarship recipient or one of the 20 “first” children denied scholarships could not be found, the 21st child denied a scholarship was given as a replacement). The endline survey included these “replacements” in the list of applicants that the firm was supposed to contact.

10. Some surveys were conducted as late as July 2010 because tracking these applicants took additional time. Eighty-eight percent of interviews were carried out before the end of April 2010; 98 percent before the end of June 2010.
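
To illustrate the test-score normalization and the reliability statistic described above, the following is a minimal Python sketch, not the code used for the paper. The function names and the toy inputs are hypothetical. The use of population (rather than sample) variance is an assumption, since the text does not specify which was used. The alpha formula is the standard one, k/(k-1) times one minus the ratio of summed item variances to the variance of total scores.

```python
from statistics import mean, pstdev, pvariance

def standardize(scores, is_recipient):
    """Subtract the nonrecipient mean and divide by the nonrecipient SD.

    Assumption: population SD (pstdev); the text does not say which is used.
    """
    reference = [s for s, r in zip(scores, is_recipient) if not r]
    mu = mean(reference)
    sd = pstdev(reference)
    return [(s - mu) / sd for s in scores]

def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-respondent item-score lists.

    Each inner list holds one respondent's scores on the k test items
    (for example, 0/1 for incorrect/correct).
    """
    k = len(items[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Under this normalization, nonrecipients have mean zero and unit standard deviation by construction, so a recipient's standardized score can be read as a distance from the nonrecipient mean in nonrecipient standard deviations.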

III. Identification Strategy