# Field Experiments

## Discussion

### Collective Action

According to de Rooij, Green, and Gerber 2009, laboratory experiments on collective action indicate that individuals cooperate at far higher rates than theories would predict. However, questions about external validity make it difficult to generalize from these lab experiments. As such, collective action is a field ripe for field experiments, which have many of the merits of lab experiments with the added bonus of greater external validity. De Rooij et al. note that while field experiments are not a new idea, their use has become widespread only since the late 1990s. They note that one consistent finding of field experiments is that the content of political messages doesn't seem to matter much (see, for example, Gerber and Green 2000, Michelson 2003, and Panagopoulos 2009). On the other hand, field experiments do suggest that the way in which the message is conveyed has some significance (see, for example, Addonizio, Gerber, and Glaser 2007 or Nickerson 2008). Several of these experiments are described in more depth below.

#### Gerber and Green 1999: Does Canvassing Increase Voter Turnout? A Field Experiment

Shortly before the 1998 election, Gerber and Green obtained a list of all registered voters in New Haven. After the list was cleaned of all PO boxes and university addresses, a random sample was divided into a treatment group and a control group. The treatment group had 4,509 members, and the control group 23,921. On each Saturday and Sunday for the four weeks preceding the election, canvassers (pairs of graduate students paid \$20/hour) were sent out to canvass the treatment group. The canvassers were able to contact 1,605 members of the treatment group. After the election, voter records were obtained for all members of the sample to determine whether or not they had voted. Turnout rates were as follows:

• Control: 44.63%
• Treatment (all): 46.88%
• Treatment (contacted): 59.19%

It is tempting to assume that the difference between the turnout rate for those voters contacted by the canvassers and the turnout rate for the control constitutes the treatment effect. However, Gerber and Green argue that this would ignore the potential bias resulting from the possibility that those voters who are easier to contact are also more likely to vote. As such, they argue that the appropriate way to calculate the treatment effect is to use the following equation:

(1)
\begin{align} \cfrac{V_{E}-V_{C}}{\cfrac{N_{1}}{N_{E}}} \end{align}

where $V_{E}$ is the percentage turnout among the experimental group, $V_{C}$ is the percentage turnout among the control group, $N_{1}$ is the number of voters actually contacted, and $N_{E}$ is the number of voters in the experimental group. To put it another way, they suggest that in order to arrive at the treatment effect, we should divide the turnout differential between the treatment group and the control group by the contact rate within the treatment group. In the case of this experiment, that would yield the following result:

(2)
\begin{align} \cfrac{46.88-44.63}{\cfrac{1{,}605}{4{,}509}} \approx 6.33\% \end{align}

for a treatment effect of a 6.33 percentage-point increase in voter turnout. Gerber and Green also divided the sample by party registration, finding a 5.88% treatment effect among registered Democrats, a 4.40% treatment effect among registered Republicans, and an 8.83% treatment effect among unaffiliated voters. Finally, they divided the treatment group into two groups, one of which received a supplementary prompt asking the voter to commit to vote on election day. The treatment effect for those who did not receive the supplementary prompt was 4.61%, while for those who did receive the prompt it was 7.45%.
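As a quick check on the arithmetic, the calculation in equation (1) can be sketched in a few lines of Python. The function and variable names here are illustrative choices, not Gerber and Green's, and the label "intent-to-treat" for the raw treatment-minus-control gap is our shorthand. The inputs are the rounded figures quoted above.

```python
def effect_among_contacted(turnout_treatment, turnout_control, n_contacted, n_assigned):
    """Equation (1): turnout differential divided by the contact rate."""
    contact_rate = n_contacted / n_assigned
    return (turnout_treatment - turnout_control) / contact_rate


# Rounded figures quoted in the text (turnout in percent).
V_E, V_C = 46.88, 44.63
N_1, N_E = 1605, 4509

itt = V_E - V_C                                  # gap between everyone assigned and the control
effect = effect_among_contacted(V_E, V_C, N_1, N_E)
naive = 59.19 - V_C                              # contacted voters vs. control (the biased comparison)

print(f"contact rate: {N_1 / N_E:.4f}")
print(f"intent-to-treat difference: {itt:.2f} points")
print(f"effect among contacted (eq. 1): {effect:.2f} points")
print(f"naive contacted-vs-control gap: {naive:.2f} points")
```

With these rounded inputs the estimate comes out at roughly 6.32 points rather than 6.33; the small gap presumably reflects rounding in the turnout rates quoted here, though that is an assumption. The naive gap is more than twice as large, which illustrates why Gerber and Green do not compare contacted voters directly with the control group.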

### Field Experiments and Surveys

Surveys and field experiments have an extensive history together. For example, Brannon et al. 1973 use a survey in combination with a field experiment to demonstrate a relationship between responses to a survey regarding (racially) open housing and willingness to sign a petition advocating open housing. However, in this iteration, as in others, the survey is used not to measure the effects of a given treatment, but rather as one aspect of the variables to be measured.

In a similar vein, Aquilino 1994 uses the mode of the survey as the variable in his field experiment. Again, there is no clear treatment in this "experiment;" instead, it is a comparison of two methods of survey data collection with an eye toward the mode effects associated with each.

#### Wantchekon's Study of Benin (2003)

We can see a different model for the use of surveys in field experiments in Wantchekon 2003. In his astonishing field experiment on the effect of clientelist versus public policy based platforms on voting behavior in Benin, Wantchekon used a survey as the primary means by which the treatment effect was estimated. The structure of Wantchekon's experiment is quite complicated, but here are the basics:

Electoral politics in Benin are based upon electoral districts, of which there are 84. Of these 84 electoral districts, only 5 or 6 are really competitive; the remainder are "safe" districts for one party or another. For the purposes of this experiment, 8 districts were selected: 4 incumbent dominated and 4 opposition dominated. Each of these districts was then divided into 3 subgroups. Subgroup 1 consisted of one village, which would only be exposed to a clientelist message. Subgroup 2, also composed of only one village, would be exposed solely to a public policy message. Subgroup 3, consisting of the remainder of the villages in the district, would be exposed to both types of messages.

Part of the reason Wantchekon's experiment is so astonishing is that he was able to run it during an actual election with cooperation from the campaigns of the major candidates. That is, in very simple terms, he sat down with teams from each campaign and agreed upon what messages would be run in each village of his experimental districts for the purpose of carrying out the experiment. In order to do this, he first identified two national and two regional candidates. These candidates were identified by looking at the locations of their safe, or stronghold, districts. A district was identified as a stronghold for a particular party if that party won more than 70% of the votes in the last two presidential elections. A party was identified as "regional" if all of its strongholds were located within one of Benin's six provinces. By conducting this analysis, Wantchekon was able to identify two national candidates, two regional candidates, and the districts associated with each candidate as safe or competitive. From among these districts, Wantchekon selected two stronghold districts for each of the four candidates, as well as two competitive districts (one in the North of Benin, the other in the South). The villages within each stronghold district were divided as described above. In each of the competitive districts, two experimental villages were also selected, but in this case the experimental villages were used to see what would happen if one candidate ran on a platform of clientelism while the other ran on a public policy platform. The candidates took opposite roles in each of the two experimental villages in each district, and the remainder of the villages functioned as a control.
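The two classification rules just described (a stronghold is a district where a party won more than 70% in both of the last two presidential elections; a party is regional if all of its strongholds fall in one province) can be sketched as below. This is only an illustration: the district names, parties, provinces, and vote shares are invented, the data layout is hypothetical, and treating every non-regional party as "national" and every non-stronghold district as "competitive" are simplifying assumptions rather than details taken from Wantchekon's article.

```python
from collections import defaultdict

STRONGHOLD_THRESHOLD = 0.70  # "more than 70% of the votes", per the text


def find_strongholds(vote_shares):
    """Map each party to the districts it won above the threshold in both elections.

    vote_shares: {district: {party: (share_election_1, share_election_2)}}
    """
    result = defaultdict(set)
    for district, parties in vote_shares.items():
        for party, shares in parties.items():
            if all(share > STRONGHOLD_THRESHOLD for share in shares):
                result[party].add(district)
    return dict(result)


def classify_party(party_strongholds, province_of):
    """'regional' if every stronghold lies in a single province, else 'national' (assumed)."""
    provinces = {province_of[district] for district in party_strongholds}
    return "regional" if len(provinces) == 1 else "national"


# Invented data for illustration only.
vote_shares = {
    "D1": {"A": (0.75, 0.80), "B": (0.20, 0.15)},
    "D2": {"A": (0.72, 0.71), "B": (0.25, 0.24)},
    "D3": {"A": (0.10, 0.12), "B": (0.85, 0.78)},
    "D4": {"A": (0.55, 0.48), "B": (0.45, 0.52)},
}
province_of = {"D1": "North", "D2": "South", "D3": "South", "D4": "North"}

found = find_strongholds(vote_shares)
# Assumption: a district that is nobody's stronghold counts as competitive.
competitive = set(vote_shares) - set().union(*found.values())

for party, districts in sorted(found.items()):
    print(party, sorted(districts), classify_party(districts, province_of))
print("competitive:", sorted(competitive))
```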

These divisions resulted in the following aggregate sample sizes:

• Noncompetitive districts:
    • Clientelist villages: 6,633 registered voters
    • Public policy villages: 6,983 registered voters
    • Control group: 28,376 registered voters
• Competitive districts:
    • Experimental villages: 4,503 registered voters
    • Control group: 80,000 registered voters

Platforms were constructed for each position (clientelist vs. public policy) that addressed the same issues in different ways. Public policy messages focused on solving problems at the national level, while clientelist messages focused on providing funds and political patronage jobs to inhabitants of the particular villages to which they were directed. This process was, of course, complex, and a full description can be found in Wantchekon's article. For our purposes, though, it's important to note that, because voting records in Benin are not as precise as US voting records, the best way to evaluate the treatment effects of the field experiment was to conduct post-election surveys collecting demographic information from samples of the voting population, as well as information about how they voted. The results of these surveys were evaluated using the same basic method as described in Gerber and Green (above) and showed, in general, a positive treatment effect for clientelist messages and a negative treatment effect for public policy messages, though there was some regional variation in these effects. The article concludes with some guesses about the cause of these regional variations.

I think this shows a missed opportunity: if survey data had to be collected anyway, a well-designed survey instrument could have supported a much deeper analysis of the treatment effect, one that might have shed some light on the reasons behind these regional variations. However, it also shows some small precedent for using survey instruments as the data collection method for the evaluation of the treatment effect.

#### Dunning's Study of Ethnic Categories in Karnataka (2009)

Dunning 2009 uses the survey instrument in a more nuanced way. Dunning's study focuses on the Indian state of Karnataka. The study encompasses both a field experiment and a natural experiment, but for our purposes it is the field experiment that's most relevant. In this portion of the study, Dunning shows subjects video of actors posing as candidates for the local village councils (Gram Panchayat). The treatment (or, as Dunning terms it, "experimental manipulation") concerns what the subject is told about the surname of the candidate. Because surnames communicate information about caste, by manipulating the candidate's surname he is able to get access to differences resulting from caste classification. Thus, the experiment could produce three possible conditions, into each of which subjects were placed at random:

1. The candidate is perceived to be of both the same sub-caste and the same caste as the subject
2. The candidate is perceived to be of a different sub-caste, but the same caste as the subject
3. The candidate is perceived to be of both a different sub-caste and a different caste than the subject

Dunning did research on caste, sub-caste and surname, even conducting a test experiment on assignments, in order to ensure that there was a high likelihood that each subject would perceive the candidate to have the intended caste relationship to him/herself. The entire experiment had 1,453 subjects, with 458 in Group 1, 470 in Group 2, and 525 in Group 3.
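To make the random placement into the three conditions concrete, here is a minimal sketch of one way such an assignment could be done. It assumes simple, independent randomization for each subject; the text does not say which randomization procedure Dunning actually used, and the function name and seed are illustrative.

```python
import random
from collections import Counter

CONDITIONS = (
    "same sub-caste, same caste",
    "different sub-caste, same caste",
    "different sub-caste, different caste",
)


def assign_conditions(subject_ids, seed=0):
    """Independently assign each subject to one of the three conditions."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return {subject: rng.choice(CONDITIONS) for subject in subject_ids}


assignment = assign_conditions(range(1453))  # 1,453 subjects, as in the study
counts = Counter(assignment.values())
for condition in CONDITIONS:
    print(f"{condition}: {counts[condition]}")
```

Independent draws like these generally produce groups of unequal size, which is at least consistent with the uneven group sizes reported above.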

Dunning used two different speeches (one with a clientelist, the other with a policy-oriented, message). The speeches were assigned at random to each subject, but he found no discernible difference in their effects.

Each participant (the goal was 10 from each of 200 villages) was given a pre-screening questionnaire which included questions about caste and sub-caste identification, allowing the experimenters to assign the candidate an appropriate surname to put the participant in the appropriate group. Participants were then shown the videotaped speech, which was introduced by the experimenter using the surname assigned by the experiment. Each participant was then asked a series of post-treatment questions related to the speech, as well as to other aspects of local governance not directly related to the treatment. Dunning intentionally oversampled scheduled caste members in order to get a closer look at how scheduled caste status affected caste affinity.

Dunning also incorporated a second, natural experiment into his design. It's not terribly relevant to our project here, so I won't summarize it thoroughly, but it had to do with testing for the effect of reserving some seats for members of scheduled castes. Villages were selected using a regression discontinuity approach in order to test for the effects of reservation. This portion is detailed on pages 25-31 of Dunning 2009.

Dunning uses Gerber and Green's method outlined above to estimate the treatment effect. Respondents are asked to rate the extent to which the speech made them want to vote for the candidate, on a scale of 1-7. Each set of answers is then compared to each of the other two sets using the formula above. The experiment suggests a significant effect results from shared sub-caste, but shared caste without shared sub-caste doesn't seem to have much effect.
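If we assume that every subject actually saw the video for the condition they were assigned to, the contact rate in Gerber and Green's formula is 1 and the estimate reduces to a plain difference in mean ratings between two groups. The sketch below illustrates that reading with a conventional standard error added; the ratings are invented for illustration and are not Dunning's data, and the function name is our own.

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means(group_a, group_b):
    """Return the difference in mean ratings and its unpooled standard error."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff, se


# Hypothetical 1-7 ratings, invented for illustration only.
same_subcaste = [6, 5, 7, 4, 6, 5, 6, 7]
same_caste_only = [4, 5, 3, 4, 5, 4, 6, 3]
different_caste = [4, 3, 5, 4, 4, 5, 3, 4]

subcaste_diff, subcaste_se = difference_in_means(same_subcaste, different_caste)
caste_diff, caste_se = difference_in_means(same_caste_only, different_caste)

print(f"shared sub-caste vs. different caste: {subcaste_diff:.2f} (SE {subcaste_se:.2f})")
print(f"shared caste only vs. different caste: {caste_diff:.2f} (SE {caste_se:.2f})")
```

The invented numbers are chosen only to mirror the qualitative pattern described above, a sizable gap for shared sub-caste and a small one for shared caste alone.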

The survey instrument also contained questions about specific aspects of the candidate's character, including likeability, competence, intelligence and impressiveness. This allows one to look at the ways in which candidates are perceived differently on the basis of caste. Dunning finds, in this case, a particular salience in the expectation of distributive benefits if a member of one's own sub-caste is elected.

## Sources

page revision: 66, last edited: 15 Dec 2009 20:20