
means that some referents with a high level of importance in a particular passage may not show much in the way of persistence because they are introduced at the end of a passage. Cognitively, the continued occurrence of a referent will enhance the activation of that referent and suppress the activation of other referents.[5] This will increase the activation of the persistent referent in relation to all other participants. This allows topic persistence to be used as an estimate of future activation levels. For this reason, if different topic persistence levels can be associated with different referential forms, then the conclusion that language producers use specific forms to achieve future goals, as stated in the model of Goal Oriented Activation, is supported.

4.5.3 Ranking of participants

It is important to establish the ranking of participants in the stories independent of their referential form, because important referents are treated differently than episode/scenario dependent participants (Anderson, Garrod, and Sanford 1983). The ranking of participants can be done in a number of different ways. One of the first is the method of introduction. Participants introduced with proper names are in general more likely to be important than unnamed participants (1983, Fleming 1978). The problem with using the form of introduction is that the definition is related to the topic under investigation, that of referential form, and as such involves circularity.

Olo narratives identify the most important participant by telling who or what the story is about in the first line. An example of this is given in (148), the first line of the story Amerika.

(148) ki  nampli        il     p-iti  Amerika,
      I   bring.out.3p  words  3p-of  America

      Amerika  sungoi     lolpopo    p-epe,
      America  long.time  fight.CNT  3p-this

      'I am telling the story of America. A long time ago America fought here,'

This, when it occurs, allows the identification of the most important referent. It would be good to have other means to rank all the referents. It is reasonable to assume that characters with high overall frequency are important over the whole narrative, while characters with low overall frequency are not as important. It is more difficult to draw conclusions about mid-frequency referents. Since very important referents and unimportant referents are expected to be at opposite ends of the frequency scale, and should show some difference in referential forms if importance matters to referential form, I devised a way to separate them from each other and from referents that possibly are not at the same importance level. This was done by looking at the two ends of the frequency scale: those that occurred three or fewer times and those that occurred twenty or more times.
This differentiation, while not exact, is based on the assumption that as a general rule important participants occur more frequently than unimportant referents.
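
The frequency cut described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_participants`, the labels `low`, `mid`, and `high`, and the sample participants are invented for the example and do not come from the study. Only the two thresholds (three or fewer mentions, twenty or more mentions) are taken from the text.

```python
from collections import Counter

LOW_MAX = 3    # "three or fewer times" in the text
HIGH_MIN = 20  # "twenty or more times" in the text


def rank_participants(mentions):
    """Group participants by how often they are mentioned in one text.

    `mentions` is a sequence with one participant label per mention.
    The labels "low", "mid", and "high" are hypothetical names for the
    two ends of the frequency scale and the unclassified middle.
    """
    ranking = {}
    for participant, count in Counter(mentions).items():
        if count <= LOW_MAX:
            ranking[participant] = "low"
        elif count >= HIGH_MIN:
            ranking[participant] = "high"
        else:
            ranking[participant] = "mid"
    return ranking


# Invented mention list, for illustration only.
sample_mentions = ["America"] * 25 + ["soldier"] * 7 + ["dog"] * 2
print(rank_participants(sample_mentions))
```

Participants in the middle group are deliberately left unclassified here, which mirrors the caution in the text about drawing conclusions for mid-frequency referents.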

4.7 Summary

In this chapter I have described the methodology for quantifying the parameters in the text-based study of Olo narratives. Measurements will be made of referential distance (RD), topic persistence (TP), and the various boundary phenomena involving space, time, punctuation, and discourse markers. The interaction of these measurements will be examined in the next chapter as support for the central claim that language producers manipulate referential form to adjust the activation of different participants as part of their strategy for meeting the ongoing goals of the communication.

[5] The actual degree of enhancement and suppression is based on the actual forms used in each occurrence (Gernsbacher 1990).

5 Results and Analysis

5.1 Introduction

The results of this study confirm that the Goal Oriented Activation model is superior to the other models under consideration. This chapter gives the results of the different measurements made on the Olo texts: referential distance, topic persistence, and boundary phenomena. The chapter opens with the general characteristics of the referential forms in the text corpus and then goes on to examine in detail the different measurements. In each section the results of the measurements are compared with the claims of each of the competing models: Goal Oriented Activation, recency, episodes, memorial activation, and prominence. This comparison will allow us to choose one model, Goal Oriented Activation, over the competing models.

5.2 General Characteristics of the Database

The text corpus examined in this study consists of eight texts. In those texts there are a total of 1,603 clauses, which gives us a large database with which to work. In a departure from previous treatments, I have divided the clauses into two groups: participant introductions and post introduction references. There are 333 initial introductions and 1,270 post introduction references. The division allows the two groups to be treated separately. This is valuable, as each of the different models under investigation makes different predictions concerning new introductions. Since other investigators taking a quantified approach to reference have commonly not separated out new introductions from post introduction references, it is impossible to compare the ratios of new introductions to either number of clauses or post introduction mention figures with any sort of benchmark.

The forms of reference used are divided into different morphological categories. These are given in table 5.1 for the initial introductions and in table 5.2 for the post introduction references.

This study does not attempt to account for quotes, subordinate clauses, or coordinate NPs. Quotes have commonly been excluded from counting studies. Investigators have avoided including them because they are simply very difficult to code. The dilemma is whether to code the referential form in the same manner inside the quote as outside it. There is no good answer to the coding issue. The second problem with quotes is that they are embedded discourse with their own world of discourse or mental space (Fauconnier 1985), which may be related to the mental space in the current discourse but is not identical to it. A theoretical framework to handle the interaction of quoted material and the discourse in which it is quoted still needs to be developed. Part of that development is devising a theory of reference that will handle all the nonquoted material. This division of the tasks is why I have excluded quoted material from the analysis.

Subordinate clauses share the same problems that beset quotative material. It is unclear how to code the referential forms in the clause, and they can involve a different mental space. If some of them involve a different mental space and some do not, the decisions on which to include and which to exclude could prove to be unmanageable.

The reason for excluding coordinate noun phrases rests entirely on the coding issue. If one referent is encoded within a coordinate noun phrase by a noun and another referent is encoded within the noun phrase by an affix, the two forms are inherently different but could both be coded by calling the occurrence a coordinate noun phrase.

While the questions of proper coding are solvable, and the theoretical issues revolving around quotations and subordinate clauses need to be resolved, that resolution is not needed to determine the best model among those under consideration. While quotations, subordinate clauses, and coordinate noun phrases are not examined in this study, they are included in counts for referential distance and topic persistence.

Table 5.1. Initial introduction referential forms and the number of their occurrences

Form                          Occurrences
zero                          7
verb affix                    17
pronoun                       12
pronoun and verb affix        15
noun                          38
noun and verb affix           12
noun phrase                   46
noun phrase and verb affix    54
name                          8
name and verb affix           11
quote                         25
subordinate clause            24
coordinate NPs                26
Total                         333

Table 5.2. Post introduction referential forms and the number of their occurrences

Form                          Occurrences
zero                          152
verb affix                    648
pronoun                       65
pronoun and verb affix        123
noun                          36
noun and verb affix           34
noun phrase                   32
noun phrase and verb affix    35
name                          13
name and verb affix           23
quote                         42
subordinate clause            37
coordinate NPs                30
Total                         1270

5.2.1 Participant introductions

Certain facts are readily discernible from table 5.1. There are 333 referents in the texts; the majority of them, 280, are introduced by some form that is specific, such as a name or a noun phrase. The remaining fifty-three are initially specified with either a pronominal form, verb affix, or zero. Given that table 5.1 includes first-person referents, most of the models of reference management would predict that all the minimal forms must be first-person referents. This would allow them to claim that since first-person referents are always available in the speech situation, their model, be it recency, episodes, or memorial activation, does not have to account for them. However, thirty-five of the fifty-three are third-person referents, and all the models, to be adequate, have to account for them.

5.2.2 Post introduction references

The most common form of reference in Olo is by verbal affix. This behavior is not unexpected, and it is easily accounted for by the different models. They would all predict a preponderance of minimal forms. Given that discourses are continuous streams of coherent speech, the different models predict that once a referent was referred to by a minimal form, that form would continue until some outside force caused the use of a more fully specified form. For the recency model, this would be absence from the register. Episode models claim the episode boundary is the outside force. The memorial activation model claims that it is the drop in activation, caused by episode boundaries as well as other possible phenomena. Goal Oriented Activation agrees that it is expected that minimal forms would be the most common, since a referent is referred to by a minimal form only when the activation level of the referent is what the speaker wants to meet his ongoing goals. Anything that changes the balance of activation levels among participants or lowers overall activation would cause the speaker to adjust the activation levels by using a more specific form.

There were 648 occurrences where verbal affixes were the only morphological form used to refer to the participant. This class makes up 51 percent of the referential forms cited in table 5.2. When combined with zero forms, we find that minimal referential devices make up 63 percent of the referential forms after initial introduction. This proportion is close to that found by Payne (1993), where he reports devices of this type make up roughly 67 percent of the referential forms in Yagua. When the verbal affixes and zero are combined with pronominal forms, the figure in Olo climbs to 77.8 percent.

Since the majority of the stories in the database in this study are first-person narratives, it is possible that these figures could be unduly biased by first-person referents. However, this is not the case. While the numbers are lower, the combination of zero reference and verbal affixes still accounts for 61 percent of all third-person referential forms. The figures for just third-person post introduction referential forms are given in table 5.3.

Table 5.3. Third-person post introduction referential forms

Form                          Occurrences
zero                          107
verb affix                    414
pronoun                       37
pronoun and verb affix        53
noun                          36
noun and verb affix           34
noun phrase                   29
noun phrase and verb affix    34
name                          13
name and verb affix           23
quote                         34
subordinate clause            26
coordinate NPs                21
Total                         861

The percentage of nonfully specified forms rises to 71 percent when the pronouns and pronoun plus verbal affix are added. These “pronominal forms” are much more frequent than the more fully specified “nominal” forms. This is expected, and all the different models easily account for this behavior.
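
The percentages quoted in this section follow directly from the counts in tables 5.2 and 5.3, and the arithmetic can be retraced with a short script. The sketch below is only a checking aid: the dictionary names, the helper `share`, and the groupings `MINIMAL` and `PRONOMINAL` are assumptions introduced here, while the counts themselves are copied from the two tables.

```python
# Counts copied from table 5.2 (all persons).
TABLE_5_2 = {
    "zero": 152, "verb affix": 648, "pronoun": 65,
    "pronoun and verb affix": 123, "noun": 36, "noun and verb affix": 34,
    "noun phrase": 32, "noun phrase and verb affix": 35, "name": 13,
    "name and verb affix": 23, "quote": 42, "subordinate clause": 37,
    "coordinate NPs": 30,
}

# Counts copied from table 5.3 (third person only).
TABLE_5_3 = {
    "zero": 107, "verb affix": 414, "pronoun": 37,
    "pronoun and verb affix": 53, "noun": 36, "noun and verb affix": 34,
    "noun phrase": 29, "noun phrase and verb affix": 34, "name": 13,
    "name and verb affix": 23, "quote": 34, "subordinate clause": 26,
    "coordinate NPs": 21,
}

# Hypothetical groupings used only for this check.
MINIMAL = ("zero", "verb affix")
PRONOMINAL = MINIMAL + ("pronoun", "pronoun and verb affix")


def share(table, forms):
    """Percentage of all occurrences in `table` that use one of `forms`."""
    total = sum(table.values())
    return 100 * sum(table[form] for form in forms) / total


print(round(share(TABLE_5_2, ("verb affix",))))  # verb affix alone, all persons
print(round(share(TABLE_5_2, MINIMAL)))          # zero plus verb affix
print(round(share(TABLE_5_2, PRONOMINAL), 1))    # adding pronominal forms
print(round(share(TABLE_5_3, MINIMAL)))          # third person, zero plus affix
print(round(share(TABLE_5_3, PRONOMINAL)))       # third person, adding pronouns
```

Keeping the counts in one place makes it easy to recompute a share for any other grouping of forms without retyping the totals.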

5.3 Referential Distance

Referential distance can be used to estimate the activation levels of a referent. It is based on the notion that activation is lowered over time without a remention to “reactivate” it. The second component of referential distance is the suppression caused by competing participants. The mention of a competing referent will cause a degree of suppression of the other participants depending on the referential form used (Gernsbacher 1990). Each of the different models makes distinct predictions about the effects of referential distance on the choice of referential form.

The recency model makes the claim that there is an absolute correlation between referential distance and the form chosen. The actual distance figure assigned to each form is somewhat debatable. No matter what figures are finally chosen, minimal forms should occur at less distance than more fully specified forms. For a pure recency model to be valid there should also be no cases where different types of forms occur at the same distance. Recency claims that the only thing that affects the form is referential distance, so all forms should be distinguishable on the basis of referential distance alone. If the recency model is based on mental activation, then in its strict interpretation the only parameter that affects the activation is distance. While it is too much to expect that we would find a complete distinction of forms on the basis of referential distance, it is clear there should be little overlap among the different forms, or else something other than referential distance is interacting with the selection of referential forms. Following Givón’s (1994a) estimates, we would expect zero, affixal forms, and pronouns to be used only with a referential distance of 1, topical nominals to be used with distances of 2 or 3, and other nominal forms to be found in all cases of distance over 3.

The pure episode model says that the only thing that influences the choice of referential form is episode boundaries. The episode model would then predict that no recency effects should be seen in the data, but that the forms should be distributed in a semirandom fashion across the different referential distances. It would be expected that minimal devices would not occur at long distances, presumably across episode boundaries, while more specific devices could occur at close distance.

A pure prominence analysis also claims no effects for recency of mention. The prominence analysis expects the importance of the referent to be the sole influence in determining referential form. Prominence would predict a purely random arrangement of referential forms and referential distance.

The memorial activation model combines episode boundaries with recency and predicts recency effects within episodes. It claims nominal forms are used when the activation level of a particular referent drops to some low level. One place this has been found to occur is after episode boundaries (Tomlin and Pu 1991). The model further predicts that pronominal forms are used when a referent has a high level of activation, which normally occurs after a nominal form has been used to either introduce or reintroduce a participant. The memorial activation model would then predict that there should be some recency effects, that is, referential distance does influence the choice of referential form, but that there is a nonabsolute correlation. In particular, more fully specified nominal forms should be found at low referential distances, because in some instances an episode boundary would lower the activation level between two references even though they are only one clause apart. While memorial activation does not necessarily predict no minimal devices at long distances, if a passage of time truly causes the reduction in the level of activation (Cowan 1988), then a long distance means that a sizable decrease in activation should have occurred. Further, at long distance it is expected that other referents would have been mentioned, also causing suppression of activation. For these reasons the occurrence of many minimal referential devices after large distances is inconsistent with the expectations of memorial activation.

The Goal Oriented Activation model claims that the speaker adjusts the activation of the referents by using different forms based not only on past activation (this would include both recency and episode effects), but also to achieve the future and overall goals of the speaker. This model predicts that there should be some difference in activation of referents as measured by referential distance, but that there should be overlap among the forms in regard to referential distance. Goal Oriented Activation is more tolerant than the other models in dealing with minimal referential forms occurring at large distances because it does not equate low activation with the use of fully specified forms and high activation with the use of minimal forms. Goal Oriented Activation makes different predictions than the other models about forms used in introductions, but that is dealt with later in the chapter.

To summarize, the predictions of memorial activation and Goal Oriented Activation are similar. While both claim that there should be some correlation between referential distance and referential form, they predict that referential distance alone cannot predict the referential form. The pure recency model predicts a complete correlation between referential distance and referential form. The episode model predicts minimal to no correlation, while the prominence model claims no correlation at all.
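
The strict recency prediction can be stated mechanically, which makes clear what would count as a counterexample. The following sketch encodes only the distance bands attributed above to Givón's estimates (1 for zero, affixal forms, and pronouns; 2 or 3 for topical nominals; over 3 for other nominals). The function names, the class labels, and the sample observations are hypothetical and are not part of the study.

```python
def predicted_class(rd):
    """Form class a pure recency model would allow at referential distance `rd`."""
    if rd < 1:
        raise ValueError("referential distance starts at 1")
    if rd == 1:
        return "minimal"  # zero, affixal forms, and pronouns
    if rd <= 3:
        return "topical nominal"
    return "other nominal"


def count_violations(observations):
    """Count (form_class, rd) pairs that a pure recency model does not predict."""
    return sum(
        1 for form_class, rd in observations if predicted_class(rd) != form_class
    )


# Invented observations, for illustration only.
sample = [
    ("minimal", 1),          # predicted
    ("minimal", 9),          # minimal form at a long distance
    ("other nominal", 1),    # full nominal at distance 1
    ("topical nominal", 2),  # predicted
]
print(count_violations(sample))  # prints 2
```

Under the strict reading, any nonzero count beyond what coding error could explain tells against the model; the activation-based models, by contrast, expect some such cases.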
Referential distance measurements

We can begin to see how the data looks by examining the central tendencies of the different referential forms. The central tendency is the midpoint around which the data clusters. The closer the data for each group is to its central tendency, and the more discrete the central tendencies are, the surer we are that the phenomenon we are looking at influences the other. Looking at the central tendency allows us to view a summary of the data in a single number. This is very useful to get an overall picture of the data. The different referential forms and their central tendencies for referential distance are given in table 5.4. Because some of the models are making predictions on the basis of activation, and first-person referents are thought to be continually activated, the more conservative approach is to exclude first-person referents from these measurements. Therefore, the figures are given for third-person referents only.

Two central tendencies are given: the first is the median and the second is the mean. These two measurements are different estimates of the central tendency. The mean is a simple average of all values, while the median is the value that has an equal number of values smaller as well as larger. Means are normally used when the data is continuously distributed and the value of the measurements can be considered additive. A pure recency hypothesis would be more interested in the mean, since the difference between 1 and 2 clauses is considered to be the same as the difference between 2 and 3. From the work of Clark and Sengul (1979) we know that the difference between 1 and 2 clauses is not the same as the difference between 2 and 3 clauses. For this reason the median is more appropriate, because it is not dependent on the additive assumption. Secondly, the mean is very sensitive to outliers, that is, numbers much greater or much lower than the central tendency, which can skew the reported central tendency to be higher than is justified. While I am reporting both values, the median and the mean, the median should be considered the more reflective of the central tendency. The central tendency will provide us with an estimation of what would be true if no other factors are involved. It does not tell us if other factors are involved.

Table 5.4. Central tendencies of referential distance for different forms

Form                          Median    Mean
zero                          1         1.3
verb affix                    1         2
pronoun                       1         9.27
pronoun and verb affix        1         2.2
noun                          6.5       26.8
noun and verb affix           3         11.6
noun phrase                   8         20.2
noun phrase and verb affix    4         22.9
name                          4         21.9
name and verb affix           4         23.4

From the central tendencies, the referential forms can be divided into three classes. Class one consists of zero, verbal affixes, and pronouns. They have a median distance of 1. Class two has a median referential distance of 3 to 4 and consists of nouns with verb affixes, noun phrases with verb affixes, names, and names with verb affixes. Class three consists of nouns and noun phrases that have no coreferential verb affix. They have a median distance of 6.5 or above.

An examination of the central tendencies alone would lead us to conclude that the recency hypothesis is sustained, as well as Goal Oriented Activation and memorial activation, because all three predict a correlation of referential forms and referential distance, and we find such a correlation based on central tendencies. We would also conclude that prominence and episode models are ruled out because they would predict minimal or no correlation. The central tendencies line up much like the predictions. The measures of the central tendency only tell us what correlates if no other factors are involved.
They do not tell us if other factors might be involved. To find out if other factors might be involved we have to look at the variance across the different forms. We need to know how much overlap there is at a given referential distance for different referential forms. This is needed particularly for the recency model because it predicts “absolute” concordance between distance and form. It claims that nothing else influences the choice of referential form either directly or by the activation level dropping over time. Therefore a recency model can only be sustained if no[1] minimal referential forms occur at a referential distance beyond that at which the less minimal forms occur. The recency model predicts no overlap. The forms and their frequency of occurrence at different distances are given in table 5.5.

Table 5.5. Frequency of occurrence of different referential forms at specific referential distances

Form (referential distance: 1 2 3 4 5 6 7+; Max RD)
zero 96 5 1 1 2 2 10
verb affix 354 31 5 4 1 2 19 72
pronoun 23 6 3 1 4 158
pronoun and verb affix 41 2 3 1 1 4 26
noun 3 8 1 1 2 3 18 231
noun and verb affix 12 1 6 1 13 107
noun phrase 5 3 2 2 17 215
noun phrase and verb affix 4 2 5 2 11 160
name 3 1 2 1 1 5 171
name and verb affix 8 3 1 10 120

When the occurrences of the different referential forms are examined at the different referential distances it is clear that there is considerable overlap. While the central tendency of nouns with a coreferential verb affix as measured by the median is three, the greatest number of occurrences at a single referential distance for this form is 12,[2] and the referential distance is one, not three. In all cases of the fully specified forms, the referential distance with the most occurrences is always lower than the median. It is clear that the range of referential distances is quite large, and there is large-scale overlap.

Just examining the number of occurrences of each form at a referential distance of 1 shows far higher overlap than can be attributed to coding error. It is obvious from an examination of the overlap between forms in the different columns of table 5.5 that a pure recency hypothesis is untenable. For example, 36 percent of nouns with coreferential verb affixes and names with coreferential affixes occur at a referential distance of 1. All referential forms have large numbers of occurrences at referential distances of 1 and 2. Further, all forms have occurrences at distances of 7 and greater. Verbal affixes alone occur nineteen times at these distances. This clearly demonstrates that neither a mechanical recency approach nor one based on recency as the only factor affecting memorial activation is tenable, because both predict that there should be no overlap, or only minimal overlap, and that the forms should be predictable on the basis of referential distance alone. If there were only a few cases, they could be attributed to some type of anomaly; however, with the frequencies shown in table 5.5 this is not the case.

[1] This is the tight formulation. Obviously some allowance must be made for coding error and other residue that is not theoretical in nature.

[2] The reader needs to understand that the chart in table 5.5 is a collapsed version with all the values for referential distance greater than 6 put into a single column, 7+. In no case were there more than three occurrences at any single distance value.

The recency approach fails because it is too simple. The assumption that recency is the only factor affecting activation dooms this approach. Two other approaches, memorial activation and Goal Oriented Activation, claim that recency should show some influence on choice of referential form. The episode model and the prominence model predict that referential distance should show little or no influence.
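
The 36 percent figure cited from table 5.5 can be retraced from the row values for the two forms named in the text. In the sketch below, each row total is assumed to be the sum of the frequencies printed for that form in table 5.5 (excluding the Max RD value), and the first printed value is taken as the count at a referential distance of 1. The names `ROWS` and `share_at_distance_one` are introduced here for illustration only.

```python
# (occurrences at referential distance 1, sum of the row's printed frequencies)
# for the two forms named in the text, taken from table 5.5.
ROWS = {
    "noun and verb affix": (12, 12 + 1 + 6 + 1 + 13),
    "name and verb affix": (8, 8 + 3 + 1 + 10),
}


def share_at_distance_one(rows):
    """Percentage of the listed occurrences that fall at referential distance 1."""
    at_one = sum(count for count, _ in rows.values())
    total = sum(row_total for _, row_total in rows.values())
    return 100 * at_one / total


for form, (at_one, total) in ROWS.items():
    print(form, round(100 * at_one / total))
print("combined", round(share_at_distance_one(ROWS)))
```

A strict recency model would expect these shares to be near zero for fully specified forms, so a share of roughly one third is well beyond what coding error could plausibly account for.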
The cognitive basis for referential distance as an estimation of current memorial activation levels is based not only on distance but also on the possible occurrence of other referents in the text. Referential distance is, at best, an indicator of memorial activation, but it does not predict memorial activation. If it predicted activation the recency hypothesis would have been upheld. The variance in the data showed that there are other factors that influence the choice of referential form. In the memorial activation and Goal Oriented Activation models, these other influences are things that affect the activation levels. It is possible in a sense to filter out the other influences to see if the referential distance has a real impact on the choice of referential form. By using statistical tests we are able to compare central tendencies to see if they are significantly different. Two categories are considered to have different central tendencies if they show a “significant” difference. The question is not just whether there are numerical differences, but whether the differences are real.

There are many possible statistical tests that can be applied to a data set. Some are more appropriate than others. The main task in choosing the specific test is to select a test whose basic assumptions are not violated. A test that demands continuous data points should not be used with data divided into categories. Tests that assume a normal (i.e., bell-shaped) distribution should not be used with data that is not normally distributed. It is possible to violate some assumptions without irreparable harm, but it is better to avoid violations if possible. The test chosen to compare the referential distances for significance is the Mann-Whitney U test. This test is a nonparametric test and is designed to be used on data that is noncontinuous; it does not require that the data be normally distributed. Both of these are characteristics of the data in question.
The Mann-Whitney U test is used in much the same circumstances that a t test is used on continuous and normally distributed data. The Mann-Whitney U test involves pairwise comparisons. When a pairwise comparison is conducted, each form is compared to all other forms one at a time. All forms that are said to significantly differ on the parameter of referential distance have a p value of .01 or less. This means that the likelihood of wrongly claiming that a real difference exists when it does not is at most 1 percent. Table 5.6 gives the four-place p value. Those that are boxed by a heavy line are considered significant at the .01 level. The table can be read either horizontally or vertically.

A significant difference in the chart means that the central tendencies are truly different. A difference in central tendencies is predicted by the memorial activation model and the Goal Oriented Activation model. They both claim that referential distance is related to activation of referents, but that other things also are involved. We measure the likelihood that the central tendencies are different. If the central tendencies are significantly different we can then claim that, although there are other influences on activation, one of the influences is the distance since last mention. While the activation models are congruent with significant differences in central tendencies, the prominence model predicts no significant differences. The episode model would be well supported by a finding of no significant differences; however, it could tolerate some minor significant differences, that is, if either the p value is close to the threshold of nonsignificance (.01) or the significant differences appear randomly distributed. In either case, it would be possible to claim that the effects are merely a confounding of distance with episode boundaries.

Table 5.6. Probability of error values for referential distance distinctions

forms      zero   Vaff   pro    pro Vaff  noun   noun Vaff  NP     NP Vaff  name   name Vaff
zero       -      .0415  .0000  .0000     .0000  .0000      .0000  .0000    .0000  .0000
Vaff       .0415  -      .0000  .0001     .0000  .0000      .0000  .0000    .0000  .0000
pro        .0000  .0000  -      .1577     .0000  .0058      .0001  .0011    .0061  .0226
pro Vaff   .0000  .0001  .1577  -         .0000  .0000      .0000  .0000    .0002  .0006
noun       .0000  .0000  .0000  .0000     -      .0469      .3026  .0704    .3628  .1610
noun Vaff  .0000  .0000  .0058  .0000     .0469  -          .3136  .7972    .4896  .8022
NP         .0000  .0000  .0001  .0000     .3026  .3136      -      .3889    .9297  .5858
NP Vaff    .0000  .0000  .0011  .0000     .0704  .7972      .3889  -        .6051  .9354
name       .0000  .0000  .0061  .0002     .3628  .4896      .9297  .6051    -      .6746
name Vaff  .0000  .0000  .0226  .0006     .1610  .8022      .5858  .9354    .6746  -

From table 5.6 we can conclude that the activation as measured by referential distance shows no significant difference between zero and verbal affixes, but does show a significant difference between them and all other categories of referential form. In the same way pronouns and pronouns with verb affixes are not significantly different from each other, but both are distinct from all other categories. The nominals (nouns, noun phrases, and names, whether with verb affixes or without) are not significantly different from each other, but are significantly different from the set of pronouns and the combined set of verbal affixes and zeros. The parameter of referential distance allows a three-way distinction between zero/affixes, pronouns, and nominals.

From this information we can see that the activation level of the different classes, as measured by the central tendency of referential distance for each class, is distinct. The results of the Mann-Whitney U test allow us to claim that referential distance does impact activation, even though it is not the only thing that does. The results support the two activation models since they both predict that activation is affected by distance. Finding a correlation between referential distance and referential form means that a strict prominence account is ruled out, since it predicts no correlation. The episode model is considered unlikely since a highly significant correlation was found and the patterning of correlation is congruent with an activation account. That is, the significance is not randomly distributed among the different forms, but rather falls into three distinct classes: zero/verbal affixes, pronouns, and nominals.
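
The pairwise procedure described in this section can be sketched as follows. The study does not say what software was used, so everything in the code is an assumption: `mann_whitney_u` is a plain Python implementation of the U statistic with average ranks for ties and a normal approximation (with continuity correction) for the two-sided p value, which is only rough for small samples. The sample distances are invented for illustration and are not the Olo data.

```python
from itertools import combinations
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U, two-sided p) for two samples of referential distances."""
    pooled = sorted(list(x) + list(y))
    rank_of = {}   # average rank shared by each distinct value
    tie_term = 0   # sum of t**3 - t over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j

    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(rank_of[value] for value in x) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u, 1.0
    z = max(abs(u - mean_u) - 0.5, 0) / sqrt(var_u)
    return u, erfc(z / sqrt(2))


# Hypothetical referential distances per form, invented for illustration.
samples = {
    "verb affix": [1, 1, 1, 2, 1, 1, 3, 1],
    "pronoun": [1, 2, 1, 4, 1, 9, 1, 2],
    "noun phrase": [8, 1, 15, 6, 30, 2, 12, 9],
}

# Compare each form with every other form, one pair at a time.
for (form_a, xs), (form_b, ys) in combinations(samples.items(), 2):
    u, p = mann_whitney_u(xs, ys)
    verdict = "significant" if p <= 0.01 else "not significant"
    print(f"{form_a} vs {form_b}: U={u}, p={p:.4f} ({verdict} at .01)")
```

Because the test works on ranks rather than raw values, a single very long distance (such as the maxima above 100 in table 5.5) has no more weight than any other value that outranks the rest, which is the property that motivates preferring it to a test on means.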

5.4 Topic Persistence