
Abramson 1960:134ff., Rungpat Ruengpitya, p.c.¹³ The functional load of glottalization in signaling vowel length is even less clear. We may note, however, that vowel glottalization is a feature of all obstruent-coda syllables, regardless of whether the coda is oral or glottal, and regardless of whether the vowel is long or short. This fact suggests that the glottalization is a function of the obstruent coda, and not of vowel length per se. In the absence of further evidence, then, I will assume that vowel duration is the primary marker of the vowel length contrast.

For hearers to accurately categorize perceived vowels as long or short, they must at the very least perceive the start and end points of the vocalic articulation. Of course, phonetic stimuli are not identical, even when they encode the same category. Accurate perception of phonetic duration therefore does not guarantee accurate categorization: the listener must somehow resolve parametric durations into a categorial contrast.

Following Boersma 1998, I assume that the phonetic instantiations of a single contrastive category ([ a], for example) are normally distributed around some mean vowel duration, quality, and phonation.¹⁴ The listener is thus faced with an exercise in probability: given a perceived vowel of a certain duration and spectral character, is the vowel more likely to belong to the population of short vowels or the population of long vowels? To this task the listener brings his or her implicit knowledge of the phonetic cues associated with each contrastive vowel length and the typical range of variation of each cue. I will further assume that the listener also makes use of knowledge about the relative frequency with which each category occurs. Contrasting categories occur in speech with different frequencies, so some categories are more likely to be encountered than others. These differences follow from the categories' frequency in the lexicon and the rates at which the various lexical items are used.
Plainly, none of this is conscious knowledge, but it seems reasonable to think that with long experience, the listener has an intuitive grasp of each of these parameters.

¹³ Abramson 1960:134ff. found that subjects identified progressively shortened long vowel articulations as long down to, on average, 58 percent of the original long vowel’s duration. Given that the durations of the short vowels in Abramson’s experiment were, on average, 40 percent of the long vowel’s duration, Abramson concludes that vowel quality alone cannot maintain the vowel length distinction. Compare Rungpat 1999, who found that the crossover point for shortened long vowels in nasal-final syllables was only 20 msec longer than a corresponding short vowel.

¹⁴ See appendix B for statistical tests for normality of vowel duration for one data set.

3.2 Methods

3.2.1 Estimating probabilities of miscategorization

Boersma 1998 presents a method for estimating the probability that any given percept will be categorized correctly or incorrectly. Given a pair of overlapping probability functions such as those in figure 1, their intersection (marked by the dashed vertical lines in figure 1) is first estimated. The area under each curve that lies beyond the critical point may then be calculated; the portion it represents of the entire area under the curve is the probability of miscategorization.

For instance, there should be various critical points along the spectrum of F1 frequencies that listeners use to categorize perceived formants into vowel height categories. It is reasonable to think that these critical points are determined by the individual probabilities that any given F1 frequency belongs to each vowel category, or more exactly, to each of the normally distributed populations of F1 frequencies that instantiate each vowel height category. Other things being equal, the listener will classify any given phonetic stimulus as an instance of the vowel height category that is most likely to be instantiated by the F1 in question. In most cases this leads to a correct categorization, but instances along the tails of each population are likely to be misinterpreted as belonging to the next category over.

Figure 1 illustrates this scheme: each bell-shaped curve represents the distribution of the phonetic exponents of its vowel category. The heights of the curves relative to each other reflect the relative frequency of their categories. Concretely, the normal distribution for each category is multiplied by a constant which represents its frequency vis-à-vis other categories along the same phonetic dimension. The dashed vertical lines at the intersections of the probability curves indicate the critical values that listeners use in sorting F1 percepts, that is, the points at which there is a change in the most likely category; the shaded portions of the tails indicate the range of F1 values for which listeners are likely to miscategorize the F1 percept.

Figure 1. Normally distributed F1 populations instantiating three vowel height categories. (Vertical axis: frequency of occurrence; curve labels include i and e.)

In general, given a phonetic domain in which a particular category occurs more frequently than the others, all percepts within the domain will be assigned to that category. The probability that any particular percept will be assigned to that category is therefore the sum of the probabilities of a phonetic feature (here, vowel length) being produced within that domain, across all categories along the phonetic dimension.

(14) $P(\mathrm{perc}_f = x_i \mid C = y) = \sum_{i}^{n} \sum_{a}^{b} P(\mathrm{prod}_f = x_i \mid C = y)$

where $a$ and $b$ are points along a phonetic dimension $d$ such that for all points $d_i$ with $a \leq d_i \leq b$, $P(\mathrm{prod}_f = x_i) \geq P(\mathrm{prod}_f = x_j)$.
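The procedure described in this section can be sketched numerically for two categories. The following Python fragment is only an illustration under stated assumptions, not the implementation used for the data reported here: the means, standard deviations, and frequency weights are hypothetical values, and the function names `critical_point` and `miscategorization` are introduced purely for the example. It locates the critical point where two frequency-weighted normal curves intersect, then takes the share of each curve lying beyond that point as the probability of miscategorization.

```python
from statistics import NormalDist


def critical_point(lower, w_lower, upper, w_upper, tol=1e-9):
    """Find where two frequency-weighted normal curves intersect.

    `lower` and `upper` are NormalDist objects with lower.mean < upper.mean;
    `w_lower` and `w_upper` are the constants by which each curve is
    multiplied to reflect its category's relative frequency.
    Bisection is used between the two means, where the weighted lower curve
    falls and the weighted upper curve rises, so there is at most one crossing.
    """
    def diff(x):
        return w_lower * lower.pdf(x) - w_upper * upper.pdf(x)

    lo, hi = lower.mean, upper.mean
    if not (diff(lo) > 0 > diff(hi)):
        raise ValueError("weighted curves do not cross between the two means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def miscategorization(lower, upper, crit):
    """Share of each curve's area lying on the wrong side of `crit`.

    The frequency weight scales the tail and the whole curve equally,
    so it cancels out of each proportion.
    """
    p_lower_as_upper = 1 - lower.cdf(crit)  # upper tail of the lower category
    p_upper_as_lower = upper.cdf(crit)      # lower tail of the upper category
    return p_lower_as_upper, p_upper_as_lower


# Hypothetical duration populations in msec (illustrative values only).
short = NormalDist(mu=100, sigma=20)
long_ = NormalDist(mu=200, sigma=20)

crit = critical_point(short, 0.5, long_, 0.5)
p_short_as_long, p_long_as_short = miscategorization(short, long_, crit)
print(crit, p_short_as_long, p_long_as_short)
```

With equal weights and equal standard deviations the critical point falls midway between the two means. Raising the weight of one category moves the critical point away from that category's mean, so that a larger share of the rarer category's tokens falls on the wrong side.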

3.3 Predictions and data