syllables we discover that syllable boundaries are to some extent arbitrary; there is a
difference in the phonological, or what Cruttenden calls the “innate”, length of vowels, e.g. the vowel in peat is generally longer than that in pit; and finally, the last syllable before a
pause is often lengthened. How aware is the listener of these influences, and how do they affect his/her subjective judgment as to whether a particular syllable is lengthened or not?
6.3.2 Loudness
Loudness, as perceived by the listener, “is related to the breath-force which a speaker uses. The acoustic correlate of loudness is intensity or the amount of energy which is present in
a sound or sequence of sounds” (Cruttenden 1986:3). Here, also, there are complicating factors: open vowels are acoustically of greater intensity than close vowels and listeners
“must in some way allow for this when interpreting relative loudness for other purposes”; the relationship of absolute intensity to perceived loudness “is by no means linear (a sound
has to be much more than doubled in absolute intensity before it will be heard as twice as loud) and moreover the relationship is different at different frequencies”; then there is the
matter of the communicative purpose itself: “I may shout because I am angry or I may make my accented syllables much louder than my unaccented syllables as an emphatic
device” (p. 3). Experiments have shown that accented syllables are more regularly indicated by length and pitch than by loudness, though the listener often interprets this as stress, or
loudness. Then there is the opposite of loudness where, in order to achieve “special prominence”, a speaker uses a combination of lowered pitch and whisper which is highly
effective, and perceived by the listener as significant (Charleston 1960:23).
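The non-linear relationship between intensity and perceived loudness can be illustrated with a little arithmetic. The sketch below is not taken from Cruttenden: it assumes the conventional decibel scale for intensity ratios and the psychoacoustic rule of thumb that a rise of about 10 dB is heard as roughly a doubling of loudness. It ignores the dependence on frequency mentioned above, and the function names are hypothetical.

```python
import math

def intensity_ratio_to_db(ratio):
    """Convert a ratio of two sound intensities to a difference in decibels."""
    return 10.0 * math.log10(ratio)

def approx_loudness_ratio(intensity_ratio, db_per_doubling=10.0):
    """Rough perceived-loudness ratio.

    Assumes loudness doubles every `db_per_doubling` dB; this is a rule of
    thumb used for illustration, not a measured value.
    """
    return 2.0 ** (intensity_ratio_to_db(intensity_ratio) / db_per_doubling)

# Doubling the intensity adds only about 3 dB ...
doubling_db = intensity_ratio_to_db(2.0)
# ... which, under the assumed rule, falls well short of "twice as loud".
doubling_loudness = approx_loudness_ratio(2.0)
# A tenfold increase in intensity (10 dB) is needed for a doubling of loudness.
tenfold_loudness = approx_loudness_ratio(10.0)
```

Under these assumptions, doubling the intensity raises the estimated loudness by less than a quarter, which is consistent with the statement that a sound has to be "much more than doubled" in intensity to be heard as twice as loud.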
6.3.3 Pitch
Pitch is “the prosodic feature most centrally involved in intonation”. As far as the physiology is concerned, “pitch is primarily dependent on the rate of vibration of the vocal
cords within the larynx”; this rate is produced by the length and tension of the cords, and reflected in the acoustic measurement of fundamental frequency. The average fundamental
frequency for children’s voices is approximately 265Hz. As we have said above, pitch is used as a perceptual term, relating to listeners’ judgments as to whether sounds are “high”
or “low”; however, such judgments are not linearly related to fundamental frequency. “For listeners to judge that one tone is twice as high as another, the frequency difference
between the two tones is much larger at higher absolute frequencies, e.g. 1000Hz is judged to be double 400Hz, but 4000Hz is judged to be double 1000Hz” (Cruttenden 1986:4).
Even so, for most practical purposes pitch can be equated with fundamental frequency.
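The non-linear relation between frequency and judged pitch can be illustrated with the mel scale, a widely used engineering approximation to perceived pitch. The sketch below is an assumption introduced for illustration only: the formula is the common 2595 log10(1 + f/700) variant, it is not necessarily the scale underlying Cruttenden's figures, and the function name is hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Approximate perceived pitch in mels for a frequency in Hz
    (common 2595 * log10(1 + f/700) variant; an illustrative assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Frequency pairs quoted from Cruttenden (1986:4).
ratio_low = hz_to_mel(1000.0) / hz_to_mel(400.0)    # frequency ratio is 2.5
ratio_high = hz_to_mel(4000.0) / hz_to_mel(1000.0)  # frequency ratio is 4.0
```

On this approximation both pairs come out at a mel ratio close to two, although the frequency ratios are 2.5 and 4, which agrees in outline with the judgments quoted above.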
6.3.4 The Transcription of the Data
A very simple system has been used for recording the syllables perceived to be prominent in the data: unstressed syllables are written in the usual way, those which seem more
prominent are written with capitals, and those which are perceived as being nuclear, or most prominent, are italicized (see also p. 117), e.g. the following from Shvinder’s Story A:
     Conj    Subj        Verb     Comp
3    And     his WIFE    said,
4    “Is THERE ANy um RI-ICE left?
Lengthened vowels are indicated by V-V [V = vowel] as in RI-ICE. The analysis, therefore,
is of necessity impressionistic. We have used our own judgment and knowledge of English syllable structure, and in nearly all cases our perceptions have been based on a combination
of pitch and length, rather than loudness. We have been able to check this, using the CECIL (Computerized Extraction of Components of Intonation in Language) programme
and speech box (CI-500 Analyzer) developed by the Summer Institute of Linguistics (version 1.0, January 1990; version 1.2A, December 1990).
However, as we have indicated above, the impressionistic analysis of a speech sample and the acoustic analysis of that sample are not directly comparable for two main reasons. In
the first place, not all acoustic information present is, in fact, relevant to speech and so we need to ask how much of the acoustic data can we actually perceive? And secondly,
“human beings are, from an acoustic point of view, very variable in their performance” (O’Connor 1973:94) and two seemingly identical productions of the word “saw” [sɔ],
for example, would show acoustic differences on a CECIL print-out. O’Connor points out that:
    This interplay between the acoustic record and what people hear is of extreme importance,
    not only in research but in general for the understanding of the communication process… (p. 95)
Although this is the case, the investigation of the relationship between the two is not part of this thesis. Our reason for looking at acoustic records here is simply to present some
corroborative evidence for saying, for example, that a certain syllable is prominent because
it has a rising or falling pitch pattern, and nothing more. As expected, there is some
disparity between the acoustic records and our perceptions: neither approach is truly accurate for the reasons outlined above. I am not a phonetician and have not been
intensively trained to listen to pitch patterns: therefore, my perceptions are suspect. But so are the acoustic records: the original recordings were not made in quiet surroundings, and
background sounds unintentionally recorded include traffic noise, the school steel band and a fire engine siren; the subjects themselves produced extraneous noises, such as
banging the table, scraping chairs and heavy breathing into the microphone; the recording equipment used was adequate but not of the highest quality, and so all these additional
factors have a bearing on the validity of the acoustic records and the computer print-out. See figure 6.1 for a print-out of Shvinder’s Story A 4. The first graph indicates “stress” or
the “energy” at particular points in the sound wave and the “average amplitude” is used to give some indication of the relative stress. The second graph indicates the frequency,
measured in Hz; this instrument is not designed to cope with frequencies above 500Hz (version 1.0, not above 400Hz), and herein lies the first serious drawback: Shvinder
produces frequencies of over 500Hz when she says, “Is there any ...?” and so the graph is distorted at that point, CECIL picking out only the lower frequencies inherent in the
syllables instead of the significant ones, which occur above the range catered for by the programme. “The problem with high pitched voices is that it reduces the accuracy with
which the frequency at any point can be calculated” (Hunt 1989:9).
The top line is normally reserved for the phonetic representations of the sounds; we chose
to write the data using normal English orthography. The playback feature of CECIL was used to determine where each syllable began and ended, and a broken vertical line indicates
the termination of a syllable before a pause. In the transcription of Shvinder’s A3 and 4 above, we indicated that, to our ears, the “i” [ai]
of rice was lengthened, but that the “um” was not, probably because it was unstressed: CECIL reveals that the “um” actually lasted 0.745 secs. but that the “i” was only 0.246
secs. in duration, while the “r” was 0.343 secs.; however, it is almost impossible to perceive the boundary between the “r” and the “i”.
Figure 6.1
When we compare the amplitude trace in graph 1 with the frequency trace of graph 2, we see that stress and frequency seem to vary in harmony with each other. The frequency
range is adjusted for each speaker by calculating the lowest and the highest significant frequencies recorded, and discarding the frequencies which lie outside this range.
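The effect of restricting the trace to a speaker's frequency range can be sketched as follows. This is a minimal illustration and not CECIL's own procedure: the function name, the sample values and the 150Hz floor are invented, and only the 500Hz ceiling comes from the text above.

```python
def clip_to_range(f0_hz, low_hz, high_hz):
    """Keep frequency values inside [low_hz, high_hz]; mark the rest as missing (None)."""
    return [
        f if f is not None and low_hz <= f <= high_hz else None
        for f in f0_hz
    ]

# Invented frequency track in Hz; None marks frames with no voicing detected.
track = [None, 310.0, 325.0, 540.0, 560.0, 330.0, 95.0, None]

# Hypothetical range for one speaker: floor of 150Hz, ceiling of 500Hz.
kept = clip_to_range(track, low_hz=150.0, high_hz=500.0)
```

In this toy track the two frames above 500Hz are discarded along with the stray low value, which mirrors the drawback noted earlier: when a speaker's significant frequencies lie above the ceiling, they are lost from the trace.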
Figure 6.2
Figure 6.2 shows how the trace is smoothed by CECIL to give the intonation contour; breaks in the contour usually indicate voiceless sounds. Other adjustments can also be
made to remove spurious traces from the graph: these are changing the voicing threshold and group size. Where there are unexpected gaps in the trace, these can often be filled in
by increasing the interpolation gap and the percentage change parameters. The details of the adjustments to these parameters are given in the print-out. However, nothing can be
done to improve a poor recording.
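The general idea of filling short gaps and smoothing a trace can be sketched as below. This is an illustration under stated assumptions, not a description of CECIL's algorithm: the meaning given to "interpolation gap" (the longest run of missing frames that is filled), the moving-average smoothing, the function names and the sample values are all hypothetical, and the voicing threshold, group size and percentage change parameters are not modelled.

```python
def fill_gaps(f0, max_gap):
    """Linearly interpolate runs of None no longer than max_gap frames,
    provided the run has a measured value on both sides."""
    out = list(f0)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            start = i
            while i < n and out[i] is None:
                i += 1
            gap = i - start
            if start > 0 and i < n and gap <= max_gap:
                left, right = out[start - 1], out[i]
                for k in range(gap):
                    out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
        else:
            i += 1
    return out

def smooth(f0, width=3):
    """Moving average over the measured frames in each window; None stays None."""
    half = width // 2
    result = []
    for i, value in enumerate(f0):
        if value is None:
            result.append(None)
            continue
        window = [v for v in f0[max(0, i - half): i + half + 1] if v is not None]
        result.append(sum(window) / len(window))
    return result

# Invented trace in Hz: one short gap (2 frames) and one long gap (4 frames).
track = [200.0, None, None, 230.0, None, None, None, None, 250.0]
filled = fill_gaps(track, max_gap=2)   # only the short gap is filled
contour = smooth(filled)               # the long gap remains a break
```

With a larger `max_gap` the long gap would also be bridged, which corresponds to the remark that unexpected gaps can often be filled by increasing the interpolation gap. A break that reflects a genuinely voiceless stretch is better left unfilled.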
6.3.5 Acoustic Data, Perception and the Attitudinal Function of Prosodic Features