Language differences in parsing

2.4.1 Language differences in parsing

Until the mid-1980s, empirical studies of parsing focused upon English (see Mitchell & Brysbaert, 1998). The resolution of an ambiguity such as (18) was satisfactorily explained by the Garden Path Model (Frazier, 1979; 1987).

(18) The man saw the spy with the binoculars.

Here, binoculars may attach to spy (the spy had the binoculars) or saw (the man used the binoculars to aid his vision). Following the principle of Minimal Attachment, the adverbial interpretation (saw with the binoculars) is indicated. A similar ambiguity – a GA sentence with an ambiguous constituent that may attach to two potential host sites – is the NP-PP-RC structure in this example from Cuetos and Mitchell (1988):

(19) Someone shot the servant of the actress who was on the balcony.

Following Late Closure, this too should show low attachment and, indeed, does (Cuetos & Mitchell, 1988, Experiment 1B). However, this preference does not generalize to all languages. Mitchell and Brysbaert (1998, p. 316) go on to cite the occurrence of an NP1 bias in Spanish (four studies), French (four studies), German (three studies), Dutch (one study), Russian, Afrikaans and Thai (personal communications for these last three). Thus, the use of English as a test dummy for all other languages has a weak empirical basis; it is a minority language with respect to this ambiguity. In addition, there is evidence that Spanish-English bilinguals process such ambiguities with the preference of their dominant or earliest-learned language when reading in their second language (Fernandez, 1998). It is reasonable to conclude that syntactic preference is influenced by one’s linguistic environment. This is problematic for universalist, invariant models. We will now turn to the leading theoretical accounts of this variation.

2.4.1.2 Theoretical accounts of language differences in parsing

The Linguistic Tuning Hypothesis (Mitchell & Cuetos, 1991a; Cuetos, Mitchell & Corley, 1996)

On this account, the parser attempts to resolve an ambiguity using a statistical database. This contains distributional information about how similar structures have been resolved. A companion mechanism keeps track of these resolutions as they occur. Cross-linguistic differences emerge because the distributional properties of ambiguities will vary from one language to another (Brysbaert & Mitchell, 1996; Cuetos et al., 1996; Mitchell & Cuetos, 1991a; Mitchell, Cuetos, Corley & Brysbaert, 1995). Tuning explicitly assumes that the parser is serial.
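The exposure-based logic described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name TuningParser, the structure label "NP-PP-RC" and the exposure histories are all hypothetical and chosen only to show how a tally of past resolutions could drive a serial choice.

```python
from collections import Counter, defaultdict


class TuningParser:
    """Toy exposure-based resolver (hypothetical, for illustration only)."""

    def __init__(self):
        # structure label -> Counter of how past instances were resolved
        self.tallies = defaultdict(Counter)

    def record(self, structure, resolution):
        """Companion mechanism: log how one ambiguity was actually resolved."""
        self.tallies[structure][resolution] += 1

    def resolve(self, structure, default=None):
        """Serial choice: commit to the single most frequent past resolution."""
        counts = self.tallies[structure]
        if not counts:
            return default  # no exposure yet; what happens here is an assumption
        return counts.most_common(1)[0][0]


# Invented exposure histories (not corpus data).
spanish_like = TuningParser()
for resolution in ["NP1", "NP1", "NP2", "NP1"]:
    spanish_like.record("NP-PP-RC", resolution)

english_like = TuningParser()
for resolution in ["NP2", "NP2", "NP1", "NP2"]:
    english_like.record("NP-PP-RC", resolution)

print(spanish_like.resolve("NP-PP-RC"))
print(english_like.resolve("NP-PP-RC"))
```

In this sketch the two parsers share the same mechanism and differ only in their exposure, which is the sense in which Tuning attributes cross-linguistic variation to the distributional properties of each language.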

Evaluation

If ambiguity resolution does indeed mirror the real-world frequency of attachments in such structures, we should expect to find comprehension preferences mirrored in corpora (searchable databases of real-world language). Corpus studies in French and Spanish (see Mitchell & Brysbaert, 1998, p. 324), both NP1-favouring languages, show a prevalence of NP1 attachments. However, though corpus studies in English suggest that three-site RC attachment ambiguities are resolved as NP2 more frequently (even at different grain¹ levels) than NP1 (Gibson & Schütze, 1999; Gibson, Schütze & Salomon, 1996), this appears to be countered by on-line (Gibson et al., 1996) and off-line (Gibson & Schütze, 1999) studies that demonstrated clear reading preferences for NP1 over NP2. This finding is problematic for a tuning theory that does not posit an additional, non-statistical mechanism for ambiguity resolution. However, the three-site ambiguity was revisited by Desmet and Gibson (in press). They demonstrated that: (i) the disambiguating pronoun ‘one’, which was used in the empirical studies of Gibson & Schütze (1999), was associated with high attachment in a search of the Brown corpus (Kučera & Francis, 1976) and the Wall Street Journal (WSJ) corpus; (ii) this bias was confirmed in a self-paced on-line reading study where ‘one’ was compared with a non-anaphoric noun phrase (e.g. an article), and confirmed again in an eye-movement tracking experiment. Thus, the high-attachment readings can be traced to an experimental confound, and the Tuning Hypothesis survives.

A study by Mitchell and Brysbaert (1998) was conducted on Dutch corpora. It attempted to identify the actual frequency of NP1 and NP2 attachment preference for sentences that take the form NP1-PP-NP2-RC (see (5), p. 25). Previous on-line and off-line data indicated a reliable NP1 attachment preference (Brysbaert & Mitchell, 1996) but the Dutch corpora indicated the reverse: a preference for NP2 over NP1. Mitchell and Brysbaert concluded that the Tuning Hypothesis could not account for these data.

However, subsequent re-analysis suggests that this conclusion is premature. Desmet, Brysbaert and De Baecke (2002) re-examined these corpus frequencies at a lower level of grain. They found that NP2 dominance exists only when NP1 is non-human. When NP1 is human, sentences tend to be disambiguated with an NP1 preference. Returning to their materials, Desmet et al. noted that the majority had human NP1s. This fixed effect would skew the materials towards an NP1 preference. Indeed, when the authors used the same materials in a sentence completion task, it was found that participants completed the RC in line with the finer-grained corpus search. When NP1 was changed to non-human for other sentences, the preference shifted to NP2, though the effect was only marginally reliable.

¹ This is the hierarchical level of statistical association. For example, records of the co-occurrence of morphemes would be classed as a relatively low level of grain. The co-occurrence of syntax-level elements is higher. The possible combinations of grain vary across models. On the Tuning account, it is relatively coarse. For a multiple constraint-based approach (e.g. MacDonald, Pearlmutter & Seidenberg, 1994; Thornton, MacDonald & Gil, 1999) there may be no specification of grain; all levels are computed.

Thus the Tuning Hypothesis can accommodate cross-linguistic differences when grain size is varied. However, there are so many grain sizes that the theory runs the risk of being unfalsifiable. To avoid this, a standard level of grain must be agreed. We will see that this criticism is also true of multiple constraint satisfaction accounts (see p. 45).
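The dependence of the corpus verdict on grain size can be made concrete with a small sketch. The records below are invented for illustration and are not the counts reported by Desmet et al. (2002); the variable names and the helper preferred are likewise hypothetical. The sketch tallies the same tokens once at a coarse grain (all NP1-PP-NP2-RC tokens pooled) and once at a finer grain (split by whether NP1 is human).

```python
from collections import Counter

# Invented toy records (not real corpus counts): (is NP1 human?, observed attachment)
records = (
    [(True, "NP1")] * 6 + [(True, "NP2")] * 2
    + [(False, "NP1")] * 3 + [(False, "NP2")] * 9
)


def preferred(counts):
    """Return the attachment site with the higher count."""
    return max(counts, key=counts.get)


# Coarse grain: pool every token together, ignoring the animacy of NP1.
coarse = Counter(attachment for _, attachment in records)

# Finer grain: keep separate tallies for human and non-human NP1.
fine = {True: Counter(), False: Counter()}
for np1_is_human, attachment in records:
    fine[np1_is_human][attachment] += 1

print("coarse:", preferred(coarse))
print("human NP1:", preferred(fine[True]))
print("non-human NP1:", preferred(fine[False]))
```

With these made-up figures the pooled tally favours NP2 while the human-NP1 subset favours NP1, so the same data support opposite conclusions depending on the grain at which they are counted. This is why an agreed level of grain is needed before corpus counts can test the theory.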

Predicate Proximity/Recency Theory (Gibson, Pearlmutter, Canseco-Gonzalez & Hickok, 1996)

This theory proposes two competing parameters: Predicate Proximity and Recency. At the point of ambiguity, Recency biases attachment towards the current or most recently processed node (cf. Kimball’s Right Association, 1973), whilst Predicate Proximity biases attachment towards a host site that is as near as possible to the head of the predicate. The competition between these parameters leads to an assignment of processing load for each attachment site. The parser then chooses the site with the lowest load. Cross-linguistic variation is seen as variation in the strength of the Predicate Proximity parameter. This is set by the native language. In English, it is weak, whereas, in Spanish, it is strong. A stronger Recency component would result in low attachment of the RC, whereas a win for Predicate Proximity would favour the higher attachment site. On this account, the parser may operate in a weighted parallel or probabilistic serial fashion.
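The competition described above can be sketched for the two-site case. The linear cost form, the function names and the weight values are all assumptions made for illustration; they are not the formulation given by Gibson et al. (1996), and the sketch makes no claim about three-site ambiguities.

```python
def attachment_loads(sites, recency_weight, proximity_weight):
    """Assign each candidate site a load from two competing cost terms.

    `sites` is ordered from highest (nearest the predicate head) to lowest
    (most recently processed). The linear cost form is an assumption.
    """
    last = len(sites) - 1
    loads = {}
    for position, site in enumerate(sites):
        recency_cost = recency_weight * (last - position)  # distance from the newest site
        proximity_cost = proximity_weight * position       # distance from the predicate head
        loads[site] = recency_cost + proximity_cost
    return loads


def choose_site(sites, recency_weight, proximity_weight):
    """Pick the site with the lowest total load."""
    loads = attachment_loads(sites, recency_weight, proximity_weight)
    return min(loads, key=loads.get)


sites = ["NP1", "NP2"]
# Hypothetical parameter settings; only their relative size matters here.
weak_proximity = choose_site(sites, recency_weight=1.0, proximity_weight=0.5)
strong_proximity = choose_site(sites, recency_weight=1.0, proximity_weight=2.0)
print(weak_proximity, strong_proximity)
```

Holding Recency constant and varying only the Predicate Proximity weight is enough to flip the chosen site in this sketch, which mirrors the claim that cross-linguistic variation reduces to the strength of a single parameter.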

Evaluation

The Predicate Proximity factor arose from the finding that corpus frequencies do not directly mirror comprehension preferences (e.g. Gibson & Schütze, 1999). Since this finding has been recently thrown into doubt (Desmet and Gibson, in press; see p. 43), the status of the theory is now questionable. Additionally, Predicate Proximity/Recency retains the drawback that the mechanism of variation in the proximity parameter has not been quantified (see Mitchell & Brysbaert, 1998). It cannot, therefore, be applied a priori to cross-linguistic variation in two-site RC ambiguities.

Multiple Constraint Satisfaction (MCS) (MacDonald, Pearlmutter & Seidenberg, 1994; Thornton, MacDonald & Gil, 1995)

Though this account has not addressed cross-linguistic variation specifically, MacDonald et al. (1994) did examine the lexical nature of modifier attachment ambiguities. They suggested that the distributional association between (i) the nouns that occupy the two potential attachment sites and (ii) modifiers will determine attachment preference. However, on this multiple constraint-satisfaction account, it is

Evaluation

Evidence in support of the Tuning Hypothesis may be taken as support for the MCS account. Other studies provide evidence against a lexicalist constraint satisfaction account (e.g. Corley, 1995; Mitchell et al., 1995) but not an account that incorporates super-lexical information (see Pearlmutter & MacDonald, 1995). The Desmet et al. (2002) corpus study and Desmet and Gibson (in press) are compatible with the MCS account. Latterly, research into the interaction between working memory and parsing has led to the reconceptualization of the parser as a probabilistic serial mechanism (see MacDonald & Christiansen, 2002).

Construal Theory (Frazier & Clifton, 1996)

Construal Theory is a modified version of the Garden Path Model. In broad terms, the theory handles ambiguous material differentially for primary and nonprimary structures. Primary structures are linked to verb arguments and nonprimary structures to non-argument-based constructions (see Frazier & Clifton, 1996, p. 41 for a formal description). Whereas the first type is parsed in a deterministic fashion, interpretation of the second type may be influenced by discourse, semantic and syntactic factors. This second type includes RC attachment. In so doing, Construal accommodates cross-linguistic variation by a reclassification of those ambiguities that show variation. More specifically, the authors argue that all languages should show a preference for NP1, following a discourse principle termed Relativized Relevance: ‘preferentially construe a phrase as being relevant to the main assertion of the current sentence’ (Frazier, 1990, p. 321). The existence of an NP2 preference in English is explained with reference to genitive forms. Where most languages have one, English has two: the Saxon (actress’s servant) and Norman (servant of the actress) genitive. The reader assumes that the writer of the RC ambiguity is adhering to the Gricean maxim of clarity. That is, the sentence represents the clearest statement of its idea. If the RC ambiguity uses the Norman form (servant of the actress), the writer has purposefully avoided the Saxon form (actress’s servant, which indicates NP1 attachment). The reader will therefore

Evaluation

Fernandez’s (1998) investigation of bilingual reading provides evidence against Construal. Spanish-English bilinguals make attachment decisions consistent with their dominant language. Thus, an analysis of the ambiguity itself will not predict reading preference. This variability lies with the experience of the reader, not their language.

Further evidence undermines the Saxon genitive explanation for the English tendency towards NP2. Mitchell & Brysbaert (1998) argue that Dutch has two alternative possessive forms. Because the Gricean contract holds equally for speakers of Dutch as it does for English, Dutch should be an NP2-favouring language. There is good evidence, however, that Dutch readers favour NP1 (Mitchell & Brysbaert, 1998). To accommodate these findings, Construal might be modified to predict an NP2 shift only when the alternative structure is sufficiently common, not when the alternative is only possible (suggested by Clifton, personal communication to Mitchell, cited in Mitchell & Brysbaert, 1998, p. 329). On the other hand, it may make use of additional syntactic information about the NP-complex that precedes the ambiguous RC (that is, the plausibility of the Saxon alternative). The former modification renders the ambiguity resolution process identical to a Tuning account. The latter implies a negative correlation between the plausibility of the Saxon alternative and NP1 preference. No evidence for this correlation was found in either eye-tracking or self-paced reading experiments (Mitchell & Brysbaert, 1998; Frazier & Vonk, 1997).

The RelPro Drop or ‘anaphoric binding’ model (Hemforth & Konieczny, 1996)

This model suggests, like Predicate Proximity/Recency Theory, that two factors compete during RC attachment. And, like Construal, it places RC attachments outside the domain of purely syntactic processing. They are attended by anaphoric processors. And, because these are interpreted at a super-syntactic level, they are free to employ distributional information. Hemforth, Konieczny, and Scheepers (2000) point to the frequent omission of relative pronouns in English relative clauses (this is not the case in most other head-first languages) as a factor that reduces the saliency of the host sites. As a consequence, they are less likely to be examined by anaphoric processors. These

a parser that favours local attachment. Cross-linguistic variation in preference results from the presence or absence of relative pronouns. For the RelPro model, the parser is a serial probabilistic mechanism (at least in respect to RC attachment).

Evaluation

Gibson & Schütze’s (1999) three-site RC ambiguity provided evidence in support of the RelPro Drop model because the parser should prefer to associate the RC with the highest occurring NP – this NP is most likely to be the main assertion of the sentence. In combination with a locality principle, NP1 and NP3 will each be preferable over NP2. However, the recent findings of Desmet and Gibson (in press) cast doubt on this conclusion.

Fernandez’s (1998) bilingual reading study is problematic for this account. Her Spanish-English bilinguals read English RC ambiguities consistent with their own linguistic experience. They produced a ‘Spanish’ NP1 preference in the absence of relative pronouns. According to RelPro Drop, this absence should reduce the saliency of NP1 and therefore reduce the likelihood of an NP1 attachment.

The RelPro Drop model predicts that languages with explicit relative pronouns in RC structures should pattern with Spanish and attach high. The Dutch evidence of Mitchell and Brysbaert (1998) is certainly consistent with this account because relative pronouns must, normally, be present in such constructions.

Summary

In this section, we have tested theories of parsing with cross-linguistic variation. Though the evidence is not conclusive, one may summarize: the Linguistic Tuning Account is consistent with the evidence presented so far; multiple constraint satisfaction (MCS) accounts are somewhat under-specified – as an exposure account, they may share the problems of the Tuning Hypothesis, but (i) their probabilistic processing need not mirror corpus frequencies and (ii) they do accommodate cross-linguistic variation naturally; the Predicate Proximity/Recency account was based upon a study of three-site ambiguities that had an experimental confound; the RelPro Drop model successfully predicts NP1 attachment in Dutch, and this is indeed the case; and while Construal can accommodate Spanish-English cross-linguistic differences, its explanation does not appear to work in Dutch, and the theory cannot account for bilingual reading patterns.

With regard to the classes of parsing models discussed in the previous section, the evidence generally favours probabilistic parsers, though it says little to distinguish parallel from serial.

We will now look at a second test of parsing theories: differences between individuals who speak the same language.
