Research Method Research Setting Data Source
Figure 3.1 British National Corpus Website
Figure 3.1 showed the appearance of the British National Corpus site. The window contained a “look up” field. To find expressions collected in the British National Corpus, the researcher typed a keyword into this field. Each query returned 50 randomly selected expressions that contained the keyword. The British National Corpus provided a code for each expression, which enabled the researcher to trace the source of the data. The next figure shows the result of a keyword query.
Figure 3.2 The Result of BNC Cluster Sampling
In Figure 3.2, there were 50 expressions selected randomly by the computerized system. Even when the researcher typed the same keyword, the results differed every time the keyword was entered in the “look up” field. The Corpus of Contemporary American English applied a different system. Each computerized corpus had its own strengths and weaknesses. Although the British National Corpus provided a code that enabled the researcher to trace the source, the data were selected randomly, so the researcher could not know whether an expression had been selected before. Meanwhile, the Corpus of Contemporary American English provided 100 expressions for each keyword query and eliminated expressions that had been selected before. Its weakness was that it did not provide a code for each expression.
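The repeated-selection problem described above can be handled by keeping a record of the source codes already collected. The following is a minimal sketch only, not the procedure the researcher used: the pool of hits, the code format, and the function names `draw_sample` and `collect_unique` are all hypothetical, and the random draw merely simulates a corpus that returns 50 random expressions per query.

```python
import random


def draw_sample(pool, size, rng):
    """Simulate one keyword query: a random sample of `size` hits."""
    return rng.sample(pool, size)


def collect_unique(pool, draws, size, seed=0):
    """Repeat the query `draws` times and keep each source code only once."""
    rng = random.Random(seed)
    seen = set()      # source codes already recorded
    unique = []       # (code, expression) pairs kept for analysis
    repeats = 0       # hits skipped because their code was already seen
    for _ in range(draws):
        for code, text in draw_sample(pool, size, rng):
            if code in seen:
                repeats += 1
                continue
            seen.add(code)
            unique.append((code, text))
    return unique, repeats


# Hypothetical pool of 200 hits; the codes are placeholders, not real corpus codes.
pool = [(f"X{i:03d}", f"expression {i}") for i in range(200)]
unique_hits, repeat_count = collect_unique(pool, draws=3, size=50)
print(len(unique_hits), "unique expressions;", repeat_count, "repeats skipped")
```

Because every expression carries a code, checking the code against the set of recorded codes is enough to detect an expression that was drawn in an earlier query.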
Figure 3.3 Corpus of Contemporary American English Website
Figure 3.3 showed the appearance of the Corpus of Contemporary American English site. There was a “word” field where the keyword was typed. Several features were available to restrict the results of a keyword query. An example of the query results is presented in the next figure.
Figure 3.4 The Result of COCA Cluster Sampling
Figure 3.4 showed an example of a keyword query result. The Corpus of Contemporary American English divided the results into pages to keep the data organized. Each page consisted of 100 expressions. Because the data were kept in an organized way, recording the same expression twice could be prevented.