A PREDICTIVE VALIDITY ANALYSIS ON "SELECTION TEST" OF FOREIGN LANGUAGE DEVELOPMENT INSTITUTE OF NURUL JADID, PAITON, PROBOLINGGO.
THESIS
Submitted in partial fulfillment of the requirement for the degree of Sarjana
Pendidikan (S.Pd) in Teaching English
By:
Althafurrahman
NIM: D95212080
ENGLISH TEACHER EDUCATION DEPARTMENT
FACULTY OF TARBIYAH AND TEACHERS TRAINING
STATE UNIVERSITY OF ISLAMIC STUDIES SUNAN AMPEL SURABAYA
Faculty of Tarbiyah and Teacher Training, Sunan Ampel State Islamic University, Surabaya. Advisor: Mohammad Salik
Keywords: predictive validity, test-takers’ future-like performance, selection test, FLDI
A language proficiency (admission) test plays an important role in the language learning process. It determines whether students have met the language standards set by an institution, which is very important for predicting students’ future-like performance. Given the importance of the test in making such predictions, predictive validity is usually used to find out how well the test predicts its test-takers’ future skill and ability. The research question of this study is: “What is the predictive validity of the ‘selection test’ in predicting students’ future success in the Foreign Language Development Institute (FLDI) of Nurul Jadid?” To answer the question, the researcher used descriptive quantitative research as the research design, document analysis as the research instrument, and Pearson Correlation Product Moment as the data analysis technique. The result of the analysis shows that the correlation between variable X (selection test scores) and variable Y (first-semester final examination scores) is positive (0.403) and that the significance of the correlation (0.00) is below the significance level (0.05). These findings show that high predictive validity is found in the selection test of FLDI.
CHAPTER I: INTRODUCTION
Background of Study
Research Question
Objective of Study
Hypothesis
Scope and Limitation of Study
Significance of Study
Definition of Key Terms
CHAPTER II: REVIEW OF RELATED LITERATURE
Review of Related Literature
Assessment, Test, and Evaluation
Assessment
Test
Evaluation
Language Testing
Validity
Content Validity
Criterion Validity
Construct Validity
Consequential Validity
Face Validity
Predictive Validity
Review of Previous Studies
Previous Studies
This Study
CHAPTER III: RESEARCH METHODOLOGY
Research Design
Population
Countable Population
Uncountable Population
Data and Data Collection Technique
Data
Primary Data
Secondary Data
Data Collection Technique
Research Instrument
Data Analysis Technique
CHAPTER IV: FINDINGS AND DISCUSSION
Findings
The Scores of FLDI’s Selection Test (2015)
FLDI’s Final Examination of the First Semester (2015/2016)
The Result of Statistic Data Analysis Using Pearson Correlation Product Moment
Discussion
CHAPTER V: CONCLUSION AND SUGGESTION
Conclusion
Suggestion
This chapter covers the background of the study, research question, objective of the study, hypothesis, scope and limitation of the study, significance of the study, and definition of key terms.
A. Background of Study
Testing is one of the most common things in social life.1 Since long ago, people have had to show their ability and capability as proof that they deserve a position in society, such as army leader, artist, or expert. Nowadays, for the sake of quality, institutions have created many tests to make sure that test-takers are eligible to donate blood, to hold a driving license, or to hold a position as manager, expert, student, or teacher. These tests may take the form of a placement test, interview, final project, psychological test, or a language test such as TOEFL, IELTS, and many others.
As technology makes our world smaller, TOEFL and IELTS, as world-famous proficiency tests, show that language has become a significant skill to master in order to communicate with people around the world and to obtain scholarships. Many institutions now seriously design and apply language curricula or standards, and some institutions are even established specifically to serve people who want to learn a language. They give each individual a certain qualification or skill in that language (usually English), both for the institution’s prestige and for the individual’s future.
Applying a language curriculum or standard means that the institution must conduct both proficiency and achievement tests, which have different purposes.2 An achievement test is usually linked to the language learning process and may take the form of an end-of-semester examination, final project, portfolio assessment, and so on. It measures the result of the learning process that learners have followed during a specific period set by an institution. A proficiency test, more or less, has the opposite purpose: it measures a test-taker’s future performance without considering any learning he or she has followed before. This test may take the form of a placement test, selection test, language aptitude test, academic potential test, and so on.
This study focuses on the proficiency test, since it is very important to know how far test-takers can adapt, learn, and perform their ability in the institution conducting the test; every institution wants the best candidates to join its program.
In predicting test-takers’ future performance, a proficiency test must have high predictive validity, without neglecting other types of validity, to ensure that the test is proper and its usefulness is not in doubt, because the purpose of the test is to predict future performance and outcomes as the institution expects. Predictive validity is recognized worldwide as important in predicting future-like performance.3 To measure whether this validity is good or not, we correlate the results of the earlier proficiency test with the results of the first- or second-semester examination. If the correlation shows a significant positive relation between the two sets of results, the proficiency test has predictive validity, and the stronger the positive correlation, the higher that validity. A correlation near zero or a negative correlation means that the predictive validity of the test is low.
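The correlation described above is usually computed with Pearson’s product-moment coefficient, the technique named in the abstract. The sketch below is a minimal Python version that uses only the standard library. The function name `pearson_r` and both score lists are hypothetical. The numbers are invented for illustration and are not FLDI data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five students (invented for illustration, not FLDI data)
selection_scores = [60, 72, 75, 80, 90]  # predictor X: selection test
semester_scores = [65, 70, 78, 76, 88]   # criterion Y: first-semester final exam

r = pearson_r(selection_scores, semester_scores)
print(f"r = {r:.3f}")
```

With real data, each position in the two lists must belong to the same student. The value of r ranges from -1 to 1, and a value closer to 1 indicates a stronger positive relation between selection test scores and later examination scores.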
A study conducted by Mary Kerstjens and Caryn Nery entitled “Predictive Validity in the IELTS Test: A Study of the Relationship Between IELTS Scores and Students’ Subsequent Academic Performance”4 implied that a proficiency test was an important step in obtaining the most adaptable students from around the world who apply to study in Australia. They also stated that predictive validity is necessary to determine whether the test is good at predicting the future performance of student candidates. A second previous study, entitled “Validitas Prediktif Ujian Penerimaan Calon Mahasiswa Universitas Islam Indonesia terhadap Indeks Prestasi Kumulatif Mahasiswa” (Predictive Validity of the Universitas Islam Indonesia Student Admission Examination on Students’ Cumulative Grade Point Average), was conducted by Irwan Nuryana Kurniawan and Arief Fahmie. It examined how far the entrance examination (Ujian Penerimaan Calon Mahasiswa, UPCM) of Universitas Islam Indonesia (UII) predicts students’ grade point average. Both studies implied that a proficiency test is important in finding and choosing student candidates who are capable of learning in particular circumstances and of achieving the institution’s goals.
3
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004), 25.
4
Mary Kerstjens and Caryn Nery, “Predictive Validity in the IELTS Test: A Study of the Relationship Between IELTS Scores and Students’ Subsequent Academic Performance,” IELTS Research Reports Vol. 3, 2000, 86.
As implied above, those studies examined IELTS, one of the requirements for studying in Australia, and UPCM, the selection test for student candidates of UII (Universitas Islam Indonesia). These are international- and regional-scale admission and proficiency tests, unlike the test in this research. This study measures a smaller-scale test, because FLDI (Foreign Language Development Institute) is run by a local Islamic boarding school, Nurul Jadid in Paiton, Probolinggo. Even though this study covers a smaller scope than the studies mentioned before, it is still urgent and vital. This institution’s members are pupils who will, in the future, take IELTS, TOEFL, or other admission and proficiency tests and compete with other test-takers for places at prestigious colleges.
In this case, FLDI, as the language institution of Nurul Jadid, must be more aware of its selection test. The reason for conducting the selection test is to pick qualified student candidates. The institution knows that not every test-taker is capable of reaching its standards and goals, or of following and adapting to its teaching-learning method. The test is also important for FLDI because producing excellent alumni who are skilled in English is the vision of the institution.5 This study, conducted in FLDI, will reveal how well FLDI’s selection test predicts test-takers’ future performance. It will also make the institution’s management more conscious of the selection test it holds.
5
Lembaga Pengembangan Bahasa Asing (LPBA), Profil Lembaga [Institution Profile], Festival Bahasa LPBA [LPBA Language Festival] 2014.
B. Research Question
1. What is the predictive validity of the “selection test” in predicting students’ future success in the Foreign Language Development Institute (FLDI) of Nurul Jadid?
C. Objective of Study
1. To assess the predictive validity of the selection test of FLDI.
D. Hypothesis
1. Null hypothesis (H0)
There is no correlation between FLDI’s selection test score and the first semester final examination score.
2. Alternative hypothesis (H1)
There is a positive correlation between FLDI’s selection test score and the first semester final examination score.
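The choice between H0 and H1 is commonly made with a one-tailed t-test on the Pearson coefficient r, computed from n pairs of scores (selection test and first-semester final examination). The block below gives the standard test statistic as a sketch. It assumes the 0.05 significance level mentioned in the abstract. ρ denotes the population correlation, and n stands for the number of students who have both scores.

```latex
\begin{aligned}
H_0 &: \rho = 0, \qquad H_1 : \rho > 0 \\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2 \\
\text{reject } H_0 &\text{ if } r > 0 \text{ and } p = P(T_{df} \ge t) < 0.05
\end{aligned}
```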
E. Scope and Limitation of Study
Validity, in almost every field of science (if not all), is a very important aspect in determining whether data, instruments, theoretical results, or analytical results are acceptable and recognized worldwide. In language testing, validity plays a big role in making a test appropriate, meaningful, useful, and efficient (Waugh and Gronlund, 2012).6 Validity is divided into at least five branches: content validity, construct validity, criterion-related evidence (which is divided into concurrent validity and predictive validity), consequential validity, and face validity. Each branch has its own role in examining the same test or a different one.
6
C. Keith Waugh and Norman E. Gronlund, Assessment of Student Achievement, 10th ed.
This study discusses predictive validity by analyzing a proficiency test, which has the ability to predict the future performance of its test-takers. Since there are many types of proficiency test, this study uses only the selection test of FLDI (Foreign Language Development Institute) of Nurul Jadid as the subject of the predictive validity analysis.
F. Significance of Study
After considering the background, the research question, and the objective of the study, the researcher also has to consider the significance of the study.
This study is expected to help the management of FLDI become more conscious of its selection test and conduct it better. The test should ensure that the student candidates’ capability meets the demands required to achieve the institution’s goal, so that its graduates can compete in prestigious colleges. In addition, this study hopes that the institution will increase the predictive validity of its selection test by conducting similar analyses in the future for its own benefit.
G. Definition of Key Terms
• Selection test is a type of admission test that predicts the future-like performance of test-takers without relating it to their educational background. By administering the test, the institution can determine whether a test-taker is worthy of joining its programs.
• Predictive validity is a type of criterion-related evidence.7 This validity measures what Brown called the prediction of test-takers’ likelihood of future success.8
• Foreign Language Development Institute is an autonomous institution in Nurul Jadid Islamic boarding school, Paiton, Probolinggo, that runs Arabic and English teaching-learning programs separately. Its members (students) are senior high school students from the first through the third grade.
• Test-taker is a male or female santri (a person studying at an Islamic boarding school, known in Indonesia as a pondok pesantren) who takes the selection test conducted by FLDI.
7
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004), 25.
going to be presented too in
A. Review of Related Literature
In this sub-chapter, the researcher describes the theories and definitions used in analyzing the predictive validity of the “selection test” of FLDI of Nurul Jadid.
1. Assessment, Test, and Evaluation
Before discussing testing in language learning, we need to distinguish among test, evaluation, and assessment.
a. Assessment
Assessment can be any kind of response, question, answer, and/or activity performed by learners, either during the teaching-learning process or at an unspecified time before or after it.9
9
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004).
Figure 2.1 Teaching, Assessment, Test
In addition, assessment can be classified as informal or formal, formative or summative, and norm-referenced or criterion-referenced.10
1) Formal–Informal Assessment
A well-planned assessment that has specific measurement rubrics and records its results is a formal assessment. Any test that has been prepared and conducted is a formal assessment, because it meets these criteria. However, formal assessment does not only take the form of a test. It can also be a drama project, a portfolio project, or another assessment that does not meet the criteria of a test.
Informal assessment, of course, has the opposite characteristics. It can be any response coming from the students, such as answers, comments, or arguments. It does not have clear measurement rubrics and is usually unplanned.
10
Ibid., 5.
2) Formative–Summative Assessment
Formative assessment is scoring done during an ongoing teaching-learning process in order to form students’ comprehension, skill, and knowledge of the language. Another purpose of this assessment is to determine students’ development in the teaching-learning process. Through this assessment, the teacher can observe students’ comprehension during ongoing teaching-learning, decide whether his or her teaching methods are appropriate, and change the methods used if necessary. A teacher or administrator can apply formative assessment periodically (at the beginning, middle, or end) or continuously during teaching-learning. In this sense, informal assessment is a form of formative assessment, because formative assessment does not record its results. Instead, it “repairs” the methods used and leads students toward the objectives of the course.
An achievement test, or any other test conducted to summarize all the results of students’ learning, is a summative assessment. The purpose of this assessment is to gather information for further analysis in order to determine students’ comprehension and rank them according to their results. If the results reach the indicators or goals set by the teacher, the student can proceed to a higher class or school level. If the results do not approach those indicators or goals, the student stays in the same class or takes remedial work.
3) Norm-referenced–Criterion-referenced Test
A norm-referenced test reports its results as numerical scores and/or percentile ranks. Its purpose is to place students in rank order according to their scores, and it gives no feedback from the test administrator. Examples of this test are the Scholastic Aptitude Test (SAT) and the Test of English as a Foreign Language (TOEFL).11
A criterion-referenced test also records test-takers’ results as numerical scores. However, it is designed to give test-takers feedback from the test administrator (teacher), usually in the form of grades, on specific course or lesson objectives. Here, the teacher needs extra effort and time to deliver or explain the feedback given to the test-takers (students).
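The contrast between the two kinds of test can be sketched in Python. This minimal illustration uses a hypothetical group of raw scores and a hypothetical cut-off of 80. The function names are also hypothetical. The norm-referenced view reports each score’s percentile rank within the group. The criterion-referenced view compares each score with the fixed criterion, regardless of what other test-takers scored.

```python
from typing import Sequence


def percentile_rank(score: float, group: Sequence[float]) -> float:
    """Norm-referenced view: percentage of the group scoring below `score`,
    counting tied scores as half (one common convention)."""
    below = sum(1 for s in group if s < score)
    equal = sum(1 for s in group if s == score)
    return 100.0 * (below + 0.5 * equal) / len(group)


def meets_criterion(score: float, cutoff: float) -> bool:
    """Criterion-referenced view: compare with a fixed standard, not with peers."""
    return score >= cutoff


# Hypothetical raw scores and cut-off (illustration only)
scores = [50, 60, 70, 80, 90]
CUTOFF = 80

for s in scores:
    print(s, percentile_rank(s, scores), meets_criterion(s, CUTOFF))
```

Suppose every test-taker gained the same number of points. Their percentile ranks would not change, but more of them would meet the criterion. That difference is the practical distinction between the two kinds of test.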
b. Test
According to Brown, a test is a method of accurately measuring a person’s capability, comprehension, or qualification in a certain scope and domain.12 The purpose of a test is to find out who the test-takers are, what they can do, how well they can do it, and what benefit they offer the tester, always within a desired domain.
11
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004), 7.
According to McNamara, tests can be divided by purpose into two types, achievement tests and proficiency tests:13
1) Achievement Test
An achievement test relates to past teaching-learning processes. It is conducted to find out test-takers’ comprehension of the materials taught in the course. It also determines whether the objectives, goals, or indicators set by the institution have been reached. A final examination is an example of this test, because it examines test-takers’ comprehension of the material taught in the previous course.
Even though this type of test examines comprehension gained in the past, it is often used as a reference in determining a test-taker’s future.
On the other hand, an achievement test cannot serve as a predictor of a test-taker’s future performance unless the syllabus of the course clearly implies that the test can make such a prediction.
2) Proficiency Test
This test is the opposite of the achievement test. Whereas the achievement test focuses on the past, the proficiency test focuses on the test-taker’s future. Its purpose is to predict the future-like performance of the test-taker without considering his or her educational background.
In practice, a proficiency test has a criterion that the test-taker must reach. Reaching it means that his or her capabilities are at the standard level set and show the promising future-like performance needed by the institution holding the test.
One example of this test is the driving test for a driving license. In this test, the test-taker must take several subtests, such as a traffic sign test, the driving test itself, and a driving theory test. Each subtest has its own criterion that the test-taker must fulfill, and together these criteria determine whether the test-taker is eligible for a driving license.
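In the rule described above, the overall decision depends on every subtest meeting its own criterion. This can be written as a short conjunctive check. The Python sketch below follows the subtests in the example, but the function name, the score scale, and the minimum scores are hypothetical.

```python
from typing import Mapping

# Hypothetical minimum score for each subtest (illustrative values only)
CRITERIA: Mapping[str, float] = {
    "traffic_sign_test": 70,
    "driving_test": 75,
    "driving_theory_test": 70,
}


def is_feasible(results: Mapping[str, float],
                criteria: Mapping[str, float] = CRITERIA) -> bool:
    """True only if every subtest criterion is met.

    A missing subtest result is treated as failing that subtest.
    """
    return all(results.get(name, float("-inf")) >= minimum
               for name, minimum in criteria.items())


candidate = {"traffic_sign_test": 85, "driving_test": 72, "driving_theory_test": 90}
print(is_feasible(candidate))  # False: driving_test (72) is below its minimum of 75
```

A high score on one subtest cannot compensate for a failed one, which is the point of giving each subtest its own criterion.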
c. Evaluation
Cross stated that evaluation is the process of determining whether the objectives of a course or teaching-learning process have been reached.14 According to Kusuma, evaluation must be systematic and continuous in order to describe students’ capability.15
Teachers often restrict evaluation to specific times (the beginning, middle, and end of the course), and as a result they receive too little information from it. This lack of information can lead teachers to keep misusing their methods, which may mislead students. Kusuma stated that it is better to evaluate students and the methods used every single day, following a systematic evaluation schedule.16
14
A. Cross, Home Economic Evaluation (Columbus, Ohio: A Bell & Howell Company, 2013), 5.
15
Mochtar Kusuma, Evaluasi Pendidikan, Pengantar, Kompetensi dan Implementasi [Educational Evaluation: Introduction, Competence, and Implementation] (Yogyakarta: Parama Ilmu, 2016), 3.
In teaching, the teacher must be aware of students’ capability in comprehending the materials taught in class, because each individual has a different level of comprehension. To know a student’s comprehension, the teacher must evaluate his or her development from the beginning until the end of the course. The purpose of evaluation is set out in Law of the Republic of Indonesia Number 20 of 2003 on the National Education System, Article 57, paragraph (1). According to this law, evaluation controls the quality of national education as a form of accountability of education providers to students, institutions, and education programs.17
From the explanations above of test, assessment, and evaluation, we can conclude that they have similarities and differences. A test is a planned assessment, usually conducted at the end of a course or before it, depending on its purpose; examples are a final examination and an admission test. Assessment is any planned or unplanned question or instruction given to students to find out their comprehension, including tests. Evaluation is a systematic process of determining students’ comprehension at the beginning, during, or at the end of a course or teaching-learning process. The similarity among these three terms is their purpose: all of them are conducted to determine the methods used and students’ comprehension.
16
Ibid., 3.
The difference between them lies in the procedure applied in conducting each of them. A test is usually conducted before or after the teaching-learning process. Assessment has no specific time of application (it can be planned or unplanned). Evaluation can also take place at any moment, but it needs to be planned.
2. Language Testing
After this brief explanation of test, evaluation, and assessment, we can infer what language testing is and why it is important. Language testing is a test that measures test-takers’ capability, comprehension, or performance in a certain language. As a skill, language demands good speaking, writing, listening, and reading from its users, so language testing provides instruments to measure and grade speakers’ ability in acquiring a certain language.
This testing may take the form of a proficiency test, final examination, admission test, and so on. IELTS and TOEFL are known worldwide as credible language tests for measuring speakers’ comprehension, although many other good tests exist that are less famous. As a test, language testing is meant to measure test-takers’ ability in a certain language. It must also have a high degree of validity, one of the most important qualities of a good test.
Before moving to the next sub-chapter, the researcher considers it necessary to summarize the explanation above about assessment, test, evaluation, and language testing. As explained above, assessment, test, and evaluation have similarities and differences. The purpose of all three is the same, namely examining students’ comprehension and the teaching methods used, while the difference lies in the procedure used in applying them. Likewise, language testing examines students’ comprehension in learning or mastering a language as well as the methods used in teaching it. The researcher assumes that all three terms may be involved in and influence language testing. In other words, language testing may take the form of a test, an assessment, or an evaluation, and it covers all three terms.
Figure 2.2 Language testing (evaluation, assessment, test)
3. Validity
One of the most important things in research, evaluation or testing, and data measurement is validity. Joppe stated that:
“Validity determines whether the research truly measures that which it was intended to measure or how truthful the research results are. In other words, does the research instrument allow you
to hit "the bull’s eye" of your research object? Researchers
generally determine validity by asking a series of questions, and will often look for the answers in the research of others.”18
McNamara also wrote about validity in his book Language Testing:
“The purpose of validation in language testing is to ensure
defensibility and fairness of interpretations based on test
performance…. If no validation procedures are available, there is
potential for unfairness and injustice.”19
In other words, research is valid if it truly examines its object and provides results about it. For example, a measuring tool such as a speedometer is valid when it measures the speed of a car. It is no longer valid when it is used to measure the speed of a snail. Even though the speedometer measures speed, it becomes invalid if the speed is too slow (in this example, a snail’s speed) or too fast. A test is in the same position as this measuring instrument: it can be classified as invalid when a child takes a test intended for adults, even if the child could possibly finish it perfectly.
18
Marion Joppe, The Research Process, University of Guelph, School of Hospitality, Food and Tourism Management (https://www.uoguelph.ca/hftm/research-process, accessed on April 25, 2016).
In addition, Messick stated that the validity of a test concerns the suitability, significance, and usefulness of the test results themselves.20 Others have viewed validity as the degree to which evidence and theory support the interpretations of test scores.21 From this additional perspective, validity concerns how well an interpreted test score or research result is supported by evidence and theory.
In the study of language testing, validity has several categories that help teachers, institutions, or any individual make and administer good and fair tests. Brown categorized validity into five parts:22 a) content validity, b) criterion validity, c) construct validity, d) consequential validity, and e) face validity.
a. Content Validity
This validity concerns the relation between the content of a test and the test’s purpose. The content of the test must examine exactly what the test is meant to examine and produce results that fulfill its purpose. For example, if a test aims to find out the speed of a football player’s shot, the right way to examine it is to ask the player to shoot a ball toward the goal and measure the speed of the shot.
20
Samuel Messick, “Validity,” in Robert L. Linn (Ed.), Educational Measurement (New York: Macmillan, 1989), 13-103.
21
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, Standards for Educational and Psychological Testing (Washington, DC: American Educational Research Association, 1999), 9.
22
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004), 22.
b. Criterion Validity
Every test must have a minimum score as the standard that decides whether test-takers pass. Criterion validity confirms whether test-takers truly meet the standard set for the test’s purpose. It measures their achievement by comparing the current result with the result of another test that has a similar purpose, as evidence that they deserve it. For example, suppose the minimum score of an English admission test is 80. Test-takers must reach that score. If they do, the examiner compares the result with the result of another test that has a similar purpose and standard, to prove that the test-takers truly have the ability to reach the minimum score or standard.
According to Hughes, criterion-related validity has two kinds. One of them, concurrent validity, is established when the test and the criterion are administered at about the same time (one test being the main test and the other an assisting test).23 “At about the same time” means that the tests are held on the same day, though not necessarily at exactly the same moment. Brown divided criterion validity into two parts:24
23
Arthur Hughes, Testing for Language Teachers (Cambridge: Cambridge University Press, 2003), 27.
24
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004).
1) Concurrent Validity
This validity measures the result of a test by comparing it with the result of another test that has a similar purpose. The comparison provides evidence for, and clarifies, the first test’s result.
2) Predictive Validity
A test that should be measured with this validity usually aims to eliminate some test-takers and select those who reach the standard set by the institution they want to study or work in. Hughes stated that this validity is used to predict the performance and achievement of student or worker candidates.25
The difference between predictive validity and concurrent validity lies in the purpose of the test. A test examined for concurrent validity aims to show how high a test-taker’s current achievement is, and then uses another test to provide evidence supporting that result. A test examined for predictive validity, by contrast, aims to predict the test-taker’s performance in the future, so its criterion only becomes available some time after the test.
As the main topic of this research, predictive validity is discussed further in a separate sub-chapter.
25
Arthur Hughes, Testing for Language Teachers (Cambridge: Cambridge University Press, 2003).
c. Construct Validity
Before defining this validity, which is also called construct-related evidence, it must be clear what a construct is. According to Brown, a construct is a complex theory, hypothesis, or model of a bigger idea that explains phenomena in a conceptual domain.26
Construct validity checks whether the theories or topics used in an assessment are connected to the specific language construct being tested. For example, suppose an institution wants an English written test that requires test-takers to write an opinion essay of about two thousand words. The scoring then considers several aspects of the writing construct: spelling, vocabulary, grammatical accuracy, and the ideas of the text.
Construct validity is usually used to measure large-scale tests such as TOEFL and IELTS. Fraenkel treated content and criterion validity as parts of construct validity, since the tests measured are large in scale.27
d. Consequential Validity
This type of validity concerns the consequences that may occur before, during, and after a test. The impact can appear in test-takers’ preparation, in their socio-economic conditions, and in the way they learn and perform after the test.28 For example, McNamara stated that the existence of a test may give rise to at least one course that teaches students how to pass it; some test-takers can afford to pay for such a course, but others probably cannot.29 Another example is the impact of the test on test-takers’ preparation, which leads them to spend more time studying and less time socializing with their neighbors or friends.
26
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004), 25.
27
Jack R. Fraenkel, et al., How to Design and Evaluate Research in Education (New York: McGraw-Hill Education, 2011).
28
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004), 26.
e. Face Validity
Face validity, the last type of validity, can be described as the test-takers’ view of the test.30 This validity is a subjective matter for the test-takers; not even an expert can judge whether the test has high face validity or not. For example, a test may have low face validity because the test-takers think the test is misplaced. They believe that a different, more appropriate test should have been given, because they were not prepared for this one and expected something else.
29
McNamara, Language Testing (Oxford: Oxford University Press, 2000), 54.
30
H. Douglas Brown, Language Assessment: Principles and Classroom Practices (New York: Pearson Education ESL, 2004), 26.
Figure 2.3 Brown’s five validities: content, criterion, construct, consequential, and face validity
In short, validity is the key point of any research or testing (in this case, language testing) in making the research acceptable from scholars’ point of view. In language testing, there are five types of validation: content validity, criterion validity, construct validity, consequential validity, and face validity. All of these types are connected to one another, even though construct validity is used mainly to measure large-scale proficiency tests.
4. Predictive Validity
Predictive validity is a validation procedure used to show how effectively a test predicts people's future performance in a certain activity. As explained in the first chapter, college entrance tests, driving tests, and other admission batteries, including language proficiency tests, need a predictive validity approach to help officers or institutions select the people they need.
Ary et al. stated that predictive validity is the relationship between a measured score and a score on a criterion.31 As mentioned above, predictive validity is part of criterion validity: it measures how well a test predicts test-takers' future performance. As part of validity, predictive validity plays a big role in delineating an institution's future success and, on a smaller scale, the test-takers' ability to work or study as the institution demands.
31
Donald Ary, et al., Introduction to Research in Education (USA: Wadsworth, Cengage Learning,
How can we find out whether the predictive validity of a test is high or not? To know it, we have to use correlation analysis. In examining the predictive power of a test, Kusuma, in his book Evaluasi Pendidikan, Pengantar, Kompetensi dan Implementasi (Educational Evaluation: Introduction, Competence, and Implementation), stated that there are two important technical terms: predictor and criterion.32 The predictor is the test whose predictive validity is being measured; the criterion is the test-taker's performance, predicted by the test, that indicates success in a learning process and usually takes some period of time to be reached. Donald Ary et al. also stated that the criterion is only available at a future time.33
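The relationship between a predictor and a criterion can be made concrete with a short correlation sketch. The Python example below is illustrative only: the function name `pearson_r` and the score lists are hypothetical (not FLDI data), and the code uses the deviation-score form of the Pearson product-moment correlation, pairing each test-taker's predictor score with the same test-taker's later criterion score.

```python
import math


def pearson_r(predictor, criterion):
    """Pearson product-moment correlation between paired predictor and criterion scores.

    Deviation-score form: r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)),
    where dx and dy are each score's distance from the mean of its own list.
    """
    if len(predictor) != len(criterion):
        raise ValueError("every test-taker needs both a predictor and a criterion score")
    n = len(predictor)
    if n < 2:
        raise ValueError("at least two paired scores are required")
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    dx = [x - mean_x for x in predictor]
    dy = [y - mean_y for y in criterion]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when all scores in one list are equal")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores: an admission test taken now (predictor) and an
# examination taken a semester later (criterion), in the same student order.
admission_test = [50, 60, 70, 80]
later_exam = [60, 50, 80, 70]
print(pearson_r(admission_test, later_exam))  # 0.6
```

A value near +1 would mean that high predictor scores go with high criterion scores; the criterion list can only be filled in after the waiting period the text describes.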
Researchers, teachers, and institutions must keep in mind that creating a criterion is a little tricky, because they must consider the test-takers' base rate. The base rate is the test-takers' ability to reach the criterion that has been set: a higher base rate means that the criterion is too easy, whereas a lower base rate means that the criterion is too hard to reach.34
32
Mochtar Kusuma, Evaluasi Pendidikan, Pengantar, Kompetensi dan Implementasi (Yogyakarta: Parama Ilmu, 2016), 50.
33
Donald Ary, et al., Introduction to Research in Education (USA: Wadsworth, Cengage Learning, 2010), 229.
34
Mochtar Kusuma, Evaluasi Pendidikan, Pengantar, Kompetensi dan Implementasi (Yogyakarta: Parama Ilmu, 2016), 51.
Kusuma also described a procedure for validating the predictive validity of a test:35
• Make test item(s) appropriate to the test-maker's goal
• Determine the subjects of the pilot study
• Identify the criterion to be reached
• Wait for the criterion variable to appear
• Achieve the criterion
• Correlate the two scores from the test and the criterion
So, the first step is to make our own test, or use another test, as the predictor; of course, the test must be valid and reliable. The second step is to choose the subjects who become the population of the validation. The criteria for the subjects must be clear so that the examiner can choose them wisely. The third step is to create the criterion to be reached by the subjects; this criterion must also be valid and reliable. The fourth step is waiting for the criterion to appear. In assessing predictive validity, we must wait for the criterion to appear because, unlike concurrent validity, it takes some time for the subjects to reach the criterion, as described by Ary et al.:36
35
Ibid., 52.
36
Donald Ary, et al., Introduction to Research in Education (USA: Wadsworth, Cengage Learning, 2010), 229.
Figure 2.3 Description of Criterion Validity by Donald Ary et al.
The fifth step is that the subjects must reach the criterion. As an example of the fourth and fifth steps: if the examiner is assessing the predictive validity of a college admission test (the predictor), he must wait for his subjects (students, in this case) to study for at least a semester and finish the semester examination (the indicator), and then wait for the examination scores (the criterion). The last step is correlating the admission test scores with the semester examination scores. If the correlation between the two scores is high, the test has high predictive validity.37 Note, however, that this study skips several of these steps, because the institution concerned (FLDI) had already conducted the selection test (predictor) and the first-semester final examination (criterion). Therefore, the researcher will directly correlate those two variables (predictor and criterion).
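Because the predictor and the criterion are recorded at different times, the last step in practice starts by matching each student's two scores. The Python sketch below shows one way to do that matching before correlating; the function name `pair_scores`, the student identifiers, and the scores are all hypothetical, and skipping students who lack one of the two scores is an assumption of this sketch, not a rule stated by Kusuma.

```python
def pair_scores(predictor_scores, criterion_scores):
    """Match each student's predictor score with the same student's criterion score.

    Both arguments map a student identifier to a score. Students who lack either
    score are reported in `dropped`, because a correlation needs complete (X, Y) pairs.
    Returns (xs, ys, dropped), where xs[i] and ys[i] belong to the same student.
    """
    xs, ys, dropped = [], [], []
    for student_id in sorted(set(predictor_scores) | set(criterion_scores)):
        if student_id in predictor_scores and student_id in criterion_scores:
            xs.append(predictor_scores[student_id])
            ys.append(criterion_scores[student_id])
        else:
            dropped.append(student_id)
    return xs, ys, dropped


# Hypothetical records: the selection test (predictor) and the semester
# examination taken months later (criterion). "S04" has no criterion score yet.
selection_test = {"S01": 50.0, "S02": 65.0, "S03": 72.5, "S04": 58.0}
final_exam = {"S01": 55.0, "S02": 70.0, "S03": 80.0}
xs, ys, dropped = pair_scores(selection_test, final_exam)
print(xs, ys, dropped)  # [50.0, 65.0, 72.5] [55.0, 70.0, 80.0] ['S04']
```

The paired lists `xs` and `ys` are what the final correlation step would receive; reporting `dropped` keeps the loss of subjects visible instead of silently shrinking N.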
One more caution is in order: as Kusuma stated, the general principle of testing also applies when examining the predictive validity of a test, namely that no test predicts perfectly, so test scores are also imperfect.38
37
Mochtar Kusuma, Evaluasi Pendidikan, Pengantar, Kompetensi dan Implementasi (Yogyakarta:
B. Review of Previous Studies
To be accepted, research must have underlying foundations, and one of these foundations is previous studies. Knowing the findings of previous studies helps us understand what has and has not yet been done, which guides further research.
In the area of predictive validity analysis, many institutions, groups, and individuals have analyzed the predictive validity of admission or placement tests to evaluate how well those tests predict. These studies are briefly described in the following sub-chapters, together with the differences between this research and the previous studies.
1. Previous Studies
Mary Kerstjens and Caryn Nery conducted an analysis of IELTS scores and students' academic performance.39 The purpose of their research was to determine how well IELTS predicted the performance of students from non-English-speaking countries studying various majors in Australia, by using the Pearson product-moment correlation between their IELTS scores and their first-semester academic performance in the form of grade point average (GPA). They also used a questionnaire for the students and interviews with academic staff in technical and further education (TAFE).
38
Ibid., 50.
39
Mary Kerstjens and Caryn Nery, "Predictive Validity in the IELTS Test: A Study of the Relationship Between IELTS Scores and Students' Academic Performance," IELTS Research Reports Vol. 3,
The results of their research showed that the correlation between the students' IELTS scores and GPA was positive, although IELTS was not a significant predictor of academic performance, since only reading was proven to be a critical skill in academic performance. In the questionnaire and interviews, the students and staff agreed that reading was the most influential skill in academic performance, and they considered higher IELTS scores an aid to the students' learning process. The staff also added that many factors, both inside and outside the educational domain, influence students' academic performance.
Another study investigating IELTS as a predictor was conducted by Patricia Dooey and Rhonda Oliver. They examined whether IELTS could be a credible predictor of college students' academic success at Curtin University, Australia, given that this international test is an admission test widely regarded as a "must-pass" test for studying in an English-speaking country, in this case Australia.40 The college students who participated in this investigation were enrolled in three different majors: business, engineering, and science.
40
Patricia Dooey and Rhonda Oliver, "An Investigation into the Predictive Validity of the IELTS Test as an Indicator of Future Academic Success," Prospect Vol. 17 No. 1, April 2002, 36.
This correlational research concluded that IELTS was (still) an insignificant indicator of the students' academic success, because Dooey and Oliver's evidence showed that only one subtest, reading, could be a significant predictor for students in all three majors. The result was the same as that of the research conducted by Kerstjens and Nery, which showed that IELTS was an insignificant predictor of college students' success.41
Besides IELTS, TOEFL is also considered an international English proficiency test and, usually, an admission test at the same time. In the area of predictive validity analysis, TOEFL was investigated by Zhang Yan, who conducted research to determine whether the test had high predictive validity for the first-term GPA of international exchange students between UBC (University of British Columbia) and Ritsumeikan University, using regression as the analysis tool.42 He involved five variables: writing scores, speaking scores, gender, total TOEFL scores, and TOEFL sectional scores.
The predictive validity found in the research was medium, as Zhang Yan elaborated in his findings for each variable.43 Unsurprisingly, total TOEFL scores were a mediocre, or medium-level, predictor of the students' GPA. This result arose because TOEFL was not the only factor affecting the students' academic performance; other factors included the dormitory and class environment. Sectional scores were small-level indicators of GPA: only Section II (writing and grammar knowledge) reached the medium level, because this skill was considered useful for completing tasks, as many assessments took the form of a productive (written) skill; the other two, Section I (listening comprehension) and Section III (reading), were small and negligible. Last, speaking scores were a minor predictor of the students' GPA, and writing scores had small predictive power for it.
41
Mary Kerstjens and Caryn Nery, "Predictive Validity in the IELTS Test: A Study of the Relationship between IELTS Scores and Students' Academic Performance," IELTS Research Reports Vol. 3, 2000, 105.
42
Zhang Yan, master's thesis: Predictive Validity of TOEFL Scores on First Term's GPA as the Criterion for International Exchange Students (Vancouver: University of British Columbia, 1995), 21.
43
Ibid., 65.
Zhang Yan offered a simple, tentative explanation for a confusing finding about the two writing assessments, namely why the writing-knowledge score in Section II was a medium predictor while the independent writing score was a small one. He explained that Section II measured writing knowledge rather than writing skill and that, given the Japanese students' intermediate writing skill, "they might need more basic writing knowledge".44 Gender was also a medium predictor of the students' GPA, since it is widely known that factors outside the language and academic domains can either hinder or help students' success.
44
Zhang Yan, master's thesis: Predictive Validity of TOEFL Scores on First Term's GPA as the Criterion for International Exchange Students (Vancouver: University of British Columbia, 1995), 71.
The next previous study is research conducted by Irwan Nuryana Kurniawan and Arief Fahmie on the predictive validity of the UPCM (Ujian Penerimaan Calon Mahasiswa, the new student admission test) of 1999 and 2000 at UII (Universitas Islam Indonesia/Indonesia Islamic University), entitled "Validitas Prediktif Ujian Penerimaan Calon Mahasiswa Universitas Islam Indonesia terhadap Indeks Prestasi Kumulatif Mahasiswa" (Predictive Validity of the Universitas Islam Indonesia Student Admission Test on Students' Cumulative Grade Point Average). I included this research because it focused on predictive validity analysis, even though it had nothing to do with language proficiency. Their basic method for analyzing the predictive validity of the UPCM on students' GPA was the Pearson product-moment correlation, the same as in the first previous study mentioned; Kurniawan and Fahmie correlated the UPCM scores and GPA of students of the 1999 and 2000 intakes who studied in different majors.45 From the research, they found that the UPCM was an insignificant predictor of students' GPA because one of its subtests had a negative correlation with GPA: the lower the students' scores on that subtest, the higher their GPA tended to be.
Research by Renistri Mudela also used predictive validity as its main topic.46 Her research analyzed the predictive validity of APM (Advanced Progressive Matrices) and SMP (Skala Minat Pekerjaan/job proclivity scale) scores for the achievement of first- and second-grade senior high school students. The population of this research was first- and second-grade senior high school students, as follows: 1st grade of SMA Negeri 2 Bandung (342 students), 2nd grade of SMA Negeri 5 Cimahi (344 students), 2nd grade of SMA Negeri 1 Marhagayu (396 students), 2nd grade of MA Negeri 1 Bandung (291 students), 1st grade of MA Persis Katapang (47), and SMK Negeri Katapang (377). The method used in this research was correlational descriptive, with document analysis as the data collection technique, since the variables were APM scores (variable X1), SMP scores (variable X2), and students' achievement (Y). In analyzing the data, Mudela used the Pearson product-moment correlation.
45
Irwan Nuryana Kurniawan and Arief Fahmie, "Validitas Prediktif Ujian Penerimaan Calon Mahasiswa Universitas Islam Indonesia terhadap Indeks Prestasi Kumulatif Mahasiswa," Fenomena Vol. 3 No. 1, March 2005, 59.
46
Renisti Mudela, Validitas Prediktif Skor Advanced Progressive Matrice (APM) dan Skor Skala Minat Pekerjaan Terhadap Prestasi Belajar Siswa: Studi Deskriptif Korelasional Skor Inteligensi (APM), Skor Skala Minat Terhadap Prestasi Belajar Siswa Kelas X dan XI, academic year 2013/2014, UPI Digital Repository, Indonesia University of Education,
After the statistical analysis, she concluded that the variables were positively correlated but that the relationship was weak: variables X1 and X2 both had positive (+) correlations with Y, but they were not strong enough to influence variable Y substantially.
Validitas Prediktif Tes Kompetensi Berbahasa Indonesia Menurut Minat Belajar (Predictive Validity of the Indonesian Language Competence Test by Learning Interest) is research conducted by Elvi Suzanti. Its purpose was to assess the predictive validity of an Indonesian language competence test for the Indonesian section of the Ujian Nasional (UN), or Indonesian national examination, based on junior high school students' interest in learning Bahasa Indonesia.47 The competence test consisted of two parts: Part A was a spelling test, and Part B was a reading test.
Suzanti's research was conducted at state junior high schools 51 and 25 in Jakarta using survey and ex post facto methods. The sample consisted of 494 third-grade respondents from both schools. Unlike the other previous studies in this chapter, Suzanti assessed the predictive validity of the competence test using regression (t) analysis (multiple regression). In the analysis, the data were divided into two categories based on high or low interest in learning; to identify high or low interest in learning, Suzanti also used regression (t).
The result of the analysis showed that the predictive validity of the Bahasa Indonesia competence test was high, and that the test could be used as a predictor of the UN at the junior high school level. The evidence for this claim was that the competence test's predictive validity for students with high interest in learning Indonesian (1) who took the UN was 36.36 > 1.645, and for students with low interest in learning Indonesian (2) who took the UN it was 2.484 > 1.645.
47
Elvi Suzanti, Validitas Prediktif Tes Kompetensi Berbahasa Indonesia Menurut Minat Belajar (Badan Pengembangan dan Pembinaan Bahasa, Kementerian Pendidikan dan Kebudayaan) (http://badanbahasa.kemdikbud.go.id/lamanbahasa/produk/1319, accessed on 23 August, 2016).
Joana Francisca Reni Dwi Astuti examined the predictive validity of the Indonesian national examination for students' achievement in her study entitled Validitas Prediktif Ujian Nasional Terhadap Prestasi Belajar Pada Mahasiswa Universitas Sanata Dharma (Predictive Validity of the National Examination for the Learning Achievement of Sanata Dharma University Students).48 This research was conducted to find empirical evidence of whether the Ujian Nasional (UN) of academic year 2004/2005 can predict college students' later achievement, or grade point average (GPA).
Astuti used the Pearson product-moment correlation as the data analysis technique. She correlated not only the overall UN score but also each UN subtest, to find the relationship between the scores of each section and the students' GPA. The subjects of this study were active students of the Faculty of Psychology, class of 2005, who had taken the UN of academic year 2004/2005, observed from the first to the seventh semester (specifically, students in the fourth and seventh semesters). Finally, the cumulative UN scores were separated according to the students' senior high school majors, to see which majors had more potential for success if a student was accepted into the Faculty of Psychology.
From the research, Astuti found no significant correlation between UN scores and students' GPA in the fourth semester (r=0.176, p=0.123), and the seventh semester showed the same (r=0.188, p=0.099). The results for each subtest correlated with GPA were: Indonesian (r=0.236, p=0.038), English (r=-0.011, p=0.925), math (r=0.078, p=0.652), and economics (r=0.462, p=0.002). In addition, the correlations showed that college students from the social studies (IPS) major in senior high school had more potential for success (r=0.355, p=0.023) than students from the mathematics and natural sciences (IPA) major (r=0.048, p=0.783).
48
Joana Francisca Reni Dwi Astuti, thesis: Validitas Prediktif Ujian Nasional Terhadap Prestasi Belajar Pada Mahasiswa Universitas Sanata Dharma (Yogyakarta: Universitas Sanata Dharma, 2010), vii.
The conclusion of her study was that the predictive validity of the national examination (ujian nasional/UN) was low, so it could not be used as part of the admission battery for a university or college.
The last previous study is research conducted by M. Zakaria Adityawarman entitled "Predictive Validity of the College Entrance English Test of UINSA Surabaya toward Students' Achievement of Teacher English Education Department (PBI)", which analyzed the predictive validity of an admission test.49 Adityawarman analyzed the predictive validity of the College Entrance English Test of UINSA (State University of Islamic Studies Sunan Ampel) in predicting students' future performance by correlating its scores with the students' achievement in the intensive course of the English Teacher Education Department. This study used Pearson r, or the Pearson product-moment correlation, and concluded that the College Entrance English Test had good predictive validity.50
49
M. Zakaria Adityawarman, Predictive Validity of the College Entrance English Test of UINSA Surabaya toward Students' Achievement of Teacher English Education Department (PBI) (Surabaya: State University of Islamic Studies Sunan Ampel, 2016), 3.
50
2. This Study
In this thesis, the researcher conducts predictive validity research whose foundation is clear (as presented in Chapters One and Two), but whose object differs from those of the previous studies mentioned above. The underlying difference in this analysis is that its object is the scores of high school students, around 13-17 years old, who decided to learn English at FLDI of Nurul Jadid. The environment of the Islamic boarding school helps them learn and practice intensively. It is closed enough that outsiders cannot easily "disturb" their learning processes, and FLDI's rules are also strict: every member of the institution must speak English, which helps their development in mastering the language. This is unlike the previous studies, which used the scores of college students in various majors, for whom language was not the top priority and for whom a non-language factor, the environment,51 might be an obstacle since they studied in different countries.
51
Zhang Yan, master's thesis: Predictive Validity of TOEFL Scores on First Term's GPA as the Criterion for International Exchange Students (Vancouver: University of British Columbia, 1995), 74.
A. Research Design
A research design is an arrangement of the research that allows the researcher to obtain the valid data needed, in line with the research variables and purpose, for further analysis; the research design also determines the instruments used in the research. In this case, the researcher will use a quantitative method, because the data are numerical: the researcher will analyze the scores of the test-takers who took FLDI's selection test.
Specifically, this study will use correlational research. Correlational research is a research method used to find relationships among variables.52 In this method, there are two aspects to interpreting the result of the correlation analysis. The first aspect is the sign (+ or -), which indicates whether the relationship between the variables is positive (as the independent variable increases, the dependent variable also increases) or negative (as the independent variable increases, the dependent variable decreases). The second aspect is the strength of the relationship. A strong relationship shows that the independent variable is closely associated with the dependent variable, while a weak relationship shows that the independent variable is only loosely associated with the dependent variable.
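The two interpretive aspects, sign and strength, can be sketched as a small helper. In this Python sketch the function name `describe_correlation` is hypothetical, and the cut-off `strong_from=0.5` is purely an illustrative assumption: the text does not give a numeric boundary between weak and strong relationships, and published rules of thumb differ.

```python
def describe_correlation(r, strong_from=0.5):
    """Describe a correlation coefficient by its sign and its strength.

    strong_from is an illustrative cut-off only (an assumption of this sketch);
    the text does not fix a numeric boundary between weak and strong.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    # Aspect 1: the sign gives the direction of the relationship.
    if r > 0:
        direction = "positive: as X increases, Y tends to increase"
    elif r < 0:
        direction = "negative: as X increases, Y tends to decrease"
    else:
        direction = "none: no linear relationship"
    # Aspect 2: the size of r, ignoring the sign, gives the strength.
    strength = "strong" if abs(r) >= strong_from else "weak"
    return direction, strength


# Hypothetical coefficients, not results from this study.
for r in (0.85, -0.5, 0.1):
    print(r, describe_correlation(r))
```

Keeping the cut-off as a parameter makes the assumption explicit, so a reader can substitute whatever boundary their reference source recommends.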
52
Donald Ary, et al., Introduction to Research in Education (USA: Wadsworth, Cengage Learning, 2010), 350.
B. Population
In conducting research, we have to limit the subjects included in it, with detailed information about them. A population is a well-described group of research subjects from which a sample is drawn. In addition, as Sugiono stated, the subjects or objects within these limits must share the same properties and status, as decided by the researcher, in order to be analyzed and to draw conclusions from.53
According to Taniredja and Mustafidah, citing Nawawi, a population is the whole set of subjects, consisting of humans, objects, animals, plants, and phenomena, that serve as the source of the research.54
Taniredja and Mustafidah also stated that populations are divided into two categories based on their total number:55
1. Countable population
This first type of population has a data source with an obvious quantitative limit. For example, consider a population of all senior high school students who took the final examination in 2015: all of these students form a research population that can be counted using secondary data from the Ministry of Education and Culture, and because the population is limited to the year 2015, it is easier for the researcher to obtain the population data. This example shows the meaning of "has a data source with an obvious quantitative limit".
53
Sugiono, Metode Penelitian Kuantitatif Kualitatif Dan R&D (Bandung: ALFABETA, 2012).
54
Tukiran Taniredja and Hidayati Mustafidah, Penelitian Kuantitatif (Sebuah Pengantar) (Bandung: ALFABETA, 2014).
55
2. Uncountable population
The second type of population has a data source without an obvious quantitative limit. Examples are all customers of traditional markets in East Java, or all vehicles passing along Soedirman Street. Data for such populations can be collected through on-the-spot observation, but without a clear quantitative limit, such as a year or a date, the researcher cannot be sure how many customers made transactions in the market or how many vehicles passed along Soedirman Street. In other words, the chosen population cannot be counted, since it has so many members and no specific limit.
The population of this research is the male and female students of Nurul Jadid Islamic boarding school who took the "selection test" conducted by FLDI in August 2015 and passed it; the total population in this research is 86. During the "selection test", the test-takers were divided into three classes based on their gender and dormitory: male test-takers used a class in the central dormitory, while female test-takers were divided into two classes, one in the west dormitory (al-Bayan) and one in the east dormitory (al-Hasyimiyah).
After passing the selection test, the students were divided into six classes based on two categories. The first category is gender: since the institution is an Islamic boarding school, students must be separated by gender (they live in separate dormitories, and the same applies in the formal schools). The second category is dormitory: the classes are also separated by the dormitory in which the students stay. The institution divided the classes according to three large dormitories, the central dormitory (male), the west dormitory (al-Bayan/female), and the east dormitory (al-Hasyimiyah/female), with two classes per dormitory. The students were about 13-17 years old and were in the first and/or second grade of senior high school in Nurul Jadid. Thus, the researcher's population here is a countable population, because it has a quantitative limit.
C. Sample
A sample is a part of the population that has the same characteristics and is used for research; the result of the sample analysis is then applied to the population.56 Sampling is important in research because it eases the researcher's burden by focusing on only some individuals in a population instead of the whole population, which may contain a large number of individuals. The result of studying the sample can be interpreted as the result of studying the whole population. However, since the total population in this study is only 86, this study will not take any sample, because the population is reachable and easy to measure.57 According to Arikunto, this kind of study is called population research.58
56
V. Wiratna Sujarweni, Metodologi Penelitian Lengkap, Praktis, dan Mudah Dipahami (Yogyakarta: PUSTAKABARUPRESS, 2014), 65.
57
Tukiran Taniredja and Hidayati Mustafidah, Penelitian Kuantitatif (Sebuah Pengantar) (Bandung: ALFABETA, 2014), 34.
58
D. Data and Data Collection Technique
1. Data
According to Noor, data is information received about facts or phenomena in the form of quantitative data (numbers) or qualitative data (words).59 Data is divided into two types:
a. Primary Data
Primary data is data collected by the researchers themselves, using tools such as observation, interviews, and questionnaires.60 When a researcher observes a population to obtain the data he wants to analyze, the data resulting from the observation are primary data.
b. Secondary Data
Secondary data is data collected by an institution or individual and then analyzed by another researcher.61 For example, a researcher conducting a study of students' performance in a mechanics class uses student data from a school in his town. The student data are secondary data because someone else had collected them before they were used by the researcher.
59
Juliansyah Noor, Metodologi Penelitian Skripsi, Tesis, Disertasi, dan Karya Ilmiah (Jakarta: Prenadamedia Group, 2011).
60
Joop J. Hox and Hennie R. Boeije, "Data Collection, Primary vs. Secondary," Encyclopedia of Social Measurement Vol. 1, 2005, 593.
61
The data used in this research had been gathered by the institution itself from the results of FLDI's selection test and first-semester examination; the data therefore serve as secondary data, as explained above.
2. Data Collection Technique
A data collection technique covers the way data are collected, the number of participants involved, and the schedule of execution. Document analysis will be used in this research to collect the data needed, because this technique is not limited by space and time, which makes it easy for the researcher to access previously recorded data.62 An FLDI teacher also helped the researcher obtain permission from the institution's officer; FLDI's officer collected the data needed for this study in August 2015 and January 2016.
Since the data needed by this research are in the form of documents, the researcher only needs to collect them from the institution, FLDI. The documents are the students' scores on the institution's selection test held in August 2015 and on their first-semester final examination in academic year 2015/2016. These scores will be taken from FLDI's database with the permission of its officers.
62
Juliansyah Noor,Metodologi Penelitian Skripsi, Tesis, Disertasi, dan Karya Ilmiah,(Jakarta: Prenadamedia Group, 2011), 141.
E. Research Instrument
As explained, the data used are secondary data, which means that the instrument of this research is documentation, or document analysis.63 This kind of instrument allows the researcher to analyze or assess data recorded by previous studies, research, or test results.
F. Data Analysis Technique
The data analysis technique in this research is the tool for analyzing the collected data. There are two variables that need to be correlated in order to determine their correlation and draw a conclusion after the analysis: the scores of FLDI's "selection test" in August 2015 (variable X) and the scores of FLDI's first-semester final examination in January 2016 (variable Y).
Regarding the relationship between variables, Denscombe stated that correlation analysis is the best way to find the relationship between variables,64 in this case between the scores of FLDI's "selection test" and the scores of FLDI's first-semester final examination. To correlate the two variables, the Pearson product-moment correlation is used, because this type of correlation analysis is easy to use and is a good analysis that does not take into account any influence outside the data.65
63
Tukiran Taniredja and Hidayati Mustafidah, Penelitian Kuantitatif (Sebuah Pengantar) (Bandung: ALFABETA, 2014), 51.
64
Martyn Denscombe, The Good Research Guide: For Small-Scale Social Research Projects, Fourth Edition (England: Open University Press, 2010), 258.
65
Donald Ary, et al., Introduction to Research in Education (USA: Wadsworth, Cengage Learning, 2010), 130.
The raw-score formula of the Pearson product-moment correlation, which does not require finding z scores and thus eases the data analysis process, is:66
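$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} $$
where:
r = Pearson r
ΣX = sum of scores in the X distribution
ΣY = sum of scores in the Y distribution
ΣX² = sum of the squared scores in the X distribution
ΣY² = sum of the squared scores in the Y distribution
ΣXY = sum of the products of paired X and Y scores
N = number of paired X and Y scores (subjects)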
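As a check on the raw-score computation, the Python sketch below computes each sum named in the definitions (N, ΣX, ΣY, ΣX², ΣY², ΣXY) and combines them as in the raw-score Pearson formula. The function name `pearson_r_raw` and the four score pairs are hypothetical; this is a sketch of the hand calculation, not the SPSS procedure used in the study.

```python
import math


def pearson_r_raw(xs, ys):
    """Pearson r from the raw-score formula (no z scores needed).

    r = (N*ΣXY - ΣX*ΣY) / sqrt((N*ΣX² - (ΣX)²) * (N*ΣY² - (ΣY)²))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired X and Y scores")
    n = len(xs)                                   # N
    sum_x = sum(xs)                               # ΣX
    sum_y = sum(ys)                               # ΣY
    sum_x2 = sum(x * x for x in xs)               # ΣX²
    sum_y2 = sum(y * y for y in ys)               # ΣY²
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # ΣXY
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return numerator / denominator


# Hypothetical paired scores: X = selection test, Y = final examination.
x_scores = [50, 60, 70, 80]
y_scores = [60, 50, 80, 70]
print(pearson_r_raw(x_scores, y_scores))  # 0.6
```

Here N = 4, ΣX = ΣY = 260, ΣX² = ΣY² = 17400, and ΣXY = 17200, so r = (68800 - 67600) / 2000 = 0.6. The raw-score form mirrors hand calculation; with very large scores, the deviation-from-the-mean form is numerically safer because it avoids subtracting two large, nearly equal numbers.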
If the significance value (p) of the correlation obtained from the analysis is below the significance level (0.05), the two variables are significantly correlated. If the significance value is above 0.05, the correlation between the variables is not significant.
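The decision rule compares a p-value with 0.05. SPSS obtains the p-value for Pearson r from a t distribution; because the Python standard library has no t distribution, the sketch below swaps in a different technique, a permutation test, which estimates p by shuffling the Y scores to break any real pairing with X. The function names, the number of shuffles, the seed, and the sample scores are all hypothetical choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation (deviation-score form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided p-value for r estimated by random permutation.

    Shuffling Y breaks any real pairing with X, so the shuffled correlations show
    what |r| looks like when there is no relationship. The p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # A tiny tolerance keeps exact ties from being lost to rounding error.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Adding 1 to both counts includes the observed pairing and keeps p above 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def is_significant(p_value, alpha=0.05):
    """Decision rule from the text: significant when p is below alpha (0.05)."""
    return p_value < alpha


# Hypothetical paired scores, not the study's data.
x_scores = [50, 55, 60, 65, 70, 75, 80, 85]
y_scores = [52, 58, 57, 66, 71, 70, 83, 84]
p = permutation_p_value(x_scores, y_scores)
print(round(p, 4), is_significant(p))
```

A permutation p-value is an estimate that varies slightly with the seed and the number of shuffles, so it will not match an SPSS t-based p-value digit for digit, but it applies the same "below 0.05" decision rule.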
To ease the researcher's work, SPSS 16 for Windows will be used as the analysis instrument. The Statistical Package for the Social Sciences (SPSS) is an application for Windows (formerly for DOS) that analyzes and calculates statistical data, which is very useful for easing the researcher's work. In addition, Donald Ary et al. stated that SPSS is well known among researchers for data analysis in educational research.67
66
67
This chapter presents and explains the findings of the statistical analysis of both variables (the scores of FLDI's selection test and of its first-semester final examination).
A. Findings
The selection test and the first-semester final examination have several sections: reading comprehension, grammar, translation, and writing. Given these sections, the data (the scores of these tests) could have been analyzed section by section; in this study, however, the researcher used the cumulative or average scores.
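The step from section scores to the single score analyzed per student can be sketched as a simple mean. In this Python sketch the section keys and the student's scores are hypothetical, and equal weighting of the sections is an assumption: the text says cumulative or average scores were used but does not state how FLDI weights or combines its sections.

```python
# Hypothetical section keys, following the sections listed in the text.
SECTIONS = ("reading_comprehension", "grammar", "translation", "writing")


def average_score(section_scores, sections=SECTIONS):
    """Equal-weight mean of one student's section scores.

    Equal weighting is an assumption of this sketch: the text does not say
    how FLDI combines the sections into one score.
    """
    missing = [name for name in sections if name not in section_scores]
    if missing:
        raise ValueError(f"missing section scores: {missing}")
    return sum(section_scores[name] for name in sections) / len(sections)


student = {"reading_comprehension": 60, "grammar": 70, "translation": 50, "writing": 64}
print(average_score(student))  # 61.0
```

Averaging keeps each student's result on the same scale as the section scores, so the selection test (X) and the final examination (Y) can be correlated as one number per student.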
It has been clear that data used in this study is the scores of FLDI’s selection test and first semester final examination. In this part, the researcher will describe the data and the finding after analysis process.
1. The Scores of FLDI’s Selection Test(2015)
This sub-chapter presents data in form of the scores of selection test conducted on August 2015. The data given is the average or cumulative scores of all section in the selection test.
The Score List of the Selection Test, Foreign Language Development Institute (FLDI)
Islamic Boarding School of Nurul Jadid, Paiton, Probolinggo, Academic Year 2015/2016
PUTRA (Male Students)
No Initials Formal School Class Score
ê ëì í ìëî ï ð ñ êò6
2 MFó MA NJ ð èôò6
3 IMí íMK NJ ð è õ òè
è ë ö í ìëî ï ð ÷ øõ òè é ùë í ìëî ï ð ø6.6
6 íMí MA NJ 55.2
7 MID MA NJ ð èñò ø
8 Hí MA NJ ð øú ò6
9 Aó û MA NJ ð ú è ò ø
êô ïë ì ëî ï ð ú ú ò6
11 MIA íMA NJ ð ÷ èôò ø
ê ø ëë ì ëî ï ð è ú ò6
13 íA íMK NJ ð úêò8
14 Aü íMK NJ ð ú8
15 IH MA NJ ð èñòè
ê6 MCA MA NJ ð é øò ø
ê ñ ì ë ý í ìëî ï ð é ñò6
18 MAí íMK NJ ð èôòè
êõ ë öþ í ìëî ï ð ÷ úøò8
20 IN MA NJ ð úô
ø ê öó ì ëî ï ð ú ú ò8
22 FAL MA NJ ð úéòè
øú ìÿþ ì ëî ï ð èøò ø øè ì ì íì î ï ð ø êòè ø é ✁ëë ì ëî ï ð è6.4
26 MIM MA NJ ð ú è òè
ø ñ ë✂ë íì î ï ð ÷ øú ò8
PUTRI (Female Students) Al-Bayan
No Initials Formal School Class Score
✄ ☎ ✆ ✝ ✞☎✟ ✠ ✡☛ ☞✌ ✍✌ ☞ ✎✏ ✝ ✞☎✟ ✠ ✡ ☞ ☞ ✑ ✒ ✓✒✔ ✝ ✞✕✟ ✠ ✡☛ ☞ ☞✍ ☞ ✌ ✎✖☎✖ ✝ ✞☎✟ ✠ ✡ ✑✗✍6
5 ✔☎A ✝MA NJ ✡☛ ☞✘
6 ✔☎M MA NJ ✡ ✑ ✑✍8
7 A✝ ✙ ✝MA NJ ✡ ✑ ✑
8 JNB MA NJ ✡ ✑8.4
9 K ✝MA NJ ✡☛ ☞✘✍6
10 F✚ ✝MA NJ ✡ ✑6.2
11 FN✖ ✝MA NJ ✡ ✑ ✄
✄ ☞ ✞✕✖ ✝ ✞☎✟ ✠ ✡ ✌ ✛ ✍✌ ✄ ✑ ☎✏✚ ✝ ✞☎✟ ✠ ✡ ✑6.8
14 AM✜ MA NJ ✡ ✌☞✍8
15 ✝M MA NJ ✡ ☞✛ ✍✌
✄6 I✝ MAN ✡☛ ☞ ✑✍✌ ✄✗ ☛✏ ☎ ✆ ✞☎✟ ✠ ✡ ☞ ✄✍✌
✄8 DA✔✖ MA NJ ✡ ✌✄✍6
19 AF✜ MA NJ ✡ ✌✑✍8
20 ✔✢ MA NJ ✡ ✑ ✑✍8
21 MM✜ ✝MA NJ 37.8
22 AH ✝MA NJ ✡ ✑✗✍✌
☞ ✑ ✒✓✞ ✝ ✞☎✟ ✠ ✡ ☞✌ ☞✌ ☛✝ ✝ ✞☎✟ ✠ ✡ ☞✌ ✍6
25 ✣ ✤☛✏ ✞☎ ✟ ✡☛ ☞✘✍6
26 ✔☎ MA NJ ✡ ✑ ☞✍✌
Table 4.2
PUTRI (Female Students) Al-Hasyimiyah
No Initials Formal School Class Score
✄ ☎☎✥ ✞☎✟✠ ✡ 60.2
2 NDM ✝MA NJ ✡ 64
3 A✔ ✟M ✝MA NJ ✡ ✘✗✍ ☞
✌ ✏✣✠ ✝✞☎✟ ✠ ✡☛ ✘✘✍ ☞ ✘ ✣✥ ✝✞☎✟ ✠ ✡ ✌ ✛
6 ✔ ✝MA NJ ✡☛ ✘ ✘
8 NK ✧MA NJ ★ ✩ ✪✫ ✬ ✭ ✮ ✯ ✧✰ ✱✲ ✳ ★ ✴ ✵✫✴ ✪✶ ✲ ✷✸ ✧✰ ✱✲ ✳ ★ ✴✩ ✫✴ ✪✪ ✱✧ ✧✰ ✱✲ ✳ ★ ✩✵✫6
12 JF MA NJ ★ ✩✩ ✫✴
✪✹ ✰✺ ✻ ✧✰ ✱✲ ✳ ★ ✴8.6
14 FH ✧MA NJ ★ ✩✴
✪✩ ✼✱✲ ✧✰ ✱✲ ✳ ★ ✩ ✪✫8
16 ✽H MAN ★ ✻ ✴6.2
17 ✺ ✾ MA NJ ★ ✻ ✩✶✫6
18 ✿IA ✧MA NJ ★ 61.2
19 ✽D ✧MA NJ ★ ✩6.6
20 GAMFA MA NJ ★ ✩✪
✬ ✪ ❀✿ ✧✰ ✱✲ ✳ ★ ✩8
22 N✧ ✧MA NJ ★ ✩✩ ✫ ✬
✬ ✹ ✿✰ ✧✰ ✱✲ ✳ ★ ✩ ✹✫8
24 AC MA NJ ★ ✩✵✫6
25 FH❁ ✧MK NJ ★ ✻ ✩ ✩
✬6 ✺ ✾❂ ✧MA NJ ★ ✻ ✴8.8
27 ✽LM MAN ★ ✻ ✴ ✵✫6
28 ✧CM ✧MA NJ ★ ✩✶✫6
29 ✧NQ MA NJ XI 46.2
30 AAB SMK NJ X 47.6
31 BPA SMA NJ X 51.4
32 AFA SMA NJ X 45
33 EW SMA NJ X 49.2
Table 4.3
2. FLDI’s Final Examination of the First Semester (2015/2016)
The following tables contain the scores of the first semester final examination held by FLDI in January 2016. The scores are the average (cumulative) scores across all sections of the final examination.
Recapitulation of First Semester Final Exam Scores, Foreign Language Development Institute (FLDI)
Academic Year 2015/2016, Level: Elementary A Pusat (central)
No Initials Numeric Score Letter Grade
❃ ❄ ❅ ❆❆ ❇
❈ ❅❉ ❊ ❆❋ ●❍ ■
❍ ❏❅❑ ❆❈●▲ ■
❋ ❄❉ ❆❈●❍ ■
▼ ◆❄ ❆❈●❈ ■
6 ❑M❑ 72.1 C
7 MID 72.1 C
8 H❑ 71.9 C
9 A❊❇ 71 C
10 JA 70.4 C
11 MIA 70.2 C
12 AA 69.4 C
13 ❑A 69 C
14 A❖ 66.8 C
Table 4.4
Level: Elementary B Pusat
No Initials Numeric Score Letter Grade
1 IH 72.7 C
2 MCA 72.5 C
3 MAG 72.5 C
4 MA❑ 72.5 C
5 AFH 72.1 C
6 IN 71.8 C
7 F❊ 71.3 C
8 FAL 71.3 C
9 MP ◗ 70.1 C
10 MM 70.1 C
11 DAA 68.3 C
12 MIM 67.6 C
13 ACA 63.3 C
Level: Elementary A Glory Al-Bayan
No Initials Numeric Score Letter Grade
❘ ❙❚ ❯ ❱ ❲8 B
2 CF 78.0 B
3 LDL❳ 76.3 B
4 C❨A❨ 75.8 C
5 ❳❙A 75.7 C
6 ❳❙M 75.7 C
7 A❩❬ 75.5 C
8 JNB 75.3 C
9 K 75.2 C
10 F❭ 75.2 C
11 FN❨ 75.0 C
12 MK❨ 74.8 C
13 AF❭ 74.7 C
14 AM❪ 73.2 C
15 ❩M 73.1 C
16 I❩ 72.2 C
17 IFAH 71.7 C
Table 4.6
Level: Elementary B Glory Al-Bayan
No Initials Numeric Score Letter Grade
1 DA❳❨ 80.3 B
2 AF❪ 77.8 B
3 ❳❫ 75.6 C
4 MM❪ 75.1 C
5 AH 74.8 C
6 LDM 74.7 C
7 I❩ 73.6 C
8 ❴❵❛ ❜ ❯❘❲❯ ❝
❱ ❳❙ ❞ ❱ ❲❡ ❢
Level: Elementary A Al-Hasyimiyah
No Initials Numeric Score Letter Grade
❣ ❤❤ ✐ 82.0 B
2 NDM 81.7 B
3 A❥ ❦M 79.8 B
4 F❧J 79.7 B
5 ❧ ✐ 78.6 B
6 ❥ 78.1 B
7 IA♠ ♥♥ ♦ ♥ ♣
8 NK 77.2 B
9 FD 76.9 B
10 NKh 76.3 B
11 Aq 76.0 B
12 JF 75.7 C
13 M❥ r 75.2 C
14 FH 75.2 C
15 sAN 73.9 C
16 tH 70.9 C
17 ❥✉ 70.8 C
Table 4.8
Level: Elementary B Al-Hasyimiyah
No Initials Numeric Score Letter Grade
1 ❧IA 83.3 B
2 tD 80.4 B
3 GAMFA 77.8 B
4 L❧ 77.6 B
5 Nq 77.0 B
6 ❧M 76.5 B
7 AC 76.3 B
8 FH✈ 74.7 C
9 ❥✉✇ 74.6 C
10 tLM 74.1 C
11 qCM 73.8 C
12 qNQ 72.0 C
13 AAB 71.6 C
14 BPA 71.3 C
15 AFA 70.3 C
16 EW 70.0 C
These tables contain the data provided by an FLDI officer. Although the selection test scores are presented here in separate tables, they were not kept separate in the correlation step: the researcher combined them into a single table before correlating them. The same was done with the tables of the students’ first semester final examination scores.
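The merging step described above can be sketched as follows. The thesis does not say how each selection test score was matched with the same student’s final examination score, so this sketch assumes a hypothetical key made of the group label plus the student’s initials, because initials alone may repeat across groups. The group labels, initials, and scores are made up for illustration.

```python
Key = tuple[str, str]  # (group label, initials), an assumed matching key


def merge_tables(tables: dict[str, dict[str, float]]) -> dict[Key, float]:
    """Combine per-group tables {initials: score} into one table keyed by (group, initials)."""
    merged: dict[Key, float] = {}
    for group, rows in tables.items():
        for initials, score in rows.items():
            merged[(group, initials)] = score
    return merged


def pair_scores(x_table: dict[Key, float],
                y_table: dict[Key, float]) -> tuple[list[float], list[float], list[Key]]:
    """Align X (selection) and Y (final exam) scores; also list keys found in only one table."""
    common = sorted(x_table.keys() & y_table.keys())
    unmatched = sorted(x_table.keys() ^ y_table.keys())
    return [x_table[k] for k in common], [y_table[k] for k in common], unmatched


selection = merge_tables({"Putra": {"AB": 40.0, "CD": 55.0}, "Al-Bayan": {"AB": 48.0}})
final_exam = merge_tables({"Putra": {"AB": 70.0, "CD": 72.5}, "Al-Bayan": {"AB": 75.0, "EF": 71.0}})
x, y, unmatched = pair_scores(selection, final_exam)
print(x, y, unmatched)  # [48.0, 40.0, 55.0] [75.0, 70.0, 72.5] [('Al-Bayan', 'EF')]
```

Listing unmatched keys makes it visible when a student appears in only one of the two sets of tables, which would otherwise silently change N.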
From the recapitulation tables of the first semester final examination for FLDI’s elementary level, we can assume that the students’ progress in learning and mastering English is heading in the right direction. The tables show considerable achievement after the learning process in the institution.
3. The Result of Statistical Data Analysis Using Pearson Product Moment Correlation
The researcher used Pearson product moment correlation to correlate the two variables in SPSS version 16.0. The result of the statistical analysis is as follows:
Correlations
                                      Selection_Test (X)   Final_Exam (Y)
Selection_Test   Pearson Correlation  1                    .403**
                 Sig. (1-tailed)                           .000
                 N                    86                   86
Final_Exam       Pearson Correlation  .403**               1
                 Sig. (1-tailed)      .000
                 N                    86                   86
**. Correlation is significant at the 0.01 level (1-tailed).
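As a hand check on the SPSS output, the reported r and N can be plugged into the usual t test for a Pearson correlation. This assumes SPSS derives the significance from that t test; the table itself does not print the t value. The squared correlation (coefficient of determination) is included only as an interpretive aid.

```latex
t = \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}}
  = \frac{0.403\sqrt{86-2}}{\sqrt{1-0.403^{2}}}
  \approx \frac{0.403 \times 9.165}{0.915}
  \approx 4.04, \qquad df = N-2 = 84

r^{2} = 0.403^{2} \approx 0.162
```

Since 4.04 is well above the one-tailed critical value of t at the 0.01 level for 84 degrees of freedom (about 2.37), this check agrees with the "**" flag in the SPSS table. The value r² ≈ 0.162 suggests that about 16% of the variance in final examination scores is shared with selection test scores, leaving most of the variance to other factors.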
variables have a positive correlation with high significance is evidence that the test has predictive validity.
Recalling the statement of Donald Ary et al. that if both variables, the scores of an aptitude test (predictor) and of an examination (criterion), show a positive relationship, the predictor can be used to predict test-takers’ future-like performance.69
Therefore, the selection test conducted by FLDI can be used again in the next academic year, because the analysis shows that the test predicts the future-like performance of test-takers, with a positive and highly significant correlation.
69 Donald Ary, et al., Introduction to Research in Education (USA: Wadsworth, Cengage Learning, 2010), 229.
A. Conclusion
Based on the previous chapter, the statistical analysis shows that the correlation between the scores of the selection test (variable X) and the first semester final examination (variable Y) is positive (r = 0.403) and that its significance is smaller than the chosen significance level (0.000 < 0.05), which means that the correlation between the two variables is significant (indeed at the 0.01 level). The researcher thus shows that the selection test conducted by FLDI, as a predictor, has predictive validity. The selection test can also be used again for the same purpose in the next academic year because of the positive and significant correlation between the variables (X and Y).
Recalling the null hypothesis (H0) and the alternative hypothesis (H1) stated in the first chapter, the result of the analysis shows that H0 is rejected and H1 is accepted.
B. Suggestion
For future research that analyzes the predictive validity of a test, the researcher suggests including factors external to the test, because an analyzed test may predict very well while test-takers still fail to reach their predicted capability because of external factors such as sickness or an improper test location. To identify these external factors, future research could use a questionnaire or interview to obtain accurate data about the external factors influencing test-takers’ performance.
Although this study supports accepting H1 by showing a positive and significant correlation between the variables, one thing needs attention. As Kusuma states, no test predicts perfectly,70 so it must be underlined that the predictor, the selection test in this case, cannot always predict test-takers’ future-like performance accurately. The selection test may fail to predict the criterion because of external factors that can affect test results, such as test-takers’ poor physical condition or their activities outside the learning process, which can make them lose focus while taking the selection test or the final examination and choose wrong answers.
For FLDI officers and management in particular, the researcher suggests continually improving their capability in teaching and in constructing and administering tests, because even though the selection test has been shown to have predictive validity, it may become less relevant at another time because of external factors that can affect test results, as explained above.
70 Mochtar Kusuma, Evaluasi Pendidikan, Pengantar, Kompetensi dan Implementasi (Yogyakarta: Parama Ilmu, 2016), 53.
Brown, H. Douglas. Language Assessment: Principles and Classroom Practices, New York: Pearson Education ESL, 2004
Waugh, C. Keith and Gronlund, Norman E. Assessment of Student Achievement, Tenth Edition, California: Peachpit Press, 2012
Messick, Samuel and Linn, Robert L. (Ed.). Educational Measurement: Validity, New York: Macmillan, 1989
Hughes, Arthur. Testing for Language Teachers, Cambridge: Cambridge University Press, 2003
Fraenkel, Jack R., et al. How to Design and Evaluate Research in Education, New York: McGraw-Hill Education, 2011
Sugiono. Metode Penelitian Kuantitatif Kualitatif dan R&D, Bandung: ALFABETA CV, 2012
Ary, Donald, Lucy Cheser Jacobs, Chris Sorensen, and Asghar Razavieh. Introduction to Research in Education, USA: Wadsworth, Cengage Learning, 2010
Denscombe, Martyn. The Good Research Guide: For Small-Scale Social Research Projects, Fourth Edition, England: Open University Press, 2010
Taniredja, Tukiran and Mustafidah, Hidayati. Penelitian Kuantitatif (Sebuah Pengantar), Bandung: ALFABETA CV, 2014
Kusuma, Mochtar. Evaluasi Pendidikan, Pengantar, Kompetensi dan Implementasi, Yogyakarta: Parama Ilmu, 2016
Kerstjens, Mary and Nery, Caryn, 2000. “Predictive Validity in the IELTS Test: A Study of the Relationship between IELTS Scores and Students’ Academic Performance”, IELTS Research Reports Vol. 3, 2000
Dooey, Patricia and Oliver, Rhonda, 2002. “An Investigation into the Predictive Validity of the IELTS Test as an Indicator of Future Academic Success”, Prospect Vol. 17 No. 1, April 2002
Kurniawan, Irwan Nuryana and Fahmie, Arief, 2005. “Validitas Prediktif Ujian Penerimaan Calon Mahasiswa Universitas Islam Indonesia terhadap Indeks Prestasi Kumulatif Mahasiswa”, Fenomena Vol. 3 No. 1, March 2005
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. Standards for Educational and Psychological Testing, Washington, DC: American Educational Research Association, 1999
Zhang, Yan. Master Thesis: “Predictive Validity of TOEFL Scores on First Term’s GPA as the Criterion for International Exchange Students”, Vancouver: University of British Columbia, 1995
Adityawarman, M. Zakaria. Minithesis: “Predictive Validity of the College Entrance English Test of UINSA Surabaya toward Students’ Achievement of Teacher English Education Department (PBI)”, Surabaya: State University of Islamic Studies Sunan Ampel, 2016
Joppe, Marion. “The Research Process”, University of Guelph, School of Hospitality, Food and Tourism Management (https://www.uoguelph.ca/hftm/research-process, accessed on April 25, 2016)
Lembaga Pengembangan Bahasa Asing. “Profil Lembaga”, Festival Bahasa LPBA 2014 (https://festivalbahasa14.blogspot.co.id/p/lomba-lomba.html, accessed on 29 June, 2016)
Suzanti, Elvi. “Validitas Prediktiv Tes Kompetensi Berbahasa Indonesia Menurut Minat Belajar”, Badan Pengembangan dan Pembinaan Bahasa Kementrian Pendidikan dan Kebudayaan (http://badanbahasa.kemdikbud.go.id/lamanbahasa/produk/1319, accessed on 23 August, 2016)
Renisti Mudela, “Validitas Prediktif Skor Advanced Progressive Matrice (APM) dan Skor Skala Minat Pekerjaan Terhadap Prestasi Belajar Siswa: Studi Deskriptif Korelasional Skor Inteligensi (APM), Skor Skala Minat Terhadap Prestasi Belajar Siswa Kelas X dan XI, academic year 2013/2014”, UPI Digital Repository, Indonesia University of Education,