© Michael Harris and Paul McCann 1994. This page may be photocopied for use in the classroom.

2.4 Writing, administering and marking tests

As a teacher, you may often find that you are expected to write, administer and mark tests. All of these tasks can be time-consuming and we need to produce practical tests which can assess our students as reliably as possible in the time that we have available to us.

Writing tests

When we start writing tests we need to avoid confusion and ambiguity which may make our test unreliable or invalid. Having done an initial draft of the test we then need to check what we have done. Firstly, we should do the test ourselves to see if we can spot any changes that need to be made. Secondly, we can ask a colleague to do the test. What may be obvious to you may not be so obvious to another person, and a colleague should be able to advise you on other changes.

One way of writing tests which can make this process easier is to write them with colleagues from the beginning. First you should agree on what you want to test. Then you can either divide up the work (one person to do a listening test, another a reading test) or you can sit down and work on each test together. A collective approach to writing tests makes this difficult task easier and at the same time it provides you with an opportunity to compare ideas and attitudes. It is also an excellent time to discuss the criteria that you are going to use to assess performance in the tests. This of course assumes that time is available for these activities; you may have to compromise.

The first area to think about is that of writing instructions. It is very easy to concentrate only on the content of the test itself and to forget that students will need to know what to do during the test. Students only know what they have to do during the test by reading or hearing the instructions; this is often called the rubric.
You may have a test which appears to be practical and valid, but without a clear and concise rubric the test will soon lose its validity if students are unclear as to what is expected of them.

Clarity is essential in rubrics. The rubric should tell the students exactly what they have to do, how they have to do it and what the marker is looking for. If the language is unclear, students may not know what is expected of them and could fail to perform to their full capability in the test.

Consider which of the following rubrics is clearer, a or b:

a Listen to the tape and put the right answer to the questions.
b Listen to the tape and answer questions 1–10 below by putting a cross (X) in the correct box next to each question.

In the above cases, rubric b is clearer, as it tells students what they have to write and where they have to write it, whereas rubric a does not tell students exactly what they must do to complete the test.

Another important point is conciseness. The rubric should not be too long. If the rubric is so long that the student has to take in large amounts of information, then it is likely that he or she will concentrate on what to produce and where to write it rather than on getting the right answer itself. With unnecessarily long rubrics, the objective of measuring the student’s true performance, as far as this is possible, will be compromised.

Consider which of the following rubrics is more concise, a or b:

a Read the text which appears on the second page of the text booklet you have been provided with. Look at the answer box which has columns for information. Underneath the heading of each column, you will find spaces in which you should write your answer according to the text.
Use the words you hear in the text to fill in each of the columns. Sometimes, you may find that a column has no corresponding information in the text. In these cases, do not write anything in that column. Leave that column blank. All the other columns will require you to write some information.
b Read text number 7 and fill in the information in each column of the box next to the text. Write the exact word used in the text in the appropriate column. If there is no information for any particular column, leave the column blank.

In the above cases, b is more concise than a and has the advantage that students do not have to spend time trying to decipher the meaning of a long rubric such as a whilst they are attempting to do the test. One way of avoiding any potential problems can be to put the test rubric in the students’ mother tongue.

When you are checking the tasks themselves you also need to think about background knowledge. It should be impossible to answer any questions correctly without reading or listening to a text. Students should not be able to use their knowledge of the world or background knowledge of certain subject areas to answer the questions.

Consider the following item, taken from a reading test. Can you answer this question?

When was the Boeing 747 Jumbo Jet first built?
a the 1950s
b the 1970s
c the 1990s

Many students may be able to answer this question without referring to the reading text in which the answer appears, as a result of their knowledge of the subject area. Therefore, those students who happened to know when the Boeing 747 was built would have an unfair advantage over those students who did not happen to know the date.

It is also important to look out for any kind of cultural bias when writing or checking test questions. No items in your test should depend on specific knowledge of certain cultures or customs.
The test should not require the student to demonstrate knowledge of a particular culture.

Consider the following item, taken from a reading test:

The Smiths are a typical English working class family and they have their meals at normal times. They have their evening tea when they get back from work.
What time do the Smiths have their evening tea?
a 3pm
b 6pm
c 8pm

This question tests knowledge of customs in Britain, rather than ability to read. You cannot answer the question if you do not know what time people eat and when they normally finish work in Britain.

One of the most important considerations is the content of the test. The test should reflect what the students have been doing in class, ie it should accurately reflect the syllabus and design of the course in terms of content and format. For example, when writing a progress test after the first seven units of a course, it would be unwise to base the test on the first two units alone. A test at the end of the course should attempt to reflect the whole course or year as far as possible, ie the test should sample as widely and unpredictably as possible from the content of the course, bearing in mind time constraints. Therefore, when you are checking your test it is worth asking yourself how well it reflects what you have been doing with your students.

All items in your test should also be relevant in terms of real world language use. The task which students are expected to perform should correspond as closely as possible to some use of the language in the real world. The key here is to make the task appear to the students as something which they might actually have to do with the language.

Consider the following item, taken from a writing test.
Is this a task that students might have to do in the real world?

Composition: Write a short composition (about 250 words) about what you did in your summer holidays. Include details about the journey, the place and the accommodation where you stayed.

Without searching too far for potential relevance, it is difficult to think of a context in the real world in which a language learner might need to produce this kind of written work. The same type of production might be elicited in a more authentic manner by setting the written piece in the context of, for example, answering a letter from a penfriend asking about the summer holidays.

Time is also very important and your test should not place undue pressure on the students in terms of time needed to complete the tasks set. In reading tests, students should not be required to answer items in a limited time which would be completely unrealistic for a reader in real life. Time should be allowed to read the text, read the items and answer the items, including a time allowance for re-reading for clarification and possible amendment to answers. In listening tests, students should not be required to answer too many items in a short period of time, especially if the listening text gives answers to items in rapid succession, perhaps not even giving time to note down required answers. In writing tests, students should not be required to produce a given number of words in an unrealistic time limit, eg a letter to a prospective employer in 5-10 minutes. Time should be given for planning, writing, reading and possible re-writing.
In speaking tests where there is an interlocutor, unrealistic time pressures on answers and oral production should be avoided, eg hurrying students into an answer with no time for reflection, checking understanding or asking for repetition.

Finally, when you are writing your tests you should also take into account practical administrative factors. For example, you should not write tests that require a lot of photocopying if this is difficult and expensive to do in your situation. You should also avoid using equipment, like video recorders, which is difficult for you to obtain.

Checking tests

Look at the test items below. Identify the problems in the test items. Use this checklist to help you:

• Is there more than one possible answer?
• Is there no correct answer?
• Is there enough context provided to choose the correct answer?
• Could a test-wise student guess the answer without reading or listening to the text?
• Does it test what it says it is going to test, or does it test something else?
• Does it test the ability to do puzzles, or IQ in general, rather than language?
• Does it test students’ imagination rather than their linguistic ability?
• Does it test students’ skills or content knowledge of other academic areas?
• Does it test general knowledge of the world?
• Does it test cultural knowledge rather than language?
• Are the rubrics (instructions) clear and concise? Is the language in the instructions more difficult than that in the text?
• Will it be very time-consuming to mark and difficult to work out scores?
• Are there any typing errors that make it difficult to do?
Speaking
1 Interviewer: ‘Right, now María what do you think about cricket?’
2 Interviewer: ‘Can you tell me the names of the animals in the picture?’
3 Interviewer: ‘What do you think about the political situation in South Africa?’

Listening
4 Listen to the talk and choose the correct answer:
a All elephants live in Africa.
b Elephants live in South America.
c Elephants live in Africa and India.
5 Listen to the dialogue and answer the questions. You must write complete sentences (remember adverbs of frequency).
Example: What is the weather like in Sydney in June? In June it is rainy and sometimes it is cold.
You get three marks for each correct sentence, one mark for each adverb of frequency used and half for each word you use related to weather.

© Michael Harris, Paul McCann 1994, Macmillan Publishers Ltd

Writing
6 Write a ghost story (200 words). Try to make it dramatic and exciting
7 Write instructions for a scientific experiment. Draw diagrams to illustrate it.
8 Write a report of an interview with a famous pop star. Use reported statements and questions.
Example: I asked him about his family and he told me that ...

Reading
9 Read the text below and answer this question: What did the captain hear?
a A sheep or goat.
b A strange noise from the sea.
c Somebody in pain.
It was a dark and misty night. The captain was sleeping in his cabin and everything was quiet. Suddenly, the captain woke up. He heard a strange noise coming from the deck. It sounded like an animal in pain, maybe a sheep or a goat. He got dressed and cautiously climbed the ladder.
10 Complete the sentences with one of these words: a however b but c in spite of d although
They continued playing football .............. it was raining

Vocabulary
11 What are these words? nsyun ureuf nitr angroe dolc
12 Which is the odd one out?
a basketball b table tennis c ice-hockey d cricket

Grammar
13 Complete the sentence with ‘will’ or ‘going to’:
Tomorrow I think it ..................rain.
14 Choose the alternative that is closest in meaning to the word which is underlined:
I’ve just finished it.
a A minute ago.
b Yesterday.
c A week ago.
15 Marking scheme:
Exercise A: 1 = 2.5, 2 = 2, 3 = 3.25, 4 = 1.75, 5 = 2, 6 = .5
TOTAL = 12

Administering tests

As we pointed out in the introduction to this book, being tested as a student and testing learners as a teacher can be traumatic. Therefore we need to do what we can to reduce tension. At the same time, we need to make sure that formal assessment does take place under test conditions, ie that students cannot copy or help each other. In normal classroom conditions we want to encourage co-operation, but when carrying out formal assessment we need to do the opposite, to make sure that we are testing the performance of each individual learner. We thus need to reduce to a minimum any cheating that might go on in our classroom.

The first thing to consider is the place. Most of the time we give our tests in our own classrooms. Before giving out the test papers it is a good idea to try to separate students as much as possible, by moving desks or placing students around the room. If it is impossible to do this it may be worth trying to move to another room. If you do this, it is worth choosing a place where there will be a minimum of interruptions and outside background noise.

Time also has to be considered. Firstly, you should tell students how long they have at the beginning as well as writing it on the paper.
In your own classroom tests you can be relatively flexible and give students a bit longer if you see that they are having problems. If it is a school test it is more important to avoid any unfair advantage for some students who might benefit from more time.

It is important to have materials well prepared in advance. Make sure that before you administer the test you have all the necessary printed material and that there are adequate supplies of all test papers, maps, charts or any other printed matter for the number of students who will be taking the test. Make sure that photocopying is of a satisfactory quality and that no errors have been made in printing and preparation. If there are any errors, eg spelling mistakes or repeated questions, inform students before the test begins.

In addition, make sure that you have checked any recorded material before the test is to be administered. Sometimes, you may find that there is nothing you can do about a certain problem, eg poor quality of recording or varying sound recording levels. In these cases, students should simply be warned. However, all attempts should be made to remedy the problem before the next administration of the test. Electrical equipment also needs to be checked beforehand. Make sure that any audio equipment is adequate in terms of sound quality and acoustics in the room where the test is to be administered.

Your students will need to be prepared for the test. If you have short and regular progress tests they will be much less worried than if they have fewer but more important tests. Tell students in advance that you will be giving them a test and at what time they are expected to arrive, what time the test will start and what time the test will finish.
Students should also know what materials they need to bring with them to the test, eg pens, pencils, erasers etc. If dictionaries are to be used in any test, students should be told to bring their own copy, preferably of the same edition, so that no student has an unfair advantage.

Tell students about the test conditions. For example, there should be no talking and if they wish to ask a question they should raise their hand. Students should also be seated in such a way that they cannot copy answers from a neighbour or communicate answers to a neighbour. You can also tell them what will happen if they are caught cheating; for example, they might be given a zero score or even face disciplinary action.

If you are administering a test with your colleagues you need to agree on conditions. Decisions should be taken beforehand about what to do in cases of students arriving late, students copying, students finishing before the allocated time etc. Many of these decisions will differ from institution to institution and may depend on policy and internal rules and regulations.

If you or your colleagues are to act as interlocutors in speaking tests, it is essential that everybody agrees on how to act as interlocutor and what is to be expected of students in the test. All those who are to act as interlocutors should hold meetings before the test to agree on criteria for performance and to practise using the test materials.

Marking tests

Marking is one of the most time-consuming parts of many teachers’ jobs. As we suggested earlier in the sub-section on planning assessment programmes, short and regular assessment tasks spread the test marking load over the whole term.
This not only avoids stress and exhaustion on your part, but it should also mean that you will be able to mark more accurately and reliably.

We have also mentioned the need to consider marking time when choosing test formats. With very small classes we can afford to have more labour-intensive test formats, for example open compositions and oral interviews. If we have very large classes we will need to choose formats, like multiple-choice questions, that will enable us to mark large numbers of tests in a short period of time.

On the one hand we have discrete item or objective tests. These tests are so called because of the way they are marked. An objective test could in theory be marked by any person capable of interpreting and applying a marking key which gives the correct answers, which are unique and not negotiable. An example would be a template or mask as often used to mark multiple-choice questions. For example, the only possible answer to question 6 is option b. The person marking the test will simply apply the template or marking key and will be able to total the correct answers, which will give a raw score. In this sense, objective testing can be considered as marking by counting.

Objective tests are easily and quickly marked by non-specialists. However, they can be difficult to write so that they are reliable, eg in the case of the multiple-choice question paper it is often difficult to devise suitable and plausible distracters and the guessing factor is also very high. Another disadvantage arises where many variants of a similar answer would be a suitable and correct answer to an item.
For example, in a listening test, what is an acceptable answer to the question ‘What does the man want?’ The answer key might state that the correct answer is ‘He wants to buy petrol’. But would ‘Buy petrol’, or ‘Petrol – buy’ or ‘petrol’ also be acceptable answers? This would depend on the test designer, who would be responsible for ensuring that answer keys contained all possible answers.

Subjective tests, as opposed to objective tests, are not based on counting, but depend on somebody’s opinion, a judgement, a decision about candidate performance. The person who is to make the judgement is expected to be qualified to make that judgement, eg you as a language teacher could make judgements about oral performances of students in a speaking test. On the other hand, many of the people who may be perfectly qualified to mark your objective, multiple-choice listening test may not be suitable for use as raters of oral performance.

Subjective tests can provide a wide sample of students’ language in a relatively short time – think of how much your students could actually say in 15 minutes. They can be objectivised by using rating scales which outline a description of what each point on a scale means, eg 5 = the ability to ... .

Subjective tests may take up a lot of your time. For example, if a class of forty students can be tested at the same time using a forty-minute reading test, this is much more practical in terms of time than holding forty interviews of fifteen minutes each (ten hours of interviewing in total) or marking forty written compositions.

Reliability of raters is the greatest problem area. The key question is how to ensure that different raters apply the scales in the same way. This is called inter-rater reliability.
Another question is how to ensure that the same rater will apply the scales in the same way on different days or at different times of the day. This is called intra-rater reliability. These issues can be addressed by rating workshops and training packages, although this implies more time spent by teachers and any other person expected to carry out rating of tests.

It must be pointed out here that objectively marked tests are not to be considered as good and subjectively marked tests as bad. They are simply different ways of marking.

When you are marking a test that you have written yourself it is a good idea to define the answer key beforehand, especially when other teachers are going to mark the test. The key should be easy to use and leave no doubt in the marker’s mind as to what is a correct and incorrect answer. This can be a relatively simple process in the case of multiple-choice items or rather more complicated in the case of open-ended questions. With open-ended questions, ensure that the answer key covers all possible answers. After producing your answer key, it is a good idea to show it to a colleague if possible. Your colleague could check the answers for possible errors, and for additions or deletions to be made. After that, you should agree on the answer key with all those who will be using it to mark the test.

It is also a good idea to work out how you are going to distribute marks before administering the test. It can be very useful to write the marking scheme on the test paper itself, so that the students also know how much each section is worth. When other teachers are going to use the test you will need to produce a marking scheme.
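
As a rough illustration of the answer key and marking scheme described above, the following Python sketch marks an objective test by counting. The item numbers, the lists of acceptable answers and the mark values are invented for the example and do not come from any test in this book. The rule for tidying up answers (ignoring capital letters, surrounding spaces and a final full stop) is likewise only an assumption about what a group of markers might agree beforehand.

```python
# Hypothetical answer key: each item lists every answer the markers have
# agreed to accept, plus the whole-number marks the item is worth.
ANSWER_KEY = {
    1: {"accept": {"b"}, "marks": 1},
    2: {"accept": {"he wants to buy petrol", "buy petrol", "petrol"}, "marks": 2},
    3: {"accept": {"c"}, "marks": 1},
}


def normalise(answer: str) -> str:
    """Lower-case, trim, and drop a final full stop (an assumed convention)."""
    return answer.strip().lower().rstrip(".").strip()


def mark_paper(responses: dict[int, str]) -> tuple[int, int]:
    """Return (raw score, maximum score) for one student's responses."""
    score = 0
    for item, entry in ANSWER_KEY.items():
        given = normalise(responses.get(item, ""))
        if given in entry["accept"]:
            score += entry["marks"]
    maximum = sum(entry["marks"] for entry in ANSWER_KEY.values())
    return score, maximum


if __name__ == "__main__":
    # One student's paper: item 3 is wrong, item 2 is an accepted variant.
    student = {1: "B", 2: "Buy petrol.", 3: "a"}
    print(mark_paper(student))  # (3, 4)
```

Keeping every accepted variant in the key, and using whole-number marks, means that any marker can arrive at the same raw score by simple counting.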
The marking scheme should be easy to use and should leave no doubt in the marker’s mind as to how many points each item is worth. Markers should not be expected to perform complicated mathematical calculations to arrive at a student’s final score. Look at the example of a marking key on page 48.

Subjective tests and rating

For subjective tests we need to look at rating, ie where results are based on somebody’s opinion about candidate performance, eg in an oral test or a written composition. We looked at the subject in the chapter on informal assessment, particularly in the sub-chapters on speaking and writing. However, for formal tests it is not only important to establish criteria. If more than one teacher is administering the test, we will need to agree on interpretation of rating criteria.

We have already mentioned the two kinds of rating scales which we can use. Rating scales can either contain descriptions of all activities within one level or can break down the activities into separate scales and provide descriptions for each activity at each level. The first is called holistic rating and the second is called analytic rating (see page 13).

The advantage of holistic rating for testing is that raters can internalise the descriptions in a relatively short period of time, eg after practice with a few sample performances. This system is therefore practical and quick to administer. The disadvantage is that student performances can often cut across the descriptions, eg one activity may belong to level 3 and another activity to level 4. However, raters are trained in all cases to choose the closest description of the performance.

The advantage of analytic rating is that raters may find it easier to assign a certain level using simplified and discrete scales.
The disadvantage is that it will probably be less practical than holistic rating in terms of time, paper and training. Look at page 46 for an example of oral rating scales.

When you are assessing your students in this way, it is important to achieve intra-rater reliability – to make sure that you rate them consistently. In an attempt to increase our own intra-rater reliability we can look at a piece of students’ work and then look at it again two weeks later. Then we can compare assessments and if there are differences, think about why and where we went wrong.

An important factor when more than one teacher is marking a test is inter-rater reliability – to ensure that all raters assess in the same way and that all raters agree on the interpretation and meaning of the descriptions in the rating scales. The objective is to minimise the possibility of a student’s mark being affected by the rater who assesses their performance. In an attempt to maximise inter-rater reliability, you could hold meetings with your colleagues to discuss rating scales, and samples of student performances could be provided. Raters could discuss the sample performances, rate them, and thus see if they are applying the same criteria. Written performances should be easy to supply. In the case of oral performances, these could be recorded on audio or video tape.

Inter-rater reliability activity

Written performance
• Look at the example of the writing assessment task on page 39.
• Then look at the criteria for marking it on page 39.
• Give copies of three answers on page 58 to one or more of your colleagues. Also give them a copy of the marking criteria.
• Read the letters yourself and rate them.
• Ask your colleagues to use the marking criteria to rate the performances.
• Compare your results and if there are any discrepancies, discuss the performances and the rating scales. Try to come to an agreement about your rating of the compositions. Then compare your marks with those on page 62.
• The next time you are marking written performance from a test, do the same activity with your colleagues.

By doing activities like these, you will achieve a much greater degree of reliability for tests at your school. You will also need to work out criteria that are clear for both you and your colleagues.

Inter/intra-rater reliability activity

• Look at the example of a paired oral test on page 45.
• Give the test to students in your class and record some of the discussions onto audio tape. Make sure that the tapes are labelled with student names. You might want to use numbers rather than names for increased reliability of the activity.
• Assess the student performances using the rating scales given on page 46 and keep a record of the marks assigned.

Inter-rater reliability
• Arrange a meeting with one or more of your colleagues and explain that you want to see if you can agree on the level of student performances and whether the rating scales are easy to use.
• Give your colleagues the oral test material and the rating scales and give them time to familiarise themselves with the task.
• Play a selection of student performances (perhaps 3 or 4) including, if possible, what in your opinion is a clear pass, a clear fail and a borderline case.
• Ask your colleagues to use the rating scales to rate the performances.
• Compare their ratings with your original results and if there are discrepancies, discuss the performances and the rating scales.
  Try to come to an agreement on the criteria to be used for each descriptor in the rating scales. Don’t worry if there are minor differences in rating – these are to be expected.

Intra-rater reliability
• Keep the recordings and original marks in a safe place and, if you have time, set aside an hour or so one day about two or three weeks after you administered the oral test.
• Play the recordings again and rate the performances again, in a random order, using the same rating scales. Do not refer to the original marks.
• When you have finished, compare your new ratings with the original marks.
• If the marks are the same, you appear to be quite a reliable rater within your own performance. If the marks are a little different, don’t worry – this might be expected. If the marks are very different, think about reasons why this might be so – are you stricter or more lenient? Do you think that perhaps the rating scales are confusing? How could the problem be solved?
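
The comparison steps in the activities above come down to simple arithmetic, which can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the student labels and marks are invented, the ratings are assumed to be whole numbers on a single scale, and treating a difference of one point as a "minor difference" is a choice made for the example rather than a rule from this book. The same function can compare your marks with a colleague's (inter-rater) or your original marks with your later ones (intra-rater).

```python
def compare_ratings(first: dict[str, int], second: dict[str, int]) -> dict[str, float]:
    """Summarise how closely two sets of ratings for the same students agree."""
    shared = sorted(set(first) & set(second))
    if not shared:
        raise ValueError("no students were rated in both sets")
    diffs = [second[s] - first[s] for s in shared]
    n = len(diffs)
    return {
        # Proportion of students given exactly the same mark both times.
        "exact": sum(d == 0 for d in diffs) / n,
        # Proportion whose two marks differ by at most one point (assumed threshold).
        "within_one": sum(abs(d) <= 1 for d in diffs) / n,
        # Average change; positive means the second set gave higher marks.
        "mean_shift": sum(diffs) / n,
    }


if __name__ == "__main__":
    # Hypothetical marks: original ratings and ratings given two weeks later.
    original = {"S1": 4, "S2": 2, "S3": 3, "S4": 5}
    later = {"S1": 4, "S2": 3, "S3": 3, "S4": 3}
    print(compare_ratings(original, later))
    # {'exact': 0.5, 'within_one': 0.75, 'mean_shift': -0.25}
```

A low "within_one" figure, or a mean shift well away from zero, would be a prompt to discuss the performances and the wording of the rating scales, not a verdict on the rater.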

2.5 Results from formal assessment