A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT

Year: 2020 Volume: 7 Issue: 3 Pages: 1126 - 1141 Text Language: English Index Date: 14-11-2020

Abstract:
Because it is a multi-dimensional process, the assessment of speaking skills in foreign language testing has always had both pros (testing learners’ speaking skills doubles the validity of any language test) and cons (many test-relevant and test-irrelevant variables interfere). Exploring grader behaviour while learners’ speaking skills are scored is therefore necessary not only for inter- and intra-rater reliability estimation but also for identifying potentially stringent and lenient graders in the rater group, so that graders can be matched appropriately when paired-rater scoring or cross-marking is preferred to increase objectivity. In this exploratory study, conducted in 2019, 6 expert speaking graders scored the speaking interviews of 24 English language learners from video recordings; each student completed an individual task and a pair discussion task. Many-Facet Rasch Measurement (MFRM) was used to explore the scoring behaviour of the expert graders in terms of stringency and to find out whether their grading habits significantly affect language learners’ overall speaking performance scores. The results showed that the graders differed significantly from one another in their scores and that some of them scored so leniently or so stringently that learners’ speaking grades might be significantly affected.
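
The abstract names Many-Facet Rasch Measurement but does not give the model specification. As a minimal sketch, assuming the common rating-scale form in which the log-odds of receiving category k rather than k-1 equal learner ability minus task difficulty minus rater severity minus the category threshold, the effect of rater stringency on an expected score can be illustrated as follows. The function names, the choice of facets (learner, task, rater) and all numeric values are hypothetical and are not taken from the study.

```python
import math


def category_probabilities(ability, task_difficulty, rater_severity, thresholds):
    """Return P(score = k) for k = 0..len(thresholds) under a rating-scale
    many-facet Rasch model (assumed form, not the study's exact specification)."""
    logit = ability - task_difficulty - rater_severity
    # Category 0 has log-numerator 0; each higher category adds (logit - tau_k).
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + logit - tau)
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]


def expected_score(ability, task_difficulty, rater_severity, thresholds):
    """Model-expected rating for one learner, task and rater."""
    probs = category_probabilities(ability, task_difficulty, rater_severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical values in logits: one learner, one task, a 0-3 rating scale.
    thresholds = [-1.0, 0.0, 1.0]
    for label, severity in [("lenient", -1.0), ("average", 0.0), ("stringent", 1.0)]:
        score = expected_score(0.5, 0.0, severity, thresholds)
        print(f"{label:9s} rater: expected score {score:.2f}")
```

Under this form, the same learner on the same task receives a lower expected score from a more stringent rater, which is the kind of rater effect the study set out to detect.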
Keywords:

Document Type: Article Article Type: Research Article Access Type: Open Access
APA AKAY M (2020). A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT. IOJET, 7(3), 1126 - 1141.
Chicago AKAY Murat POLAT Emel A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT. IOJET 7, no.3 (2020): 1126 - 1141.
MLA AKAY Murat POLAT Emel A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT. IOJET, vol.7, no.3, 2020, pp.1126 - 1141.
AMA AKAY M A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT. IOJET. 2020; 7(3): 1126 - 1141.
Vancouver AKAY M A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT. IOJET. 2020; 7(3): 1126 - 1141.
IEEE AKAY M "A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT." IOJET, 7, pp.1126 - 1141, 2020.
ISNAD AKAY, Murat POLAT Emel. "A RASCH ANALYSIS OF RATER BEHAVIOUR IN SPEAKING ASSESSMENT". IOJET 7/3 (2020), 1126-1141.