A Comparison of Different Designs in Scoring of PISA 2009 Reading Open Ended Items According to Generalizability Theory

Alkan, Meral; Doğan, Nuri

doi:10.21031/epod.1210917


Meral ALKAN (Gazi Üniversitesi, Ankara, Türkiye)

Nuri DOĞAN (Hacettepe Üniversitesi, Eğitim Fakültesi, Ankara, Türkiye)

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

Year: 2023; Volume: 14; Issue: 2; Pages: 106–117; Language: English; DOI: 10.21031/epod.1210917; Index Date: 07-07-2023

Abstract:

Using generalizability theory, this study compares the designs obtained when four raters score the open-ended items of the PISA 2009 reading literacy assessment either all together or alternately. The sample consisted of 362 of the 4,996 students who participated in PISA 2009; these students answered the reading items and were scored by more than one rater. Two designs were specified for the generalizability analyses. The first was the crossed design, denoted s × i × r (student × item × rater), in which every rater scores every student on the same skills. The second was the nested design, denoted (r:s) × i, in which each rater scored only a group of students, so that raters are nested within students and items are crossed with both of these facets. Compared with the s × i × r design, the (r:s) × i design yielded smaller estimated relative and absolute error variances and, consequently, larger G and Phi coefficients. In the D studies, increasing the number of raters raised the G and Phi coefficients in both designs. For Booklet 2, acceptable G and Phi coefficients were still reached when the number of raters was halved, whereas for Booklet 8, increasing the number of raters appeared more appropriate.
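The comparison above rests on which variance components enter the relative error variance (used for the G coefficient) and the absolute error variance (used for Phi) in each design. Below is a minimal Python sketch, assuming the standard random-effects D-study formulas of generalizability theory. It shows how G and Phi could be computed for the s × i × r and (r:s) × i designs and how they change as the number of raters n_r is varied. The class names, function names and every numeric value are hypothetical illustrations, not the authors' procedure or their PISA 2009 estimates.

```python
from dataclasses import dataclass


@dataclass
class CrossedComponents:
    """Variance components for the crossed s x i x r design (hypothetical values)."""
    s: float      # students, the object of measurement
    i: float      # items
    r: float      # raters
    si: float     # student x item
    sr: float     # student x rater
    ir: float     # item x rater
    sir_e: float  # student x item x rater, confounded with residual error


@dataclass
class NestedComponents:
    """Variance components for the nested (r:s) x i design (hypothetical values)."""
    s: float       # students
    i: float       # items
    r_s: float     # raters within students (confounds rater and student x rater)
    si: float      # student x item
    ir_s_e: float  # item x rater within students, confounded with residual error


def crossed_coefficients(vc: CrossedComponents, n_i: int, n_r: int) -> tuple[float, float]:
    """Return (G, Phi) for a crossed D study with n_i items and n_r raters."""
    # Relative error: only effects that interact with students.
    rel = vc.si / n_i + vc.sr / n_r + vc.sir_e / (n_i * n_r)
    # Absolute error also includes the item, rater and item x rater effects.
    abs_err = rel + vc.i / n_i + vc.r / n_r + vc.ir / (n_i * n_r)
    return vc.s / (vc.s + rel), vc.s / (vc.s + abs_err)


def nested_coefficients(vc: NestedComponents, n_i: int, n_r: int) -> tuple[float, float]:
    """Return (G, Phi) for a nested D study with n_i items and n_r raters per student."""
    # Relative error: every effect that involves students.
    rel = vc.r_s / n_r + vc.si / n_i + vc.ir_s_e / (n_i * n_r)
    # Absolute error adds only the item main effect in this design.
    abs_err = rel + vc.i / n_i
    return vc.s / (vc.s + rel), vc.s / (vc.s + abs_err)


if __name__ == "__main__":
    # Hypothetical variance components, NOT estimates from the PISA 2009 data.
    crossed = CrossedComponents(s=0.30, i=0.10, r=0.02, si=0.20, sr=0.03, ir=0.01, sir_e=0.15)
    nested = NestedComponents(s=0.30, i=0.10, r_s=0.05, si=0.20, ir_s_e=0.16)
    for n_r in (1, 2, 4, 8):  # D study: vary raters, hold items fixed at 10
        g_c, phi_c = crossed_coefficients(crossed, n_i=10, n_r=n_r)
        g_n, phi_n = nested_coefficients(nested, n_i=10, n_r=n_r)
        print(f"n_r={n_r}: crossed G={g_c:.3f} Phi={phi_c:.3f} | "
              f"nested G={g_n:.3f} Phi={phi_n:.3f}")
```

Because the inputs are invented, the printed values say nothing about which design performed better in the study. In practice, each design's components would be estimated in its own G study and then substituted into these formulas, which is why the two designs can rank differently on real data.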

Keywords: Generalizability theory, reliability, G study, D study, PISA 2009

Document Type: Article; Article Type: Research Article; Access Type: Open Access

APA	Alkan, M., & Doğan, N. (2023). A comparison of different designs in scoring of PISA 2009 reading open ended items according to generalizability theory. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 14(2), 106–117. https://doi.org/10.21031/epod.1210917