Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması

DOĞAN, Nuri; BIKMAZ BİLGEN, Özge

Özge BIKMAZ BİLGEN (Adnan Menderes University, Faculty of Education, Department of Basic Education, Aydın, Turkey)

Nuri DOĞAN (Hacettepe University, Institute of Educational Sciences, Division of Measurement and Evaluation in Education, Ankara, Turkey)

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

Year: 2017 Volume: 8 Issue: 1 Pages: 63-78 Text Language: Turkish Index Date: 29-07-2022

Abstract:

This study examined how the type of scoring rubric and the number of raters affect the results produced by techniques used to determine rater reliability. The study group consisted of 50 students and the 10 teachers who scored their work. In this descriptive study, rater reliability was determined with the Kappa statistic, log-linear analysis, and Krippendorff's alpha. To examine the effect of the number of raters on rater reliability, the level of agreement among two, five, and ten raters was calculated with each of the three techniques. The results from all three techniques showed that, for scores obtained with the analytic rubric, increasing the number of raters lowered the level of reliability. With all three techniques, the highest reliability values were obtained with two raters, and reliability fell as the number of raters increased. When the categories making up the analytic rubric were examined, rater agreement was found to vary across categories depending on how objective each category was. The Kappa and Krippendorff's alpha techniques gave parallel results; however, Krippendorff's alpha was less affected than Kappa by changes in the number of raters. Log-linear analysis provided more comprehensive information, showing the interactions among variables and the source of disagreement.
In conclusion, when more detailed measurement results are sought, it is considered appropriate to analyze scores collected with an analytic rubric made up of subcategories using log-linear analysis, which is suited to categorical data; when more general measurement results are sought, it is considered appropriate to analyze scores obtained with a holistic rubric using Krippendorff's alpha.

Subjects: History

The Comparison of Interrater Reliability Estimating Techniques

Abstract:

This study analyzes how the number of raters and the type of rubric affect the results of techniques used to estimate interrater reliability. The study group consisted of 50 students and the 10 teachers who rated them. In this descriptive study, the Kappa statistic, log-linear analysis, and Krippendorff's alpha were used to determine rater reliability. To investigate the effect of the number of raters, the level of agreement among 2, 5, and 10 raters was calculated with each of the three techniques. The findings from all three techniques showed that the analytic rubric produced much more reliable ratings than the holistic rubric. All three techniques also yielded their highest reliability values with two raters, and reliability decreased as the number of raters increased. Examination of the categories that make up the analytic rubric showed that rater agreement varied with how objective each category was. The Kappa statistic and Krippendorff's alpha yielded similar results, although Krippendorff's alpha was less affected by the number of raters. Log-linear analysis, on the other hand, provided more comprehensive information by showing the source of disagreement and the interactions among the variables. In conclusion, when the purpose is to obtain more detailed measurement results, analyzing scores from an analytic rubric composed of subcategories with log-linear analysis is considered more appropriate; when the purpose is to obtain more general results, analyzing scores from a holistic rubric with Krippendorff's alpha is considered more appropriate.
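As a rough illustration of two of the agreement coefficients named in the abstract, the sketch below computes Cohen's kappa for a pair of raters and Krippendorff's alpha for nominal data with any number of raters. It is not the authors' analysis: the function names and the input layout are assumptions made for this example, every item is assumed to be scored by every rater, and log-linear modeling is not covered.

```python
from collections import Counter
from itertools import permutations


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items on a nominal scale.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. both raters use a single category for every item.
    """
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists (hypothetical layout): each inner list
    holds the scores one item received from all of its raters.
    Undefined when only one category occurs in the data.
    """
    # Coincidence matrix: every ordered pair of scores given to the same
    # item by different raters, weighted by 1 / (raters on that item - 1).
    coincidences = Counter()
    for scores in units:
        m = len(scores)
        if m < 2:
            continue  # an item scored once carries no agreement information
        for a, b in permutations(scores, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and the grand total of pairable scores.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (nominal metric: any mismatch counts 1).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    return 1 - observed / expected
```

To mirror the two-, five-, and ten-rater comparison described above, one could pass the first 2, 5, or 10 scores of each item to `krippendorff_alpha_nominal` and compare the resulting values.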

Document Type: Article Article Type: Research Article Access Type: Open Access

APA	BIKMAZ BİLGEN Ö, DOĞAN N (2017). Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 8(1), 63-78.
Chicago	BIKMAZ BİLGEN Özge, DOĞAN Nuri. Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi 8, no. 1 (2017): 63-78.
MLA	BIKMAZ BİLGEN Özge, DOĞAN Nuri. Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, vol. 8, no. 1, 2017, pp. 63-78.
AMA	BIKMAZ BİLGEN Ö, DOĞAN N. Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi. 2017; 8(1): 63-78.
Vancouver	BIKMAZ BİLGEN Ö, DOĞAN N. Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi. 2017; 8(1): 63-78.
IEEE	BIKMAZ BİLGEN Ö, DOĞAN N, "Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması," Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 8, pp. 63-78, 2017.
ISNAD	BIKMAZ BİLGEN, Özge - DOĞAN, Nuri. "Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması". Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi 8/1 (2017), 63-78.