Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması

GÜNDEĞER, Ceylan; DOĞAN, Nuri

doi:10.21031/epod.401077

Ceylan GÜNDEĞER (Hacettepe Üniversitesi, Eğitim Fakültesi, Ankara, Türkiye)

Nuri DOĞAN (Hacettepe Üniversitesi, Eğitim Fakültesi, Ankara)

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

Year: 2018 Volume: 9 Issue: 2 Pages: 161 - 177 Text language: Turkish DOI: 10.21031/epod.401077 Index date: 11-01-2019

Öz:

Bu çalışmada Bireyselleştirilmiş Bilgisayarlı Sınıflama Testleri’nin (BBST) etkililiğinin sınıflama kriterlerine, madde seçme ve yetenek kestirim yöntemlerine göre nasıl değiştiğinin belirlenmesi amaçlanmıştır. Bu amaçla 3 Parametreli Lojistik Model temel alınmış; belirlenen kesme noktası ve etrafında yüksek bilgi verecek şekilde 500 maddelik bir havuz oluşturulmuş; birey yetenekleri (N(0,1)) 3000 kişi üzerinden türetilmiş ve bireylerin madde cevap örüntüleri R yazılımında rasgele türetilmiştir. Sınıflama kriterlerinden Ardışık Olasılık Oran Testi (AOOT), Genelleştirilmiş Olabilirlik Oranı (GOO) ve Güven Aralığı (GA) yöntemleri; yetenek kestirim yöntemlerinden Beklenen Sonsal Dağılım (BSD) ve Ağırlıklandırılmış Olabilirlik Kestirimi (AOK) yöntemleri; madde seçme yöntemlerinden ise kesme noktasında (KN) ve kestirilen yetenek (KY) temelinde Maksimum Fisher Bilgisi (MFB) ve Kullback-Leibler Bilgisi (KLB) yöntemleri çaprazlanarak 48 koşul oluşturulmuştur. R yazılımında yürütülen BBST simülasyonu sonunda, ortalama test uzunluğu (OTU), ortalama sınıflama doğruluğu (OSD), bireylerin gerçek yetenek düzeyleri ile kestirilen yetenek düzeyleri arasındaki korelasyon (r), yanlılık, RMSE ve ortalama mutlak hata (OMH) değerlerinin 25 tekrara ait ortalamaları hesaplanmıştır. Araştırma sonuçlarına göre test etkililiği bakımından GOO ve GA yöntemlerinin; ölçme kesinliği bakımından ise AOOT’nin daha iyi performans gösterdiği; sınıflama kriterlerinin farksızlık bölgesi genişledikçe veya hata düzeyi değeri küçüldükçe test etkililiğinin arttığı; sınıflama kriterlerinin tümünün her koşulda oldukça yüksek düzeyde sınıflama doğruluğuna sahip olduğu belirlenmiştir. Bireylerin gerçek yetenek düzeyleri ile kestirilen yetenek düzeyleri arasındaki korelasyon bakımından BSD ve AOK yetenek kestirim yöntemlerinin her ikisinin de başarılı kestirimlerde bulundukları ancak ölçme kesinliği bakımından BSD’nin daha iyi performans sergilediği; madde seçme yöntemlerinin ise tümünün birbirine benzer çalıştığı ancak MFB-KY’nin tüm bağımlı değişkenler açısından tüm koşullarda daha iyi performans gösterdiği görülmüştür.

Subjects: History

A Comparison of Computerized Adaptive Classification Test Criteria in Terms of Test Efficiency and Measurement Precision

Abstract:

This study aimed to determine how the efficiency of Computerized Adaptive Classification Testing (CACT) changes with the classification criterion, the item selection method, and the ability estimation method. To this end, a pool of 500 items was generated under the three-parameter logistic model (3PLM) so that it is highly informative at and around an arbitrary cut-point; abilities for 3000 individuals were drawn from a standard normal distribution (N(0,1)); and item response patterns were generated randomly in R by Monte Carlo simulation. Three classification criteria (Sequential Probability Ratio Test, SPRT; Generalized Likelihood Ratio, GLR; Confidence Interval, CI), two ability estimation methods (Expected a Posteriori, EAP; Weighted Likelihood Estimation, WLE), and two item selection methods (Maximum Fisher Information, MFI; Kullback-Leibler Information, KLI), each applied at the cut-point (CP) or at the estimated ability (EA), were crossed to give 48 conditions. After the CACT simulations in R, Average Test Length (ATL), Average Classification Accuracy (ACA), the correlation between true and estimated thetas (r), bias, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) were averaged over 25 replications. The results show that the GLR and CI criteria perform better in terms of test efficiency, whereas the SPRT performs better in terms of measurement precision; that test efficiency increases as the indifference region of the classification criteria widens or as the error level decreases; and that all classification criteria reach a considerably high level of classification accuracy in all conditions.
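The classification procedure described above can be illustrated with a minimal Python sketch of an SPRT-based classification test with MFI item selection at the cut-point. It is not the authors' R code. The item parameter distributions, the cut-point of 0, the indifference half-width `delta`, the error rates `alpha` and `beta`, the maximum length of 50 items, and all function names are assumptions chosen for illustration; only the pool size (500), the sample size (3000), and the N(0,1) ability distribution come from the abstract.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (no D scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def make_pool(n_items, cut, rng):
    """Hypothetical pool: difficulties clustered around the cut-point."""
    return [(rng.uniform(0.8, 2.0), rng.gauss(cut, 0.5), rng.uniform(0.10, 0.25))
            for _ in range(n_items)]

def rank_by_info_at_cut(pool, cut):
    """MFI at the cut-point: the ranking is the same for every examinee."""
    return sorted(pool, key=lambda item: fisher_info(cut, *item), reverse=True)

def sprt_classify(theta_true, ranked_items, cut=0.0, delta=0.2,
                  alpha=0.05, beta=0.05, max_items=50, rng=random):
    """Administer ranked items until Wald's SPRT reaches a decision."""
    lower = math.log(beta / (1.0 - alpha))   # decide "below" at or under this
    upper = math.log((1.0 - beta) / alpha)   # decide "above" at or over this
    log_lr = 0.0
    used = 0
    for a, b, c in ranked_items[:max_items]:
        used += 1
        u = rng.random() < p_3pl(theta_true, a, b, c)  # simulated response
        p_hi = p_3pl(cut + delta, a, b, c)
        p_lo = p_3pl(cut - delta, a, b, c)
        log_lr += math.log(p_hi / p_lo) if u else math.log((1.0 - p_hi) / (1.0 - p_lo))
        if log_lr >= upper:
            return "above", used
        if log_lr <= lower:
            return "below", used
    # Forced decision when the maximum length is reached without crossing a bound.
    return ("above" if log_lr >= 0.0 else "below"), used

rng = random.Random(2018)
cut = 0.0
ranked = rank_by_info_at_cut(make_pool(500, cut, rng), cut)
thetas = [rng.gauss(0.0, 1.0) for _ in range(3000)]
results = [sprt_classify(t, ranked, cut=cut, rng=rng) for t in thetas]

# Average test length and average classification accuracy for this one run.
atl = sum(n for _, n in results) / len(results)
aca = sum((d == "above") == (t >= cut)
          for (d, _), t in zip(results, thetas)) / len(results)
print(f"ATL = {atl:.2f}, ACA = {aca:.3f}")
```

Widening `delta` or relaxing `alpha` and `beta` in this sketch changes how quickly the log-likelihood ratio crosses a bound, which is one way to explore the trade-off between test length and accuracy that the study examines.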
Both ability estimation methods were found to estimate ability successfully in terms of the correlation between true and estimated thetas (r), although the EAP performs relatively better in terms of measurement precision. All item selection methods behave similarly to one another, but the MFI-EA performs better on all dependent variables in all conditions.
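The precision measures named in the abstract (r, bias, RMSE, MAE) and an EAP ability estimate can be sketched in Python as follows. This is an illustrative sketch, not the authors' R code: the quadrature grid, the fixed 30-item test, the item parameter ranges, the sign convention for bias (estimated minus true), and all function names are assumptions.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (no D scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, items, n_quad=61, lo=-4.0, hi=4.0):
    """EAP: posterior mean of theta on an even grid with a N(0,1) prior."""
    step = (hi - lo) / (n_quad - 1)
    num = den = 0.0
    for k in range(n_quad):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalised prior density
        for u, (a, b, c) in zip(responses, items):
            p = p_3pl(theta, a, b, c)
            weight *= p if u else (1.0 - p)      # likelihood of the response
        num += theta * weight
        den += weight
    return num / den

def precision_summary(true_thetas, est_thetas):
    """Correlation, bias, RMSE and MAE of estimates against true values."""
    n = len(true_thetas)
    errors = [e - t for t, e in zip(true_thetas, est_thetas)]  # est minus true
    mean_t = sum(true_thetas) / n
    mean_e = sum(est_thetas) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_thetas, est_thetas))
    var_t = sum((t - mean_t) ** 2 for t in true_thetas)
    var_e = sum((e - mean_e) ** 2 for e in est_thetas)
    return {
        "r": cov / math.sqrt(var_t * var_e),     # assumes non-constant inputs
        "bias": sum(errors) / n,
        "rmse": math.sqrt(sum(d * d for d in errors) / n),
        "mae": sum(abs(d) for d in errors) / n,
    }

# Small demonstration with a hypothetical fixed 30-item test.
rng = random.Random(9)
items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0), rng.uniform(0.10, 0.25))
         for _ in range(30)]
true_thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]
est_thetas = []
for t in true_thetas:
    resp = [int(rng.random() < p_3pl(t, a, b, c)) for a, b, c in items]
    est_thetas.append(eap_estimate(resp, items))
summary = precision_summary(true_thetas, est_thetas)
print({name: round(value, 3) for name, value in summary.items()})
```

In a replicated design such as the one described, a summary like this would be computed once per replication and then averaged; the sketch shows a single run only.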

Document type: Article Article type: Research Article Access type: Open Access

APA	GÜNDEĞER C, DOĞAN N (2018). Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 9(2), 161 - 177. 10.21031/epod.401077
Chicago	GÜNDEĞER Ceylan,DOĞAN Nuri Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi 9, no.2 (2018): 161 - 177. 10.21031/epod.401077
MLA	GÜNDEĞER Ceylan,DOĞAN Nuri Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, vol.9, no.2, 2018, ss.161 - 177. 10.21031/epod.401077
AMA	GÜNDEĞER C,DOĞAN N Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi. 2018; 9(2): 161 - 177. 10.21031/epod.401077
Vancouver	GÜNDEĞER C,DOĞAN N Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi. 2018; 9(2): 161 - 177. 10.21031/epod.401077
IEEE	GÜNDEĞER C,DOĞAN N "Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması." Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 9, ss.161 - 177, 2018. 10.21031/epod.401077
ISNAD	GÜNDEĞER, Ceylan - DOĞAN, Nuri. "Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması". Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi 9/2 (2018), 161-177. https://doi.org/10.21031/epod.401077