Year: 2021 Volume: 11 Issue: 2 Page Range: 357 - 378 Text Language: English DOI: 10.19126/suje.963046 Index Date: 09-02-2022

Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam

Abstract:
The purpose of this study is to examine the assessment of an open-ended mathematics exam in order to reveal the effects of the Group Moderation Assessment Model. The Group Moderation Assessment Model is a process in which teachers share their expectations and understandings of standards with each other to improve the consistency of their judgments about students' learning. Convenience sampling, a non-random sampling method, was used in this study. The exam papers belonged to 22 students in the 10th grade. The students' papers from three mathematics exams were evaluated by an assessment team of five mathematics teachers using the group moderation assessment model. The findings show that the raters positively influenced one another and, through their shared judgments, formed a reliable evaluation system. In addition, the raters scored consistently with one another on the exams conducted after the group moderation assessment model workshops. In conclusion, during the workshops held while implementing the group moderation assessment model, the teachers' sharing of knowledge and opinions with one another positively affected their ability to assess exam papers.
Keywords:

Document Type: Article Article Type: Research Article Access Type: Open Access
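The abstract reports that the five raters scored the 22 exam papers consistently with one another after the moderation workshops. The record does not state which agreement statistic the study used, but Kendall's coefficient of concordance (W) is one common choice when several raters score the same set of papers. The sketch below is a minimal illustration with hypothetical random scores, not the study's data.

```python
# Minimal sketch: Kendall's coefficient of concordance (W) for 5 raters
# scoring 22 exam papers. Scores are hypothetical placeholders, not data
# from the study.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
scores = rng.integers(40, 101, size=(5, 22)).astype(float)  # scores[i, j]: rater i, paper j

m, n = scores.shape                                   # m raters, n papers
ranks = np.vstack([rankdata(row) for row in scores])  # each rater's ranking of the papers
rank_sums = ranks.sum(axis=0)                         # total rank received by each paper
s = ((rank_sums - rank_sums.mean()) ** 2).sum()       # spread of rank sums across papers
w = 12 * s / (m ** 2 * (n ** 3 - n))                  # W in [0, 1]; 1 = perfect agreement
# Note: ties receive average ranks here; the standard tie correction is
# omitted for brevity, so W is slightly underestimated when ties occur.
print(f"Kendall's W = {w:.3f}")
```

Values of W near 1 indicate that the raters order the papers almost identically; computing such a statistic before and after the moderation workshops is one way to quantify the gain in scoring consistency the abstract describes.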
APA TAKUNYACI M, Aydin E (2021). Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam. Sakarya University Journal of Education, 11(2), 357-378. 10.19126/suje.963046
Chicago TAKUNYACI Mithat, Aydin Emin. Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam. Sakarya University Journal of Education 11, no. 2 (2021): 357-378. 10.19126/suje.963046
MLA TAKUNYACI Mithat, Aydin Emin. Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam. Sakarya University Journal of Education, vol. 11, no. 2, 2021, pp. 357-378. 10.19126/suje.963046
AMA TAKUNYACI M, Aydin E. Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam. Sakarya University Journal of Education. 2021; 11(2): 357-378. 10.19126/suje.963046
Vancouver TAKUNYACI M, Aydin E. Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam. Sakarya University Journal of Education. 2021; 11(2): 357-378. 10.19126/suje.963046
IEEE TAKUNYACI M, Aydin E, "Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam." Sakarya University Journal of Education, vol. 11, no. 2, pp. 357-378, 2021. 10.19126/suje.963046
ISNAD TAKUNYACI, Mithat - Aydin, Emin. "Group Moderation Assessment Model: An Example of an Open-Ended Mathematics Exam". Sakarya University Journal of Education 11/2 (2021), 357-378. https://doi.org/10.19126/suje.963046