Year: 2022 Volume: 24 Issue: 71 Pages: 425-436 Language: English DOI: 10.21205/deufmd.2022247109 Indexing Date: 10-08-2022

Gender Bias in Occupation Classification from the New York Times Obituaries

Abstract:
Technological developments such as artificial intelligence can reinforce the social prejudices prevailing in society, regardless of the developer's intention. Researchers should therefore be aware of the ethical issues that a developed product or solution may raise. In this study, we investigate the effect of gender bias on occupation classification. For this purpose, a new dataset was created by collecting obituaries from the New York Times website; it is provided in two versions, with and without gender indicators. The category distributions of this dataset show that the gender and occupation variables are not independent, so gender can be expected to affect occupation classification. To test this effect, we perform occupation classification using SVM (Support Vector Machine), HAN (Hierarchical Attention Network), and DistilBERT-based classifiers. Moreover, to gain further insight into the relationship between gender and occupation in classification problems, a multi-task model in which occupation and gender are learned jointly is evaluated. Experimental results reveal that there is gender bias in occupation classification.
Keywords: Gender Bias, Occupation Classification, Multi-task Learning, Obituaries.
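The dependence between the gender and occupation variables that the abstract reads off the category distributions can be checked with a Pearson chi-square test of independence. A minimal pure-Python sketch on made-up counts (these numbers are illustrative, not the paper's actual dataset statistics):

```python
# Hypothetical contingency table of obituary counts
# (rows: gender, columns: occupation category).
observed = {
    "male":   {"business": 120, "arts": 60, "science": 80},
    "female": {"business": 30,  "arts": 70, "science": 40},
}

def chi_square(table):
    """Pearson chi-square statistic for a two-way contingency table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    total = sum(table[r][c] for r in rows for c in cols)
    row_sum = {r: sum(table[r].values()) for r in rows}
    col_sum = {c: sum(table[r][c] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            # Expected count under independence of gender and occupation.
            expected = row_sum[r] * col_sum[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    return stat

stat = chi_square(observed)
print(round(stat, 2))  # 35.28 for these counts, far above the 5.99
                       # critical value (df = 2, alpha = 0.05), so the
                       # independence hypothesis would be rejected
```

A statistic above the critical value is what "gender and occupation variables are not independent" amounts to, and motivates testing whether classifiers pick the dependence up.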


Document Type: Article Article Type: Research Article Access Type: Open Access
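The abstract describes two dataset versions, with and without gender indicators. A minimal sketch of producing the scrubbed variant; the indicator list and neutral replacements below are hypothetical, and the paper's actual preprocessing may differ:

```python
import re

# Hypothetical map of gendered words to neutral replacements.
GENDER_INDICATORS = {
    "he": "they", "she": "they",
    "his": "their", "her": "their", "him": "them",
    "himself": "themself", "herself": "themself",
    "husband": "spouse", "wife": "spouse",
}

# Word-boundary pattern over all indicators, case-insensitive.
_PATTERN = re.compile(
    r"\b(" + "|".join(GENDER_INDICATORS) + r")\b", re.IGNORECASE
)

def remove_gender_indicators(text):
    """Replace gendered words with neutral ones, keeping capitalization."""
    def swap(match):
        word = match.group(0)
        neutral = GENDER_INDICATORS[word.lower()]
        return neutral.capitalize() if word[0].isupper() else neutral
    return _PATTERN.sub(swap, text)

print(remove_gender_indicators("He taught physics; his students admired him."))
# They taught physics; their students admired them.
```

Comparing classifier accuracy on the original and scrubbed texts is one way to isolate how much of the prediction rests on explicit gender cues rather than on occupation-related content.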
APA Atik, C., & Tekir, S. (2022). Gender Bias in Occupation Classification from the New York Times Obituaries. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, 24(71), 425-436. https://doi.org/10.21205/deufmd.2022247109
Chicago Atik, Ceren, and Selma Tekir. "Gender Bias in Occupation Classification from the New York Times Obituaries." Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 24, no. 71 (2022): 425-436. https://doi.org/10.21205/deufmd.2022247109
MLA Atik, Ceren, and Selma Tekir. "Gender Bias in Occupation Classification from the New York Times Obituaries." Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, vol. 24, no. 71, 2022, pp. 425-436. https://doi.org/10.21205/deufmd.2022247109
AMA Atik C, Tekir S. Gender Bias in Occupation Classification from the New York Times Obituaries. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi. 2022;24(71):425-436. doi:10.21205/deufmd.2022247109
Vancouver Atik C, Tekir S. Gender Bias in Occupation Classification from the New York Times Obituaries. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi. 2022;24(71):425-436. doi:10.21205/deufmd.2022247109
IEEE C. Atik and S. Tekir, "Gender Bias in Occupation Classification from the New York Times Obituaries," Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, vol. 24, no. 71, pp. 425-436, 2022. doi:10.21205/deufmd.2022247109
ISNAD Atik, Ceren - Tekir, Selma. "Gender Bias in Occupation Classification from the New York Times Obituaries". Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 24/71 (2022), 425-436. https://doi.org/10.21205/deufmd.2022247109