Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder

Dişken, Gökay; Tufekci, Zekeriya

doi:10.18466/cbayarfbe.1132319

Gökay DİŞKEN (Adana Alparslan Türkeş Science and Technology University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Adana, Türkiye)

Zekeriya TÜFEKÇİ (Çukurova University, Faculty of Engineering, Department of Computer Engineering, Adana, Türkiye)

Celal Bayar Üniversitesi Fen Bilimleri Dergisi

Year: 2023 Volume: 19 Issue: 2 Pages: 167 - 174 Text Language: English DOI: 10.18466/cbayarfbe.1132319 Index Date: 11-07-2023

Abstract:

Audio spoof detection has recently attracted researchers' attention, since detecting spoofed speech is vital for automatic speaker recognition systems. Publicly available datasets have also accelerated research in this area. Many different features and classifiers have been proposed for spoofed speech detection, and some of them achieve considerably high performance. Under additive noise, however, spoof detection performance drops rapidly, and the number of studies on robust spoofed speech detection is very limited. The problem becomes more interesting because conventional speech enhancement methods reportedly performed worse than no enhancement. In this work, i-vectors are used for spoof detection, and a discriminative denoising autoencoder (DAE) network is used to obtain enhanced (clean) i-vectors from their noisy counterparts. Once the enhanced i-vectors are obtained, they can be treated as normal i-vectors and scored or classified without any modification to the classifier. Data from the ASVspoof 2015 challenge are used with five different additive noise types, following a configuration similar to that of previous studies. The DAE is trained in a multicondition manner, using both clean and corrupted i-vectors. Three noise types at various signal-to-noise ratios are used to create the corrupted i-vectors, and two other noise types are used only in the test stage to simulate unknown noise conditions. Experimental results show that the proposed DAE approach is more effective than conventional speech enhancement methods.
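
The corrupted data mentioned in the abstract depend on adding noise to speech at a chosen signal-to-noise ratio (SNR). The following is a minimal sketch of that single step only, assuming a simple average-power definition of SNR. The abstract does not describe the authors' actual mixing procedure, and the names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, introduced here for illustration. The synthetic Gaussian samples merely stand in for real speech and noise recordings.

```python
import math
import random


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Assumption: SNR_dB = 10 * log10(P_speech / P_scaled_noise), with P taken
    as average power over the whole utterance. Noise must not be all zeros.
    """
    # Repeat the noise so it covers the whole utterance, then truncate.
    reps = math.ceil(len(speech) / len(noise))
    noise = (list(noise) * reps)[:len(speech)]
    # Solve SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)) for gain.
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given its clean reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Synthetic stand-ins for a speech utterance and a shorter noise recording.
random.seed(0)
speech = [random.gauss(0.0, 1.0) for _ in range(1600)]
noise = [random.gauss(0.0, 0.3) for _ in range(500)]

# Illustrative SNR values only; the abstract does not list the levels used.
for target in (0, 10, 20):
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, noisy), 3))
```

In a multicondition setup of the kind the abstract describes, each noisy utterance produced this way would be passed through the i-vector extractor, and the resulting noisy i-vector would be paired with the i-vector of the clean utterance as a training example for the DAE.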

Keywords: Deep learning, denoising autoencoder, i-vector, spoofing detection

Document Type: Article Article Type: Research Article Access Type: Open Access

APA	Dişken G, Tufekci Z (2023). Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar Üniversitesi Fen Bilimleri Dergisi, 19(2), 167 - 174. 10.18466/cbayarfbe.1132319
Chicago	Dişken Gökay,Tufekci Zekeriya Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar Üniversitesi Fen Bilimleri Dergisi 19, no.2 (2023): 167 - 174. 10.18466/cbayarfbe.1132319
MLA	Dişken Gökay,Tufekci Zekeriya Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar Üniversitesi Fen Bilimleri Dergisi, vol.19, no.2, 2023, pp.167 - 174. 10.18466/cbayarfbe.1132319
AMA	Dişken G,Tufekci Z Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar Üniversitesi Fen Bilimleri Dergisi. 2023; 19(2): 167 - 174. 10.18466/cbayarfbe.1132319
Vancouver	Dişken G,Tufekci Z Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder. Celal Bayar Üniversitesi Fen Bilimleri Dergisi. 2023; 19(2): 167 - 174. 10.18466/cbayarfbe.1132319
IEEE	Dişken G,Tufekci Z "Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder." Celal Bayar Üniversitesi Fen Bilimleri Dergisi, 19, pp.167 - 174, 2023. 10.18466/cbayarfbe.1132319
ISNAD	Dişken, Gökay - Tufekci, Zekeriya. "Noise-Robust Spoofed Speech Detection Using Discriminative Autoencoder". Celal Bayar Üniversitesi Fen Bilimleri Dergisi 19/2 (2023), 167-174. https://doi.org/10.18466/cbayarfbe.1132319