The Challenges of Developing a Living Arabic Phonetic Dictionary for Speech Recognition System: A Literature Review

Mohammad Husam Alhumsi; Saleh Belhassen

doi:10.21467/ajss.8.1.164-170

Authors

Mohammad Husam Alhumsi Saudi Electronic University https://orcid.org/0000-0002-0189-4443
Saleh Belhassen Saudi Electronic University

Abstract

Phonetic dictionaries are pivotal components of speech recognition systems. The aim of speech recognition research is to build a machine that accurately identifies and distinguishes normal human speech from any speaker. The literature affirms that Arabic phonetics is one of the major problems in Arabic speech recognition. This paper therefore reviews previous studies that address the challenges of developing an Arabic phonetic dictionary for Arabic speech recognition. The review finds that speech recognition systems have investigated areas of difference in Arabic phonetics. In addition, it suggests that an Arabic phonetic dictionary should be developed in which the phonemes of Arabic vowels are treated as a component of the phonemes of the consonants. The incorporation of well-developed machine translation systems may also enhance the quality of the system. The paper concludes with the challenges that Arabic phonetic dictionaries still face.

Keywords:

Arabic Phonetic Dictionary, Speech Recognition System, Phonetics
