Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education

arXiv:2603.20255v1 Announce Type: new Abstract: Speech-based AI educational applications have gained significant interest in recent years, particularly for children. However, children's speech research remains limited due to the lack of publicly available datasets, especially for low-resource languages such as Arabic. This paper presents Abjad-Kids, an Arabic speech dataset designed for kindergarten and primary education, focusing on fundamental learning of alphabets, numbers, and colors. The dataset consists of 46,397 audio samples collected from children aged 3-12 years, covering 141 classes. All samples were recorded under controlled specifications to ensure consistency in duration, sampling rate, and format. To address the high intra-class similarity among Arabic phonemes and the limited samples per class, we propose a hierarchical audio classification approach based on CNN-LSTM architectures. Our methodology decomposes alphabet recognition into a two-stage process: an initial grouping classification model followed by specialized classifiers for each group. Two grouping strategies, static linguistic-based grouping and dynamic clustering-based grouping, were evaluated. Experimental results demonstrate that static linguistic-based grouping achieves superior performance. Comparisons between traditional machine learning and deep learning approaches highlight the effectiveness of CNN-LSTM models combined with data augmentation. Despite the promising results, most of our experiments indicate a challenge with overfitting, likely due to the limited number of samples, even after data augmentation and model regularization. Thus, future work may focus on collecting additional data to address this issue. Abjad-Kids will be publicly available. We hope that Abjad-Kids enriches children's representation in speech datasets and serves as a useful resource for future research in Arabic speech classification for kids.

Executive Summary

This article presents Abjad-Kids, an Arabic speech dataset designed for primary education that addresses the limited availability of children's speech datasets in low-resource languages. The dataset consists of 46,397 audio samples from children aged 3-12, covering 141 classes. To address the high intra-class similarity among Arabic phonemes and the limited samples per class, the authors propose a hierarchical audio classification approach using CNN-LSTM architectures. Experimental results demonstrate the effectiveness of static linguistic-based grouping and of CNN-LSTM models combined with data augmentation. However, the study notes a challenge with overfitting due to the limited number of samples. The authors hope that Abjad-Kids will enrich children's representation in speech datasets and serve as a valuable resource for future research in Arabic speech classification for kids.

Key Points

  • Abjad-Kids is a unique Arabic speech dataset designed for primary education, addressing the limited availability of children's speech datasets in low-resource languages.
  • The dataset consists of 46,397 audio samples from children aged 3-12, covering 141 classes.
  • The authors propose a hierarchical audio classification approach using CNN-LSTM architectures to address the high intra-class similarity among Arabic phonemes and the limited samples per class.
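The two-stage pipeline summarized above can be sketched schematically. The sketch below is a minimal, hypothetical illustration of hierarchical routing (a coarse group-level classifier followed by per-group specialists); the group names, the stub classifiers, and the `route_and_classify` helper are illustrative assumptions, not the authors' CNN-LSTM implementation.

```python
# Minimal sketch of two-stage hierarchical classification.
# Stage 1 predicts a coarse phoneme group; stage 2 dispatches the
# sample to a specialist classifier covering only that group's classes.
# All classifiers here are illustrative stubs, not the paper's models.

from typing import Callable, Dict, List

# Hypothetical static linguistic grouping of Arabic letters by
# acoustic similarity (the paper's actual groups are not specified here).
GROUPS: Dict[str, List[str]] = {
    "group_a": ["alif", "ba", "ta"],
    "group_b": ["jim", "ha", "kha"],
}

def make_stub(labels: List[str]) -> Callable[[list], str]:
    """Stand-in for a trained per-group model: returns a fixed label."""
    def classify(features: list) -> str:
        return labels[0]  # a real model would score every label in the group
    return classify

def group_classifier(features: list) -> str:
    """Stand-in for the stage-1 grouping model."""
    return "group_a" if sum(features) >= 0 else "group_b"

SPECIALISTS = {name: make_stub(labels) for name, labels in GROUPS.items()}

def route_and_classify(features: list) -> str:
    """Two-stage inference: pick a group, then a label within that group."""
    group = group_classifier(features)
    return SPECIALISTS[group](features)
```

The design point is that each specialist only has to separate a handful of acoustically similar classes, rather than all 141 at once, which is how the hierarchy mitigates intra-class similarity.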

Merits

Strength in Addressing Low-Resource Language Limitations

The Abjad-Kids dataset addresses a significant limitation in children's speech research by providing a comprehensive resource for low-resource languages, such as Arabic.

Effective Use of CNN-LSTM Architectures

The authors demonstrate the effectiveness of CNN-LSTM models combined with data augmentation in addressing the high intra-class similarity among Arabic phonemes and limited samples per class.
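Waveform-level data augmentation of the kind paired with CNN-LSTM training can be illustrated with two common transforms, noise injection and time shifting. This is a generic NumPy sketch; the specific transforms, parameter values, and function names are assumptions, not the paper's stated augmentation recipe.

```python
import numpy as np

def add_noise(wave, snr_db=20.0, rng=None):
    """Inject white noise at a target signal-to-noise ratio (in dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def time_shift(wave, shift):
    """Shift the waveform in time, zero-padding the vacated region."""
    out = np.zeros_like(wave)
    if shift >= 0:
        out[shift:] = wave[: len(wave) - shift]
    else:
        out[:shift] = wave[-shift:]
    return out

# Example: augment a 1-second, 16 kHz sine tone.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_noise(clean, snr_db=20.0)
shifted = time_shift(clean, shift=800)  # 50 ms shift
```

Transforms like these expand an under-sampled class without new recordings, which is why augmentation is a natural fit for the limited-samples-per-class setting the paper describes.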

Demerits

Limited Number of Samples

The study notes a challenge with overfitting due to the limited number of samples, highlighting the need for further data collection to address this issue.

Expert Commentary

The Abjad-Kids dataset represents a significant contribution to the field of children's speech research, particularly in low-resource languages. The authors' use of CNN-LSTM architectures and data augmentation demonstrates a nuanced understanding of the challenges associated with children's speech classification. However, the limitations of the study, including the limited number of samples, highlight the need for further research and data collection to address these challenges. The implications of this study are far-reaching, with the potential to enrich children's representation in speech datasets and support the development of effective AI educational applications.

Recommendations

  • Future research should prioritize the collection of additional data to address the limitations of the Abjad-Kids dataset and support the development of more accurate AI educational applications.
  • Researchers should consider the use of other architectures and techniques, such as transfer learning and domain adaptation, to address the challenges associated with children's speech classification in low-resource languages.

Sources

Original: arXiv - cs.CL