Detection and Classification of Ideological Texts in the Kazakh Language Using Machine Learning and Transformers
Date
2025-12-30
Authors
Bolatbek, Milana
Mussiraliyeva, Shynar
Baisylbayeva, Kymbat
Abstract
Modern information technologies enable the automatic analysis of textual data to detect extremist and propagandistic content. This paper examines deep learning methods and transformer models for the automatic classification of ideologically charged texts in the Kazakh language. Neural network models (CNN, BiLSTM, GRU, and a hybrid CNN+BiLSTM) were compared with modern transformer models (DistilBERT). The models were evaluated using accuracy, precision, recall, and F1-score, complemented by error analysis. Experimental results showed that the hybrid CNN+BiLSTM model achieved the highest accuracy (95.11%), outperforming the other models. CNN, BiLSTM, and GRU also achieved high scores (92-93%), making them effective for this task. Among the transformer models, DistilBERT gave the most balanced results (85.74%). The study concludes that hybrid neural network models (CNN+BiLSTM) are the most effective solution, while DistilBERT performs best among the transformer models. The findings can support the development of automatic monitoring and filtering systems capable of efficiently identifying ideologically charged content in Kazakh-language texts.
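The abstract names four evaluation metrics without defining them. The sketch below computes their standard definitions from true and predicted labels, assuming a binary setup in which 1 marks an ideologically charged text and 0 a neutral one (this encoding is an assumption; the abstract does not state the label scheme). The function name `binary_metrics` is hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class.

    Assumption (not from the paper): label 1 = ideologically charged text,
    label 0 = neutral text.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty inputs or absent classes yield 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

On the toy labels, four of six predictions are correct (accuracy 4/6), and two of the three predicted positives are true positives (precision 2/3, recall 2/3). Accuracy alone can hide poor recall on the minority class, which is why precision, recall, and F1 are usually reported alongside it.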