Applications of mathematical techniques for characterization of Hindi language texts by considering the roman alphabet transforms of the texts
Vol 2, Issue 1, 2025
Abstract
This paper describes the writing pattern of Hindi-language texts using mathematical techniques. The selected texts are analysed through their Roman-alphabet transforms. Each text is characterized mathematically in two ways: by the presence of the different letters of the alphabet, and by the entropy of the pattern of letter occurrence. Characteristic curves are formed from the presence of the different letters in the corresponding Roman text, and the entropy of the letter-occurrence pattern is calculated. The resulting curves and entropy values are compared with the same type of curve and entropy for two English-language texts. The work is relevant to language identification, since the characteristic curve and the entropy measure can serve as useful tools for that task.
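As an illustration of the two quantities named in the abstract, the following is a minimal sketch that assumes the entropy is the standard Shannon entropy of single-letter relative frequencies and that a characteristic curve can be approximated by the letter frequencies sorted in descending order. The abstract does not state the paper's exact definitions, so both are assumptions. The function names and the sample romanized string are hypothetical and are not taken from the paper.

```python
import math
import string
from collections import Counter


def letter_frequencies(text):
    """Relative frequency of each Roman letter a-z, ignoring case and non-letters."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: counts[ch] / total for ch in sorted(counts)}


def shannon_entropy(freqs):
    """Shannon entropy in bits of a letter-frequency distribution (assumed measure)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def rank_frequency_curve(freqs):
    """Frequencies sorted from most to least common (assumed form of the curve)."""
    return sorted(freqs.values(), reverse=True)


if __name__ == "__main__":
    # Hypothetical romanized Hindi sample, used only for illustration.
    sample = "bharat ek vishal desh hai"
    freqs = letter_frequencies(sample)
    print("entropy (bits):", shannon_entropy(freqs))
    print("curve:", rank_frequency_curve(freqs))
```

Running the same functions on an English text would give a second entropy value and a second curve to set beside the Hindi ones, which is the kind of comparison the abstract describes.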
DOI: https://doi.org/10.24294/pnmai10126
License URL: https://creativecommons.org/licenses/by/4.0/