Named entity recognition in government domain: A systematic literature review

Tosan Wiar Ramdhani, Indra Budi, Betty Purwandari

Article ID: 9789
Vol 8, Issue 15, 2024


Abstract


Named Entity Recognition (NER), a core task in Information Extraction (IE) alongside Relation Extraction (RE), identifies and extracts entities such as place and person names from text across various domains. NER has improved business processes in both the public and private sectors but remains underutilized in government institutions, especially in developing countries such as Indonesia. This study examines which government fields have used NER over the past five years, evaluates system performance, identifies common methods, highlights countries with significant adoption, and outlines current challenges. Over 64 international studies from 15 countries were selected following the PRISMA 2020 guidelines. The findings are synthesized into a preliminary ontology design for Government NER.


Keywords


named entity recognition; information extraction; machine learning; deep learning; government domain; natural language processing



DOI: https://doi.org/10.24294/jipd9789



Copyright (c) 2024 Tosan Wiar Ramdhani, Indra Budi, Betty Purwandari

License URL: https://creativecommons.org/licenses/by/4.0/
