Named entity recognition in government domain: A systematic literature review
Vol 8, Issue 15, 2024
Abstract
Named Entity Recognition (NER), a core task in Information Extraction (IE) alongside Relation Extraction (RE), identifies and extracts entities such as place and person names across various domains. NER has improved business processes in both the public and private sectors but remains underutilized in government institutions, especially in developing countries such as Indonesia. This study examines which government fields have used NER over the past five years, evaluates system performance, identifies common methods, highlights countries with significant adoption, and outlines current challenges. Over 64 international studies from 15 countries were selected using the PRISMA 2020 guidelines. The findings are synthesized into a preliminary ontology design for Government NER.
DOI: https://doi.org/10.24294/jipd9789
Copyright (c) 2024 Tosan Wiar Ramdhani, Indra Budi, Betty Purwandari
License URL: https://creativecommons.org/licenses/by/4.0/