Named entity recognition in government domain: A systematic literature review
Vol 8, Issue 15, 2024
Abstract
Named Entity Recognition (NER), a core task in Information Extraction (IE) alongside Relation Extraction (RE), identifies and extracts entities such as place and person names across various domains. NER has improved business processes in both the public and private sectors, but it remains underutilized in government institutions, especially in developing countries such as Indonesia. This study examines which government fields have used NER over the past five years, evaluates system performance, identifies common methods, highlights countries with significant adoption, and outlines current challenges. Over 64 international studies from 15 countries were selected using the PRISMA 2020 guidelines. The findings are synthesized into a preliminary ontology design for Government NER.
DOI: https://doi.org/10.24294/jipd9789
Copyright (c) 2024 Tosan Wiar Ramdhani, Indra Budi, Betty Purwandari
License URL: https://creativecommons.org/licenses/by/4.0/
This site is licensed under a Creative Commons Attribution 4.0 International License.