An unsupervised machine learning-based profile system of Chinese researchers

Yan Yu, Peiyu Xu, Shuo Liu, Taiming He, Lu Yang, Junqiang Zhang

Article ID: 7281
Vol 8, Issue 11, 2024


Abstract


Constructing researcher profiles is crucial for modern research management and talent assessment. Because researcher information is scattered across many sources and researchers are difficult to evaluate, we propose a profile system for Chinese researchers based on unsupervised machine learning. The system builds comprehensive profiles along two dimensions: researchers' basic information and their behavioral information. It uses Selenium and web crawlers to retrieve data in real time from academic platforms, TF-IDF and BERT to recognize expertise, a dynamic topic model (DTM) to capture academic dynamics, and K-means clustering to build profiles. Experimental results show that these methods mine researchers' academic expertise more accurately and support domain clustering and scoring, providing a scientific basis for selecting research talent and evaluating academic work. The interactive analysis system also offers an intuitive platform for constructing and analyzing profiles.
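
The abstract combines TF-IDF term weighting for expertise recognition with K-means clustering for profiling. The following is a minimal, dependency-free Python sketch of that general idea, not the authors' implementation. It assumes each researcher is represented by a pre-tokenized bag of words (for example, from paper titles and abstracts), weights terms with a smoothed TF-IDF, takes the top-weighted terms as candidate expertise tags, clusters the TF-IDF vectors with K-means, and uses the silhouette coefficient as an assumed stand-in for the "cluster scoring" step. BERT embeddings, DTM, and the crawling step are omitted; all function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Map each tokenized document to a {term: weight} dict using tf * smoothed idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in counts.items()
        })
    return vectors

def top_terms(vec, k=3):
    """Highest-weighted terms as candidate expertise tags (ties broken alphabetically)."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

def to_dense(vectors):
    """Project sparse term dicts onto a shared, sorted vocabulary."""
    vocab = sorted({t for v in vectors for t in v})
    return [[v.get(t, 0.0) for t in vocab] for v in vectors]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_centroids(points, k):
    # Farthest-first traversal: a deterministic stand-in for k-means++ seeding.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=100):
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy corpus: one token list per researcher.
docs = [
    "topic model text topic".split(),
    "text topic bert model".split(),
    "graph node network graph".split(),
    "network node graph link".split(),
]
vectors = tf_idf(docs)
tags = [top_terms(v, k=2) for v in vectors]
points = to_dense(vectors)
labels = kmeans(points, k=2)
score = silhouette(points, labels)
print(tags, labels, round(score, 3))
```

With a real corpus, one would typically run K-means for several values of k and keep the value with the highest silhouette; that selection loop is left out of this sketch for brevity.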


Keywords


researcher profiles; machine learning; unsupervised learning; expertise recognition; cluster scoring

DOI: https://doi.org/10.24294/jipd.v8i11.7281

Copyright (c) 2024 Yan Yu, Peiyu Xu, Shuo Liu, Taiming He, Lu Yang, Junqiang Zhang

License URL: https://creativecommons.org/licenses/by/4.0/

This site is licensed under a Creative Commons Attribution 4.0 International License.