Research on Automatic Classification Based on Academic Text Corpus

Chuan Jiang, Zhixiao Zhao, Na Wu, Litao Lin

Article ID: 3649
Vol 6, Issue 5, 2023

Abstract


Recognizing the discipline category of an abstract is of great significance for automatic text recommendation and knowledge mining. This study therefore collected abstracts of social science and natural science publications indexed in the Web of Science from 2010 to 2020, and constructed discipline classification models using a machine learning model (SVM) and two deep learning models (TextCNN and SCI-BERT). The SCI-BERT model performed best: its precision, recall, and F1 were 86.54%, 86.89%, and 86.71%, respectively, and its F1 was 6.61% and 4.05% higher than that of SVM and TextCNN, respectively. The resulting model can effectively identify the discipline categories of abstracts and provide effective support for automatic subject indexing.
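
As a quick consistency check on the reported metrics, the sketch below recomputes F1 as the harmonic mean of the stated precision and recall. The function name `f1_score` is illustrative and not taken from the paper. The baseline figures at the end rest on an assumption: that the reported gaps of 6.61% and 4.05% are absolute percentage points rather than relative improvements, which the abstract does not specify.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# SCI-BERT precision and recall as reported in the abstract, in percent.
precision, recall = 86.54, 86.89
f1 = f1_score(precision, recall)
print(round(f1, 2))  # 86.71, matching the reported F1

# Assumption: the reported gaps are absolute percentage points.
# Under that reading, the implied baseline F1 scores would be:
svm_f1 = round(86.71 - 6.61, 2)      # implied SVM F1, in percent
textcnn_f1 = round(86.71 - 4.05, 2)  # implied TextCNN F1, in percent
print(svm_f1, textcnn_f1)
```

The recomputed value agrees with the reported F1, so the three headline figures are internally consistent.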


Keywords


Deep Learning; SCI-BERT; Academic Literature; Automatic Indexing

DOI: https://doi.org/10.24294/ijmss.v6i5.3649

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
