Identifying suspicious internet threat exchanges using machine learning algorithms to ensure privacy and cybersecurity in the USA

Syeda Farjana Farabi; Barna Biswas; Md Imran Sarkar; Nur Mohammad; Mohammad Zahidul Alam; Rakibul Hasan; Masuk Abdullah

doi:10.24294/jipd8848

Article ID: 8848
Vol 8, Issue 15, 2024

Abstract

Adoption of cybersecurity measures is growing steadily because they help people protect valuable data. Today nearly everyone is connected through the internet, which makes it much easier for an attacker to access important data through cyber-attacks. Cybersecurity is therefore needed to protect personal data and to support sustainable infrastructure development in data science. Protecting data with existing cybersecurity systems is nevertheless difficult. Threats take many forms, including phishing, malware, and ransomware, and preventing them requires advanced defenses. Many software tools help prevent cyber-attacks, but they cannot detect suspicious internet threat exchanges early. This research applies machine learning models to cybersecurity to enhance threat detection. By reducing internet-borne cyberattacks and improving data protection, such a system is intended to make browsing the internet more secure. A dataset from Kaggle was used to build a system that detects untrustworthy online threat exchanges early. Several pre-processing steps were applied to improve results and accuracy, and feature engineering was applied to improve the quality of the data. Random forest, gradient boosting, XGBoost, and LightGBM models were then trained. Random forest achieved the best accuracy, 96%, a result that supports social development in cybersecurity.
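The pre-processing and feature engineering steps are not detailed in the abstract. The following is a minimal sketch of what such steps might look like, assuming the Kaggle data consists of URL strings with a class label per row. The feature names, the helper functions `url_features` and `encode_labels`, and the example labels are all hypothetical and are not taken from the paper.

```python
import re
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Return a few simple lexical features for one URL string (illustrative only)."""
    # urlparse only fills netloc when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc
    host_no_port = host.split(":")[0]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(url.count(c) for c in "@?-=_%&"),
        "uses_https": int(url.lower().startswith("https://")),
        # Rough check for a dotted-quad host; octet ranges are not validated.
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host_no_port))),
    }


def encode_labels(labels):
    """Label encoding: map string class names to integers in sorted order."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


# Hypothetical rows; the real dataset's contents and labels are assumptions here.
urls = ["https://example.com/index.html", "192.168.0.1/login?user=admin"]
labels = ["benign", "phishing"]

rows = [url_features(u) for u in urls]
encoded, mapping = encode_labels(labels)
```

Each dictionary in `rows` is one numeric feature vector, and `encoded` holds the matching integer class labels. Tables of this shape are the kind of input that tree-based classifiers such as random forest, gradient boosting, XGBoost, and LightGBM are typically trained on.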

Keywords

cybersecurity; social development; random forest; gradient boosting; XGBoost; machine learning

Copyright (c) 2024 Syeda Farjana Farabi, Barna Biswas, Md Imran Sarkar, Nur Mohammad, Mohammad Zahidul Alam, Rakibul Hasan, Masuk Abdullah

License URL: https://creativecommons.org/licenses/by/4.0/

This site is licensed under a Creative Commons Attribution 4.0 International License.
