Towards Detecting and Classifying Malicious URLs

¹Colorado Mesa University, Grand Junction, CO 81501, USA
{cpjohnson, bkhadka}@mavs.coloradomesa.edu, rbasnet@coloradomesa.edu
²University of Southern California, Los Angeles, CA 90007, USA
doleck@usc.edu

Abstract
Emails containing Uniform Resource Locators (URLs)
pose substantial risks to organizations by potentially compromising both
credentials and network security through general and spear-phishing campaigns
targeting their employees. The detection and classification of malicious URLs is an
important research problem with practical applications. With an appropriate
machine learning model, an organization may protect itself by filtering
incoming emails and the websites its employees visit, based on the
maliciousness of URLs contained in emails and web pages. In this work, we
compare the performance of traditional machine learning algorithms, such as
Random Forest, CART, and kNN, against popular deep
learning framework models, such as Fast.ai and Keras-TensorFlow,
across CPU, GPU, and TPU architectures. Using the publicly available
ISCX-URL-2016 dataset, we present the models’ performance across binary and
multiclass classification experiments. By collecting accuracy and timing
metrics, we find that the Random Forest, Keras-TensorFlow, and Fast.ai models
performed comparably, achieving the highest accuracies (> 96%) in both the
detection and classification of malicious URLs, with Random Forest being the
preferable model given time, performance, and complexity
constraints. Additionally, using feature ranking and feature selection techniques,
we find that using only the top 5-10 features yields better performance than
using all the features provided in the dataset.

Keywords: Malicious URLs, Phishing URLs, Deep Learning, Web Security, Machine Learning

Corresponding author: Ram B. Basnet

Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA),
Vol. 11, No. 4, pp. 31-48, December 2020
DOI: 10.22667/JOWUA.2020.12.31.031
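The abstract reports that ranking features and keeping only the top 5-10 outperforms using the full feature set. The ranking technique itself is not specified here, so the following is a minimal Python sketch, assuming a simple filter approach: each discrete feature is scored by its information gain with respect to the class label, and the k highest-scoring features are kept. Information gain is a stand-in chosen for illustration, not necessarily the authors' method, and the feature names and toy rows are hypothetical rather than taken from ISCX-URL-2016.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete feature column."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the label subsets induced by each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, names, k):
    """Return the k feature names with the highest information gain, best first."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical binary URL features and labels, for illustration only.
names = ["has_ip_host", "long_url", "many_dots", "uses_https"]
rows = [
    (1, 1, 1, 0),
    (1, 0, 1, 0),
    (0, 1, 0, 1),
    (0, 0, 0, 1),
    (1, 1, 0, 0),
    (0, 0, 1, 0),
]
labels = ["malicious", "malicious", "benign", "benign", "malicious", "benign"]

print(top_k_features(rows, labels, names, k=2))  # → ['has_ip_host', 'uses_https']
```

In the workflow the abstract describes, one would then train each classifier on the selected subset for several values of k (for example, 5 through 10) and compare accuracy and timing against the full feature set; that training step is omitted from this sketch.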