Learning to Detect Phishing Webpages

Ram B. Basnet1+ and Andrew H. Sung2
 

1Colorado Mesa University, Grand Junction, Colorado, USA
rbasnet@coloradomesa.edu

 

2University of Southern Mississippi, Hattiesburg, Mississippi, USA

andrew.sung@usm.edu

 

Abstract

Phishing has become a lucrative business for cyber criminals, whose victims range from end users to large corporations and government organizations. Though Internet users are generally becoming more aware of phishing websites, cyber scammers keep devising novel schemes that circumvent phishing filters and often succeed in fooling even savvy users. Recent studies that detect phishing and malicious webpages using features from URLs alone show promise. That approach, however, may not be reliable and robust enough to detect evolving, sophisticated phishing webpages. For example, phishers can use URL shortening services to disguise their phishing URLs, or use compromised legitimate websites to host their phishing campaigns. Along with features from URLs, we propose many novel content-based features and apply cutting-edge machine learning techniques. We demonstrate that our approach can detect phishing webpages with error rates of 0.04-0.44%, and false positive and false negative rates of 0.0-0.30% and 0.06-0.73%, respectively, on real-world data sets using a Random Forests classifier, thereby improving on previous results for the important problem of phishing detection.
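For readers unfamiliar with the three metrics reported above, the following minimal sketch shows how error rate, false positive rate, and false negative rate are conventionally computed from a confusion matrix, with phishing treated as the positive class. The names `Confusion` and `rates` are hypothetical, and the counts in the usage note are made up for illustration; none of this is taken from the paper's experiments.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, with 'phishing' as the positive class."""
    tp: int  # phishing pages correctly flagged as phishing
    fp: int  # legitimate pages wrongly flagged as phishing
    tn: int  # legitimate pages correctly passed
    fn: int  # phishing pages wrongly passed as legitimate


def rates(c: Confusion) -> dict:
    """Return error rate, FPR, and FNR as percentages."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        # share of all pages that were misclassified
        "error_rate": 100.0 * (c.fp + c.fn) / total,
        # share of legitimate pages flagged as phishing
        "fpr": 100.0 * c.fp / (c.fp + c.tn),
        # share of phishing pages that went undetected
        "fnr": 100.0 * c.fn / (c.fn + c.tp),
    }
```

With illustrative counts such as `Confusion(tp=990, fp=5, tn=995, fn=10)`, calling `rates` yields all three percentages for a single evaluation run. Ranges like those quoted in the abstract would arise from repeating such an evaluation across several data sets.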

 

Keywords: phishing attack, phishing webpages, content-based approach, batch learning, online learning

 

+: Corresponding author: Ram B. Basnet
Department of Computer Science, Mathematics and Statistics, Colorado Mesa University, Grand Junction, CO, 81501, USA, Tel: +1-970-248-1680, Web: myhome.coloradomesa.edu/~rbasnet

 

Journal of Internet Services and Information Security (JISIS), 4(3): 21-39, August 2014