Preventing Data Loss by Harnessing Semantic Similarity and Relevance

Hanan Alhindi1, Issa Traore2, and Isaac Woungang3+

1King Saud University, Riyadh, Saudi Arabia
halhindi@ksu.edu.sa

2University of Victoria, Victoria, BC, Canada
itraore@ece.uvic.ca

3Ryerson University, Toronto, ON, Canada
iwoungan@cs.ryerson.ca

Abstract

Malicious insiders are considered among the most dangerous threat actors faced by organizations that maintain security-sensitive data. Data loss prevention (DLP) systems are designed primarily to detect and/or prevent illicit data loss or leakage out of the organization by both authorized and unauthorized users. However, existing DLP systems face several challenges related to performance and efficiency, especially when skillful malicious insiders transfer critical data after altering it syntactically but not semantically. In this paper, we propose a new approach for matching and detecting similarities between monitored and transferred data by employing conceptual and relational semantics, which involves extracting explicit relationships and inferring implicit ones. Our novel approach detects the leakage of altered sensitive data effectively by combining ontology-based semantic similarity and semantic relevance metrics. Our experimental results show that, on average, our system achieves a relatively high detection rate (DR) and a low false positive rate (FPR).
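The abstract does not define the similarity and relevance metrics, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. Everything in it is hypothetical: a toy "is-a" ontology (`IS_A`), a set of explicit relations (`RELATED`), Wu-Palmer similarity used as a stand-in semantic similarity metric, a relevance score that returns 1.0 for identical or explicitly related concepts and 0.5 for a relation inherited through is-a ancestors (a simple form of implicit inference), and a weighted combination with an assumed weight `alpha = 0.6` and threshold `0.5`. It also computes DR as TP / (TP + FN) and FPR as FP / (FP + TN) on toy labels.

```python
from itertools import product

# Hypothetical toy ontology (child -> parent "is-a" links); not taken from the paper.
IS_A = {
    "salary": "compensation",
    "wage": "compensation",
    "compensation": "financial_record",
    "invoice": "financial_record",
    "financial_record": "record",
    "diagnosis": "medical_record",
    "medical_record": "record",
}

# Hypothetical explicit (non-taxonomic) relations between concepts.
RELATED = {
    frozenset({"compensation", "employee"}),
    frozenset({"invoice", "customer"}),
    frozenset({"diagnosis", "patient"}),
}


def ancestors(concept):
    """Path from a concept up to its root, starting with the concept itself."""
    path = [concept]
    while path[-1] in IS_A:
        path.append(IS_A[path[-1]])
    return path


def depth(concept):
    """Depth in the hierarchy; a root concept has depth 1."""
    return len(ancestors(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    anc_b = set(ancestors(b))
    lcs = next((c for c in ancestors(a) if c in anc_b), None)
    if lcs is None:
        return 0.0  # no common ancestor in this toy ontology
    return 2 * depth(lcs) / (depth(a) + depth(b))


def relevance(a, b):
    """1.0 for identical or explicitly related concepts, 0.5 for a relation
    inherited through is-a ancestors (assumed implicit inference), else 0.0."""
    if a == b or frozenset({a, b}) in RELATED:
        return 1.0
    for x, y in product(ancestors(a), ancestors(b)):
        if frozenset({x, y}) in RELATED:
            return 0.5
    return 0.0


def concept_score(a, b, alpha=0.6):
    """Weighted combination of similarity and relevance (alpha is assumed)."""
    return alpha * wu_palmer(a, b) + (1 - alpha) * relevance(a, b)


def document_score(monitored, transferred, alpha=0.6):
    """Mean, over transferred concepts, of each one's best match among monitored ones."""
    if not monitored or not transferred:
        return 0.0
    best = [max(concept_score(t, m, alpha) for m in monitored) for t in transferred]
    return sum(best) / len(best)


def is_leak(monitored, transferred, threshold=0.5, alpha=0.6):
    """Flag a transfer whose combined score reaches the (assumed) threshold."""
    return document_score(monitored, transferred, alpha) >= threshold


def detection_metrics(predictions, labels):
    """DR = TP / (TP + FN) and FPR = FP / (FP + TN) over boolean predictions and labels."""
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    fn = sum(1 for p, l in zip(predictions, labels) if not p and l)
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)
    tn = sum(1 for p, l in zip(predictions, labels) if not p and not l)
    dr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return dr, fpr


if __name__ == "__main__":
    monitored = ["salary", "employee"]
    altered = ["wage", "employee"]        # reworded, same meaning
    unrelated = ["diagnosis", "patient"]  # different subject
    preds = [is_leak(monitored, altered), is_leak(monitored, unrelated)]
    print(preds)                                    # [True, False]
    print(detection_metrics(preds, [True, False]))  # (1.0, 0.0)
```

In this toy setup, replacing "salary" with "wage" changes the wording but keeps a high combined score, because the two concepts share a close common ancestor and "employee" still matches exactly. Pure syntactic matching would miss that case. The weight, threshold, and scoring rules are illustrative choices only.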

Keywords: Data loss prevention, Threat actors, Malicious insiders, Similarities, Data leakage, Detection rate

+: Corresponding author: Isaac Woungang

350 Victoria Street, Toronto, Ontario, M5B 2K3, Canada. Tel: +1-416-979-5000 ext. 6972.
Web: https://www.cs.ryerson.ca/~iwoungan/

Journal of Internet Services and Information Security (JISIS), 11(2): 78-99, May 2021

Received: February 15, 2021; Accepted: April 20, 2021; Published: May 31, 2021

DOI: 10.22667/JISIS.2021.05.31.078