Detecting Information Leakage via
a HTTP Request Based on the Edit Distance

Kazuki Chiba, Yoshiaki Hori*, and Kouichi Sakurai

Institute of Systems, Information Technologies and Nanotechnologies / Kyushu University
Fukuoka, Japan

{chiba, hori}@itslab.inf.kyushu-u.ac.jp, sakurai@csce.kyushu-u.ac.jp

Abstract

Information leakage has recently become a frequent problem. Among the many routes of leakage, leakage via the Internet
accounts for approximately half of all leakage victims. Its causes can be divided into human actions and malware such as
spyware: for example, leakage occurs when a person posts information on a bulletin board or when spyware runs. A technical
countermeasure against spyware is especially needed. In any event, no countermeasure against information leakage via the
Internet can be trusted completely.

When a web browser communicates with a server, it sends an HTTP request, and the server replies with the information
specified in that request. Some spyware abuses HTTP requests: once installed, it collects the user's information, embeds it
in an HTTP request, and sends it to an attacker's server. Filtering packets by TCP or UDP port number is not effective
because HTTP is the main communication protocol, so blocking its port would also block legitimate web traffic. A
signature-based technique is often used as a countermeasure against such spyware: if the data of a piece of software
matches a signature stored in the database, the software is regarded as spyware. This technique can detect most spyware
whose data is stored in the database, but it is ineffective against spyware whose data is not stored.
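To make this limitation concrete, here is a minimal Python sketch of the simplest form of signature matching: a lookup of a file's SHA-256 digest in a set of known digests. Real signature engines also match byte patterns, and `KNOWN_SPYWARE` and `is_known_spyware` are hypothetical names used only for illustration. The point is that a sample absent from the database, even a trivially modified copy of a known one, is not flagged.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known spyware samples.
KNOWN_SPYWARE = {
    hashlib.sha256(b"known spyware sample A").hexdigest(),
}


def is_known_spyware(binary: bytes) -> bool:
    """Exact-match lookup: only samples already in the database are detected."""
    return hashlib.sha256(binary).hexdigest() in KNOWN_SPYWARE


print(is_known_spyware(b"known spyware sample A"))            # True
print(is_known_spyware(b"known spyware sample A, repacked"))  # False: not stored
```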

We therefore propose a leakage detection system that is independent of a signature database. The system targets leakage
caused by both human action and malware. In existing research, the edit distance between the previous HTTP request and the
new HTTP request is calculated. Because many HTTP requests share common characters, this edit distance is much smaller than
the number of characters in the request. Leakage is therefore easy to detect: information that is sent repeatedly is
disregarded, while new information that is sent suddenly is quantified and its value stands out. We propose and evaluate a
technique that uses not only the immediately preceding HTTP request but also earlier HTTP requests, so that even more
unnecessary information is ignored. Furthermore, we propose a system that raises an alert when there is a risk of
information leakage: when an abnormal value is detected in a continuous series of numerical values, the system judges that
leakage may have occurred. Assuming that a certain quantity of information is leaked, the detection rate exceeds 90% in
some cases.
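The detection idea summarized above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the score of a request is taken as the minimum Levenshtein distance to the last k requests (one possible reading of "further previous HTTP requests"), and the alert rule, a score more than `threshold` standard deviations above the mean of recent scores, is a stand-in for the abnormal-value judgment. `LeakageMonitor`, `history_size`, `threshold`, `min_baseline` and `window` are hypothetical names.

```python
from collections import deque
from statistics import mean, pstdev


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


class LeakageMonitor:
    """Hypothetical monitor; names and parameters are illustrative only."""

    def __init__(self, history_size: int = 3, threshold: float = 3.0,
                 min_baseline: int = 3, window: int = 50):
        self.history = deque(maxlen=history_size)  # last k requests
        self.scores = deque(maxlen=window)         # recent scores (baseline)
        self.threshold = threshold
        self.min_baseline = min_baseline

    def observe(self, request: str):
        """Return (score, alert) for one request; score is None for the first."""
        if not self.history:
            self.history.append(request)
            return None, False
        # Minimum over the remembered requests: text repeated from ANY of
        # them costs nothing, so mainly the newly sent characters are counted.
        score = min(edit_distance(request, old) for old in self.history)
        self.history.append(request)
        alert = False
        if len(self.scores) >= self.min_baseline:
            mu, sigma = mean(self.scores), pstdev(self.scores)
            # Assumed rule: flag scores far above the recent mean
            # (sigma floored at 1 so a flat baseline does not alert on noise).
            alert = score > mu + self.threshold * max(sigma, 1.0)
        self.scores.append(score)
        return score, alert


requests = [
    "GET /index.html HTTP/1.1",
    "GET /news.html HTTP/1.1",
    "GET /index.html HTTP/1.1",
    "GET /news.html HTTP/1.1",
    "GET /index.html?id=alice&card=4111111111111111 HTTP/1.1",
]
monitor = LeakageMonitor(history_size=3)
results = [monitor.observe(r) for r in requests]
# scores: None, 4, 0, 0, 31; only the last request raises an alert
```

With `history_size=1` (comparing only with the immediately preceding request), the third and fourth requests would each score 4 instead of 0, which illustrates why consulting earlier requests suppresses more of the repeated information.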

Keywords: HTTP, information leakage, edit distance, behavior based detection

*Corresponding author: Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, 819-0395, Japan, Tel: +81-92-802-3666,
Email: hori@inf.kyushu-u.ac.jp, Web: http://itslab.inf.kyushu-u.ac.jp/~hori/index.html

Journal of Internet Services and Information Security (JISIS), 2(3/4): 18-28, November 2012