A Framework for Identifying Obfuscation Techniques applied to Android Apps using Machine Learning

Minjae Park1, Geunha You1, Seong-je Cho1, Minkyu Park2 and Sangchul Han2+

1Dankook University, Yongin, Korea

{parkminjae, geunhayou, sjcho}@dankook.ac.kr

2Konkuk University, Chungju, Korea

{minkyup, schan}@kku.ac.kr

Abstract

Malicious app writers tend to employ code obfuscation techniques to prevent their malicious code from being easily reverse engineered and analyzed. To analyze malicious Android apps effectively, it is necessary to identify which code obfuscation techniques have been applied to them. Existing studies have devised approaches that identify obfuscation at the app level. However, recent obfuscators can apply different obfuscation techniques on a class-by-class basis rather than to an app as a whole. In such cases, app-level obfuscation identification may be ineffective. In this paper, we propose a new framework that identifies the obfuscation technique applied to each class of an Android app. The proposed framework vectorizes the decompiled code of each class using a paragraph vector. The output vectors are then fed to a machine learning classifier, which identifies the obfuscation technique applied to each class. We use four machine learning classifiers (Random Forest, AdaBoost, Extra Trees, and Linear SVM) and compare their performance for each obfuscation technique.
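The class-level pipeline described above (vectorize each decompiled class, then label each vector separately) can be sketched as follows. This is a minimal, dependency-free sketch under loudly stated assumptions: feature hashing of source tokens stands in for the paragraph vector, and a nearest-centroid rule stands in for Random Forest, AdaBoost, Extra Trees and Linear SVM. The names `embed`, `train`, `predict` and `classify_app`, the vector size `DIM`, the label names and the toy classes are all hypothetical and are not taken from the paper. What the sketch does try to show faithfully is the unit of analysis: every class of an app receives its own label instead of one label per app.

```python
import math
import re
import zlib

# Hypothetical settings for illustration only; the paper's vector size
# and preprocessing are not specified here.
DIM = 64
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\S")


def tokenize(source):
    """Split decompiled Java source into identifier and symbol tokens."""
    return TOKEN_RE.findall(source)


def embed(source, dim=DIM):
    """Stand-in for a paragraph vector: L2-normalised hashed token counts."""
    vec = [0.0] * dim
    for tok in tokenize(source):
        # crc32 is stable across runs, unlike Python's salted hash().
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by 0
    return [x / norm for x in vec]


def train(examples):
    """Nearest-centroid 'training': average the vectors of each label."""
    sums, counts = {}, {}
    for source, label in examples:
        acc = sums.setdefault(label, [0.0] * DIM)
        for i, x in enumerate(embed(source)):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def predict(centroids, source):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    v = embed(source)
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])),
    )


def classify_app(centroids, classes):
    """Label every class of one app separately (class level, not app level)."""
    return {name: predict(centroids, src) for name, src in classes.items()}


# Toy, hypothetical training data: one decompiled class per label.
PLAIN = 'class LoginActivity { void onCreate(Bundle state) { setContentView(layout); log("login"); } }'
RENAMED = "class a { void a(b c) { d(e); } }"
centroids = train([(PLAIN, "none"), (RENAMED, "identifier_renaming")])

# One app whose classes were obfuscated differently.
app = {"com.example.LoginActivity": PLAIN, "com.example.a": RENAMED}
labels = classify_app(centroids, app)
print(labels)
```

In a fuller reproduction one would likely replace `embed` with a trained paragraph-vector model (for example gensim's `Doc2Vec`) and `train`/`predict` with the four classifiers named in the abstract; the per-class loop in `classify_app` would stay the same.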

Keywords: Android app, Obfuscation technique, Class-level obfuscation, Machine learning.

+: Corresponding author: Sangchul Han

Department of Software Technology, Konkuk University, 268 Chungwondaero, Chungju-si, Chungcheongbuk-do, 27478, Korea, Tel: +82-43-840-3605

Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA)

Vol. 10, No. 4, pp. 22-30, December 2019

Received: November 1, 2019; Accepted: December 7, 2019; Published: December 31, 2019
DOI: 10.22667/JOWUA.2019.12.31.022