DroidDetector: Android Malware Characterization and Detection Using Deep Learning
Smartphones and mobile tablets are rapidly becoming indispensable in daily life. Android has been the most popular mobile operating system since 2012. However, owing to the open nature of Android, countless malware samples are hidden among the large number of benign apps in Android markets, and they seriously threaten Android security. Deep learning is a new area of machine learning research that has gained increasing attention in artificial intelligence. In this study, we propose to associate the features from static analysis with the features from dynamic analysis of Android apps and to characterize malware using deep learning techniques. We implement an online deep-learning-based Android malware detection engine (DroidDetector) that can automatically detect whether an app is malware. With thousands of Android apps, we thoroughly test DroidDetector and perform an in-depth analysis of the features that deep learning exploits to characterize malware. The results show that deep learning is suitable for characterizing Android malware and is especially effective when more training data are available. DroidDetector achieves 96.76% detection accuracy, which outperforms traditional machine learning techniques. An evaluation of ten popular anti-virus software products demonstrates the urgency of advancing our capabilities in Android malware detection.
EXISTING SYSTEM:
- Previous research has revealed that Android malware is rapidly evolving to circumvent signature-based characterizations, which calls for the development of next-generation anti-mobile-malware solutions.
- Android malware evidently cannot be adequately characterized using only specific patterns (signatures). In view of this situation, machine-learning-based methods are being proposed to characterize Android malware. These methods extract features through static or dynamic analysis of Android apps and automatically learn the distinctions between malware and benign apps.
- In particular, these machine-learning-based methods can avoid the need to manually craft and update detection rules, which is crucial for keeping pace with the variety of Android malware.
DISADVANTAGES OF EXISTING SYSTEM:
- The main countermeasure to defend against malware on Android platforms is a risk communication mechanism that warns users about the permissions required before installing each app.
- One previous approach detects the presence of malware from the trend, rather than the rate, of the observed illegitimate scan traffic.
- That approach relies on a filter to separate malware traffic from background non-malware scan traffic.
PROPOSED SYSTEM:
- In this study, our contributions include:
- We describe our development of a deep-learning-based Android malware detection engine (DroidDetector) that has been put online for user testing and can automatically detect whether an app is malware.
- We thoroughly test DroidDetector and use association rule mining techniques to perform an in-depth analysis of the features that deep learning exploits to characterize malware.
- We conduct experiments on ten popular anti-virus software products and reveal that they are extremely vulnerable to repackaging attacks. In the light of our analyses, we conclude that deep learning is a promising technique for Android malware detection.
ADVANTAGES OF PROPOSED SYSTEM:
- Our experiments also demonstrated that the deep learning model significantly outperforms traditional machine learning models.
- In our opinion, if a malware sample cannot be identified correctly, its malicious characteristics must not have been properly learned by the machine learning model. Note that any app classified as malware must have some special characteristics that have been defined as malicious behaviors. Therefore, to characterize and detect more types of malware, more fine-grained features that cover more aspects of malware must be collected.
- The model can learn from more types of training samples.
MODULES:
- Feature Extraction
- Deep Learning Engine
- Features Exploitation
Feature Extraction
To systematically characterize Android apps (i.e., both malware and benign apps), we conduct static and dynamic analyses to extract features from each app. All the features fall under one of three types: required permissions, sensitive APIs, and dynamic behaviors. Required permissions and sensitive APIs are extracted through static analysis, whereas dynamic behaviors are extracted through dynamic analysis. All we need for this is the installation file (i.e., the apk file) of each Android app. In this way, we obtain a set of features for each app through static and dynamic analyses. Each feature is binary: when a feature occurs in an app, its value is 1; otherwise, its value is 0.
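The binary encoding described above can be sketched in a few lines of Python. This is only an illustration, not the engine's actual extraction code: the function name `encode_features`, the `perm:`/`api:`/`dyn:` prefixes, and the six-entry vocabulary are assumptions made for the example, and the analysis tools that would produce the observed feature names are not shown.

```python
from typing import Iterable, List

# Hypothetical, heavily shortened feature vocabulary (an assumption for this
# sketch). Entries are grouped by the three feature types named in the text:
# required permissions, sensitive APIs, and dynamic behaviors.
FEATURE_VOCAB: List[str] = [
    "perm:android.permission.SEND_SMS",
    "perm:android.permission.READ_CONTACTS",
    "api:TelephonyManager.getDeviceId",
    "api:SmsManager.sendTextMessage",
    "dyn:send_sms",          # illustrative behavior label
    "dyn:network_connect",   # illustrative behavior label
]


def encode_features(observed: Iterable[str],
                    vocab: List[str] = FEATURE_VOCAB) -> List[int]:
    """Return a 0/1 vector: 1 if the feature occurs in the app, else 0."""
    seen = set(observed)
    return [1 if name in seen else 0 for name in vocab]


# Features that static and dynamic analysis might report for one apk file.
observed_in_app = [
    "perm:android.permission.SEND_SMS",
    "api:SmsManager.sendTextMessage",
    "dyn:send_sms",
]
vector = encode_features(observed_in_app)
print(vector)
```

Keeping the vocabulary in a fixed order matters, because every app must map to a vector of the same length with each position meaning the same feature.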
Deep Learning Engine
In this module, we develop the deep learning engine. Traditional machine learning models (e.g., SVM and C4.5) have fewer than three layers of computation units and are therefore considered to have shallow architectures, which can limit how well they capture complex relationships among features. Deep learning models, with their deep architectures, address this limitation. In practice, a deep learning model can be constructed with different deep architectures, e.g., Deep Belief Networks (DBN) or convolutional neural networks. For this study, we chose the DBN architecture to construct our deep learning model and characterize Android apps.
To validate the ability of the deep learning model to detect Android malware, and to analyze in depth the features that deep learning exploits to characterize malware, we conducted experiments on public app sets. One benign app set was randomly crawled from the Google Play Store, which contains a large number of apps. Although a few malicious apps might be hidden among them, we regard all of them as benign. Two malicious app sets were collected from sources including the Contagio Community. Several parameters need to be set when building deep learning networks, including the number of layers, the number of neurons in each layer, the number of contrastive divergence steps k (CD-k), and the number of iterations.
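To make the parameters listed above concrete, the following Python sketch shows greedy layer-wise pre-training of a stack of restricted Boltzmann machines (RBMs), the building block of a DBN, using contrastive divergence. It is a minimal illustration under stated assumptions, not DroidDetector's implementation: the class and function names, the learning rate, the toy data, and the layer sizes are all hypothetical, and the supervised fine-tuning stage that normally follows pre-training is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli RBM trained with k-step contrastive divergence (CD-k)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def _sample(self, p):
        return (self.rng.random(p.shape) < p).astype(float)

    def cd_k_update(self, v0, k=1, lr=0.1):
        """One full-batch CD-k parameter update (k >= 1); returns reconstruction error."""
        h0 = self.hidden_probs(v0)
        h = self._sample(h0)
        for _ in range(k):                 # k steps of alternating Gibbs sampling
            vk = self.visible_probs(h)     # probabilities used as the reconstruction
            hk = self.hidden_probs(vk)
            h = self._sample(hk)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - vk.T @ hk) / n
        self.b_v += lr * (v0 - vk).mean(axis=0)
        self.b_h += lr * (h0 - hk).mean(axis=0)
        return float(np.mean((v0 - vk) ** 2))


def pretrain_dbn(data, layer_sizes, k=1, iterations=50, lr=0.1, seed=0):
    """Train one RBM per hidden layer; each layer's output feeds the next."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(iterations):
            rbm.cd_k_update(layer_input, k=k, lr=lr)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms


# Toy stand-in for binary app feature vectors: 20 apps, 6 features each.
data = (np.random.default_rng(0).random((20, 6)) < 0.5).astype(float)
rbms = pretrain_dbn(data, layer_sizes=[8, 4], k=1, iterations=20)
print([rbm.W.shape for rbm in rbms])
```

Here `layer_sizes` sets both the number of hidden layers and the number of neurons in each, `k` is the CD-k value, and `iterations` is the number of training passes per layer. Suitable values depend on the data and would have to be chosen experimentally.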
Features Exploitation
In this module, we examine which features deep learning exploits, based on experiments on the app sets. We used association rule mining techniques to perform an in-depth analysis of the features that deep learning exploits to distinguish malicious from benign apps. We consider that the analysis results reflect only trends in the feature differences between the two classes and are not absolute distinctions in real-world situations. First, we examined the ten top-ranked features in the malicious and benign classes. The results show that both classes have the same features.
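Association rule mining scores rules of the form "feature present, therefore class" by their support and confidence. The Python sketch below computes both measures for single-feature rules on a toy data set. It illustrates the general technique only: the function name `rule_stats`, the sample vectors, and the labels are assumptions for the example, and the source does not state which mining algorithm or thresholds were used.

```python
from typing import List, Sequence, Tuple


def rule_stats(samples: Sequence[Sequence[int]],
               labels: Sequence[str],
               feature_index: int,
               target_label: str) -> Tuple[float, float]:
    """Support and confidence of the rule: feature == 1  =>  target_label.

    support    = P(feature present AND label is target)
    confidence = P(label is target | feature present)
    """
    n = len(samples)
    with_feature = sum(1 for s in samples if s[feature_index] == 1)
    both = sum(1 for s, y in zip(samples, labels)
               if s[feature_index] == 1 and y == target_label)
    support = both / n if n else 0.0
    confidence = both / with_feature if with_feature else 0.0
    return support, confidence


# Toy binary feature vectors (two features) with hypothetical labels.
samples: List[List[int]] = [[1, 0], [1, 1], [0, 1], [1, 0]]
labels: List[str] = ["malware", "malware", "benign", "benign"]

for idx in range(2):
    sup, conf = rule_stats(samples, labels, idx, "malware")
    print(f"feature {idx} => malware: support={sup:.2f}, confidence={conf:.2f}")
```

Ranking features by such scores within each class is one simple way to obtain a list of top-ranked features per class. As the text notes, the resulting rankings indicate trends rather than absolute distinctions.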
HARDWARE REQUIREMENTS:
- System : Pentium Dual Core
- Hard Disk : 120 GB
- Monitor : 15" LED
- Input Devices : Keyboard, Mouse
- RAM : 1 GB
SOFTWARE REQUIREMENTS:
- Operating System : Windows 7
- Coding Language : Android, Java
- Toolkit : Android 2.3 and above
- IDE : Eclipse
REFERENCE:
Zhenlong Yuan, Yongqiang Lu, and Yibo Xue, “DroidDetector: Android Malware Characterization and Detection Using Deep Learning”, IEEE Tsinghua Science and Technology, 2016.