Planning Phase
The research process consisted of reading research papers and articles on malware attribution using machine learning. The goal was to understand the methods and techniques used to perform malware attribution and to identify the best approach for the project. It also helped me understand which features can be extracted from malware and which of them are best suited as input to the machine learning model.
Feature Extraction Tool Development
Next, a feature extraction tool was developed to extract features from a 60 GB sample of malware from different families. The tool was written in Python using the pefile library, which simplifies extracting features from Portable Executable (PE) files. The extracted features were then pre-processed and used as input for the machine learning model.
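As an illustration of this kind of extraction, here is a minimal sketch using pefile. The specific features (section count, entropy statistics, import counts, and a few header fields) are assumptions chosen for the example, not the feature set used in the project, and the function names are hypothetical.

```python
from statistics import mean


def features_from_pe(pe):
    """Build a flat feature dict from a parsed pefile.PE object.

    The chosen features are illustrative assumptions only.
    """
    entropies = [section.get_entropy() for section in pe.sections]
    # DIRECTORY_ENTRY_IMPORT is absent when a file has no import table.
    imports = getattr(pe, "DIRECTORY_ENTRY_IMPORT", [])
    return {
        "num_sections": pe.FILE_HEADER.NumberOfSections,
        "timestamp": pe.FILE_HEADER.TimeDateStamp,
        "size_of_code": pe.OPTIONAL_HEADER.SizeOfCode,
        "entry_point": pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        "mean_section_entropy": mean(entropies) if entropies else 0.0,
        "max_section_entropy": max(entropies, default=0.0),
        "num_imported_dlls": len(imports),
        "num_imported_functions": sum(len(entry.imports) for entry in imports),
    }


def extract_features(path):
    """Parse the PE file at `path` and return its feature dict."""
    # Imported here so features_from_pe can be used without pefile installed.
    import pefile  # third-party: pip install pefile

    pe = pefile.PE(path)
    try:
        return features_from_pe(pe)
    finally:
        pe.close()
```

Calling `extract_features` on each sample and collecting the resulting dicts gives one row per file, which can then be pre-processed into a numeric matrix for training.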
Model Development
scikit-learn was used to develop the machine learning model. Five models (Logistic Regression, Random Forest, Support Vector Machine, Naive Bayes, and MLP) were trained and tested, and the best model was selected based on accuracy and k-fold cross-validation.
The selected model achieved an accuracy of 76% on the test data and was then saved for use in the web application.
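The comparison described above could be sketched roughly as follows. This is an assumption-laden example: the data is synthetic (generated with `make_classification`), the hyperparameters are mostly scikit-learn defaults, the Naive Bayes variant is assumed to be Gaussian, and `compare_models` is a hypothetical helper name. None of these are taken from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, n_splits=5, seed=0):
    """Return (best_name, scores) using mean k-fold cross-validated accuracy."""
    candidates = {
        # Scale-sensitive models are wrapped in a pipeline with a scaler.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(random_state=seed),
        "svm": make_pipeline(StandardScaler(), SVC()),
        "naive_bayes": GaussianNB(),
        "mlp": make_pipeline(
            StandardScaler(), MLPClassifier(max_iter=500, random_state=seed)
        ),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Synthetic stand-in for the extracted feature matrix (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
best_name, scores = compare_models(X, y)
print(best_name, scores)
```

Once a model is chosen, it would typically be refit on the full training set and persisted (for example with `joblib.dump`) so the web application can load it.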
Web Application Development
The web application was developed using Flask, a micro web framework for Python. The application was deployed on Render, a cloud platform that allows for easy deployment of web applications. It was designed to be user-friendly, with a simple, clean, and responsive user interface that works on all devices.
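A minimal sketch of how such a Flask application might expose the model is shown below. The `/predict` route, the `sample` form field, and the `predict_family` helper are all hypothetical names introduced for illustration. The helper is a placeholder that returns a fixed label instead of loading the saved model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_family(data: bytes) -> str:
    """Hypothetical placeholder for feature extraction plus model inference.

    A real implementation would extract features from `data` and call the
    saved model; here it always returns a fixed label.
    """
    return "unknown"


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a multipart form upload under the (assumed) field name "sample".
    upload = request.files.get("sample")
    if upload is None:
        return jsonify(error="no file uploaded under 'sample'"), 400
    return jsonify(family=predict_family(upload.read()))
```

The endpoint can be exercised without a browser through Flask's built-in test client, for example `app.test_client().post("/predict", data={"sample": (io.BytesIO(b"MZ"), "a.exe")})`.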