Final Year Project

ByteSorcerer - Malware Attributor

About this Project

ByteSorcerer is a web application that uses machine learning to perform malware attribution: it identifies the family a malware sample belongs to based on features extracted from the sample. The application is written in Python with Flask, uses scikit-learn for the machine learning model, and is deployed on Render, a cloud platform for hosting web applications.

My Role and Solutions

The Final Year Project was carried out by a team of two. I focused on the research and on developing the feature extraction tool, the machine learning models and the web application. My teammate focused on her own part of the solution, which uses Control Flow Graphs to perform malware attribution. This page covers my part of the work.

Work Process

Planning Phase

Research was carried out by reading research papers and articles on malware attribution using machine learning. The goal was to understand the methods and techniques used for malware attribution and to identify the best approach for the project. It also helped me understand which features can be extracted from malware and which of them are best suited as input to the machine learning model.

Feature Extraction Tool Development

Next, a feature extraction tool was developed to extract features from a 60 GB collection of malware samples from different families. The tool was written in Python using the pefile library, which parses Portable Executable (PE) files and exposes their headers, sections and imports. The extracted features were then pre-processed and used as input for the machine learning model.
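As an illustration of what such a tool can look like, here is a minimal sketch built on pefile. The feature set (header fields, import counts, section entropy) and the function names shannon_entropy and extract_features are assumptions chosen for the example, not the features or code actually used in ByteSorcerer.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(path: str) -> dict:
    """Return a flat dict of illustrative PE features for one file.

    The chosen fields are assumptions for this sketch, not the project's
    actual feature set.
    """
    # Third-party dependency (pip install pefile). Imported here so the
    # entropy helper above stays usable without pefile installed.
    import pefile

    pe = pefile.PE(path)
    try:
        entropies = [shannon_entropy(s.get_data()) for s in pe.sections]
        # DIRECTORY_ENTRY_IMPORT is absent when the file has no import table.
        imports = getattr(pe, "DIRECTORY_ENTRY_IMPORT", [])
        return {
            "machine": pe.FILE_HEADER.Machine,
            "num_sections": pe.FILE_HEADER.NumberOfSections,
            "timestamp": pe.FILE_HEADER.TimeDateStamp,
            "size_of_image": pe.OPTIONAL_HEADER.SizeOfImage,
            "num_imported_dlls": len(imports),
            "num_imported_functions": sum(len(e.imports) for e in imports),
            "mean_section_entropy": sum(entropies) / len(entropies) if entropies else 0.0,
            "max_section_entropy": max(entropies, default=0.0),
        }
    finally:
        pe.close()
```

Calling extract_features on each sample and collecting the resulting dicts gives one row per file, which can then be pre-processed into a numeric feature matrix. High section entropy is commonly used as a hint that a section is packed or encrypted, which is why it appears in this sketch.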

Model Development

scikit-learn was used to develop the machine learning model. Five models (Logistic Regression, Random Forest, Support Vector Machine, Naive Bayes and MLP) were trained and tested, and the best model was selected based on accuracy and k-fold cross-validation.

The selected model achieved an accuracy of 76% on the test data. It was then saved and used by the web application.
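The comparison described above can be sketched with scikit-learn roughly as follows. This is an assumption-laden example: the hyperparameters, the use of StandardScaler, the 5-fold stratified split and the synthetic data from make_classification are all placeholders, and build_candidates and compare_models are hypothetical helper names. The project's real features, settings and results are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_candidates(seed: int = 0) -> dict:
    """The five model types named in the text, with placeholder settings."""
    return {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
        "Naive Bayes": GaussianNB(),
        "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=seed)),
    }


def compare_models(X, y, n_splits: int = 5, seed: int = 0) -> dict:
    """Mean k-fold cross-validated accuracy for each candidate model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in build_candidates(seed).items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for the extracted malware features (assumption).
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
    )
    scores = compare_models(X, y)
    for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:24s} {acc:.3f}")
    print("best:", max(scores, key=scores.get))
```

Wrapping the scale-sensitive models in a pipeline means the scaler is fitted on the training folds only, so the cross-validation scores are not inflated by information from the held-out fold.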

Web Application Development

The web application was developed using Flask, a micro web framework for Python, and deployed on Render. It was designed to be easy to use, with a simple, clean user interface that is responsive and works on all devices.
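A minimal sketch of how a Flask endpoint for this kind of application might be wired up is shown below. The /predict route, the "sample" form field, the 16 MiB upload limit and the create_app factory are assumptions made for the example, and predict_family stands in for the real pipeline of feature extraction followed by the saved model.

```python
from flask import Flask, jsonify, request


def create_app(predict_family):
    """Build the app around a callable that maps raw file bytes to a family name.

    predict_family is a hypothetical hook: in a real deployment it would run
    feature extraction and then call the saved model.
    """
    app = Flask(__name__)
    app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024  # reject uploads over 16 MiB

    @app.route("/predict", methods=["POST"])
    def predict():
        upload = request.files.get("sample")
        if upload is None or upload.filename == "":
            return jsonify(error="no file uploaded under the 'sample' field"), 400
        # The sample is only read into memory here, never executed.
        data = upload.read()
        return jsonify(filename=upload.filename, family=predict_family(data))

    return app


if __name__ == "__main__":
    # Placeholder predictor so the sketch runs without a trained model.
    app = create_app(lambda data: "unknown")
    app.run()
```

Passing the predictor into a factory keeps the route independent of any particular model file, which also makes the endpoint easy to exercise with Flask's test client.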

Outcome

Overall, the project was a success: we delivered it on time with all the features implemented. In the future, we plan to improve the model's accuracy by using more advanced machine learning techniques and algorithms.

View the application | View Source Code (GitHub)