Planning Phase
We narrowed the scope of the project to four bird species: the Javan Myna, Black-naped Oriole, Collared Kingfisher and Little Egret. Limiting the scope helped ensure that we could deliver the project on time and that the machine learning model could identify the birds accurately. We plan to expand the number of species in the future.
Data Collection & Pre-Processing
Data was collected by scraping the internet for textual descriptions of the birds using the BeautifulSoup library in Python. We scraped the description of each search result, and the metadata of each site, from the search engines Google, Yahoo, AOL and Brave. We managed to collect 150-200 descriptions for each bird species.
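The extraction step can be sketched as follows. This is only an illustration: the HTML below is a made-up stand-in for a downloaded page, and the `result` and `snippet` class names and the `extract_descriptions` helper are hypothetical. Real search engines use different markup that changes often, so the selectors would need to be adapted per site.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page (assumption for illustration).
SAMPLE_HTML = """
<html>
  <head>
    <meta name="description" content="The Javan Myna is a dark grey bird with a yellow bill.">
  </head>
  <body>
    <div class="result"><p class="snippet">Javan Mynas have white wing patches.</p></div>
    <div class="result"><p class="snippet">  Often seen in urban areas.  </p></div>
  </body>
</html>
"""


def extract_descriptions(html):
    """Return the meta description and the result snippets found in one page."""
    soup = BeautifulSoup(html, "html.parser")
    descriptions = []

    # Site metadata: the content of <meta name="description">, if present.
    meta = soup.find("meta", attrs={"name": "description"})
    if meta and meta.get("content"):
        descriptions.append(meta["content"].strip())

    # Search-result snippets (hypothetical CSS class).
    for node in soup.select("p.snippet"):
        text = node.get_text(strip=True)
        if text:
            descriptions.append(text)
    return descriptions


descriptions = extract_descriptions(SAMPLE_HTML)
```

Each extracted string would then be stored together with the species it was retrieved for, giving one labelled description per row.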
The data was split into training and test sets, then pre-processed by removing special characters and stop words. The text was then tokenized and converted into a TF-IDF matrix to be used as input for the machine learning model.
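A minimal sketch of this step with scikit-learn is shown below. The toy descriptions, the `clean` helper, the 50/50 split and the use of scikit-learn's built-in English stop-word list are all assumptions made for illustration, not the project's actual data or settings. The vectorizer is fitted on the training text only, so no information from the test set leaks into the vocabulary.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy descriptions (illustrative only, not the scraped data).
texts = [
    "The Javan Myna is a grey bird with a yellow bill.",
    "Javan Mynas show white wing patches in flight!",
    "The Black-naped Oriole is bright yellow with a black nape.",
    "Black-naped Orioles have a pink bill & a fluting call.",
    "The Collared Kingfisher is blue with a white collar.",
    "Collared Kingfishers perch near the coast.",
    "The Little Egret is a white heron with black legs.",
    "Little Egrets have yellow feet.",
]
labels = [
    "Javan Myna", "Javan Myna",
    "Black-naped Oriole", "Black-naped Oriole",
    "Collared Kingfisher", "Collared Kingfisher",
    "Little Egret", "Little Egret",
]


def clean(text):
    """Lowercase the text and replace every non-letter character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


# Split first, keeping the class balance the same in both halves.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=42
)

# TfidfVectorizer tokenizes, drops stop words and builds the TF-IDF matrix.
vectorizer = TfidfVectorizer(preprocessor=clean, stop_words="english")
X_train = vectorizer.fit_transform(X_train_text)  # learn vocabulary on train only
X_test = vectorizer.transform(X_test_text)        # reuse the same vocabulary
```

Both matrices share the same columns, so a model trained on `X_train` can be evaluated directly on `X_test`.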
Model Development
scikit-learn was used to develop the machine learning model. Five models (Logistic Regression, Random Forest, Support Vector Machine, Naive Bayes and MLP) were trained and tested, and the best model was selected based on accuracy and k-fold cross-validation. An ensemble model was also created by combining the best Logistic Regression, Naive Bayes and MLP models to improve accuracy and robustness.
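The comparison and the ensemble could look roughly like the sketch below. It runs on generated toy sentences, and the hyperparameters, the 3-fold setting and the choice of soft voting are assumptions, since the text above does not state how the three models were combined. Each classifier is wrapped in a pipeline with its own TF-IDF step so that every cross-validation fold fits the vocabulary on its training part only.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy keyword sets used to generate illustrative sentences (not real data).
keywords = {
    "Javan Myna": "grey myna yellow bill white wing",
    "Black-naped Oriole": "yellow oriole black nape pink bill",
    "Collared Kingfisher": "blue kingfisher white collar large bill",
    "Little Egret": "white egret black legs yellow feet",
}
texts, labels = [], []
for species, words in keywords.items():
    tokens = words.split()
    for i in range(6):
        # Rotate the keywords to get six slightly different sentences per species.
        shift = i % len(tokens)
        rotated = tokens[shift:] + tokens[:shift]
        texts.append("a bird with " + " ".join(rotated[:5]))
        labels.append(species)

# The five candidate models named in the text (hyperparameters are assumptions).
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(kernel="linear"),
    "Naive Bayes": MultinomialNB(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}

# Ensemble of three of the models; soft voting averages predicted probabilities.
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("mlp", MLPClassifier(max_iter=500, random_state=0)),
    ],
    voting="soft",
)
candidates["Ensemble"] = voting

# Mean k-fold cross-validation accuracy for every candidate.
scores = {}
for name, classifier in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    scores[name] = cross_val_score(pipeline, texts, labels, cv=3).mean()

# Fit the ensemble on all toy data and classify a new description.
ensemble = make_pipeline(TfidfVectorizer(), voting)
ensemble.fit(texts, labels)
prediction = ensemble.predict(["white egret with black legs"])[0]
```

Soft voting needs every member to provide `predict_proba`, which the three chosen models do. Hard (majority) voting would be the alternative if a member such as a default `SVC` were included.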
The model achieved an accuracy of 95% on the test data and was then saved for use in the web application.
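The text does not say how the model was saved. One common option, assumed here, is `joblib`, which is installed alongside scikit-learn. The sketch below saves a small stand-in pipeline and loads it back; the toy data and the file name `bird_classifier.joblib` are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Small stand-in model trained on toy data (illustrative only).
texts = [
    "grey myna with yellow bill",
    "yellow oriole with black nape",
    "blue kingfisher with white collar",
    "white egret with black legs",
]
labels = ["Javan Myna", "Black-naped Oriole", "Collared Kingfisher", "Little Egret"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Saving the whole pipeline keeps the fitted vectorizer and classifier together.
model_path = os.path.join(tempfile.mkdtemp(), "bird_classifier.joblib")
joblib.dump(model, model_path)

# Later, for example when the web application starts, load it back.
restored = joblib.load(model_path)
```

Saving the vectorizer and the classifier as one object means the web application can pass raw text straight to `restored.predict` without rebuilding the TF-IDF vocabulary.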
Web Application Development
The web application was developed using Flask, a micro web framework for Python. The application was deployed on Render, a cloud platform that allows for easy deployment of web applications. The application was designed to be easy to use, with a simple, clean and responsive user interface that works on all devices.
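A minimal sketch of how a Flask endpoint could serve predictions is given below. The `/predict` route, the JSON field names and the `predict_species` placeholder are assumptions for illustration. In a real application the placeholder would call the saved classifier, loaded once at startup.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_species(description):
    """Placeholder for the saved classifier (hypothetical keyword rule)."""
    return "Little Egret" if "egret" in description.lower() else "Javan Myna"


@app.route("/predict", methods=["POST"])
def predict():
    # Accept a JSON body such as {"description": "a white bird with black legs"}.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    description = str(payload.get("description", "")).strip()
    if not description:
        # Reject empty input instead of guessing a species.
        return jsonify(error="description is required"), 400
    return jsonify(species=predict_species(description))
```

The endpoint can be tried locally with `flask run`, or without starting a server by using `app.test_client().post("/predict", json={"description": "a white egret"})`.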