Electric guitar classification using CNNs

Debadri Sengupta
Jun 13, 2020

Introduction-

GitHub link to project code: https://github.com/Debadri3/cnn-electric-guitar-classification

The electric guitar has been one of the most adored instruments since it was widely popularized by the likes of Hendrix, Van Halen and so forth.

Hendrix was one of the pioneers in popularizing the electric guitar

Likewise, two guitar manufacturing mammoths, Fender and Gibson, have produced several variants of the electric guitar with modifications over the years. This project is a multi-class classification problem using a pre-trained Convolutional Neural Network model from TensorFlow Hub, where I attempt to classify 7 different solid-body models of the electric guitar.

The electric guitar models used as classes typically differ in body shape, pick-up positions, knob positioning and so on. We’ll see how well our model can figure these out.

For information about the types of electric guitars, check this link: https://ledgernote.com/columns/guitar-guru/types-of-electric-guitars/

Creating the dataset-

The ‘Bulk Image Downloader’ software was used to download the dataset images. Only images with the .jpg extension were used, while .png and .webp images were ignored.

This link was referred to for the creation of the .csv labels file: https://www.datacamp.com/community/tutorials/datasets-for-images

However, the folder location need not be prepended to the image name, as the images were uploaded to and imported from my Drive for this project. A separate ‘label’ column was also added to the .csv file.

Strings specifying the storage location of the images were used during pre-processing, because it is easier to work with them than with the images themselves.

Processing the labels-

The ‘label’ column of the csv file was converted to a numpy array for processing.

An array containing only the unique labels was created. Then, each data label was compared against this array of unique labels, and the resulting Boolean array was stored.

‘Train test split’ was applied to divide the dataset into training and validation datasets.
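A rough sketch of these two steps is below. The variable names (`filepaths`, `labels`, `unique_labels`) and the split ratio are my own assumptions and may differ from the repo:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed example data: file paths from the .csv and their string labels
filepaths = np.array(["images/strat_1.jpg", "images/lespaul_1.jpg",
                      "images/tele_1.jpg", "images/strat_2.jpg"])
labels = np.array(["Stratocaster", "Les Paul", "Telecaster", "Stratocaster"])

# Array containing only the unique labels
unique_labels = np.unique(labels)

# Compare each label against the unique labels -> one Boolean (one-hot) array per image
boolean_labels = np.array([label == unique_labels for label in labels])

# Split file paths and Boolean labels into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    filepaths, boolean_labels, test_size=0.2, random_state=42
)
```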

Pre-processing images-

At this point, the images were in the form of file paths and the labels in the form of NumPy arrays. We convert them to tensors, which are similar to n-dimensional arrays but can be processed efficiently on GPUs.

For this, a function is built which reads the image at the file path, converts it into a tensor with 3 colour channels (RGB), normalizes the pixel values (0–255 to 0–1), resizes it to our desired size (224*224 in our case, as that is the input shape expected by our future model) and returns it.

A portion of the training dataset images
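A minimal sketch of such a function, assuming TensorFlow is used as in the rest of the project (the function name `process_image` and the constant `IMG_SIZE` are my own):

```python
import tensorflow as tf

IMG_SIZE = 224  # input size expected by the MobileNet V2 model

def process_image(image_path):
    """Read an image file and return a normalized, resized tensor."""
    image = tf.io.read_file(image_path)                       # raw bytes
    image = tf.image.decode_jpeg(image, channels=3)           # 3 colour channels (RGB)
    image = tf.image.convert_image_dtype(image, tf.float32)   # 0-255 -> 0-1
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])      # 224 x 224
    return image
```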

Creating batch data-sets-

The function built for this can handle test, validation or training data. If it is training data, the dataset is shuffled every time we call the function and then turned into batches of 32 images at a time for processing. According to Yann LeCun, a French pioneer of machine learning, that is an ideal batch size. [https://twitter.com/ylecun/status/989610208497360896?s=20]
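Sketched with tf.data, relying on the `process_image` helper from the previous section (the function names and flags here are my assumptions, not necessarily those in the repo):

```python
import tensorflow as tf

BATCH_SIZE = 32

def get_image_label(image_path, label):
    """Pair a processed image tensor with its label."""
    return process_image(image_path), label

def create_data_batches(X, y=None, batch_size=BATCH_SIZE,
                        valid_data=False, test_data=False):
    """Turn file paths (and labels) into batched tf.data.Dataset objects."""
    if test_data:
        # Test data: no labels, no shuffling
        data = tf.data.Dataset.from_tensor_slices(tf.constant(X))
        return data.map(process_image).batch(batch_size)
    if valid_data:
        # Validation data: labels, but no shuffling
        data = tf.data.Dataset.from_tensor_slices((tf.constant(X), tf.constant(y)))
        return data.map(get_image_label).batch(batch_size)
    # Training data: shuffle the full dataset before batching
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X), tf.constant(y)))
    data = data.shuffle(buffer_size=len(X))
    return data.map(get_image_label).batch(batch_size)
```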

Building the model-

Transfer learning was used by compiling a pre-trained model from TensorFlow Hub. The model is based on the MobileNet V2 architecture trained on ImageNet and uses a depth multiplier of 1.40 for image classification. It requires an input shape of 224*224, which our images have already been converted to. The high depth multiplier implied a chance of overfitting; hence a Dropout layer was necessary.

The activation function used is ‘softmax’, which assigns to each data point a probability of belonging to each class. It is the multi-class version of ‘sigmoid’. The loss function I used is ‘categorical cross-entropy’. Without delving into the mathematics, the lower the cross-entropy, the better the model. The metric used for evaluation is validation accuracy.
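A sketch of what the model definition might look like. The exact TensorFlow Hub handle, dropout rate and optimizer here are assumptions on my part and may differ from the repo:

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 7
# Assumed Hub handle for the MobileNet V2 (depth multiplier 1.40, 224x224) feature vector
MODULE_URL = "https://tfhub.dev/google/imagenet/mobilenet_v2_140_224/feature_vector/4"

model = tf.keras.Sequential([
    hub.KerasLayer(MODULE_URL, input_shape=(224, 224, 3), trainable=False),
    tf.keras.layers.Dropout(0.3),                          # guard against overfitting
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"],
)
model.summary()
```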

Creating callbacks-

Two callbacks, TensorBoard and Early Stopping, were used. The TensorBoard callback saves our model’s performance logs to a directory so that we can visualize them later.

The Early Stopping callback prevents overfitting by halting training when the monitored metric stops improving for the specified (‘patience’) number of epochs. In our case, it was 3.
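Roughly, the two callbacks can be set up like this (the log directory path and the monitored metric are assumptions):

```python
import datetime
import tensorflow as tf

# TensorBoard: write performance logs to a timestamped directory
log_dir = "logs/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

# Early stopping: halt training if validation accuracy stops improving for 3 epochs
early_stopping_callback = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=3
)
```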

Training the model-

The model was trained for a maximum of 100 epochs. Early stopping would halt training much before that.
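Putting the earlier sketches together, the training call would look something like this (`create_data_batches`, the callbacks and the split variables are from the assumed snippets above):

```python
# Batched datasets from the earlier sketch
train_data = create_data_batches(X_train, y_train)
val_data = create_data_batches(X_val, y_val, valid_data=True)

NUM_EPOCHS = 100  # upper bound; early stopping usually halts well before this

history = model.fit(
    train_data,
    epochs=NUM_EPOCHS,
    validation_data=val_data,
    callbacks=[tensorboard_callback, early_stopping_callback],
)
```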

Scoring the model-

A mean accuracy of around 70–75 percent was attained on the validation set, varying with each ‘restart and run’. It may also differ slightly depending on the GPU, etc.
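Evaluating on the batched validation set is a one-liner (assuming the `val_data` dataset from the sketch above):

```python
val_loss, val_accuracy = model.evaluate(val_data)
print(f"Validation accuracy: {val_accuracy:.2%}")
```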

Checking our predictions-

First we check our TensorBoard log. The loss decreases and then plateaus, while the accuracy increases and then plateaus. The ‘predictions’ array consists of the probability of each data point belonging to each of the labels. A function is built to return the label with the maximum probability.

TensorBoard log for epoch accuracy of our model
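A minimal version of such a function, reusing `model`, `val_data` and `unique_labels` from the assumed snippets above (the names `predictions` and `get_pred_label` are mine):

```python
import numpy as np

# predictions has shape (num_validation_images, num_classes)
predictions = model.predict(val_data, verbose=1)

def get_pred_label(prediction_probabilities):
    """Return the label with the highest predicted probability."""
    return unique_labels[np.argmax(prediction_probabilities)]

print(get_pred_label(predictions[0]))  # e.g. "Stratocaster"
```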

Picturing our predictions-

We first unbatchify our data and then plot a few of the images against the probability distribution of the model’s predicted labels.

Probability distribution prediction
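As a sketch, unbatchifying and plotting could look like this (building on the assumed `val_data`, `predictions` and `unique_labels` above; the helper names are mine):

```python
import matplotlib.pyplot as plt
import numpy as np

def unbatchify(batched_data):
    """Turn a batched dataset back into lists of images and label names."""
    images, true_labels = [], []
    for image, label in batched_data.unbatch().as_numpy_iterator():
        images.append(image)
        true_labels.append(unique_labels[np.argmax(label)])
    return images, true_labels

def plot_pred_probabilities(prediction_probabilities, true_label):
    """Bar chart of the predicted probability for each class."""
    plt.bar(np.arange(len(unique_labels)), prediction_probabilities)
    plt.xticks(np.arange(len(unique_labels)), unique_labels, rotation=90)
    plt.title(f"True label: {true_label}")
    plt.show()

val_images, val_true_labels = unbatchify(val_data)
plot_pred_probabilities(predictions[0], val_true_labels[0])
```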

Conclusion:

Our model does a better-than-expected job of classifying the guitars. After inspecting a few images, I see that, much like many guitar players, it got confused mainly between the Super Strat and the Strat. Most other predictions were accurate.

