Which Beatle Had The Most Musical Influence Within The Beatles?

Oct 29, 2020

Using Convolutional Neural Networks And Scikit-Optimize To Predict Who Had The Largest Influence Within The Beatles

The Beatles’ London rooftop concert, 1969

Personal Background

I have been a huge fan of The Beatles since the late summer of 2019. It all started when I discovered the “Paul is Dead” conspiracy theory and dove deep into the rabbit hole of The Beatles’ music and personal stories. I don’t believe the conspiracy, but I am glad I looked into it, as it was a great way to kill time on a train ride to New York City!
As a Python enthusiast and a Master’s student in Applied Data Science, I am always looking for new projects that build both my data science and presentation skills. One day I was listening to the song “Something” from the Abbey Road album when a question popped into my head: “Which Beatle had the heaviest influence on the Beatles discography?” I decided to approach this project by using each Beatle’s post-Beatles discography as the training and validation data, so that the model could isolate each artist’s musical voice. The entire Beatles discography would then be the data on which the model predicts who had the most influence. I knew that this project would be a major challenge for a few reasons.

One, the audio data for each song in each artist’s post-breakup discography would not come easily through an API. Collecting it would take additional processing and a good chunk of computer memory.
Two, I had never worked with audio data before, so this type of modeling would be different from what I had done in the past.
Three, it would be hard to assess the accuracy of the results, since there is no known answer to who had the most musical influence on each and every Beatles song.

Ultimately, this project may lead to some interesting insights about the band members.

Data Collection

Audio data for music can be hard to collect, as most artists’ music is licensed to companies such as Spotify, Apple Music, and Pandora. To get each song needed for this project, YouTube was the best source. youtube-dl was created to download videos from YouTube and can also be used as a Python package for extracting videos from within scripts. Using this package, I created two functions to help download all the songs in the album playlists and save them in the proper directory.

To get the list of each artist’s albums released after the Beatles’ last day in the recording studio together, I used Wikipedia. When getting the playlist links for each album, I made sure to check the songs on the playlist against the tracklist for the album on Wikipedia. Since downloading all the songs for an artist would take about an hour or more, I decided to create individual download scripts for each artist so I could run one script and build the download script for another at the same time.
All of the download scripts can be found at the GitHub link at the bottom of the article and are titled “download_[artist]_albs.py”. Here’s an example:
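A minimal sketch of what such a download script might look like, assuming youtube-dl's `YoutubeDL` class and its `format`, `outtmpl`, and `ignoreerrors` options. The function names, the `data` root folder, and the directory layout are hypothetical and are not taken from the repository.

```python
import os


def build_options(album_dir):
    """Build youtube-dl options that save each track into album_dir."""
    return {
        # prefer an audio-only stream, fall back to the best available format
        "format": "bestaudio/best",
        # name each file after the video title, keeping its native extension
        "outtmpl": os.path.join(album_dir, "%(title)s.%(ext)s"),
        # skip unavailable videos instead of aborting the whole playlist
        "ignoreerrors": True,
    }


def download_album(playlist_url, artist, album, root="data"):
    """Download every video in one album playlist into root/artist/album."""
    # imported here so build_options can be used without youtube-dl installed
    import youtube_dl

    album_dir = os.path.join(root, artist, album)
    os.makedirs(album_dir, exist_ok=True)
    with youtube_dl.YoutubeDL(build_options(album_dir)) as ydl:
        ydl.download([playlist_url])
    return album_dir
```

A per-artist script would then call `download_album` once for each album playlist link collected for that artist.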

Mel spectrogram image of the song “A Day In The Life” from Sgt. Pepper’s Lonely Hearts Club Band

Data Processing

Each song was saved as an .mp4, .webm, or .mkv file, but the container format did not matter, because I could convert every file to .wav using a function that leverages the AudioSegment class of the pydub package. I also wrote a helper script that walks through each artist’s directory and converts the files, using multiprocessing’s Pool class to run the tasks in parallel.
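The conversion step could be sketched roughly as follows, assuming pydub's `AudioSegment.from_file` and `export` methods (which rely on ffmpeg being installed). The function names and the choice of four worker processes are hypothetical, not the ones used in the repository.

```python
import os
from multiprocessing import Pool

VIDEO_EXTENSIONS = (".mp4", ".webm", ".mkv")


def wav_path_for(src_path):
    """Return the .wav path that sits next to the downloaded file."""
    root, _ = os.path.splitext(src_path)
    return root + ".wav"


def find_downloads(artist_dir):
    """Walk an artist directory and collect every downloaded video file."""
    found = []
    for folder, _, files in os.walk(artist_dir):
        for name in files:
            if name.lower().endswith(VIDEO_EXTENSIONS):
                found.append(os.path.join(folder, name))
    return sorted(found)


def convert_to_wav(src_path):
    """Decode one downloaded file and write it back out as .wav."""
    # imported here so the path helpers work without pydub installed
    from pydub import AudioSegment

    dst_path = wav_path_for(src_path)
    AudioSegment.from_file(src_path).export(dst_path, format="wav")
    return dst_path


def convert_artist(artist_dir, workers=4):
    """Convert all of one artist's downloads in parallel."""
    with Pool(workers) as pool:
        return pool.map(convert_to_wav, find_downloads(artist_dir))
```

On Windows, `convert_artist` should be called from under an `if __name__ == "__main__":` guard so that the worker processes can be spawned safely.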

To transform the audio into images for classification in our CNN, we can use the librosa package, which works very well with audio files and has built-in functions for computing the short-time Fourier transform (STFT) and the mel-scaled spectrogram. We will use both to convert our audio files into images for the image classification network. To create the images from the .wav files, I’ll create two plotting functions that load the audio and plot its frequency content over time:
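To make the transform itself concrete, here is a NumPy-only sketch of a magnitude STFT. It is not librosa's implementation and not the plotting code from the repository: it only slices the signal into overlapping windowed frames and takes an FFT of each one. The frame length of 2048 and hop of 512 are assumptions chosen for illustration, and the 440 Hz test tone is synthetic.

```python
import numpy as np

N_FFT = 2048       # samples per analysis frame (assumed value)
HOP_LENGTH = 512   # samples between the starts of consecutive frames (assumed)


def stft_magnitude(signal, n_fft=N_FFT, hop_length=HOP_LENGTH):
    """Return an array of shape (n_fft // 2 + 1, n_frames) of FFT magnitudes."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rows of `frames` are time steps; transpose so rows become frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T


# one second of a synthetic 440 Hz sine wave
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(tone)
# the strongest frequency bin, converted back to Hz
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sr / N_FFT
```

Each column of `spec` is one time step and each row one frequency bin, which is the grid that gets rendered as a spectrogram image. The mel-scaled variant additionally regroups the frequency rows onto the perceptually motivated mel scale.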

Then I leverage multiprocessing’s Pool class again to speed up the image processing and save the images to the respective training and test folders:

Now that we have images from both the STFT and mel-scaled spectrogram processes, we can see which one helps us train a better model.

Short-Time Fourier Transform of “Working Class Hero” by John Lennon

Model Training

To begin training a CNN to classify which song’s image belongs to which artist, we first need to load our training and validation data. Keras provides an ImageDataGenerator class that will allow us to use a method called flow_from_dataframe, to load images in batches based on the file and artist names from a pandas DataFrame object. Since I planned on multiple iterations of model testing, I decided to create a function that creates either the training and validation or test sets of images to use as input to my model:

Now it is time to create the convolutional neural network using Keras. To be able to tweak the model to have more convolutional layers, different convolutional activation functions, dropout rates, optimizers, etc. quickly, I created a function. This function builds and compiles a two-dimensional convolutional neural network, and even allows you to pass parameters to the optimizer. The benefit of creating this function, outside of the build speed, is the ability to use it in an optimizer to find the best hyperparameters of a CNN.
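A builder function along these lines might look like the sketch below, assuming TensorFlow's bundled Keras API. The specific architecture (doubling filter counts, 3x3 kernels, max pooling after each convolution, a single dropout layer, the Adam optimizer) and all parameter names are assumptions made for illustration, not the architecture from the repository.

```python
def filters_per_layer(n_conv_layers, base_filters=32):
    """Double the number of filters at each successive convolutional layer."""
    return [base_filters * 2 ** i for i in range(n_conv_layers)]


def build_cnn(n_conv_layers=5, image_size=(256, 256), n_classes=4,
              activation="relu", dropout_rate=0.25, optimizer_kwargs=None):
    """Build and compile a 2D CNN that classifies spectrogram images."""
    # imported here so filters_per_layer works without TensorFlow installed
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(image_size[0], image_size[1], 3)))
    for n_filters in filters_per_layer(n_conv_layers):
        model.add(keras.layers.Conv2D(n_filters, 3, padding="same",
                                      activation=activation))
        model.add(keras.layers.MaxPooling2D(2))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dropout(dropout_rate))
    # one softmax output per artist
    model.add(keras.layers.Dense(n_classes, activation="softmax"))

    # optimizer_kwargs lets a tuner pass values such as the learning rate
    optimizer = keras.optimizers.Adam(**(optimizer_kwargs or {}))
    model.compile(optimizer=optimizer, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because every architectural choice is a keyword argument, a hyperparameter search can call `build_cnn` with a different combination on each trial, for example `build_cnn(n_conv_layers=3, optimizer_kwargs={"learning_rate": 1e-3})`.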

So there are now functions to retrieve the training and testing images and to build our classification model. The next step involves passing the training data to our model and finding the best hyperparameters. To find the hyperparameters for the neural network as fast as possible, we will use scikit-optimize:

Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization. skopt aims to be accessible and easy to use in many contexts.

To find parameters such as batch size, epochs, learning rate, and image size, we will use scikit-optimize’s gp_minimize function, which attempts to minimize a function that is passed to it by applying Bayesian optimization using Gaussian processes. Since gp_minimize needs a function to minimize, a function named fitness will build the CNN model, create the training and validation data, and then train the neural network. The function will also check whether this version has the highest validation accuracy of all versions trained so far and, if so, save the model. You will notice that this function returns the negated validation accuracy, since gp_minimize is attempting to minimize the fitness function.

Now it's time to put it all together and train and tune the CNN. gp_minimize will run 40 times to attempt to find the best fit of hyperparameters.
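The overall pattern can be sketched as follows, assuming skopt's `gp_minimize` function and the `Real` and `Integer` classes from `skopt.space`. To keep the sketch quick to run, `validation_accuracy` is a made-up formula standing in for "build, train, and evaluate the CNN", and the search space has only two hypothetical dimensions with assumed bounds.

```python
import math

# best result seen so far, updated by fitness()
best = {"accuracy": 0.0, "params": None}


def validation_accuracy(learning_rate, n_conv_layers):
    """Toy stand-in for training the CNN and returning validation accuracy."""
    return (0.7
            - 0.05 * abs(math.log10(learning_rate) + 3)
            - 0.02 * abs(n_conv_layers - 5))


def fitness(params):
    """Objective for gp_minimize: receives a list of values, returns a score."""
    learning_rate, n_conv_layers = params
    accuracy = validation_accuracy(learning_rate, n_conv_layers)
    if accuracy > best["accuracy"]:
        # in the real project this is where the model would be saved
        best["accuracy"] = accuracy
        best["params"] = list(params)
    # gp_minimize minimizes, so a higher accuracy must give a lower score
    return -accuracy


def tune(n_calls=40):
    """Run the Bayesian search for n_calls evaluations of fitness."""
    # imported here so fitness can be exercised without scikit-optimize
    from skopt import gp_minimize
    from skopt.space import Integer, Real

    dimensions = [
        Real(1e-6, 1e-1, prior="log-uniform", name="learning_rate"),
        Integer(1, 6, name="n_conv_layers"),
    ]
    return gp_minimize(fitness, dimensions, n_calls=n_calls, random_state=1)
```

The object returned by `tune()` exposes the best parameter list as `.x` and the lowest objective value as `.fun`, so the best validation accuracy is `-result.fun`.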

After running multiple iterations of this code with both the stft and mel spectrogram images, the best validation accuracy found was 67.33% with stft and 69.33% with mel spectrogram. The hyperparameters used with the best mel spectrogram model were as follows:

Learning Rate: 1.0
Epochs: 79
Batch Size: 16
Image dimensions: (256, 256)
Convolutional Layers: 5

Next, it is time to predict which artist had the most influence on a particular Beatles song.

Interpreting the Results

First, I load the best model and make predictions on the Beatles data:

from model_staging import fetch_images_dataframe
import pandas as pd
import numpy as np
import keras

train_df = pd.read_csv("train_df.csv")
test_df = pd.read_csv("test_df.csv")

train_path = "C://Users//Alec//MyPython//Beatles/train_melspec"
test_path = "C://Users//Alec//MyPython//Beatles/test_melspec"

# load the Beatles data to predict on
test_gen = fetch_images_dataframe(test_df, x_col="song",
                                  y_col="artist",
                                  directory=test_path,
                                  batch_size=16,
                                  target_size=(256, 256),
                                  class_mode="categorical",
                                  shuffle=False, seed=1,
                                  save_format="png")

# load the model
best_model_adam = keras.models.load_model("models/melspec/skopt_best_adamV3.h5")

# predict the probabilities for each song
probabilities = best_model_adam.predict_generator(test_gen)

# get the prediction based on largest probability
preds = np.argmax(probabilities, axis=1)

# load in the training data to get the artist indices
train_gen, valid_gen = fetch_images_dataframe(train_df,
                                              x_col="song",
                                              y_col="artist",
                                              directory=train_path,
                                              batch_size=16,
                                              target_size=(256, 256),
                                              class_mode="categorical",
                                              shuffle=True, seed=1,
                                              validation_split=0.2,
                                              save_format="png")
class_map = train_gen.class_indices

# create a dataframe of songs and predictions
pred_df = pd.DataFrame(data={"songs": test_gen.filenames,
                             "predictions": preds})

# now convert the prediction column to the artist name
mapping = {v: k for k, v in class_map.items()}
pred_df["predictions"] = pred_df["predictions"].map(mapping)

# merge the pred_df with the test_df in order to bring the album
# name in for each song
pred_df = pred_df.merge(test_df[["album", "song"]], left_on="songs", right_on="song")
pred_df.drop("song", axis=1, inplace=True)

# join the prediction probabilities with the prediction dataframe
pred_df = pred_df.join(pd.DataFrame(probabilities))
pred_df.rename(mapping, axis=1, inplace=True)

# round the per-artist probability columns for readability
prob_cols = ['Lennon', 'harrison', 'mccartney', 'starr']
pred_df[prob_cols] = pred_df[prob_cols].round(4)

Let’s take a look at the total predictions per artist:

Lennon: 28
Harrison: 21
McCartney: 120
Starr: 47
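To put those counts in proportion, the short sketch below converts them to shares of all predicted songs. The counts are the ones listed above; on the prediction DataFrame they would presumably come from something like `pred_df["predictions"].value_counts()`.

```python
import pandas as pd

# predicted songs per artist, as listed above
counts = pd.Series(
    {"Lennon": 28, "Harrison": 21, "McCartney": 120, "Starr": 47},
    name="songs",
)

# each artist's share of all predictions, in percent
shares = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({"songs": counts, "share_pct": shares})
print(summary)
```

McCartney is predicted for more than half of the songs, which is more than the other three members combined.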

The model heavily favors Paul McCartney as the main influence in the discography. It’s also very surprising to see John Lennon so low and Ringo Starr being predicted for almost 50 Beatles songs. This goes against the mainstream view that Lennon was as big an influence as McCartney, and that Ringo Starr was a rhythm drummer who could have been replaced with any old drummer. Let’s also take a look at the breakdown per album.

Revolver and Magical Mystery Tour seem to be the most evenly distributed albums, with McCartney dominating the White Album, Abbey Road, Sgt. Pepper’s Lonely Hearts Club Band, and Yellow Submarine. Let’s see if there are any interesting trends of influence throughout the years.

Ringo is predicted to have the most influence in 1963, across the albums Please Please Me and With The Beatles. George Harrison leads in 1966, when the Beatles released only the Revolver album. McCartney dominates the remaining years, especially the last four years of the group’s existence. It’s widely known that the rest of the band, especially John Lennon, were getting fed up with Paul’s stranglehold over the songwriting towards the end of the band’s tenure, which ultimately led to The Beatles breaking up. It is interesting to see that the model reflects his influence in those years.

Conclusion and Lessons Learned

Overall, it is interesting to see Paul McCartney dominate the influence, with Ringo Starr a clear second ahead of Lennon and Harrison. I suspect this is due to the CNN overfitting to the skewed training data, as McCartney and Starr made up 39% and 30% of the training and validation data, respectively. That imbalance comes from their prolonged careers, since both are still alive as of October 2020. It would be interesting to see how one could normalize the class weights during model training to ensure the CNN does not overfit to the skewed training data.
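One common way to do this is to weight each class inversely to its frequency and pass the result to Keras through the `class_weight` argument of `model.fit`. The sketch below is an assumption about how that could look, not something that was tried in this project. The song counts are hypothetical: only the 39% and 30% shares come from the article, and the Lennon and Harrison figures are invented to fill out the example. The index order in `class_indices` is also assumed.

```python
def balanced_class_weights(counts_by_class, class_indices):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts_by_class.values())
    n_classes = len(counts_by_class)
    return {
        class_indices[name]: total / (n_classes * count)
        for name, count in counts_by_class.items()
    }


# HYPOTHETICAL counts chosen so McCartney is 39% and Starr is 30% of the data
counts = {"Lennon": 150, "harrison": 160, "mccartney": 390, "starr": 300}

# assumed mapping of label to output index (train_gen.class_indices in Keras)
class_indices = {"Lennon": 0, "harrison": 1, "mccartney": 2, "starr": 3}

weights = balanced_class_weights(counts, class_indices)
for name, index in class_indices.items():
    print(name, round(weights[index], 3))

# the weights would then be passed to training, for example:
# model.fit(train_gen, validation_data=valid_gen, class_weight=weights)
```

With these weights every class contributes the same total amount to the loss, so a mistake on a Lennon or Harrison song costs more than a mistake on a McCartney song.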

I enjoyed working with audio data for the first time and having to figure out new methods of acquiring and processing the data for model training. Scikit-optimize was also a great help in model training and will definitely be used in the future in favor of scikit-learn’s grid search method.

Please share any methods I could have used in data processing or model building to improve the predictions on The Beatles songs!

All code can be found on my GitHub. Please reach out to me on LinkedIn if you would like to connect!