PatientZero July 17, 2019 at 10:44

Do different hit songs have something in common?

Translation

If you log in to Spotify.me, you get a personalized summary of how Spotify understands you through the music you listen to on Spotify. That's cool!

I listen to a lot of music and enjoy working with data, so it inspired me to try to analyze my music collection.

I was very curious whether hit songs are made of specific ingredients. What makes them cool? Why do we like hits, and do they have a certain “DNA”?

Task

This led me to try to answer two questions using Spotify data:

What does my music playlist look like?
Are there specific audio attributes common to all hit songs?

Tools

Fortunately, there are very simple tools to connect to Spotify, receive data, and then visualize it.

We will work with the Python 3 programming language and the Spotipy Python library, which lets you connect to the Spotify Web API, and we will visualize the data using plot.ly and Seaborn.

Dataset

At the end of each year, Spotify compiles a playlist of the 100 most played songs. The dataset I used is already uploaded to Kaggle: Top Spotify Tracks of 2018. A list of Spotify's 100 most popular songs seems like a reasonable amount of data for studying hits, doesn't it?

Let's get started!

To get started, you need to create an account on developer.spotify.com. After that, you can directly access the Spotify Web API Console and start exploring the various API endpoints.

Note: the link to the code that I used for the project is at the end of the post.

After connecting to the Spotify Web API, we will create a Spotify object using the Spotipy Python Library, which we will then use to send requests to the Spotify endpoint.

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Client ID and secret of your application on developer.spotify.com
cid = "Your-client-ID"
secret = "Your-Secret"

# Authenticate with the client credentials flow and create the API client
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
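Once the client exists, the data for a playlist can be pulled in two steps: list the tracks, then request their audio features. The following is only a sketch of how that might look, not the code used for the project. It assumes a recent Spotipy release that provides `playlist_items`, `next`, and `audio_features`; the helper names `playlist_tracks` and `fetch_audio_features` are made up for illustration.

```python
def playlist_tracks(sp, playlist_id):
    """Return (track_id, first_artist_name) pairs for every track in a playlist.

    `sp` is an authenticated spotipy.Spotify client (assumption: a recent
    Spotipy version that provides `playlist_items`).
    """
    results = sp.playlist_items(playlist_id)
    pairs = []
    while results:
        for item in results["items"]:
            track = item.get("track")
            if not track or not track.get("id"):
                continue  # skip removed or local tracks that have no Spotify ID
            pairs.append((track["id"], track["artists"][0]["name"]))
        # Results are paginated; follow the "next" link until it runs out.
        results = sp.next(results) if results.get("next") else None
    return pairs


def fetch_audio_features(sp, track_ids, batch_size=100):
    """Fetch audio features in batches (the endpoint accepts up to 100 IDs)."""
    features = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # Entries can be None when no analysis is available; drop those.
        features.extend(f for f in sp.audio_features(tracks=batch) if f)
    return features
```

With the `sp` object created above, usage could look like `pairs = playlist_tracks(sp, "your-playlist-id")` followed by `fetch_audio_features(sp, [track_id for track_id, _ in pairs])`, where the playlist ID is a placeholder.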

Analyzing My Playlist Data

Exploratory analysis is one of the most important steps in data science. The goal here is to understand what kind of music is in my playlist, extract any interesting observations, and compare them with the audio features of the hundred most popular songs of 2018.

Artist frequency chart

How often artists appear in my playlist

This histogram shows how often each artist appears in one of my playlists.
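The counts behind such a chart are easy to reproduce. Here is a minimal sketch using only the standard library; the function name `artist_frequency` and the sample artist names are placeholders, not data from my playlist.

```python
from collections import Counter


def artist_frequency(artist_names, top_n=10):
    """Count how often each artist appears and return the top_n most frequent."""
    return Counter(artist_names).most_common(top_n)


# Placeholder input: one artist name per track in the playlist.
names = ["Artist A", "Artist B", "Artist A", "Artist C", "Artist B", "Artist A"]
for artist, count in artist_frequency(names):
    print(f"{artist}: {count}")
```

The resulting (artist, count) pairs can be passed to any bar chart function, for example in plot.ly or Seaborn.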

Audio features

Now let's take a look at the audio features of the songs in the playlist. Spotify provides a set of audio features for every track in its catalog! Here is a brief description of the features we will use in this article:

Instrumentalness: predicts whether a track contains no vocals. In this context, sounds like “ooh” and “aah” are treated as instrumental, while rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the more likely the track contains no vocals.

Energy: a value from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Energetic tracks typically feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on this scale.

Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic. A value of 1.0 means high confidence that the track is acoustic.

Liveness: detects the presence of an audience in the recording. The higher the liveness value, the more likely the track was performed live. A value above 0.8 indicates a strong likelihood that the track is live.

Speechiness: detects the presence of spoken words in a track. A value above 0.66 means the track most likely consists entirely of spoken words, a value from 0.33 to 0.66 means the track may contain both music and speech, and a value below 0.33 most likely means music and other non-speech-like content.

Danceability: describes how suitable a track is for dancing, based on musical elements such as tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

Valence: this value in the range from 0.0 to 1.0 describes the musical positivity conveyed by the song. Songs with high valence sound more positive (i.e. they convey happiness, joy or euphoria), and songs with low valence sound more negative (i.e. they are sad, depressed or angry).
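The thresholds mentioned in these descriptions can be turned into simple labels. The sketch below is only an illustration of the cutoffs quoted above (0.33 and 0.66 for speechiness, 0.8 for liveness); the function names and label texts are my own and are not part of the Spotify API.

```python
def describe_speechiness(value):
    """Map a speechiness score to a rough label using the 0.33 / 0.66 cutoffs."""
    if value > 0.66:
        return "mostly spoken words"
    if value >= 0.33:
        return "music and speech"
    return "mostly music"


def is_probably_live(liveness):
    """A liveness score above 0.8 suggests a live recording."""
    return liveness > 0.8


# Hypothetical feature values for a single track.
track = {"speechiness": 0.12, "liveness": 0.85}
print(describe_speechiness(track["speechiness"]), is_probably_live(track["liveness"]))
```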

Distribution of audio features in my playlist

Observation results:

Danceability is spread widely across my playlist, and there are not many “happy” songs in it, as the high frequency of songs with valence below 0.5 shows. So we can say that I like songs you can dance to (and that's true!)
The distributions of speechiness, instrumentalness, and, to a lesser extent, liveness drop off steeply. This tells us that my playlist contains few tracks with spoken words, few instrumental tracks, and few live recordings.
Acousticness is distributed roughly evenly between 0 and 1, so I have no preference for this attribute. (I usually like acoustic songs, but I would not go looking for an acoustic cover of every song.)
Finally, energy is roughly normally distributed with small tails at both ends, which means that songs with very low or very high energy are less likely to end up in my playlist. In other words, I like songs with medium energy.
My songs are not very popular.
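Observations like these come from reading histograms of each feature. The charts themselves were drawn with plotting libraries, but the numbers behind them can be sketched with plain Python. The helpers `histogram_counts` and `share_below` are hypothetical names, and the sample values are invented for the example.

```python
def histogram_counts(values, bins=10):
    """Count values from the range 0.0 to 1.0 in equal-width bins."""
    counts = [0] * bins
    for value in values:
        # A value of exactly 1.0 would land past the end, so clamp it
        # into the last bin.
        index = min(int(value * bins), bins - 1)
        counts[index] += 1
    return counts


def share_below(values, threshold=0.5):
    """Fraction of values strictly below the threshold."""
    return sum(value < threshold for value in values) / len(values)


# Invented valence scores, just to show the calls.
valence = [0.05, 0.15, 0.45, 0.95, 1.0]
print(histogram_counts(valence))
print(share_below(valence))
```

A large `share_below` result for valence is what "not many happy songs" means in numbers.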

2018 Top 100 Songs Data Analysis

After downloading the dataset from Kaggle and importing it into my application, I started by analyzing the most popular artists, ranked by the number of hits they have on this list.
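Importing the file can be as simple as reading the CSV and converting the feature columns to numbers. This is a sketch under assumptions: I assume the Kaggle file is a CSV with a `name` column, an `artists` column, and one column per audio feature, and that it is saved locally as `top2018.csv`. Check the actual header before relying on these names.

```python
import csv

# Assumed column names for the audio features discussed in this post.
FEATURES = [
    "danceability", "energy", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence",
]


def load_tracks(csv_file):
    """Read tracks from an open CSV file and convert feature columns to float."""
    tracks = []
    for row in csv.DictReader(csv_file):
        track = {"name": row["name"], "artists": row["artists"]}
        for feature in FEATURES:
            track[feature] = float(row[feature])
        tracks.append(track)
    return tracks


# Usage (the file name is an assumption):
# with open("top2018.csv", newline="", encoding="utf-8") as f:
#     top_tracks = load_tracks(f)
```

The `artists` column of the loaded rows can then be counted to find the artists with the most hits.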

Artists in the top 100 songs of 2018, by frequency

Artists who appear most often in the top 100 songs of 2018


Post Malone and XXXTENTACION

Now, let's examine the audio characteristics of the hundred most popular songs in our dataset and see how they look! We’ll create the same histogram as for my playlist so that you can compare them later.

Distribution of audio features in the top 100 songs of 2018

Looking at the histogram, we can see that the top 100 songs have the following characteristics:

Very high danceability and energy, but low liveness, speechiness, and acousticness (we can already see some signs that my playlist is not as cool as the top 100).

For example, the song “In My Feelings” by Drake from our dataset has high danceability and relatively high energy.

Finally, I decided to draw a radar chart of the top 100 songs and overlay the audio features of my playlist on it.

Top 100 songs are shown in blue and my songs are shown in orange.
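A radar chart like this needs one number per feature for each set of songs. One plausible choice, and an assumption on my part, is the mean of each feature. The sketch below computes such a profile; `mean_profile` and `close_polygon` are illustrative helper names, and the two sample tracks are invented.

```python
from statistics import mean


def mean_profile(tracks, features):
    """Average each audio feature over a list of track dictionaries."""
    return [mean(track[feature] for track in tracks) for feature in features]


def close_polygon(values):
    """Repeat the first value at the end so the radar outline closes."""
    return values + values[:1]


features = ["danceability", "energy", "valence"]
# Invented tracks, only to show the shape of the input.
my_tracks = [
    {"danceability": 0.25, "energy": 0.5, "valence": 0.25},
    {"danceability": 0.75, "energy": 1.0, "valence": 0.75},
]
print(close_polygon(mean_profile(my_tracks, features)))
```

The closed list of values, together with the feature names repeated the same way, can be handed to a polar trace such as plot.ly's `Scatterpolar`, once for the top 100 and once for the playlist.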

Conclusion

So it looks like I have answers to both questions from the beginning of the post. I got to see what my music looks like, and I found the DNA of hit songs. The audio features of my playlist are somewhat similar to those of the top 100, but I have more acoustic songs and fewer live recordings.

Want to write a hit song? Make it danceable, with lots of energy and a bit of positivity.

I am pleased with the results, but I want to continue the research.

The code for the entire project is posted on GitHub.

Here's what I recommend doing next:

Learn how you can use your playlist to determine your personal preferences and recommend ads that you might like.
Use the k-means clustering algorithm to find out which songs are similar to yours, so you can search for new songs that you may like.
Use machine learning to predict the “popularity” of songs based on their audio features.
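As a starting point for the clustering idea, here is a bare-bones k-means sketch in plain Python. It is not part of the project code: it uses the first k points as initial centroids to stay deterministic, whereas a real analysis would more likely rely on a library implementation with better initialization. The sample points are invented (danceability, energy) pairs.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points (lists of floats) into k groups.

    Returns (labels, centroids). The first k points serve as the
    initial centroids, which is a simplification for this sketch.
    """
    centroids = [list(point) for point in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Invented (danceability, energy) pairs forming two obvious groups.
songs = [[0.1, 0.1], [0.9, 0.9], [0.2, 0.1], [0.8, 0.9]]
labels, centroids = kmeans(songs, k=2)
print(labels)
```

Songs that share a label with tracks from your playlist are candidates for "similar to yours".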

Thanks to Alvin Chun, Ashrith, and John Koch for helpful articles on this topic. Spotify and Spotipy, thanks for the awesome API and library!
