Data from your discs

A look at the music industry through the Discogs open dataset

Why the music industry?

Music has evolved thoroughly over the past decades, and so has its industry. New genres appeared, along with numerous new labels that changed the music landscape. New technologies also changed the way music is consumed, both in its format (digital files, MP3...) and in the way it can be shared and discovered.
To grasp the importance of the music industry, consider that last year it was worth 20 billion $ in the U.S. alone. Although online platforms such as Spotify are on the rise, records are becoming more popular each year, with vinyl up 14% in the U.S. in 2018. Discogs is the number 1 online marketplace for records worldwide.
If you wish to know more about the technical details of our work, check our GitHub repo here.

Why Discogs?

We chose Discogs as the dataset for our project. Discogs.com is an online records marketplace. It is a place for music collectors to buy and sell records, as well as to track the price evolution of releases they are considering acquiring. Discogs is also a social network: users can connect with each other and share the list of their favorite records.
A useful feature of Discogs is its API, along with a Python client that makes it easy to browse and gather information about records, artists and labels from the database. Our aim is to download the latest data files (XML format) to create the skeleton of our dataset, and then to enrich it using the API or web scraping to gather more relevant information.
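To give an idea of what building that skeleton from the XML files can look like, here is a minimal sketch using only the Python standard library. The element names (`release`, `master_id`, `released`, `country`, `title`) and the sample data are assumptions made for illustration; the real files are much larger and richer, and this is not necessarily how our pipeline reads them.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Made-up sample standing in for a releases file (structure is an assumption).
SAMPLE = b"""<releases>
  <release id="1"><title>Example Album A</title><master_id>100</master_id><released>1968-11-01</released><country>US</country></release>
  <release id="2"><title>Example Album A</title><master_id>100</master_id><released>1987</released><country>Europe</country></release>
  <release id="3"><title>Example Album B</title><master_id>200</master_id><released>1973-10-13</released><country>US</country></release>
</releases>"""


def iter_releases(source):
    """Yield one dict per <release>, streaming so memory use stays low."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "release":
            continue  # child elements also fire "end" events; skip them
        yield {
            "release_id": int(elem.get("id")),
            "master_id": int(elem.findtext("master_id")),
            "title": elem.findtext("title"),
            "year": int(elem.findtext("released")[:4]),  # keep the year only
            "country": elem.findtext("country"),
        }
        elem.clear()  # drop the subtree we just consumed


releases = list(iter_releases(BytesIO(SAMPLE)))
```

Streaming with `iterparse` rather than loading the whole tree matters here because a full file would not comfortably fit in memory.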

A global look at the evolution of the music industry

A simple exploration of our dataset gives us the number of Master releases each year. Each record in the Discogs database is represented by a unique Master, which can be attached to several Release entities. Each Release is a commercially distributed version of the record, defined by a release year, a country, a physical format and a label. Take, for instance, the fan-favorite classic Led Zeppelin I: it is associated with one Master entity, but with 603 different Releases on vinyl, compact disc or cassette across several labels and reissues.

The number of Masters released per year increases linearly from the 1950s onward. Since each Master stands for one record, this curve reflects the actual number of records released each year. The music industry is in good shape: the increase in record production isn't about to stop!
The number of Releases per Master also increases in the 1960s, with a downward trend during the 1990s. The average number of Releases per Master is obtained by counting the Releases attached to each Master of a given year and dividing by the total number of Masters for that year. This curve shows a bump from the 1960s until the beginning of the 1980s. Two factors explain this:

  • Here each Release is attached to its Master, and it is the year of the Master that prevails. This means that for our beloved Led Zeppelin I (this record is crazy good, I love it, go listen to it if you haven't already), all 603 associated Releases count for the year 1968, the year of the first release. Older records are therefore more likely to have many Releases attached than recent ones: they have had more time to be reissued since they first came out.
  • Another factor explains the decline during the 1980s: in 1982, Philips, in a joint effort with Sony, launched the first Compact Disc. It quickly became the physical format of choice, progressively replacing vinyl. Between the launch of the Compact Disc and the death of the vinyl format at the beginning of the 1990s, every record was released on both CD and vinyl, effectively raising the number of Releases for each record.
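The averaging described above can be sketched in a few lines. The rows below are invented, and the `(master_id, master_year)` layout is an assumption chosen for illustration: one row per Release, each carrying the year of its Master rather than its own release year.

```python
from collections import Counter, defaultdict

# Hypothetical data: one (master_id, master_year) pair per Release.
release_rows = [
    (100, 1968), (100, 1968), (100, 1968),  # three Releases of Master 100
    (101, 1968),                            # one Release of Master 101
    (200, 1973),                            # one Release of Master 200
]


def avg_releases_per_master(rows):
    """Return {master_year: mean number of Releases per Master}."""
    releases_per_master = Counter(master_id for master_id, _ in rows)
    year_of_master = dict(rows)  # master_id -> master_year
    totals = defaultdict(lambda: [0, 0])  # year -> [releases, masters]
    for master_id, count in releases_per_master.items():
        year = year_of_master[master_id]
        totals[year][0] += count
        totals[year][1] += 1
    return {year: rel / mas for year, (rel, mas) in totals.items()}


averages = avg_releases_per_master(release_rows)
```

In this toy example, 1968 has four Releases spread over two Masters, so its average is 2.0, while 1973 averages 1.0. A reissue pressed decades later still lands in the Master's year, which is exactly why older years look inflated.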

But we can say more about the evolution of Releases. The health of the music business is correlated with the number of Releases, as well as with the general economic health of the publishing countries. The U.S. has been the dominant cultural force in the music industry for over 70 years now. From the 50s onward, the post-World War II economic expansion had a huge impact on consumerism, and thus on the music industry in the U.S. We can hypothesize that the dip in the 70s stems from the 1973 oil crisis and the 1979 energy crisis. These events led to a period of recession that affected all markets, including the music market. One could argue that the second dip, in the 2000s, was caused by the internet: music piracy exploded (the famous peer-to-peer service Napster was launched in 1999), leading to a decline in CD sales and thus affecting the industry once again.

Music formats: from good old vinyl to digital files

To corroborate what we just said, let's have a look at the evolution of record formats since the 1950s. As expected, vinyl had a golden age until the beginning of the 1990s, its death mainly caused by the arrival of the CD. But WOW, we just said that the CD appeared in 1982, and yet we have data for CD records even in the 1950s. Impressive. Joking aside: the CD Releases are attached to their Master. That is, if a record released in 1968, for instance Led Zeppelin I, was reissued on CD during the 1980s, it still appears as released in 1968 in the database, as that is the year its corresponding Master was released.

This plot also confirms the popular idea of a vinyl comeback at the beginning of the 2010s: we see its proportion clearly increasing. What's more, since this plot is normalized by the number of Releases per year, and that number keeps growing, the absolute number of vinyl Releases increases even more than the plot suggests.
Beware of the vinyl peak during the late 2010s though: this plot shows production trends for formats, not sales, and according to some sources, there are signs that this trend could end soon.
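The normalization mentioned above is simply a per-year share. Here is a minimal sketch on invented `(year, format)` rows; the data and the function name `format_shares` are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical data: one (year, format) pair per Release.
rows = [
    (1975, "Vinyl"), (1975, "Vinyl"), (1975, "Cassette"),
    (1995, "CD"), (1995, "CD"), (1995, "CD"), (1995, "Vinyl"),
]


def format_shares(rows):
    """Return {year: {format: share of that year's Releases}}."""
    counts = defaultdict(Counter)
    for year, fmt in rows:
        counts[year][fmt] += 1
    shares = {}
    for year, per_format in counts.items():
        total = sum(per_format.values())
        shares[year] = {fmt: n / total for fmt, n in per_format.items()}
    return shares


shares = format_shares(rows)
```

Because each year's shares sum to 1, a growing share in a year with more Releases overall means an even larger growth in absolute counts.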

"So ... what kind of music are you into?"

Music genres have evolved since the 1950s, and today's trends differ from those of the mid-20th century. Hard bop is not as popular now as it was in 1955, but you can listen to VaporTrap in 2019, which does not fully make up for hard bop, but at least the name is cool. The big players in the room are Rock, Electronic and Pop. It is well known that these genres are popular, and it is easy to explain why they are the most produced: popular genres are profitable for music labels, and labels push the production of records that sell.

Now let's take a look at the evolution of those genres over time. Several points are worth noticing:

  • Jazz was by far the most popular genre until 1957, with Pop just behind. They are not the most popular genres overall because record production before 1955 was five times lower than in the 1980s. Pop does hold its ground in the 1980s though, thanks to highly popular artists such as Michael Jackson and Prince, with hit records like Thriller in 1982 and Purple Rain in 1984. The same applies to Folk, a genre that has kept a mild popularity since the 1950s: the influence of Bob Dylan and his peers can be seen in the 1960s on the record production of the whole genre. In general, music labels try to ride the wave of popular trends to sell records.
  • Funk is known to have been at its creative peak in the 70s: Maggot Brain by Funkadelic in 1971, Head Hunters by Herbie Hancock in 1973, and Curtis and Superfly by Curtis Mayfield in 1970 and 1972. Once again, those popular albums at the beginning of the 1970s influenced the production of the genre a few years later, with a peak in 1976.
  • Rock has a complex evolution, but we can draw some parallels with famous Rock records that influenced the genre. We can explain the first peak in 1958 as the beginning of Rock's popularity: Johnny B. Goode by Chuck Berry was released that year.
  • Electronic music is simpler to analyze: its popularity has increased continually since its appearance. The house movement in Chicago during the 1980s helped popularize the genre, and the appearance of techno and acid in clubs in the U.S. and the U.K. led to the tide we know today. In this case, the increasing popularity of the genre comes from the democratisation of production tools for electronic music: cheap synthesizers and drum machines helped young producers discover the genre and produce their own music.

We can go even further and explore particular sub-genres. One interesting aspect of this division is the omnipresence of pop in sub-genres such as Pop-Rock in Rock and Synth-pop in Electronic. It makes sense: Pop is the most popular genre (duh), so seeing it "popping" up in different sub-genres is not surprising. If we scroll through the decades, we can see that sub-genres evolved from vocal-oriented to timbre-oriented. As music recording techniques progressed, with the advent of computers, synthesizers and digital audio workstations, music engineers could focus more on the character of sounds, making for more interesting mixes. Music didn't become less vocal per se, but it did become more complex than simple folky guitar songs (no offence, Mr. Bob Dylan!).

Record labels: the art of the deal

How can we talk about the music industry without mentioning record labels? Money's not gonna make itself, right? The "big three" giants of the music industry, Warner Music Group (WMG), Universal Music Group (UMG) and Sony Music Entertainment (SME), are worth 21 billion dollars altogether! Led Zeppelin I was released by Atlantic Records, owned by ... you guessed it: WMG, one of the "big three". These companies control a huge majority of the music market.

Here are the top 10 music labels by the number of Masters they own. All of these are owned by the "big three". Epic? SME. Philips? UMG. RCA? The list goes on. At first, we could think that the music industry has a lot of variety given its number of labels. Unfortunately, a tremendous share of them is owned by only three corporations.

  • Notice Columbia doubling its share in '91 ... and CBS almost disappearing. Both of these labels are owned by SME, which bought them in '88. SME decided to rename CBS to Columbia Records, as CBS was the international arm of Columbia.
  • Also, EMI was broken up in 2012-2013: UMG took over most of its recorded-music business, while WMG bought its Parlophone label group in 2013.
  • UMG owns the most labels here (Philips, Capitol, Polydor, Decca).

Prices, ratings, collections... a peek at Discogs' community dimension

So far, we've discussed Masters, Releases, labels... But at the end of the chain, there is always a consumer. A consumer who wants the best mixtape, the latest vinyl record from her/his favorite label or that unfindable track that all of her/his friends are looking for. For that, she/he can also use Discogs, where each user can keep track of the releases she/he owns by adding them to her/his "Collection". One can also use a "Wantlist" for wanted releases. Finally, releases are rated by users on a scale from 0 to 5. Let us see what we can get from this data by looking at one specific genre: House music. The House community is often actively looking for newly pressed records to expand their collections, so it made sense to focus on this genre.

On average, House releases sell for 5.3 CHF (on the second-hand market), with 95% of the sales going for less than 17 CHF. The overall tendency is visible above. Given this very skewed distribution, it makes more sense to consider the logarithm of the price and to focus on lower prices. This results in the figure below:
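The summary statistics and the log transform described above can be sketched as follows. The prices are invented, and the choice of base-10 logarithm and of the "inclusive" quantile method are assumptions, not necessarily what we used for the figures.

```python
import math
import statistics

# Hypothetical second-hand prices in CHF; note the single expensive outlier.
prices_chf = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 8.0, 12.0, 60.0]

mean_price = statistics.mean(prices_chf)

# quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
p95 = statistics.quantiles(prices_chf, n=20, method="inclusive")[-1]

# Share of sales below a given threshold (17 CHF is the value quoted above).
share_below_17 = sum(p < 17 for p in prices_chf) / len(prices_chf)

# Log transform: compresses the long right tail so cheap records stay visible.
log_prices = [math.log10(p) for p in prices_chf]
```

On a linear axis the 60 CHF outlier would squash everything else into the left edge; on the log scale the same data spans less than two units.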

To help users go through the huge number of releases they can buy from other users, ratings are also available. So one question has to be asked: does good music mean expensive music? In other words, are expensive records better rated than cheap ones? Well, it turns out they are!

Discogs users seem to be pretty generous when rating, since even the lowest-rated set of releases (those selling between 1 CHF and 1.4 CHF) gets an average rating of almost 4. Still, an increasing trend is clearly visible.
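One way to look at this trend is to group releases into logarithmically spaced price bins and average the ratings in each bin. The sketch below does that on invented data; the bin width (`BINS_PER_DECADE`) and the `(price, rating)` layout are assumptions and do not reproduce the exact bins used for our figure.

```python
import math
from collections import defaultdict

# Hypothetical (price in CHF, average user rating) pairs.
rated_releases = [
    (1.1, 3.9), (1.3, 4.0),
    (2.5, 4.1), (3.0, 4.2),
    (9.0, 4.3), (15.0, 4.5),
]

BINS_PER_DECADE = 4  # assumption: four bins per factor of 10 in price


def price_bin(price):
    """Index of the log-spaced bin containing this price."""
    return math.floor(math.log10(price) * BINS_PER_DECADE)


def mean_rating_by_bin(rows):
    """Return {bin index: mean rating}, ordered from cheap to expensive."""
    grouped = defaultdict(list)
    for price, rating in rows:
        grouped[price_bin(price)].append(rating)
    return {b: sum(r) / len(r) for b, r in sorted(grouped.items())}


ratings_by_bin = mean_rating_by_bin(rated_releases)
```

Log-spaced bins keep a comparable number of releases in each group even though most sales happen at low prices.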

Finally, it is interesting to compare the "Collection" and the "Wantlist". For vinyl records (and for a lot of things, in fact), price is highly correlated with scarcity: records pressed in only a couple hundred copies will certainly sell at a higher price than more common ones. Add to this that expensive records are on average better rated, and we expect relatively more expensive records in the Wantlist than in the Collection. Let us see if we were right by looking at the average distribution of the Collection (first figure) and the Wantlist (second figure):

Those look very different indeed! First of all, the Wantlist is on average 8 times bigger than the Collection (in terms of number of records). But what's more striking is the difference in the shape of the distributions. For the Collection, we have a concave curve peaking at around 35 CHF, whereas for the Wantlist, the curve explodes after a small dip... This difference embodies the harsh reality of the music lover, desiring ever rarer and more expensive records but being stuck with average, low-cost ones ...

Conclusion

In our journey through the decades, we saw how the music industry was shaped by different phenomena, such as economic crises, technology and new genres. The way we consume our favourite songs will keep changing. Will streaming platforms completely annihilate physical copies, or will vinyl come back stronger? Only time will tell. One thing's for sure: Led Zeppelin I will stay in our hearts.