Music has evolved considerably over the past decades, and so has its industry. New genres have
appeared, along with numerous new labels that changed the music landscape. New technologies have
also changed the way music is consumed, both in its medium (digital files, MP3, ...) and in the way
it can be shared and discovered. To grasp the importance of the music industry, consider that last
year it was worth $20 billion in the U.S. alone. Although online platforms such as Spotify are on
the rise, records are becoming more popular each year, with a 14% increase in 2018 for vinyl in
the U.S. Discogs is the number one online marketplace for records worldwide.
If you wish to know more about the technical details of our work, check out our GitHub repo.
We chose to use Discogs as the dataset for our project.
Discogs.com is an online records marketplace. It is a place where music collectors can buy and sell
records, as well as track the price evolution of releases they are considering acquiring. Discogs
is also a social network: users can connect with each other and share the list of their favorite
records.
A useful feature of Discogs is its API, along with a Python client that makes it easy to browse and
gather information about records, artists and labels from the database. Our aim is to download the
latest data files (XML format) to create the skeleton of our dataset, and then enrich it using the
API or web scraping to gather more relevant information.
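As a rough illustration of the first step, here is a minimal sketch of how an XML data file could be streamed into Python records. The element and attribute names (`masters`, `master`, `id`, `title`, `year`) and the sample content are assumptions made for the example, not a description of the exact layout of the Discogs files.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature of a masters data file; the real layout may differ.
SAMPLE_XML = b"""<masters>
  <master id="1"><title>Album A</title><year>1969</year></master>
  <master id="2"><title>Album B</title><year>1982</year></master>
  <master id="3"><title>Album C</title></master>
</masters>"""


def iter_masters(source):
    """Yield one dict per <master> element, parsing the file incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "master":
            continue
        year_text = elem.findtext("year")
        yield {
            "id": int(elem.get("id")),
            "title": elem.findtext("title"),
            "year": int(year_text) if year_text else None,
        }
        # Drop the children of the element we just processed to limit memory use.
        elem.clear()


masters = list(iter_masters(io.BytesIO(SAMPLE_XML)))
for master in masters:
    print(master)
```

Parsing incrementally rather than loading the whole tree is a design choice that matters for large files; for a real file, `source` could be an open file object instead of the in-memory sample.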
A simple exploration of our dataset gives us the number of Masters released each year. Each record in the Discogs database is represented by a unique Master, but can be attached to several Release entities. Each Release of a record is a commercially distributed version of that record: it corresponds to a year of release, a country in which it was released, a physical format and a label. Take for instance the fan-favorite classic Led Zeppelin I: it is associated with one Master entity, but with 603 different Releases on vinyl, compact disc or cassette, across several labels and reissues.
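The Master/Release relationship described above can be sketched as a simple grouping. The rows below are made-up examples, and the tuple layout `(release_id, master_id, master_year)` is an assumption for illustration. For each year, the function counts the distinct Masters and the average number of Releases attached to each of them.

```python
from collections import defaultdict

# Hypothetical rows: (release_id, master_id, year of the corresponding Master).
releases = [
    (10, 1, 1969), (11, 1, 1969), (12, 1, 1969),
    (20, 2, 1969),
    (30, 3, 1982), (31, 3, 1982), (32, 3, 1982),
]


def releases_per_master_by_year(rows):
    """Return {year: (number of Masters, average Releases per Master)}."""
    masters_in_year = defaultdict(set)
    releases_in_year = defaultdict(int)
    for _release_id, master_id, year in rows:
        masters_in_year[year].add(master_id)
        releases_in_year[year] += 1
    stats = {}
    for year in sorted(masters_in_year):
        n_masters = len(masters_in_year[year])
        stats[year] = (n_masters, releases_in_year[year] / n_masters)
    return stats


stats = releases_per_master_by_year(releases)
for year, (n_masters, avg_releases) in stats.items():
    print(year, n_masters, avg_releases)
```

In this toy data, 1969 has two Masters sharing four Releases, while 1982 has a single Master with three Releases.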
The number of Masters released per year has increased roughly linearly since the 1950s.
This curve reflects the actual number of records released each year. The music industry is in good
shape: the growth in record production does not seem about to stop!
The number of Releases per Master also increases in the 1960s, with a downward trend during the
1990s. The average number of Releases per Master is obtained by counting the number of Releases
for each Master of a given year, and dividing the total by the number of Masters in that year.
This evolution shows a bump from the 1960s until the beginning of the 1980s. Two factors explain
this:
But we can say more about the evolution of Releases. The health of the music business is correlated with the number of Releases, as well as with the general economic health of the publishing countries. The U.S. has been the dominant cultural force in the music industry for over 70 years. From the 1950s on, the post-World War II economic expansion had a huge impact on consumerism, and thus on the music industry in the U.S. We can hypothesize that the dip in the 1970s comes from the 1973 oil crisis and the 1979 energy crisis. These events led to a period of recession that affected all markets, including music. One could argue that the second dip, in the 2000s, was caused by the internet: music piracy exploded (the famous peer-to-peer service Napster was launched in 1999), leading to a decline in CD sales and thus again affecting the industry.
To corroborate what we just said, let's have a look at the evolution of record formats since the 1950s. As expected, vinyl had a golden age until the beginning of the 1990s, its death mainly caused by the arrival of the CD. But WOW, we just said that the CD appeared in 1982, and yet we have data for CD records even in the 1950s. Impressive. Joking aside: the CD Releases are attached to their Master. That is, if a record released in 1969, for instance Led Zeppelin I, was reissued on CD during the 1980s, it would still appear as released in 1969 in the database, as that is the year its corresponding Master was released.
This plot also confirms the popular idea of a vinyl comeback at the beginning of the 2010s: we see
its proportion clearly increasing. Moreover, this plot is normalized by the number of Releases per
year, and that number is itself growing, so the absolute number of vinyl Releases increases even
more than it appears to on this plot.
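To make the normalization concrete, here is a small sketch with invented numbers. The `(year, format)` pairs are hypothetical and only meant to show how a format's share per year is computed, and how the share can grow more slowly than the raw count when the total number of Releases grows too.

```python
from collections import Counter, defaultdict

# Hypothetical (year, format) pairs, one per Release.
rows = (
    [(2008, "CD")] * 6 + [(2008, "Vinyl")] * 2
    + [(2018, "CD")] * 9 + [(2018, "Vinyl")] * 6
)


def format_shares(pairs):
    """Return raw counts and per-year proportions for each format."""
    counts = defaultdict(Counter)
    for year, fmt in pairs:
        counts[year][fmt] += 1
    shares = {}
    for year, per_format in counts.items():
        total = sum(per_format.values())
        shares[year] = {fmt: n / total for fmt, n in per_format.items()}
    return counts, shares


counts, shares = format_shares(rows)
count_growth = counts[2018]["Vinyl"] / counts[2008]["Vinyl"]
share_growth = shares[2018]["Vinyl"] / shares[2008]["Vinyl"]
print(f"vinyl count x{count_growth:.1f}, vinyl share x{share_growth:.1f}")
```

In this toy example the number of vinyl Releases triples, while the vinyl share only goes from 25% to 40%.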
Beware of the vinyl peak during the late 2010s though: this plot shows the production trends of
formats, not the sales, and according to some sources, there are signs that this trend could end
soon.
Music genres have evolved since the 1950s, and today's trends differ from those of the mid-20th century. Hard Bop is not as popular now as it was in 1955, but you can listen to VaporTrap in 2019, which does not fully make up for Hard Bop, but at least the name is cool. The big players in the room are Rock, Electronic and Pop. It is well known that those genres are very popular, and it is quite easy to explain why they are the most produced: popular genres are profitable for music labels, and those labels push the production of records that sell.
Now let's take a look at the evolution of those genres over time. Several points are worth noticing:
We can go even further and explore particular sub-genres. One interesting aspect of this division is the omnipresence of pop in sub-genres such as Pop-Rock in Rock and Synth-pop in Electronic. It makes sense, because Pop is the most popular genre (duh); seeing it "popping" up in different sub-genres is not surprising. If we scroll through the decades, we can see that sub-genres evolved from vocal-oriented to timbre-oriented. As recording techniques progressed with the advent of computers, synthesizers and digital audio workstations, music engineers could focus more on the character of sounds, thus making more interesting mixes. Music didn't become less vocal per se, but it became more complex than simple folky guitar songs (no offence, Mr. Bob Dylan!).
How can we talk about the music industry without mentioning record labels? Money's not gonna make itself, right? The "big three" giants of the music industry, Warner Music Group (WMG), Universal Music Group (UMG) and Sony Music Entertainment (SME), are worth 21 billion dollars altogether! Led Zeppelin I was released by Atlantic Records, owned by... you guessed it: WMG, one of the "big three". These companies control a huge majority of the music market.
Here are the top 10 music labels by the number of Masters they own. All of these are owned by the "big three". Epic? SME. Philips? UMG. RCA? The list goes on. At first, we could think that the music industry has a lot of variety given its number of labels. Unfortunately, a tremendous part of them are owned by only three corporations.
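A ranking like this one can be sketched with a simple counter. The `(master_id, label)` pairs below are invented, "Tiny Indie Records" is a made-up label, and the parent mapping only contains the two pairs named in the text; a real mapping would have to be built by hand or from another source.

```python
from collections import Counter

# Hypothetical (master_id, label) pairs; a Master on several labels
# would simply appear once per label.
master_labels = [
    (1, "Epic"), (2, "Epic"), (3, "Philips"),
    (4, "Epic"), (5, "Philips"), (6, "Tiny Indie Records"),
]

# Illustrative mapping, limited to the label/parent pairs mentioned in the text.
PARENT = {"Epic": "SME", "Philips": "UMG"}


def top_labels(pairs, n=10):
    """Return the n labels with the most Masters, as (label, count) tuples."""
    return Counter(label for _master_id, label in pairs).most_common(n)


def masters_by_parent(pairs):
    """Aggregate Master counts by parent company ('other' when unmapped)."""
    return Counter(PARENT.get(label, "other") for _master_id, label in pairs)


ranking = top_labels(master_labels)
by_parent = masters_by_parent(master_labels)
print(ranking)
print(dict(by_parent))
```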
So far, we've discussed masters, releases, labels... But at the end of the chain, there is always a consumer. A consumer who wants the best mixtape, the latest vinyl record from their favorite label, or that impossible-to-find track that all of their friends are looking for. For that, they can also use Discogs, where each user can keep track of the releases they own by adding them to their "Collection". One can also use the "Wantlist" for wanted releases. Finally, releases are rated by users on a scale from 0 to 5. Let us see what we can get from those data by looking at one specific genre: House music. The House community is often actively looking for newly pressed records to expand their collections, so we thought it made sense to focus on this genre.
On average, House releases sell for 5.3 CHF (on the second-hand market), with 95% of the sales going for less than 17 CHF. The overall tendency is visible above. Given this very skewed distribution, it makes more sense to consider the logarithm of the price and to focus on lower prices. This results in the figure below:
To help users go through the huge number of releases they could buy from other users, ratings are also available to them. So one question has to be asked: does good music mean expensive music? In other words, are expensive records better rated than cheap ones? Well, it turns out they are!
Discogs users seem to be pretty generous when rating, since even the set of releases with the lowest average rating (releases selling between 1 CHF and 1.4 CHF) gets almost 4. Still, an increasing trend is clearly visible.
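The comparison between price and rating can be sketched by grouping releases into logarithmically spaced price bins and averaging the ratings in each bin. The `(price, rating)` pairs and the bin width (`bins_per_decade`) are assumptions chosen for the example, not the values used for the figures.

```python
import math
from collections import defaultdict

# Hypothetical (price in CHF, average user rating) pairs.
sales = [(1.0, 3.9), (1.3, 4.1), (5.0, 4.2), (6.0, 4.4), (40.0, 4.6)]


def mean_rating_by_log_price(pairs, bins_per_decade=2):
    """Return {lower bound of the price bin: mean rating in that bin}."""
    grouped = defaultdict(list)
    for price, rating in pairs:
        if price <= 0:
            continue  # the logarithm is undefined for non-positive prices
        bin_index = math.floor(math.log10(price) * bins_per_decade)
        grouped[bin_index].append(rating)
    result = {}
    for bin_index in sorted(grouped):
        lower_bound = round(10 ** (bin_index / bins_per_decade), 2)
        ratings = grouped[bin_index]
        result[lower_bound] = sum(ratings) / len(ratings)
    return result


rating_by_price = mean_rating_by_log_price(sales)
for lower_bound, mean_rating in rating_by_price.items():
    print(f"from {lower_bound} CHF: mean rating {mean_rating:.2f}")
```

Log-spaced bins keep the cheap end of the market, where most sales happen, from being squeezed into a single bin.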
Finally, it is interesting to compare the "Collection" and the "Wantlist". For vinyl records (and for a lot of other things, in fact), price is highly correlated with scarcity: records pressed in only a couple of hundred copies will certainly sell at a higher price than more common records. Add to this that expensive records are on average better rated, and we expect relatively more expensive records in the Wantlist than in the Collection. Let us see if we were right by looking at the average price distribution of the Collection (first figure) and of the Wantlist (second figure):
Those look very different indeed! First of all, the Wantlist is on average 8 times bigger than the Collection (in terms of number of records). But what's more striking is the difference in the shape of the distributions. For the Collection, we have a concave curve maxing out at around 35 CHF, whereas for the Wantlist, the curve explodes after a small dip... This difference embodies the harsh reality of the music lover, desiring ever rarer and more expensive records but being stuck with average and low-cost ones...
In our journey through the decades, we saw how the music industry was shaped by different phenomena, such as economic crises, technology and new genres. The way we consume our favourite songs will keep changing. Will streaming platforms completely wipe out physical copies, or will vinyl come back stronger? Only time will tell. One thing's for sure: Led Zeppelin I will stay in our hearts.