✨ FYI

Black Friday crash post-mortem

Jonathan Williamson
Hey everyone,

As you may already know, we experienced a near-total meltdown of the Blender Market server yesterday during Black Friday.

Starting at 9:50am Chicago time until approximately 2:40pm, the server was unreachable for most people and barely chugging along for everyone else. Response time (the time to finish loading a requested page) went from 0.26 seconds to 0.51 minutes (about 31 seconds) - a 118x increase. And that's just for the pages that actually loaded; most failed outright.

There were several reasons for this:
  1. 267% increase in traffic compared to the previous Friday
  2. 105% increase in Affiliate visits
  3. Missing index on the Affiliate Visits guid column
  4. Missing index on the User Roles user_id column
The first two issues are not that substantial, or shouldn't have been. Increased traffic and affiliate visits only caused a problem because the two missing indexes combined with a slew of small optimization issues across the board.

The problem was this: each time a user is identified on the site, we check the database for their permissions via UserRoles to see which permissions they have. Each check requires a lookup by user_id, and that column was not indexed (sort of like missing a map to the data), so the query took dramatically longer than it should have. We were running this check on every single page load (bad bad bad) because we weren't properly caching the results.
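
To illustrate the caching side of this, here is a minimal Python sketch of a per-user permission cache. It is not our actual code: `PermissionCache`, `load_roles_from_db`, the `"customer"` role, and the five-minute expiry are all hypothetical names and values chosen for the example.

```python
import time


class PermissionCache:
    """Tiny in-memory cache so roles are not re-queried on every page load."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # function that actually hits the database
        self._ttl = ttl_seconds    # hypothetical expiry; tune to taste
        self._clock = clock
        self._entries = {}         # user_id -> (expires_at, roles)

    def roles_for(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: no database query
        roles = self._loader(user_id)
        self._entries[user_id] = (now + self._ttl, roles)
        return roles

    def invalidate(self, user_id):
        """Call this whenever a user's roles change."""
        self._entries.pop(user_id, None)


db_calls = []


def load_roles_from_db(user_id):
    """Stand-in for the real UserRoles query (hypothetical)."""
    db_calls.append(user_id)
    return frozenset({"customer"})


cache = PermissionCache(load_roles_from_db)
for _ in range(3):                 # three "page loads" by the same user
    cache.roles_for(42)
print(f"database queries made: {len(db_calls)}")
```

With something like this in place, three page loads by the same user cost one query instead of three. The trade-off is that a role change only takes effect once the entry expires or `invalidate` is called.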

Additionally, each time a purchase was made and commissions were generated, we would also do a lookup to check for Affiliates that should earn a referral. This was done by looking up a previously stored Affiliate Visit by its GUID (globally unique identifier). The GUID column was also missing an index, resulting, again, in dramatically slower queries.
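
Both missing indexes are the same kind of problem, and the effect is easy to see in a query planner. The sketch below uses Python's built-in `sqlite3` module purely for illustration, which is an assumption and not necessarily the database Blender Market runs on. The table names, index names, and sample values are also hypothetical, modeled on the User Roles and Affiliate Visits tables described above.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schemas: only the columns needed for the two lookups.
conn.executescript("""
    CREATE TABLE user_roles (id INTEGER PRIMARY KEY, user_id INTEGER, role TEXT);
    CREATE TABLE affiliate_visits (id INTEGER PRIMARY KEY, guid TEXT, affiliate_id INTEGER);
""")

roles_sql = "SELECT role FROM user_roles WHERE user_id = ?"
visit_sql = "SELECT affiliate_id FROM affiliate_visits WHERE guid = ?"

# Without indexes, each lookup has to scan the whole table.
plans_before = (
    query_plan(conn, roles_sql, (42,)),
    query_plan(conn, visit_sql, ("example-guid",)),
)

conn.executescript("""
    CREATE INDEX idx_user_roles_user_id ON user_roles (user_id);
    CREATE INDEX idx_affiliate_visits_guid ON affiliate_visits (guid);
""")

# With indexes, the planner can jump straight to the matching rows.
plans_after = (
    query_plan(conn, roles_sql, (42,)),
    query_plan(conn, visit_sql, ("example-guid",)),
)

for before, after in zip(plans_before, plans_after):
    print(f"before: {before}")
    print(f"after:  {after}")
```

Before the indexes exist, the plan reports a full table scan for both queries. Afterwards it reports an index search instead, so the cost of each lookup no longer grows with the size of the table.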

The combined outcome of this was a slew of ultra-slow database queries that started to bog down the server and ultimately cascaded into a complete meltdown. The server and database simply couldn't process the queries before running out of memory and crashing.

Each of these issues has been fixed, and we're continuing to work on additional optimizations. Query times are still much slower than we'd like in a lot of areas and we're still seeing the occasional crash due to memory maxing out, but at least it's functional for the vast majority of people again.



TL;DR: the server got slammed with heavy traffic that created a cascading database slowdown due to missing indexes on two important database tables. We fixed the tables and optimized a couple of other things to get back online after nearly five hours.

It's hard to quantify how much revenue our Creators and we missed out on during this outage, but it's substantial. Even so, yesterday beat last year's Black Friday by 77.56%, processing $118,389.03 in a single day 🤯

Due to the downtime, we've chosen to extend the sale another 24 hours, until the end of Tuesday.




This outage was also a good reminder to ensure each of us is available and on call during seasonal sales like this in case of an emergency. Going forward, I will try to do a better job of scheduling everyone and setting expectations for who is needed and what needs to be done.

I would also like to give a shoutout to Nick Haskins for recognizing the problem early on and providing some direction on where to start looking for the root of the problem (literally from the side of the road en route to SoCal), Matthew Muldoon for handling an onslaught of support and questions, and Rom Robotnik for implementing some quick optimizations to keep Blender Market running.