Last month I gave a talk at Velocity Conf NY with Ben Hughes titled Scaling Things That Don’t Scale: Scalability and Reliability at Airbnb. The slides and a video of the talk are both available online.
I recently spent a few weeks at work focusing on database performance. Specifically, I was looking to reduce load on our database by rewriting application code and queries to be more efficient. Some of the easiest wins were in code that counted things.
Counting is deceptively simple. Even a five-year-old can count, so how hard could it be? The difficulty lies in counting efficiently. In a new application, you can often get away with counting inefficiently because your data set is small and database load is minimal. In a more mature application, there is more data, the database is busier, and there are outliers. Perhaps users on your blogging platform have five posts on average, but a few of your most dedicated users may have thousands of posts. Because these outliers are rare, the performance problems they cause are difficult to spot in development or even in production. Ironically, your most dedicated users are the ones affected the most!
After working through a few inefficient queries, here are a few patterns that emerged. The examples below are for a Rails 3 app using ActiveRecord on MySQL, but the same SQL principles should apply to other languages and frameworks.
Today I open sourced hammerspace, a project that I’ve been working on for the last few months. The accompanying blog post explains the context and motivation. For me, there are three important takeaways.
First, make sure you know what your application is doing. We knew that application startup was slow, and that loading translations over the network was probably expensive, and that all of those translations probably took up a decent chunk of memory. But we never actually measured these things, so we never understood how slow or expensive or large these things were. And we didn’t anticipate how all of these things conspired together to cause such severe performance degradation.
Second, not all “memory” is equal. A lot of the comments on Hacker News suggested shared memory, or mmap’d files, or memcache or redis. Prior to hammerspace we were using the heap, and after hammerspace we are using the filesystem cache. These are all effectively “in memory” solutions, but they vary wildly in ease of implementation and speed of access from within ruby. Don’t just assume that because you’re keeping data “in memory” that it will automatically be fast.
Third, your time is not always best spent writing code. My co-worker and I joke that we don’t write more than 10 lines of code a week, and there’s some truth to it: at time of release, hammerspace is only 736 lines of production code, or 528 lines if you exclude whitespace and comments. And yet the impact was huge: a 66% reduction in in-request garbage collection time, and a 17% reduction in application server response time overall. The real time and effort was spent hunting down the cause of the problem, evaluating potential solutions scientifically, deploying increasingly invasive experiments to production, collecting and analyzing data, iteratively refining the design, generalizing the solution, writing an extensive test suite, documenting everything, evangelizing the solution internally, and preparing the project for public release.
This has been the nature of my work recently. Every so often I still have to just crank out a bunch of code, but increasingly I’ve been spending more time on the things around the code. It’s gratifying to have something you can maintain, be proud of, and deploy with confidence. I’m very much looking forward to my next project.
At work, we have a number of REST services that are responsible for most of the content you see on IGN. For example, all our reviews and features are stored in an articles service. Data about games (release date, publisher, our review score, etc.) has its own service, and so on. These services use Mongo as their primary data store.
Sometimes data is retrieved by slug or ID. These fields are indexed in Mongo, so we just do a straightforward Mongo query. Mongo uses the index and the query is performant.
Other times, we need to do more complex queries. For example, to build this page, we need to do a query like:
- Get me games
- That were released in the US
- That have a review score greater than 0
- That are for the PS3
- That are RPGs
- And sort the list by review publish date
Even if we had indexes on all of these fields, Mongo can’t combine indexes dynamically. It will pick the “best” one, then try to resolve the other criteria the hard way. To speed up the query, we would need an index that covered all of the fields specified in the query. Unfortunately, that is impractical because of the diversity of our queries.
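To make "the hard way" concrete, here is a rough sketch in plain Python rather than Mongo itself. It mimics a query planner that can use only one index: the index narrows the candidates, and every remaining criterion is checked by examining each candidate document. The documents, the field names (`region`, `platform`, `genre`, `score`, `review_date`), and the choice of the platform index are all made up for illustration; this is not our actual schema or how Mongo chooses an index, and the sort direction is an assumption.

```python
from collections import defaultdict

# Hypothetical game documents; every field name and value is invented.
GAMES = [
    {"slug": "game-a", "region": "US", "platform": "ps3", "genre": "rpg",
     "score": 8.5, "review_date": "2011-03-01"},
    {"slug": "game-b", "region": "US", "platform": "ps3", "genre": "rpg",
     "score": 0, "review_date": "2011-05-01"},
    {"slug": "game-c", "region": "US", "platform": "xbox-360", "genre": "rpg",
     "score": 9.0, "review_date": "2011-02-01"},
    {"slug": "game-d", "region": "UK", "platform": "ps3", "genre": "rpg",
     "score": 7.0, "review_date": "2011-04-01"},
    {"slug": "game-e", "region": "US", "platform": "ps3", "genre": "rpg",
     "score": 9.5, "review_date": "2011-01-15"},
    {"slug": "game-f", "region": "US", "platform": "ps3", "genre": "shooter",
     "score": 8.0, "review_date": "2011-06-01"},
]


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc[field]].append(position)
    return index


def find_games(docs, platform_index):
    # Step 1: the single chosen index narrows the search to PS3 games.
    candidates = [docs[i] for i in platform_index.get("ps3", [])]

    # Step 2: the remaining criteria are resolved "the hard way",
    # by inspecting every candidate document one at a time.
    matches = [
        doc for doc in candidates
        if doc["region"] == "US" and doc["score"] > 0 and doc["genre"] == "rpg"
    ]

    # Step 3: the sort also happens after the fact, outside of any index.
    # ISO date strings sort chronologically; oldest first is an assumption.
    matches.sort(key=lambda doc: doc["review_date"])
    return candidates, matches


platform_index = build_index(GAMES, "platform")
candidates, matches = find_games(GAMES, platform_index)
print(len(candidates), "documents examined,", len(matches), "returned")
```

In this toy data set the index leaves five candidates to examine for only two results. With a real collection and a less selective index, that gap between documents examined and documents returned is where the query time goes.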