December 2014 – Matt Faus

Like most frameworks these days, the Google Appengine (henceforth, GAE) SDK provides an API for reading and writing objects derived from your classes to the datastore. This saves you the boring work of validating raw data returned from the datastore and repackaging it into an easy-to-use object. In particular, GAE uses protocol buffers to transmit raw data from the store to the frontend machine that needs it. The SDK is then responsible for decoding this format and returning a clean object to your code.

This utility is great, but sometimes it does a bit more work than you would like. I found one query that was fetching 50 entities, each with over 60 properties. Using our profiling tool, I discovered that fully 50% of the time spent fetching these entities was during the protobuf-to-python-object decoding phase. This means that the CPU on the frontend server was a bottleneck in these datastore reads!

The problem was that the GAE SDK was deserializing all 60 of the properties from protobuf format, even though the code in question only required 15. The best way to solve this problem is with projection queries provided by the GAE SDK, which reduces both the download time and the objectification. But, each projection query requires its own perfect index. I wanted something with a little less overhead.

Using the magic of monkey patching (run within appengine_config.py), I modified the pb_to_entity function (which exists inside the GAE SDK) to drop unnecessary properties before performing the protobuf-to-python-object translation. This is much like the scenario where a new property is added to a Model. Entities that have not been written since the new property was introduced will return a protobuf that excludes the newly introduced property. With this change, the 2.9 seconds spent decoding was reduced to 900 milliseconds, a 3x improvement!

You can get the gist of how I got this working here. The main points are:

appengine_config.py monkey patches the relevant functions inside the GAE SDK so that my function runs instead.
A context manager sets a global, thread-local variable with the names of the properties that we care about. The custom function only operates when this global variable is set.
- This means you should query against exactly 1 Model within the context.
The __getattribute__ function is monkey-patched to provide access safety for projected objects. When you try to access an excluded property from an object that was created during protobuf projection, an exception is raised. This is a feature that “official” projection queries do not offer.
Various put() functions are monkey-patched to disallow putting a projected entity. This provides parity with “official” projection queries.

Now that this code has been written, I will be on the hunt for more queries that might benefit from its use. If you use this code in your application, I would love to hear about it!

Matt Faus

Month: December 2014

Speeding up GAE Datastore Reads with Protobuf Projection