Recently, we noticed a dramatic spike in the number of writes to a specific model in our datastore. I wanted to figure out which portion of our codebase was issuing these writes. After perusing several of the commits, I figured there had to be a better way. Thankfully, we have some hooks that sit between any code that issues a query or a put and google appengine itself. This code determines the calling code’s source file and line number and keeps a running count of how often this code issues a query or a put. Every 5 minutes, these counts are flushed to the log files.
The log lines look like this:
3041: activity_summary.py:312 -- UserData.
Where 3041 is the count, 312 is the line number, activity_summary.py is the source file, and everything after are the datastore operations that were issued.
To extract and organize this information from the logs, I performed these steps:
- grep’d through the logs to find the operations on the entity I was concerned with. Specifically looking at a Wednesday before the increase and a Wednesday after the increase (being careful to choose the same day of the week to have the most comparable numbers).
- Fed these log lines through a simple python script to aggregate the thousands of log lines and turn them into .csv files.
- Opened the .csv files in Excel, sorted by source file / line number, and matched up the source file and line number to discover which files caused the increase in writes. Note that since the code may have changed between days, the files and line numbers may not match up exactly, so do the best you can.
- Compare the total operations per source file to see which one(s) have had the greatest increase.
- Look through the commits of those files and try to be better about issuing reads/writes only when necessary.
Here is a screenshot of the final spreadsheet that shows the increase in two source files.