Making my code run 3000 times faster!

Photo by Abed Ismail on Unsplash

I first started using the MySQL database system way back when version 2 was released. I cut my teeth writing SQL queries on that system, back before it even supported nested SELECTs, which meant really getting to grips with hand-optimised JOINs.

But over the years, as better tools and layers of abstraction evolved, I began to touch raw SQL less and less, and began to rely on these other tools to do the heavy lifting for me.

My latest startup uses the newest, shiniest version of MySQL to store data, but I am several steps removed from the data store via a Ruby ORM (Object-Relational Mapping). Basically, I just define all the database elements as standard Ruby objects, and the ORM takes care of everything for me.

It works well 98% of the time, but sometimes there is a problem, like the one I will describe below.

Plenty of space, really! 😅

In my SaaS app, we allow our users to store document attachments against their employees. We have a handy little widget in the footer of every screen to show our users just how much of their allotted storage they are presently using.

Now, this widget used to do a quick calculation on every page load. Initially it worked well, adding only a few extra milliseconds to the page load time. But as our system grew and we allowed attachments in more areas (e.g. Company Library, Job Applicants, e-Signatures, Expense Claim receipts), that page load time crept up to over a second, sometimes even a couple of seconds per view, which really hurt the overall snappy feel of the app.
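To make that concrete, the per-page tally was essentially one SUM per attachment area, run every single time a page rendered. Here is a rough sketch of the equivalent SQL; the table and column names are illustrative stand-ins, not our actual schema:

```sql
-- Roughly what every page load was doing behind the ORM:
-- one SUM per attachment area, for the viewing company.
-- (All table/column names here are hypothetical.)
SELECT COALESCE(SUM(file_size), 0) FROM employee_attachments  WHERE company_id = 42;
SELECT COALESCE(SUM(file_size), 0) FROM library_files         WHERE company_id = 42;
SELECT COALESCE(SUM(file_size), 0) FROM applicant_attachments WHERE company_id = 42;
SELECT COALESCE(SUM(file_size), 0) FROM expense_receipts      WHERE company_id = 42;
-- ...plus one more query per area, with the results added together in Ruby
```

As the number of areas (and rows in each) grew, so did the cost of every render.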

What started as this…

Quickly became this…

How would we solve this? Well, we decided to transition from a live, real-time calculation to a background process, run every couple of hours, that would go through the file attachment tables, tally up the total storage used, and store that figure in a database field somewhere. The screen rendering process would then simply present the value of that field each time.
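The cached total just needs a home on the company row. Something like the following would do it, though the column name here is my illustrative choice rather than our actual schema:

```sql
-- Hypothetical summary column on the company masterfile.
-- The background job writes the tally here; page loads only read it.
ALTER TABLE companies
  ADD COLUMN storage_used_bytes BIGINT UNSIGNED NOT NULL DEFAULT 0;
```

Reading one column off an already-loaded row is effectively free compared to re-summing every attachment table on every render.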

So we would lose the real-time aspect, but customers would still see their storage usage with at most a couple of hours of lag. The best part was that page loads were back to nearly instantaneous again.

We built a utility routine on our background worker server to run these calculations every 4 hours (on a CRON timer). The problem was, the routine had to run through our entire active customer list of 1,700+ companies, tally up all the attachments across the different tables in the different areas, and store the totals in the company masterfile. That routine could take up to 45 minutes to run! It was also spiking the CPU of the worker server, so other critical background tasks (like calculating leave, and handling Slack notifications and important emails) were being impacted and delayed.

We tried several things, like spawning concurrent threads to split up the task, but still no joy. I kept getting critical alerts in the middle of the night that our worker instances were down or struggling under the load. Clearly not acceptable.

This was a frustrating issue, and the team, myself included, was struggling to solve it. But then it dawned on me: we were getting lazy. Our immediate go-to solution was to stick with Ruby, the ORM, and our background worker to solve a background task. What if there was another way?

Looking at the problem, there were no real external inputs required. The task was simply to step through the customer table row by row, find the associated attachment records (easy, because keys link them all), run a SUM query over each area, total up those SUMs, and save the result back to the company row.

Why weren’t we just doing this in SQL directly? Better still, MySQL now has the concept of ‘Scheduled Events’, where you can store regular housekeeping routines right within the database that can run queries on a fixed interval. No need to set up a timer on our worker server at all.
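For illustration, a minimal scheduled event looks something like the sketch below. The event and procedure names are assumptions on my part, standing in for whatever statement actually does the tallying:

```sql
-- The event scheduler must be enabled for events to fire.
SET GLOBAL event_scheduler = ON;

-- A minimal recurring event (names are illustrative, not our production code).
CREATE EVENT IF NOT EXISTS refresh_storage_totals
ON SCHEDULE EVERY 1 HOUR
DO
  CALL refresh_storage_totals_proc();
```

MySQL runs this on its own clock, inside the database, with no worker process involved at all.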

So I opened up a MySQL query console and typed up a query that would do the totalling of all the attachment records and perform an UPDATE on each company row.

Nested selects…. nested selects everywhere!
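I won’t reproduce our production query, but its shape was roughly this: a single UPDATE over the companies table, with one correlated subquery per attachment area summed into the cached column. All table and column names below are hypothetical:

```sql
-- Illustrative only: one correlated subquery (a nested SELECT)
-- per attachment area, totalled into the cached storage column.
UPDATE companies c
SET c.storage_used_bytes =
      COALESCE((SELECT SUM(a.file_size)
                  FROM employee_attachments a
                 WHERE a.company_id = c.id), 0)
    + COALESCE((SELECT SUM(l.file_size)
                  FROM library_files l
                 WHERE l.company_id = c.id), 0)
    + COALESCE((SELECT SUM(r.file_size)
                  FROM expense_receipts r
                 WHERE r.company_id = c.id), 0)
WHERE c.active = 1;
```

One set-based statement, executed right next to the data, instead of 1,700+ round trips from a Ruby worker.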

I ran the query.

It took…. 0.8 seconds!

Let me say that again: it took LESS than a second to run, compared to the worker routine that did the EXACT same thing in 45+ minutes. That is about 1/3000th of the time (45 minutes is 2,700 seconds, and 2,700 ÷ 0.8 ≈ 3,400, so calling it 3,000 times faster is, if anything, conservative).

Best of all, it completely takes the load off our worker server. I think the database CPU spiked by about 2% for that one-second window. This meant that instead of running every 4 hours, we could schedule the task to run every hour. Heck, we could even run it every 5 minutes if we wanted to, but I don’t want to get that crazy unless customers really demand it.

It was a valuable lesson for me that complacency and laziness can really be a detriment in a growing business. When you have a hammer, everything looks like a nail, and we were so wrapped up in the comfy, perceived safety and convenience of our Ruby ORM that it never really occurred to us that we could take the covers off and get down to the nuts and bolts to solve the issue.

Next time there is a problem like this in my startup, I will go back several decades in my working life, roll up my sleeves, and see if I can fix it the ‘good old-fashioned way’.