Using Auto Increment Fields to Your Advantage

I just found, and read, Clément Delafargue’s post “Why Auto Increment Is A Terrible Idea” (via @CoreRamiro). I agree that an opaque primary key is very nice and clean from an information architecture viewpoint.

However, in practice, a serial (or monotonically increasing) key can be handy to have around. I was reminded of this during a recent situation where we (app developers & ops) needed to be highly confident that a replica was consistent before performing a failover. (None of us had access to the back end to see what the DB thought the replication lag was.)

My first, naive, thought was simply to count rows in the tables between master & replica. Since that is via 2 separate DB connections, the queries wouldn’t be simultaneous, and some discrepancy was to be expected. My expectation was that the discrepancy would go to zero after we’d disabled all the database writers.

However, select count(*) from ____; runs quite slowly on some of the tables in this db – checking the timing on the replica (over a r/o VIP) took almost an hour to execute [1]. Not good enough by a long shot – not only did an hour make a big difference in record counts, but this was also too long to wait for information during the maintenance window.

Our DBA pointed out that all the “problem” tables had some form of serial integer field in them. By tweaking my script to use select max(id) from ____;, the entire set of queries, on both database VIPS ran in under 1 second. Much better! That information let us proceed with much more confidence during the maintenance window.

None of this anecdote should be read as justification for not using UUIDs as primary keys. Rather, if you’re stuck with serial integers, you might as well use them to your advantage. :)

Footnotes:

[1]We were switching our MySQL database from MyISAM format to InnoDB format. MyISAM caches row count as metadata, and can thus process count(*) efficiently. InnoDB does not cache the row count, so it must be calculated with ensuing massive disk i/o.