MySQL “EXPLAIN” changes after restart

Posted on by Mark

If you don’t work with MySQL you can stop reading, but if you’ve come here from Google and you’re experiencing a problem like the one described in the title, hopefully this will save you some time.

I was performance testing a new system at work the other day and was confused by some huge changes in performance that didn’t seem to be related to the config setting I was playing with. The test would run in 6 minutes, I’d tweak a setting and it would suddenly blow out to 13 minutes. I’d change the setting back and it would stubbornly stay at 13 minutes. I’d change something unrelated and it would drop back to 6 minutes. Come back an hour later and it would it be back at 13 minutes.

This is the last thing you want happening with this sort of testing because you can’t trust anything you think you have found. All the conclusions you think you reached are meaningless. Hours you’ve have been spent are possibly wasted and you’re probably going to have to waste a many more before you get anywhere.

Fortunately this story had a happy ending.

On the bad runs SHOW PROCESSLIST was showing a single query running for ~500 seconds. On the good runs this query was done in a second or two. The test I was doing involved running variants of the query (among many others) about 40 times. I initially suspected it was the query cache being on or off, then I thought it might be that if multiple instance of this query were running in parallel it might saturate the CPU or IO. iostat showed no disk activity, but the system runs on a mix of SPARC and Intel servers and the faster Intels seemed to have the problem more than the SPARCS which have lots of relatively slow processors. But nothing held up to scrutiny, the problem would come back or disappear just when I thought I had it pinned down.

I eventually fell back on using EXPLAIN to look at the execution plan for the query. I’d been shying away from this because being 2 weeks away from go-live I wanted the problem to be config and not code. But EXPLAIN did some strange things. When the query was slow EXPLAIN was showing one of the joins not using a key and doing a full scan of a table with 500k records. Other times it would decide to use a key and only have to look at between 700-1500 records. It seemed that restarting MySQL would make EXPLAIN recalculate the execution plan and almost randomly pick one of these two conclusions.

I typed the title of this post into Google and came across this post which gave me the clues I needed to solve the problem. It seems that by default InnoDB uses 8 random pages to estimate the best execution plan for a query and depending on the data it finds the plan can vary. Once it has chosen this data it will cache it until one of several events occur, most likely being a server restart or when ANALYZE TABLE is run.

Once I learned this I was fairly confident I’d found the culprit. Next step was to work out how to coax it into picking the right plan. The number of pages it uses to calculate the plan is controlled by the innodb_stats_sample_pages variable. The downside of setting it too high is that MySQL spends more CPU and potentially IO figuring out the optimal execution plan so I needed to find the lowest number that gave consistently provides the optimal execution plan.

The routine I used looked something like this, with multiple ANALYZE/EXPLAIN runs to check see if the results remained consistent:
mysql> SET GLOBAL innodb_stats_sample_pages = 800;
mysql> SHOW VARIABLES LIKE "innodb_stats_sample_pages";
mysql> ANALYZE TABLE TableName;
mysql> EXPLAIN ....
mysql> ANALYZE TABLE TableName;
mysql> EXPLAIN ....
mysql> ANALYZE TABLE TableName;
mysql> EXPLAIN ....
mysql> ANALYZE TABLE TableName;
mysql> EXPLAIN ....
mysql> ANALYZE TABLE TableName;
mysql> EXPLAIN ....

Each ANALYZE/EXPLAIN run gave the good execution plan, but the ANALYZE TABLE was (unsurprisingly) 100x slower than previously. Next I dropped it to 80 which also gave good execution plan results and was only 10x slower on the ANALYZE. Being fairly suspicious at this point I set it back to 8 and was relieved to get mixed results again. So now I was confident that the right number was between 8 & 80. By repeating this process I eventually found that 10 gave consistently good results. To be on the safe side I eventually bumped it up to 12 and got back to my testing.

This entry was posted in Google Fodder, MySQL, Under The Hood. Bookmark the permalink


Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>