What do you mean unstable?
[edit] I originally wrote this post last week, upon doubting whether I should or shouldn’t post, I left it simmering as a draft. But after testing, the problems for this Magento instance are solved with the manip I did.
I wasn’t quite awaiting the day that I would have to sift through the 300+ tables of a Magento installation to find an error.
In fact it all started with Magento turning slow. As I stated before this particular Magento suffered from being installed on an InnoDB incapable MySQL environment. Honestly, what I’ll tell you next can virtually happen to any application that makes full use of InnoDB’s capabilities. But having it happen to a Magento was particularly painful due to the rather large amount of tables to check.
So what happened? Usually, when you install Magento on an InnoDB capable MySQL, the 300+ tables get linked to each other with plenty of foreign constraints and a lot of them, if not all of them, have an accompanying mention of “on delete cascade”.
This means that if you have a table to store your order in and a table that contains order_items, you can link both tables using the order_id field as a foreign key. Setting the “on delete cascade” will, when you delete your order, make that delete cascade to all tables depending on that order. Meaning that it’ll remove both the order and the order_items linked to it.
Magento is very dependant on having that cascade working. But MyISAM tables do not have this fancy feature to make life easier when you’re having 300+ tables. So what happened when the website got put online?
As it is customary when we put our Magento’s out into the wild, we execute a script we made to remove all test-data and test-clients from the store and reset all Magento stats, thus making it ready for real business. This script, much like Magento itself, depends on that cascade I mentioned before. Meaning that it’ll pretty much delete the order from the sales_flat_order_entity table, but does not pass down the sales_flat_order_items table to delete the corresponding items of that order, the script (as it’s maker) assumes that the cascade will take care of the heavy work of deleting the unnecessary content. However when it’s executed on a MyISAM set of tables only the order will be deleted and nothing else.
That would not have been much of a problem if we had used “delete” to delete content. But instead we used “truncate” which will not only delete the content but will also reset the auto-increment value(s) of the table. Which means, and I’m sure you hear me coming, that if a new order gets made with the old order’s auto-increment id, the old -supposedly deleted- order-items will be added to the new order, even-though the client did not order that particular item. Can you imagine what happens when orders also have a link with payments?
Anyway, despite our relatively small problem of being unable to rebuild the catalogue after some messing around with categories and products (keep in mind that we had already reconstructed the right table structure by then), there was this bigger and more dangerous problem going on. So how to fix it?
300+ tables to check is scary. The last thing you want to do is check them one by one, right?
What I did was, I made a dump of the structure of the development installation of the database. Then made a mysqldump of the data of the production website and got started.
The idea was to import the InnoDB structure into a test database without the foreign key constraints. That way the structure gets built and the database will be ready to receive the faulty data. Add the data from the production server. And then execute the foreign key constraints by means of “alter table” statements to have MySQL throw errors upon finding faulty data. Correct the table, execute the “alter table” again and move on until everything is cleaned up.
Please note, that this will only clean up the data related to id’s that haven’t been reused yet. To clean up data of id’s that have been reused is a lot tougher. In my case I was in luck that only the invoicing system was having inconsistency problems which actually only showed upon clicking “print” on the invoice. It was a bit of a pickle but I managed to remove the old faulty items from the invoices by comparing them with the actual orders whom in this case did not seem to suffer any problems. Hence, the problem could have been a lot worse.