LOAD DATA INFILE and crash safety
I used to warn people with few details and lots of hand waving that LOAD DATA INFILE should not be run on important MySQL servers in production. I spent a few hours today crashing slaves during LOAD DATA INFILE and now have a much better warning. It should be OK to use LOAD DATA INFILE when the slaves use rpl_transaction_enabled, the loaded table has a primary key and the load doesn't run for a long time.
Replication isn't crash proof on a MySQL slave even when InnoDB is used. I have written about this before and rpl_transaction_enabled fixes the problem for InnoDB. The MySQL docs have a brief warning about the problem and a few other warnings are here.
I spent time today crashing slaves during replication of LOAD DATA INFILE statements to understand crash recovery for them. That statement can fail differently in 5.0 and 5.1 than it did in 4.0. When LOAD DATA INFILE is run for a large file, the following events are written to the binlog for MySQL 4.0: append file, append file, ..., append file, exec file. Each append file event represents a subset of the file to be loaded and appends that subset to a temporary file on the slave. The exec file event performs the LOAD DATA INFILE using the data from the append file events. The relay-log.info file is updated after each of these events is processed on the slave.
There are two interesting failures for transactional storage engines. If the slave crashes or is kiilled after an append file event and before writing relay-log.info then the slave will repeat that event on restart and eventually run the exec file event. The result of this is either a duplicate key error or silent duplication of data.
If the slave crashes or is killed after the exec file event and before writing relay-log.info then the slave will get an error on restart (at least it did for me) because the temporary file to be loaded does not exist. I prefer an error like this versus silent duplication of data. Recover from this by starting replication from the first append file event using CHANGE MASTER.
Note that writes to relay-log.info are not forced to disk unless enabled via a recently added my.cnf option. Therefore crashes that occur after writing relay.log.info but before the write is forced to disk by the OS can also reproduce this.
The failures change in 5.0 because BEGIN and COMMIT are written to the binlog for LOAD DATA INFO. The sequence of events is: BEGIN, append file, ..., append file, exec file, COMMIT. As far as I know starting in MySQL 5.x all (most?) binlog event groups are wrapped by BEGIN and COMMIT to fix another problem, but I forgot the bug number of that other problem. After the slave processes a BEGIN event it does not update relay-log.info until the next COMMIT event. On restart, the slave will begin again at the BEGIN event, run the append file events and then run the exec file event.
This fixes one error. A crash after an append file event in MySQL 4.0 resulted in silent duplication of data. In MySQL 5.0 and beyond that error does not occur as the slave will start over from the first append file event. However, the crash after the exec file event results in repeating the entire load -- it will perform the LOAD DATA INFILE command twice for the original data.
If there is a primary key on the loaded table then any duplication will halt the slave with a duplicate key error.