Thursday, March 22, 2007

Recovering from Corrupted MySQL Database

Today one of my clients ran into a problem with his Sguil installation. The server hosting his Sguil MySQL database experienced a crash, as shown by dmesg on reboot:

Trying to mount root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
WARNING: /home was not properly dismounted
WARNING: /nsm was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted

The original error message said:

ERROR: loaderd: mysqlexec/db server: Incorrect key file for table
'./sguildb/sancp_sensor_20070322.MYI'; try to repair it

If the sensor crashed while SANCP data was loading, it would make sense that sancp_sensor_20070322.MYI was corrupted.

When trying to restart sguild, the following error appeared:

[user@sensor ~]$ ./
pid(3119) Loading access list: ./sguild.access
pid(3119) Sensor access list set to ALLOW ANY.
pid(3119) Client access list set to ALLOW ANY.
pid(3119) Adding AutoCat Rule:
pid(3119) Adding AutoCat Rule: ||ANY||ANY||ANY||ANY||ANY||ANY||tag:
Tagged Packet||1
pid(3119) Email Configuration:
pid(3119) Config file: ./
pid(3119) Enabled: No
pid(3119) Connecting to localhost on 3306 as user
pid(3119) MySQL Version: version 5.0.27
pid(3119) SguilDB Version: 0.11
pid(3119) Creating event MERGE table.
pid(3119) Creating tcphdr MERGE table.
pid(3119) Creating udphdr MERGE table.
pid(3119) Creating icmphdr MERGE table.
pid(3119) Creating data MERGE table.
ERROR: loaderd: You appear to be using an old version of the
sguil database schema that does not support the MERGE sancp
table. Please see the CHANGES document for more information
SGUILD: Exiting...

That doesn't look good.

Whenever I encounter a database problem, I first run mysqlcheck (with the database running) like so:

[user@sensor ~]$ mysqlcheck -r sguildb -p
Enter password:
note : The storage engine for the table doesn't support repair
sguildb.data_sensor_20070215 OK
sguildb.data_sensor_20070216 OK
sguildb.sancp_sensor_20070215 OK
sguildb.sancp_sensor_20070216 OK
sguildb.sancp_sensor_20070320 OK
sguildb.sancp_sensor_20070321 OK
sguildb.sensor OK

Note sguildb.sancp_sensor_20070322 isn't listed.

I stopped MySQL and then ran myisamchk, which showed the following:

sensor:/var/db/mysql/sguildb# myisamchk *.MYI
Checking MyISAM file: sancp_sensor_20070322.MYI
Data records: 2687 Deleted blocks: 0
myisamchk: warning: Table is marked as crashed
myisamchk: warning: 1 client is using or hasn't closed the table properly
- check file-size
myisamchk: error: Size of indexfile is: 303104 Should be: 306176
- check record delete-chain
- check key delete-chain
- check index reference
- check data record references index: 1
myisamchk: error: Found key at page 295936 that points to
record outside datafile
MyISAM-table 'sancp_sensor_20070322.MYI' is corrupted
Fix it using switch "-r" or "-o"

I fixed it like this:

sensor:/var/db/mysql/sguildb# myisamchk -r sancp_sensor_20070322.MYI
- recovering (with sort) MyISAM-table 'sancp_sensor_20070322.MYI'
Data records: 2687
- Fixing index 1
- Fixing index 2
- Fixing index 3
- Fixing index 4
- Fixing index 5
- Fixing index 6
Data records: 2678

Next I restarted the database and re-ran my sguild startup script. Everything returned to normal as I had hoped.

This is another example of the idea that anyone who uses detection systems long enough eventually becomes a database admin!


Alex said...

I had a similar issue with the MySQL database last month due to a SAN issue, only I had all of my tables in limbo! I was eventually able to recover because I (luckily) did not shut down MySQL. I couldn't agree more that practicing NSM has taught me more about database administration than I ever expected.

Alex said...

I should have said "partially recover" above. Some of the tables were unrecoverable.