There comes a time in every sysadmin’s life when filesystem errors just…happen. Luckily, these are somewhat easy to fix, assuming you don’t have a greater problem involving physical hardware damage.
First, you need to know the name of the disk device having the problem. Run a quick df -h to see which device the affected partition is on:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 2.7T 2.6T 106G 97% /
/usr/tmpDSK 4.0G 1.7G 2.2G 44% /tmp
Look under the “Filesystem” column to see the device name for the partition in question. Now, if this is any filesystem but “/”, your job is probably going to be easy. Simply unmount the filesystem and run fsck against it. For example, if you had a separate /home partition listed as /dev/sda3 (in the output above, /dev/sda3 happens to be the root filesystem), you would unmount /home and then run:
fsck -yC /dev/sda3
There are a number of options for fsck, but the above combination is my personal preference. The ‘y’ tells fsck to fix whatever errors it sees, which is preferable unless you feel that your index finger has the stamina to hit ‘y’ 500 times in a row, and the ‘C’ prints out a pretty little progress bar so you can keep an eye on it. Ext4 filesystems fsck rather quickly – typically less than an hour for a 2TB filesystem. Ext3 takes significantly longer.
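If you would rather not read the device name off the df table by eye, the lookup, unmount, and check can be strung together. This is only a rough sketch: device_for_mount and check_mount are names I made up for illustration, it assumes mount points without spaces in them, and check_mount has to be run as root.

```shell
# Print the device for a mount point, reading `df -P` style output on stdin.
# Hypothetical helper; assumes the mount point contains no spaces.
device_for_mount() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { print $1 }'
}

# Unmount a filesystem and run the same fsck as above against its device.
# Hypothetical helper; needs root and is only defined here, not called.
check_mount() {
    dev=$(df -P "$1" | device_for_mount "$1")
    # If $1 is not a mount point of its own, df reports the parent
    # filesystem, the lookup comes back empty, and we stop here.
    [ -n "$dev" ] || { echo "no device found for $1" >&2; return 1; }
    umount "$1" || return 1
    fsck -yC "$dev"
}

# Example (as root): check_mount /home
```

The empty-result check is deliberate: it keeps you from unmounting or checking the wrong filesystem when the path you gave is just a directory on a larger partition.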
Now, unmounting a filesystem may not be straightforward – if any services are actively using files on that partition, the OS will refuse to unmount it. Doing a lazy unmount (umount -l) won’t work here either, since it only detaches the mount point while open files keep the filesystem in use – you need to unmount it cleanly. To see what processes are using the filesystem in question, use lsof. For example, for a filesystem mounted at /var:
lsof | grep /var/
Then stop any services or processes using it.
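The raw lsof listing can run to hundreds of lines, one per open file, when all you really want is the list of processes to stop. Here is a small sketch that boils it down; summarize_lsof is a hypothetical helper name, and it assumes the default lsof column layout where COMMAND and PID come first.

```shell
# Reduce lsof output (read on stdin) to unique "COMMAND PID" pairs.
# Hypothetical helper; assumes the default lsof columns.
summarize_lsof() {
    awk 'NR > 1 { print $1, $2 }' | sort -u
}

# Usage (as root). When /var is a mount point, giving it to lsof as an
# argument lists the open files on that filesystem:
#   lsof /var | summarize_lsof
```

If fuser is installed, fuser -vm /var should give you a similar per-process view.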
If the filesystem issue is on your root partition (“/”), you have a little more work ahead of you, since it can’t be unmounted while the system is running from it. You’re going to need to boot into a rescue image. To do this, simply use a Netinstall image and boot to the CentOS installation screen, then type:
linux rescue nomount
You can skip networking and all that jazz, then run the shell. From there, you’ll need to find the partition in question and run the same fsck command. Do note that on CentOS 6+, the device name may be incremented since it will count the rescue image as the first device in most cases. So your /dev/sda3 might be /dev/sdb3 now, or even /dev/sdc3.
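Since the device names can shift around in the rescue environment, it helps to list the candidates before pointing fsck at anything. A minimal sketch, assuming blkid is available in your rescue shell (I have not checked every image); ext_devices is a hypothetical helper name.

```shell
# List devices carrying ext2/3/4 filesystems, reading `blkid` output on stdin.
# Hypothetical helper; assumes blkid's default "DEVICE: KEY="value" ..." format.
ext_devices() {
    grep -E ' TYPE="ext[234]"' | cut -d: -f1
}

# Usage in the rescue shell:
#   blkid | ext_devices
```

Comparing the labels or UUIDs that blkid prints against what you expect for the damaged partition is a safer way to pick the right device than guessing at the new letter.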
Once the fsck is done, reboot and confirm your filesystem is clean:
dumpe2fs -h /dev/sda3
The “Filesystem state” line should read “clean”. If it doesn’t, the fsck either didn’t complete correctly, or you have a larger problem on your hands.
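The dumpe2fs header is fairly long, so you may want to pull out just the one line that matters. A small sketch under the assumption that the line is formatted as “Filesystem state:” followed by whitespace and the value; fs_state is a hypothetical helper name.

```shell
# Print only the value of the "Filesystem state:" line from
# `dumpe2fs -h` output read on stdin. Hypothetical helper.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Usage (as root):
#   dumpe2fs -h /dev/sda3 2>/dev/null | fs_state
```

Anything other than a plain “clean” here is your cue to run the fsck again or start looking at the hardware.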