I noticed the HDD activity light on my server (Ubuntu 11.10 Server) was on even though nothing was accessing it. I already had an SSH window open on my Mac, so I ran a few commands to see what was going on.
Code:
[/usr/local/sbin/hourly.active]: htop
-bash: /usr/bin/htop: Input/output error
[/usr/local/sbin/hourly.active]: sudo cat /proc/mdstat
-bash: /usr/bin/sudo: Input/output error
[/usr/local/sbin/hourly.active]: ls
/usr/local/sbin/ls: line 50: 19495 Bus error /bin/ls $@ 1>&1
[/usr/local/sbin/hourly.active]: la
/usr/local/sbin/ls: line 50: 19500 Bus error /bin/ls $@ 1>&1
[/usr/local/sbin/hourly.active]: cd
[~]: ls
/usr/local/sbin/ls: line 50: 19507 Bus error /bin/ls $@ 1>&1
[~]: top
Segmentation fault
top, htop, sudo and ls all failed, but cd worked fine (cd is a bash builtin, so it doesn't need to load a binary from disk).
I tried a reboot, but ended up with a "no operating system found" type of error. I had to go to work, so I shut down and switched off the PSU. When I got home I rebooted into recovery mode, powered the system down cleanly (halt -p) and rebooted. It came up fine except for a failed sdb in the RAID, which rebuilt fine onto the spare. I'm currently running a SMART long test (smartctl -t long /dev/sdb), but I don't expect any errors, as the RAID has dropped disks before and they checked out fine.
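For anyone wanting to spot a dropped member quickly next time, here is a minimal bash sketch, assuming awk is available. It relies on the kernel's usual /proc/mdstat layout, where a failed member is tagged "(F)" and a spare "(S)", e.g. "sdb1[0](F)". The function name failed_md_members is hypothetical, and the sample array in the usage note is illustrative, not output from this server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<array> <device>" for every md member
# that the kernel has marked failed with an "(F)" suffix.
# Reads mdstat-formatted text from $1 (defaults to /proc/mdstat).
failed_md_members() {
  local mdstat="${1:-/proc/mdstat}"
  # Array lines look like: "md0 : active raid1 sdc1[2](S) sdb1[1](F) sdd1[0]"
  # Fields from $3 onward are state, level and members; only members end in "(F)".
  awk '/^md[0-9]+ :/ {
         for (i = 3; i <= NF; i++)
           if ($i ~ /\(F\)$/) {
             dev = $i
             sub(/\[.*$/, "", dev)   # strip the "[slot](F)" suffix
             print $1, dev
           }
       }' "$mdstat"
}
```

Running failed_md_members with no argument checks the live /proc/mdstat; for an array line like "md0 : active raid1 sdc1[2](S) sdb1[1](F) sdd1[0]" it prints "md0 sdb1". Once the long test finishes, smartctl -l selftest /dev/sdb shows the self-test log for that drive.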
It seems odd, though, that a single failed RAID disk (the OS is on a separate drive) would take the whole system down.
Any thoughts?