There I am, settling in for a quiet Saturday of housekeeping, websurfing and bookreading. And the phone rings.
The main Enco server is down. This is bad news ordinarily, but it becomes exceptionally bad when you remember that the standby server went down a week ago and you haven’t yet received the replacement hard drive you need. Uh oh.
So I hop in the shower, hurry to the office, and discover that one of the drives in the external chassis has gone south. Oh no. I grab the spare (yes, we do keep a spare for the main server, just not the standby), put it into an enclosure and swap it into place, all the while expecting a long overnight as I babysit the restoration of files to the new Netware volume I’m doomed to have to create.
And the new drive exhibits exactly the same problem as the old. Aw, hell…
(changing verb tenses, just a moment please.)
It took Gary and me about three hours to get everything running again. How could we possibly have rebuilt a RAID 0 array and restored the data in such a short time? Piece of cake. Turns out the drive itself didn’t die, just the receive bay in the hot-swap drive chassis.
And the boxed spare also turned out to be flaky. We tried every combination of enclosure, receive bay and LVD add-on board we had… except one. In a flash of desperate inspiration I decided to look up on one of the shelves in the engineering shop. Under a pair of old hard drives and other assorted detritus I found one more receive bay. We attached an LVD add-on board and set the SCSI drive ID to match the old bay so the RAID controller would hopefully recognize the original drive and spare us the need to create a new array. Lo and behold, it worked!
Yay, we got our array back. The main Enco server is once again alive and kicking. We made a list of spare parts we need to order, since it’s just a matter of time before that slot fails again. (Turns out that we’ve lost two receive bay units in the same chassis position since putting the Enco system into service. This does not inspire confidence.) I then turned my attention to the standby server for which we’d received the replacement drive yesterday, naturally on the day I couldn’t make it to the office.
There’s a standard principle followed by almost every RAID-controller manufacturer in the business: All drives in an array will be treated as if they were the same size as the smallest drive in the array. It’s difficult to replace a single dead drive with an exact duplicate, especially two years down the road, so RAID controllers (usually) allow you to use a replacement drive slightly larger than the original. Yet, for some asinine reason, the folks at 3Ware decided that all drives on one of their IDE RAID controllers must always be exactly the same to be included in a single array.
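For the curious, here's a rough sketch of the difference between the two policies. It's a toy model, not anybody's actual firmware: the function names and the drive sizes are made up for illustration, and the "strict" check only mirrors the behavior described above, not any documented 3Ware logic.

```python
def usable_capacity_gb(drive_sizes_gb, redundancy_drives=0):
    """Capacity under the usual rule: every member is treated as if it
    were the size of the smallest drive in the array.

    redundancy_drives is the number of drives' worth of space given up
    to parity or mirroring (0 for plain striping).
    """
    if not drive_sizes_gb:
        raise ValueError("an array needs at least one drive")
    data_drives = len(drive_sizes_gb) - redundancy_drives
    if data_drives < 1:
        raise ValueError("not enough drives left to hold data")
    return min(drive_sizes_gb) * data_drives


def strict_match_ok(drive_sizes_gb):
    """Hypothetical strict policy: accept the set only if every drive
    is exactly the same size."""
    return len(set(drive_sizes_gb)) == 1


# Illustrative sizes only: three original drives plus a slightly larger replacement.
drives = [75, 75, 75, 80]
print(usable_capacity_gb(drives))  # the extra space on the larger drive is simply ignored
print(strict_match_ok(drives))     # the strict policy rejects the mixed set outright
```

Under the lenient rule the slightly larger drive just loses a few gigabytes off the end. Under the strict rule you don't get an array at all.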
Of course, the replacement drive we purchased, while from the same manufacturer (IBM) and of the same basic type (IDE, 7200 RPM), was just a wee bit larger than the others, and therefore different enough that the 3Ware controller refused to include it in the new array. And so, we cannot bring The Beast back online until we either find another DTLA-307075 or buy six or seven identical replacement drives for the new array.
I suppose you can’t win ’em all.