Forty working hours ago (that would be Monday morning, and yes, it has been one long damned week already) we began the painful process of sussing out a series of crashes on our GroupWise server. We’ve tried the usual maintenance procedures, we’ve tried patching the OS and email systems, and we’ve tried cursing. Lots of that, actually.
Today, the (verbose) logfiles have finally given up their secrets. For each abend there’s a temporary file in the MTA working directory. Each file is still locked by the MTA: even though its thread has died, the MTA doesn’t know it, so it won’t release the lock. After I ratcheted up the logfile verbosity, the MTA told me two things: that it’s failing on an address lookup, and that the address is “kmttcontests.” It didn’t take a genius to figure that the rest of the address is “@kmtt.com,” mind you.
So just now I sent a test message from my GroupWise address to that recipient. Guess what? Another thread died! Eureka!
Now I know what’s causing these specific crashes. The question is… why does that one email address cause the MTA to lose a processing thread? I have no idea, and neither Novell’s support site nor Google’s Usenet search has been much help. Right now my fate is in the hands of those busy guys at Corporate…