Category: Work

  • Servers Gone Wild!

    That accursed server at the heart of our Enco network has given us a series of bizarre, traumatic experiences, but none quite so weird and infuriating as this week’s situation.

    It started Monday, when I moved the Beast (our standby server) from its old rack space to a new space underneath the main server to make room for the fancy new replacement main server. Shortly afterward we started hearing complaints about audio playback stuttering and behaving strangely in the production studios. We checked the usual error logs and indicators and what-not and couldn’t see anything out of the ordinary. We checked the connectors and lights and cable paths, but everything looked fine.

    More complaints came in during the day yesterday. Again, we checked everything we could think to check. We had Enco specialists on the phone and dialed into our system to check everything they could think to check. Still, no problems could be found.

    Complaints were widespread this morning, so we began an even more intensive search for the cause. While we were monitoring the SCSI drive array controller for the fifth time in two days… between blinks of the eye… two of the drives went “bad.”

    Huh?

    So we ended up downing the server, after some tinkering and deliberation. When we brought it up again, six drives were marked bad… out of eight.

    Huh?

    When we went into the array controller’s configuration software, it told us that all eight of the hard drives were now offline and unusable.

    HUH?

    We tried everything. We stole the cables from the new server. We removed the drive enclosures and re-seated them. We completely power-cycled every part of the server system. No matter what, the drives were all coming up as Offline.

    Great, we’ve lost the main Enco server array. Again. For the Nth time. Argh. And as always, the array has impeccable timing: In two weeks we were to have migrated gracefully to the new server.

    So right now we’re on the backup, aka Beast. Tomorrow we’ll call Enco’s tech support and figure out the best way to get us online with the new server ASAP.

    But wait, there’s more!

    For some bizarre reason, our locally-hosted Qwest Dex software decided to nag everyone who uses it, insisting they download the latest version directly to their computers. Argh! I don’t think so! So between rounds with the Enco server I was frantically trying to update the Dex software on the server. I have the client updated, so at least now it looks better and nags in a reasonable fashion. Tomorrow I’ll try to get the actual updated phone book data from those fine, fine folks at Qwest.

    But wait, there’s even more!

    Before I could work on the Qwest problem, I had to get some semblance of a working office! That’s right, folks, when I got to work this morning I was greeted with an office crammed full of boxes. I couldn’t even see my desk, let alone get to it. A few hours’ (intermittent) unpacking later, and some cussing while I tried to figure out who the idiot is that neglects to label his network ports… oh wait, that would be me… anyway, I finally got my main workstation running. A few hours after that, I got my Linux box running.

    There’s good news and bad news, here. Good news? I have a 19″ flatscreen on my Linux/shared workstation side. Bad news? My webcam died. Argh! So still no OfficeCam!

    So, yeah. Tomorrow had damned well better be a happy shiny fun day, or I’m going to go insane…

  • Movement! I’ve got movement!

    In the midst of all of this, you know, reconstruction and preparation to move in the broadcast operations for KWJJ… er, sorry, The Wolf… and KOTK, it was decided that there’s also to be a wholesale recarpeting and repainting of the facility! Isn’t that great?

    I will be vacating my office tomorrow, so I’ve spent the last couple of days packing and cleaning and filing and sorting. It’s all good, but it’s also kept me from doing much of anything else around here. My office will get the paint and carpet treatment on Saturday and Monday, then I get to move back in Tuesday… if all goes well.

    Oh yeah: The OfficeCam will be offline between tomorrow and (probably) Wednesday. There’s no point getting too settled in my very-temporary space, right?

    And then, when I’m back again… I get to rearrange things the way I’ve been thinking about doing for months now. (In a way, this project is a very good thing. It’s just damned annoying doing all of this packing and cleaning!)

    Fun, I tell ya. Fun.

  • Now I know what, just not why.

    40 working hours ago (that would be Monday morning, and yes it has been one long damned week already) we began a painful process of dealing with and sussing out a series of crashes on our Groupwise server. We’ve tried the usual maintenance procedures, we’ve tried patching the OS and email systems, and we’ve tried cursing. Lots of that, actually.

    Today, the (verbose) logfiles have finally given up their secrets. For each abend there’s a temporary file in the MTA working directory. Each file is locked by the MTA because, even though its thread has died, the MTA doesn’t know it and so won’t release the lock. After I ratcheted up the logfile verbosity, the MTA told me two things: that it’s failing on an address lookup, and that the address is “kmttcontests.” It didn’t take a genius to figure that the rest of the address is “@kmtt.com,” mind you.

    So just now I sent a test message from my Groupwise address to that recipient. Guess what? Another thread died! Eureka!

    Now I know what’s causing these specific crashes. The question is… why does that one email address cause the MTA to lose a processing thread? I have no idea, and neither Novell’s support site nor Google’s Usenet searching capability has been much help. Right now my fate is in the hands of those busy guys at Corporate…

  • Can’t update. Still working.

    Friday? About 10 hours.

    Sunday? Eight hours.

    Monday? 13 or so.

    Tuesday? A bit more than 10.

    Today? Who knows?

    If you’re wondering why I haven’t had the energy to write or anything to write about, now you know. I’ll be glad when the half-dozen ongoing emergencies all die down around the office… whenever that will be.

  • The The

    First we had a sports station named The Fan.

    Recently Rosie… er, Rosey 105 became The Buzz.

    And now, as of yesterday, country station KWJJ has become… The Wolf.

    Remember a time when radio stations were known by their calls? Apparently that time is now past, or at least it’s no longer fashionable in the industry.

    What’s next? Will our oldies station become The Fogey? We could change our news/talk station to The Talk. It’s hard to say what classic-rock stalwart KGON would become. The Rock? Nah, we’d get in trouble with Prudential and some nutjob wrestler/movie-star over that one. No good suggestions come to mind for our “new rock” station, KNRK, though plenty of snarky ideas are easily conjured.

    Branding, baby. It’s the buzzword of our times. Welcome to the new age, a lot like the past age but with sillier nomenclature.

  • Of Windows, Phone Equipment And Troubleshooting

    The fun and excitement of the Enco situation this week has obscured another interesting technical situation. We ordered some new studio telephone equipment that arrived late last week, and today (at long last) we tried to configure its hub.

    Telos’ 2101 “hub” is a computer designed to manage remote phone sets. (This is the machine we thought was blue-screening last week, for those of you who’ve been following along at home.) It’s running Windows NT “Embedded,” which I’ve never seen before.

    So we tried to configure it over the network, which is the only way provided to configure one of these machines. After sorting out some subnet issues we were able to ping the box, but not talk to it via the provided configuration utility.

    “Oh,” Telos’ tech support says, “You have to authenticate to the box first.” Turns out we have to search for the machine by IP address through Windows networking, connect with username and password, then the configuration utility is allowed to do its job. Hmm. We didn’t see that anywhere in the documentation.

    Then things took a turn for the weird. See, it turns out that the provided utility is known for doing weird, bad things to the device it configures. Well, we don’t want that, do we? So we attempt to download updated software via the utility. And we attempt, and we try, and we try, and we attempt, and we try some more. All is for naught, however, and we can’t figure out why. One clue is that the reported software version on the 2101 is more than two years old. It is, in fact, almost the first “released” version of the 2101’s software. This baffles Telos’ tech support guy.

    In an attempt to figure out what’s going on (here comes the cool part) I’m instructed to fire up Netmeeting and use it to connect to the 2101… and upon connection I’m given a desktop to control! That’s right, folks. I was looking at an NT desktop via a Netmeeting instance designed to allow last-ditch system administration on a box that lacks keyboard and video display (but does provide the hookups for them, go figure).

    To wrap up, it turns out that the software we were trying to connect to and update wasn’t even running on the computer. Telos is going to prep a new CF card with the most-recent software revision and “all that,” which we should receive next week. Supposedly we can just drop that CF card into place and ship them back the one we have, and then we’ll be able to use the web-based administration and (gasp!) actually have working, running software when we boot the device.

    Wow. I’m really glad I didn’t go home immediately after we finished up the Enco project today…