Thoughts About Upgrades
It’s been a long time since I last posted. The reason can be summed up in a single word: upgrades.
When I became the IT Director at our school district, we were on the cusp of the Vista/Windows 7 transition and were staying with Windows XP. We were not as behind as some places I’ve seen, but we were certainly due for some new technology. I was surprised to learn that there was a near phobia of talking about changing anything among the staff that had been in place for several years. I mean anything: moving from desktops to laptops, moving away from XP, or looking at alternative software. To their credit, I now completely understand this fear.
Our network was set up around desktop-based Windows XP machines. It worked great for that, but when more and more mobile devices began to creep into the network, it needed some adjustment. It was also designed to be very secure: everything was locked down and complex security was everywhere. The problem was, in spite of this security, we continued to have the same problems with virus infections, spyware and all of the crap that makes its way into computers.
It seemed to me that since we were still getting the junk, there was no sense having to trip over the security so much. Last summer, I worked with an outside consultant and we significantly dialed back the security across many areas. We also decided then that the following year we would perform major upgrades. We called it the “Etch-a-Sketch” approach: we were just going to shake the thing and start over. In July, that process began, and this is the true story of how I aged ten years in the last two months.
The beginning of the end started with hard drive upgrades. We upgraded hard drives in our virtual machine host, one by one, until we had all eight disks in the RAID array the same size. Prior to this, I ensured we had good backups, and even had snapshots of the virtual machines on an external hard drive. Worst case scenario, a day of downtime to copy all of them back, no problem. Somehow, about three days after the last rebuild happened, the controller decided “Oh, hey, check out all this new space! Let’s use it! Here you go, you’re welcome.” Now, ordinarily, that would be great. However, the VMware host couldn’t see a single partition greater than 2TB. The VMware OS ran on an embedded flash drive and was still running happily along, except that all of the machines were suddenly invisible.
No problem, we’ll just split the drive into two partitions to make VMware happy, put all of our machines back and we’re back in business. As it happened, this was on a Friday, so I had the whole weekend. It wouldn’t even affect that many people. If only it were that easy. The first malfunction was that the external hard drive where I stored all of these virtual machines decided that it was going to quit. No warning, no click of death, it just wouldn’t spin up. No biggie, I could still just reinstall the multiple servers, since I was upgrading them to Server 2008 anyway, and then just restore the docs from CrashPlan. It would take a bit longer, but it was summer and there weren’t that many people who would be affected. The only problem was that the CrashPlan backup server decided that it would be SUPER helpful to corrupt all of the backup sets and not report it for days. For every server. So the backup failed, the backup to the backup failed, and I had four VMs that said “unavailable.”
I opened a case with VMware support. They worked on it for hours with no luck. Finally, as a crazy approach, I booted the server from a Clonezilla CD and cloned the VMFS volume to a (working) external hard drive. Since I was dead in the water anyway, I re-partitioned the RAID set into two equal sizes, made the adjustment in VMware to span both partitions, and cloned my original VMFS image back onto the first partition. With that, I was back in business and all was well with the world. It was a bit of a hack, but it worked.
Next was a primary domain controller upgrade. This was a serious upgrade: new hardware, new OS, new configuration, new Active Directory. The works. We set it up, plugged it into the network and powered off the old one. Since it was summertime, a single controller would be just fine for DHCP, DNS and the basics of networking. Now, you may be asking, what about the existing machines, what of their fate? Since it was summer, there were fewer than a dozen staff working consistently. I timed the upgrade to coincide with when we could give them newer computers. It was an upgrade/migration combination. For the most part, it went pretty smoothly.
The second part was to move networking equipment into a new rack about three feet to the right of its present location. I know enough about networking to know that I should call someone who knows, so this was approached with cautious optimism. It’s just unplugging and plugging in, what could possibly go wrong? Again, we moved and installed the equipment into the new rack with little or no fuss, fired it back up, and of course no Internet. All fiber connections to the buildings and the internal network were just great. 24 hours of troubleshooting later, the culprit turned out to be the PIX firewall. This experience taught me, and a subsequent switch encounter confirmed, that the solution to DOA Cisco products is to simply turn them off and on repeatedly and eventually they will work. It may take 30-40 times, but eventually they will regain consciousness and begin functioning again.
That nightmare safely behind us, it was time to move on to file and print servers. Throw in several hundred student user accounts, a Mac OS upgrade for all student and staff computers, along with replacing dated teacher computers with newer equipment, and that’s where the blur begins.
Oh, did I mention the installation of 140 virtual NComputing terminals, which required setting up and configuring 5 physical hosting servers? Yeah, that too. Although I will say that if I could replace every machine in the entire district with those terminals and give staff iPads for mobility, I would do it in an instant. No more laptops, just a classroom “teacher terminal” and iPads with Dropbox, Google Docs and iCloud. One or two high-powered physical Remote Desktop servers to manage. Nirvana.
Throw in a couple of failed courseware upgrades for good measure. Oh yeah, and when the courseware finally did get fixed, after an hour and a half of downtime for the 20+ students waiting on it, the Windows Remote Desktop server chose that exact instant to decide that it was counterfeit and end all of the remote desktop sessions. Good times.
Did I mention that a shipment of laptops was lost somewhere in FedEx land? They arrived a week late, one week before school started. By that time we had a steady backlog of 100 tickets a day, plus the crush of unfinished projects and a leap to Windows 7, new teachers, and a new food service point of sale system. One day I will look back on this and laugh. It may be a day in a very distant future.
What I learned from this experience is that upgrades should be done incrementally, not in leaps and bounds. It has also helped me to understand my physical and mental limitations. I would give anything for a slow day at work. Sometimes you just need those days where you can work on the projects you want to do to improve things, rather than play fireman. It seems all we’ve done is put out fires. I am tired. I am physically and mentally drained. Now that the school year is well under way, things are beginning to settle into a manageable pace. I will leave you with this piece of advice: upgrades suck.



