Microsoft Windows crash nearly causes 800-plane disaster

“A major breakdown in Southern California’s air traffic control system last week was partly due to a ‘design anomaly’ in the way Microsoft Windows servers were integrated into the system, according to a report in the Los Angeles Times,” Matthew Broersma reports for Techworld.

“The failure was ultimately down to a combination of human error and a design glitch in the Windows servers brought in over the past three years to replace the radio system’s original Unix servers, according to the FAA,” Broersma reports. “The radio system shutdown, which lasted more than three hours, left 800 planes in the air without contact to air traffic control, and led to at least five cases where planes came too close to one another, according to comments by the Federal Aviation Administration reported in the LA Times and The New York Times.”

“As originally designed, the VSCS system used computers that ran on an operating system known as Unix, said Ray Baggett, vice president for the union’s western region. The VSCS system was built for the FAA by Harris Corp. of Melbourne, Fla., at a cost of more than $1.5 billion. When the system was upgraded about a year ago, the original computers were replaced by Dell computers using Microsoft software,” The Los Angeles Times reports.

Full article here.

MacDailyNews Take: Stupid? You decide. We wouldn’t trust Dell computers using the Microsoft Windows operating system to surf the internet safely, much less depend on the combo for safely communicating with 800 planes in flight.

58 Comments

  1. YES, IT *IS* A WINDOZE PROBLEM

    Hey snagglepuss — I guess you got your MCSE? Did Microshaft send you here to “enlighten” all of us “Mac-heads”?

    Why should a mission-critical system need to be rebooted every 30 days? BECAUSE IT RUNS WINDOWS! In my opinion, there should be ZERO downtime for a system like this — it should also be redundant enough so that if the servers did need to reboot automatically, they could do so on their own and no one would know the difference. All monitored very closely by a well-trained technician, I agree.

  2. Read Microsoft’s EULA (end user license agreement).

    Microsoft is not liable for any damages (injury, loss of business, etc) caused by its software beyond the purchase price of windows.

    Essentially use at your own risk.

  3. I’ve never had to reboot a Windows server every “30 days”. Harris created the software to run on a Windows system. This restart process is by design of the VSCS. Harris is responsible for making software that doesn’t require you to reboot. Running such a critical system on a Dell computer is nuts, I agree. Blaming Windows for a problem that Harris created is arrogant and ignorant. I wouldn’t say that Windows is the problem. 2000 AS is probably one of the best server systems I’ve ever used when compared to Linux or even OS X Server.
    It’s stable and well suited for mass data transit. Considering they used an “Off the Shelf” Dell doesn’t say much either. I wouldn’t trust an off-the-shelf Dell for any mission-critical server; that’s stupid, but again, that’s a Harris problem, not Windows.

  4. OK children, here’s the word for the day:

    Oxymoron – Ox`y*mo”ron, pointedly foolish, n : conjoining contradictory terms (as in ‘Airport Security’ or ‘Microsoft Windows Servers’)

  5. Most likely the software was written in India or Russia, where you can hire MCSEs by the dozen for $5/day. You can bet that if some company is ditching UNIX for Windows, it’s because their programming is largely moving outside the US.
    AM

  6. There are two paragraphs of note in the TechWorld article:

    “The servers are timed to shut down after 49.7 days of use in order to prevent a data overload, a union official told the LA Times. To avoid this automatic shutdown, technicians are required to restart the system manually every 30 days. An improperly trained employee failed to reset the system, leading it to SHUT DOWN WITHOUT WARNING, the official said.”

    Okay, shutting down without warning is a really, really smart design feature! Not.

    So the question is, did the UNIX system that was replaced have this “feature?” Check out the next paragraph.

    “Soon after installation, however, the FAA discovered that the system design could lead to a radio system shutdown, and put the maintenance procedure into place as a workaround, the LA Times said. The FAA reportedly said it has been working on a permanent fix but has only eliminated the problem in Seattle. The FAA is now planning to institute a second workaround – an alert that will warn controllers well before the software shuts down.”

    Basically, the need for the servers to be restarted periodically to avoid “data overload” is characteristic of the Windows system. And the “data overload” problem is something so inherent in the design of the Windows-based system that the only workaround is to make sure the system is restarted periodically.

    Yeah, Windows, the OS of choice for mission-critical apps. No wonder that every time you go to the mall where there are info kiosks, chances are that 1 in 10 are frozen with a Windows error message on the screen.

  7. A lot of folks won’t remember Pan Am. It was the world’s biggest airline; six months later it was gone. Why? Security issues resulted in American deaths.

    Everyone says Windows is here to stay, may slowly fade, etc. Windows could disappear in a matter of months if its security issues actually have catastrophic effects.

  8. Very nice. It is going to be hard to make up for this news. It will look bad, whether or not it is the fault of Windows. People will see it as Windows’ fault. Which is good: FUD is striking back and biting its own masters.

    Spread the FUD: they deserve it.

  9. A great many licence agreements have clauses which state that the software isn’t supposed to be installed in applications involved in mission-critical military or airline environments. I’d be very surprised if MS Windows didn’t have such a clause.

    Regardless, this is clearly the FAA’s fault for choosing this particular software. Evidently someone is very ignorant of the stability and the need to “clean out” windows every month or so. Meat of Moose above is probably right – it is likely a systemic FAA problem.

    The scary part is MS software is running in mission critical environments (whether MS agree to it or not) and so we are all exposed whether we like it or not.

  10. Ultimately, the safety of flights lies with the professionals in the cockpits. The airplanes’ transponders talk to each other, giving pilots exact location and direction of flight. Flying is always the safest way to travel.

  11. Clearly yet another example of some CTO-like person in power who only learned Windows while in school and uses it at home because they do not know any better. I see it all the time. Some new CTO or manager arrives at a company and boots UNIX because they themselves don’t know it or understand it, and are MCSE certified to push MS products. This is costing companies huge amounts of money. Now, our lives are even depending upon it. No wonder the Chinese have outlawed Windows to be used in any government function whatsoever.

  12. DakRoland: You can relax a bit. The servers running on U.S. nuclear subs are Apple Xserve G5s. After the Yorktown incident, the Navy’s learned not to take stupid risks with some of the most dangerous weapons on earth.

  13. “No air traffic control system should be wholly dependent on just one feature for its safety and effectiveness, and should incorporate redundant systems for both ground-based and plane-based traffic control and warning systems.”

    I agree completely, and I’d go even further: no mission-critical system should depend on just one operating system. Software monoculture is a dangerous thing.

    Snagglepuss,
    You’re being disingenuous with your remark about not having to reboot every 30 days. How often do you have to reboot Windows 2000? And if you say never, you’re a liar. A good friend of mine is CIO for a large financial services company (their backbone network runs Solaris) with several Windows subnets. He told me that as a matter of course they reboot the Windows networks periodically to keep them running optimally. He said that if they don’t, all sorts of problems creep in that render them progressively slower. They haven’t rebooted Solaris in years. Mission critical systems should never have to be rebooted. Just ask any sysadmin who runs VMS. These folks are used to uptimes of decades. They laugh at Unix and Linux as toy systems, and don’t even ask what they think about Windows.

  14. By the way, is it just me, or did anyone else think that the MDN headline was wa-aay over the top? “800-plane disaster”, forsooth! The article states that only five aircraft came “too close” to one another, and I don’t remember much about IFR regulations, but I think that means less than 1 mile of horizontal separation. Someone please correct me if I’m mistaken, but it certainly doesn’t qualify as a “near miss”.
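
A note on the 49.7-day figure quoted in comment 6: it matches the time an unsigned 32-bit counter of milliseconds takes to overflow (2^32 ms is about 49.71 days). The articles quoted here say only “data overload”, so treating such a counter as the cause is an assumption, not something the FAA or Harris is reported to have confirmed. A minimal Python sketch of the arithmetic, with hypothetical names throughout:

```python
# Hypothetical illustration (not from the FAA or Harris): how long an
# unsigned 32-bit millisecond counter can run before wrapping to zero.
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 milliseconds in a day
COUNTER_MODULUS = 2 ** 32          # distinct values a 32-bit counter can hold


def days_until_wrap(modulus: int = COUNTER_MODULUS) -> float:
    """Days of uptime before a millisecond counter of this size overflows."""
    return modulus / MS_PER_DAY


def tick_count(uptime_ms: int, modulus: int = COUNTER_MODULUS) -> int:
    """Value such a counter would report after uptime_ms of running."""
    return uptime_ms % modulus


if __name__ == "__main__":
    # Time to overflow, in days.
    print(f"{days_until_wrap():.2f} days")
    # After 30 days the counter is still below its limit.
    print(tick_count(30 * MS_PER_DAY))
    # At exactly 2**32 ms of uptime the counter has wrapped back to zero.
    print(tick_count(COUNTER_MODULUS))
```

On this reading, a restart every 30 days keeps the counter well below its limit, which would be consistent with the maintenance workaround described in the Techworld excerpt.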
