Monday, July 09, 2007

More Engineering Disasters

I've written several times about engineering disasters here and elsewhere.

Watching more man-made failures on The History Channel's "Engineering Disasters," I realized that lessons learned the hard way by safety, mechanical, and structural engineers and operators can be applied by those practicing digital security.

In 1983, en route from Virginia to Massachusetts, the World War II-era bulk carrier SS Marine Electric sank in high seas. The almost forty-year-old ship was ferrying 24,000 tons of coal and 34 Merchant Mariners, none of whom had survival suits to resist the February chill of the Atlantic. All but three died.

The owner of the ship, Marine Transport Lines (MTL), blamed the crew and one of the survivors, Chief Mate Bob Cusick, for the disaster. Investigations of the wreck and a trial revealed the Marine Electric's coal hatch covers were in disrepair, as reported by Cusick prior to the disaster. Apparently the American Bureau of Shipping (ABS), an inspection organization upon which the Coast Guard relied, but funded by ship operators like MTL, had faked reports on the Marine Electric's status. With gaping holes in the coal hatches, the ship's coal containers filled with water in high seas and doomed the crew.

In the wake of the disaster, the Coast Guard recognized that ABS could not be an impartial investigator because ship owners could essentially pay to have their vessels judged seaworthy. Widespread analysis of ship inspections revealed that many similar ships, and others besides, were unsound, and they were removed from service. Unreliable Coast Guard inspectors were removed. Finally, the Coast Guard created its rescue swimmer team (dramatized by the recent movie "The Guardian") to act as a rapid response unit.

The lessons from the Marine Electric disaster are numerous.

  1. First, be prepared for incidents and have an incident response team equipped and trained for rapid and effective "rescue."

  2. Second, be suspicious of reports done by parties with conflicts of interest. Stories abound of vulnerability assessment companies who find all of their clients "above average." To rate them otherwise would be to potentially lose future business.

  3. Third, understand how to perform forensics to discover root causes of security incidents, and be willing to act decisively if those findings demonstrate problems applicable to other business assets.

In 1931, a Fokker F-10 Trimotor carrying eight passengers and crew crashed near Bazaar, Kansas, after departing Kansas City. All aboard died, including Notre Dame football coach Knute Rockne. At the time of the disaster, plane crashes were fairly common. Because commercial passenger service had only become popular in the late 1920s, the public did not have much experience with flying. The death of Knute Rockne caused shock and outrage.

Despite the crude state of crash forensics in 1931, the Civil Aeronautics Authority (CAA) determined the plane crashed because its wood wing separated from its steel body during bad weather. TWA, operator of the doomed flight, removed all F-10s from service and burned them. Public pressure forced the CAA, forerunner of today's Federal Aviation Administration, to remove the veil of secrecy applied to its investigation and reporting processes. TWA turned to Donald Douglas for a replacement aircraft, and the very successful DC-3 was born.

The crash of TWA flight 599 provides several sad lessons for digital security.

  1. First, few seem to care about disasters involving new technologies until a celebrity dies. While no one would like to see such an event occur, it's possible real change of opinion and technology will not happen until a modern Knute Rockne suffers at the hands of a security incident.

  2. Second, authorities often do not have a real incentive to fix processes and methods until a tragedy like this occurs. Out of this incident came pressure to deploy flight data recorders and more robust aviation organizations.

  3. Third, real inspection regulations and technological innovation followed the crash, so such momentum may appear after digital wrecks.

The final engineering disaster involves the Walt Disney Concert Hall in Los Angeles. This amazing, innovative structure, with a polished stainless steel skin, was completed in October 2003. When it was finished, visitors immediately noticed a problem with its construction. The sweeping curves of its roof acted like a parabolic mirror, focusing the sun's rays like a laser on nearby buildings, intersections, and sections of the sidewalk. Temperatures exceeded 140 degrees Fahrenheit in some places, while drivers and passersby were temporarily blinded by the glare.

Investigators decided to model the entire facility in a computer simulation, then monitor for the highest levels of sunlight over the course of a year. Using this data, they discovered that 2% of the building's skin was causing the reflection problems. The remediation plan, implemented in March 2005, called for sanding the problematic panels to remove their sheen. The six-week, $60,000 effort fixed the glare.

The lessons from the concert hall involve complexity and unexpected consequences. Architect Frank Gehry wanted to push the envelope of architecture with his design. His innovation produced a building that no one, prior to its construction, really understood. Had the system been modeled before being built, it's possible the problems could have been avoided. This situation is similar to those involving enterprise network and software architects who design systems that no single person truly understands. Worse, the system may expose services or functionality never expected by its creators. Explicitly taking steps to simulate and test a new design prior to deployment is critical.
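
As a minimal sketch of that kind of pre-deployment check, the Python below compares the services a design is supposed to expose against those actually observed in a test build. The host names, port numbers, and the `unexpected_exposure` helper are all hypothetical, invented here for illustration; in practice the observed set would likely come from scanning a staging environment.

```python
def unexpected_exposure(designed, observed):
    """Return, per host, the open ports that the design never called for."""
    surprises = {}
    for host, open_ports in observed.items():
        # Hosts missing from the design have no approved ports at all.
        extra = set(open_ports) - set(designed.get(host, ()))
        if extra:
            surprises[host] = sorted(extra)
    return surprises


# Hypothetical design document: what the architects intended to expose.
designed = {"web01": {80, 443}, "db01": {5432}}

# Hypothetical scan results from a staging build of the same design.
observed = {"web01": {80, 443, 8080}, "db01": {5432}, "cache01": {11211}}

print(unexpected_exposure(designed, observed))
```

Anything the function reports is the digital equivalent of the 2% of panels causing the glare: a small part of the system doing something nobody planned for, found before the public does.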

Digital security engineers should not ignore the lessons their analog counterparts have to offer. A commitment to learn from the past is the best way to avoid disasters in the future.

6 comments:

Paul Schmehl said...

An interesting anecdote supporting your second disaster's lessons learned. After a database at UT Austin was broken into, and the names of such luminaries as the System Chancellor and the current Texas Governor were exposed, interest in security issues at all UT schools significantly increased.

For the first time, security positions were defined and funded, all schools were mandated to create security offices and security policies were reviewed and extensively revised.

There really is nothing new under the sun.

LonerVamp said...

I like how only 2% of the building skin was causing the problems, and with careful planning, took only $60,000 (directly) to fix. Such issues are sometimes canned and result in costs or losses of millions in the wake of knee-jerk reactions.

Excellent examples, and I think your added bullets are spot on.

Anonymous said...

Very enjoyable read. I too enjoy the Engineering Disasters program.

http://www.architectsban.webs.com said...

This comment has been removed by a blog administrator.

control valves said...

I believe construction of such projects requires knowledge of engineering and management principles and business procedures, economics, and human behavior.

gate valves said...

what a terrible disaster.