Engineering Disaster Lessons for Digital Security
I watched an episode of Modern Marvels on the History Channel this afternoon. It was Engineering Disasters 11, one in a series of videos on engineering failures. A few thoughts came to mind while watching the show. I will provide commentary on each topic addressed by the episode.
I believe all of these cases can teach us something useful about digital security engineering. The main difference between the first four cases and the digital security world is the failure in the analog world is blatantly obvious. Digital failures can be far more subtle; it may take weeks or months (or years) for secuirty failures to be detected, unlike sink holes in parking lots. The fifth case, describing asbestos, is similar to digital security because harmful effects were not immediately apparent.
- First discussed was the 1944 Cleveland liquified natural gas (LNG) fire. Engineers built a new LNG tank out of material that failed when exposed to cold, torching nearby homes and businesses when ignited. 128 people died. Engineers were not aware of the metal's failure properties, and absolutely no defensive measures were in place around the tank to protect civilian infrastructure.
This disaster revealed the need to (1) implement plans and defenses to contain catastrophe, (2) monitor to detect problems and warn potential victims, and (3) thoroughly test designs against possible environmental conditions prior to implementation. These days LNG tanks are surrounded by berms capable of containing a complete spill, and they are closely monitored for problems. Homes and businesses are also located far away from the tanks. - Next came the 1981 Kansas City Hyatt walkway collapse that killed 114 people. A construction change resulted in an incredibly weak implementation that failed under load. Cost was not to blame; a part that might have prevented failure cost less than $1. Instead, lack of oversight, poor accountability, broken processes, a rushed build, and compromise of the original design resulted in disaster. This case introduced me to the term "structural engineer of record," a person who assigns a seal to the plans used to construct a building. The two engineers of record for the Hyatt plans lost their licenses.
I wonder what would happen if network architectures were stamped by "security engineers of record?" If they were not willing to afix their stamp, that would indicate problems they could not tolerate. If they are willing to stamp a plan, and massive failure from poor design occurs, the engineer should be fired. - The third event was a massive sink hole in 1993 in an Atlanta Marriott hotel parking lot. A sewer drain originally built above ground decades earlier was buried 40 feet under the parking lot. A so-called "safety net" built under the parking lot was supposed to provide additional security by giving hotel owners time to evacuate the premises if a sink hole began to develop.
Instead, the safety net masked the presence of the sink hole and let it enlarge until it was over 100 feet wide and beyond the net's capacity. Two people standing in the parking lot died when the sewer, sink hole, and net collapsed. This disaster demonstrated the importance of not operating a system (the sewer) outside of its operating design (above ground). The event also showed how products (the net) may introduce a false sense of security and/or unintended consequences. - Next came the 1931 Yangzi River floods that killed 145,000 people. The floods were the result of extended rain that overcame levees built decades earlier by amateur builders, usually farmers protecting their lands. The Chinese government's relief efforts were hampered by the Japanese invasion and subsequent civil war. This disaster showed the weaknesses of defenses built by amateurs, for which no one is responsible. It also showed how other security incidents can degrade recovery operations.
Does your organization operate critical infrastructure that someone else built before you arrived? Perhaps it's the DNS server that no one knows how to administer. Maybe its the time service installed on the Windows server that no one touches. What amateur levee is waiting to break in your organization? - The final disaster revolved around the deadly substance asbestos. The story began by extolling the virtues of asbestos, such as its resistance to heat. This extremely user-friendly feature resulted in asbestos deployments in countless products and locations. In 1924 a 33-year-old, 20-year textile veteran died, and her autopsy provided the first concrete evidence of asbestos' toxicity. A 1930 British study of textile workers revealed abnormally high numbers of asbestos-related deaths. As early as 1918 insurance companies were relucant to cover textile workers due to their susceptibility to early death. As early as the 1930s the asbestos industry suppressed conclusions in research they sponsored when it revealed asbestos' harmful effects.
By 1972, the US Occupational Safety and Health Administration arrived on the scene and chose asbestos as the first substance it would regulate. Still, today there are hundreds of thousands of pending legal cases, but asbestos is not banned in the US. This case demonstrated the importance of properly weighing risks against benefits. The need to independently measure and monitor risks outside of a vendor's promises was also shown.
I believe all of these cases can teach us something useful about digital security engineering. The main difference between the first four cases and the digital security world is the failure in the analog world is blatantly obvious. Digital failures can be far more subtle; it may take weeks or months (or years) for secuirty failures to be detected, unlike sink holes in parking lots. The fifth case, describing asbestos, is similar to digital security because harmful effects were not immediately apparent.
Comments
Interesting thought. Without going into any detail regarding how this would occur, I'd think that there would be much simpler network designs, less ad hoc additions to the network, etc.
H. Carvey
"Windows Forensics and Incident Recovery"
http://www.windows-ir.com
http://windowsir.blogspot.com
As always with incidents like these, though, I always caution that hindsight is 20/20, much like New Orleans of present and 9/11 of recent past.
I think the biggest problem in IT is managers and senior mgmt who do not properly understand or give proper respect to these issues. Instead, many of them want "insert latest buzzword here" nownownownow and at little cost. This leaves many competent admins (such as myself) with our hands tied and knowingly making inferior or incomplete efforts on projects, despite our high personal integrity of wanting to achieve high quality (which results in low morale).
-LonerVamp