Enterprise Data Centralization
I've written about thin client computing for several years. However, I haven't written about a natural complement to thin client computing -- enterprise data centralization. In this world, the thin client is merely a window to a centralized data store (implemented in accordance with business continuity practices such as redundancy). That vision can be implemented today, albeit only where low-latency, uninterrupted, reasonably high-bandwidth connectivity is available.
Thanks to EDD Blog I just read an article that makes me think legal forces will drive the adoption of this strategy: Opinion: Data Governance Will Eclipse CIO Role by Jay Cline. He writes:
In response to the new U.S. Federal Rules of Civil Procedure regarding legal discovery, for example, several general counsels have ordered the establishment of centralized "litigation servers" that store copies of all of the companies’ electronic files. They think this is the only way to preserve and cheaply produce evidence for pending or foreseeable litigation. It’s a very small leap of logic for them to propose that all of their companies’ data, not just copies, should be centralized...
Data must soon become centralized, its use must be strictly controlled within legal parameters, and information must drive the business model. Companies that don’t put a single, C-level person in charge of making this happen will face two brutal realities: lawsuits driving up costs and eroding trust in the company, and competitive upstarts stealing revenues through more nimble use of centralized information.
The rest of the article talks about the role of CIOs, CTOs, "chief information strategists," etc., but I don't care about that. I care about the data centralization aspect.
For me, data centralization will be a major theme in my new job. If only to meet e-discovery requirements, at the very least copies of all business information will need to be stored centrally. This strategy gives users of any computing platform the flexibility to create information locally, but that data will quickly find a second home (at the very least) in the central data store. Ideally, once bandwidth is ubiquitous, all business data will reside centrally, from creation to destruction (in accordance with data retention and data destruction policies). Furthermore, that data will be protected at the document level, not just at the application, OS, and platform level.
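As a sketch of what that "second home" step might look like, consider the following. This is a minimal, hypothetical illustration, not any particular product: the mount point, classification labels, and retention periods are all assumptions.

```python
"""Minimal sketch of checking a locally created document into a central
data store. Everything here is a hypothetical illustration: the mount
point, classification labels, and retention periods are assumptions."""
import hashlib
import json
import shutil
import time
from pathlib import Path

CENTRAL_ROOT = Path("/mnt/central-store")  # assumed mount of the central repository
RETENTION_DAYS = {"internal": 3 * 365, "confidential": 7 * 365}  # assumed policy

def check_in(local_file: Path, classification: str = "internal") -> Path:
    """Copy a local document to the central store, content-addressed,
    and record document-level metadata so protections travel with the
    document, not just with the application or platform."""
    data = local_file.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    dest = CENTRAL_ROOT / digest[:2] / digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(local_file, dest)

    meta = {
        "original_name": local_file.name,
        "sha256": digest,
        "classification": classification,
        "checked_in": int(time.time()),
        # destruction date supports the retention/destruction policies above
        "destroy_after": int(time.time()) + RETENTION_DAYS[classification] * 86400,
    }
    dest.with_name(digest + ".meta.json").write_text(json.dumps(meta, indent=2))
    return dest

if __name__ == "__main__":
    check_in(Path("quarterly_report.ods"), classification="confidential")
```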
This strategy addresses many problems very nicely.
- Ediscovery: With at least copies of all data stored centrally, all relevant data can be searched and produced.
- Business Continuity: If your computing platform is destroyed, all your data (or at least a copy) is stored elsewhere.
- Incident Recovery: As I said in my Five Thoughts on Incident Response:
Today, in 2007, I am still comfortable saying that existing hardware can usually be trusted, without evidence to the contrary, as a platform for reinstallation. This is one year after I saw John Heasman discuss PCI rootkits (.pdf)... John's talks indicate that the day is coming when even hardware that hosted a compromised OS will eventually not be trustworthy.
One day I will advise clients to treat an incident zone as if a total physical loss has occurred and new platforms have to be available for hosting a reinstallation. If you doubt me now, wait for the post in a few years where I link back to this point. In brief, treat an incident like a disaster, not a nuisance. Otherwise, you will be perpetually compromised.
With thin client computing and data centralization, incident recovery means discarding the old computing platform and starting with a fresh one.
- System Administration: We will avoid Marcus Ranum's "Infocalypse," preventing every man, woman, and child from becoming a Windows system administrator. Scarce IT staff will administer centralized systems, and end users will no longer have the power or the need to install software. The Personal Computer will be replaced by a window to the Business Computer, although the platform itself might be a consumer device like a smartphone. (Of course, PCs will remain an option outside business needs.)
- Information Lifecycle Management: ILM includes data classification, defense, retention, and destruction. With all data in a central location, it will be easier to classify it and apply classification-appropriate handling and defense tools and techniques; a minimal sketch of that mapping follows this list. It is important to remember that not all data has the same value, and trying to protect it all with the same tools and techniques is too costly. (Did you know the US Postal Service will carry up to Secret classified data within the US, provided it is wrapped appropriately and sent via registered mail? The idea is that the risk of interception is worth the savings over having a courier transport Secret material.)
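Here is the sketch mentioned above: a hypothetical mapping from classification label to handling controls. The labels and control values are illustrative assumptions, not a standard.

```python
# Hypothetical mapping from classification label to handling controls.
# The labels and control values are illustrative assumptions, not a standard.
from dataclasses import dataclass

@dataclass(frozen=True)
class HandlingPolicy:
    encrypt_at_rest: bool   # stronger controls cost more, so apply them selectively
    offsite_replicas: int
    retention_years: int

POLICIES = {
    "public":       HandlingPolicy(encrypt_at_rest=False, offsite_replicas=1, retention_years=1),
    "internal":     HandlingPolicy(encrypt_at_rest=True,  offsite_replicas=1, retention_years=3),
    "confidential": HandlingPolicy(encrypt_at_rest=True,  offsite_replicas=2, retention_years=7),
}

def policy_for(classification: str) -> HandlingPolicy:
    """Unknown labels fail closed: they receive the strictest policy."""
    return POLICIES.get(classification, POLICIES["confidential"])

print(policy_for("internal"))
```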
What data centralization and/or thin computing is your organization pursuing, and why?
Comments
What I find intriguing about this article is that the pendulum swing toward data centralization (data warehousing, BI/DI) and resource centralization (data center virtualization, WAN optimization/caching, thin client) seems to be on a direct collision course with the way applications and data are being distributed with Web 2.0/SOA and underpinnings such as AJAX...
How do you balance centralizing data when the infrastructure and information architectures are bound and determined to chew it up and spit it out willy-nilly?
Something doesn't compute here...
Blog entry should be finished soon discussing this...
/Hoff
http://rationalsecurity.typepad.com
I forgot to recognize the distinction in my last comment between my response to both the original article and your commentary.
The original article isn't focused on "data centralization" at all; it talks about a "...centralized function for data governance" -- and those are two completely different things.
My comments were directed at your article, not the original. Sorry for the confusion.
/Hoff
So, your central repository idea must not be used to force undesirable centralization on IT; instead, other strategies such as replication and/or metadata tagging/searching must be used.
The replication idea is a bad one, because you've then created the Mother of All Targets for an attacker - he gets into your big infosec data warehouse, and the game's over. Instead, the way to accomplish what you're talking about (along with a lot of other things which will benefit IT and the business) is to develop an information/data architecture which allows this 'centralization' to be virtual in nature.
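For what it's worth, a minimal sketch of that "virtual centralization" idea might look like the following: a central catalog holds only metadata (location, owner, classification) while the documents themselves stay distributed. All names and fields here are hypothetical.

```python
# Hypothetical sketch of "virtual" centralization: a central catalog of
# metadata (location, owner, classification) while the documents themselves
# remain distributed. Store names and fields are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    doc_id: str
    location: str        # URL or share path where the bytes actually live
    classification: str
    owner: str

@dataclass
class MetadataCatalog:
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.doc_id] = entry

    def find(self, classification: str) -> list[CatalogEntry]:
        """e-discovery-style query: locate every document with a given
        classification without moving a single byte to a central store."""
        return [e for e in self.entries.values()
                if e.classification == classification]

catalog = MetadataCatalog()
catalog.register(CatalogEntry("doc-1", "//nyc-file01/contracts/acme.odt",
                              "confidential", "legal"))
catalog.register(CatalogEntry("doc-2", "https://wiki.example.com/spec",
                              "internal", "engineering"))
print([e.location for e in catalog.find("confidential")])
```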
Well said. An organization I used to work for did some testing of thin client computing. While it does make the admin's life easier, along with an array of other benefits, people still like having their computer right there on their desktop.
They feel that this is 'their space' in the office, and that the computer is 'personal'. Taking the computer away and leaving them only a keyboard, mouse, and monitor was widely rejected.
So while I agree with you in theory, in the analysis line of work we did, thin client computing wasn't an option.
While I normally enjoy your POV on things, I have to fundamentally disagree on this one. Data, as well as the "data center," is becoming increasingly virtualized. For example, examine Google's data processing methodologies: there is no central repository. In fact, the data itself is nothing but shards stored across thousands of boxes.
While the user may see the data as being in a given place, it has nothing to do with where the data is or how it is stored.
Our classic constructs of security boundaries and enclaves fundamentally fall down when there is no longer an easily identifiable 'chokepoint' where overarching security policies (things like firewall rules, IDS monitoring, etc.) can be implemented.
While the concept of centralized data _management_ is a great and incredible thing, the idea of that data being centrally located in the near future is a farce.
Take any building in any city in the world. Connect it to the nearest data center, which should be a carrier-neutral facility, preferably an internet exchange, preferably SAS 70 Type II compliant, preferably dual-power-grid and "class 5".
The office should connect to the Internet using two disparate metro fiber providers running Optical Ethernet. Also connect to the Internet using a fixed wireless solution or two. Run BGP to the ISPs over those links, and run routing protocols over IPsec-protected GRE tunnels to the BGP-connected data center. Some cities may not have all of these options, but there may be similar ones, so get as close to this as possible.
Take all the servers out of your office. Take all the desktops out. Take all the CRTs out. Write them off as a loss. You'll have patch panels, switches, and a few routers (maybe router-firewalls, or just firewalls) as your entire infrastructure at the office. Thin clients and dual-headed LCDs should be on every user's desk.
Build a standard enterprise data center. Run LTSP version 5. Connect the office thin clients over the MAN and boot them from PXE. If Windows is required for certain users (legitimate uses I can think of: accounting with QuickBooks, CAD, graphic design with Adobe CS3 -- MS Office does *not* count, IE7 does *not* count), those users can connect to a 64-bit Windows Server 2003 Datacenter Edition SP2 cluster with NLB. Everyone else can use Firefox and OpenOffice under a clustered CentOS LTSP.
Users who require phone access will get a softphone and a computer headset that connects over the MAN to a centrally managed IP PBX cluster such as CCM or Asterisk. Some users may not require outside phone access, so they will be given internal extensions only. All digital circuits for voice will also be in the nearby data center... except a few red phones in the office, labeled for E911 use only, that connect to analog lines you don't even need to pay for service on.
Make some users mobile and give them Samsung Q1U UMPCs with Vista/BitLocker, SSL VPN, softphones and headsets, and an EVDO Rev A (or similar) connection.
My current company does some of the above (not all of it yet), but anywhere I worked, I would try to move toward a similar model.
You mean, like, mainframes?
:)