Data Center Vulnerabilities: Why Modern Data Centers Are Failing
The year 2008 began with a dire prediction from Subodh Bapat, vice president in the eco-computing team at Sun Microsystems, who declared, “You’ll see a massive failure every year.” He went on to say, “We are likely to see a data center failure of that scale,” referring to the worm that took down 5% of worldwide UNIX boxes in 1988.¹ This time, however, he attributed the cause not to security lapses but to the enormous computing power required to run today’s applications.

Though his is certainly a bold position, the past year has seen a rash of data center failures that call into question how reliable a single data center is for the delivery of mission-critical applications. Vulnerabilities ranging from the most common, such as natural disasters and infrastructure failures (data center power outages, burst pipes, construction work damaging fibre lines), to hardware failures, storage or database failures and everyday software problems, have been causing regular disruptions to businesses, and at a high price.

Recent events in the news show that even with good planning, resourcing and design, some of the most sophisticated facilities can experience catastrophic failure. Last summer, the state-of-the-art 365 Main data center in San Francisco, built for more than $125 million, was offline for hours after a Pacific Gas & Electric power grid outage put a substantial part of San Francisco in darkness. The facility’s backup generators then failed as well and had to be started manually.²

“When researching data centers, new facilities often boast N+2 levels of redundancy,” says Roger Smith, V.P. of Operations at Ceryx Inc.
“However, as these same facilities fill up and age, that often becomes N+1, or in some areas no redundancy at all.” According to Sun Microsystems executives, the average life expectancy of a data center is about 10 to 12 years, and many data centers built at the start of the dot-com era now need to be rebuilt.

“As the person who is responsible for uptime, I have to balance which applications are deemed critical by upper management and clearly communicate the cost and investment required to provide high availability,” says Roger Smith. “When you present the facts, it becomes clear to everyone that an in-house data center couldn’t possibly provide the levels of redundancy required, and in many cases we need redundancy even at the co-location level.”

In many cases, no contingency plan can avoid the problems that plague individual data centers. On July 14th of this year, the Peer 1 data center in downtown Vancouver, one of the largest facilities in Canada, was offline for almost a full day. An underground fire caused massive power outages throughout downtown Vancouver. While the backup generators at Peer 1 started without issue, the water-based cooling system failed when firefighters, in their attempt to douse the fire, depleted the water pressure needed to keep the cooling systems operational. The backup generators then overheated, leaving only UPS systems with a short battery life to fail over to.³

In a similar event earlier this year, The Planet, a prominent hosting provider in Houston, suffered a major explosion in its data center that took more than 9,000 customer servers offline for days.⁴ ⁵ The backup generators worked perfectly, but the fire department would not allow the facility to restore power until it had been deemed safe. In some cases servers were physically moved to another facility.
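Smith’s observation that an N+2 facility can erode to N+1, or to no redundancy at all, is at bottom capacity arithmetic: N is the number of units (generators, UPS modules, chillers) needed to carry the current load, and the “+k” is whatever is installed beyond that. The following is only an illustrative sketch; the function names, the six 500 kW generators and the load figures are hypothetical and are not taken from any facility mentioned in this article.

```python
import math


def units_required(load_kw: float, unit_capacity_kw: float) -> int:
    """N: the smallest number of units that can carry the load on their own."""
    return math.ceil(load_kw / unit_capacity_kw)


def redundancy_label(load_kw: float, unit_capacity_kw: float, installed_units: int) -> str:
    """Describe installed capacity as N+k for the current load (hypothetical helper)."""
    n = units_required(load_kw, unit_capacity_kw)
    spare = installed_units - n
    if spare < 0:
        return "insufficient capacity"
    if spare == 0:
        return "N (no redundancy)"
    return f"N+{spare}"


if __name__ == "__main__":
    # Hypothetical facility: six 500 kW generators while the IT load grows as racks fill up.
    for load_kw in (2000, 2400, 2800):
        print(f"{load_kw} kW -> {redundancy_label(load_kw, 500, 6)}")
    # prints:
    # 2000 kW -> N+2
    # 2400 kW -> N+1
    # 2800 kW -> N (no redundancy)
```

Nothing about the installed equipment changes between the three rows; only the load does, which is how a facility that opened at N+2 can quietly lose its safety margin as it fills.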
In the wake of the explosion, The Planet was applauded for its response to the crisis: it allocated every resource it could to fixing the problem, proactively communicated status reports and issued SLA credits. Google, whose enterprise Apps customers experienced multiple outages on August 6th, 11th and 15th of this year, took a more reactive stance, promising to create a communication dashboard and issuing a blanket credit to all customers, regardless of whether they had been affected by the outages.⁶

The important question remains: what does data center failure, and the resulting downtime, cost organizations? Is that cost covered by SLA credits? Most SLA credits reflect the price of the service rendered and rarely cover business losses.

At the Continuity Insights Management Conference in 2006, Agility Recovery Solutions stated that 78% of companies that suffer a disaster without a contingency plan are out of business within 24 months, and that 90% of companies unable to resume business operations within 5 days of a disaster are bankrupt within one year.

Clearly some applications are considered more critical, and have more visibility, than others. Large companies feel the impact immediately when their ERP, CRM (Salesforce.com is still haunted by a prime-time outage more than two years ago caused by a failure in an Oracle database cluster⁹), Business Intelligence or e-mail systems become unavailable. However, with the proliferation of mobile devices and ‘anywhere access’, e-mail is clearly the premier mission-critical application today. Systems like Lotus Notes® and Microsoft® Exchange maintain a living record of a company’s existence, storing every activity, process and thought an organization and its employees have.
It’s no surprise that public companies are now required to maintain a record of e-mail activity for compliance purposes.

Even though the vast majority of businesses use e-mail every day to send contracts, proposals, quotes and most of their correspondence, most e-mail systems have not yet reached the level of reliability that telephone service provides (99.999%, or about 5.26 minutes of downtime per year). According to Osterman Research, most U.S. businesses experience one or more e-mail outages every month, and many indicate that they could lose more than $100,000 as a result of a single major e-mail outage.¹⁰ Osterman also found that the average business experiences nearly seven hours of e-mail downtime annually, and that outages can bring many workers to a virtual standstill: on average, workers are 25% less productive during e-mail downtime.

“Forget the fact that my billing rate gets impacted if I can’t access my e-mail system,” says an attorney at a major U.S. law firm who prefers to remain anonymous. “My company’s image gets tarnished immeasurably when I am working on a multi-million dollar, highly confidential deal and I have to send documents from my Hotmail account because my e-mail system is down. Somebody gets fired for that.”

Michael Osterman goes on to say, “Organizations are not meeting their targets for messaging system availability,” and adds that the average e-mail system experiences about 70 minutes of downtime in a typical month, which translates to roughly 99.84% uptime. To this he poses the question, “Is this good enough?”¹¹

Ceryx Inc., a hosted Microsoft Exchange provider with data center facilities in Canada and the United States, doesn’t think so. The company was the first in the market to offer a true 100% SLA, based on its multi-data center architecture and software design. Customers’ data is replicated in real time and resides in both data centers, more than 500 miles apart, so that even in the event of a catastrophic failure the primary system fails over with almost no impact on the end user.

“We operate on the premise that even the most reliable data center can and will experience failure due to circumstances beyond anyone’s control,” says Dr. David Penny, CIO at Ceryx. “We focus our R&D on keeping the application highly available and use our replication technology to mitigate the vulnerabilities that exist at the data center level. And we make the operating and capital investments necessary to execute on that every day.”

For the past four years, Penny and his team have worked with enterprise messaging systems such as Lotus Notes and Microsoft Exchange, developing technology to provide high availability. Since 2004 they have been offering a geo-replicated Microsoft® Exchange 2003 service to medium and large-sized companies that see the cost advantages and benefits of the Ceryx solution.

Most recently, Dr. Penny and his team have been working with geographic clustering in Windows Server 2008 and native Microsoft Exchange 2007 CCR (Cluster Continuous Replication) technology, which allows clustering over a wide area network. Traditional clusters, which depend on a single shared RAID system in order to function, are prone to logical corruption and to certain physical corruptions that can propagate across an entire RAID array and cause complete failure. Geo-clustering removes the dependence of redundant servers on a single set of disks, eliminating a very common single point of failure.

“Even with WAN replication we need to ensure that the corruption itself isn’t replicated,” says Dr. Penny.
For this, they use log shipping with delayed application rather than block-level replication, which avoids replicating corruptions caused by application defects. By closely monitoring performance on the primary system, they are able to stop bad changes from being committed to the secondary system.

Beyond the physical vulnerabilities of a single data center, Ceryx is protected against many other vulnerabilities that anyone relying on a single data center is exposed to. “When negotiating our contract, our provider knows how easy it is for us to move facilities,” says Roger Smith. “The data is already replicated and we don’t need to physically migrate servers. Migration to a different facility can happen without any impact on customers. We can’t be held hostage by a bad contract, radical price increases or continued poor performance.”

Ceryx also has a great deal of flexibility where routing is concerned: should a backbone be down or congested, Ceryx, with front-end servers operating at both facilities, can route traffic through the other facility and bypass the network congestion that can plague operators running out of a single data center.

While a number of solutions already on the market offer continuity via an interim e-mail system in the event of downtime, the Ceryx solution is different because it does not require users to change any settings when the e-mail system fails over to the secondary facility. Moreover, e-mail history, sent items and calendar entries all remain intact. In this regard the Ceryx solution is not merely a continuity solution but a high-availability solution that provides layers of redundancy, from the software level up to the facility level.
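The log shipping with delayed application mentioned above can be illustrated with a toy model: transaction logs are copied to the secondary as soon as they are written, but each one is applied only after a fixed delay, and monitoring on the primary can pull a suspect log (and every log after it) out of the queue before that delay expires. This is a sketch of the general technique under those assumptions, not Ceryx’s implementation or Exchange’s replication engine; the class names, the 600-second delay and the dictionary standing in for a mailbox database are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogFile:
    seq: int           # position in the primary's log stream
    shipped_at: float  # time (seconds) the file arrived at the secondary
    changes: dict      # hypothetical stand-in for the database changes the log records


@dataclass
class DelayedReplica:
    replay_delay: float                          # assumed lag before a shipped log is applied
    pending: deque = field(default_factory=deque)
    quarantined: list = field(default_factory=list)
    data: dict = field(default_factory=dict)     # the secondary's copy of the data
    applied_seq: int = 0
    bad_from: Optional[int] = None               # first sequence number flagged as bad

    def receive(self, log: LogFile) -> None:
        # Logs are copied off the primary immediately, so a site failure loses little;
        # anything at or after a flagged log is held back instead of queued for replay.
        if self.bad_from is not None and log.seq >= self.bad_from:
            self.quarantined.append(log)
        else:
            self.pending.append(log)

    def flag_bad(self, seq: int) -> None:
        # Called by monitoring on the primary: pull the suspect log and every later one
        # out of the replay queue before their delay expires.
        self.bad_from = seq if self.bad_from is None else min(self.bad_from, seq)
        keep = deque()
        for log in self.pending:
            if log.seq >= self.bad_from:
                self.quarantined.append(log)
            else:
                keep.append(log)
        self.pending = keep

    def replay(self, now: float) -> None:
        # Apply logs strictly in order, and only once they are older than the delay.
        while self.pending and now - self.pending[0].shipped_at >= self.replay_delay:
            log = self.pending.popleft()
            self.data.update(log.changes)
            self.applied_seq = log.seq


replica = DelayedReplica(replay_delay=600)
replica.receive(LogFile(1, 0, {"inbox/alice": 10}))
replica.receive(LogFile(2, 100, {"inbox/bob": 4}))
replica.receive(LogFile(3, 200, {"inbox/alice": -1}))  # an application defect writes a bad value
replica.replay(now=650)   # only log 1 has aged past the delay
replica.flag_bad(3)       # monitoring catches log 3 before it is applied
replica.receive(LogFile(4, 300, {"inbox/carol": 7}))  # later logs are held back as well
replica.replay(now=1000)
print(replica.data, replica.applied_seq)  # {'inbox/alice': 10, 'inbox/bob': 4} 2
```

The delay is the design trade-off: a longer delay gives monitoring more time to catch a bad change, but the secondary falls further behind, so more queued logs must be replayed before it can take over. Block-level replication has no such window, which is why it copies corruption to the secondary as faithfully as it copies good data.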
Hosted archiving solutions, an excellent insurance policy for any business facing regulatory and legal compliance requirements, offer a layer of assurance and access to e-mail records if the primary facility suffers a complete failure. However, these solutions do not provide business continuity or availability.

Moreover, if the primary e-mail provider experiences a failure due to data corruption, the data being archived can be corrupt as well. Large data stores, even at the mailbox level, lead to corruption, and the current trend of hosted Exchange vendors selling e-mail accounts with massive storage allowances increases the likelihood of data corruption and subsequent failure. A good archiving strategy can keep mailbox sizes manageable and thereby limit the probability of corruption.

So while archiving with an external hosted facility is extremely useful in today’s world of mission-critical e-mail, it must not be mistaken for a multi-data center strategy. Archiving is a good backup plan, but it won’t give businesses the protection they need from the inevitable vulnerabilities of a single-data center strategy. These vulnerabilities are generally covered in the fine print of a facility’s SLA under the term “Force Majeure”, a phrase often translated as “Act of God” or, literally from the French, “Superior Force”. It is included as a clause to excuse interruptions in service caused by extraordinary circumstances beyond the control of the provider.
As the past year has demonstrated, such circumstances have become far more common.

Michael Osterman concludes, in his presentation on the importance of e-mail continuity, that the only solution to the inevitable problems that plague mission-critical service delivery is a geo-replicated, multi-data center solution like the one offered by Ceryx.

Footnotes:
¹ CNET News: http://news.cnet.com/8301-10784_3-9828570-7.html
² Data Center Knowledge: http://www.datacenterknowledge.com/archives/2007/07/24/generator-failures-caused-365-main-outage
³ Data Center Knowledge: http://www.datacenterknowledge.com/archives/2008/07/15/vancouver-power-outage-kos-plenty-of-fish/
⁴ Data Center Knowledge: http://www.datacenterknowledge.com/archives/2008/06/01/explosion-at-the-planet-causes-major-outage/
⁵ CenterNetworks: http://www.centernetworks.com/the-planet-data-center-fire
⁶ CIO Weblog: http://www.cio-weblog.com/50226711/google_manning_up_for_august_outages.php
⁷ London Chamber of Commerce study, 2006
⁸ The Value of Messaging in the Enterprise: A Survey of Email Application Continuity, Applicationcontinuity.org, 2006
⁹ CIO Weblog: http://www.cio-weblog.com/50226711/salesforcecom_outage_root_cause_oracle.php
¹⁰ BNET Business Network: http://findarticles.com/p/articles/mi_m4PRN/is_2008_July_8/ai_n27893385
¹¹ Appcon 2007: Application Continuity Conference, “The Importance of Email Continuity” (webinar at http://www.teneros.com/infocenter/)