DATA CENTERS
Automatic failover, back-up sites, outsourcing among the strategies for ensuring data availability

September 10, 2005

After the 9/11 terrorist attacks, Arch Chemicals upgraded its security—especially IT security.

"We have an initiative that includes how to secure our perimeters, how to secure ourselves from internal attacks, and how to recover if something bad happens," says Al Schmidt, CIO of the Norwalk, Conn.-based global specialty chemicals manufacturer, which has annual sales in excess of $1 billion. "Part of that initiative involved making sure that every department in the company had a plan for recovery, and that all of our plans were current and under tight change control."

While other manufacturers may not be as interested in security in general as Arch, many do share an emphasis on ensuring high availability of mission-critical data. The need to access data at all times to ensure business continuity—despite technical problems or forces of nature—is growing for manufacturers of all sizes, in all vertical industries.

"The rise of globalization and initiatives such as lean manufacturing and just-in-time inventory make it important to be able to exchange data with companies around the world 24/7," says Todd Taylor, who leads the Technology and Infrastructure Solutions team for Hewlett-Packard's (HP) Manufacturing & Distribution Industries unit. In short, today's business environment requires companies to conduct business at all times without interruption, he says.

"As manufacturers work to become flexible enough to adapt to constantly changing business pressures, they realize that the changing communications infrastructure drives a need for information and system availability," adds Taylor. "A lot of work is being done to make IT a line of business that supports operations."

Consider, for example, the automotive industry, where Taylor says IT must support supply chain operations and communications because companies simply can't afford production or supply chain delays caused by lack of access to mission-critical data.

Focus on availability
To ensure business continuity, organizations must be prepared for two types of IT interruptions, says Vick Vaishnavi, director of product marketing at BladeLogic, which offers data center automation software. The first involves individual servers, and the other involves the entire site at which those servers are located.

At the server level, interruptions can stem from things such as server overload at peak usage times, server configuration errors, operator errors, or hardware failures. The solution is to have back-up servers geared to take over as necessary. The key to this scenario—often called a fault-tolerance or failover strategy—is to ensure that back-up servers are configured to completely and accurately reflect the state of the primary servers they will protect.

"If it's done correctly, failover usually appears seamless to an end user—and business continuity is ensured," Vaishnavi says. "One or more end users connected to the primary server may notice a momentary glitch, but business processes will continue. IT organizations must ensure, though, that the back-up servers are always in sync with the primary ones in terms of configuration."
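To make the "in sync" requirement concrete, here is a minimal sketch of a configuration drift check between a primary server and its back-up. It is not BladeLogic's method or any vendor's API. The function name `config_drift` and the setting names are hypothetical, and it assumes each server's configuration can be exported as a flat set of name/value pairs.

```python
def config_drift(primary: dict, backup: dict) -> dict:
    """Return settings whose values differ between primary and back-up.

    Each entry maps a setting name to a (primary_value, backup_value) pair.
    A setting missing on one side is reported as None on that side.
    """
    drift = {}
    for key in sorted(primary.keys() | backup.keys()):
        p, b = primary.get(key), backup.get(key)
        if p != b:
            drift[key] = (p, b)
    return drift


# Hypothetical configuration snapshots, for illustration only.
primary = {"os_patch": "2005-08", "db_version": "9.2", "max_connections": 500}
backup = {"os_patch": "2005-06", "db_version": "9.2"}

for name, (p, b) in config_drift(primary, backup).items():
    print(f"{name}: primary={p!r} backup={b!r}")
```

An empty result means the back-up mirrors the primary for the settings that were compared. Any entry is a difference to resolve before the back-up can be trusted to take over.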

Physical site interruptions happen when the entire physical location of a data center is affected. Potential causes range from acts of God to terrorist activity or serious hacking. The result is that the entire set of servers at the primary location is knocked off-line, which could bring business to a halt.

"To prevent this type of interruption, IT organizations often rely on back-up physical sites," says Vaishnavi. "These remote data centers usually are located a few hundred miles away from the primary site, and are configured to kick in when an entire physical site suffers an interruption."
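The two levels of protection Vaishnavi describes, back-up servers and back-up sites, can be pictured as a simple ordered search. The sketch below is illustrative only. The function `pick_active`, the data layout, and the server names are assumptions, and real failover products also have to detect failures and redirect traffic, which this ignores.

```python
from typing import Optional


def pick_active(sites: list) -> Optional[str]:
    """Return the name of the first healthy server, trying sites in priority order.

    Within a site, servers are listed primary first, then back-ups. If a whole
    site is off-line, all of its servers are skipped and the next site is tried.
    """
    for site in sites:
        if not site["online"]:
            continue  # site-level interruption: skip every server at this site
        for server in site["servers"]:
            if server["healthy"]:
                return server["name"]
    return None  # nothing available anywhere


# Hypothetical inventory: one primary site and one remote back-up site.
sites = [
    {"name": "primary-site", "online": True, "servers": [
        {"name": "app-1", "healthy": False},          # primary server has failed
        {"name": "app-1-standby", "healthy": True},   # local back-up server
    ]},
    {"name": "remote-site", "online": True, "servers": [
        {"name": "app-1-remote", "healthy": True},
    ]},
]

print(pick_active(sites))    # server-level failover: app-1-standby
sites[0]["online"] = False   # simulate losing the whole primary site
print(pick_active(sites))    # site-level failover: app-1-remote
```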

Do it yourself, or partner
HP and BladeLogic—as well as PolyServe and Stratus Technologies—can provide the tools, solutions, and strategies to ensure high data-center availability. However, some users follow a different model—choosing instead to outsource some of the work to mitigate risk.

"Ensuring the appropriate level of information availability involves aligning the right sourcing model (using a third-party provider versus in-house efforts) with the degree of business risk a company is willing to accept," says David Tapper, a program manager with IDC, an IT research firm based in Framingham, Mass. "The choices are for companies to assume full management of information availability themselves, follow a fully outsourced model, or fall somewhere between the two. Companies may reduce risk levels by retaining some level of control over information availability while allowing a third party to help with the aspects that may be better performed by a provider."

An IDC white paper published last spring, "Ensuring Information Availability: Aligning Customer Needs With An Optimal Investment Strategy," indicates there are significant potential financial and performance benefits for companies that allow a third party to handle some or all of their information-availability services. For instance, the paper reports, using a third party can result in cost savings of more than 30 percent, and companies that don't outsource have recovery times that are 77 percent longer than those of their outsourcing counterparts.

The point is, outsourcing may make sense for any number of reasons, says Jim Grogan, a VP with SunGard Availability Services, an information availability services provider. For instance, company management may have decided that ensuring data availability requires too much technical know-how, or that ongoing maintenance costs have grown too high, he says.

"It's common for a company to want to focus on its core competencies, and outsource to eliminate staff as well as facilities," Grogan says. "That's particularly true for companies concerned with failover strategies that require separate facilities. It doesn't do any good to have two machines running side by side if there is a power outage, fire, or earthquake."

It's important to realize that at its core, this isn't a technical issue, but a business question, Grogan says. The key is determining the most cost-effective method of creating a reliable, real-time IT environment that supports customer service and streamlines manufacturing and supply chain operations.

Arch Chemicals answered that question after performing a formal risk analysis to find out what would happen if its systems went down, Schmidt says. The company discovered that it needed its systems to be back up within 48 hours. However, due to the size of its IT infrastructure, Arch Chemicals would have a hard time meeting that target.

"We brought in a number of vendors and asked what it would take for us to reach a 48-hour recovery window," Schmidt says. "Some suggested an automatic failover, but that wasn't within our budget. Other approaches included archiving the data. We were looking for something in between—something that would get us back up in 24 to 48 hours, and be within our budget."

Working with SunGard and other vendors, Arch has cut restore time by more than half while staying in budget, says Schmidt. "Previously, it took well over 48 hours to restore our entire database, but this effort brought restore time down to between 18 and 20 hours."
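The figures Schmidt cites can be checked with simple arithmetic. The sketch below assumes a starting restore time of exactly 48 hours. Because the article says restores took "well over 48 hours," the percentages it prints are lower bounds on the actual improvement. The function name `reduction_pct` is illustrative.

```python
def reduction_pct(before_hours: float, after_hours: float) -> float:
    """Percentage by which restore time shrank."""
    return (before_hours - after_hours) / before_hours * 100


TARGET_HOURS = 48  # recovery window Arch set after its risk analysis
BEFORE_HOURS = 48  # assumption: the article says "well over 48", so a lower bound

for after in (18, 20):
    pct = reduction_pct(BEFORE_HOURS, after)
    meets = after <= TARGET_HOURS
    print(f"{after} h restore: at least {pct:.1f}% shorter, within target: {meets}")
```

Even under this conservative assumption, both ends of the 18-to-20-hour range fall inside the 48-hour window and represent a reduction of more than half.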

—Jim Fulcher, contributing editor