Don't Let Your Data Die!
Written by John J. Xenakis for CFO.com, Nov 08, 2000.
Why RAIDing your data is the best way to keep it safe.
If you're on the Internet, chances are you need a lot more disk space
to hold all those graphics, movies, sound clips, documents and
databases your Web site uses. All of those have to be stored on a disk
drive, and hopefully the drive won't crash 45 seconds after you run
that big TV ad that draws 1 million new visitors to your site.
High-end disk drives have been around for decades, but thanks to the
Internet, the demand for them has never been stronger. Moreover,
unlike their predecessors from 20 or 30 years ago, these disks aren't
dumb, magnetic file cabinets. The latest systems are powerful,
intelligent, complex and expensive.
That's what software development firm Raydium Inc. discovered when it
had its eye on a Symmetrix multi-terabyte disk storage system from EMC
Corp. The software company wanted to store massive amounts of data it
planned to collect about users' behavior on the Internet.
Problem was, it would have cost over $2 million up front, even though
the firm wouldn't need all that storage right away. What's more, the
system required sophisticated maintenance and upgrading. As a result,
the Chicago-based company decided to outsource not only its disk
storage but also the management of its entire Web site to a Rolling
Meadows, Ill., outsourcer, Telenisus Corp.
(http://www.telenisus.com), one of a new breed of "storage service
providers" (SSPs).
"We looked at several solutions, that would have taken multiple
partners to accomplish, and we felt that as a small startup, with all
our resources dedicated to our business, we didn't want to have to
focus our attention on a disk system and Web site," says Jordan Ho,
vice president of strategic alliances and operations. "They offered us
a great deal of freedom. One of the biggest reasons we went with them
is they provided us with a single point of contact, one stop shop, for
all our needs at the time."
The initial cost savings can be enormous, since you only have to pay
for storage that you're actually using, and administrative and support
costs can be shared with the outsourcer's other clients. For example,
Telenisus charges $5,000 per month for an entry-level system with 9
gigabytes of storage, including Web site hosting, security, and other
services.
The major SSPs are listed at the end of this article.
EMC Corp. vs. IBM Corp.
You might not expect a disk drive to have special functions, but Ho
says, "EMC's disks have a Business
Continuance Volume [BCV], an exact copy of the data or database, and
we need an exact copy. It allows us to manage our data in ways that
other storage solutions don't."
It turns out that this feature -- making an extra copy of a disk
volume -- is crucial. The copy is kept precisely in step with the
original as transactions are added during normal processing. The
customer can use the copy to do data analysis or application
development, without disturbing the online database or interrupting
the transaction processing.
Today, the two major competitors for high-end disk systems are EMC
Corp.'s Symmetrix systems and IBM Corp.'s Shark systems. Choosing
between these two competitors involves evaluating a mix of pricing,
features and performance issues. Anyone considering using a high-end,
mission-critical disk system should weigh all of these factors.
Before addressing the "business continuance volume" feature that sold
Jordan Ho on EMC, we have to begin with the most important feature of
all, reliability.
I know from widely published statistics that most of you reading this
column do not regularly make backup copies of your hard disks, even
though your hard disk is just a mechanical device that might simply
stop working at any time. This means that if you don't back up your
disk, then you could turn on your computer tomorrow morning and find
out that you've lost your hard disk, and along with it every memo,
letter, spreadsheet, and your entire customer database.
But if you make daily backups, then at least you can always go back to
the previous day's copy, and you'll have lost only one day's work.
Enough said.
However, backups aren't enough for mission-critical online databases,
where you can't afford to lose one minute's transactions, let alone an
entire day's.
That's why IBM, EMC and other manufacturers implement a technology
known as RAID -- Redundant Arrays of Inexpensive Disks. There are
several variations of this 10-year-old technology, labeled RAID 1
through RAID 5. What all of them have in common is that instead of
storing data on one very large disk drive, the data, as the name RAID
implies, is stored on an array of small, cheap disk drives in a
redundant fashion, with the idea that if one of the cheap drives is
lost, all the data on it is recoverable by examining the other drives
in the array.
EMC Corp. tends to favor RAID 1, which is the most straightforward of
the five variations. For example, in an array of eight disk drives,
RAID 1 splits them into pairs, so that the data on each drive in each
pair is duplicated on the other drive of the pair. If one drive is
lost, the data is still available on the other drive. This means that
out of the eight drives, four are used for data, and four are used for
copies.
RAID 1 has an obvious performance advantage for certain types of
applications, where overall performance depends on reading data
that's already been stored. The reason is that, with two copies of the
data available, the disk system can perform two separate read
operations simultaneously, one from each disk, meaning that disk read
performance is effectively doubled in those cases. Of course, disk
writes do not double in speed, since each data change must be written
to both disks simultaneously, to keep the disks synchronized.
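The mirroring idea can be illustrated with a minimal sketch. This is a toy model in Python that uses in-memory lists in place of disk drives; the class and method names are hypothetical and are not taken from any EMC product.

```python
class MirroredPair:
    """Toy RAID 1 pair: two in-memory 'drives' holding identical blocks."""

    def __init__(self, num_blocks):
        self.drives = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]
        self._next_reader = 0  # alternate reads between the two drives

    def write(self, block, data):
        # Every write goes to each surviving drive to keep the pair in sync.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block):
        # Either drive can serve a read; alternate between them to share load.
        for _ in range(2):
            i = self._next_reader
            self._next_reader = 1 - i
            if not self.failed[i]:
                return self.drives[i][block]
        raise IOError("both drives in the pair have failed")

    def fail(self, drive_index):
        # Simulate the loss of one drive.
        self.failed[drive_index] = True
```

Writing a record and then calling fail(0) still lets read() return the record from the second drive, which is the whole point of the pair. A real disk system would also rebuild a replacement drive, which this sketch leaves out.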
IBM Corp. tends to favor RAID 5, an algorithm so complex that over the
years some disk vendors have been unable to get it working.
If you have an array of eight disk drives, then RAID 5 writes each
stripe of data in seven pieces on seven of the eight drives. The
eighth piece is a "parity" block, computed from the other seven, and
the drive that holds it rotates from stripe to stripe. If one of the
eight drives fails, then data from the remaining seven drives (which
may or may not include the parity block) can be used, by means of a
parity calculation, to entirely recover the data on the failed drive.
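The article does not name the recovery algorithm. The sketch below assumes the XOR parity scheme commonly used for RAID 5, and it works on one stripe of byte strings rather than on real disk sectors. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

With seven data blocks, make_stripe() yields eight blocks, one per drive. Whichever single block is lost, data or parity, rebuild() recovers it by XORing the other seven, because the XOR of all eight blocks is zero. A production array would also rotate the parity position among the drives and handle partial-stripe writes, which is where much of the real complexity lies.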
The reason we've gone into these details is obvious: When you buy a
RAID 1 system, only 1/2 of the physical space is effectively
available; when you buy a RAID 5 system with eight drives, 7/8 of the
physical space is effectively available. That appears to be a huge
difference. But if
you recall that the "I" in "RAID" stands for "inexpensive," this may
or may not mean a significant cost difference between the two
systems.
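The arithmetic can be written out as a short sketch. The 36-gigabyte drive size used in the note that follows is an assumed figure for illustration only, the eight-drive RAID 5 array width follows the example in the text, and the function names are hypothetical.

```python
import math


def effective_gb(num_drives, drive_gb, scheme, raid5_width=8):
    """Usable space provided by a given number of physical drives."""
    if scheme == "raid1":
        # Half of the drives hold copies.
        return (num_drives // 2) * drive_gb
    if scheme == "raid5":
        # One drive's worth of space per array goes to parity.
        arrays = num_drives // raid5_width
        return arrays * (raid5_width - 1) * drive_gb
    raise ValueError("unknown scheme: " + scheme)


def drives_needed(target_gb, drive_gb, scheme, raid5_width=8):
    """Physical drives required to reach a target amount of usable space."""
    if scheme == "raid1":
        return 2 * math.ceil(target_gb / drive_gb)
    if scheme == "raid5":
        per_array = (raid5_width - 1) * drive_gb
        return raid5_width * math.ceil(target_gb / per_array)
    raise ValueError("unknown scheme: " + scheme)
```

Under these assumptions, eight 36-gigabyte drives give 144 usable gigabytes with RAID 1 and 252 with RAID 5. Going the other way, 252 usable gigabytes takes 14 drives with RAID 1 but only 8 with RAID 5.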
What about performance? The trick about overlapping reads on a RAID 1
system described above also works on RAID 5, but in a different way:
multiple read operations can still be done in parallel, but the
algorithm is much more complex, and we won't attempt to describe it
here. In fact, RAID 5 may permit a little parallelism on write
operations as well as read, improving RAID 5 performance.
On the other hand, RAID 1 needs more drives to provide the same
effective amount of disk space, and so a RAID 1 system may perform
better simply because the same amount of data is spread over a larger
disk array.
The moral? Before you spend several million dollars on a disk system,
be sure you benchmark all the systems to determine how performance
will be with your mix of applications. And don't confuse physical disk
space with effective disk space.
BCVs versus FlashCopies
The purpose of EMC Corp.'s Business Continuance Volume (BCV)
capability, as we said, is to give you an exact copy of your disk
volume at any particular instant. This feature of EMC's system
appealed to Raydium's Ho.
Recall that EMC's RAID 1 implementations work by creating and
maintaining two disk volumes so that they're constantly identical. The
BCV is a third volume, another copy that's created when you ask the
EMC disk system to maintain a third volume identical to the other two.
The system does this, and at any time you can "break off" the third
volume, and use it for data analysis or software development or
whatever you want.
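The "third volume" idea can be modeled with a minimal sketch. It is a toy in Python that uses lists in place of disk volumes. The names attach_third_copy and break_off are hypothetical and do not represent EMC's actual interface.

```python
class MirroredVolume:
    """Toy model of a mirrored volume with an optional third copy."""

    def __init__(self, num_blocks):
        self.mirrors = [[None] * num_blocks for _ in range(2)]
        self.third = None  # no third copy until one is requested

    def attach_third_copy(self):
        # Bring the third copy up to date; from now on writes reach it too.
        self.third = list(self.mirrors[0])

    def write(self, block, data):
        for mirror in self.mirrors:
            mirror[block] = data
        if self.third is not None:
            self.third[block] = data

    def break_off(self):
        # Detach the third copy; later writes no longer touch it.
        if self.third is None:
            raise RuntimeError("no third copy is attached")
        frozen, self.third = self.third, None
        return frozen
```

After break_off(), the returned copy stays exactly as the volume was at that instant, so analysis or development work can use it while transactions continue against the two live mirrors.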
Does IBM's Shark system have this feature? It turns out that Shark
does have a similar feature, but implemented differently.
IBM's feature, called FlashCopy, works as follows. Shark doesn't
actually create a copy until you request it. When you ask for a copy
of the volume containing your database, Shark appears to create such a
copy instantly, and uses some black magic and trickery to create that
appearance.
When you request your FlashCopy, Shark begins making the copy -- which
might actually take some time to complete. In the meantime, if your
data analysis software actually tries to use data from the new copy --
data that hasn't yet been copied from the original to the new volume
-- the Shark system tricks you by supplying data from the original
drive. Therefore, it looks to your application software as though you
have a precise copy of your data at the exact instant you requested --
the same as in the case of EMC's business continuance volumes.
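The trick of appearing to copy instantly can be sketched as follows. This is a toy model in Python, not IBM's implementation, and the class and method names are hypothetical. The article does not say what happens when the original is changed before a block has been copied. The sketch assumes the old contents are copied across first, so that the copy stays fixed at the instant it was requested.

```python
class InstantCopy:
    """Toy model of a copy that appears complete the moment it is requested."""

    def __init__(self, source):
        self.source = source                  # the live volume (list of blocks)
        self.target = [None] * len(source)    # the new copy, filled in over time
        self.copied = [False] * len(source)   # which blocks have been copied

    def _copy_block(self, block):
        if not self.copied[block]:
            self.target[block] = self.source[block]
            self.copied[block] = True

    def background_step(self):
        """Copy the next uncopied block; return False once all are copied."""
        for block, done in enumerate(self.copied):
            if not done:
                self._copy_block(block)
                return True
        return False

    def read_copy(self, block):
        # Blocks not yet copied are supplied from the original volume.
        if self.copied[block]:
            return self.target[block]
        return self.source[block]

    def write_source(self, block, data):
        # Assumption, not from the article: save the old block before
        # overwriting it, so the copy keeps its point-in-time contents.
        self._copy_block(block)
        self.source[block] = data
```

A reader of the copy sees the same data whether or not the background copy has reached a given block, which is why the copy looks finished from the moment it is requested.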
Both of these algorithms have their advantages and disadvantages in
terms of flexibility, pricing and performance, and your choice will
ultimately be determined by your company's needs.
The Major Storage Service Providers
If you want to outsource your disk storage, you have a number of
choices, since different outsourcers perform different services. Some
provide bare support, doing backups and maintenance as required but no
more. Others host your Web site, provide for firewalls and security,
and perform all sorts of related services.
Prices are a bit high right now, according to William Hurley, analyst
at the Boston-based Yankee Group. "One of the major drivers of
outsourcing is that we're seeing a rate of 100% growth, an explosion
in demand for disk capacity, because of the Internet," says Hurley.
"There's a wide variety of pricing schemes in this space, partially
because it's very new, and right now prices tend to be a little high.
But I don't think that price points will bear the pressure, and I
expect them to come down because of competition and technology."
According to Hurley, Telenisus (http://www.telenisus.com) provides
off-site and on-site disk storage outsourcing management, virtual
private networks (VPNs), security and Web site hosting.
Hurley gives the following as the major established SSPs:
StorageNetworks, Inc. (http://www.storagenetworks.com) provides
for primary data storage, tape backup and restore, and high
availability disaster recovery.
StorageWay (http://www.storageway.com) specializes in providing
for storage needs of companies with a high Internet presence.
Arsenal Digital Solutions (http://www.arsenaldigital.com) provides
off-site outsourcing for disk management.
Managed Storage International (http://www.managedstorage.com)
provides disk storage on demand, server backup and management of Web
site content.
Zantaz (http://www.zantaz.com) archives high-volume Internet-based
e-mail, documents and transactions, meeting SEC and IRS regulatory
requirements for archiving.
Scale Eight (http://www.s8.com) provides storage on demand with a
focus on rich media and content delivery and distribution.
(This is a modified version of an article that originally appeared on
CFO.com on Nov 08, 2000.)