Storage ought to be one of those things that IT
managers can deal with out of their back pockets. Need
more? Just buy more disks.
Are there trusty rules of thumb for this? Not on your
life. It's a complex equation that drives demand for
data storage in enterprises, and all the old rules of
thumb are changing.
I learned about this from a white paper, "Rules of
Thumb in Data Engineering," written by Jim Gray and
Prashant Shenoy of Microsoft and presented at last
spring's IEEE International Conference on Data
Engineering. (You can read the paper at http://research.microsoft.com/~gray.)
Some key points:
• Because disk capacity is growing faster than disk access speed, the system "cost" of each access is rising. Data engineers are working on schemes that access disks sequentially, like tape drives, rather than randomly, to keep the cost of these "fetches" down.
• Tape is being relegated to archival use because it can take days to reload a multiterabyte data set from tape. Automated tape libraries help, turning off-line storage into near-line storage, but disk drives are now almost as economical. Many companies now keep an entire duplicate set of disk systems at remote locations as backups.
• RAM costs are falling faster than the costs of magnetic storage. A megabyte of RAM used to cost 10 times as much as a megabyte of disk and 1,000 times as much as a megabyte of tape. Now, 1MB of RAM costs only three times as much as 1MB of disk and 10 times as much as 1MB of tape. So when in doubt, put it in RAM (a rough sketch of the arithmetic follows this list).
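Gray's rule-of-thumb style turns price ratios like these into break-even intervals: how soon must data be re-referenced before it is cheaper to hold it in RAM than to fetch it from disk again? Here is a minimal sketch of that arithmetic; every price and rate in it is an assumed, illustrative figure, not a number taken from the column or the paper.

```python
# A minimal sketch of the break-even reasoning behind "when in doubt,
# put it in RAM." All prices and rates are assumed, illustrative values.

def break_even_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                       price_per_disk, price_per_mb_ram):
    """Seconds between re-references at which keeping a page in RAM
    costs the same as fetching it from disk every time it is needed."""
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * \
           (price_per_disk / price_per_mb_ram)

if __name__ == "__main__":
    interval = break_even_seconds(
        pages_per_mb_ram=128,          # 8KB pages per MB of RAM (assumed)
        accesses_per_sec_per_disk=80,  # random I/Os per second per drive (assumed)
        price_per_disk=1000.0,         # dollars per disk drive (assumed)
        price_per_mb_ram=1.0,          # dollars per MB of RAM (assumed)
    )
    print(f"Cache pages re-referenced within ~{interval / 60:.0f} minutes")
```

With these made-up numbers, anything touched again within roughly half an hour belongs in memory; as RAM keeps getting cheaper relative to disk, that window only widens.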
To net it out, processor speeds are improving faster than main memory speeds, which in turn are improving faster than magnetic media access times. More of the information on disk must be cached so that the disk itself can be read sequentially, and the caches themselves must get bigger in order to keep main memory full.
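To see why sequential access matters so much, a quick back-of-the-envelope comparison helps. The seek time, transfer rate, and page size below are assumed, illustrative figures, not numbers from the paper.

```python
# Why engineers want disks read sequentially, "like tape drives."
# All figures are assumed, illustrative values.

SEEK_PLUS_ROTATE_MS = 10.0   # average positioning time per random I/O (assumed)
TRANSFER_MB_PER_SEC = 25.0   # sustained sequential transfer rate (assumed)
PAGE_KB = 8.0                # size of one random fetch (assumed)

# Random access: every page pays the full positioning cost.
random_ios_per_sec = 1000.0 / SEEK_PLUS_ROTATE_MS
random_mb_per_sec = random_ios_per_sec * PAGE_KB / 1024.0

print(f"Random 8KB reads: ~{random_mb_per_sec:.2f} MB/s")
print(f"Sequential scan:  ~{TRANSFER_MB_PER_SEC:.2f} MB/s")
print(f"Sequential is ~{TRANSFER_MB_PER_SEC / random_mb_per_sec:.0f}x faster")
```

Even with generous assumptions for the random case, the sequential scan wins by more than an order of magnitude, which is exactly why bigger caches and sequential layouts go hand in hand.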
Put anything over a network, and the storage equation grows even more complex. The overhead of sending a message across a wide-area network is so much greater than that of sending a message from a computer to its disk drive that, according to Gray and Shenoy, it pays to cache any Web page that will ever be called up again.
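The same break-even logic applies to Web caching: if keeping a copy of a page on local disk for a year costs less than pulling it across the WAN even once, caching wins. A rough sketch of that comparison, with made-up cost figures that are assumptions rather than numbers from the paper:

```python
# A rough sketch of the argument for caching Web pages.
# All cost figures are assumed, illustrative values.

PAGE_KB = 10.0

# Cost to fetch the page again across a wide-area network.
WAN_DOLLARS_PER_MB = 0.10          # assumed WAN transfer cost
refetch_cost = WAN_DOLLARS_PER_MB * PAGE_KB / 1024.0

# Cost to keep the page on local disk for a year.
DISK_DOLLARS_PER_MB_YEAR = 0.01    # assumed amortized disk + management cost
cache_cost = DISK_DOLLARS_PER_MB_YEAR * PAGE_KB / 1024.0

print(f"Re-fetch over WAN:   ${refetch_cost:.5f}")
print(f"Cache on disk, 1 yr: ${cache_cost:.5f}")
# If one year of caching costs less than a single re-fetch, it pays to
# cache any page that will ever be requested again.
```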
There are four implications for IT professionals. First, the performance of tomorrow's systems will be at least as dependent on the data transfer and caching software running on them as on the hardware itself.
Second, the proliferation of caches in and around the
network will stress current system management tools.
Third, storage dynamics and optimal system design will
vary from application to application with, say,
scientific computing and Internet commerce representing
two extremes. Fourth, no one but you will understand
this.
Designing multiple, complex applications will be
tough enough in the next few years. Deciding how to
optimize performance by implementing storage management
systems will add to the challenge. How much do you
cache? Where do you locate proxy servers? Do you go with
RAID 5 (efficient with space) or mirroring (efficient
with access)? And so on.
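For the RAID question specifically, the space-versus-access trade-off is easy to quantify. A small sketch, using an assumed eight-drive array, of how much usable capacity each choice leaves and what each update costs:

```python
# The space-versus-access trade-off between RAID 5 and mirroring.
# The drive count and drive size are assumed, illustrative values.

def raid5_usable(drives, size_gb):
    """RAID 5 spends one drive's worth of capacity on parity."""
    return (drives - 1) * size_gb

def mirror_usable(drives, size_gb):
    """Mirroring (RAID 1) keeps a full second copy of every block."""
    return drives // 2 * size_gb

DRIVES, SIZE_GB = 8, 73

print(f"RAID 5 usable:   {raid5_usable(DRIVES, SIZE_GB)} GB "
      f"(small writes pay a read-modify-write penalty)")
print(f"Mirrored usable: {mirror_usable(DRIVES, SIZE_GB)} GB "
      f"(two writes per update, but reads can go to either copy)")
```

RAID 5 leaves far more usable space; mirroring gives cheaper writes and more read bandwidth. Which one wins depends on the application, which is the whole point.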
This is the rocket science of IT systems management.
It's not something others in your organization care to
know about or are even capable of appreciating. But you
should.