Scalable Distributed Data Structures
Scalable distributed data structures (SDDS) are a family of distributed
database structures introduced by Litwin et al. in [LNS93]. They share
the following properties [LNS94]:
- there is no central directory, avoiding read hotspots
- client "images" - i.e. client information on where data is
located - may be outdated, and is only adjusted in response to read
queries
- a client may send a request to the incorrect server, which will
be forwarded to the correct server and the client image will be updated
A number of systems have been built using SDDSs, by Litwin et al.
(see http://ceria.dauphine.fr/witold.html),
as well as the P-Grid system at
EPFL in Switzerland [ACD03] and the Snowball distributed filesystem at
the University of Kentucky [VBW98]. Litwin's work seems very much
oriented towards creating a semi-traditional cluster database, while I
have not had time to look into the other two systems.
Litwin et al. have modified the original hash-based LH structure to
support range queries in RP* [LNS94], by keeping contiguous records
within a bucket when splitting it, and (in the absense of multicast)
maintaining copies of a Btree-based index on servers. Additional work
(e.g. [K00]) has been done on efficiently accessing multi-dimensional
and spatial data in an SDDS.
To date, all of the work I have found on SDDSs has been oriented
towards the traditional database notion that clients upload their data
to a server, and thus the data location may be chosen at insert time.
For that matter, unless the SDDS is used solely as an index, this is
inherent in any method derived from [LNS93], as when buckets are split
a new location in the network is chosen for the new buckets.
There is some work by Gribble at Berkeley [GBHC00]
on P2P systems that is vaguely related. In addition, there's some work
by Johnson at U. Florida [JK93] that I need to look through at some
point, I think.
Bibliography
additionally available in BibTex format
[LNS93] Witold Litwin and Marie-Anne Neimat and Donovan A. Schneider.
LH: Linear Hashing for distributed files.
1993.
in Proceedings of the 1993 ACM SIGMOD international conference on
Management of data, pages 327--336.
ACM Press.
[LNS96] Witold Litwin and Marie-Anna Neimat and Donovan A.
Schneider.
LH* -- a scalable, distributed data structure.
1996.
ACM Trans. Database Syst., 21(4):480--525.
[LNS94] Witold Litwin and Marie-Anne Neimat and Donovan A.
Schneider.
RP*: A Family of Order Preserving Scalable Distributed Data Struc
tures.
1994.
in VLDB'94, Proceedings of 20th International Conference on Very
Large Data Bases.
[K00] Jonas S. Karlsson.
HQT*:
a scalable distributed data structure for high-performance spatial
accesses.
2000. Information organization and databases: foundations of data
organization, pages 295--312.
[KW95] Brigitte Kroll and Peter Widmayer.
Balanced
Distributed Search Trees Do Not Exist.
1995.
in Workshop on Algorithms and Data Structures, pages 50-61.
[A02] Karl Aberer.
Efficient
Search in Unbalanced, Randomized Peer-To-Peer Search Trees.
2002.
EPFL Technical Report IC/2002/79.
[ACD03] Karl Aberer and Philippe Cudré-Mauroux and Anwitaman
Datta and Zoran
Despotovic and Manfred Hauswirth and Magdalena Punceva and Roman
Schmidt.
P-Grid: a self-organizing structured P2P system.
2003.
SIGMOD Rec., 32(3):29--33.
[VBW98] Radek Vingralek and Yuri Breitbart and Gerhard Weikum.
Snowball:
Scalable Storage on Networks of Workstations with Balanced Load.
1998.
Distributed and Parallel Databases, 6(2):117-156.
[GBHC00] Steven D. Gribble, Eric A. Brewer, Joseph M. Hellerstein,
and David Culler. Scalable, Distributed Data Structures for
Internet Service Construction. 2000. Proceedings of the Fourth
Symposium on Operating Systems Design and Implementation (OSDI 2000).
[JK93] Theodore Johnson and Padmashree Krishna.
Designing
Distributed Search Structures with Lazy Updates.
Technical Report TR93-037, University of Florida.
last modified: 9/13/04 pjd