Scalable Distributed Data Structures


Scalable distributed data structures (SDDS) are a family of distributed database structures introduced by Litwin et al. in [LNS93]. They share the following properties [LNS94]:
A number of systems have been built using SDDSs, by Litwin et al. (see  http://ceria.dauphine.fr/witold.html), as well as the P-Grid system at EPFL in Switzerland [ACD03] and the Snowball distributed filesystem at the University of Kentucky [VBW98]. Litwin's work seems very much oriented towards creating a semi-traditional cluster database, while I have not had time to look into the other two systems.

Litwin et al. have modified the original hash-based LH structure to support range queries in RP* [LNS94], by keeping contiguous records within a bucket when splitting it, and (in the absense of multicast) maintaining copies of a Btree-based index on servers. Additional work (e.g. [K00]) has been done on efficiently accessing multi-dimensional and spatial data in an SDDS.

To date, all of the work I have found on SDDSs has been oriented towards the traditional database notion that clients upload their data to a server, and thus the data location may be chosen at insert time. For that matter, unless the SDDS is used solely as an index, this is inherent in any method derived from [LNS93], as when buckets are split a new location in the network is chosen for the new buckets.

There is some work by Gribble at Berkeley [GBHC00] on P2P systems that is vaguely related. In addition, there's some work by Johnson at U. Florida [JK93] that I need to look through at some point, I think.

Bibliography

additionally available in BibTex format

[LNS93] Witold Litwin and Marie-Anne Neimat and Donovan A. Schneider. LH: Linear Hashing for distributed files. 1993. in Proceedings of the 1993 ACM SIGMOD international conference on Management of data, pages 327--336. ACM Press.

[LNS96] Witold Litwin and Marie-Anna Neimat and Donovan A. Schneider. LH* -- a scalable, distributed data structure. 1996. ACM Trans. Database Syst., 21(4):480--525.

[LNS94] Witold Litwin and Marie-Anne Neimat and Donovan A. Schneider. RP*: A Family of Order Preserving Scalable Distributed Data Struc tures. 1994. in VLDB'94, Proceedings of 20th International Conference on Very Large Data Bases.

[K00] Jonas S. Karlsson. HQT*: a scalable distributed data structure for high-performance spatial accesses. 2000. Information organization and databases: foundations of data organization, pages 295--312.

[KW95] Brigitte Kroll and Peter Widmayer. Balanced Distributed Search Trees Do Not Exist. 1995. in Workshop on Algorithms and Data Structures, pages 50-61.

[A02] Karl Aberer. Efficient Search in Unbalanced, Randomized Peer-To-Peer Search Trees. 2002. EPFL Technical Report IC/2002/79.

[ACD03] Karl Aberer and Philippe Cudré-Mauroux and Anwitaman Datta and Zoran Despotovic and Manfred Hauswirth and Magdalena Punceva and Roman Schmidt. P-Grid: a self-organizing structured P2P system. 2003. SIGMOD Rec., 32(3):29--33.

[VBW98] Radek Vingralek and Yuri Breitbart and Gerhard Weikum. Snowball: Scalable Storage on Networks of Workstations with Balanced Load. 1998. Distributed and Parallel Databases, 6(2):117-156.

[GBHC00] Steven D. Gribble, Eric A. Brewer, Joseph M. Hellerstein, and David Culler. Scalable, Distributed Data Structures for Internet Service Construction. 2000. Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000).

[JK93] Theodore Johnson and Padmashree Krishna. Designing Distributed Search Structures with Lazy Updates. Technical Report TR93-037, University of Florida.



last modified: 9/13/04 pjd