ZZ: Cheap Practical BFT Using Virtualization
Overview
Despite numerous efforts to improve their performance and scalability, Byzantine fault-tolerance (BFT) techniques remain expensive, and few commercial systems use BFT today. We are developing the ZZ system, which reduces the replication cost of BFT to practically f+1, roughly halving the 2f+1 or higher cost incurred by state-of-the-art approaches. This is possible because ZZ maintains only f+1 replicas in the normal, fault-free case and activates additional replicas only upon failure. ZZ uses virtual machines for fast replica activation, and we have developed several mechanisms to enable rapid recovery after a failure is detected.
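The replica-count arithmetic behind this claim can be sketched in a few lines of Python. This is only an illustration of the f+1 versus 2f+1 figures quoted above, not part of ZZ itself; the function names are hypothetical, and the baseline uses 2f+1, the lower bound of the "2f+1 or higher" range.

```python
def zz_active_replicas(f: int) -> int:
    """Replicas kept running in the fault-free case (the f+1 figure above)."""
    return f + 1


def baseline_active_replicas(f: int) -> int:
    """Replicas run by a 2f+1 approach (lower bound of '2f+1 or higher')."""
    return 2 * f + 1


def cost_ratio(f: int) -> float:
    """Fault-free replication cost of f+1 relative to 2f+1."""
    return zz_active_replicas(f) / baseline_active_replicas(f)


if __name__ == "__main__":
    for f in (1, 2, 3, 10):
        print(f"f={f}: f+1={zz_active_replicas(f)}, "
              f"2f+1={baseline_active_replicas(f)}, ratio={cost_ratio(f):.2f}")
```

The ratio is 2/3 for f=1 and approaches 1/2 as f grows, which is the sense in which the cost is halved.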
Exploiting virtualization means that BFT services can be provided at a much lower cost by multiplexing a small number of spare servers across a large number of applications. The image below shows an example fault-tolerant data center where four primary servers run VMs for three different applications (A, B, and C). The two additional free hosts are kept ready to initialize extra replicas for any of the applications after a fault is detected. Such a virtualized data center can easily provide different levels of fault tolerance to different applications. For example, in this scenario, application A is given a strong isolation guarantee, since each of its VMs resides on its own host, as well as a fast recovery time, since its extra recovery VM is maintained in a paused state on free server 1. In contrast, applications B and C are not completely isolated and will experience a larger recovery delay, since their additional replicas need to be loaded from disk onto one of the free servers.
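A minimal sketch of how such a scenario might be modeled is shown here. The host names, the exact VM placement, and the definition of isolation (no host running an application's VM also runs a VM of another application) are assumptions made for illustration; the text above only fixes that A's VMs have hosts to themselves with a paused spare, while B and C share hosts and keep their spares on disk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class App:
    name: str
    primary_hosts: Tuple[str, ...]  # hosts running this app's active VMs
    spare: str                      # "paused" (in memory) or "disk"


def is_isolated(app: App, apps: List[App]) -> bool:
    """True if none of the app's hosts also runs another app's VM (assumed definition)."""
    other_hosts = {h for a in apps if a.name != app.name for h in a.primary_hosts}
    return not (set(app.primary_hosts) & other_hosts)


def recovery_class(app: App) -> str:
    """Qualitative recovery delay, following the paused-versus-disk distinction above."""
    return "fast" if app.spare == "paused" else "slower"


# Hypothetical placement consistent with the description (four primary hosts).
APPS = [
    App("A", ("host1", "host2"), spare="paused"),
    App("B", ("host3", "host4"), spare="disk"),
    App("C", ("host3", "host4"), spare="disk"),
]

if __name__ == "__main__":
    for app in APPS:
        print(app.name, "isolated:", is_isolated(app, APPS),
              "recovery:", recovery_class(app))
```

Under these assumptions, only application A is reported as isolated with fast recovery, matching the two service levels described in the scenario.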
Key Ideas
Participants
Publications
ZZ and the Art of Practical BFT
Timothy Wood, Rahul Singh, Arun Venkataramani, Prashant Shenoy, and Emmanuel Cecchet. University of Massachusetts Technical Report TR24-09, 2008. Tech Report
