CS 677 Distributed Operating Systems

Spring 2005

Programming Assignment 3: A Fault-tolerant Banking System

Due: May 12th (Thursday)

(for off-campus students - 14 days after viewing Lecture 23)



1 The Problem

In this programming assignment you will implement a Fault-tolerant Banking System.
The assignment uses concepts of fault tolerance, resynchronization using logs, centralized locking, and encryption.

A pictorial representation of the system is shown in the figure below.


Figure 1: Distributed Fault-tolerant Online Banking System and Components

The system has two important components:

  1. Replicated Database Servers (this is similar to Assignment 2)
    The bank database of accounts (and corresponding information of each) is replicated on both servers.
  2. Coordinator (similar to Assignment 2 but no load balancing)
    The coordinator acts as the interface to the banking database. Each client sends its request to the coordinator, which in turn forwards the request to one of the servers to perform the desired action. The coordinator also fetches any response from the server and sends it back to the client.


2 Functionalities of the System:
  1. Replicated Database Servers
  2. Coordinator
    Each client request is forwarded to the coordinator, which in turn forwards the request to both servers or, if one of them has failed, to the server that is still running.
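The forwarding rule above can be sketched as follows. This is only an illustration, not the required design: the `Coordinator` class, its method names, the in-memory "servers", and the `(account, delta)` request shape are all assumptions made for the example. Only the OP-DONE acknowledgement and the idea of skipping a failed server come from this handout.

```python
class Coordinator:
    """Hypothetical coordinator that forwards requests to every live server."""

    def __init__(self, servers):
        # servers: dict mapping a server name to a callable that applies a
        # request and returns "OP-DONE" on success (an assumed interface).
        self.servers = dict(servers)
        self.alive = set(self.servers)

    def mark_failed(self, name):
        # Would be called when a heartbeat to `name` goes unanswered.
        self.alive.discard(name)

    def mark_alive(self, name):
        # Would be called once a recovered server has resynchronized.
        if name in self.servers:
            self.alive.add(name)

    def forward(self, request):
        # Send the request to each live server and count OP-DONE replies.
        if not self.alive:
            raise RuntimeError("no live server available")
        acks = {name: self.servers[name](request) for name in sorted(self.alive)}
        expected = len(self.alive)  # 2 normally, 1 after a failure
        done = sum(1 for ack in acks.values() if ack == "OP-DONE")
        return done == expected, acks


def make_server(store):
    # Assumed stand-in for a database server: a dict of account balances.
    def apply(request):
        account, delta = request
        store[account] = store.get(account, 0) + delta
        return "OP-DONE"
    return apply


db_a, db_b = {}, {}
coord = Coordinator({"A": make_server(db_a), "B": make_server(db_b)})
ok_both, acks_both = coord.forward(("acct-1", 100))  # applied on A and B
coord.mark_failed("B")                               # simulated failure of B
ok_one, acks_one = coord.forward(("acct-1", -30))    # applied on A only
```

After the second request the two stores differ, which is exactly the gap that the log-based resynchronization step has to close when the failed server returns.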

Some things to keep in mind:


3 Evaluation and Measurement

Correctness
Demonstrate that your system works correctly according to requirements stated in the description and functionality of the system. In particular:

  1. Show that the bank database is distributed and replicated, and that operations are sent to both servers, or to a single server when the other has failed.
  2. Demonstrate that the account-level locking mechanism works correctly, i.e., show (via output snapshots) that simultaneous updates to an account are handled correctly by the locking mechanism.
  3. Demonstrate the logging mechanism: how the log is updated on each operation and how it is maintained.
  4. Demonstrate the proper functioning of the resynchronization step.
    - between the 2 servers
    - no requests being sent by coordinator
  5. Demonstrate the working of heartbeat messages by showing that the coordinator initially waits for 2 OP-DONE messages but, once a heartbeat discovers a server failure, waits for only 1 OP-DONE message.
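As an illustration of the behaviour item 2 asks you to demonstrate, here is a minimal single-threaded sketch of account-level locking with a FIFO wait queue. The class name `AccountLockManager`, the request identifiers, and the choice of FIFO ordering are assumptions for the example; your design may differ.

```python
from collections import defaultdict, deque


class AccountLockManager:
    """Hypothetical centralized lock table: one lock per account."""

    def __init__(self):
        self.holder = {}                    # account -> request id holding the lock
        self.waiting = defaultdict(deque)   # account -> queued request ids (FIFO)

    def acquire(self, account, request_id):
        # Grant the lock if the account is free; otherwise queue the request.
        if account not in self.holder:
            self.holder[account] = request_id
            return True
        self.waiting[account].append(request_id)
        return False

    def release(self, account, request_id):
        # Release the lock and hand it to the next queued request, if any.
        # Returns the id of the request that now holds the lock, or None.
        if self.holder.get(account) != request_id:
            raise ValueError("request does not hold this lock")
        queue = self.waiting[account]
        if queue:
            next_request = queue.popleft()
            self.holder[account] = next_request
            return next_request
        del self.holder[account]
        return None


locks = AccountLockManager()
first = locks.acquire("acct-1", "r1")    # granted: account is free
second = locks.acquire("acct-1", "r2")   # queued: r1 holds the lock
other = locks.acquire("acct-2", "r3")    # granted: different account
handed_to = locks.release("acct-1", "r1")  # lock passes to r2
```

An output snapshot for item 2 could show the same sequence: two simultaneous updates to one account, the second queued until the first completes, while an update to a different account proceeds immediately.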
Evaluation
Additionally, experiment with your system to measure its performance in different scenarios and test conditions.
Design and present results of your own experiments to demonstrate the characteristics of the system. A few examples are:
  1. Measure the average time taken by requests when both servers are up and there is no queuing at the coordinator.
  2. Measure the average time taken by requests when both servers are up and requests are queued at the coordinator (i.e., queued due to account-level locking).
  3. Keeping request rate of each client constant, vary the number of clients and measure the latency of each request.
  4. Keep number of clients constant, but vary request rate to measure latency of each request.
  5. Measure how much time resynchronization requires, by measuring the time between the ALIVE and RESYNCH-DONE messages at the coordinator.
    How does this change with log size?
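One possible way to collect these timings is sketched below. The helper names (`MessageTimer`, `average_latency`) are hypothetical, and the clock is injectable only so that the example can be checked with fixed timestamps; in a real run the default `time.perf_counter` would be used.

```python
import time
from statistics import mean


class MessageTimer:
    """Hypothetical helper that timestamps named protocol messages."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.stamps = {}

    def record(self, message):
        # Store the time at which `message` was seen at the coordinator.
        self.stamps[message] = self.clock()

    def elapsed(self, first, second):
        # Seconds between two recorded messages.
        return self.stamps[second] - self.stamps[first]


def average_latency(send_request, requests, clock=time.perf_counter):
    # Time each request individually and return the mean latency in seconds.
    latencies = []
    for request in requests:
        start = clock()
        send_request(request)
        latencies.append(clock() - start)
    return mean(latencies)


# Fixed timestamps stand in for a real clock so the result is reproducible.
fake_clock = iter([10.0, 12.5]).__next__
timer = MessageTimer(clock=fake_clock)
timer.record("ALIVE")          # recovered server announces itself
timer.record("RESYNCH-DONE")   # log replay has finished
resync_seconds = timer.elapsed("ALIVE", "RESYNCH-DONE")
```

Repeating the resynchronization measurement for several log sizes gives the data points needed to answer the log-size question in item 5.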

It is important that you describe the results of your experiments and not just what each experiment did. State what the experiment demonstrates, what you expected, and what you actually observed.
These are guidelines only, so be creative in what can be evaluated and measured as part of your experiments to test the system.


4 What you will submit

When you have finished implementing the complete assignment as described above, put all the code in a separate directory in your edlab account (/cs677/project3).

You are required to submit your solution in the form of printouts (please attach only relevant outputs that support your points and demonstrate functionality; DO NOT print out entire output logs or source code).

Each program must work correctly and be documented. You should hand in:
  1. Outputs generated by running your program. (in EdLab account)
  2. Outputs to demonstrate correct working of the system. (in EdLab account and Printout)
    This is important, as it will show that your system works according to the requirements.
  3. A separate (typed) document of approximately two pages describing the overall program design, a description of "how it works", and design tradeoffs considered and made. Describe clearly how each system is designed and implemented. Also describe possible improvements and extensions to your program (and sketch how they might be made). (Edlab and Printout)
  4. A list of the design considerations you made while designing your system, with a brief description of each. This is similar to the design considerations for the Email system discussed in class, on the last slide of Lecture 2. (in Edlab and Printout)
  5. A program listing containing in-line documentation. (in Edlab account)
  6. Instructions to compile and run the code from 677/project3. (in Edlab account)
  7. A separate description of the tests you ran on your program to convince yourself that it is indeed correct. Also describe any cases for which your program is known not to work correctly. (in Edlab account and Printout)
  8. Performance results to test scalability and performance parameters. (in Edlab and Printout)

Let us not waste a lot of trees. So, if any of the above turn out to be large, just save the relevant information in a file, leave it on your EDLAB account and submit the name of the file.


5 Grading policy for all programming assignments

Grading:

Grades for late programs will be lowered 12 points per day late.