Spring 2015
Programming Assignment 3: Internet of Things - Fault tolerance, Replication, and Consistency
Due: 5pm, Wed April 29, 2015
It is assumed that any replica can serve any sensor / device. The gateway replicas implement a consistency technique to ensure that their states (e.g., database states) are synchronized. You may choose any consistency mechanism for this purpose, but be sure to clearly describe the algorithm used in your design document and also discuss the consistency semantics provided by your chosen approach.
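One possible (not required) choice is last-writer-wins (LWW) replication, which fits the assumption below that clocks are synchronized. The sketch that follows is a hypothetical illustration, not a prescribed design: the class LWWStore and its methods are made-up names, and the network transport that ships updates between replicas is omitted. Each value is tagged with (timestamp, replica id) and the larger tag wins, so replicas that receive the same updates in any order converge (eventual consistency). Note that of two concurrent writes to the same key, the older one is silently discarded, which is exactly the kind of semantics your design document should discuss.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# An update travels between replicas as (key, timestamp, origin replica id, value).
Update = Tuple[str, float, str, object]


@dataclass
class LWWStore:
    """Hypothetical per-replica key-value state with last-writer-wins resolution."""
    replica_id: str
    data: Dict[str, Tuple[float, str, object]] = field(default_factory=dict)

    def local_write(self, key: str, value: object, ts: Optional[float] = None) -> Update:
        # Assumes synchronized clocks, so wall-clock time orders writes.
        ts = time.time() if ts is None else ts
        update = (key, ts, self.replica_id, value)
        self.apply(update)
        return update  # the caller would send this update to the peer replica(s)

    def apply(self, update: Update) -> bool:
        key, ts, origin, value = update
        current = self.data.get(key)
        # Equal timestamps are broken by replica id so every replica picks the same winner.
        if current is None or (ts, origin) > (current[0], current[1]):
            self.data[key] = (ts, origin, value)
            return True
        return False  # older update: ignored

    def read(self, key: str) -> object:
        entry = self.data.get(key)
        return None if entry is None else entry[2]
```

For example, if replica A writes door=open at time 10 and replica B writes door=closed at time 11, both replicas end up with closed after exchanging updates, regardless of delivery order.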
Next, implement a cache in the front-end tier to enhance the performance of the gateway. The cache will store recently accessed data items / query results from the database in memory. When the front-end tier needs to make a request to the database tier, it should first look in the in-memory cache to see if the results are already cached (and if so, use them). In the event of a cache miss, the front-end tier should make a request to the database tier as before. Assume that the cache can store a maximum of N items (N should be configurable). Implement a simple cache replacement strategy, such as least recently used (LRU) or least frequently used (LFU), to evict cached items when a new item needs to be inserted and the cache is full.
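As an illustration of the LRU option, here is a minimal sketch built on Python's collections.OrderedDict. The class name LRUCache and the fetch_from_db callback (standing in for a request to the database tier) are hypothetical; capacity plays the role of the configurable N.

```python
from collections import OrderedDict
from typing import Any, Callable


class LRUCache:
    """Front-end cache holding at most `capacity` (N) items; evicts the least recently used."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Any, Any]" = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key: Any, fetch_from_db: Callable[[Any], Any]) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = fetch_from_db(key)  # cache miss: query the database tier as before
        self.put(key, value)
        return value

    def put(self, key: Any, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, key: Any) -> None:
        # Drop a possibly stale entry (useful for cache consistency).
        self._items.pop(key, None)
```

With N = 2, the access sequence a, b, a, c evicts b, because a was touched more recently. An LFU policy would instead keep a use counter per key and evict the key with the smallest count.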
Since the cache holds copies of certain data from the database, you should extend your consistency technique to handle replicated data items in the cache as well as those in the database replicas.
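One simple way to extend consistency to the caches is a write-invalidate policy: whenever a replica applies a write, whether it originated locally or arrived from a peer, it also removes that key from its own front-end cache, so the next read fetches the updated value from the database tier. The sketch below is hypothetical (GatewayReplica, FrontEndCache and _apply are illustrative names), uses a plain dictionary as the cache to stay short, and replaces network messages with direct method calls.

```python
from typing import Any, Dict, List


class FrontEndCache:
    """Minimal stand-in for the front-end cache (no eviction, to keep the sketch short)."""

    def __init__(self) -> None:
        self.items: Dict[str, Any] = {}

    def invalidate(self, key: str) -> None:
        self.items.pop(key, None)


class GatewayReplica:
    """Hypothetical gateway replica: database tier (a dict) plus a front-end cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.db: Dict[str, Any] = {}
        self.cache = FrontEndCache()
        self.peers: List["GatewayReplica"] = []

    def read(self, key: str) -> Any:
        if key in self.cache.items:  # cache hit
            return self.cache.items[key]
        value = self.db.get(key)  # cache miss: go to the database tier
        self.cache.items[key] = value
        return value

    def write(self, key: str, value: Any) -> None:
        self._apply(key, value)
        for peer in self.peers:  # in a real system, a message to each peer replica
            peer._apply(key, value)

    def _apply(self, key: str, value: Any) -> None:
        self.db[key] = value
        self.cache.invalidate(key)  # never serve the old cached copy again
```

Here propagation is synchronous, so no replica can return a stale cached value after write() returns. If updates are propagated asynchronously instead, a peer may serve a stale cached value until the invalidation arrives; the length of that window is part of the consistency semantics to document.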
Since the gateway is replicated, you should also make your gateway fault tolerant. It is sufficient to handle crash faults (Byzantine faults need not be handled). For simplicity, assume that both tiers of a gateway replica fail at once; in this case, the other gateway replica needs to take over the functions of the failed replica.

A gateway node needs to dynamically detect the failure of the other replica (this can be done by any method you choose, such as exchanging "I am alive" heartbeat messages). Upon detecting a failure, the remaining gateway replica runs a failure recovery algorithm that takes over responsibility for servicing all sensors and devices that were communicating with the failed replica. Your failure recovery method needs to inform the sensors / devices of the failure and have them reconfigure themselves to communicate with the new replica for subsequent requests. While the failure recovery "algorithm" can be straightforward, clearly document how failures are detected and all the steps your replica performs to take over the functions of the failed gateway. Also explain whether failures can lead to any data loss in your system (which will depend on the consistency mechanism you choose to synchronize state between the replicas) and the impact of any such data loss.

Implement your replication, caching, cache consistency and fault tolerance techniques in your code. This lab does not need the vector clocks, logical clocks or leader election aspects of lab 2; it is fine to simply assume that clocks are synchronized and to use simple timestamps for determining event ordering.
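A minimal sketch of heartbeat-based failure detection followed by takeover is shown below, assuming some transport (not shown) calls on_heartbeat whenever an "I am alive" message arrives and that check is called periodically, for example from a timer thread. HeartbeatMonitor, take_over, the device-to-gateway assignments map and the notify callback are all hypothetical names; the clock is injectable so the timeout logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Declares a peer failed if no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}
        self.failed: Set[str] = set()

    def on_heartbeat(self, peer: str) -> None:
        self.last_seen[peer] = self.clock()
        self.failed.discard(peer)  # a recovered peer is considered alive again

    def check(self) -> List[str]:
        """Return the peers newly detected as failed on this call."""
        now = self.clock()
        newly_failed = [p for p, t in self.last_seen.items()
                        if now - t > self.timeout and p not in self.failed]
        self.failed.update(newly_failed)
        return newly_failed


def take_over(failed_peer: str, assignments: Dict[str, str], me: str,
              notify: Callable[[str, str], None]) -> List[str]:
    """Reassign every sensor/device of `failed_peer` to `me` and tell each one to switch."""
    moved = [dev for dev, gw in assignments.items() if gw == failed_peer]
    for dev in moved:
        assignments[dev] = me
        notify(dev, me)  # e.g. a hypothetical "reconfigure: use gateway <me>" message
    return moved
```

The timeout is a trade-off: a short timeout detects crashes quickly but may wrongly declare a slow replica dead, while a long one delays takeover. Any updates the failed replica accepted but had not yet propagated to its peer are lost at this point, which is the data-loss question your design document should address.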
Design for Paxos: The final part of this lab requires you to provide a design for how you would implement Paxos in this system (you only need to write up a high-level design / algorithm in your design document and do not need to implement the algorithm). Assume that there are k gateway replicas, that each request is sent to all of them, and that the replicas run Paxos to reach agreement on the answer before replying to a request. How might such a system work? Explain clearly how the Paxos algorithm can be used by your gateway nodes and how you would have implemented it in your current design. Do not blindly cut and paste the algorithm from the class slides or from the Internet; you are expected to gain some familiarity with it and come up with a design that uses Paxos. Provide a writeup of your design with the main design document (no implementation is necessary to get credit for this part).
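For orientation only, the core of single-decree Paxos can be sketched in-process as follows: one Paxos instance per client request, with each of the k gateway replicas acting as an acceptor. This is a hypothetical sketch, not a design for this system: Acceptor, propose and the alive flag (used to simulate crash faults) are illustrative names, networking and retries are omitted, and proposal numbers are assumed to be unique across proposers (for example, built from a round number and a replica id). A majority of k acceptors must promise and then accept for a value to be chosen.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    """One gateway replica's acceptor state for a single Paxos instance."""
    promised: int = -1     # highest proposal number promised so far
    accepted_n: int = -1   # number of the accepted proposal, -1 if none
    accepted_v: Any = None
    alive: bool = True     # set to False to simulate a crashed replica

    def prepare(self, n: int) -> Optional[Tuple[int, Any]]:
        # Phase 1b: promise to ignore proposals numbered below n.
        if not self.alive or n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_v)

    def accept(self, n: int, v: Any) -> bool:
        # Phase 2b: accept unless a higher-numbered prepare was promised meanwhile.
        if not self.alive or n < self.promised:
            return False
        self.promised = n
        self.accepted_n, self.accepted_v = n, v
        return True


def propose(acceptors: List[Acceptor], n: int, my_value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = [r for r in (a.prepare(n) for a in acceptors) if r is not None]
    if len(promises) < majority:
        return None
    # Phase 2a: adopt the value of the highest-numbered accepted proposal, if any.
    prev_n, prev_v = max(promises, key=lambda p: p[0])
    value = prev_v if prev_n >= 0 else my_value
    acks = sum(1 for a in acceptors if a.accept(n, value))
    return value if acks >= majority else None
```

With k = 3 and one crashed replica, a proposal still succeeds, and a later proposer with a different answer is forced to adopt the value already chosen; with two of three replicas down, no value can be chosen. Basic Paxos tolerates crash faults; it agrees on one proposed value but does not by itself check whether that value is correct.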
Extra Credit: This part is optional. For extra credit, implement your Paxos design in the gateway nodes and conduct a simple experiment to demonstrate that it works (e.g., the system functions even when nodes fail or one of the nodes produces an incorrect answer).
You are free to develop your solution on any platform, but please ensure that your programs compile and run on the edlab machines (See note below).
Make the necessary timeline plots or figures to support your conclusions.