This is a paper summary of “Remus: High Availability via Asynchronous Virtual Machine Replication” from the University of British Columbia.
The paper question is:
The Remus paper’s Figure 6 suggests that less frequent checkpoints can lead to better performance. Of course, checkpointing only every X milliseconds means that up to X milliseconds of work are lost if the primary crashes. Suppose it was OK to lose an entire second of work if the primary crashed. Explain why checkpointing every second would lead to terrible performance if the application running on Remus were a Web server.
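The question depends on Remus's output commit. The primary keeps executing speculatively, but Remus holds every outgoing network packet in a buffer until the backup has acknowledged the checkpoint of the epoch that produced it. A Web server's clients are bound by round trips: each exchange (TCP handshake, HTTP request, response) must reach the client before the next one can begin. A 1-second epoch therefore adds up to a second to every round trip.

The sketch below is a simplified model, not Remus code. It assumes zero network latency, a fixed server service time, and a client that sends its next packet as soon as it receives a reply. The function names and the figure of 3 round trips per page are hypothetical.

```python
def completion_time_ms(epoch_ms: int, service_ms: int, round_trips: int, commit_ms: int = 0) -> int:
    """Time for one client to finish `round_trips` sequential request/response
    exchanges when every server reply is buffered until the checkpoint of the
    epoch that produced it is acknowledged (simplified model, an assumption)."""
    t = 0  # client sends its first packet at time 0; network latency ignored
    for _ in range(round_trips):
        produced = t + service_ms  # server computes its reply
        # Reply is held until the end of the current epoch, plus the time to
        # ship and acknowledge that epoch's checkpoint (commit_ms).
        t = (produced // epoch_ms + 1) * epoch_ms + commit_ms
    return t


def unreplicated_time_ms(service_ms: int, round_trips: int) -> int:
    """The same exchanges with no output buffering at all."""
    return service_ms * round_trips


if __name__ == "__main__":
    ROUND_TRIPS = 3  # hypothetical: TCP handshake, request/response, one more data round trip
    for epoch in (25, 1000):
        print(f"epoch {epoch:>4} ms: {completion_time_ms(epoch, 1, ROUND_TRIPS)} ms")
    print(f"no Remus:      {unreplicated_time_ms(1, ROUND_TRIPS)} ms")
```

In this model, once the client falls into step with the checkpoint boundaries, each exchange costs one full epoch. A 1-second interval therefore caps each connection at about one round trip per second, no matter how quickly the server computes its reply (3000 ms versus 3 ms here for the hypothetical 3-round-trip page).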
My question for the paper is:
The paper summarized through 7 key questions:
- Focus / Problem to be solved
- Unique contributions
- Possible applications