CAP Theorem


CAP theorem is a fundamental theorem in distributed systems.  It states that a distributed data store or any distributed system with a state can have at most two of the following three properties. 

  • Consistency
  • Availability
  • Partition Tolerance

Woah!! 


A lot of terms. But the definition mentioned above is not complete. 


Let’s break it down and understand step by step. 



Distributed Systems


A distributed system contains multiple nodes which are physically separate but linked together by the network.  All nodes communicate with each other and together form a distributed system. 

 A single node cannot be called a distributed system. It has to be definitely more than one node. 



Now let’s go over what it is meant by the system to be consistent, available, and partition tolerant. 



Consistency


Every read must receive the most recent write or an error.  It means every read operation that begins after a write must receive that write value. 


If not, it is said to be an inconsistent system.  


Let’s assume there are two servers S1, S2, and Client C. Initially S1 and S2 have values v0 with them. Now, the Client sends a write request with value v1 to S1.  

The S1 sends out an acknowledgment to the client only after all nodes are updated. 

After that client sends a read request to S2. If the value returned by S2 is v1, the distributed system is consistent. As all the nodes are updated with the most recent write.

This is the expected functionality of a consistent system. 


In case of an inconsistent system, the server S1 sends out the acknowledgment after it is updated and the client may not receive the most recent write when it queries S2. 



Availability


Every non-failing node returns a response for all read and write requests in a reasonable amount of time without the guarantee that it contains the most recent write. 


Here, the server is not allowed to ignore the requests. If it is not crashed, it must eventually respond to the client. 




Partition Tolerance


The system continues to operate despite an arbitrary number of messages being dropped by the network between nodes.  


It means our system should function correctly despite the network partitions.  



What is a network partition?  


Below pictures without and with partition answers that. 


cap theorem example



Note: The consistency in ACID Properties represents a different concept than the consistency in the CAP theorem.



The CAP Theorem doesn’t imply sacrificing one of the properties for the other two. It is a heavily misunderstood concept.


It actually implies that in the presence of a network partition, one has to choose between consistency and availability.  At other times, no trade-off has to be made. 

In a single node, all three properties hold but a single node is not a distributed system so the CAP theorem is not applicable in single-node systems. 


It is also a fact that no distributed system is safe from network failures.  Hence, partition tolerance is a must. In the presence of partition, one is left with two options: either consistency or availability. 



Let’s see what happens when you choose availability over consistency?


In this case, the system will process the query and try to return the most recent write as a response but it cannot guarantee that it is up to date due to network partitioning. 

Let’s assume there are two servers S1, S2, and Client C. Initially S1 and S2 have values v0 with them. Now, the Client sends a write request with value v1 to S1.  

Here, S1 sends out an acknowledgment after it is updated in S1 and before updating the same value in S2, a partition occurred. Now S1 and S2 have two different values and not consistent. 



What happens when you choose consistency over availability?


In this case, the system will return an error or a timeout as it cannot guarantee up-to-date information due to network partitioning. 

The nodes need time to update the information and cannot be available as often. 




Gilbert and Lynch’s proof of the CAP Theorem



Assume a contradiction that there does exist a distributed system that is consistent, available, and partition tolerant. 


Let’s assume there are two servers S1, S2, and Client C. Initially S1 and S2 have values v0 with them.  But there is a partition between S1 and S2. 

Now, the Client sends a write request with value v1 to S1.   Since the system is available, S1 must respond. Since there is a network partition, S1 cannot replicate to S2.


S1 is updated with value v1 and S2 still holds the value v0.  Gilbert and Lynch call this phase of execution Î±1. 


Now, the Client issues a read request to S2. Since the system is available, it must respond. It responds with v0 as there is a network partition and data is not replicated as a result. 

Gilbert and Lynch call this phase of execution Î±2. 


S2 returns v0 to the client but the client had already written v1 to S1. This is inconsistent. 


Initially, we assumed there exists a system that is consistent, available, and partition tolerant. But we have proved that any such system to exist acts inconsistent. Thus, there is no such system as per Gilbert and Lynch. 




Resources 



 


Post a Comment