Chapter 2. Data Distribution

Cassandra's peer-to-peer architecture and scalability characteristics are directly tied to its data placement scheme. Cassandra uses a distributed hash table (DHT), which lets data be stored and retrieved by key quickly and efficiently. Consistent hashing is at the core of this strategy because it lets every node determine where any piece of data lives in the cluster without complex coordination mechanisms.
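To make the coordination-free lookup concrete, here is a minimal Python sketch of a token ring. It is an illustration under stated assumptions, not Cassandra's actual code: each node holds a single token, MD5 stands in for the partitioner's hash (Cassandra's default Murmur3Partitioner uses MurmurHash3), and the names `token_for` and `TokenRing` are hypothetical. A key belongs to the first node whose token is greater than or equal to the key's token, wrapping around past the highest token. Because ownership depends only on the hash and the sorted token list, any node holding the same list computes the same owner without consulting any other node.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hypothetical helper: hash a partition key to a position on the ring.

    MD5 is used here only as a stand-in hash; it yields values in [0, 2**128).
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Hypothetical ring: one token per node, no virtual nodes."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort (token, node) pairs so the ring can be binary-searched.
        pairs = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in pairs]
        self._nodes = [node for _, node in pairs]

    def owner(self, key: str) -> str:
        """Return the first node whose token is >= the key's token."""
        t = token_for(key)
        i = bisect.bisect_left(self._tokens, t)
        if i == len(self._tokens):
            i = 0  # past the highest token: wrap around to the start of the ring
        return self._nodes[i]
```

For example, four nodes with evenly spaced tokens `i * 2**128 // 4` split this MD5 ring into four equal ranges, and two machines that build a `TokenRing` from the same token map will agree on the owner of every key.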

In this chapter, we'll cover the following topics:

  • The fundamentals of distributed hash tables
  • Cassandra's consistent hashing mechanism
  • Token assignment, both manual and with virtual nodes (vnodes)
  • The implications of Cassandra's partitioner implementations
  • How hotspots form in the cluster

By the end of this chapter, you should have a solid understanding of these concepts. Let's begin with the basics of hash tables in general, and then delve into Cassandra's distributed hash table implementation.