
CockroachDB | High Availability Tech Brief

CockroachDB is a database that can enable your application to provide five 9’s (99.999%) of availability while still maintaining strict consistency and delivering OLTP performance.


In the context of the CAP theorem, CockroachDB is designed to be a CP (consistent and partition-tolerant) database: it guarantees that data will be correct and that the database will survive network partitions. The “A” in CAP is defined as “Every request receives a (non-error) response, without the guarantee that it contains the most recent write”. CockroachDB can serve almost every request and guarantees every write. Further, several components of the database architecture allow it to deliver five 9’s of availability for any application, whether it is deployed on a single rack, across a data center, or spanning regions.

This brief delivers a high-level outline of these characteristics and how we deliver high availability with our database.  If you would like a deep dive into the architecture of our database, please visit our documentation.

Five nines and CockroachDB

Most organizations consider five nines (99.999%) of availability, which means less than 5.26 minutes of downtime per year, to be an acceptable level of high availability. While this varies with workload expectations, it is a well-accepted benchmark. In a database, we often extend this measure to performance and track p99 query latency: the response time within which 99% of all transactions complete.
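
The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only, not part of CockroachDB: the function names are hypothetical, the year is assumed to be 365.25 days long, and the p99 value uses the simple nearest-rank method.

```python
# Assumption: a year of 365.25 days, i.e. 525,960 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget, in minutes per year, for an availability such as 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def nearest_rank_percentile(samples: list, pct: int) -> float:
    """Smallest sample such that at least pct% of all samples are <= it."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    for label, availability in [("three nines", 0.999), ("five nines", 0.99999)]:
        minutes = downtime_minutes_per_year(availability)
        print(f"{label}: {minutes:.2f} minutes of downtime per year")

    # Hypothetical latency samples in milliseconds: 1, 2, ..., 100.
    latencies_ms = list(range(1, 101))
    print("p99 latency (ms):", nearest_rank_percentile(latencies_ms, 99))
```

Each additional nine shrinks the downtime budget by a factor of ten, which is why the step from three nines to five nines is so demanding.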

CockroachDB is quite different from a traditional database, and the concept of high availability can be applied at two levels: the database itself and each individual row of data. High availability of individual rows of data is a new concept and one that is worth exploring.

System High Availability in CockroachDB

A CockroachDB deployment consists of a connected set of instances (nodes) that coordinate with one another, which lets you deploy a database across multiple physical servers while accessing it as a single logical database. Every node runs the same single binary, and each can serve as a consistent gateway to the entire database, no matter where data physically resides. This means that every node is available for reads and writes. Further, the management console for the database can also be accessed via any node.

Nodes can be deployed anywhere as long as they can be reached by the other nodes. This means our database can be deployed across multiple servers, racks, data centers, regions, and even Kubernetes clusters. These choices depend on your survival goals, and our team and documentation can help you choose a deployment architecture that suits your high availability requirements.

Data High Availability in CockroachDB: A Range of Nines

CockroachDB presents itself as a relational database made up of tables, rows, and columns. By default, every row of data in CockroachDB is written in triplicate across three different physical nodes, and this replication factor is configurable based on your survivability goals (the number needs to be odd because our transactional model depends on quorum writes). We also use Raft, a distributed consensus algorithm that ensures the physical nodes agree on the state of a row, so you can be certain every row is always consistent within the cluster. This is the first level of data high availability in the database.
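
The quorum arithmetic behind the replication factor can be sketched as follows. This is generic majority-quorum math written in Python for illustration, with hypothetical function names; it is not CockroachDB code.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still survives."""
    return replicas - quorum_size(replicas)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(
            f"replicas={n} quorum={quorum_size(n)} "
            f"tolerated_failures={tolerated_failures(n)}"
        )
```

Going from three replicas to four raises the quorum from two to three without increasing the number of failures that can be tolerated, which is one way to see why an odd replication factor is the sensible choice.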

CockroachDB also delivers a unique feature, geo-partitioning, which allows you to tie an individual row of data to specific physical nodes in the cluster. This configuration is defined for each table and allows you to use a particular column to determine where data should live. For instance, if you have a CUSTOMER table and you want all records that belong to German customers (country_code = “DE”) to live on servers in Germany, you can set a property of the table to assign these rows to the appropriate nodes. If you change this value in a row, the database moves the data accordingly. If you want to add new regions, you can simply alter the table to accommodate them.
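
As a purely conceptual illustration, the idea of a column value deciding where a row lives can be modeled in a few lines of Python. This is a toy model, not CockroachDB syntax or its API: the region names, the mapping, and the helper functions are all hypothetical.

```python
# Hypothetical mapping from a country_code value to a region name.
REGION_BY_COUNTRY = {"DE": "germany", "US": "us-east"}
DEFAULT_REGION = "us-east"  # assumed fallback for unmapped values


def home_region(row: dict) -> str:
    """Return the region a row should live in, based on its country_code."""
    return REGION_BY_COUNTRY.get(row["country_code"], DEFAULT_REGION)


def add_region(country_code: str, region: str) -> None:
    """Model 'altering the table' to accommodate a new region."""
    REGION_BY_COUNTRY[country_code] = region


if __name__ == "__main__":
    customer = {"id": 1, "country_code": "DE"}
    print(home_region(customer))

    # Changing the column value changes where the row belongs.
    customer["country_code"] = "US"
    print(home_region(customer))

    # Adding a region later only extends the mapping.
    add_region("JP", "japan")
    print(home_region({"id": 2, "country_code": "JP"}))
```

In the real database this placement is declared on the table and enforced by the cluster; the sketch only shows the shape of the rule.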

This is a powerful tool both for reducing query latency and for delivering high availability, because you control where data lives and can define a failure domain as large as an entire region if you like. For instance, you could specify that the data for every row in a table must be distributed across three regions, so that every replica set keeps its quorum and the data remains available even if one region fails.
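
The effect of spreading replicas across regions can be checked with a small Python sketch. It assumes a simplified model in which a replica set stays available as long as a strict majority of its replicas sit in regions that are still up; the region names and the function are hypothetical.

```python
def is_available(replica_regions: list, failed_regions: set) -> bool:
    """True if a strict majority of replicas are in regions that are still up."""
    surviving = [r for r in replica_regions if r not in failed_regions]
    quorum = len(replica_regions) // 2 + 1
    return len(surviving) >= quorum


if __name__ == "__main__":
    # One replica in each of three hypothetical regions.
    spread = ["us-east", "us-west", "europe"]
    # All three replicas in a single region.
    packed = ["us-east", "us-east", "us-east"]

    print(is_available(spread, {"us-east"}))             # one region lost
    print(is_available(spread, {"us-east", "us-west"}))  # two regions lost
    print(is_available(packed, {"us-east"}))             # the only region lost
```

With one replica per region, losing any single region still leaves two of three replicas, so quorum holds. With all replicas in one region, that region is the failure domain and its loss takes the data offline.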

Ultimately, this feature of CockroachDB allows you to set a failure domain for each table based on your survival goals. The result is a range of nines: each individual table in the database gets the level of high availability that its business goals call for.

Summary

Because of its distributed nature, its built-in auto-healing capabilities, its ability to adopt different topologies to tune the cluster for specific survivability goals, and its ability to place data intelligently within the cluster, CockroachDB is a database that can enable your application to provide five 9’s of high availability while still maintaining strict consistency and delivering OLTP performance.

About CockroachDB

CockroachDB is a cloud database, architected and built from the ground up to deliver a single or multi-region, active-active, strongly consistent SQL Database that gives developers: 

  • Simple, Consistent SQL - Use the tools and the language you use today and eliminate data anomalies
  • Elastic, Efficient Scale - Eliminate the need for complex database operations and gain efficient scale
  • Bulletproof Resilience - Ensure uninterrupted, low-latency queries during the failure of a server, rack, or region
  • Multi-region, Multi-cloud - Start small and grow to meet your users everywhere with a single database
  • Distributed & Cloud-native - Align containerized apps with a modern, distributed database 

You can get started today with CockroachCloud, a fully managed version of CockroachDB.