What is Fault Tolerance? | Automated Recovery | Cluster Health

In this Cockroach University lesson, “Fault Tolerance and Automated Recovery”, we look at the resilience that is built into CockroachDB. We also look at how ranges are split among nodes, and how to observe this in the Admin UI.

In this lesson, you will learn to:
• Manually kill one node with the kill utility (Unix) or Ctrl+C (on Windows, or whenever the process is running in the foreground)
• Use the Admin UI to:
  ◦ Determine when a node is down (and which node it is)
  ◦ Identify when under-replicated ranges are present
  ◦ Watch the cluster repair itself
• Explain why under-replicated ranges don't last very long in a large cluster
• Check load continuity and cluster health
• Add the node back to the cluster and see the replicas redistribute

Learn more about CockroachDB by signing up for free training at Cockroach University: https://university.cockroachlabs.com/

----------------------------------------
Reference Links
• HAProxy.org
----------------------------------------

Careers: https://www.cockroachlabs.com/careers
CockroachCloud: https://www.cockroachlabs.com/product/cockroachcloud/
Blog: https://www.cockroachlabs.com/blog/
Docs: https://www.cockroachlabs.com/docs/stable/
Community Slack: https://cockroa.ch/Welcome-to-Slack
Twitter: https://twitter.com/CockroachDB