Production Checklist

On this page Carat arrow pointing down
Warning:
CockroachDB v19.1 is no longer supported as of October 30, 2020. For more details, refer to the Release Support Policy.

This page provides important recommendations for production deployments of CockroachDB.

Topology

When planning your deployment, it's important to carefully review and choose the topology patterns that best meet your latency and resiliency requirements. This is especially crucial for multi-region deployments.

Also keep in mind some basic topology recommendations:

  • Run each node on a separate machine. Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data loss if a machine fails. Likewise, if a machine has multiple disks or SSDs, run one node with multiple --store flags and not one node per disk. For more details about stores, see Start a Node.

  • When starting each node, use the --locality flag to describe the node's location, for example, --locality=region=west,zone=us-west-1. The key-value pairs should be ordered from most to least inclusive, and the keys and order of key-value pairs must be the same on all nodes.

  • When deploying in a single availability zone:

    • To be able to tolerate the failure of any 1 node, use at least 3 nodes with the .default 3-way replication factor. In this case, if 1 node fails, each range retains 2 of its 3 replicas, a majority.
    • To be able to tolerate 2 simultaneous node failures, use at least 5 nodes and increase the .default replication factor for user data to 5. The replication factor for important internal data is 5 by default, so no adjustments are needed for internal data. In this case, if 2 nodes fail at the same time, each range retains 3 of its 5 replicas, a majority.
  • When deploying across multiple availability zones:

    • To be able to tolerate the failure of 1 entire AZ in a region, use at least 3 AZs per region and set --locality on each node to spread data evenly across regions and AZs. In this case, if 1 AZ goes offline, the 2 remaining AZs retain a majority of replicas.
    • To be able to tolerate the failure of 1 entire region, use at least 3 regions.

Hardware

Note:

Mentions of "CPU resources" refer to vCPUs, which are also known as hyperthreads.

Basic hardware recommendations

Nodes should have sufficient CPU, RAM, network, and storage capacity to handle your workload. It's important to test and tune your hardware setup before deploying to production.

CPU and memory

  • At a bare minimum, each node should have 2 vCPUs and 2 GB of RAM. More data, complex workloads, higher concurrency, and faster performance require additional resources.

    Warning:

    Avoid "burstable" or "shared-core" virtual machines that limit the load on CPU resources.

  • To optimize for throughput, use larger nodes, up to 32 vCPUs. Based on internal testing results, 32 vCPUs is the sweet spot for OLTP workloads. For RAM, aim for a ratio of 2 GB per vCPU.

    To increase throughput further, add more nodes to the cluster instead of increasing node size.

  • To optimize for resiliency, use many smaller nodes (e.g., 4 vCPUs per node) instead of fewer larger ones. Recovery from a failed node is faster when data is spread across more nodes.

  • In all cases, make sure nodes are uniform to ensure consistent SQL performance.

Storage

  • The recommended Linux filesystem is ext4.

  • Avoid using shared storage such as NFS, CIFS, and CEPH storage.

  • For the best performance results, use SSD or NVMe devices. The recommended volume size is 300-500 GB.

    Monitor IOPS for higher service times. If they exceed 1-5 ms, you will need to add more devices or expand the cluster to reduce the disk latency. To monitor IOPS, use tools such as iostat (part of sysstat).

  • To calculate IOPS, use sysbench. If IOPS decrease, add more nodes to your cluster to increase IOPS.

  • Place a ballast file in each node's storage directory. In the unlikely case that a node runs out of disk space and shuts down, you can delete the ballast file to free up enough space to be able to restart the node.

  • Use zone configs to increase the replication factor from 3 (the default) to 5 (across at least 5 nodes).

    This is especially recommended if you are using local disks with no RAID protection rather than a cloud provider's network-attached disks that are often replicated under the hood, because local disks have a greater risk of failure. You can do this for the entire cluster or for specific databases, tables, or rows (enterprise-only).

  • The optimal configuration for striping more than one device is RAID 10. RAID 0 and 1 are also acceptable from a performance perspective.

Cloud-specific recommendations

Cockroach Labs recommends the following cloud-specific configurations based on our own internal testing. Before using configurations not recommended here, be sure to test them exhaustively.

AWS

  • Use m (general purpose) or c (compute-optimized) instances.

    For example, Cockroach Labs has used c5d.4xlarge (16 vCPUs and 32 GiB of RAM per instance, EBS) for internal testing. Note that the instance type depends on whether EBS is used or not. If you're using EBS, use a c5 instance.

    Warning:

    Do not use "burstable" t instances, which limit the load on CPU resources.

  • Use c5 instances with EBS as a primary AWS configuration. To simulate bare-metal deployments, use c5d with SSD Instance Store volumes.

    Provisioned IOPS SSD-backed (io1) EBS volumes need to have IOPS provisioned, which can be very expensive. Cheaper gp2 volumes can be used instead, if your performance needs are less demanding. Allocating more disk space than you will use can improve performance of gp2 volumes.

Azure

  • Use compute-optimized F-series VMs. For example, Cockroach Labs has used Standard_F16s_v2 VMs (16 vCPUs and 32 GiB of RAM per VM) for internal testing.

    Warning:

    Do not use "burstable" B-series VMs, which limit the load on CPU resources. Also, Cockroach Labs has experienced data corruption issues on A-series VMs and irregular disk performance on D-series VMs, so we recommend avoiding those as well.

  • Use Premium Storage or local SSD storage with a Linux filesystem such as ext4 (not the Windows ntfs filesystem). Note that the size of a Premium Storage disk affects its IOPS.

  • If you choose local SSD storage, on reboot, the VM can come back with the ntfs filesystem. Be sure your automation monitors for this and reformats the disk to the Linux filesystem you chose initially.

Digital Ocean

  • Use any droplets except standard droplets with only 1 GB of RAM, which is below our minimum requirement. All Digital Ocean droplets use SSD storage.

GCP

Security

An insecure cluster comes with serious risks:

  • Your cluster is open to any client that can access any node's IP addresses.
  • Any user, even root, can log in without providing a password.
  • Any user, connecting as root, can read or write any data in your cluster.
  • There is no network encryption or authentication, and thus no confidentiality.

Therefore, to deploy CockroachDB in production, it is strongly recommended to use TLS certificates to authenticate the identity of nodes and clients and to encrypt in-flight data between nodes and clients. You can use either the built-in cockroach cert commands or openssl commands to generate security certificates for your deployment. Regardless of which option you choose, you'll need the following files:

  • A certificate authority (CA) certificate and key, used to sign all of the other certificates.
  • A separate certificate and key for each node in your deployment, with the common name node.
  • A separate certificate and key for each client and user you want to connect to your nodes, with the common name set to the username. The default user is root.

    Alternatively, CockroachDB supports password authentication, although we typically recommend using client certificates instead.

For more information, see the Security Overview.

Networking

Networking flags

When starting a node, two main flags are used to control its network connections:

  • --listen-addr determines which address(es) to listen on for connections from other nodes and clients.
  • --advertise-addr determines which address to tell other nodes to use.

The effect depends on how these two flags are used in combination:

--listen-addr not specified --listen-addr specified
--advertise-addr not specified Node listens on all of its IP addresses on port 26257 and advertises its canonical hostname to other nodes. Node listens on the IP address/hostname and port specified in --listen-addr and advertises this value to other nodes.
--advertise-addr specified Node listens on all of its IP addresses on port 26257 and advertises the value specified in --advertise-addr to other nodes. Recommended for most cases. Node listens on the IP address/hostname and port specified in --listen-addr and advertises the value specified in --advertise-addr to other nodes. If the --advertise-addr port number is different than the one used in --listen-addr, port forwarding is required.
Tip:

When using hostnames, make sure they resolve properly (e.g., via DNS or etc/hosts). In particular, be careful about the value advertised to other nodes, either via --advertise-addr or via --listen-addr when --advertise-addr is not specified.

Cluster on a single network

When running a cluster on a single network, the setup depends on whether the network is private. In a private network, machines have addresses restricted to the network, not accessible to the public internet. Using these addresses is more secure and usually provides lower latency than public addresses.

Private? Recommended setup
Yes Start each node with --listen-addr set to its private IP address and do not specify --advertise-addr. This will tell other nodes to use the private IP address advertised. Load balancers/clients in the private network must use it as well.
No Start each node with --advertise-addr set to a stable public IP address that routes to the node and do not specify --listen-addr. This will tell other nodes to use the specific IP address advertised, but load balancers/clients will be able to use any address that routes to the node.

If load balancers/clients are outside the network, also configure firewalls to allow external traffic to reach the cluster.

Cluster spanning multiple networks

When running a cluster across multiple networks, the setup depends on whether nodes can reach each other across the networks.

Nodes reachable across networks? Recommended setup
Yes This is typical when all networks are on the same cloud. In this case, use the relevant single network setup above.
No This is typical when networks are on different clouds. In this case, set up a VPN, VPC, NAT, or another such solution to provide unified routing across the networks. Then start each node with --advertise-addr set to the address that is reachable from other networks and do not specify --listen-addr. This will tell other nodes to use the specific IP address advertised, but load balancers/clients will be able to use any address that routes to the node.

Also, if a node is reachable from other nodes in its network on a private or local address, set --locality-advertise-addr to that address. This will tell nodes within the same network to prefer the private or local address to improve performance. Note that this feature requires that each node is started with the --locality flag. For more details, see this example.

Load balancing

Each CockroachDB node is an equally suitable SQL gateway to a cluster, but to ensure client performance and reliability, it's important to use load balancing:

  • Performance: Load balancers spread client traffic across nodes. This prevents any one node from being overwhelmed by requests and improves overall cluster performance (queries per second).

  • Reliability: Load balancers decouple client health from the health of a single CockroachDB node. To ensure that traffic is not directed to failed nodes or nodes that are not ready to receive requests, load balancers should use CockroachDB's readiness health check.

    Tip:
    With a single load balancer, client connections are resilient to node failure, but the load balancer itself is a point of failure. It's therefore best to make load balancing resilient as well by using multiple load balancing instances, with a mechanism like floating IPs or DNS to select load balancers for clients.

For guidance on load balancing, see the tutorial for your deployment environment:

Environment Featured Approach
On-Premises Use HAProxy.
AWS Use Amazon's managed load balancing service.
Azure Use Azure's managed load balancing service.
Digital Ocean Use Digital Ocean's managed load balancing service.
GCP Use GCP's managed TCP proxy load balancing service.

Connection pooling

Multiple active connections to the database enable efficient use of the available database resources. However, creating new authenticated connections to the database is CPU and memory-intensive and also adds to the latency since the application has to wait for database to authenticate the connection.

Connection pooling helps resolve this dilemma by creating a set of authenticated connections that can be reused to connect to the database. To determine the size of the maximum connection pool, our recommendation is to start your testing with a pool size of (core_count * 2) + ssd_count), where core_count is the total number of cores in the cluster, and ssd_count is the total number of SSDs in the cluster. The optimal pool size will vary based on your workload and application, and may be smaller than that.

In addition to setting a maximum connection pool size, the idle connection pool size must also be set. Our recommendation is to set the idle connection pool size equal to the maximum pool size. While this uses more application server memory, this allows there to be many connections when concurrency is high, without having to dial a new connection for every new operation.

Monitoring and alerting

Despite CockroachDB's various built-in safeguards against failure, it is critical to actively monitor the overall health and performance of a cluster running in production and to create alerting rules that promptly send notifications when there are events that require investigation or intervention.

For details about available monitoring options and the most important events and metrics to alert on, see Monitoring and Alerting.

Clock synchronization

CockroachDB requires moderate levels of clock synchronization to preserve data consistency. For this reason, when a node detects that its clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed (500ms by default), it spontaneously shuts down. While serializable consistency is maintained regardless of clock skew, skew outside the configured clock offset bounds can result in violations of single-key linearizability between causally dependent transactions. It's therefore important to prevent clocks from drifting too far by running NTP or other clock synchronization software on each node.

The one rare case to note is when a node's clock suddenly jumps beyond the maximum offset before the node detects it. Although extremely unlikely, this could occur, for example, when running CockroachDB inside a VM and the VM hypervisor decides to migrate the VM to different hardware with a different time. In this case, there can be a small window of time between when the node's clock becomes unsynchronized and when the node spontaneously shuts down. During this window, it would be possible for a client to read stale data and write data derived from stale reads. To protect against this, we recommend using the server.clock.forward_jump_check_enabled and server.clock.persist_upper_bound_interval cluster settings.

Considerations

When setting up clock synchronization:

  • All nodes in the cluster must be synced to the same time source, or to different sources that implement leap second smearing in the same way. For example, Google and Amazon have time sources that are compatible with each other (they implement leap second smearing in the same way), but are incompatible with the default NTP pool (which does not implement leap second smearing).
  • For nodes running in AWS, we recommend Amazon Time Sync Service. For nodes running in GCP, we recommend Google's internal NTP service. For nodes running elsewhere, we recommend Google Public NTP. Note that the Google and Amazon time services can be mixed with each other, but they cannot be mixed with other time services (unless you have verified leap second behavior). Either all of your nodes should use the Google and Amazon services, or none of them should.
  • If you do not want to use the Google or Amazon time sources, you can use chrony and enable client-side leap smearing, unless the time source you're using already does server-side smearing. In most cases, we recommend the Google Public NTP time source because it handles smearing the leap second. If you use a different NTP time source that doesn't smear the leap second, you must configure client-side smearing manually and do so in the same way on each machine.
  • Do not run more than one clock sync service on VMs where cockroach is running.

Tutorials

For guidance on synchronizing clocks, see the tutorial for your deployment environment:

Environment Featured Approach
On-Premises Use NTP with Google's external NTP service.
AWS Use the Amazon Time Sync Service.
Azure Disable Hyper-V time synchronization and use NTP with Google's external NTP service.
Digital Ocean Use NTP with Google's external NTP service.
GCE Use NTP with Google's internal NTP service.

Cache and SQL memory size

By default, each node's cache size and temporary SQL memory size is 128MiB respectively. These defaults were chosen to facilitate development and testing, where users are likely to run multiple CockroachDB nodes on a single computer. When running a production cluster with one node per host, however, it's recommended to increase these values:

  • Increasing a node's cache size will improve the node's read performance.
  • Increasing a node's SQL memory size will increase the number of simultaneous client connections it allows (the 128MiB default allows a maximum of 6200 simultaneous connections) as well as the node's capacity for in-memory processing of rows when using ORDER BY, GROUP BY, DISTINCT, joins, and window functions.

To manually increase a node's cache size and SQL memory size, start the node using the --cache and --max-sql-memory flags:

icon/buttons/copy
$ cockroach start --cache=.25 --max-sql-memory=.25 <other start flags>
Warning:

Avoid setting --cache and --max-sql-memory to a combined value of more than 75% of a machine's total RAM. Doing so increases the risk of memory-related failures.

Dependencies

The CockroachDB binary for Linux depends on the following libraries:

Library Description
glibc The standard C library.

If you build CockroachDB from source, rather than use the official CockroachDB binary for Linux, you can use any C standard library, including MUSL, the C standard library used on Alpine.
libncurses Required by the built-in SQL shell.
tzdata Required by certain features of CockroachDB that use time zone data, for example, to support using location-based names as time zone identifiers. This library is sometimes called tz or zoneinfo. On Windows, time zone data is derived from a zoneinfo.zip file.

If a machine running a CockroachDB node is missing this time zone data, the node will not be able to resolve location-based time zone names. For workaround steps, see this known limitation.

These libraries are found by default on nearly all Linux distributions, with Alpine as the notable exception, but it's nevertheless important to confirm that they are installed and kept up-to-date. For the time zone data in particular, it's important for all nodes to have the same version; when updating the library, do so as quickly as possible across all nodes.

Note:

In Docker-based deployments of CockroachDB, these dependencies do not need to be manually addressed. The Docker image for CockroachDB includes them and keeps them up to date with each release of CockroachDB.

File descriptors limit

CockroachDB can use a large number of open file descriptors, often more than is available by default. Therefore, please note the following recommendations.

For each CockroachDB node:

  • At a minimum, the file descriptors limit must be 1956 (1700 per store plus 256 for networking). If the limit is below this threshold, the node will not start.
  • It is recommended to set the file descriptors limit to unlimited; otherwise, the recommended limit is at least 15000 (10000 per store plus 5000 for networking). This higher limit ensures performance and accommodates cluster growth.
  • When the file descriptors limit is not high enough to allocate the recommended amounts, CockroachDB allocates 10000 per store and the rest for networking; if this would result in networking getting less than 256, CockroachDB instead allocates 256 for networking and evenly splits the rest across stores.

Increase the file descriptors limit

Yosemite and later

To adjust the file descriptors limit for a single process in Mac OS X Yosemite and later, you must create a property list configuration file with the hard limit set to the recommendation mentioned above. Note that CockroachDB always uses the hard limit, so it's not technically necessary to adjust the soft limit, although we do so in the steps below.

For example, for a node with 3 stores, we would set the hard limit to at least 35000 (10000 per store and 5000 for networking) as follows:

  1. Check the current limits:

    icon/buttons/copy
    $ launchctl limit maxfiles
    
    maxfiles    10240          10240
    

    The last two columns are the soft and hard limits, respectively. If unlimited is listed as the hard limit, note that the hidden default limit for a single process is actually 10240.

  2. Create /Library/LaunchDaemons/limit.maxfiles.plist and add the following contents, with the final strings in the ProgramArguments array set to 35000:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
      <plist version="1.0">
        <dict>
          <key>Label</key>
            <string>limit.maxfiles</string>
          <key>ProgramArguments</key>
            <array>
              <string>launchctl</string>
              <string>limit</string>
              <string>maxfiles</string>
              <string>35000</string>
              <string>35000</string>
            </array>
          <key>RunAtLoad</key>
            <true/>
          <key>ServiceIPC</key>
            <false/>
        </dict>
      </plist>
    

    Make sure the plist file is owned by root:wheel and has permissions -rw-r--r--. These permissions should be in place by default.

  3. Restart the system for the new limits to take effect.

  4. Check the current limits:

    icon/buttons/copy
    $ launchctl limit maxfiles
    
    maxfiles    35000          35000
    

Older versions

To adjust the file descriptors limit for a single process in OS X versions earlier than Yosemite, edit /etc/launchd.conf and increase the hard limit to the recommendation mentioned above. Note that CockroachDB always uses the hard limit, so it's not technically necessary to adjust the soft limit, although we do so in the steps below.

For example, for a node with 3 stores, we would set the hard limit to at least 35000 (10000 per store and 5000 for networking) as follows:

  1. Check the current limits:

    icon/buttons/copy
    $ launchctl limit maxfiles
    
    maxfiles    10240          10240
    

    The last two columns are the soft and hard limits, respectively. If unlimited is listed as the hard limit, note that the hidden default limit for a single process is actually 10240.

  2. Edit (or create) /etc/launchd.conf and add a line that looks like the following, with the last value set to the new hard limit:

    limit maxfiles 35000 35000
    
  3. Save the file, and restart the system for the new limits to take effect.

  4. Verify the new limits:

    icon/buttons/copy
    $ launchctl limit maxfiles
    
    maxfiles    35000          35000
    

Per-Process Limit

To adjust the file descriptors limit for a single process on Linux, enable PAM user limits and set the hard limit to the recommendation mentioned above. Note that CockroachDB always uses the hard limit, so it's not technically necessary to adjust the soft limit, although we do so in the steps below.

For example, for a node with 3 stores, we would set the hard limit to at least 35000 (10000 per store and 5000 for networking) as follows:

  1. Make sure the following line is present in both /etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive:

    session    required   pam_limits.so
    
  2. Edit /etc/security/limits.conf and append the following lines to the file:

    *              soft     nofile          35000
    *              hard     nofile          35000
    

    Note that * can be replaced with the username that will be running the CockroachDB server.

  3. Save and close the file.

  4. Restart the system for the new limits to take effect.

  5. Verify the new limits:

    $ ulimit -a
    

Alternately, if you're using Systemd:

  1. Edit the service definition to configure the maximum number of open files:

    [Service]
    ...
    LimitNOFILE=35000
    
  2. Reload Systemd for the new limit to take effect:

    $ systemctl daemon-reload
    

System-Wide Limit

You should also confirm that the file descriptors limit for the entire Linux system is at least 10 times higher than the per-process limit documented above (e.g., at least 150000).

  1. Check the system-wide limit:

    $ cat /proc/sys/fs/file-max
    
  2. If necessary, increase the system-wide limit in the proc file system:

    $ echo 150000 > /proc/sys/fs/file-max
    

CockroachDB does not yet provide a Windows binary. Once that's available, we will also provide documentation on adjusting the file descriptors limit on Windows.

Attributions

This section, "File Descriptors Limit", is in part derivative of the chapter Open File Limits From the Riak LV 2.1.4 documentation, used under Creative Commons Attribution 3.0 Unported License.

Orchestration / Kubernetes

When running CockroachDB on Kubernetes, making the following minimal customizations will result in better, more reliable performance:

For more information and additional customization suggestions, see our full detailed guide to CockroachDB Performance on Kubernetes.

Transaction Retries

When several transactions are trying to modify the same underlying data concurrently, they may experience contention that leads to transaction retries. In order to avoid failures in production, your application should be engineered to handle transaction retries using client-side retry handling.


Yes No
On this page

Yes No