Build a Python App with CockroachDB

On this page Carat arrow pointing down
Warning:
CockroachDB v19.1 is no longer supported as of October 30, 2020. For more details, refer to the Release Support Policy.

This tutorial shows you how build a simple Python application with CockroachDB using SQLAlchemy.

We have tested the psycopg2 driver and SQLAlchemy enough to claim beta-level support. If you encounter problems, please open an issue with details to help us make progress toward full support.

Before you begin

  1. Install CockroachDB.
  2. Start up a secure or insecure local cluster.
  3. Choose the instructions that correspond to whether your cluster is secure or insecure:
Note:

The example code on this page uses Python 3.

Warning:

SQLAlchemy relies on the existence of foreign keys to generate JOIN expressions from your application code. If you remove foreign keys from your schema, SQLAlchemy will not generate joins for you. As a workaround, you can create a "custom foreign condition" by adding a relationship field to your table objects, or do the equivalent work in your application.

Step 1. Install SQLAlchemy

To install SQLAlchemy, as well as a CockroachDB Python package that accounts for some differences between CockroachDB and PostgreSQL, run the following command:

icon/buttons/copy
$ pip install sqlalchemy sqlalchemy-cockroachdb psycopg2
Tip:

You can substitute psycopg2 for other alternatives that include the psycopg python package.

For other ways to install SQLAlchemy, see the official documentation.

Step 2. Create the maxroach user and bank database

Start the built-in SQL client:

icon/buttons/copy
$ cockroach sql --certs-dir=certs

In the SQL shell, issue the following statements to create the maxroach user and bank database:

icon/buttons/copy
> CREATE USER IF NOT EXISTS maxroach;
icon/buttons/copy
> CREATE DATABASE bank;

Give the maxroach user the necessary permissions:

icon/buttons/copy
> GRANT ALL ON DATABASE bank TO maxroach;

Exit the SQL shell:

icon/buttons/copy
> \q

Step 3. Generate a certificate for the maxroach user

Create a certificate and key for the maxroach user by running the following command. The code samples will run as this user.

icon/buttons/copy
$ cockroach cert create-client maxroach --certs-dir=certs --ca-key=my-safe-directory/ca.key

Step 4. Run the Python code

The code below uses SQLAlchemy to map Python objects and methods to SQL operations.

You can run this script as many times as you want; on each run, the script will create some new accounts and shuffle money around between randomly selected accounts.

Specifically, the script:

  1. Reads in existing account IDs (if any) from the bank database.
  2. Creates additional accounts with randomly generated IDs. Then, it adds a bit of money to each new account.
  3. Chooses two accounts at random and takes half of the money from the first and deposits it into the second.

It does all of the above using the practices we recommend for using SQLAlchemy with CockroachDB, which are listed in the Best practices section below.

Note:

You must use the cockroachdb:// prefix in the URL passed to sqlalchemy.create_engine to make sure the cockroachdb dialect is used. Using the postgres:// URL prefix to connect to your CockroachDB cluster will not work.

Copy the code below or download it directly.

icon/buttons/copy
import random
from math import floor
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from cockroachdb.sqlalchemy import run_transaction

Base = declarative_base()


# The Account class corresponds to the "accounts" database table.
class Account(Base):
    __tablename__ = 'accounts'
    id = Column(Integer, primary_key=True)
    balance = Column(Integer)


# Create an engine to communicate with the database. The
# "cockroachdb://" prefix for the engine URL indicates that we are
# connecting to CockroachDB using the 'cockroachdb' dialect.
# For more information, see
# https://github.com/cockroachdb/sqlalchemy-cockroachdb.

secure_cluster = True           # Set to False for insecure clusters
connect_args = {}

if secure_cluster:
    connect_args = {
        'sslmode': 'require',
        'sslrootcert': 'certs/ca.crt',
        'sslkey': 'certs/client.maxroach.key',
        'sslcert': 'certs/client.maxroach.crt'
    }
else:
    connect_args = {'sslmode': 'disable'}

engine = create_engine(
    'cockroachdb://maxroach@localhost:26257/bank',
    connect_args=connect_args,
    echo=True                   # Log SQL queries to stdout
)

# Automatically create the "accounts" table based on the Account class.
Base.metadata.create_all(engine)


# Store the account IDs we create for later use.

seen_account_ids = set()


# The code below generates random IDs for new accounts.

def create_random_accounts(sess, n):
    """Create N new accounts with random IDs and random account balances.

    Note that since this is a demo, we do not do any work to ensure the
    new IDs do not collide with existing IDs.
    """
    new_accounts = []
    elems = iter(range(n))
    for i in elems:
        billion = 1000000000
        new_id = floor(random.random()*billion)
        seen_account_ids.add(new_id)
        new_accounts.append(
            Account(
                id=new_id,
                balance=floor(random.random()*1000000)
            )
        )
    sess.add_all(new_accounts)


run_transaction(sessionmaker(bind=engine),
                lambda s: create_random_accounts(s, 100))


# Helper for getting random existing account IDs.

def get_random_account_id():
    id = random.choice(tuple(seen_account_ids))
    return id


def transfer_funds_randomly(session):
    """Transfer money randomly between accounts (during SESSION).

    Cuts a randomly selected account's balance in half, and gives the
    other half to some other randomly selected account.
    """
    source_id = get_random_account_id()
    sink_id = get_random_account_id()

    source = session.query(Account).filter_by(id=source_id).one()
    amount = floor(source.balance/2)

    # Check balance of the first account.
    if source.balance < amount:
        raise "Insufficient funds"

    source.balance -= amount
    session.query(Account).filter_by(id=sink_id).update(
        {"balance": (Account.balance + amount)}
    )


# Run the transfer inside a transaction.

run_transaction(sessionmaker(bind=engine), transfer_funds_randomly)

Then run the code:

icon/buttons/copy
$ python3 sqlalchemy-basic-sample.py

The output should look something like the following:

2018-12-06 15:59:58,999 INFO sqlalchemy.engine.base.Engine select current_schema()
2018-12-06 15:59:58,999 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,002 INFO sqlalchemy.engine.base.Engine select version()
2018-12-06 15:59:59,002 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,003 INFO sqlalchemy.engine.base.Engine SELECT table_name FROM information_schema.tables WHERE table_schema=%s
2018-12-06 15:59:59,004 INFO sqlalchemy.engine.base.Engine ('public',)
2018-12-06 15:59:59,005 INFO sqlalchemy.engine.base.Engine SELECT id from accounts;
2018-12-06 15:59:59,005 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,008 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-12-06 15:59:59,008 INFO sqlalchemy.engine.base.Engine SAVEPOINT cockroach_restart
2018-12-06 15:59:59,008 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,083 INFO sqlalchemy.engine.base.Engine INSERT INTO accounts (id, balance) VALUES (%(id)s, %(balance)s)
2018-12-06 15:59:59,083 INFO sqlalchemy.engine.base.Engine ({'id': 298865, 'balance': 208217}, {'id': 506738, 'balance': 962549}, {'id': 514698, 'balance': 986327}, {'id': 587747, 'balance': 210406}, {'id': 50148, 'balance': 347976}, {'id': 854295, 'balance': 420086}, {'id': 785757, 'balance': 364836}, {'id': 406247, 'balance': 787016}  ... displaying 10 of 100 total bound parameter sets ...  {'id': 591336, 'balance': 542066}, {'id': 33728, 'balance': 526531})
2018-12-06 15:59:59,201 INFO sqlalchemy.engine.base.Engine RELEASE SAVEPOINT cockroach_restart
2018-12-06 15:59:59,201 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,205 INFO sqlalchemy.engine.base.Engine COMMIT
2018-12-06 15:59:59,206 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-12-06 15:59:59,206 INFO sqlalchemy.engine.base.Engine SAVEPOINT cockroach_restart
2018-12-06 15:59:59,206 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,207 INFO sqlalchemy.engine.base.Engine SELECT accounts.id AS accounts_id, accounts.balance AS accounts_balance
FROM accounts
WHERE accounts.id = %(id_1)s
2018-12-06 15:59:59,207 INFO sqlalchemy.engine.base.Engine {'id_1': 769626}
2018-12-06 15:59:59,209 INFO sqlalchemy.engine.base.Engine UPDATE accounts SET balance=%(balance)s WHERE accounts.id = %(accounts_id)s
2018-12-06 15:59:59,209 INFO sqlalchemy.engine.base.Engine {'balance': 470580, 'accounts_id': 769626}
2018-12-06 15:59:59,212 INFO sqlalchemy.engine.base.Engine UPDATE accounts SET balance=(accounts.balance + %(balance_1)s) WHERE accounts.id = %(id_1)s
2018-12-06 15:59:59,247 INFO sqlalchemy.engine.base.Engine {'balance_1': 470580, 'id_1': 158447}
2018-12-06 15:59:59,249 INFO sqlalchemy.engine.base.Engine RELEASE SAVEPOINT cockroach_restart
2018-12-06 15:59:59,250 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,251 INFO sqlalchemy.engine.base.Engine COMMIT

To verify that the table and rows were created successfully, start the built-in SQL client:

icon/buttons/copy
$ cockroach sql --certs-dir=certs --database=bank

Then, issue the following statement:

icon/buttons/copy
> SELECT COUNT(*) FROM accounts;
 count
-------
   100
(1 row)

Step 2. Create the maxroach user and bank database

Start the built-in SQL client:

icon/buttons/copy
$ cockroach sql --insecure

In the SQL shell, issue the following statements to create the maxroach user and bank database:

icon/buttons/copy
> CREATE USER IF NOT EXISTS maxroach;
icon/buttons/copy
> CREATE DATABASE bank;

Give the maxroach user the necessary permissions:

icon/buttons/copy
> GRANT ALL ON DATABASE bank TO maxroach;

Exit the SQL shell:

icon/buttons/copy
> \q

Step 3. Run the Python code

The code below uses SQLAlchemy to map Python objects and methods to SQL operations.

You can run this script as many times as you want; on each run, it will create some new accounts and shuffle money around between randomly selected accounts.

Specifically, it:

  1. Reads in existing account IDs (if any) from the bank database.
  2. Creates additional accounts with randomly generated IDs. Then, it adds a bit of money to each new account.
  3. Chooses two accounts at random and takes half of the money from the first and deposits it into the second.

It does all of the above using the practices we recommend for using SQLAlchemy with CockroachDB, which are listed in the Best practices section below.

Note:

You must use the cockroachdb:// prefix in the URL passed to sqlalchemy.create_engine to make sure the cockroachdb dialect is used. Using the postgres:// URL prefix to connect to your CockroachDB cluster will not work.

icon/buttons/copy
import random
from math import floor
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from cockroachdb.sqlalchemy import run_transaction

Base = declarative_base()


# The Account class corresponds to the "accounts" database table.
class Account(Base):
    __tablename__ = 'accounts'
    id = Column(Integer, primary_key=True)
    balance = Column(Integer)


# Create an engine to communicate with the database. The
# "cockroachdb://" prefix for the engine URL indicates that we are
# connecting to CockroachDB using the 'cockroachdb' dialect.
# For more information, see
# https://github.com/cockroachdb/sqlalchemy-cockroachdb.

secure_cluster = True           # Set to False for insecure clusters
connect_args = {}

if secure_cluster:
    connect_args = {
        'sslmode': 'require',
        'sslrootcert': 'certs/ca.crt',
        'sslkey': 'certs/client.maxroach.key',
        'sslcert': 'certs/client.maxroach.crt'
    }
else:
    connect_args = {'sslmode': 'disable'}

engine = create_engine(
    'cockroachdb://maxroach@localhost:26257/bank',
    connect_args=connect_args,
    echo=True                   # Log SQL queries to stdout
)

# Automatically create the "accounts" table based on the Account class.
Base.metadata.create_all(engine)


# Store the account IDs we create for later use.

seen_account_ids = set()


# The code below generates random IDs for new accounts.

def create_random_accounts(sess, n):
    """Create N new accounts with random IDs and random account balances.

    Note that since this is a demo, we do not do any work to ensure the
    new IDs do not collide with existing IDs.
    """
    new_accounts = []
    elems = iter(range(n))
    for i in elems:
        billion = 1000000000
        new_id = floor(random.random()*billion)
        seen_account_ids.add(new_id)
        new_accounts.append(
            Account(
                id=new_id,
                balance=floor(random.random()*1000000)
            )
        )
    sess.add_all(new_accounts)


run_transaction(sessionmaker(bind=engine),
                lambda s: create_random_accounts(s, 100))


# Helper for getting random existing account IDs.

def get_random_account_id():
    id = random.choice(tuple(seen_account_ids))
    return id


def transfer_funds_randomly(session):
    """Transfer money randomly between accounts (during SESSION).

    Cuts a randomly selected account's balance in half, and gives the
    other half to some other randomly selected account.
    """
    source_id = get_random_account_id()
    sink_id = get_random_account_id()

    source = session.query(Account).filter_by(id=source_id).one()
    amount = floor(source.balance/2)

    # Check balance of the first account.
    if source.balance < amount:
        raise "Insufficient funds"

    source.balance -= amount
    session.query(Account).filter_by(id=sink_id).update(
        {"balance": (Account.balance + amount)}
    )


# Run the transfer inside a transaction.

run_transaction(sessionmaker(bind=engine), transfer_funds_randomly)

Run the code:

icon/buttons/copy
$ python3 example.py

The output should look something like the following:

2018-12-06 15:59:58,999 INFO sqlalchemy.engine.base.Engine select current_schema()
2018-12-06 15:59:58,999 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2018-12-06 15:59:59,001 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,002 INFO sqlalchemy.engine.base.Engine select version()
2018-12-06 15:59:59,002 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,003 INFO sqlalchemy.engine.base.Engine SELECT table_name FROM information_schema.tables WHERE table_schema=%s
2018-12-06 15:59:59,004 INFO sqlalchemy.engine.base.Engine ('public',)
2018-12-06 15:59:59,005 INFO sqlalchemy.engine.base.Engine SELECT id from accounts;
2018-12-06 15:59:59,005 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,008 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-12-06 15:59:59,008 INFO sqlalchemy.engine.base.Engine SAVEPOINT cockroach_restart
2018-12-06 15:59:59,008 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,083 INFO sqlalchemy.engine.base.Engine INSERT INTO accounts (id, balance) VALUES (%(id)s, %(balance)s)
2018-12-06 15:59:59,083 INFO sqlalchemy.engine.base.Engine ({'id': 298865, 'balance': 208217}, {'id': 506738, 'balance': 962549}, {'id': 514698, 'balance': 986327}, {'id': 587747, 'balance': 210406}, {'id': 50148, 'balance': 347976}, {'id': 854295, 'balance': 420086}, {'id': 785757, 'balance': 364836}, {'id': 406247, 'balance': 787016}  ... displaying 10 of 100 total bound parameter sets ...  {'id': 591336, 'balance': 542066}, {'id': 33728, 'balance': 526531})
2018-12-06 15:59:59,201 INFO sqlalchemy.engine.base.Engine RELEASE SAVEPOINT cockroach_restart
2018-12-06 15:59:59,201 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,205 INFO sqlalchemy.engine.base.Engine COMMIT
2018-12-06 15:59:59,206 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-12-06 15:59:59,206 INFO sqlalchemy.engine.base.Engine SAVEPOINT cockroach_restart
2018-12-06 15:59:59,206 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,207 INFO sqlalchemy.engine.base.Engine SELECT accounts.id AS accounts_id, accounts.balance AS accounts_balance
FROM accounts
WHERE accounts.id = %(id_1)s
2018-12-06 15:59:59,207 INFO sqlalchemy.engine.base.Engine {'id_1': 769626}
2018-12-06 15:59:59,209 INFO sqlalchemy.engine.base.Engine UPDATE accounts SET balance=%(balance)s WHERE accounts.id = %(accounts_id)s
2018-12-06 15:59:59,209 INFO sqlalchemy.engine.base.Engine {'balance': 470580, 'accounts_id': 769626}
2018-12-06 15:59:59,212 INFO sqlalchemy.engine.base.Engine UPDATE accounts SET balance=(accounts.balance + %(balance_1)s) WHERE accounts.id = %(id_1)s
2018-12-06 15:59:59,247 INFO sqlalchemy.engine.base.Engine {'balance_1': 470580, 'id_1': 158447}
2018-12-06 15:59:59,249 INFO sqlalchemy.engine.base.Engine RELEASE SAVEPOINT cockroach_restart
2018-12-06 15:59:59,250 INFO sqlalchemy.engine.base.Engine {}
2018-12-06 15:59:59,251 INFO sqlalchemy.engine.base.Engine COMMIT

To verify that the table and rows were created successfully, start the built-in SQL client:

icon/buttons/copy
$ cockroach sql --insecure --database=bank

Then, issue the following statement:

icon/buttons/copy
> SELECT COUNT(*) FROM accounts;
 count
-------
   100
(1 row)

Best practices

Use the run_transaction function

We strongly recommend using the sqlalchemy_cockroachdb.run_transaction() function as shown in the code samples on this page. This abstracts the details of transaction retries away from your application code. Transaction retries are more frequent in CockroachDB than in some other databases because we use optimistic concurrency control rather than locking. Because of this, a CockroachDB transaction may have to be tried more than once before it can commit. This is part of how we ensure that our transaction ordering guarantees meet the ANSI SERIALIZABLE isolation level.

In addition to the above, using run_transaction has the following benefits:

  • Because it must be passed a sqlalchemy.orm.session.sessionmaker object (not a session), it ensures that a new session is created exclusively for use by the callback, which protects you from accidentally reusing objects via any sessions created outside the transaction.
  • It abstracts away the client-side transaction retry logic from your application, which keeps your application code portable across different databases. For example, the sample code given on this page works identically when run against Postgres (modulo changes to the prefix and port number in the connection string).

For more information about how transactions (and retries) work, see Transactions.

Avoid mutations of session and/or transaction state inside run_transaction()

In general, this is in line with the recommendations of the SQLAlchemy FAQs, which state (with emphasis added by the original author) that

As a general rule, the application should manage the lifecycle of the session externally to functions that deal with specific data. This is a fundamental separation of concerns which keeps data-specific operations agnostic of the context in which they access and manipulate that data.

and

Keep the lifecycle of the session (and usually the transaction) separate and external.

In keeping with the above recommendations from the official docs, we strongly recommend avoiding any explicit mutations of the transaction state inside the callback passed to run_transaction, since that will lead to breakage. Specifically, do not make calls to the following functions from inside run_transaction:

  • sqlalchemy.orm.Session.commit() (or other variants of commit()): This is not necessary because cockroachdb.sqlalchemy.run_transaction handles the savepoint/commit logic for you.
  • sqlalchemy.orm.Session.rollback() (or other variants of rollback()): This is not necessary because cockroachdb.sqlalchemy.run_transaction handles the commit/rollback logic for you.
  • Session.flush(): This will not work as expected with CockroachDB because CockroachDB does not support nested transactions, which are necessary for Session.flush() to work properly. If the call to Session.flush() encounters an error and aborts, it will try to rollback. This will not be allowed by the currently-executing CockroachDB transaction created by run_transaction(), and will result in an error message like the following: sqlalchemy.orm.exc.DetachedInstanceError: Instance <FooModel at 0x12345678> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3).

Break up large transactions into smaller units of work

If you see an error message like transaction is too large to complete; try splitting into pieces, you are trying to commit too much data in a single transaction. As described in our Cluster Settings docs, the size limit for transactions is defined by the kv.transaction.max_intents_bytes setting, which defaults to 256 KiB. Although this setting can be changed by an admin, we strongly recommend against it in most cases.

Instead, we recommend breaking your transaction into smaller units of work (or "chunks"). A pattern that works for inserting large numbers of objects using run_transaction to handle retries automatically for you is shown below.

icon/buttons/copy
from sqlalchemy import create_engine, Column, Float, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from cockroachdb.sqlalchemy import run_transaction
from random import random

Base = declarative_base()

# The code below assumes you are running as 'root' and have run
# the following SQL statements against an insecure cluster.

# CREATE DATABASE pointstore;

# USE pointstore;

# CREATE TABLE points (
#     id INT PRIMARY KEY DEFAULT unique_rowid(),
#     x FLOAT NOT NULL,
#     y FLOAT NOT NULL,
#     z FLOAT NOT NULL
# );

engine = create_engine(
    'cockroachdb://root@localhost:26257/pointstore',
    connect_args={
        'sslmode': 'disable',
    },
    echo=True
)


class Point(Base):
    __tablename__ = 'points'
    id = Column(Integer, primary_key=True)
    x = Column(Float)
    y = Column(Float)
    z = Column(Float)


def add_points(num_points):
    chunk_size = 1000        # Tune this based on object sizes.

    def add_points_helper(sess, chunk, num_points):
        points = []
        for i in range(chunk, min(chunk + chunk_size, num_points)):
            points.append(
                Point(x=random()*1024, y=random()*1024, z=random()*1024)
            )
        sess.bulk_save_objects(points)

    for chunk in range(0, num_points, chunk_size):
        run_transaction(
            sessionmaker(bind=engine),
            lambda s: add_points_helper(
                s, chunk, min(chunk + chunk_size, num_points)
            )
        )


add_points(10000)

Use IMPORT to read in large data sets

If you are trying to get a large data set into CockroachDB all at once (a bulk import), avoid writing client-side code that uses an ORM and use the IMPORT statement instead. It is much faster and more efficient than making a series of INSERTs and UPDATEs such as are generated by calls to session.bulk_save_objects().

For more information about importing data from Postgres, see Migrate from Postgres.

For more information about importing data from MySQL, see Migrate from MySQL.

Prefer the query builder

In general, we recommend using the query-builder APIs of SQLAlchemy (e.g., Engine.execute()) in your application over the Session/ORM APIs if at all possible. That way, you know exactly what SQL is being generated and sent to CockroachDB, which has the following benefits:

  • It's easier to debug your SQL queries and make sure they are working as expected.
  • You can more easily tune SQL query performance by issuing different statements, creating and/or using different indexes, etc. For more information, see SQL Performance Best Practices.

See also

You might also be interested in the following pages:


Yes No
On this page

Yes No