Kyle Conroy Gray

Per-Test Database Isolation in Postgres

Over the years, I've tried many different strategies for isolating individual tests at the database layer.

One shared database, but all data is sharded by an account: Works in the short term, but eventually you'll have global tables that aren't scoped to an account.

Creating a new database for each test: Can be slow and error prone. Difficult to ensure all connections have been closed before dropping the database. Some CI providers only provide one logical test database.

Use an in-memory SQLite database: Only works if your application avoids Postgres-specific features. Even then you may still run into runtime differences between database engines.

Truncating the tables between tests: Test parallelization is tricky, if not impossible. No command to truncate all tables, drop all indexes, etc.

Wrap each test in a transaction: Works fine as long as none of your code uses transactions or parallel queries. While Postgres can't nest transactions, some SQL layers simulate support using savepoints.

Give each test a single connection, scoped to temporary tables: Supports individual transactions, but not two open transactions at once. Temporary tables can't be shared across connections.

After trying all these different solutions, I think I've found a winning strategy. Postgres schemas and the search path allow you to isolate tests from each other while using the same logical database. You can think of a schema as a namespace for tables. From the docs:

A database contains one or more named schemas, which in turn contain tables. […] The same object name can be used in different schemas without conflict; for example, both schema1 and myschema can contain tables named mytable.

There are several reasons why one might want to use schemas: […] To allow many users to use one database without interfering with each other

Establish a connection to your database and create a new schema. Here I'm using foo as a placeholder name. In the real world, you'd want to randomly generate a name for each individual test.

CREATE SCHEMA foo

Create a new connection pool to the database and set the search path to only include the new schema.

postgres://localhost/chalkbag?sslmode=disable&search_path=foo

Now you can create as many tables, functions, enums, and indexes as you want, in complete isolation. Once the test is finished, drop the schema. This step is optional if you're using a CI provider that destroys the database at the end of a test run.

DROP SCHEMA foo CASCADE

This strategy only works if you aren't using schemas for other purposes in your application. Here's an example in Go of the above steps.

package main

import (
        "database/sql"

        _ "github.com/lib/pq"
)

func main() {
        dburl := "postgres://localhost/db?sslmode=disable"
        db, err := sql.Open("postgres", dburl)
        if err != nil {
                panic(err)
        }
        if err := provision(db); err != nil {
                panic(err)
        }
}

func provision(db *sql.DB) error {
        // For each test, pick a new schema name at random.
        // `foo` is used here only as an example
        if _, err := db.Exec("CREATE SCHEMA foo"); err != nil {
                return err
        }

        defer func() {
                db.Exec("DROP SCHEMA foo CASCADE")
        }()

        surl := "postgres://localhost/db?sslmode=disable&search_path=foo"
        sdb, err := sql.Open("postgres", surl)
        if err != nil {
                return err
        }

        // Run tests
        _, err = sdb.Exec("SELECT 1")
        return err
}

Using schemas avoids race conditions, makes clean up easy, and requires only a single logical database.