724 views
 owned this note
--- tags: CockroachDB, benchmark, SQLAlchemy, python, pgbench --- # **BEFORE YOU READ** "Because CockroachDB uses optimistic concurrency control, there is no locking (like in Postgres). Because your SELECT queries access the same rows that are being inserted, conflicts are unavoidable."([source](https://forum.cockroachlabs.com/t/insert-failed-on-transaction/831/7)). This means that without [CockroachDB client side intervention](https://www.cockroachlabs.com/docs/stable/transactions.html#client-side-intervention) you are not able to have more than one concurrent connection to the database. Cockroach Labs have been asked the question of how to perform comparable testing in 26.03.2018, but the GitHub thread was closed in 11.04.2019 with the only response being "tl;dr use [workload](https://www.cockroachlabs.com/docs/stable/cockroach-workload.html)" ([source](https://github.com/cockroachdb/cockroach/issues/24217#issuecomment-376386052)), which cannot be ran without an enterprise licence. ([explanation](https://codimd.web.cern.ch/s/BkSShRAGr#In-a-secure-cluster)) # Testing CockroachDB This document is a report of testing the performance of the cluster from [Implementing CockroachDB on Kubernetes](https://codimd.web.cern.ch/s/HybblSufH). The tests include [python](https://www.python.org/) using the [SQLAlchemy](https://www.sqlalchemy.org/) toolkit for database access, [pgbench](https://www.postgresql.org/docs/10/pgbench.html) which is [PostgreSQL](https://www.postgresql.org/)'s own testing application, and the CockroachDB's in-built workload generator. [TOC] ## SQLAlchemy The first approach was to manually code testing scripts to benchmark the overall performance of CockroachDB, mainly targetting the insertion rate. ### Deciding on insertion method To decide on the best way of inserting data, some experimenting was done. SQLAlchemy's docs include a [testing section](https://docs.sqlalchemy.org/en/13/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow), where they tried five different ways of inserting data to the database and showed results of the insertion. This was crucial information as it might've had an effect on how useful CockroachDB could be found. To get the best possible result, four different insertionmethods were tested. To gain reliable testdata, all methods were given a task (or _n_) to insert 100000 accounts in to the database. All tests were ran 10 times with three instances running in parallel resulting in 30 runs per method. The [original](https://docs.sqlalchemy.org/en/13/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow) methods are written in python2, but CockroachDB runs in python3 so some modification was needed to make the code run. Also most of the original code used ``` session.flush() ``` every 1000 accounts, but due to CockroachDB timing out, that part was changed to ``` session.commit() ``` in the testscripts. ### SQLAlchemy ORM method ```python= for i in range(n): account = Accounts() account.id = 'a' + str(i) account.balance = (3 + i) session.add(account) if i % 1000 == 0 session.commit() session.commit() ``` ### SQLAlchemy primary key given method ```python= for i in range(n): account = Accounts(id='a' + str(i), balance=3 + i) session.add(account) if i % 1000 == 0: session.commit() session.commit() ``` ### SQLAlchemy bulk insertion method ```python= for chunk in range(0, n, 10000): session.bulk_save_objects( [ Accounts(id="NAME " + str(i), balance=3 + i) for i in range(chunk, min(chunk + 10000, n)) ] ) session.commit() ``` ### SQLAlchemy core insertion method The main idea was to insert all the lines to add once, which got timed out by CockroachDB. After 10 tries, the method was split into chunks of 10000 accounts. The method might've been faster if insertions were done in larger chunks, but the speedup might've not been enough to pass the fastest one so it wasn't tried out. Original core method: ```python= engine.execute( Account.__table__.insert(), [{"name": 'NAME ' + str(i)} for i in range(n)] ) ``` Script used in tests: ```python= for k in range(10): engine.execute( Accounts.__table__.insert(), [{"id": 'NAME ' + str(i) + str(k), "balance": 3 + 1} for i in range(int(n/10))] ) ``` ## Pgbench CockroachDB doesn't naturally support pgbench, but pgbench can be used after manually initializing the tables used by pgtest [(source)](https://forum.cockroachlabs.com/t/i-use-postgres-pgbench-to-test-cockroachdb-bug-error/1052/3). ### Enabling pgtest To use pgbench in testing of CockroachDB, the following [SQLAlchemy](https://www.sqlalchemy.org/) script was used to initialize and populate the database: ```python= from sqlalchemy import create_engine, String, Integer, Column, DateTime, PrimaryKeyConstraint from sqlalchemy.ext.automap import automap_base from sqlalchemy.orm import sessionmaker from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() # mapping the database and tables created by "pgbench -i" # all classes have primary keys which are not included in "-i" due to SQLAlchemy demanding a PK # normally Postgres would assign primary keys. class Accounts(Base): __tablename__ = 'pgbench_accounts' aid = Column(Integer, nullable= False, primary_key = True, index=True) bid = Column(Integer) abalance = Column(Integer) filler = Column(String(84)) class Branches(Base): __tablename__ = 'pgbench_branches' bid = Column(Integer, nullable= False, primary_key = True, index=True) bbalance = Column(Integer) filler = Column(String(88)) class History(Base): __tablename__ = 'pgbench_history' tid = Column(Integer) bid = Column(Integer) aid = Column(Integer) delta = Column(Integer) mtime = Column(DateTime(timezone= False)) filler = Column(String(22)) __table_args__ = ( PrimaryKeyConstraint('tid', 'bid', 'aid', 'mtime'), {}, ) class Teller(Base): __tablename__ = 'pgbench_tellers' tid = Column(Integer, nullable= False, primary_key = True, index=True) bid = Column(Integer) tbalance = Column(Integer) filler = Column(String(84)) # connection args to connect into a secure cockroach database connect_args = { 'sslmode': 'require', 'sslrootcert': 'cockroach-certs/ca.crt', 'sslkey': 'cockroach-certs/client.root.key', 'sslcert': 'cockroach-certs/client.root.crt' } # engine for database connection engine = create_engine( 'cockroachdb://haha:haha123@188.185.117.55:30333/test', connect_args=connect_args, echo=True ) # create the mapped database to given location Base.metadata.create_all(engine) # automap_base-object to transfer information on databases current state base = automap_base() base.prepare(engine, reflect=True) # open a new session for testclasses to use ses = sessionmaker(bind=engine, autoflush=True) session = ses() # vacuuming the database after a run (deleting all data from tables) pg_accounts = base.classes.pgbench_accounts session.query(pg_accounts).delete() pg_branches = base.classes.pgbench_branches session.query(pg_branches).delete() pg_history = base.classes.pgbench_history session.query(pg_history).delete() pg_tellers = base.classes.pgbench_tellers session.query(pg_tellers).delete() # insert the same data into the database as "pgbench -i" # 1 branch # 10 tellers # 100000 accounts branch = pg_branches(bid = 1, bbalance = 0) session.add(branch) session.commit() a = 1 # aid for accounts for j in range(10): teller = pg_tellers(tid = j+1, bid = 1, tbalance = 0) session.add(teller) session.commit() for i in range(10000): account = pg_accounts(aid = a, bid = 1, abalance = 0) session.add(account) a += 1 session.commit() ``` To understand connecting to the database better, please read [connecting with an application](https://codimd.web.cern.ch/s/HybblSufH#Connecting-to-the-database-with-an-application) from the implement documentation. The original tables created by pgbench initialization function: ``` postgres=# \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+------------------+-------+----------+---------+------------- public | pgbench_accounts | table | postgres | 13 MB | public | pgbench_branches | table | postgres | 40 kB | public | pgbench_history | table | postgres | 5120 kB | public | pgbench_tellers | table | postgres | 48 kB | (4 rows) postgres=# \d+ pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers | Storage | Stats target | Description ----------+---------------+-----------+----------+--------------+------------- aid | integer | not null | plain | | bid | integer | | plain | | abalance | integer | | plain | | filler | character(84) | | extended | | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) Has OIDs: no Options: fillfactor=100 postgres=# \d+ pgbench_branches Table "public.pgbench_branches" Column | Type | Modifiers | Storage | Stats target | Description ----------+---------------+-----------+----------+--------------+------------- bid | integer | not null | plain | | bbalance | integer | | plain | | filler | character(88) | | extended | | Indexes: "pgbench_branches_pkey" PRIMARY KEY, btree (bid) Has OIDs: no Options: fillfactor=100 postgres=# \d+ pgbench_history Table "public.pgbench_history" Column | Type | Modifiers | Storage | Stats target | Description --------+-----------------------------+-----------+----------+--------------+------------- tid | integer | | plain | | bid | integer | | plain | | aid | integer | | plain | | delta | integer | | plain | | mtime | timestamp without time zone | | plain | | filler | character(22) | | extended | | Has OIDs: no postgres=# \d+ pgbench_tellers Table "public.pgbench_tellers" Column | Type | Modifiers | Storage | Stats target | Description ----------+---------------+-----------+----------+--------------+------------- tid | integer | not null | plain | | bid | integer | | plain | | tbalance | integer | | plain | | filler | character(84) | | extended | | Indexes: "pgbench_tellers_pkey" PRIMARY KEY, btree (tid) Has OIDs: no Options: fillfactor=100 ``` ### Pgbench results Running pgbench against a fresh empty postgresql database: ``` # pgbench -c 1 -t 100000 number of clients: 1 number of threads: 1 number of transactions per client: 100000 number of transactions actually processed: 100000/100000 tps = 744.244528 (including connections establishing) tps = 744.264729 (excluding connections establishing) ``` Running pgbench against the [tutorial](https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes.html) CockroachDB cluster set up on Kubernetes with tables set up via SQLAlchemy: ``` # pgbench -h 188.185.117.55 -p 30333 -U haha -c 1 -t 1000 -n test number of clients: 1 number of threads: 1 number of transactions per client: 1000 number of transactions actually processed: 1000/1000 tps = 26.229644 (including connections establishing) tps = 26.308052 (excluding connections establishing) ``` It can be seen that in this test performed on pgbench CockroachDB was ~30 times slower than the traditional postgres. Also because CockroachDB performs concurrency control "optimistically" ([source](https://forum.cockroachlabs.com/t/insert-failed-on-transaction/831/7)), it does not support pgbench connecting from multiple clients, so this was the only test that could be performed. ## CockroachDB workload The CockroachDB client offers the inbuilt [workload](https://www.cockroachlabs.com/docs/stable/cockroach-workload.html) which should be able to be used to run test workloads against the database. This should include tests that "Tests a batch of queries very similar to those that that PGBench runs" ([source](https://github.com/cockroachdb/cockroach/blob/master/pkg/bench/pgbench_test.go)). ### In a secure cluster After following the [tutorial](https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes.html) from CockroachDB, the cluster created is secure. Trying to initialize the workload provides: ``` # kubectl exec -it cockroachdb-client-secure -- ./cockroach workload init bank # \ 'postgres://haha:haha123@188.185.117.55:30192' Error: pq: only superusers are allowed to CREATE DATABASE command terminated with exit code 1 ``` This means that you need to initialize the test database manually, because you are not able to [grant admin role](https://www.cockroachlabs.com/docs/stable/grant-roles.html#grant-the-admin-option) to another user due to it being an [enterprise feature](https://www.cockroachlabs.com/docs/stable/enterprise-licensing.html). Creating the Bank-table inside the CockroachDB client from [Bank workload](https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/bank/bank.go): ```sql CREATE TABLE bank( id INT PRIMARY KEY, balance INT, payload STRING, FAMILY (id, balance, payload) ); ``` After creating the database manually, the workload calls `SHOW CLUSTER SETTINGS`, which can only be called by an admin user ([source](https://www.cockroachlabs.com/docs/stable/cluster-settings.html)). ``` # kubectl exec -it cockroachdb-client-secure -- ./cockroach workload run bank # \ 'postgres://haha:haha123@188.185.117.55:30333' Error: pq: only superusers are allowed to SHOW CLUSTER SETTINGS command terminated with exit code 1 ``` Root cannot be used while connecting with a database connection string: ``` # kubectl exec -it cockroachdb-culient-secure -- ./cockroach workload init bank # \ 'postgres://root@188.185.117.55:30333' Error: pq: user root must use certificate authentication instead of password authentication command terminated with exit code 1 ``` and with the proper certificates attached: ``` #kubectl exec -it cockroachdb-client-secure -- ./cockroach workload init bank # \'postgres://root@188.185.117.55:30333?sslmode=require&sslrootcert=cockroach-certs/ca.crt&sslkey=cockroach-certs/client.root.key&sslcert=cockroach-certs/client.root.crt' Error: pq: user root must use certificate authentication instead of password authentication command terminated with exit code 1 ``` Also GSSAPI Authentication is an enterprise feature in CockroachDB, which means that certificates cannot be passed as parameters to the workload generator ([source](https://www.cockroachlabs.com/docs/stable/gssapi_authentication.html)). This means that running workload on a secure cluster cannot be done without an enterprise licence. ### Insecure cluster The other approach is to create an insecure cluster following [this tutorial](https://www.cockroachlabs.com/docs/stable/orchestrate-a-local-cluster-with-kubernetes-insecure.html#step-3-use-the-built-in-sql-client), and then using root as the user with privileges to `CREATE DATABASE`. This was not tested due to insecure never being the use-case in production. ## Conclusion CockroachDB cannot be tested with any of the presented methods comparatively. Properly benchmarking the framework with the inbuilt workload generator requires an enterprise licence, which means that the open source version cannot be benchmarked with comparable results.