ClickHouse is an open-source, column-oriented database management system capable of generating analytical data reports in real time using SQL queries.
This supplemental guide explains how the data generated for TSBS is stored, the additional flags available when using the data importer (tsbs_load_clickhouse), and the additional flags available for the query runner (tsbs_run_queries_clickhouse).
This should be read after the main README.
Data generated by tsbs_generate_data for ClickHouse is serialized in a "pseudo-CSV" format, along with a custom header at the beginning. The header is several lines long:
- one line composed of a comma-separated list of tag labels, with the literal string tags as the first value in the list
- one or more lines composed of a comma-separated list of field labels, with the table name as the first value in the list
- a blank line
An example for the cpu-only use case:
tags,hostname,region,datacenter,rack,os,arch,team,service,service_version,service_environment
cpu,usage_user,usage_system,usage_idle,usage_nice,usage_iowait,usage_irq,usage_softirq,usage_steal,usage_guest,usage_guest_nice
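As a rough illustration of the header layout described above, the following Python sketch splits header lines into tag labels and per-table field labels. The function name parse_header and the returned structure are assumptions made for this example; they are not part of TSBS.

```python
def parse_header(lines):
    """Split pseudo-CSV header lines into tag labels and per-table field labels.

    Reading stops at the first blank line, which terminates the header.
    """
    tags = []
    fields = {}
    for line in lines:
        line = line.strip()
        if not line:  # a blank line ends the header
            break
        name, *labels = line.split(",")
        if name == "tags":
            tags = labels  # the literal "tags" marks the tag-label line
        else:
            fields[name] = labels  # first value is the table name
    return tags, fields


# Header lines taken from the cpu-only example above.
header = [
    "tags,hostname,region,datacenter,rack,os,arch,team,service,service_version,service_environment",
    "cpu,usage_user,usage_system,usage_idle,usage_nice,usage_iowait,usage_irq,usage_softirq,usage_steal,usage_guest,usage_guest_nice",
    "",
]
tags, fields = parse_header(header)
print(tags)
print(fields["cpu"])
```

For the cpu-only header this yields ten tag labels and a single cpu entry with ten field labels.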
Following this, each reading is composed of two rows:
- a comma-separated list of tag values for the reading, with the literal string tags as the first value in the list
- a comma-separated list of field values for the reading, with the table the reading belongs to as the first value and the timestamp as the second value
An example for the cpu-only use case:
tags,host_0,eu-central-1,eu-central-1b,21,Ubuntu15.10,x86,SF,6,0,test
cpu,1451606400000000000,58.1317132304976170,2.6224297271376256,24.9969495069947882,61.5854484633778867,22.9481393231639395,63.6499207106198313,6.4098777048301052,44.8799140503027445,80.5028770761136201,38.2431182911542820
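The two-row layout of a reading can be sketched in Python as follows. This is only an illustration of the format, not the loader's actual code: the function name parse_reading is hypothetical, and treating the timestamp as an integer count of nanoseconds since the Unix epoch is an assumption based on the magnitude of the example value.

```python
def parse_reading(tag_row, field_row):
    """Parse the two rows of one reading.

    Returns (tag_values, table, timestamp, field_values).
    """
    marker, *tag_values = tag_row.strip().split(",")
    if marker != "tags":
        raise ValueError("the first row of a reading must start with 'tags'")
    # Second row: table name, then timestamp, then the field values.
    table, timestamp, *raw_values = field_row.strip().split(",")
    # Assumption: the timestamp is an integer (likely nanoseconds since the epoch).
    return tag_values, table, int(timestamp), [float(v) for v in raw_values]


# Rows taken from the cpu-only example above.
tag_row = "tags,host_0,eu-central-1,eu-central-1b,21,Ubuntu15.10,x86,SF,6,0,test"
field_row = (
    "cpu,1451606400000000000,58.1317132304976170,2.6224297271376256,"
    "24.9969495069947882,61.5854484633778867,22.9481393231639395,"
    "63.6499207106198313,6.4098777048301052,44.8799140503027445,"
    "80.5028770761136201,38.2431182911542820"
)
tag_values, table, timestamp, field_values = parse_reading(tag_row, field_row)
print(table, timestamp, tag_values[0], len(field_values))
```

Pairing the parsed values with the labels from the header (by position) gives one named record per reading.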
Hostname of the ClickHouse server.
User for connecting to the ClickHouse server. Yes, the default user really is called default.
Password for connecting to the ClickHouse server. The default password is empty.
Whether to consistently hash data across the multiple insert workers by the value of the primary (first) tag. For datasets with larger numbers of devices, this option helps improve data locality on disk which can lead to better query performance. For datasets with smaller numbers of devices, it is typically not necessary.
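To illustrate the idea, here is a minimal Python sketch that assigns readings to insert workers by hashing the primary tag value, so that every reading from the same device goes to the same worker. The choice of FNV-1a, the function names, and the worker count are assumptions for illustration; the hash that tsbs_load_clickhouse actually uses may differ.

```python
def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of a byte string."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


def worker_for(primary_tag: str, num_workers: int) -> int:
    """Map a primary tag value (e.g. a hostname) to a worker index."""
    return fnv1a_32(primary_tag.encode("utf-8")) % num_workers


# Hypothetical setup: four insert workers, hostnames as the primary tag.
num_workers = 4
for host in ["host_0", "host_1", "host_2", "host_0"]:
    print(host, "->", worker_for(host, num_workers))
```

A deterministic hash is used here because Python's built-in hash() for strings is randomized per process, which would break the "same device, same worker" property across runs.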
File to output periodic CPU and memory statistics. Useful for understanding system performance while writing data to the database.
Comma-separated list of hostnames for the ClickHouse servers. Workers are connected to the servers in a round-robin fashion.
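A round-robin assignment of workers to servers can be sketched as below. This is a simplified illustration, not the query runner's code; the function name assign_hosts and the hostnames ch-1 and ch-2 are hypothetical.

```python
def assign_hosts(hosts_value: str, num_workers: int):
    """Return the host each worker connects to, cycling through the host list."""
    hosts = [h.strip() for h in hosts_value.split(",") if h.strip()]
    if not hosts:
        raise ValueError("at least one hostname is required")
    # Worker i connects to host i modulo the number of hosts.
    return [hosts[i % len(hosts)] for i in range(num_workers)]


# Hypothetical example: two servers shared by five workers.
assignment = assign_hosts("ch-1,ch-2", 5)
print(assignment)
```

With more workers than servers, the workers wrap around the list, so the load is spread as evenly as the counts allow.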
User for connecting to the ClickHouse server. Yes, the default user really is called default.
Password for connecting to the ClickHouse server. The default password is empty.
Add ClickHouse repo
sudo bash -c "echo 'deb http://repo.yandex.ru/clickhouse/deb/stable/ main/' > /etc/apt/sources.list.d/clickhouse.list"
Add the key and update the package list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 # optional
sudo apt-get update
Install binaries
sudo apt-get install -y clickhouse-client clickhouse-server
More details on getting started with ClickHouse are available in the ClickHouse documentation.
Ensure ClickHouse is running
sudo service clickhouse-server restart
Install golang
sudo apt install golang-1.9
Add the Go binaries to PATH for convenience and set up the GOPATH environment variable
echo 'export PATH="$HOME/gocode/bin:/usr/lib/go-1.9/bin:$PATH"' >> ~/.bashrc
echo 'export GOPATH="$HOME/gocode"' >> ~/.bashrc
Apply PATH and GOPATH
source ~/.bashrc
Create initial Go folders
mkdir -p $GOPATH/{bin,src}
Get and build TSBS
go get github.com/timescale/tsbs
cd $GOPATH/src/github.com/timescale/tsbs/cmd
go get ./...
go install ./...
Run test
cd $GOPATH/src/github.com/timescale/tsbs/scripts
Generate test dataset. This may take some time.
FORMATS=clickhouse ./generate_data.sh
Generate the test query set. This should not take long.
FORMATS=clickhouse ./generate_queries.sh
Load data set
./load_clickhouse.sh
Run the test query set. In this example, both the number of concurrent workers and the number of test queries to run are restricted. If you have powerful hardware, feel free to raise these limits.
NUM_WORKERS=1 MAX_QUERIES=10 ./run_queries_clickhouse.sh
Enjoy the results in the /tmp/bulk_queries/result_queries_clickhouse* files.