Steps to use

Prerequisites

Hadoop 3.1 or later cluster.
Apache Hive.
Between 15 minutes and 2 days to generate data (depending on the Scale Factor you choose and available hardware).
The following commands must be available. If any are missing, install the packages that provide them using apt-get or a similar package manager.
```
bc, date, gcc, nohup, python3, timeout, zip
```
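
As a quick check before building, the following sketch reports which of the commands listed above are not on your PATH. It is not part of the repository; the function name `missing_tools` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of hive-benchmark): print each command
# from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the command is not found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the prerequisites listed above.
missing=$(missing_tools bc date gcc nohup python3 timeout zip)
if [ -n "$missing" ]; then
    echo "Missing commands:"
    echo "$missing"
else
    echo "All prerequisite commands found."
fi
```

Any name it prints still has to be installed through your distribution's package manager; the package name may differ from the command name.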

Clone

git clone https://github.com/kcheeeung/hive-benchmark.git
cd hive-benchmark/

Individual Steps

1. Build the benchmark

Build the benchmark you want to use (complete all the prerequisites first).

TPC-DS

./tpcds-build.sh

TPC-H

./tpch-build.sh

2. Generate the tables

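Decide how much data you want. SCALE is approximately the size of the generated data in GB. Supported values for FORMAT are orc and parquet. The commands below pass SCALE as the first argument and FORMAT as the second (here, 10 and orc).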

TPC-DS

nohup sh util_tablegentpcds.sh 10 orc

TPC-H

nohup sh util_tablegentpch.sh 10 orc
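
Because a mistyped argument can waste a long generation run, it may help to validate the two arguments before launching. The wrapper below is only a sketch: `check_gen_args` is a hypothetical name, and the requirement that SCALE be a positive integer is an assumption, not something the repository states.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of hive-benchmark).
# Returns 0 when SCALE is a positive integer and FORMAT is orc or parquet.
check_gen_args() {
    scale=$1
    format=$2
    case $scale in
        # Reject empty strings and anything containing a non-digit
        ''|*[!0-9]*) echo "SCALE must be a positive integer" >&2; return 1 ;;
    esac
    [ "$scale" -gt 0 ] || { echo "SCALE must be greater than 0" >&2; return 1; }
    case $format in
        orc|parquet) return 0 ;;
        *) echo "FORMAT must be orc or parquet" >&2; return 1 ;;
    esac
}

# Example: only start the TPC-DS table generation if the arguments look sane.
# check_gen_args 10 orc && nohup sh util_tablegentpcds.sh 10 orc
```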

3. Run all the queries

SCALE must match the scale of an existing database (the one you generated in step 2).
Modify your settings in settings.sql.
By default, each query times out after 2 hours. To change this, edit TIME_TO_TIMEOUT=120m in util_internalRunQuery.sh.
Run the queries.

TPC-DS Benchmark

nohup sh util_runtpcds.sh 10 orc

TPC-H Benchmark

nohup sh util_runtpch.sh 10 orc
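
To illustrate what a per-query limit such as 120m means in practice, here is a minimal sketch using the timeout command from the prerequisites. It does not reproduce util_internalRunQuery.sh; `run_with_limit` is a hypothetical name, and the exit status 124 is the one GNU coreutils timeout returns when the limit is reached.

```shell
#!/bin/sh
# Hypothetical illustration (not the repository's code): run a command under
# a time limit and report whether it finished or was cut off.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit was reached
        echo "timed out after $limit"
    else
        echo "finished with exit status $status"
    fi
    return "$status"
}

# A 1-second limit on a 3-second command: the command is cut off.
run_with_limit 1s sleep 3 || true
# A generous limit: the command completes normally.
run_with_limit 120m true
```

A query cut off this way leaves no complete timing, so a limit that is too low for a large SCALE can look like a failed query.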

Troubleshooting

Advanced Usage

See README_advanced.md for advanced usage and the recommended setup.

Did my X step finish?

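Check the aaa_clock.txt file. To see whether the scripts are still running, list the related processes (replace sshuser with the user that started them):

ps -ef | grep sshuser
ps -ef | grep '\.sh'
ps -ef | grep beeline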
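
If you start a step yourself, an alternative to searching the process list is to record the process ID at launch and test it later. This is a sketch under that assumption; `is_alive` and the `run.pid` file name are hypothetical and are not used by the repository's scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of hive-benchmark): succeed if the process
# with the given PID still exists and can be signalled by the current user.
is_alive() {
    # Signal 0 performs the existence/permission check without sending a signal
    kill -0 "$1" 2>/dev/null
}

# Stand-in for a long-running step; in practice this would be something like
#   nohup sh util_runtpcds.sh 10 orc &
sleep 2 &
echo $! > run.pid

if is_alive "$(cat run.pid)"; then
    echo "step is still running"
else
    echo "step has finished"
fi
```

A saved PID can in principle be reused by an unrelated process after a long time, so treat this as a convenience check rather than proof.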

How to debug

Debug is enabled by default.

DEBUG_SCRIPT=true
