cjq

cjq is an implementation of the jq programming language. jq is a powerful JSON processing tool similar in flavor to sed, awk, and grep. The standard implementation of jq is a bytecode interpreter. cjq borrows many of jq's internals but implements a bytecode-to-LLVM compiler rather than a bytecode interpreter.

cjq does not currently lower to LLVM IR the way most traditional compilers do. Instead, it first runs the standard jq interpreter on a jq program (or filter) and zero or more JSON documents. As the VM executes jq bytecode instructions, the dynamic sequence of bytecode opcodes is recorded (traced) and lowered to LLVM IR while the VM is still running. The generated LLVM IR can then be lowered to native machine code using clang. The resulting binary executable processes JSON input(s) from the command line much like the standard jq implementation; the only difference is that the binary already contains the traced information, so the jq filter does not need to be provided.

Example:

  1. Build the LLVM IR generator. NOTE: This step only needs to be performed once.
         make llvmgen
  2. Trace and lower to LLVM IR.
         ./llvmgen -Cf /path/to/jq/file /path/to/json/file
  3. Compile to native machine code.
         make run    # or 'make run_opt' for O3 clang optimizations
                     # optionally add the -j flag to parallelize compilation (e.g. -j4)
  4. Run the binary executable on compatible JSON documents.
         ./run -Cf /path/to/json/file
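
The tracing step above can be pictured with a toy example. The following Python sketch runs a tiny stack VM and records every opcode it executes. Everything in it is hypothetical: the opcode names, the `COUNTDOWN` program, and the `run_and_trace` function are invented for illustration and are not jq's bytecode or cjq's code. The point is only that a short static program with a loop unrolls into a much longer dynamic opcode sequence.

```python
def run_and_trace(program, value):
    """Run a toy bytecode program on `value` and return (result, trace).

    `trace` is the dynamic sequence of opcodes in execution order.
    """
    stack = [value]
    trace = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        trace.append(op)  # record the opcode as it executes
        if op == "PUSH":
            stack.append(arg)
            pc += 1
        elif op == "SUB":
            b = stack.pop()
            a = stack.pop()
            stack.append(a - b)
            pc += 1
        elif op == "DUP":
            stack.append(stack[-1])
            pc += 1
        elif op == "JUMP_IF_NONZERO":
            # Pop the top of the stack and loop back if it is nonzero.
            pc = arg if stack.pop() != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1], trace


# Hypothetical program: count the input down to zero.
# Assumes the input is a positive integer.
COUNTDOWN = [
    ("PUSH", 1),
    ("SUB", None),
    ("DUP", None),
    ("JUMP_IF_NONZERO", 0),
]

result, trace = run_and_trace(COUNTDOWN, 3)
print(result)      # 0
print(len(trace))  # 12: four static instructions, executed three times
```

In this toy, the trace length grows with the input value even though the program itself stays at four instructions, which is the same effect that makes trace size a concern for cjq.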

cjq is currently at version 0.1... and it shows. Tracing a dynamic sequence of opcodes becomes a bottleneck as the number of dynamic opcodes grows. To mitigate this, cjq uses a variant of run-length encoding to compress the dynamic opcode sequence and slow the growth of the generated LLVM IR. However, compression can only reduce the code size so much. Consequently, cjq is not the ideal tool for executing simple jq filters that incur many dynamic opcodes.
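
As a rough illustration of the compression idea, the following Python sketch applies plain run-length encoding to an opcode trace. This is textbook RLE, not cjq's actual variant, whose details are not covered here. The opcode names and the functions `rle_encode` and `rle_decode` are made up for the example.

```python
from itertools import groupby


def rle_encode(opcodes):
    """Collapse each run of identical opcodes into an (opcode, count) pair."""
    return [(op, len(list(run))) for op, run in groupby(opcodes)]


def rle_decode(pairs):
    """Expand (opcode, count) pairs back into the original opcode sequence."""
    return [op for op, count in pairs for _ in range(count)]


# Hypothetical traces; the opcode names are placeholders, not jq opcodes.
runs = ["LOAD"] * 5 + ["STORE"] * 3   # long runs: compresses well
alternating = ["LOAD", "STORE"] * 4   # no adjacent repeats: no gain

print(rle_encode(runs))               # [('LOAD', 5), ('STORE', 3)]
print(len(rle_encode(alternating)))   # 8
```

The second trace shows why some sequences resist compression: plain RLE only helps when the same opcode repeats back to back, so an alternating pattern of the same length produces one pair per opcode.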

Example:

echo 1000000 | ./llvmgen -Cf path/to/jq_benchmarks/add.jq

where add.jq contains:

[range(.) | [.]] | add

The above jq filter incurs over 1,000,000 dynamic opcodes and produces slightly more lines of LLVM IR than that. As a result, tracing takes a long time and compiling takes even longer. Note that this filter happens to produce a dynamic opcode sequence that is less amenable to compression than others (i.e. opcode sequences of the same scale exist that compress more efficiently).
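
For readers less familiar with jq, here is a Python analogue of what the filter `[range(.) | [.]] | add` computes. It models only the filter's result, not jq's bytecode or how cjq traces it, and the function name `add_filter` is hypothetical. It makes visible that the work is one step per integer below the input, which is why the number of executed opcodes scales with the input value.

```python
def add_filter(n):
    """Python analogue of the jq filter `[range(.) | [.]] | add` for integer input n."""
    # [range(.) | [.]] : wrap each integer 0..n-1 in its own one-element array
    singletons = [[i] for i in range(n)]
    # jq's `add` yields null for an empty array
    if not singletons:
        return None
    # add : fold with +, and + on two arrays concatenates them
    result = []
    for item in singletons:
        result = result + item
    return result


print(add_filter(4))  # [0, 1, 2, 3]
```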

cjq is a good choice for compiling long, verbose (for jq) programs to an optimized binary executable, so that you can process compatible JSON data without entering a long jq filter every time.

Example (borrowed from this tutorial):

./llvmgen '[.docs[] | { title, author_name: .author_name[0], publish_year: .publish_year[0] } | select(.publish_year != null and .author_name != null)] | group_by(.author_name) | .[] | {author_name: .[0].author_name}' path/to/openlibrary_example1.json
make run_opt -j4   # compile with optimizations
./run_opt /path/to/openlibrary_example1.json    # no more long filter required!

In case you're interested, cjq performs quite well compared to the standard jq implementation.

The obvious caveat is the compile-time overhead introduced by cjq. For very large opcode sequences that are not good candidates for compression, compilation can take hours... consider yourself warned ⚠️

Installation

Dependencies

Instructions

NOTE: There is currently an installation bug. A Docker image is in the works, and better installation instructions will follow.

# Clone the cjq repo, then build the LLVM IR generator
make llvmgen
./llvmgen   # a help menu should appear
# Run the basic tests to check that your installation works.
# NOTE: jq must be installed to run these tests.
./cjq/tests/basic_ops/test_basic_ops.sh
# A testresults.log file can be found in cjq/tests/basic_ops.
# Any failing test indicates an installation issue.
# If something fails, you might need to install oniguruma locally:
cd cjq/jq/oniguruma
# Read the oniguruma README for local installation steps, then try installing it.
# Please also report any chmod-type issue, such as an error saying access to a directory was denied.