Skip to content

rbrandao/heelflip

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Build Status license Maven Central codecov

Heelflip is an JSON aggregator library for Java. It's well-known that aggregation processes are useful but it's very expensive to calculate. Instead of calculate aggregations over JSON files into a relational database or even NoSQL database, we provider a library that does this task for us. Heelflip works in-memory or using Redis to aggregate values.

Maven

Heelflip is available at the Central Maven Repository:

<dependency>
    <groupId>com.github.greatjapa</groupId>
    <artifactId>heelflip</artifactId>
    <version>1.3</version>
</dependency>

How to use

Considering the following bookstore JSON sample:

{"name":"The Odyssey",  "author":"Homer",          "genre":"poem",  "inStock":true, "price":12.50}
{"name":"The Godfather","author":"Mario Puzo",     "genre":"novel", "inStock":true, "price":6.49 }
{"name":"Moby-Dick",    "author":"Herman Melville","genre":"novel", "inStock":false,"price":3.07 }
{"name":"Emma",         "author":"Austen",         "genre":"novel", "inStock":true, "price":30.50}

We can read then as follows:

try(InputStream stream = new FileInputStream("bookstore.json")){
    JsonAgg jsonAgg = new JsonAgg();
    jsonAgg.loadNDJSON(zipStream);
    
    ...
}

Global Aggregations

Once you have a JsonAgg object we can get global aggregations (min, max and sum) doing as follows:

IFieldAgg priceAgg = jsonAgg.getFieldAgg("price");
popAgg.getMin(); // 3.07
popAgg.getMax(); // 30.50
popAgg.getSum(); // 52.56

Or counting their values (count and cardinality):

IFieldAgg genreAgg = jsonAgg.getFieldAgg("genre");
genreAgg.count();          // 4
genreAgg.cardinality();    // 2
genreAgg.count("novel");   // 3

Group By Aggregations

We also can get group by aggregations doing as follows:

IGroupByAgg groupByAgg = jsonAgg.getGroupBy("price", "inStock");
IFieldAgg priceBystockAgg = groupByAgg.groupBy("true");
priceBystockAgg.getMin(); // 6.49
priceBystockAgg.getMax(); // 30.50
priceBystockAgg.getSum(); // 49.49

or

IGroupByAgg groupByAgg = jsonAgg.getGroupBy("name", "inStock");
IFieldAgg namesInStockAgg = groupByAgg.groupBy("true");

namesInStockAgg.distinctValues(); 
//"The Odyssey"
//"The Godfather"
//"Emma"

Dump Report

You can also generate a report with all aggregations accumulated. You just need to do:

jsonAgg.dumpReport("report", true);

This code snippet will create a directory in the following structure:

report
│   __missingGroupBy.json
└───name
│   │   __name.json
│   │   name_groupBy_author.json
│   │   name_groupBy_genre.json
│   │   name_groupBy_inStock.json
│   │   name_groupBy_price.json
└───author
└───genre
└───inStock
└───price

Each JSON field has its own directory, for instance, name, author etc. This directory has the following items:

  1. __<field_name>.json file with field global aggregations;
  2. All combinations of group by aggregations separated in different files in this format <field_name>_groupBy_<field_name>.json.

Finally, the root directory contains a file __missingGroupBy.json with a list of missing group by combination.

How to scale?

Creating JsonAgg with default constructor means that you will aggregate JSON files in-memory. If you try to load several JSONs, the JVM may raise OutOfMemoryError. To avoid this kind of error, we provide an alternative to process aggregations in Redis instead of in-memory. See code below:

JedisPool pool = new JedisPool(new JedisPoolConfig(), "localhost");
try (Jedis jedis = pool.getResource()) {
    JsonAgg jsonAgg = new JsonAgg(jedis);
    ...
}

How aggregation works over non-flat JSON

There is no problem if you have JSONs with array or nested objects. What Heelflip does is calculate aggregations over fields and their values. But the Heelflip API is based on field names and, for that, we need to rename JSON fields when arrays or objects appears.

For instance, the following JSON entry:

{"name":"Steve", "age":30, "address":{"street":"8nd Street", "city":"New York"}}

will be read as:

{"name":"Steve", "age":30, "address.street":"8nd Street", "address.city":"New York"}}

To retrieve the information about the field "city" we need to concatenate the field names as showed below:

FieldAgg cityAgg = agg.getFieldAgg("address.city");
cityAgg.count();

Similarly, the following JSON entry with arrays:

{ "city" : "SPRINGFIELD", "loc" : [ -72.577769, 42.128848 ], "pop" : 22115}

will be read as:

{ "city" : "SPRINGFIELD", "loc_0" : -72.577769, "loc_1": 42.128848, "pop" : 22115}

NOTE: The examples above are used only to ilustrate how we rename field names for objects and arrays to generate unique names at aggregation time. It is important to understand that we do not flatten the JSON.

About

JSON aggregator in Java

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages