Pegasus-Spark

Pegasus-Spark is the Spark connector for Pegasus. It provides several toolkits for manipulating your Pegasus data:

  • pegasus-analyser: reads Pegasus snapshot data stored in a remote filesystem (HDFS, etc.). You can use it to:
    • Analyze your Pegasus snapshot offline; see example: count data.
    • Transform your Pegasus snapshot into Parquet files; see example: convert parquet.
    • Compare data stored in two different Pegasus clusters; see details: duplication verify.
  • pegasus-bulkloader: converts source data into Pegasus data and loads it into a Pegasus cluster, with support from Pegasus server 2.1; see example: load csv data.