The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.

yagagagaga/shc

Apache Spark - Apache HBase Connector

Forked from hortonworks-spark/shc.

Change Log

This version is based on the Hortonworks SHC. With it, you can write a DataFrame to HBase using bulk load. For example:

df.write
  .format("hbase")
  .option(HBaseTableCatalog.tableName, "test_table")
  .option(HBaseTableCatalog.rowKey, "rk")
  .option(HBaseTableCatalog.cf, "f")
  .option(HBaseRelation.WRITE_MODE, HBaseRelation.Restrictive.BULKLOAD)
  .option(HBaseRelation.HFILE_TEMP_PATH, "hdfs:///tmp/hfile")
  .save()

// Structured Streaming is also supported
df.writeStream
  .format("hbase")
  .option("checkpointLocation", "hdfs:///tmp/structured-streaming-checkpoint/")
  .option(HBaseTableCatalog.tableName, "test_table")
  .option(HBaseTableCatalog.rowKey, "rk")
  .option(HBaseTableCatalog.cf, "f")
  .option(HBaseRelation.WRITE_MODE, HBaseRelation.Restrictive.BULKLOAD)
  .option(HBaseRelation.HFILE_TEMP_PATH, "hdfs:///zmk/hfile")
  .outputMode(OutputMode.Append())
  .trigger(Trigger.ProcessingTime(Seconds(10).milliseconds))
  .start()
  .awaitTermination()
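
The batch example above assumes an existing `df` that contains the row key column `rk`. As a minimal sketch, such a DataFrame could be built locally as shown below. The `SampleRow` type, the `value` column, the object name, and the sample data are illustrative assumptions and are not part of this repository; how the connector maps non-key columns to the column family `f` is not covered here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type: "rk" matches the row key option used above;
// "value" is an illustrative payload column.
final case class SampleRow(rk: String, value: String)

object SampleFrame {
  // Builds a two-row DataFrame with columns "rk" and "value".
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(SampleRow("row-001", "a"), SampleRow("row-002", "b")).toDF()
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; a real job would use the cluster's settings.
    val spark = SparkSession.builder()
      .appName("shc-bulkload-sample")
      .master("local[*]")
      .getOrCreate()

    val df = build(spark)
    df.printSchema()

    spark.stop()
  }
}
```

The resulting `df` can then be passed to the `df.write` chain shown earlier, provided the connector and an HBase cluster are available.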

More information can be found here.
