- Generating events to an Azure Event Hub with a Kafka-enabled endpoint. The Apache Kafka connectors for Structured Streaming are packaged in the Databricks Runtime.
- Reading the generated events with the Kafka libraries. Sample notebook that reads data from an Event Hub with a Kafka-enabled endpoint and writes it to an Azure Data Lake Store, serialized as JSON and partitioned by ingest date.
- Reading the generated events with the Azure Event Hubs libraries.
- Best practice is to archive incoming events by enabling Event Hubs Capture on the Event Hub. Events are captured in Azure Blob Storage or Azure Data Lake Store in Avro format. This notebook demonstrates how to read the captured events.
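A minimal sketch of the consumer options needed to read a Kafka-enabled Event Hub with Structured Streaming; the namespace, Event Hub name, and connection string below are placeholders, not values from this repository. The Event Hubs Kafka endpoint listens on port 9093 and authenticates via SASL_SSL/PLAIN with `$ConnectionString` as the user name; on Databricks Runtime the bundled Kafka classes are shaded, hence the `kafkashaded.` prefix in the JAAS config.

```python
# Placeholder values -- substitute your own namespace, hub, and secret.
EH_NAMESPACE = "my-namespace"   # hypothetical Event Hubs namespace
EH_NAME = "my-eventhub"         # hypothetical Event Hub (used as the Kafka topic)
CONNECTION_STRING = "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=..."

# Kafka options for a Kafka-enabled Event Hub endpoint:
kafka_options = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{CONNECTION_STRING}";'
    ),
    "subscribe": EH_NAME,
}
```

These options would then be passed as `spark.readStream.format("kafka").options(**kafka_options).load()`; the same bootstrap server and JAAS settings apply when producing events with a `writeStream` to the same endpoint.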
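To locate the captured files, note that Event Hubs Capture by default names blobs as `{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}`. A small hypothetical helper (the function and path names are illustrative, not from the notebooks) that builds a wildcard path for one day's captures could look like:

```python
from datetime import datetime, timezone

def capture_glob(base, namespace, eventhub, day):
    # Wildcards cover PartitionId, Hour, Minute, and the Second-named
    # Avro file, following the default Capture naming convention.
    return f"{base}/{namespace}/{eventhub}/*/{day:%Y/%m/%d}/*/*/*.avro"

path = capture_glob("/mnt/capture", "my-namespace", "my-eventhub",
                    datetime(2019, 5, 1, tzinfo=timezone.utc))
# -> "/mnt/capture/my-namespace/my-eventhub/*/2019/05/01/*/*/*.avro"
```

Spark can then load these files with `spark.read.format("avro").load(path)`; the event payload sits in the binary `Body` column of the Capture schema and can be cast to string for JSON payloads.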
The driver notebook reads all tables in the given schema and triggers the copy notebook, which copies each Spark SQL table to the Azure SQL DB.
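The driver pattern above can be sketched as a plain loop. Here `list_tables` and `run_copy` are injected stand-ins so the pattern can be shown without a cluster; on Databricks they would be `spark.catalog.listTables(schema)` and `dbutils.notebook.run(...)` respectively (the notebook name and argument keys below are hypothetical):

```python
def copy_schema(schema, list_tables, run_copy):
    """Trigger the copy notebook once per table in the schema."""
    results = {}
    for table in list_tables(schema):
        # On Databricks this call would be e.g.:
        # dbutils.notebook.run("copy-notebook", 3600,
        #                      {"schema": schema, "table": table})
        results[table] = run_copy({"schema": schema, "table": table})
    return results

# Local illustration with injected stand-ins:
out = copy_schema("sales",
                  list_tables=lambda s: ["orders", "items"],
                  run_copy=lambda args: f"copied {args['table']}")
# -> {"orders": "copied orders", "items": "copied items"}
```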
Demo notebooks to read data from an Azure Storage container added as an additional custom endpoint to Azure IoT Hub:
In this example the blob file name format is configured as:
input/simdev/ingestdate={YYYY}-{MM}-{DD}/{HH}{mm}{iothub}{partition}.avro
Important:
- Use the file extension `.avro`.
- Think about how to partition the data. In this example it is partitioned daily; if you want to read data hourly, define partitions with a finer granularity.
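To see where a message batch lands, the token substitution IoT Hub performs on the file name format above can be mimicked with a small hypothetical helper (IoT Hub does this substitution itself; the function exists only for illustration):

```python
from datetime import datetime, timezone

def blob_path(fmt, when, iothub, partition):
    # Mimics IoT Hub's token substitution for the blob name format.
    # Note {MM} (month) and {mm} (minute) differ only in case.
    return (fmt.replace("{YYYY}", f"{when:%Y}")
               .replace("{MM}", f"{when:%m}")
               .replace("{DD}", f"{when:%d}")
               .replace("{HH}", f"{when:%H}")
               .replace("{mm}", f"{when:%M}")
               .replace("{iothub}", iothub)
               .replace("{partition}", str(partition)))

fmt = "input/simdev/ingestdate={YYYY}-{MM}-{DD}/{HH}{mm}{iothub}{partition}.avro"
path = blob_path(fmt, datetime(2019, 5, 1, 13, 30, tzinfo=timezone.utc), "myhub", 0)
# -> "input/simdev/ingestdate=2019-05-01/1330myhub0.avro"
```

The `ingestdate={YYYY}-{MM}-{DD}` folder is what makes the daily partitioning visible to Spark as a partition column when reading the container.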
Configure a route to the new custom endpoint for the device messages. Leave the routing query empty; it defaults to true, so all messages are routed to the endpoint.
Now all new messages are streamed to the new endpoint. If you mount the storage container in Azure Databricks, the data can be accessed directly under the mount point.
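Mounting the container is a one-time configuration step; a sketch assuming a Blob Storage container, with the container, account, secret scope, and mount point names as placeholders (runs only on Databricks, since it uses `dbutils`):

```python
# Placeholder names throughout -- substitute your own container/account/scope.
dbutils.fs.mount(
    source="wasbs://iot-events@mystorageaccount.blob.core.windows.net",
    mount_point="/mnt/iot-events",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key"),
    },
)

# Routed messages are then readable directly under the mount point, e.g.:
# spark.read.format("avro").load("/mnt/iot-events/input/simdev/ingestdate=2019-05-01/")
```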
Sample Notebooks can be found here: