-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Debezium 2.x docs #1149
Add Debezium 2.x docs #1149
Changes from 6 commits
21dd6c8
1595164
0c5e07b
bb480f1
17f6ebb
d39cd98
5e18cce
4e0825f
4fd775f
a25adae
3642046
941f476
9d68f3c
cf480cb
28df78e
f140c77
1313239
75bf12e
47a11bf
d561f2c
d03fbb3
e562794
14ba5e9
1ecd47d
9daaafb
8870d99
31448f4
0c61046
a7c0691
5c1f269
e7c65e7
c2afb09
f85e53c
2cfdd2d
2b8a37b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,165 @@ | ||||||
= CDC Connector | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
[.enterprise]*Enterprise* | ||||||
|
||||||
Change Data Capture (CDC) refers to the process of observing changes | ||||||
made to a database and extracting them in a form usable by other | ||||||
systems, for the purposes of replication, analysis and many more. | ||||||
|
||||||
Change Data Capture is especially important to Hazelcast, because it allows | ||||||
for the _streaming of changes from databases_, which can be efficiently | ||||||
processed by the Jet engine. | ||||||
|
||||||
Implementation of CDC in Hazelcast Enterprise is based on | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
link:https://debezium.io/[Debezium 2.x]. Hazelcast offers a generic Debezium source | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
which can handle CDC events from link:https://debezium.io/documentation/reference/2.7/connectors/index.html[any database supported by Debezium], | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
but we're also striving to make CDC sources first class citizens in Hazelcast. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
The ones for MySQL and PostgreSQL already are. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
== Installing the Connector | ||||||
|
||||||
This connector is included in the full distribution of Hazelcast Enterprise. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
=== Maven | ||||||
For using this connector inside Maven project you can add following entries into `pom.xml`'s `<dependencies>` section: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
Generic connector: | ||||||
```xml | ||||||
<dependency> | ||||||
<groupId>com.hazelcast.jet</groupId> | ||||||
<artifactId>hazelcast-enterprise-cdc-debezium</artifactId> | ||||||
<version>{full-version}</version> | ||||||
<classifier>jar-with-dependencies</classifier> | ||||||
</dependency> | ||||||
``` | ||||||
|
||||||
MySQL-specific connector: | ||||||
```xml | ||||||
<dependency> | ||||||
<groupId>com.hazelcast.jet</groupId> | ||||||
<artifactId>hazelcast-enterprise-cdc-mysql</artifactId> | ||||||
<version>{full-version}</version> | ||||||
<classifier>jar-with-dependencies</classifier> | ||||||
</dependency> | ||||||
``` | ||||||
Note: MySQL connector does not include MySQL driver as a dependency. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
PostgreSQL-specific connector: | ||||||
```xml | ||||||
<dependency> | ||||||
<groupId>com.hazelcast.jet</groupId> | ||||||
<artifactId>hazelcast-enterprise-cdc-postgres</artifactId> | ||||||
<version>{full-version}</version> | ||||||
<classifier>jar-with-dependencies</classifier> | ||||||
</dependency> | ||||||
``` | ||||||
|
||||||
== CDC as a Source | ||||||
|
||||||
We have the following types of CDC sources: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
* link:https://docs.hazelcast.org/docs/{full-version}/javadoc/com/hazelcast/jet/cdc/DebeziumCdcSources.html[DebeziumCdcSources]: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
generic source for all databases supported by Debezium | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
* link:https://docs.hazelcast.org/docs/{full-version}/javadoc/com/hazelcast/jet/cdc/mysql/MySqlCdcSources.html[MySqlCdcSources]: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
specific, first class Jet CDC source for MySQL databases (also based | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
on Debezium, but benefiting the full range of convenience Jet can | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
additionally provide) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
* link:https://docs.hazelcast.org/docs/{full-version}/javadoc/com/hazelcast/jet/cdc/postgres/PostgresCdcSources.html[PostgresCdcSources]: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
specific, first class CDC source for PostgreSQL databases (also based | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
on Debezium, but benefiting the full range of convenience Hazelcast can | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
additionally provide) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
For the setting up a streaming source of CDC data is just the matter of pointing it at the right database via configuration: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
```java | ||||||
Pipeline pipeline = Pipeline.create(); | ||||||
pipeline.readFrom( | ||||||
MySqlCdcSources.mysql("customers") | ||||||
.setDatabaseAddress("127.0.0.1", 3306) | ||||||
.setDatabaseCredentials("debezium", "dbz") | ||||||
.setClusterName("dbserver1") | ||||||
.setDatabaseIncludeList("inventory") | ||||||
.setTableIncludeList("inventory.customers") | ||||||
.build()) | ||||||
.withNativeTimestamps(0) | ||||||
.writeTo(Sinks.logger()); | ||||||
``` | ||||||
|
||||||
MySQL- and PostgreSQL-specific source builders contain methods for all major configuration setting and it guards if | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
e.g. mutually exclusive options are not used. For generic source builder user must rely on Debezium's documentation | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
to provide all necessary options. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
For an example of how to use CDC data see xref:pipelines:cdc.adoc[our tutorial]. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
=== Common source builder functions | ||||||
[cols="m,a"] | ||||||
|=== | ||||||
|Method name|Description | ||||||
|
||||||
|changeRecord() | ||||||
| Sets output type to ChangeRecord - a wrapper, providing most of the fields in | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
strongly-typed manner. | ||||||
|
||||||
| json() | ||||||
| Sets output type to JSON - result stage will have `Map<String, String>` as it's type, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
where key is SourceRecord's key in json format and value is whole SourceRecord's value in json string. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
|customMapping(RecordMappingFunction<T>) | ||||||
| Sets output type to some arbitrary user type `T`. Mapping from `SourceRecord` to `T` will | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
be done using provided function. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be nice to improve this - is it done automatically, or can we say what the "provided function" is that they need to use? |
||||||
|
||||||
|withDefaultEngine() | ||||||
|Sets preferred engine to default (non-async) one. This engine is single-threaded, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
but also older and more tested. For most stable results (e.g. no async offset restore) this engine should be preferred. For MySQL and PostgreSQL especially this engine makes the most sense, as MySQL and PostgreSQL Debezium connectors are single-threaded only. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
|withAsyncEngine() | ||||||
|Sets preferred engine to async one. This engine is multithreaded (if connector supports | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
it), but you must be aware of it's async nature, e.g. offset restore after restart is done | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
asynchronously as well, leading to sometimes confusing results. | ||||||
|
||||||
|setProperty(String, String) | ||||||
|Sets connector property to given value. There are multiple overloads, allowing to | ||||||
sets value to some `long`, `String` or `boolean`. | ||||||
|
||||||
|=== | ||||||
|
||||||
=== Fault Tolerance | ||||||
|
||||||
CDC sources offer at least-once processing guarantees. The source | ||||||
periodically saves the database write ahead log offset for which it had | ||||||
dispatched events and in case of a failure/restart it will replay all | ||||||
events since the last successfully saved offset. | ||||||
|
||||||
Unfortunately, however, there is no guaran`tee that the last saved offset | ||||||
is still in the database changelog. Such logs are always finite and | ||||||
depending on the DB configuration can be relatively short, so if the CDC | ||||||
source has to replay data for a long period of inactivity, then there | ||||||
can be a data loss. With careful management though we can say that | ||||||
at-least once guarantee can practically be provided. | ||||||
|
||||||
== CDC as a Sink | ||||||
|
||||||
Change data capture is a source-side functionality in Jet, but we also | ||||||
offer some specialized sinks that simplify applying CDC events to a map, which gives you the ability to reconstruct the contents of the | ||||||
original database table. The sinks expect to receive `ChangeRecord` | ||||||
objects and apply your custom functions to them that extract the key and | ||||||
the value that will be applied to the target map. | ||||||
|
||||||
For example, a sink mapping CDC data to a `Customer` class and | ||||||
maintaining a map view of latest known email addresses per customer | ||||||
(identified by ID) would look like this: | ||||||
|
||||||
```java | ||||||
Pipeline p = Pipeline.create(); | ||||||
p.readFrom(source) | ||||||
.withoutTimestamps() | ||||||
.writeTo(CdcSinks.map("customers", | ||||||
r -> r.key().toMap().get("id"), | ||||||
r -> r.value().toObject(Customer.class).email)); | ||||||
``` | ||||||
|
||||||
[NOTE] | ||||||
==== | ||||||
The key and value functions have certain limitations. They can be used to map only to objects which the Hazelcast member can deserialize, which unfortunately doesn't include user code submitted as a part of the job. So in the above example it's OK to have `String` email values, but we wouldn't be able to use `Customer` directly. | ||||||
|
||||||
If user code has to be used, then the problem can be solved with the help of the User Code Deployment feature. Example configs for that can be seen in our xref:pipelines:cdc-join.adoc#7-start-hazelcast-jet[CDC Join tutorial]. | ||||||
==== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.