Documenting binary encoding used with Canal-JSON #13832

Merged
merged 22 commits into from
Jul 18, 2023
Changes from 18 commits
Commits
ade774c
Documented encoding of binary data in Canal-JSON
benmeadowcroft Jun 9, 2023
ae48cd3
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 9, 2023
578ec5e
Fixed lint issue of multiple blank lines
benmeadowcroft Jun 9, 2023
24e150e
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 9, 2023
4430560
Merge branch 'master' of https://github.com/benmeadowcroft/tidb-docs
benmeadowcroft Jun 9, 2023
a81c88e
Added brief example of the encoding
benmeadowcroft Jun 12, 2023
d4e96f5
Corrected example encoding
benmeadowcroft Jun 19, 2023
2aef099
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 20, 2023
4c60e75
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
d7c2da4
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
8580fb3
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
9b2b5c8
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
171172b
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
e102f8e
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
51ea41a
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
9ab965c
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 30, 2023
77e6e89
Update to Backup & Recovery roadmap
benmeadowcroft Jun 30, 2023
3a98182
format updates
qiancai Jul 10, 2023
227fd19
Revert "Update to Backup & Recovery roadmap"
benmeadowcroft Jul 11, 2023
55e00ed
Merge branch 'master' of https://github.com/benmeadowcroft/tidb-docs
benmeadowcroft Jul 11, 2023
90f3244
Addressing review comments
benmeadowcroft Jul 11, 2023
3797083
Merge remote-tracking branch 'upstream/master'
benmeadowcroft Jul 11, 2023
39 changes: 39 additions & 0 deletions ticdc/ticdc-canal-json.md
Expand Up @@ -258,6 +258,45 @@ The following table shows the mapping relationships between Java SQL Types in Ti

For more information about Java SQL Types, see [Java SQL Class Types](https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html).

## Binary and Blob types

TiCDC encodes [binary types](/data-type-string.md#binary-type) in the Canal-JSON format by converting each byte to its character representation as follows:

- Printable characters are represented using the ISO/IEC 8859-1 character encodings.
- Non-printable characters and certain characters with special meaning in HTML are represented using their UTF-8 escape sequence.

The following table shows the detailed representation information.

| Character type | Value range | Character representation |
| :---------------------------| :-----------| :---------------------|
| Control characters | [0, 31] | UTF-8 escape (such as `\u0000` through `\u001F`) |
| Horizontal tab | [9] | `\t` |
| Line feed | [10] | `\n` |
| Carriage return | [13] | `\r` |
| Printable characters | [32, 127] | Literal character (such as `A`) |
| Ampersand | [38] | `\u0026` |
| Less-than sign | [60] | `\u003C` |
| Greater-than sign | [62] | `\u003E` |
| Extended control characters | [128, 159] | Literal character |
| ISO 8859-1 (Latin-1) | [160, 255] | Literal character |
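
The mapping in the table can be sketched in a few lines of Python. This is an illustration of the rules above, not TiCDC's actual implementation; the function name `encode_binary_for_canal_json` is hypothetical. The table does not list the double quote (34) or the backslash (92); the sketch escapes them as `\"` and `\\` because a JSON string requires it, which is an assumption about the real encoder's output.

```python
def encode_binary_for_canal_json(data: bytes) -> str:
    """Return the body of a JSON string (without the surrounding quotes)."""
    # Short escapes listed in the table for tab, line feed, carriage return.
    short_escapes = {9: "\\t", 10: "\\n", 13: "\\r"}
    # Assumption: quote and backslash are escaped as JSON requires.
    json_required = {34: '\\"', 92: "\\\\"}
    # Characters with special meaning in HTML: &, <, >.
    html_special = {38, 60, 62}

    parts = []
    for byte in data:
        if byte in short_escapes:
            parts.append(short_escapes[byte])
        elif byte in json_required:
            parts.append(json_required[byte])
        elif byte < 32 or byte in html_special:
            # Remaining control characters and HTML-special characters
            # become \uXXXX escapes (lowercase hex in this sketch).
            parts.append(f"\\u{byte:04x}")
        else:
            # Everything else maps to the ISO/IEC 8859-1 character
            # with the same code point.
            parts.append(chr(byte))
    return "".join(parts)


sample = bytes([5, 7, 10, 15, 36, 50, 43, 99, 120, 60, 38, 255, 254, 45, 55, 70])
print(encode_binary_for_canal_json(sample))
# prints: \u0005\u0007\n\u000f$2+cx\u003c\u0026ÿþ-7F
```

The hexadecimal digits in a `\uXXXX` escape are case-insensitive in JSON, so `\u003c` and `\u003C` denote the same character.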

### Example of the encoding

For example, the following 16 bytes `[5 7 10 15 36 50 43 99 120 60 38 255 254 45 55 70]`, stored in a `VARBINARY` column named `c_varbinary`, are encoded in a Canal-JSON `Update` event as follows:

```json
{
...
"data": [
{
...
"c_varbinary": "\u0005\u0007\n\u000f$2+cx\u003c\u0026ÿþ-7F"
}
]
...
}
```
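
A consumer can reverse this encoding by parsing the JSON string and then encoding the resulting text as ISO/IEC 8859-1, which maps each character back to one byte. The following Python sketch assumes the field value is available as a JSON string literal; the function name `decode_canal_json_binary` is hypothetical and the approach is inferred from the description above rather than taken from TiCDC.

```python
import json


def decode_canal_json_binary(json_string_literal: str) -> bytes:
    """Recover the original bytes from a quoted JSON string literal."""
    # json.loads resolves \uXXXX, \n, \t, \r and similar escapes.
    text = json.loads(json_string_literal)
    # Each character is in the range U+0000..U+00FF, so Latin-1
    # encoding yields exactly one byte per character.
    return text.encode("latin-1")


# The value of "c_varbinary" from the example, including its quotes.
raw = r'"\u0005\u0007\n\u000f$2+cx\u003c\u0026ÿþ-7F"'
print(list(decode_canal_json_binary(raw)))
# prints: [5, 7, 10, 15, 36, 50, 43, 99, 120, 60, 38, 255, 254, 45, 55, 70]
```

If the whole event is parsed with a JSON library, the escapes are already resolved and only the final `encode("latin-1")` step is needed.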

## Comparison of TiCDC Canal-JSON and the official Canal

The way that TiCDC implements the Canal-JSON data format, including the `Update` Event and the `mysqlType` field, differs from the official Canal. The following table shows the main differences.
9 changes: 2 additions & 7 deletions tidb-cloud/tidb-cloud-roadmap.md
Expand Up @@ -144,15 +144,10 @@ For the roadmap of TiDB kernel, refer to [TiDB Roadmap](https://github.com/pingc
<td>✅ TiDB Cloud supports replicating data to Kafka/MySQL.</td>
<td>TiDB Cloud supports TiCDC-based data replication to Kafka and MySQL compatible databases.</td>
</tr>
<tr>
<td>Backup and Restore</td>
<td>✅ Support EBS snapshot-based backup and restore.</td>
<td>BR service on TiDB Cloud uses EBS snapshot-based backup and restore.</td>
</tr>
<tr>
<td>Backup and restore</td>
<td>Backup and restore service based on AWS EBS or GCP persistent disk snapshots.</td>
<td>Provide backup and restore service on the cloud based on AWS EBS or GCP persistent disk snapshots.</td>
<td>Backup and restore performance using optimized file based backup.</td>
<td>Improve backup performance with an updated full and incremental file based backup for all cloud providers.</td>
</tr>
<tr>
<td rowspan="2">Online data migration</td>