Documenting binary encoding used with Canal-JSON #13832

Merged
merged 22 commits into from
Jul 18, 2023
Changes from 8 commits
Commits
ade774c
Documented encoding of binary data in Canal-JSON
benmeadowcroft Jun 9, 2023
ae48cd3
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 9, 2023
578ec5e
Fixed lint issue of multiple blank lines
benmeadowcroft Jun 9, 2023
24e150e
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 9, 2023
4430560
Merge branch 'master' of https://github.com/benmeadowcroft/tidb-docs
benmeadowcroft Jun 9, 2023
a81c88e
Added brief example of the encoding
benmeadowcroft Jun 12, 2023
d4e96f5
Corrected example encoding
benmeadowcroft Jun 19, 2023
2aef099
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 20, 2023
4c60e75
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
d7c2da4
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
8580fb3
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
9b2b5c8
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
171172b
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
e102f8e
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
51ea41a
Update ticdc/ticdc-canal-json.md
benmeadowcroft Jun 30, 2023
9ab965c
Merge branch 'pingcap:master' into master
benmeadowcroft Jun 30, 2023
77e6e89
Update to Backup & Recovery roadmap
benmeadowcroft Jun 30, 2023
3a98182
format updates
qiancai Jul 10, 2023
227fd19
Revert "Update to Backup & Recovery roadmap"
benmeadowcroft Jul 11, 2023
55e00ed
Merge branch 'master' of https://github.com/benmeadowcroft/tidb-docs
benmeadowcroft Jul 11, 2023
90f3244
Addressing review comments
benmeadowcroft Jul 11, 2023
3797083
Merge remote-tracking branch 'upstream/master'
benmeadowcroft Jul 11, 2023
34 changes: 34 additions & 0 deletions ticdc/ticdc-canal-json.md
@@ -258,6 +258,40 @@ The following table shows the mapping relationships between Java SQL Types in Ti

For more information about Java SQL Types, see [Java SQL Class Types](https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html).

## Binary and Blob types

TiCDC encodes [binary types](/data-type-string.md#binary-type) in the Canal-JSON format by converting each byte to the character with the same code point in the ISO/IEC 8859-1 character encoding. Control characters in the range [0, 31] and certain characters with special meaning in HTML are represented by their Unicode escape sequence (`\uXXXX`). The following table shows the details.

| Description | Value range | String representation |
| :---------------------------| :-----------| :---------------------|
| Control characters | [0, 31] | Unicode escape, for example '\u0000' through '\u001F' (see the exceptions in the following rows) |
| Horizontal tab | [9] | \t |
| Line feed | [10] | \n |
| Carriage return | [13] | \r |
| Printable characters | [32, 127] | Literal character (e.g. 'A') |
| Ampersand | [38] | \u0026 |
| Less-than sign | [60] | \u003C |
| Greater-than sign | [62] | \u003E |
| Extended control characters | [128, 159] | Literal character |
| ISO 8859-1 (Latin-1) | [160, 255] | Literal character |
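
The rules in the table can be sketched in a few lines of Python. This is an illustration only, not TiCDC's actual implementation: the function name `encode_binary` and the lookup tables are hypothetical, the lowercase hex digits follow the example in the next section, and the handling of the double quote and backslash is an assumption based on what JSON string syntax requires (the table does not list them). The sample bytes are the ones used in the example in the next section.

```python
# Sketch of the byte-to-string rules in the table above (not TiCDC's code).
SHORT_ESCAPES = {9: "\\t", 10: "\\n", 13: "\\r"}
# Assumption: JSON syntax requires escaping '"' and '\'; the table omits them.
JSON_REQUIRED = {34: '\\"', 92: "\\\\"}
HTML_SENSITIVE = {38, 60, 62}  # & < >


def encode_binary(data: bytes) -> str:
    """Return the body of the JSON string literal, without the surrounding quotes."""
    out = []
    for b in data:
        if b in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[b])
        elif b in JSON_REQUIRED:
            out.append(JSON_REQUIRED[b])
        elif b < 32 or b in HTML_SENSITIVE:
            # Remaining control characters and & < > become \uXXXX escapes.
            out.append(f"\\u{b:04x}")
        else:
            # Bytes 32-255 map to the ISO/IEC 8859-1 code point of the same value.
            out.append(chr(b))
    return "".join(out)


sample = bytes([5, 7, 10, 15, 36, 50, 43, 99, 120, 60, 38, 255, 254, 45, 55, 70])
expected = r"\u0005\u0007\n\u000f$2+cx\u003c\u0026ÿþ-7F"
assert encode_binary(sample) == expected
```

Building the string byte by byte keeps each table row visible as one branch, which makes it easy to compare the sketch against the table.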

### Example of the encoding

To illustrate, the following 16 bytes `[5 7 10 15 36 50 43 99 120 60 38 255 254 45 55 70]`, stored in a `VARBINARY` column called `c_varbinary`, are encoded in a Canal-JSON `Update` Event as follows:

```json
{
    ...
    "data": [
        {
            ...
            "c_varbinary": "\u0005\u0007\n\u000f$2+cx\u003c\u0026ÿþ-7F"
        }
    ]
    ...
}
```
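
On the consumer side, the original bytes can be recovered by parsing the JSON and re-encoding the resulting string as ISO/IEC 8859-1. The following Python sketch shows the round trip for the value above. It assumes the message is read as UTF-8 JSON text; the variable names and the single-field document are hypothetical and stand in for a full Canal-JSON event.

```python
import json

# The "c_varbinary" value from the example above, as it appears in the JSON text.
raw_json = r'{"c_varbinary": "\u0005\u0007\n\u000f$2+cx\u003c\u0026ÿþ-7F"}'

row = json.loads(raw_json)
# Every decoded character has a code point in [0, 255], so Latin-1 maps
# each character back to exactly one byte.
recovered = row["c_varbinary"].encode("latin-1")
print(list(recovered))
# prints [5, 7, 10, 15, 36, 50, 43, 99, 120, 60, 38, 255, 254, 45, 55, 70]
```

Encoding with `latin-1` raises `UnicodeEncodeError` if the string contains a code point above 255, which is a useful signal that the field did not come from a binary column.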

## Comparison of TiCDC Canal-JSON and the official Canal

The way that TiCDC implements the Canal-JSON data format, including the `Update` Event and the `mysqlType` field, differs from the official Canal. The following table shows the main differences.