doc: kill-switch when buffer is full #1034

Merged: 1 commit, Sep 11, 2023
36 changes: 36 additions & 0 deletions docs/user-guide/reference/edge-tuning.md
@@ -0,0 +1,36 @@
# Edge Tuning

## Drop message onFull

An edge-level setting allows messages to be dropped when the buffer is full (`buffer.isFull == true`). Even if the UDF or
UDSink drops a message because of an internal error in the user-defined code, the processing latency spikes and causes
natural back pressure. A kill switch that drops messages can alleviate or avoid repercussions on the rest of the DAG.

This is an edge-level setting, configured through `onFull`. The default is `retryUntilSuccess`; the other option is
`discardLatest`.
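
The difference between the two options can be illustrated with a small model. The sketch below is not Numaflow code:
`BoundedBuffer`, `write`, `drain`, and `max_retries` are hypothetical names that only mimic how a writer might react to a
full buffer under each `onFull` value.

```python
from collections import deque


class BoundedBuffer:
    """Toy stand-in for an Inter-Step Buffer with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.capacity

    def read(self):
        # The downstream vertex consumes one message, freeing a slot.
        return self.items.popleft()


def write(buffer, message, on_full="retryUntilSuccess", drain=None, max_retries=1000):
    """Write one message; return True if it was stored, False if dropped.

    `drain` is a hypothetical callback that stands in for the downstream
    vertex making progress between retries.
    """
    retries = 0
    while buffer.is_full():
        if on_full == "discardLatest":
            return False  # drop the incoming message instead of waiting
        # retryUntilSuccess: the writer waits here, which is the back pressure.
        if drain is None or retries >= max_retries:
            raise RuntimeError("buffer never drained; the writer would block")
        drain(buffer)
        retries += 1
    buffer.items.append(message)
    return True


buf = BoundedBuffer(capacity=2)
write(buf, "m1")
write(buf, "m2")
dropped = not write(buf, "m3", on_full="discardLatest")  # buffer is full, so m3 is dropped
stored = write(buf, "m3", drain=lambda b: b.read())      # waits until a slot frees up
print(dropped, stored, list(buf.items))
```

In this model the retrying writer makes no progress until the reader frees a slot, which is how a slow vertex ends up
slowing everything upstream of it, while the discarding writer returns immediately at the cost of losing the message.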

This is a **data loss scenario**, but it can be useful when running user-introduced experiments, such as A/B testing,
on the pipeline. Data loss is acceptable on the experimentation side of the DAG as long as the production side is
unaffected.
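
To make the A/B scenario concrete, here is a minimal simulation under stated assumptions. It is a toy model rather than
Numaflow behavior: `run_ab_pipeline`, the tick-based consumers, and all parameter names are hypothetical. One source
fans out to a production edge whose consumer keeps up, and to an experiment edge whose consumer is slow and whose edge
is assumed to use `discardLatest`.

```python
from collections import deque


def run_ab_pipeline(num_messages, capacity, experiment_every):
    """Fan one source out to a production edge and an experiment edge.

    Hypothetical model: the production consumer reads one message per tick,
    while the experiment consumer reads only once every `experiment_every`
    ticks. The experiment edge uses discardLatest, so a full buffer drops
    the incoming message rather than stalling the source.
    """
    prod_buf, exp_buf = deque(), deque()
    prod_seen, exp_seen, exp_dropped = [], [], []

    for tick in range(num_messages):
        msg = tick
        # Production edge: its consumer keeps up, so the buffer never fills.
        prod_buf.append(msg)
        # Experiment edge (discardLatest): drop when the buffer is full.
        if len(exp_buf) >= capacity:
            exp_dropped.append(msg)
        else:
            exp_buf.append(msg)

        prod_seen.append(prod_buf.popleft())
        if tick % experiment_every == 0 and exp_buf:
            exp_seen.append(exp_buf.popleft())

    # Drain whatever is still buffered once the source stops.
    prod_seen.extend(prod_buf)
    exp_seen.extend(exp_buf)
    return prod_seen, exp_seen, exp_dropped


prod, exp, dropped = run_ab_pipeline(num_messages=20, capacity=3, experiment_every=4)
print(len(prod), len(exp), len(dropped))
```

The production branch sees every message because the source is never made to wait on the experiment edge; the
experiment branch sees only a subset. The model does not simulate `retryUntilSuccess` on the experiment edge, where the
source would instead wait for the slow consumer.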

### discardLatest

Setting `onFull` to `discardLatest` drops the message when the edge is full.

```yaml
edges:
  - from: a
    to: b
    onFull: discardLatest
```

### retryUntilSuccess

The default setting for `onFull` is `retryUntilSuccess`, which makes sure the message is retried until the write succeeds.

```yaml
edges:
  - from: a
    to: b
    onFull: retryUntilSuccess
```
4 changes: 3 additions & 1 deletion docs/user-guide/reference/pipeline-tuning.md
@@ -1,6 +1,8 @@
# Pipeline Tuning

For a data processing pipeline, each vertex keeps running the cycle of reading data from an Inter-Step Buffer (or data source), processing the data, and writing to next Inter-Step Buffers (or sinks). It is possible to make some tuning for this data processing cycle.
For a data processing pipeline, each vertex keeps running the cycle of reading data from an Inter-Step Buffer (or data source),
processing the data, and writing to the next Inter-Step Buffers (or sinks). This data processing cycle can be tuned with
the following settings.

- `readBatchSize` - How many messages to read for each cycle, defaults to `500`.
- `bufferMaxLength` - How many unprocessed messages can exist in the Inter-Step Buffer, defaults to `30000`.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -82,6 +82,7 @@ nav:
- Examples: "user-guide/user-defined-functions/reduce/examples.md"
- Reference:
- user-guide/reference/pipeline-tuning.md
- user-guide/reference/edge-tuning.md
- user-guide/reference/autoscaling.md
- user-guide/reference/conditional-forwarding.md
- user-guide/reference/join-vertex.md