Skip to content

Commit

Permalink
add readme
Browse files Browse the repository at this point in the history
Signed-off-by: Sidhant Kohli <[email protected]>
  • Loading branch information
kohlisid committed Jul 11, 2024
1 parent d289a0d commit 79d615c
Show file tree
Hide file tree
Showing 2 changed files with 71 additions and 0 deletions.
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ pre-commit install
- [Map](https://github.com/numaproj/numaflow-python/tree/main/examples/map)
- [Reduce](https://github.com/numaproj/numaflow-python/tree/main/examples/reduce)
- [Map Stream](https://github.com/numaproj/numaflow-python/tree/main/examples/mapstream)
- [Batch Map](https://github.com/numaproj/numaflow-python/tree/main/examples/batchmap)
- [Implement User Defined Sinks](https://github.com/numaproj/numaflow-python/tree/main/examples/sink)
- [Implement User Defined SideInputs](https://github.com/numaproj/numaflow-python/tree/main/examples/sideinput)

Expand Down Expand Up @@ -111,6 +112,8 @@ These are the class names for the server types supported by each of the function
- ReduceAsyncServer
- MapStream
- MapStreamAsyncServer
- BatchMap
- BatchMapAsyncServer
- Source Transform
- SourceTransformServer
- SourceTransformMultiProcServer
Expand Down Expand Up @@ -147,6 +150,8 @@ The list of base handler classes for each of the functionalities is given below
- MapStreamer
- Source Transform
- SourceTransformer
- Batch Map
- BatchMapper
- UDSource
- Sourcer
- UDSink
Expand Down
66 changes: 66 additions & 0 deletions examples/batchmap/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
## BatchMap Interface
The BatchMap interface allows developers to
process multiple data items together in a single UDF handler.


### What is BatchMap?
BatchMap is an interface that allows developers to process multiple data items
in a UDF single call, rather than each item in separate calls.


The BatchMap interface can be helpful in scenarios
where performing operations on a group of data can be more efficient.


### Understanding the User Interface
The BatchMap interface requires developers to implement a handler with a specific signature.
Here is the signature of the BatchMap handler:

```python
async def handler(datums: AsyncIterable[Datum]) -> BatchResponses:
```
The handler takes an iterable of `Datum` objects and returns
`BatchResponses`.
The `BatchResponses` object is a list of the *same length* as the input
datums, with each item corresponding to the response for one request datum.

To clarify, let's say we have three data items:

```json lines
data_1 = {"name": "John", "age": 25}
data_2 = {"name": "Jane", "age": 30}
data_3 = {"name": "Bob", "age": 45}
```

These data items will be grouped together by numaflow and
passed to the handler as an iterable:

```python
result = await handler([data_1, data_2, data_3])
```

The result will be a BatchResponses object, which is a list of responses corresponding to each input data item's processing.

### Important Considerations
When using BatchMap, there are a few important considerations to keep in mind:

- Ensure that the `BatchResponses` object is tagged with the *correct request ID*.
Each Datum has a unique ID tag, which will be used by Numaflow to ensure correctness.

```python
async for datum in datums:
batch_response = BatchResponse.new_batch_response(datum.id)
```


- Ensure that the length of the `BatchResponses`
list is equal to the number of requests received.
**This means that for every input data item**, there should be a corresponding
response in the BatchResponses list.

Use batch processing only when it makes sense. In some
scenarios, batch processing may not be the most
efficient approach, and processing data items one by one
could be a better option.
The burden of concurrent processing of the data will rely on the
UDF implementation in this use case.

0 comments on commit 79d615c

Please sign in to comment.