
docs: add a use case section (#961)
Signed-off-by: Jason Zesheng Chen <[email protected]>
jasonzeshengchen committed Aug 17, 2023
1 parent d2d6ced commit e4b5b73
Showing 11 changed files with 55 additions and 17 deletions.
4 changes: 2 additions & 2 deletions docs/README.md
@@ -12,6 +12,7 @@ stream processing platforms.
- Event driven applications such as anomaly detection, monitoring, and alerting.
- Streaming applications such as data instrumentation and data movement.
- Workflows running in a streaming manner.
- [Learn more in our User Guide](./user-guide/use-cases/overview.md).

## Key Features

@@ -20,15 +21,14 @@ stream processing platforms.
- Exactly-Once semantics: No input element is duplicated or lost even as pods are rescheduled or restarted.
- Auto-scaling with back-pressure: Each vertex automatically scales from zero to whatever is needed.

## Data Integrity Guarantees:
## Data Integrity Guarantees

- Minimally provide at-least-once semantics
- Provide exactly-once semantics for unbounded and near real-time data sources
- Preserving order is not required

## Roadmap

- Multi-partitioned edges for higher throughput (v0.9)
- JOIN Vertex and Side Inputs feature (v0.10)
- User-defined Source (v0.11)

8 changes: 4 additions & 4 deletions docs/core-concepts/inter-step-buffer-service.md
@@ -2,7 +2,7 @@

Inter-Step Buffer Service is the service to provide [Inter-Step Buffers](inter-step-buffer.md).

An Inter-Step Buffer Service is described by a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/), it is required to be existing in a namespace before Pipeline objects are created. A sample `InterStepBufferService` with JetStream implementation looks like below.
An Inter-Step Buffer Service is described by a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). It must exist in a namespace before Pipeline objects are created. A sample `InterStepBufferService` using the JetStream implementation is shown below.

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
@@ -14,7 +14,7 @@ spec:
version: latest # Do NOT use "latest" but a specific version in your real deployment
```
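The collapsed diff elides most of the sample spec. For reference, a minimal JetStream `InterStepBufferService` of this shape (a sketch; the metadata name and field layout are inferred from the surrounding text) could look like:

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: InterStepBufferService
metadata:
  name: default # Pipelines look for an InterStepBufferService named "default" unless told otherwise
spec:
  jetstream:
    version: latest # Do NOT use "latest" but a specific version in your real deployment
```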
`InterStepBufferService` is a namespaced object, it can be used by all the Pipelines in the same namespace. By default, Pipeline objects look for an `InterStepBufferService` named `default`, so a common practice is to create an `InterStepBufferService` with the name `default`. If you give the `InterStepBufferService` a name other than `default`, then you need to give the same name in the Pipeline spec.
`InterStepBufferService` is a namespaced object. It can be used by all the Pipelines in the same namespace. By default, Pipeline objects look for an `InterStepBufferService` named `default`, so a common practice is to create an `InterStepBufferService` with the name `default`. If you give the `InterStepBufferService` a name other than `default`, then you need to give the same name in the Pipeline spec.
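For illustration, a Pipeline referencing a non-default `InterStepBufferService` might look like the sketch below (the name `my-isbsvc` is hypothetical; verify the `interStepBufferServiceName` field against the full Pipeline spec):

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  # Hypothetical name; must match the metadata.name of your InterStepBufferService
  interStepBufferServiceName: my-isbsvc
```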

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
@@ -42,7 +42,7 @@ Property `spec.jetstream.version` is required for a JetStream `InterStepBufferService`.

**Note**

The version `latest` in the ConfigMap should only be used for testing purpose, it's recommended to always use a fixed version in your real workload.
The version `latest` in the ConfigMap should only be used for testing purposes. It's recommended that you always use a fixed version in your real workload.

### Replicas

@@ -189,7 +189,7 @@ An optional property `spec.redis.native.replicas` (defaults to 3) can be specified

### Persistence

Following example shows an `native` Redis `InterStepBufferService` with persistence.
The following example shows a `native` Redis `InterStepBufferService` with persistence.

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
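The persistence example above is truncated by the collapsed diff. A sketch of such a spec, with assumed values (the `accessMode` and `volumeSize` vocabulary follows the usual Kubernetes PVC conventions), could look like:

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: InterStepBufferService
metadata:
  name: default
spec:
  redis:
    native:
      version: "7.0.11" # assumed; pin a real supported version
      persistence:
        accessMode: ReadWriteOnce # assumed default
        volumeSize: 20Gi
```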
4 changes: 2 additions & 2 deletions docs/core-concepts/inter-step-buffer.md
@@ -1,8 +1,8 @@
# Inter-Step Buffer

A `Pipeline` contains multiple vertices to ingest data from sources, processing data, and forward processed data to sinks. Vertices are not connected directly, but through Inter-Step Buffers.
A `Pipeline` contains multiple vertices that ingest data from sources, process data, and forward processed data to sinks. Vertices are not connected directly, but through Inter-Step Buffers.

Inter-Step Buffer can be implemented by a variety of data buffering technologies, those technologies should support:
Inter-Step Buffer can be implemented by a variety of data buffering technologies. Those technologies should support:

- Durability
- Offsets
2 changes: 1 addition & 1 deletion docs/core-concepts/pipeline.md
@@ -1,6 +1,6 @@
# Pipeline

The `Pipeline` is the most important concept in Numaflow, it represents a data processing job, it defines:
The `Pipeline`, the most important concept in Numaflow, represents a data processing job. It defines:

1. A list of [vertices](vertex.md), which define the data processing tasks;
1. A list of `edges`, which are used to describe the relationship between the vertices. Note an edge may go from a vertex to multiple vertices, and as of v0.10, an edge may also go from multiple vertices to a vertex. This many-to-one relationship is possible via [Join and Cycles](../user-guide/reference/join-vertex.md).
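The two lists above can be sketched in a minimal Pipeline spec (adapted from the common quick-start shape; the vertex names and the built-in `generator`, `cat`, and `log` components are illustrative):

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: simple-pipeline
spec:
  vertices:
    - name: in
      source:
        generator: # built-in data generator source
          rpu: 5
          duration: 1s
    - name: cat
      udf:
        builtin:
          name: cat # built-in pass-through UDF
    - name: out
      sink:
        log: {}
  edges: # each edge connects two vertices through an Inter-Step Buffer
    - from: in
      to: cat
    - from: cat
      to: out
```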
4 changes: 2 additions & 2 deletions docs/core-concepts/vertex.md
@@ -1,14 +1,14 @@
# Vertex

The `Vertex` is also a key component of Numaflow `Pipeline` where the data processing happens. `Vertex` is defined as a list in the [pipeline](pipeline.md) spec, each representing a data processing task.
The `Vertex` is a key component of Numaflow `Pipeline` where the data processing happens. `Vertex` is defined as a list in the [pipeline](pipeline.md) spec, each representing a data processing task.

There are 3 types of `Vertex` in Numaflow today:

1. `Source` - To ingest data from sources.
1. `Sink` - To forward processed data to sinks.
1. `UDF` - User Defined Function, which is used to define data processing logic.

We have defined a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) defined for `Vertex`. A `Pipeline` containing multiple vertices will automatically generate multiple `Vertex` objects by the controller. As a user, you should NOT create a `Vertex` object directly.
We have defined a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for `Vertex`. For a `Pipeline` containing multiple vertices, the controller automatically generates the corresponding `Vertex` objects. As a user, you should NOT create a `Vertex` object directly.

In a `Pipeline`, the vertices are not connected directly, but through [Inter-Step Buffers](inter-step-buffer.md).

2 changes: 1 addition & 1 deletion docs/core-concepts/watermarks.md
@@ -16,7 +16,7 @@ will occur for on-time events at or before T.
Watermarks can be disabled by setting `disabled: true`.

### maxDelay
Watermark assignments happen at source. Sources could be out of ordered, so sometimes we want to extend the
Watermark assignments happen at source. Sources could be out of order, so sometimes we want to extend the
window (default is `0s`) to wait before we start marking data as late-data.
You can give more time for the system to wait for late data with `maxDelay` so that the late data within the specified
time duration will be considered on-time. This means the watermark propagation will be delayed by `maxDelay`.
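As a sketch, extending the wait window in a pipeline spec might look like this (the placement of `maxDelay` under `spec.watermark` is assumed from this section):

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  watermark:
    disabled: false
    maxDelay: 60s # wait up to 60s before marking data as late
```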
4 changes: 3 additions & 1 deletion docs/quick-start.md
@@ -162,7 +162,7 @@ kubectl delete -f https://raw.githubusercontent.com/numaproj/numaflow/main/examp

## A pipeline with reduce (aggregation)

To view an example pipeline with the [Reduce UDF](user-guide/user-defined-functions/reduce/reduce.md), see [Reduce Examples](user-guide/user-defined-functions/reduce/examples.md).
To set up an example pipeline with the [Reduce UDF](user-guide/user-defined-functions/reduce/reduce.md), see [Reduce Examples](user-guide/user-defined-functions/reduce/examples.md).

## What's Next

@@ -171,3 +171,5 @@ Try more examples in the [`examples`](https://github.com/numaproj/numaflow/tree/
After exploring how Numaflow pipelines run, you can check what data [Sources](./user-guide/sources/generator.md)
and [Sinks](./user-guide/sinks/kafka.md) Numaflow supports out of the box, or learn how to write
[User Defined Functions](user-guide/user-defined-functions/user-defined-functions.md).

Numaflow can also be paired with Numalogic, a collection of ML models and algorithms for real-time data analytics and AIOps including anomaly detection. Visit the [Numalogic homepage](https://numalogic.numaproj.io/) for more information.
7 changes: 3 additions & 4 deletions docs/specifications/overview.md
@@ -2,10 +2,9 @@

## Synopsis

- Numaflow allows developers with basic knowledge of Kubernetes but
without any special knowledge of data/stream processing to easily
create massively parallel data/stream processing jobs using a
programming language of their choice.
- Numaflow allows developers without any special knowledge of data/stream
processing to easily create massively parallel data/stream processing jobs
using a programming language of their choice, with just basic knowledge of Kubernetes.

- Reliable data processing is highly desirable and exactly-once
semantics is often required by many data processing applications.
21 changes: 21 additions & 0 deletions docs/user-guide/use-cases/monitoring-and-observability.md
@@ -0,0 +1,21 @@
# Monitoring and Observability

## Docs

- [How Intuit platform engineers use Numaflow to compute golden signals](https://blog.numaproj.io/numaflow-letting-golden-signals-work-for-you-1bce18e472da).

## Videos

- Numaflow as the stream-processing solution in [Intuit’s Customer Centric Observability Journey Using AIOps](https://www.youtube.com/watch?v=D-eQxDBbx48)
- Using Numaflow for fast incident detection: [Argo CD Observability with AIOps - Detect Incident Fast](https://www.youtube.com/watch?v=_pRJ0_yzxNs)
- Implementing anomaly detection with Numaflow: [Cluster Golden Signals to Avoid Alert Fatigue at Scale](https://www.youtube.com/watch?v=e5TZE9e2KPo)

## Appendix: What is Monitoring and Observability?

Monitoring and observability are two critical concepts in software engineering that help developers ensure the health and performance of their applications.

Monitoring refers to the process of collecting and analyzing data about an application's performance. This data can include metrics such as CPU usage, memory usage, network traffic, and response times. Monitoring tools allow developers to track these metrics over time and set alerts when certain thresholds are exceeded. This enables them to quickly identify and respond to issues before they become critical.

Observability, on the other hand, is a more holistic approach to monitoring that focuses on understanding the internal workings of an application. Observability tools provide developers with deep insights into the behavior of their applications, allowing them to understand how different components interact with each other and how changes in one area can affect the overall system. This includes collecting data on things like logs, traces, and events, which can be used to reconstruct the state of the system at any given point in time.

Together, monitoring and observability provide developers with a comprehensive view of their applications' performance, enabling them to quickly identify and respond to issues as they arise. By leveraging these tools, software engineers can ensure that their applications are running smoothly and efficiently, delivering the best possible experience to their users.
13 changes: 13 additions & 0 deletions docs/user-guide/use-cases/overview.md
@@ -0,0 +1,13 @@
# Overview

Numaflow allows developers without any special knowledge of data/stream processing to easily create massively parallel data/stream processing jobs using a programming language of their choice, with just basic knowledge of Kubernetes.

In this section, you'll find sample use cases for Numaflow and learn how to leverage its features for your stream processing tasks.

- Real-time data analytics applications.
- Event-driven applications: [anomaly detection and monitoring](./monitoring-and-observability.md).
- Streaming applications: data instrumentation and movement.
- Any workflows running in a streaming manner.


Numaflow is still a relatively new tool, and there are likely many other use cases that we haven't yet explored. We're committed to keeping this page up-to-date with the latest use cases and best practices for using Numaflow. We welcome contributions from the community and encourage you to share your own use cases and experiences with us. As we continue to develop and improve Numaflow, we look forward to seeing the cool things you build with it!
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -97,6 +97,9 @@ nav:
- user-guide/reference/configuration/max-message-size.md
- user-guide/reference/kustomize/kustomize.md
- APIs.md
- Use Cases:
- user-guide/use-cases/overview.md
- user-guide/use-cases/monitoring-and-observability.md
- Operator Manual:
- Releases ⧉: "operations/releases.md"
- operations/installation.md
