diff --git a/docs/README.md b/docs/README.md
index ef42d3b903..f29a1cc9a1 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -12,6 +12,7 @@ stream processing platforms.
 - Event driven applications such as anomaly detection, monitoring, and alerting.
 - Streaming applications such as data instrumentation and data movement.
 - Workflows running in a streaming manner.
+- [Learn more in our User Guide](./user-guide/use-cases/overview.md).
 
 ## Key Features
 
@@ -20,7 +21,7 @@ stream processing platforms.
 - Exactly-Once semantics: No input element is duplicated or lost even as pods are rescheduled or restarted.
 - Auto-scaling with back-pressure: Each vertex automatically scales from zero to whatever is needed.
 
-## Data Integrity Guarantees:
+## Data Integrity Guarantees
 
 - Minimally provide at-least-once semantics
 - Provide exactly-once semantics for unbounded and near real-time data sources
@@ -28,7 +29,6 @@ stream processing platforms.
 
 ## Roadmap
 
-- Multi-partitioned edges for higher throughput (v0.9)
 - JOIN Vertex and Side Inputs feature (v0.10)
 - User-defined Source (v0.11)
 
diff --git a/docs/core-concepts/inter-step-buffer-service.md b/docs/core-concepts/inter-step-buffer-service.md
index b698066fdb..2f4bf966a7 100644
--- a/docs/core-concepts/inter-step-buffer-service.md
+++ b/docs/core-concepts/inter-step-buffer-service.md
@@ -2,7 +2,7 @@
 
 Inter-Step Buffer Service is the service to provide [Inter-Step Buffers](inter-step-buffer.md).
 
-An Inter-Step Buffer Service is described by a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/), it is required to be existing in a namespace before Pipeline objects are created. A sample `InterStepBufferService` with JetStream implementation looks like below.
+An Inter-Step Buffer Service is described by a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). It must exist in a namespace before Pipeline objects are created. A sample `InterStepBufferService` with the JetStream implementation looks like below.
 
 ```yaml
 apiVersion: numaflow.numaproj.io/v1alpha1
@@ -14,7 +14,7 @@ spec:
     version: latest # Do NOT use "latest" but a specific version in your real deployment
 ```
 
-`InterStepBufferService` is a namespaced object, it can be used by all the Pipelines in the same namespace. By default, Pipeline objects look for an `InterStepBufferService` named `default`, so a common practice is to create an `InterStepBufferService` with the name `default`. If you give the `InterStepBufferService` a name other than `default`, then you need to give the same name in the Pipeline spec.
+`InterStepBufferService` is a namespaced object. It can be used by all the Pipelines in the same namespace. By default, Pipeline objects look for an `InterStepBufferService` named `default`, so a common practice is to create an `InterStepBufferService` with the name `default`. If you give the `InterStepBufferService` a name other than `default`, then you need to give the same name in the Pipeline spec.
 
 ```yaml
 apiVersion: numaflow.numaproj.io/v1alpha1
@@ -42,7 +42,7 @@ Property `spec.jetstream.version` is required for a JetStream `InterStepBufferSe
 
 **Note**
 
-The version `latest` in the ConfigMap should only be used for testing purpose, it's recommended to always use a fixed version in your real workload.
+The version `latest` in the ConfigMap should only be used for testing purposes. It's recommended that you always use a fixed version in your real workload.
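+
+For example, a version-pinned `InterStepBufferService` might look like the sketch below. The version string is only illustrative; pick one of the supported versions listed in the ConfigMap for your installation.
+
+```yaml
+apiVersion: numaflow.numaproj.io/v1alpha1
+kind: InterStepBufferService
+metadata:
+  name: default
+spec:
+  jetstream:
+    version: 2.8.3 # illustrative only - use a version from the supported list in the ConfigMap
+```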
 
 ### Replicas
 
@@ -189,7 +189,7 @@ An optional property `spec.redis.native.replicas` (defaults to 3) can be specifi
 
 ### Persistence
 
-Following example shows an `native` Redis `InterStepBufferService` with persistence.
+The following example shows a `native` Redis `InterStepBufferService` with persistence.
 
 ```yaml
 apiVersion: numaflow.numaproj.io/v1alpha1
diff --git a/docs/core-concepts/inter-step-buffer.md b/docs/core-concepts/inter-step-buffer.md
index 8b0b8710f9..9e1992b976 100644
--- a/docs/core-concepts/inter-step-buffer.md
+++ b/docs/core-concepts/inter-step-buffer.md
@@ -1,8 +1,8 @@
 # Inter-Step Buffer
 
-A `Pipeline` contains multiple vertices to ingest data from sources, processing data, and forward processed data to sinks. Vertices are not connected directly, but through Inter-Step Buffers.
+A `Pipeline` contains multiple vertices that ingest data from sources, process data, and forward processed data to sinks. Vertices are not connected directly, but through Inter-Step Buffers.
 
-Inter-Step Buffer can be implemented by a variety of data buffering technologies, those technologies should support:
+An Inter-Step Buffer can be implemented by a variety of data buffering technologies. Those technologies should support:
 
 - Durability
 - Offsets
diff --git a/docs/core-concepts/pipeline.md b/docs/core-concepts/pipeline.md
index 7375d9a3ca..5061fc9493 100644
--- a/docs/core-concepts/pipeline.md
+++ b/docs/core-concepts/pipeline.md
@@ -1,6 +1,6 @@
 # Pipeline
 
-The `Pipeline` is the most important concept in Numaflow, it represents a data processing job, it defines:
+The `Pipeline` is the most important concept in Numaflow. It represents a data processing job and defines:
 
 1. A list of [vertices](vertex.md), which define the data processing tasks;
 1. A list of `edges`, which are used to describe the relationship between the vertices. Note an edge may go from a vertex to multiple vertices, and as of v0.10, an edge may also go from multiple vertices to a vertex. This many-to-one relationship is possible via [Join and Cycles](../user-guide/reference/join-vertex.md)
diff --git a/docs/core-concepts/vertex.md b/docs/core-concepts/vertex.md
index 7e448caa1a..fe5ae85745 100644
--- a/docs/core-concepts/vertex.md
+++ b/docs/core-concepts/vertex.md
@@ -1,6 +1,6 @@
 # Vertex
 
-The `Vertex` is also a key component of Numaflow `Pipeline` where the data processing happens. `Vertex` is defined as a list in the [pipeline](pipeline.md) spec, each representing a data processing task.
+The `Vertex` is a key component of a Numaflow `Pipeline`, where the data processing happens. `Vertex` is defined as a list in the [pipeline](pipeline.md) spec, each representing a data processing task.
 
 There are 3 types of `Vertex` in Numaflow today:
 
@@ -8,7 +8,7 @@ 1. `Source` - To ingest data from sources.
 1. `Sink` - To forward processed data to sinks.
 1. `UDF` - User Defined Function, which is used to define data processing logic.
 
-We have defined a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) defined for `Vertex`. A `Pipeline` containing multiple vertices will automatically generate multiple `Vertex` objects by the controller. As a user, you should NOT create a `Vertex` object directly.
+We have defined a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for `Vertex`. For a `Pipeline` containing multiple vertices, the controller automatically generates the corresponding `Vertex` objects. As a user, you should NOT create a `Vertex` object directly.
 
 In a `Pipeline`, the vertices are not connected directly, but through [Inter-Step Buffers](inter-step-buffer.md).
 
diff --git a/docs/core-concepts/watermarks.md b/docs/core-concepts/watermarks.md
index e4a7e5247b..d6c5a446ff 100644
--- a/docs/core-concepts/watermarks.md
+++ b/docs/core-concepts/watermarks.md
@@ -16,7 +16,7 @@ will occur for on-time events at or before T.
 Watermarks can be disabled with by setting `disabled: true`.
 
 ### maxDelay
-Watermark assignments happen at source. Sources could be out of ordered, so sometimes we want to extend the
+Watermark assignments happen at source. Sources could be out of order, so sometimes we want to extend the
 window (default is `0s`) to wait before we start marking data as late-data. You can give more time for the
 system to wait for late data with `maxDelay` so that the late data within the specified time duration will be
 considered as data on-time. This means, the watermark propagation will be delayed by `maxDelay`.
diff --git a/docs/quick-start.md b/docs/quick-start.md
index c98bae4c03..02d9463260 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -162,7 +162,7 @@ kubectl delete -f https://raw.githubusercontent.com/numaproj/numaflow/main/examp
 
 ## A pipeline with reduce (aggregation)
 
-To view an example pipeline with the [Reduce UDF](user-guide/user-defined-functions/reduce/reduce.md), see [Reduce Examples](user-guide/user-defined-functions/reduce/examples.md).
+To set up an example pipeline with the [Reduce UDF](user-guide/user-defined-functions/reduce/reduce.md), see [Reduce Examples](user-guide/user-defined-functions/reduce/examples.md).
 
 ## What's Next
 
@@ -171,3 +171,5 @@ Try more examples in the [`examples`](https://github.com/numaproj/numaflow/tree/
 
 After exploring how Numaflow pipelines run, you can check what data [Sources](./user-guide/sources/generator.md) and
 [Sinks](./user-guide/sinks/kafka.md) Numaflow supports out of the box, or learn how to write [User Defined Functions](user-guide/user-defined-functions/user-defined-functions.md).
+
+Numaflow can also be paired with Numalogic, a collection of ML models and algorithms for real-time data analytics and AIOps including anomaly detection. Visit the [Numalogic homepage](https://numalogic.numaproj.io/) for more information.
\ No newline at end of file
diff --git a/docs/specifications/overview.md b/docs/specifications/overview.md
index 0ab7200462..1c32a2e0a3 100644
--- a/docs/specifications/overview.md
+++ b/docs/specifications/overview.md
@@ -2,10 +2,9 @@
 
 ## Synopsis
 
-- Numaflow allows developers with basic knowledge of Kubernetes but
-  without any special knowledge of data/stream processing to easily
-  create massively parallel data/stream processing jobs using a
-  programming language of their choice.
+- Numaflow allows developers without any special knowledge of data/stream
+  processing to easily create massively parallel data/stream processing jobs
+  using a programming language of their choice, with just basic knowledge of Kubernetes.
 - Reliable data processing is highly desirable and exactly-once
   semantics is often required by many data processing applications.
 
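+
+For a concrete illustration of the synopsis above, a complete streaming pipeline is just one small Kubernetes custom resource. The sketch below follows the style of the quick-start example; the optional `watermark` block and its `maxDelay` value are assumptions based on the watermark documentation and can be omitted.
+
+```yaml
+apiVersion: numaflow.numaproj.io/v1alpha1
+kind: Pipeline
+metadata:
+  name: simple-pipeline
+spec:
+  watermark:
+    maxDelay: 60s # assumed example value: extra time to wait before data is marked late
+  vertices:
+    - name: in
+      source:
+        generator: # built-in test data generator
+          rpu: 5
+          duration: 1s
+    - name: cat
+      udf:
+        builtin:
+          name: cat # built-in UDF that forwards each message unchanged
+    - name: out
+      sink:
+        log: {} # built-in log sink
+  edges:
+    - from: in
+      to: cat
+    - from: cat
+      to: out
+```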
diff --git a/docs/user-guide/use-cases/monitoring-and-observability.md b/docs/user-guide/use-cases/monitoring-and-observability.md
new file mode 100644
index 0000000000..005de7fecd
--- /dev/null
+++ b/docs/user-guide/use-cases/monitoring-and-observability.md
@@ -0,0 +1,21 @@
+# Monitoring and Observability
+
+## Docs
+
+- [How Intuit platform engineers use Numaflow to compute golden signals](https://blog.numaproj.io/numaflow-letting-golden-signals-work-for-you-1bce18e472da).
+
+## Videos
+
+- Numaflow as the stream-processing solution in [Intuit’s Customer Centric Observability Journey Using AIOps](https://www.youtube.com/watch?v=D-eQxDBbx48)
+- Using Numaflow for fast incident detection: [Argo CD Observability with AIOps - Detect Incident Fast](https://www.youtube.com/watch?v=_pRJ0_yzxNs)
+- Implementing anomaly detection with Numaflow: [Cluster Golden Signals to Avoid Alert Fatigue at Scale](https://www.youtube.com/watch?v=e5TZE9e2KPo)
+
+## Appendix: What is Monitoring and Observability?
+
+Monitoring and observability are two critical concepts in software engineering that help developers ensure the health and performance of their applications.
+
+Monitoring refers to the process of collecting and analyzing data about an application's performance. This data can include metrics such as CPU usage, memory usage, network traffic, and response times. Monitoring tools allow developers to track these metrics over time and set alerts when certain thresholds are exceeded. This enables them to quickly identify and respond to issues before they become critical.
+
+Observability, on the other hand, is a more holistic approach to monitoring that focuses on understanding the internal workings of an application. Observability tools provide developers with deep insights into the behavior of their applications, allowing them to understand how different components interact with each other and how changes in one area can affect the overall system. This includes collecting data on things like logs, traces, and events, which can be used to reconstruct the state of the system at any given point in time.
+
+Together, monitoring and observability provide developers with a comprehensive view of their applications' performance, enabling them to quickly identify and respond to issues as they arise. By leveraging these tools, software engineers can ensure that their applications are running smoothly and efficiently, delivering the best possible experience to their users.
\ No newline at end of file
diff --git a/docs/user-guide/use-cases/overview.md b/docs/user-guide/use-cases/overview.md
new file mode 100644
index 0000000000..84c33b587c
--- /dev/null
+++ b/docs/user-guide/use-cases/overview.md
@@ -0,0 +1,13 @@
+# Overview
+
+Numaflow allows developers without any special knowledge of data/stream processing to easily create massively parallel data/stream processing jobs using a programming language of their choice, with just basic knowledge of Kubernetes.
+
+In this section, you'll find sample use cases for Numaflow and learn how to leverage its features for your stream processing tasks.
+
+- Real-time data analytics applications.
+- Event-driven applications: [anomaly detection and monitoring](./monitoring-and-observability.md).
+- Streaming applications: data instrumentation and movement.
+- Any workflows running in a streaming manner.
+
+
+Numaflow is still a relatively new tool, and there are likely many other use cases that we haven't yet explored. We're committed to keeping this page up-to-date with the latest use cases and best practices for using Numaflow. We welcome contributions from the community and encourage you to share your own use cases and experiences with us. As we continue to develop and improve Numaflow, we look forward to seeing the cool things you build with it!
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index a5080c943f..5f5b3556da 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -97,6 +97,9 @@ nav:
       - user-guide/reference/configuration/max-message-size.md
       - user-guide/reference/kustomize/kustomize.md
       - APIs.md
+  - Use Cases:
+      - user-guide/use-cases/overview.md
+      - user-guide/use-cases/monitoring-and-observability.md
   - Operator Manual:
       - Releases ⧉: "operations/releases.md"
       - operations/installation.md