
docs: add a use case section (#961)
Signed-off-by: Jason Zesheng Chen <[email protected]>
jasonzeshengchen committed Aug 17, 2023
1 parent d2d6ced commit e4b5b73
Showing 11 changed files with 55 additions and 17 deletions.
4 changes: 2 additions & 2 deletions docs/README.md
@@ -12,6 +12,7 @@ stream processing platforms.
- Event driven applications such as anomaly detection, monitoring, and alerting.
- Streaming applications such as data instrumentation and data movement.
- Workflows running in a streaming manner.
- [Learn more in our User Guide](./user-guide/use-cases/overview.md).

## Key Features

@@ -20,15 +21,14 @@ stream processing platforms.
- Exactly-Once semantics: No input element is duplicated or lost even as pods are rescheduled or restarted.
- Auto-scaling with back-pressure: Each vertex automatically scales from zero to whatever is needed.

## Data Integrity Guarantees:
## Data Integrity Guarantees

- Minimally provide at-least-once semantics
- Provide exactly-once semantics for unbounded and near real-time data sources
- Preserving order is not required

## Roadmap

- Multi-partitioned edges for higher throughput (v0.9)
- JOIN Vertex and Side Inputs feature (v0.10)
- User-defined Source (v0.11)

8 changes: 4 additions & 4 deletions docs/core-concepts/inter-step-buffer-service.md
@@ -2,7 +2,7 @@

Inter-Step Buffer Service is the service to provide [Inter-Step Buffers](inter-step-buffer.md).

An Inter-Step Buffer Service is described by a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/), it is required to be existing in a namespace before Pipeline objects are created. A sample `InterStepBufferService` with JetStream implementation looks like below.
An Inter-Step Buffer Service is described by a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). It must exist in a namespace before Pipeline objects are created. A sample `InterStepBufferService` using the JetStream implementation is shown below.

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
@@ -14,7 +14,7 @@ spec:
version: latest # Do NOT use "latest" but a specific version in your real deployment
```
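The collapsed diff elides most of the sample spec. For reference, a minimal JetStream `InterStepBufferService` of this shape (a sketch; the metadata name and field layout are inferred from the surrounding text) could look like:

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: InterStepBufferService
metadata:
  name: default # Pipelines look for an InterStepBufferService named "default" unless told otherwise
spec:
  jetstream:
    version: latest # Do NOT use "latest" but a specific version in your real deployment
```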
`InterStepBufferService` is a namespaced object, it can be used by all the Pipelines in the same namespace. By default, Pipeline objects look for an `InterStepBufferService` named `default`, so a common practice is to create an `InterStepBufferService` with the name `default`. If you give the `InterStepBufferService` a name other than `default`, then you need to give the same name in the Pipeline spec.
`InterStepBufferService` is a namespaced object. It can be used by all the Pipelines in the same namespace. By default, Pipeline objects look for an `InterStepBufferService` named `default`, so a common practice is to create an `InterStepBufferService` with the name `default`. If you give the `InterStepBufferService` a name other than `default`, then you need to give the same name in the Pipeline spec.
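For illustration, a Pipeline referencing a non-default `InterStepBufferService` might look like the sketch below (the name `my-isbsvc` is hypothetical; verify the `interStepBufferServiceName` field against the full Pipeline spec):

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  # Hypothetical name; must match the metadata.name of your InterStepBufferService
  interStepBufferServiceName: my-isbsvc
```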

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
@@ -42,7 +42,7 @@ Property `spec.jetstream.version` is required for a JetStream `InterStepBufferService`.

**Note**

The version `latest` in the ConfigMap should only be used for testing purpose, it's recommended to always use a fixed version in your real workload.
The version `latest` in the ConfigMap should only be used for testing purposes. It's recommended that you always use a fixed version in your real workload.

### Replicas

@@ -189,7 +189,7 @@ An optional property `spec.redis.native.replicas` (defaults to 3) can be specified

### Persistence

Following example shows an `native` Redis `InterStepBufferService` with persistence.
The following example shows a `native` Redis `InterStepBufferService` with persistence.

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
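The persistence example above is truncated by the collapsed diff. A sketch of such a spec, with assumed values (the `accessMode` and `volumeSize` vocabulary follows the usual Kubernetes PVC conventions), could look like:

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: InterStepBufferService
metadata:
  name: default
spec:
  redis:
    native:
      version: "7.0.11" # assumed; pin a real supported version
      persistence:
        accessMode: ReadWriteOnce # assumed default
        volumeSize: 20Gi
```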
4 changes: 2 additions & 2 deletions docs/core-concepts/inter-step-buffer.md
@@ -1,8 +1,8 @@
# Inter-Step Buffer

A `Pipeline` contains multiple vertices to ingest data from sources, processing data, and forward processed data to sinks. Vertices are not connected directly, but through Inter-Step Buffers.
A `Pipeline` contains multiple vertices that ingest data from sources, process data, and forward processed data to sinks. Vertices are not connected directly, but through Inter-Step Buffers.

Inter-Step Buffer can be implemented by a variety of data buffering technologies, those technologies should support:
Inter-Step Buffer can be implemented by a variety of data buffering technologies. Those technologies should support:

- Durability
- Offsets
2 changes: 1 addition & 1 deletion docs/core-concepts/pipeline.md
@@ -1,6 +1,6 @@
# Pipeline

The `Pipeline` is the most important concept in Numaflow, it represents a data processing job, it defines:
The `Pipeline`, the most important concept in Numaflow, represents a data processing job. It defines:

1. A list of [vertices](vertex.md), which define the data processing tasks;
1. A list of `edges`, which are used to describe the relationship between the vertices. Note an edge may go from a vertex to multiple vertices, and as of v0.10, an edge may also go from multiple vertices to a vertex. This many-to-one relationship is possible via [Join and Cycles](../user-guide/reference/join-vertex.md).
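The two lists above can be sketched in a minimal Pipeline spec (adapted from the common quick-start shape; the vertex names and the built-in `generator`, `cat`, and `log` components are illustrative):

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: simple-pipeline
spec:
  vertices:
    - name: in
      source:
        generator: # built-in data generator source
          rpu: 5
          duration: 1s
    - name: cat
      udf:
        builtin:
          name: cat # built-in pass-through UDF
    - name: out
      sink:
        log: {}
  edges: # each edge connects two vertices through an Inter-Step Buffer
    - from: in
      to: cat
    - from: cat
      to: out
```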
4 changes: 2 additions & 2 deletions docs/core-concepts/vertex.md
@@ -1,14 +1,14 @@
# Vertex

The `Vertex` is also a key component of Numaflow `Pipeline` where the data processing happens. `Vertex` is defined as a list in the [pipeline](pipeline.md) spec, each representing a data processing task.
The `Vertex` is a key component of Numaflow `Pipeline` where the data processing happens. `Vertex` is defined as a list in the [pipeline](pipeline.md) spec, each representing a data processing task.

There are 3 types of `Vertex` in Numaflow today:

1. `Source` - To ingest data from sources.
1. `Sink` - To forward processed data to sinks.
1. `UDF` - User Defined Function, which is used to define data processing logic.

We have defined a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) defined for `Vertex`. A `Pipeline` containing multiple vertices will automatically generate multiple `Vertex` objects by the controller. As a user, you should NOT create a `Vertex` object directly.
We have defined a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for `Vertex`. For a `Pipeline` containing multiple vertices, the controller automatically generates the corresponding `Vertex` objects. As a user, you should NOT create a `Vertex` object directly.

In a `Pipeline`, the vertices are not connected directly, but through [Inter-Step Buffers](inter-step-buffer.md).

2 changes: 1 addition & 1 deletion docs/core-concepts/watermarks.md
@@ -16,7 +16,7 @@ will occur for on-time events at or before T.
Watermarks can be disabled by setting `disabled: true`.

### maxDelay
Watermark assignments happen at source. Sources could be out of ordered, so sometimes we want to extend the
Watermark assignments happen at source. Sources could be out of order, so sometimes we want to extend the
window (default is `0s`) to wait before we start marking data as late-data.
You can give more time for the system to wait for late data with `maxDelay` so that the late data within the specified
time duration will be considered on-time. This means the watermark propagation will be delayed by `maxDelay`.
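As a sketch, extending the wait window in a pipeline spec might look like this (the placement of `maxDelay` under `spec.watermark` is assumed from this section):

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  watermark:
    disabled: false
    maxDelay: 60s # wait up to 60s before marking data as late
```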
4 changes: 3 additions & 1 deletion docs/quick-start.md
@@ -162,7 +162,7 @@ kubectl delete -f https://raw.githubusercontent.com/numaproj/numaflow/main/examp

## A pipeline with reduce (aggregation)

To view an example pipeline with the [Reduce UDF](user-guide/user-defined-functions/reduce/reduce.md), see [Reduce Examples](user-guide/user-defined-functions/reduce/examples.md).
To set up an example pipeline with the [Reduce UDF](user-guide/user-defined-functions/reduce/reduce.md), see [Reduce Examples](user-guide/user-defined-functions/reduce/examples.md).

## What's Next

@@ -171,3 +171,5 @@ Try more examples in the [`examples`](https://github.com/numaproj/numaflow/tree/
After exploring how Numaflow pipelines run, you can check what data [Sources](./user-guide/sources/generator.md)
and [Sinks](./user-guide/sinks/kafka.md) Numaflow supports out of the box, or learn how to write
[User Defined Functions](user-guide/user-defined-functions/user-defined-functions.md).

Numaflow can also be paired with Numalogic, a collection of ML models and algorithms for real-time data analytics and AIOps including anomaly detection. Visit the [Numalogic homepage](https://numalogic.numaproj.io/) for more information.
7 changes: 3 additions & 4 deletions docs/specifications/overview.md
@@ -2,10 +2,9 @@

## Synopsis

- Numaflow allows developers with basic knowledge of Kubernetes but
without any special knowledge of data/stream processing to easily
create massively parallel data/stream processing jobs using a
programming language of their choice.
- Numaflow allows developers without any special knowledge of data/stream
processing to easily create massively parallel data/stream processing jobs
using a programming language of their choice, with just basic knowledge of Kubernetes.

- Reliable data processing is highly desirable and exactly-once
semantics is often required by many data processing applications.
21 changes: 21 additions & 0 deletions docs/user-guide/use-cases/monitoring-and-observability.md
@@ -0,0 +1,21 @@
# Monitoring and Observability

## Docs

- [How Intuit platform engineers use Numaflow to compute golden signals](https://blog.numaproj.io/numaflow-letting-golden-signals-work-for-you-1bce18e472da).

## Videos

- Numaflow as the stream-processing solution in [Intuit’s Customer Centric Observability Journey Using AIOps](https://www.youtube.com/watch?v=D-eQxDBbx48)
- Using Numaflow for fast incident detection: [Argo CD Observability with AIOps - Detect Incident Fast](https://www.youtube.com/watch?v=_pRJ0_yzxNs)
- Implementing anomaly detection with Numaflow: [Cluster Golden Signals to Avoid Alert Fatigue at Scale](https://www.youtube.com/watch?v=e5TZE9e2KPo)

## Appendix: What is Monitoring and Observability?

Monitoring and observability are two critical concepts in software engineering that help developers ensure the health and performance of their applications.

Monitoring refers to the process of collecting and analyzing data about an application's performance. This data can include metrics such as CPU usage, memory usage, network traffic, and response times. Monitoring tools allow developers to track these metrics over time and set alerts when certain thresholds are exceeded. This enables them to quickly identify and respond to issues before they become critical.

Observability, on the other hand, is a more holistic approach to monitoring that focuses on understanding the internal workings of an application. Observability tools provide developers with deep insights into the behavior of their applications, allowing them to understand how different components interact with each other and how changes in one area can affect the overall system. This includes collecting data on things like logs, traces, and events, which can be used to reconstruct the state of the system at any given point in time.

Together, monitoring and observability provide developers with a comprehensive view of their applications' performance, enabling them to quickly identify and respond to issues as they arise. By leveraging these tools, software engineers can ensure that their applications are running smoothly and efficiently, delivering the best possible experience to their users.
13 changes: 13 additions & 0 deletions docs/user-guide/use-cases/overview.md
@@ -0,0 +1,13 @@
# Overview

Numaflow allows developers without any special knowledge of data/stream processing to easily create massively parallel data/stream processing jobs using a programming language of their choice, with just basic knowledge of Kubernetes.

In this section, you'll find sample use cases for Numaflow and learn how to leverage its features for your stream processing tasks.

- Real-time data analytics applications.
- Event-driven applications: [anomaly detection and monitoring](./monitoring-and-observability.md).
- Streaming applications: data instrumentation and movement.
- Any workflows running in a streaming manner.


Numaflow is still a relatively new tool, and there are likely many other use cases that we haven't yet explored. We're committed to keeping this page up-to-date with the latest use cases and best practices for using Numaflow. We welcome contributions from the community and encourage you to share your own use cases and experiences with us. As we continue to develop and improve Numaflow, we look forward to seeing the cool things you build with it!
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -97,6 +97,9 @@ nav:
- user-guide/reference/configuration/max-message-size.md
- user-guide/reference/kustomize/kustomize.md
- APIs.md
- Use Cases:
- user-guide/use-cases/overview.md
- user-guide/use-cases/monitoring-and-observability.md
- Operator Manual:
- Releases ⧉: "operations/releases.md"
- operations/installation.md
