Merge pull request #34 from interTwin-eu/dev-slangarita
documentation of Component, Connections and Generics
SergioLangaritaBenitez authored Oct 30, 2024
2 parents 32b18d0 + 225f57c commit 93877ed
Showing 14 changed files with 8,407 additions and 8 deletions.
3 changes: 3 additions & 0 deletions docpage/docs/03.- Sources/AWS/S3.md
@@ -12,6 +12,9 @@ The S3 Source requires:

Here is an example of the configuration file. Check the documentation of [AWS credentials](/dcnios/docs/Sources/AWS/) to define the Access Key and Secret Key.

S3 Source consists of the following component:
- GetSQS

```
GetS3:
- name: S3
3 changes: 3 additions & 0 deletions docpage/docs/03.- Sources/AWS/SQS.md
@@ -13,6 +13,9 @@ SQS Source consumes from an AWS SQS queue. It creates an SQS in creation time, r
Here is an example of the configuration file. Check the documentation of [AWS credentials](/dcnios/docs/Sources/AWS/) to define the Access Key and Secret Key.


SQS Source consists of the following component:
- GetSQS

```
SQS:
- name: sqs
4 changes: 4 additions & 0 deletions docpage/docs/03.- Sources/Kafka.md
@@ -14,6 +14,10 @@ The Kafka Source allows us to consume a Kafka topic. It requires this informatio

An SSL connection between NiFi and Kafka is necessary. A PKCS12 certificate and the certificate's password must be provided.


Kafka Source consists of the following component:
- ConsumeKafka_2_6

```
Kafka:
- name: kafka
2 changes: 1 addition & 1 deletion docpage/docs/03.- Sources/dcache.md
@@ -9,7 +9,7 @@ dCache is a Source that listens into a dCache instance. The following values mus
- Folder of dCache to keep under active listening. Required.
- Statefile is the file that will store the state. Please do not use `dcache` as its name, as it may cause problems. Required.

The dCache Source only works when the NiFi cluster is deployed with the image `ghcr.io/grycap/nifi-sse:latest`, is composed of:
The dCache Source only works when the NiFi cluster is deployed with the image `ghcr.io/grycap/nifi-sse:latest`. It consists of the following components:
- ExecuteProcess
- GetFile

2 changes: 1 addition & 1 deletion docpage/docs/04.- Destinations/OSCAR.md
@@ -11,7 +11,7 @@ The OSCAR Destination invokes an OSCAR service asynchronously:
- Token or user/password. The user/password has priority over the token. Please do not edit the OSCAR services. Required.


Destination is composed of this component:
OSCAR Destination consists of the following component:
- InvokeOSCAR


7 changes: 6 additions & 1 deletion docpage/docs/05.- Alterations/Decode.md
@@ -15,4 +15,9 @@ Here is the YAML example.
alterations:
- action: Decode
Encoding: base64
```

Decode Alteration consists of the following component:
- EncodeContent

In this case, DCNiOS uses the same file as the Encode ProcessGroup; it simply changes the configuration.
6 changes: 5 additions & 1 deletion docpage/docs/05.- Alterations/Encode.md
@@ -13,4 +13,8 @@ Here is the YAML example.
alterations:
- action: Encode
Encoding: base64
```


Encode Alteration consists of the following component:
- EncodeContent
5 changes: 4 additions & 1 deletion docpage/docs/05.- Alterations/Merge.md
@@ -20,4 +20,7 @@ alterations:
- action: Merge
maxMessages: 10
windowSeconds: 2
```

Merge Alteration consists of the following component:
- MergeContent
34 changes: 34 additions & 0 deletions docpage/docs/Component.md
@@ -0,0 +1,34 @@
---
sidebar_position: 5
---
# Component


The `components` subsection is used in all Elements, such as Kafka, OSCAR, and the Generic one. A component adjusts how the workflow runs by configuring the corresponding Apache NiFi Processor. The Processor's name, its execution time, and the node on which it runs (ALL or PRIMARY) must be indicated.


## Time Execution

Execution time in Apache NiFi refers to the interval between consecutive executions of a Processor within a workflow (its run schedule). This interval determines how often a Processor runs and is crucial for managing resource utilization. In the declarative file, it is set with the `seconds` field of each component.

## Node Options

When a Processor is set to run on the ALL node option, it executes on all available nodes in the NiFi cluster. This helps distribute the workload evenly, enhancing parallel processing and improving throughput.

Choosing the PRIMARY node option means the Processor will run only on the designated primary node. This is useful for limiting resource use or maintaining specific configurations that shouldn’t be duplicated across nodes.



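For example, a dCache Source can declare a `components` subsection that schedules its `GetFile` Processor to run every 2 seconds, on all nodes or only on the primary one: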
```
- name: dcache
  endpoint: <dcache-endpoint>
  user: <dcache-user>
  password: <dcache-password>
  folder: <input-folder>
  statefile: <file-that-save-state>
  components:
    - name: GetFile
      seconds: 2
      node: (ALL | PRIMARY)
```
19 changes: 19 additions & 0 deletions docpage/docs/Connections.md
@@ -0,0 +1,19 @@
---
sidebar_position: 7
---

# Connections

The Connections section defines the links between Sources and Destinations. It is declared at the same level as Source and Destination, and the identifier names of the Elements are used to create the connection. The use of [Alterations](/docs/Alterations) does not affect the connection between the Elements; the connection is made transparently.

Connections play a crucial role in managing the flow of data between different elements of the workflow, ensuring the order and integrity of the data. By utilizing Connections, workflows can be designed in a modular fashion, allowing for easy modifications, such as adding or removing components, without disrupting the overall flow.

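For example, the following fragment connects a Kafka Source named `kafkaInput` to an OSCAR Destination named `OSCAROutput`: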
```
OSCAR:
  - name: OSCAROutput
Kafka:
  - name: kafkaInput
connection:
  - from: kafkaInput
    to: OSCAROutput
```
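
As an illustrative sketch (assuming the Alteration is declared under the Source Element, as in the Generic Element example), adding an Encode Alteration to the Kafka Source leaves the `connection` block unchanged:

```
Kafka:
  - name: kafkaInput
    alterations:
      - action: Encode
        Encoding: base64
OSCAR:
  - name: OSCAROutput
connection:
  - from: kafkaInput
    to: OSCAROutput
```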
41 changes: 41 additions & 0 deletions docpage/docs/Generic.md
@@ -0,0 +1,41 @@
---
sidebar_position: 8
---

# Generic Element


This document focuses on Generic Elements, specifically how to deploy custom workflows from already created ProcessGroup files. Understanding these Elements is essential for efficiently managing and automating data flows in Apache NiFi.


The generic section creates a custom workflow from a ProcessGroup file (.json). This Element can act as a Source, a Destination, an Alteration, or even a complete data flow; the author of the .json file sets the purpose of the workflow. Using a Generic Element requires knowledge of creating ProcessGroups in Apache NiFi.

DCNiOS creates the specified workflow in Apache NiFi from the .json file, substitutes the environment variables, and supports the same configuration characteristics as other Elements, such as Connections and Components. It also makes the connections with other Elements. Thus, the declarative .yaml file has the following structure:

- An identifier name of the Element (ProcessGroup in NiFi). It must be unique. Required.
- The path of your ProcessGroup (.json file). Required.
- The variables that compose the workflow (as a list).

A Generic Element can also use [Alterations](/docs/Alterations) if it is connected to another Element, or the `components` subsection to modify the execution time or the execution node. In that case, the user must know the names of the NiFi Processors defined in the .json file.

To use a Generic Element that interacts with other Elements, the ProcessGroup must include an Input or Output port with the default name. Please use only one Input port and one Output port.



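A minimal declarative sketch of a Generic Element, combining the fields listed above with the optional `components` and `alterations` subsections: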
```
generic:
  - name: <identifier>
    file: <file-of-process-group>
    variables:
      key1: value1
      key2: value2
    components:
      - name: GetFile
        seconds: 2
        node: (ALL | PRIMARY)
    alterations:
      - action: Encode
        Encoding: base64
```

6 changes: 3 additions & 3 deletions docpage/docs/Introduction.md
@@ -4,14 +4,14 @@ sidebar_position: 1

# Introduction

DCNiOS is an open-source command-line tool that easily manages the creation of event-driven data processing flows. DCNiOS reads a file with a workflow defined in a YAML structure. Then, DCNiOS creates this workflow in an Apache NiFi cluster. DCNiOS uses transparently the Apache NiFi [Process Groups](https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Configuring_a_ProcessGroup) to create predefined workflows.
DCNiOS is an open-source command-line tool that easily manages the creation of event-driven data processing flows. DCNiOS reads a file with a workflow defined in a YAML structure. Then, DCNiOS creates this workflow in an Apache NiFi cluster. DCNiOS transparently uses Apache NiFi [ProcessGroups](https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Configuring_a_ProcessGroup) to create predefined workflows.


![DCNiOS images](/../static/img/dcnios-logo-hor.png)

Apache NiFi Process Group is a group of Processors that compose a dataflow. DCNiOS uses predefined Process Groups that make simple actions like interacting with third-party elements (e.g., consuming from Kafka) or changing the data content (e.g.encoding the data in base64) to compose a complete dataflow.
An Apache NiFi ProcessGroup is a group of Processors that compose a dataflow. DCNiOS uses predefined ProcessGroups that perform simple actions, like interacting with third-party elements (e.g., consuming from Kafka) or changing the data content (e.g., encoding the data in base64), to compose a complete dataflow.

In DCNiOS documentation, the Process Groups are split by purpose into three main groups: 'Sources', 'Destinations', and 'Alterations'.
In DCNiOS documentation, the ProcessGroups are split by purpose into three main groups: 'Sources', 'Destinations', and 'Alterations'.
- 'Sources' interact with third-party elements as the input data receiver.
- 'Destinations' interact with third-party elements as an output data sender.
- 'Alterations' do not interact with third-party elements and change the format of the data flow.
13 changes: 13 additions & 0 deletions docpage/docs/Users.md
@@ -114,6 +114,19 @@ connection:



### Terminology

Element: It is equivalent to a ProcessGroup in NiFi, where a defined workflow takes place.

Source: It is an Element (ProcessGroup) that connects to an external tool for data ingestion.

Destination: It is an Element (ProcessGroup) that connects to an external tool where data will be sent.

Alterations: They modify the format of the input data provided by a Source.

Component: It is equivalent to a Processor in NiFi, where a specific task is performed.


### Example

