Generating new version 1.21.1
GitHub Action Website Snapshot committed Aug 30, 2024
1 parent aee7233 commit c45c88a
Showing 230 changed files with 12,483 additions and 0 deletions.
1 change: 1 addition & 0 deletions versioned_docs/version-1.21.1/before-ol.svg
4 changes: 4 additions & 0 deletions versioned_docs/version-1.21.1/client/_category_.json
@@ -0,0 +1,4 @@
{
  "label": "Client Libraries",
  "position": 4
}
4 changes: 4 additions & 0 deletions versioned_docs/version-1.21.1/client/java/_category_.json
@@ -0,0 +1,4 @@
{
  "label": "Java",
  "position": 1
}
94 changes: 94 additions & 0 deletions versioned_docs/version-1.21.1/client/java/configuration.md
@@ -0,0 +1,94 @@
---
sidebar_position: 2
title: Configuration
---

We recommend configuring the client with an `openlineage.yml` file that contains all the
details of how to connect to your OpenLineage backend.

See the [example configurations](#transports).

You can make this file available to the client in three ways, listed in order of precedence:

1. Set an `OPENLINEAGE_CONFIG` environment variable to a file path: `OPENLINEAGE_CONFIG=path/to/openlineage.yml`.
2. Place an `openlineage.yml` in the user's current working directory.
3. Place an `openlineage.yml` under `.openlineage/` in the user's home directory (`~/.openlineage/openlineage.yml`).
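The lookup order above can be sketched in plain Java. This is a minimal illustration, not the client's actual implementation: `ConfigLocator` and `locate` are hypothetical names, and falling through to the next location when `OPENLINEAGE_CONFIG` points at a missing file is an assumption made for this sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper; not part of the OpenLineage client API. */
final class ConfigLocator {

  static final String FILE_NAME = "openlineage.yml";

  /**
   * Returns the first existing candidate, checked in the documented order.
   * envValue stands in for System.getenv("OPENLINEAGE_CONFIG").
   */
  static Optional<Path> locate(String envValue, Path workingDir, Path homeDir) {
    // 1. Explicit path from the OPENLINEAGE_CONFIG environment variable.
    //    Assumption: a path that does not exist is skipped rather than rejected.
    if (envValue != null && !envValue.isEmpty()) {
      Path fromEnv = Paths.get(envValue);
      if (Files.isRegularFile(fromEnv)) {
        return Optional.of(fromEnv);
      }
    }
    // 2. openlineage.yml in the current working directory.
    Path inWorkingDir = workingDir.resolve(FILE_NAME);
    if (Files.isRegularFile(inWorkingDir)) {
      return Optional.of(inWorkingDir);
    }
    // 3. ~/.openlineage/openlineage.yml in the user's home directory.
    Path inHome = homeDir.resolve(".openlineage").resolve(FILE_NAME);
    if (Files.isRegularFile(inHome)) {
      return Optional.of(inHome);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Optional<Path> config = locate(
        System.getenv("OPENLINEAGE_CONFIG"),
        Paths.get(System.getProperty("user.dir")),
        Paths.get(System.getProperty("user.home")));
    System.out.println(config.map(Path::toString).orElse("no openlineage.yml found"));
  }
}
```

Passing the environment value and directories as arguments keeps the precedence logic easy to exercise without touching the real environment.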


## Environment Variables
The following environment variables are available:

| Name | Description | Since |
|----------------------|-----------------------------------------------------------------------------|-------|
| OPENLINEAGE_CONFIG | The path to the YAML configuration file. Example: `path/to/openlineage.yml` | |
| OPENLINEAGE_DISABLED | When `true`, OpenLineage will not emit events. | 0.9.0 |


## Facets Configuration

In the YAML configuration file, you can also specify a list of disabled facets that will not be included in OpenLineage events.

*YAML Configuration*
```yaml
transport:
  type: console
facets:
  disabled:
    - spark_unknown
    - spark_logicalPlan
```
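Conceptually, a disabled-facets list acts as a filter over the facets attached to an event. The sketch below illustrates that idea with plain maps. `FacetFilter` and `withoutDisabled` are hypothetical names, `example_facet` is a placeholder, and the real client may apply the setting differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration; not the client's actual facet handling. */
final class FacetFilter {

  /** Returns a copy of the facets without the entries whose names are disabled. */
  static Map<String, Object> withoutDisabled(Map<String, Object> facets, Set<String> disabled) {
    Map<String, Object> kept = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : facets.entrySet()) {
      if (!disabled.contains(entry.getKey())) {
        kept.put(entry.getKey(), entry.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, Object> facets = new LinkedHashMap<>();
    facets.put("spark_unknown", "...");
    facets.put("spark_logicalPlan", "...");
    facets.put("example_facet", "..."); // placeholder facet name

    // Mirrors the `facets.disabled` list from the YAML configuration.
    Set<String> disabled = new HashSet<>(Arrays.asList("spark_unknown", "spark_logicalPlan"));

    System.out.println(withoutDisabled(facets, disabled).keySet()); // prints [example_facet]
  }
}
```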
## Transports

import Transports from './partials/java_transport.md';

<Transports/>

### Error Handling via Transport

```java
// Connect to http://localhost:5000
OpenLineageClient client = OpenLineageClient.builder()
  .transport(
    HttpTransport.builder()
      .uri("http://localhost:5000")
      .apiKey("f38d2189-c603-4b46-bdea-e573a3b5a7d5")
      .build())
  .registerErrorHandler(new EmitErrorHandler() {
    @Override
    public void handleError(Throwable throwable) {
      // Handle emit error here
    }
  }).build();
```

### Defining Your Own Transport

```java
OpenLineageClient client = OpenLineageClient.builder()
  .transport(
    new MyTransport() {
      @Override
      public void emit(OpenLineage.RunEvent runEvent) {
        // Add emit logic here
      }
    }).build();
```

## Circuit Breakers

import CircuitBreakers from './partials/java_circuit_breaker.md';

<CircuitBreakers/>

## Metrics

import Metrics from './partials/java_metrics.md';

<Metrics/>

## Dataset Namespace Resolver

import DatasetNamespaceResolver from './partials/java_namespace_resolver.md';

<DatasetNamespaceResolver/>
39 changes: 39 additions & 0 deletions versioned_docs/version-1.21.1/client/java/java.md
@@ -0,0 +1,39 @@
---
sidebar_position: 5
---

# Java

## Overview

The OpenLineage Java client is an SDK for the Java programming language that you can use to generate and emit OpenLineage events to OpenLineage backends.
The core data structures currently offered by the client are the `RunEvent`, `RunState`, `Run`, `Job`, `Dataset`,
and `Transport` classes, along with various `Facets` that can be attached to runs, jobs, and datasets.

The library provides various [transport classes](#transports) that carry lineage events to target endpoints (e.g. HTTP).

You can also use the Java client to create your own custom integrations.

## Installation

The Java client is provided as a library that can be imported into your Java project using either Maven or Gradle.

Maven:

```xml
<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-java</artifactId>
    <version>${OPENLINEAGE_VERSION}</version>
</dependency>
```

or Gradle:

```groovy
implementation("io.openlineage:openlineage-java:${OPENLINEAGE_VERSION}")
```

For more information on the available versions of `openlineage-java`,
please refer to the [Maven repository](https://search.maven.org/artifact/io.openlineage/openlineage-java).

@@ -0,0 +1,107 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::info
This feature is available in OpenLineage versions >= 1.9.0.
:::

To prevent over-instrumentation, the OpenLineage integration provides a circuit breaker mechanism
that stops OpenLineage from creating, serializing, and sending OpenLineage events.

### Simple Memory Circuit Breaker

A simple circuit breaker that works based only on free memory within the JVM. The configuration should
contain a free memory threshold, expressed as a percentage. The default value is `20%`. The circuit breaker
closes on the first call if free memory is low. The `circuitCheckIntervalInMillis` parameter
configures how often the circuit breaker is checked. It defaults to `1000ms` when there is no entry in the config.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).
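The memory check can be sketched with `java.lang.Runtime` alone. This is a rough illustration under stated assumptions: `SimpleMemoryCheck` is a hypothetical class, and counting heap that the JVM has not yet claimed (`maxMemory - totalMemory`) as free is an assumption of this sketch, not a confirmed detail of the OpenLineage implementation.

```java
/** Hypothetical sketch of a free-memory check; not the OpenLineage implementation. */
final class SimpleMemoryCheck {

  /** Percentage of the maximum heap that is still available. */
  static double freeMemoryPercent(long freeBytes, long totalBytes, long maxBytes) {
    // Assumption: heap not yet claimed from the OS (max - total) also counts as available.
    long available = freeBytes + (maxBytes - totalBytes);
    return 100.0 * available / maxBytes;
  }

  /** The breaker closes (OpenLineage stops working) when free memory is below the threshold. */
  static boolean isClosed(double freePercent, int memoryThreshold) {
    return freePercent < memoryThreshold;
  }

  public static void main(String[] args) {
    Runtime runtime = Runtime.getRuntime();
    double free = freeMemoryPercent(
        runtime.freeMemory(), runtime.totalMemory(), runtime.maxMemory());
    // 20 matches the documented default for memoryThreshold.
    System.out.printf("free=%.1f%% closed=%b%n", free, isClosed(free, 20));
  }
}
```

In a real integration, a check like this would be re-evaluated every `circuitCheckIntervalInMillis` rather than once.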

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
circuitBreaker:
  type: simpleMemory
  memoryThreshold: 20
  circuitCheckIntervalInMillis: 1000
  timeoutInSeconds: 90
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |

</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |

</TabItem>
</Tabs>

### Java Runtime Circuit Breaker

A more complex version of the circuit breaker. The amount of free memory can be low as long as
the amount of time spent on garbage collection is acceptable. `JavaRuntimeCircuitBreaker` closes
when free memory drops below the threshold and the share of time spent on garbage collection exceeds
a given threshold (`10%` by default). The circuit breaker is always open when checked for the first time,
because GC time is measured relative to the previous circuit breaker call.
The `circuitCheckIntervalInMillis` parameter
configures how often the circuit breaker is checked.
It defaults to `1000ms` when there is no entry in the config.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).
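The reason the first check is always open becomes clearer in a small sketch: the GC share is a delta between two samples, so the first sample has nothing to compare against. `GcShareTracker` below is a hypothetical class built on the standard `GarbageCollectorMXBean` API. It illustrates the idea and is not the `JavaRuntimeCircuitBreaker` source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical sketch of a GC-aware check; not the OpenLineage implementation. */
final class GcShareTracker {

  private long lastGcMillis = -1;
  private long lastCheckMillis = -1;

  /**
   * Returns GC time as a percentage of wall-clock time since the previous call,
   * or -1 when there is no previous sample to compare against.
   */
  double sample(long totalGcMillis, long nowMillis) {
    double percent = -1;
    if (lastCheckMillis >= 0 && nowMillis > lastCheckMillis) {
      percent = 100.0 * (totalGcMillis - lastGcMillis) / (nowMillis - lastCheckMillis);
    }
    lastGcMillis = totalGcMillis;
    lastCheckMillis = nowMillis;
    return percent;
  }

  /** Closes only when memory is low AND the GC share is above its threshold. */
  static boolean isClosed(
      double freeMemoryPercent, double gcPercent, int memoryThreshold, int gcCpuThreshold) {
    // gcPercent < 0 means "first check": the breaker stays open.
    return gcPercent >= 0 && freeMemoryPercent < memoryThreshold && gcPercent > gcCpuThreshold;
  }

  /** Sums the accumulated collection time of all garbage collectors in this JVM. */
  static long totalGcMillis() {
    long sum = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long millis = gc.getCollectionTime(); // -1 when the collector does not report it
      if (millis > 0) {
        sum += millis;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    GcShareTracker tracker = new GcShareTracker();
    double first = tracker.sample(totalGcMillis(), System.currentTimeMillis());
    // 20 and 10 match the documented defaults for memoryThreshold and gcCpuThreshold.
    System.out.println("closed on first check: " + isClosed(5.0, first, 20, 10));
  }
}
```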

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
circuitBreaker:
  type: javaRuntime
  memoryThreshold: 20
  gcCpuThreshold: 10
  circuitCheckIntervalInMillis: 1000
  timeoutInSeconds: 90
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |


</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |


</TabItem>
</Tabs>

### Custom Circuit Breaker

The list of available circuit breakers can be extended with a custom one, loaded via `ServiceLoader`,
by providing your own implementation of `io.openlineage.client.circuitBreaker.CircuitBreakerBuilder`.
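For readers unfamiliar with `java.util.ServiceLoader`, the discovery mechanism looks roughly like the sketch below. `BreakerBuilder` and `BreakerLookup` are hypothetical stand-ins used so the example stays self-contained. The real interface is `CircuitBreakerBuilder`, and its methods are not reproduced here.

```java
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical stand-in for a builder interface; its single method is an assumption. */
interface BreakerBuilder {
  /** The value that would be matched against `circuitBreaker.type` in the config. */
  String getType();
}

/** Hypothetical lookup helper showing how ServiceLoader discovers implementations. */
final class BreakerLookup {

  /** Returns the first registered builder whose type matches, if any. */
  static Optional<BreakerBuilder> find(String type) {
    // ServiceLoader reads provider class names from
    // META-INF/services/<fully qualified interface name> on the classpath.
    for (BreakerBuilder builder : ServiceLoader.load(BreakerBuilder.class)) {
      if (builder.getType().equals(type)) {
        return Optional.of(builder);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // "myBreaker" is a placeholder type name.
    System.out.println(find("myBreaker").isPresent()
        ? "custom breaker found"
        : "no provider registered for myBreaker");
  }
}
```

With the real interface, the provider file would be named after `io.openlineage.client.circuitBreaker.CircuitBreakerBuilder` and list the fully qualified name of your implementation class.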
64 changes: 64 additions & 0 deletions versioned_docs/version-1.21.1/client/java/partials/java_metrics.md
@@ -0,0 +1,64 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::info
This feature is available in OpenLineage 1.11 and above.
:::

To ease the operational experience of using the OpenLineage integrations, this document details the metrics collected by the Java client and the configuration settings for various metric backends.

### Metrics collected by Java Client

The following table outlines the metrics collected by the OpenLineage Java client, which help in monitoring the integration's performance:

| Metric | Definition | Type |
|-------------------------------------|-------------------------------------------------------|--------|
| `openlineage.emit.start` | Number of events the integration started to send | Counter|
| `openlineage.emit.complete` | Number of events the integration completed sending | Counter|
| `openlineage.emit.time` | Time spent on emitting events | Timer |
| `openlineage.circuitbreaker.engaged`| Status of the Circuit Breaker (engaged or not) | Gauge |

## Metric Backends

OpenLineage uses [Micrometer](https://micrometer.io) for metrics collection, similar to how SLF4J operates for logging. Micrometer provides a facade over different metric backends, allowing metrics to be dispatched to various destinations.

### Configuring Metric Backends

Below are the available backends and potential configurations using Micrometer's facilities.

### StatsD

Full configuration options for StatsD can be found in [Micrometer's `StatsdConfig` implementation](https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-statsd/src/main/java/io/micrometer/statsd/StatsdConfig.java).

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
metrics:
  type: statsd
  flavor: datadog
  host: localhost
  port: 8125
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| spark.openlineage.metrics.type | Metrics type selected | statsd |
| spark.openlineage.metrics.flavor | Flavor of StatsD configuration | datadog |
| spark.openlineage.metrics.host | Host that receives StatsD metrics | localhost |
| spark.openlineage.metrics.port | Port that receives StatsD metrics | 8125 |

</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
|-----------|------------|---------|
| openlineage.metrics.type | Metrics type selected | statsd |
| openlineage.metrics.flavor | Flavor of StatsD configuration | datadog |
| openlineage.metrics.host | Host that receives StatsD metrics | localhost |
| openlineage.metrics.port | Port that receives StatsD metrics | 8125 |

</TabItem>
</Tabs>
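To get a feel for what arrives at the configured `host` and `port`, the sketch below formats basic StatsD lines (`<name>:<value>|<type>`) and sends one over UDP using only the JDK. `StatsdSketch` is a hypothetical class. The exact metric names, tags, and flavor-specific formatting that Micrometer produces may differ, so treat this as an illustration of the wire protocol rather than of the client's output.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the basic StatsD line protocol; Micrometer handles this for you. */
final class StatsdSketch {

  /** Counter line, e.g. for a metric such as openlineage.emit.start. */
  static String counterLine(String name, long delta) {
    return name + ":" + delta + "|c";
  }

  /** Timer line in milliseconds, e.g. for openlineage.emit.time. */
  static String timerLine(String name, long millis) {
    return name + ":" + millis + "|ms";
  }

  /** Gauge line, e.g. for openlineage.circuitbreaker.engaged. */
  static String gaugeLine(String name, long value) {
    return name + ":" + value + "|g";
  }

  /** Sends a single line as one UDP datagram; StatsD is fire-and-forget. */
  static void send(String line, String host, int port) throws Exception {
    byte[] payload = line.getBytes(StandardCharsets.UTF_8);
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.send(new DatagramPacket(payload, payload.length, InetAddress.getByName(host), port));
    }
  }

  public static void main(String[] args) throws Exception {
    String line = counterLine("openlineage.emit.start", 1);
    System.out.println(line);
    // Host and port match the example configuration above.
    send(line, "localhost", 8125);
  }
}
```

Because UDP is connectionless, the send succeeds even when no agent is listening, which makes a missing StatsD agent easy to overlook.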