Merge branch 'main' into renjie/issue-66-1
liurenjie1024 authored Jan 4, 2024
2 parents 3f56d0b + 47e3ae7 commit 7c1caf6
Showing 18 changed files with 482 additions and 4 deletions.
3 changes: 3 additions & 0 deletions .asf.yaml
@@ -42,6 +42,7 @@ github:
required_approving_review_count: 1

required_linear_history: true

features:
wiki: false
issues: true
@@ -50,6 +51,8 @@ github:
- Xuanwo
- liurenjie1024
- JanKaul
ghp_branch: gh-pages
ghp_path: /

notifications:
commits: [email protected]
56 changes: 56 additions & 0 deletions .github/workflows/website.yml
@@ -0,0 +1,56 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Website

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup mdBook
        uses: peaceiris/actions-mdbook@v1
        with:
          mdbook-version: '0.4.36'

      - name: Build
        working-directory: website
        run: mdbook build

      - name: Copy asf file
        run: cp .asf.yaml ./website/book/.asf.yaml

      - name: Deploy to gh-pages
        uses: peaceiris/[email protected]
        if: github.event_name == 'push' && github.ref_name == 'main'
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: website/book
          publish_branch: gh-pages
3 changes: 2 additions & 1 deletion .licenserc.yaml
@@ -24,5 +24,6 @@ header:
- 'LICENSE'
- 'NOTICE'
- '**/*.json'

# Generated content by mdbook
- 'website/book'
comment: on-failure
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -41,7 +41,7 @@ log = "^0.4"
mockito = "^1"
murmur3 = "0.5.2"
once_cell = "1"
opendal = "0.43"
opendal = "0.44"
ordered-float = "4.0.0"
pretty_assertions = "1.4.0"
port_scanner = "0.1.5"
71 changes: 71 additions & 0 deletions README.md
@@ -21,6 +21,77 @@

Native Rust implementation of [Apache Iceberg](https://iceberg.apache.org/).

## Roadmap

### Catalog

| Catalog Type | Status |
|--------------|-------------|
| Rest | Done |
| Hive | In Progress |
| Sql | Not Started |
| Glue | Not Started |
| DynamoDB | Not Started |

### FileIO

| FileIO Type | Status |
|-------------|-------------|
| S3 | Done |
| Local File | Done |
| GCS | Not Started |
| HDFS | Not Started |

Our `FileIO` is powered by [Apache OpenDAL](https://github.com/apache/incubator-opendal), so it is straightforward to
extend it to other storage services.

### Table API

#### Reader

| Feature | Status |
|------------------------------------------------------------|-------------|
| File based task planning | In progress |
| Size based task planning | Not started |
| Filter pushdown (manifest evaluation, partition pruning)   | Not started |
| Apply deletions, including equality and position deletions | Not started |
| Read into arrow record batch | Not started |
| Parquet file support | Not started |
| ORC file support | Not started |

#### Writer

| Feature | Status |
|--------------------------|-------------|
| Data writer | Not started |
| Equality deletion writer | Not started |
| Position deletion writer | Not started |
| Partitioned writer | Not started |
| Upsert writer | Not started |
| Parquet file support | Not started |
| ORC file support | Not started |

#### Transaction

| Feature | Status |
|-----------------------|-------------|
| Schema evolution | Not started |
| Update partition spec | Not started |
| Update properties | Not started |
| Replace sort order | Not started |
| Update location | Not started |
| Append files | Not started |
| Rewrite files | Not started |
| Rewrite manifests | Not started |
| Overwrite files | Not started |
| Row level updates | Not started |
| Replace partitions | Not started |
| Snapshot management | Not started |

### Integrations

We will add integrations with other Rust-based data systems, such as Polars and DataFusion.

## Contribute

Iceberg is an active open-source project. We are always open to people who want to use it or contribute to it. Here are some ways to get involved.
42 changes: 42 additions & 0 deletions crates/iceberg/src/expr/mod.rs
@@ -0,0 +1,42 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! This module contains expressions.

mod term;
pub use term::*;
mod predicate;
pub use predicate::*;

/// Predicate operators used in expressions.
#[allow(missing_docs)]
pub enum PredicateOperator {
    IsNull,
    NotNull,
    IsNan,
    NotNan,
    LessThan,
    LessThanOrEq,
    GreaterThan,
    GreaterThanOrEq,
    Eq,
    NotEq,
    In,
    NotIn,
    StartsWith,
    NotStartsWith,
}
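
The operator list mixes operators that take no literal, one literal, or a set of literals, which matches the `UnaryExpression`, `BinaryExpression`, and `SetExpression` types added in `predicate.rs`. A minimal sketch of that grouping follows. The `OperatorKind` enum and the `kind` method are hypothetical helpers for illustration and are not part of this commit; the enum is copied locally (with extra derives) so the example compiles on its own.

```rust
/// Local copy of the operator list from the diff, with derives added for the demo.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PredicateOperator {
    IsNull,
    NotNull,
    IsNan,
    NotNan,
    LessThan,
    LessThanOrEq,
    GreaterThan,
    GreaterThanOrEq,
    Eq,
    NotEq,
    In,
    NotIn,
    StartsWith,
    NotStartsWith,
}

/// Hypothetical classification by the number of literals an operator needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperatorKind {
    /// No literal, for example `a IS NULL`.
    Unary,
    /// Exactly one literal, for example `a > 10`.
    Binary,
    /// A set of literals, for example `a IN (1, 2, 3)`.
    Set,
}

impl PredicateOperator {
    /// Hypothetical helper: which expression shape this operator belongs to.
    pub fn kind(self) -> OperatorKind {
        match self {
            Self::IsNull | Self::NotNull | Self::IsNan | Self::NotNan => OperatorKind::Unary,
            Self::LessThan
            | Self::LessThanOrEq
            | Self::GreaterThan
            | Self::GreaterThanOrEq
            | Self::Eq
            | Self::NotEq
            | Self::StartsWith
            | Self::NotStartsWith => OperatorKind::Binary,
            Self::In | Self::NotIn => OperatorKind::Set,
        }
    }
}

fn main() {
    for op in [
        PredicateOperator::IsNull,
        PredicateOperator::GreaterThan,
        PredicateOperator::NotIn,
    ] {
        println!("{:?} -> {:?}", op, op.kind());
    }
}
```

Because the `match` is exhaustive, adding a new operator later would force a decision about which group it belongs to.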
93 changes: 93 additions & 0 deletions crates/iceberg/src/expr/predicate.rs
@@ -0,0 +1,93 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! This module contains predicate expressions.
//! Predicate expressions are used to filter data and evaluate to a boolean value. For example,
//! `a > 10` is a predicate expression that evaluates to `true` if `a` is greater than `10`.

use crate::expr::{BoundReference, PredicateOperator, UnboundReference};
use crate::spec::Literal;
use std::collections::HashSet;

/// Logical expression, such as `AND`, `OR`, `NOT`.
pub struct LogicalExpression<T, const N: usize> {
    inputs: [Box<T>; N],
}

/// Unary predicate, for example, `a IS NULL`.
pub struct UnaryExpression<T> {
    /// Operator of this predicate, must be a single-operand operator.
    op: PredicateOperator,
    /// Term of this predicate, for example, `a` in `a IS NULL`.
    term: T,
}

/// Binary predicate, for example, `a > 10`.
pub struct BinaryExpression<T> {
    /// Operator of this predicate, must be a binary operator, such as `=`, `>`, `<`, etc.
    op: PredicateOperator,
    /// Term of this predicate, for example, `a` in `a > 10`.
    term: T,
    /// Literal of this predicate, for example, `10` in `a > 10`.
    literal: Literal,
}

/// Set predicate, for example, `a IN (1, 2, 3)`.
pub struct SetExpression<T> {
    /// Operator of this predicate, must be a set operator, such as `IN`, `NOT IN`, etc.
    op: PredicateOperator,
    /// Term of this predicate, for example, `a` in `a IN (1, 2, 3)`.
    term: T,
    /// Literals of this predicate, for example, `(1, 2, 3)` in `a IN (1, 2, 3)`.
    literals: HashSet<Literal>,
}

/// Unbound predicate expression before binding to a schema.
pub enum UnboundPredicate {
    /// And predicate, for example, `a > 10 AND b < 20`.
    And(LogicalExpression<UnboundPredicate, 2>),
    /// Or predicate, for example, `a > 10 OR b < 20`.
    Or(LogicalExpression<UnboundPredicate, 2>),
    /// Not predicate, for example, `NOT (a > 10)`.
    Not(LogicalExpression<UnboundPredicate, 1>),
    /// Unary expression, for example, `a IS NULL`.
    Unary(UnaryExpression<UnboundReference>),
    /// Binary expression, for example, `a > 10`.
    Binary(BinaryExpression<UnboundReference>),
    /// Set predicate, for example, `a IN (1, 2, 3)`.
    Set(SetExpression<UnboundReference>),
}

/// Bound predicate expression after binding to a schema.
pub enum BoundPredicate {
    /// An expression that always evaluates to true.
    AlwaysTrue,
    /// An expression that always evaluates to false.
    AlwaysFalse,
    /// An expression combined by `AND`, for example, `a > 10 AND b < 20`.
    And(LogicalExpression<BoundPredicate, 2>),
    /// An expression combined by `OR`, for example, `a > 10 OR b < 20`.
    Or(LogicalExpression<BoundPredicate, 2>),
    /// An expression combined by `NOT`, for example, `NOT (a > 10)`.
    Not(LogicalExpression<BoundPredicate, 1>),
    /// Unary expression, for example, `a IS NULL`.
    Unary(UnaryExpression<BoundReference>),
    /// Binary expression, for example, `a > 10`.
    Binary(BinaryExpression<BoundReference>),
    /// Set predicate, for example, `a IN (1, 2, 3)`.
    Set(SetExpression<BoundReference>),
}
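
The commit only defines these types; their fields are private and no constructors are provided yet. To show how the pieces compose into a tree, here is a self-contained sketch that mirrors the struct shapes from the diff and builds `a > 10 AND b IS NULL`. The `Literal` enum, the trimmed `PredicateOperator`, and the `render` and `sample` functions are stand-ins invented for illustration, not the crate's API.

```rust
use std::collections::HashSet;

// Simplified stand-in for `crate::spec::Literal` (assumption: one integer variant is enough here).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Literal {
    Long(i64),
}

// Trimmed copy of the operator enum, keeping only what the demo uses.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum PredicateOperator {
    IsNull,
    GreaterThan,
    In,
}

struct UnboundReference {
    name: String,
}

// The four structs below have the same shapes as in the diff.
struct LogicalExpression<T, const N: usize> {
    inputs: [Box<T>; N],
}

struct UnaryExpression<T> {
    op: PredicateOperator,
    term: T,
}

struct BinaryExpression<T> {
    op: PredicateOperator,
    term: T,
    literal: Literal,
}

struct SetExpression<T> {
    op: PredicateOperator,
    term: T,
    literals: HashSet<Literal>,
}

#[allow(dead_code)]
enum UnboundPredicate {
    And(LogicalExpression<UnboundPredicate, 2>),
    Or(LogicalExpression<UnboundPredicate, 2>),
    Not(LogicalExpression<UnboundPredicate, 1>),
    Unary(UnaryExpression<UnboundReference>),
    Binary(BinaryExpression<UnboundReference>),
    Set(SetExpression<UnboundReference>),
}

/// Hypothetical helper: walk the tree recursively and produce a debug-style string.
fn render(p: &UnboundPredicate) -> String {
    match p {
        UnboundPredicate::And(e) => {
            format!("({} AND {})", render(&e.inputs[0]), render(&e.inputs[1]))
        }
        UnboundPredicate::Or(e) => {
            format!("({} OR {})", render(&e.inputs[0]), render(&e.inputs[1]))
        }
        UnboundPredicate::Not(e) => format!("NOT {}", render(&e.inputs[0])),
        UnboundPredicate::Unary(e) => format!("{} {:?}", e.term.name, e.op),
        UnboundPredicate::Binary(e) => format!("{} {:?} {:?}", e.term.name, e.op, e.literal),
        UnboundPredicate::Set(e) => {
            format!("{} {:?} <{} literals>", e.term.name, e.op, e.literals.len())
        }
    }
}

/// Builds the tree for `a > 10 AND b IS NULL`.
fn sample() -> UnboundPredicate {
    let gt = UnboundPredicate::Binary(BinaryExpression {
        op: PredicateOperator::GreaterThan,
        term: UnboundReference { name: "a".to_string() },
        literal: Literal::Long(10),
    });
    let is_null = UnboundPredicate::Unary(UnaryExpression {
        op: PredicateOperator::IsNull,
        term: UnboundReference { name: "b".to_string() },
    });
    UnboundPredicate::And(LogicalExpression {
        inputs: [Box::new(gt), Box::new(is_null)],
    })
}

fn main() {
    println!("{}", render(&sample()));
}
```

The const generic `N` on `LogicalExpression` fixes the number of children at the type level, so `And` and `Or` always carry two inputs and `Not` carries one; the `Box` is what lets the enum refer to itself.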
37 changes: 37 additions & 0 deletions crates/iceberg/src/expr/term.rs
@@ -0,0 +1,37 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Term definition.

use crate::spec::NestedFieldRef;

/// Unbound term before binding to a schema.
pub type UnboundTerm = UnboundReference;

/// A named reference in an unbound expression.
/// For example, `a` in `a > 10`.
pub struct UnboundReference {
    name: String,
}

/// A named reference in a bound expression after binding to a schema.
pub struct BoundReference {
    field: NestedFieldRef,
}

/// Bound term after binding to a schema.
pub type BoundTerm = BoundReference;
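
The diff introduces the unbound and bound reference types but no binding logic yet. As a rough sketch of what "binding to a schema" could mean, the example below resolves a column name to a field through a lookup table. Everything here is an assumption made for illustration: the `Schema` struct, the `bind` method, the two-field `NestedField`, and `NestedFieldRef` being an `Arc` are simplified stand-ins, not the crate's definitions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for `crate::spec::NestedField` (assumption: only id and name matter here).
#[derive(Debug, PartialEq)]
struct NestedField {
    id: i32,
    name: String,
}

/// Assumed to be a shared pointer to a field.
type NestedFieldRef = Arc<NestedField>;

/// Same shape as in the diff: a reference known only by name.
struct UnboundReference {
    name: String,
}

/// Same shape as in the diff: a reference resolved to a concrete field.
#[derive(Debug)]
struct BoundReference {
    field: NestedFieldRef,
}

/// Hypothetical minimal schema: a name-to-field lookup table.
struct Schema {
    fields_by_name: HashMap<String, NestedFieldRef>,
}

impl Schema {
    fn new(fields: Vec<NestedField>) -> Self {
        let fields_by_name = fields
            .into_iter()
            .map(|f| (f.name.clone(), Arc::new(f)))
            .collect();
        Schema { fields_by_name }
    }
}

impl UnboundReference {
    /// Hypothetical binding step: look the name up and keep a handle to the field.
    fn bind(&self, schema: &Schema) -> Result<BoundReference, String> {
        schema
            .fields_by_name
            .get(&self.name)
            .cloned()
            .map(|field| BoundReference { field })
            .ok_or_else(|| format!("field `{}` not found in schema", self.name))
    }
}

fn main() {
    let schema = Schema::new(vec![NestedField { id: 1, name: "a".to_string() }]);
    let unbound = UnboundReference { name: "a".to_string() };
    match unbound.bind(&schema) {
        Ok(bound) => println!("bound `a` to field id {}", bound.field.id),
        Err(e) => println!("error: {e}"),
    }
}
```

Keeping unbound and bound references as separate types means a name that fails to resolve is caught once, at binding time, rather than on every evaluation.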
4 changes: 4 additions & 0 deletions crates/iceberg/src/lib.rs
@@ -28,6 +28,7 @@ pub use error::ErrorKind;
pub use error::Result;

mod catalog;

pub use catalog::Catalog;
pub use catalog::Namespace;
pub use catalog::NamespaceIdent;
@@ -45,5 +46,8 @@ pub mod io;
pub mod spec;

mod scan;

#[allow(dead_code)]
pub mod expr;
pub mod transaction;
pub mod transform;
2 changes: 1 addition & 1 deletion crates/iceberg/src/spec/manifest.rs
@@ -189,7 +189,7 @@ impl ManifestWriter {
let entry = self
.field_summary
.remove(&field.source_id)
.unwrap_or(FieldSummary::default());
.unwrap_or_default();
partition_summary.push(entry);
}
partition_summary
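
The hunk above swaps `.unwrap_or(FieldSummary::default())` for `.unwrap_or_default()`. The two forms return the same value; the second only constructs the default when the `Option` is `None`. A small standalone sketch of the same pattern follows. The `FieldSummary` struct here is a hypothetical two-field stand-in, not the crate's definition, and `take_summary` is an illustrative helper.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `FieldSummary`; only `Default` matters for the demo.
#[derive(Debug, Default, PartialEq)]
struct FieldSummary {
    contains_null: bool,
    contains_nan: Option<bool>,
}

/// Removes the summary for `source_id`, falling back to the default when none was recorded.
fn take_summary(summaries: &mut HashMap<i32, FieldSummary>, source_id: i32) -> FieldSummary {
    // Equivalent to `.unwrap_or(FieldSummary::default())`, but the default value
    // is only built when `remove` returns `None`.
    summaries.remove(&source_id).unwrap_or_default()
}

fn main() {
    let mut summaries = HashMap::new();
    summaries.insert(
        1,
        FieldSummary {
            contains_null: true,
            contains_nan: Some(false),
        },
    );

    println!("{:?}", take_summary(&mut summaries, 1)); // the recorded summary
    println!("{:?}", take_summary(&mut summaries, 2)); // the default summary
}
```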
2 changes: 1 addition & 1 deletion rust-toolchain.toml
@@ -16,5 +16,5 @@
# under the License.

[toolchain]
channel = "1.72.1"
channel = "1.75.0"
components = ["rustfmt", "clippy"]
18 changes: 18 additions & 0 deletions website/.gitignore
@@ -0,0 +1,18 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

book