
Commit

update user_guide
linjing-lab committed Sep 22, 2023
1 parent d180801 commit 9ac40df
Showing 10 changed files with 32 additions and 24 deletions.
2 changes: 0 additions & 2 deletions user_guide/src/dsl/groupby.md
@@ -1,7 +1,5 @@
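# Grouping

> This page is still under construction....
## Multithreading

The most efficient way to process tabular data is to do it in parallel using the "split-apply-combine" approach. This kind of operation is exactly what `Polars`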
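The split-apply-combine idea can be illustrated with a minimal plain-Python sketch. It is not how `Polars` implements grouping internally: the function `parallel_group_sum` and the sample data are hypothetical, and a thread pool stands in for Polars' native multithreading. Because of CPython's global interpreter lock, the sketch shows the structure of the approach rather than a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parallel_group_sum(keys, values, max_workers=4):
    """Sum `values` per key using split-apply-combine (illustrative sketch)."""
    # Split: partition the rows into one bucket per group key.
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)

    # Apply: aggregate each group independently, so groups can run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sums = list(pool.map(sum, groups.values()))

    # Combine: stitch the per-group results back into a single mapping.
    return dict(zip(groups.keys(), sums))


result = parallel_group_sum(["a", "b", "a", "c", "b"], [1, 2, 3, 4, 5])
print(result)  # {'a': 4, 'b': 7, 'c': 4}
```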
2 changes: 0 additions & 2 deletions user_guide/src/howcani/intro.md
@@ -1,5 +1,3 @@
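# How can I?

This chapter contains snippets that show up-to-date, idiomatic ways to get things done with `Polars`.

> The "How can I?" chapter is under construction.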
4 changes: 0 additions & 4 deletions user_guide/src/howcani/io/aws.md
@@ -26,7 +26,3 @@ path = "<YOUR_PATH>"
dataset = pq.ParquetDataset(f"s3://{bucket}/{path}", filesystem=fs)
df = pl.from_arrow(dataset.read())
```

## Write

> This content is under construction.
3 changes: 1 addition & 2 deletions user_guide/src/howcani/io/csv.md
@@ -28,5 +28,4 @@ df.write_csv("path.csv")
df = pl.scan_csv("path.csv")
```

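If you want to learn more about why this design is so elegant,
see the `Polars` [Optimizations](../../optimizations/intro.md) chapter.
If you want to learn more about why this design is so elegant, see the `Polars` [Optimizations](../../optimizations/intro.md) chapter.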
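`pl.scan_csv` returns a lazy query instead of reading the file right away. To make that idea concrete, here is a toy plain-Python sketch of lazy evaluation: operations are only recorded, and nothing is read until `collect()` is called. The class `LazyCsv` and its methods are hypothetical; they loosely mimic the shape of a lazy API and are not the `Polars` implementation.

```python
import csv
import io


class LazyCsv:
    """Toy lazy CSV reader: records operations, executes them on collect()."""

    def __init__(self, source):
        self._source = source  # a file-like object; nothing is read yet
        self._predicates = []
        self._columns = None

    def filter(self, predicate):
        # Record the predicate instead of evaluating it immediately.
        self._predicates.append(predicate)
        return self

    def select(self, *columns):
        # Record which columns the final result should keep.
        self._columns = list(columns)
        return self

    def collect(self):
        # Only now is the source actually read, with all recorded steps applied.
        rows = []
        for row in csv.DictReader(self._source):
            if all(predicate(row) for predicate in self._predicates):
                if self._columns is not None:
                    row = {c: row[c] for c in self._columns}
                rows.append(row)
        return rows


data = io.StringIO("name,karma\nalice,10\nbob,3\nanna,7\n")
query = LazyCsv(data).filter(lambda r: r["name"].startswith("a")).select("name")
rows = query.collect()
print(rows)  # [{'name': 'alice'}, {'name': 'anna'}]
```

Because nothing is read before `collect()`, the whole recorded plan is visible up front, which is roughly what gives a query optimizer room to work.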
19 changes: 18 additions & 1 deletion user_guide/src/howcani/io/google-big-query.md
@@ -29,4 +29,21 @@ df = pl.from_arrow(rows.to_arrow())

## Write

> Content still under construction
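```python
import io

from google.cloud import bigquery

client = bigquery.Client()

# `df` is an existing polars DataFrame; serialize it to Parquet in memory.
with io.BytesIO() as stream:
    df.write_parquet(stream)
    stream.seek(0)  # rewind so the upload starts at the beginning of the buffer
    job = client.load_table_from_file(
        stream,
        destination='tablename',  # placeholder
        project='projectname',  # placeholder
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.PARQUET,
        ),
    )
job.result()  # wait for the load job to finish
```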
2 changes: 0 additions & 2 deletions user_guide/src/optimizations/lazy/intro.md
@@ -1,7 +1,5 @@
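# Lazy API

> The lazy API page is under construction.
To demonstrate the lazy `Polars` features, we will explore two medium-to-large datasets of usernames:

[Reddit usernames dataset](https://www.reddit.com/r/datasets/comments/9i8s5j/dataset_metadata_for_69_million_reddit_users_in/)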
2 changes: 0 additions & 2 deletions user_guide/src/optimizations/lazy/other-optimizations.md
@@ -1,7 +1,5 @@
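# Other optimizations

> The `Other optimizations` page is under construction.
Besides predicate and projection pushdown, `Polars` performs other optimizations.

One important topic is optional caching and parallelization. It is easy to imagine two different `DataFrame` computations that both lead to a scan of the same file. `Polars` may cache the scanned file to avoid scanning the same file twice. However, if you want, you can override this behavior and force `Polars` to read the same file again. This can be faster, because the scans can then run in parallel.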
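The caching idea can be sketched in plain Python by memoizing a scan function, so two computations that scan the same file share a single read. This is only an illustration under stated assumptions: the in-memory dict `FILES` is a hypothetical stand-in for files on disk, `scan` is a hypothetical helper, and none of this reflects how `Polars` caches scans internally.

```python
from functools import lru_cache

# Hypothetical stand-in for files on disk.
FILES = {"users.csv": ("alice", "bob", "anna")}
scan_count = 0  # how many times a "file" was actually read


@lru_cache(maxsize=None)
def scan(path):
    """Pretend to read a file; the cache ensures each path is read once."""
    global scan_count
    scan_count += 1
    return FILES[path]  # a tuple, so the cached result cannot be mutated


# Two different computations that both start from a scan of the same file.
n_users = len(scan("users.csv"))
a_users = [name for name in scan("users.csv") if name.startswith("a")]

print(n_users, a_users, scan_count)  # 3 ['alice', 'anna'] 1
```

Calling the uncached function (`scan.__wrapped__("users.csv")`) would force a second read, which corresponds to overriding the cache so that the scans can run independently.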
2 changes: 0 additions & 2 deletions user_guide/src/optimizations/lazy/predicate-pushdown.md
@@ -1,7 +1,5 @@
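# Predicate pushdown

> The `Predicate pushdown` section is under construction
Predicate pushdown is an optimization `Polars` performs to reduce query time and memory usage. "Predicate" is database jargon for applying a filter to a table, which reduces the number of rows in that table.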
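A minimal plain-Python sketch of why pushing a predicate down helps: filtering while scanning means rows that fail the predicate are never kept, whereas filtering after loading materializes every row first. The sample data and helper names are hypothetical; `Polars` applies this optimization inside its query optimizer, not with code like this.

```python
import csv
import io

# Hypothetical sample data.
CSV_TEXT = "name,karma\nalice,10\nbob,3\nanna,7\ncarl,1\n"


def is_popular(row):
    return int(row["karma"]) >= 5


def load_then_filter(text, predicate):
    # Without pushdown: materialize every row, then filter.
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows, [r for r in rows if predicate(r)]


def filter_during_scan(text, predicate):
    # With pushdown: the predicate is applied while rows are being read,
    # so rows that fail it are never stored.
    return [r for r in csv.DictReader(io.StringIO(text)) if predicate(r)]


loaded, eager = load_then_filter(CSV_TEXT, is_popular)
pushed = filter_during_scan(CSV_TEXT, is_popular)
print(len(loaded), len(eager), len(pushed))  # 4 2 2
```

Both strategies return the same rows; the difference is that the pushed-down version never holds the two rejected rows in memory.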

So, let's see whether we can load some Reddit data and filter on a few predicates.
2 changes: 0 additions & 2 deletions user_guide/src/optimizations/lazy/projection-pushdown.md
@@ -1,7 +1,5 @@
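# Projection pushdown

> The `Projection pushdown` section is under construction
Let's combine the query from the previous section with the result of a *FILTER* operation on the Runescape (a game) data,
to find popular Reddit usernames that start with the letter `a` and have played Runescape. We're sure you'll find this interesting too!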
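Projection pushdown means that only the columns a query actually needs are read. The plain-Python sketch below combines it with the filter from the previous section: it keeps only the `name` column while scanning, keeps names starting with `a`, and matches them against Runescape players. `REDDIT_CSV`, `RUNESCAPE_NAMES` and `scan_columns` are hypothetical stand-ins, not the real datasets or the `Polars` implementation. With a text format like CSV every line still has to be parsed; the savings are larger for columnar formats such as Parquet, where unused columns can be skipped entirely.

```python
import csv
import io

# Hypothetical sample data standing in for the real datasets.
REDDIT_CSV = "id,name,created_utc,comment_karma\n1,alice,100,50\n2,bob,200,7\n3,anna,300,12\n"
RUNESCAPE_NAMES = {"anna", "bob", "zed"}


def scan_columns(text, columns):
    """Yield rows containing only the requested columns (projection pushdown)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {c: row[c] for c in columns}


names = (row["name"] for row in scan_columns(REDDIT_CSV, ["name"]))
result = [n for n in names if n.startswith("a") and n in RUNESCAPE_NAMES]
print(result)  # ['anna']
```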

18 changes: 13 additions & 5 deletions user_guide/src/timeseries/intro.md
@@ -12,6 +12,10 @@
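Upsampling is essentially a left join of a date range with your dataset, followed by filling in the missing data. `Polars` provides
wrapper methods for this operation; see the example below.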

```python
{{#include ../examples/time_series/resampling_example.py:5:}}
```
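To illustrate the left-join-and-fill idea without `Polars`, here is a plain-Python sketch that upsamples hypothetical 2-hourly readings to an hourly grid and fills the gaps with the last known value (forward fill). The helper `upsample` and the data are assumptions; the `Polars` wrapper methods take care of this for you.

```python
from datetime import datetime, timedelta


def upsample(observations, start, end, every):
    """Left-join a regular time grid with `observations`, then forward-fill."""
    out = []
    last = None  # stays None (a null) until the first observation is seen
    t = start
    while t <= end:
        # Left join: take the observed value if this timestamp exists.
        if t in observations:
            last = observations[t]
        # Fill: reuse the last known value for missing timestamps.
        out.append((t, last))
        t += every
    return out


obs = {
    datetime(2021, 1, 1, 0): 1.0,
    datetime(2021, 1, 1, 2): 3.0,
    datetime(2021, 1, 1, 4): 5.0,
}
rows = upsample(obs, datetime(2021, 1, 1, 0), datetime(2021, 1, 1, 4), timedelta(hours=1))
for t, v in rows:
    print(t.strftime("%H:%M"), v)  # 00:00 1.0, 01:00 1.0, 02:00 3.0, 03:00 3.0, 04:00 5.0
```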

## Downsampling

Downsampling is interesting. You need to deal with date intervals, window durations, aggregations, and so on.
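A plain-Python sketch of the core of downsampling, assuming fixed, non-overlapping windows: each timestamp is assigned to the window it falls into (the number of whole window durations since an `origin` picks the window), and the values in each window are aggregated, here with a mean. The helper `downsample_mean` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample_mean(observations, every, origin):
    """Assign (timestamp, value) pairs to fixed windows of length `every`, then average."""
    buckets = defaultdict(list)
    for t, v in observations:
        # How many whole windows lie between `origin` and `t` decides the window.
        k = (t - origin) // every
        buckets[origin + k * every].append(v)
    # Aggregate each window, ordered by window start.
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


obs = [
    (datetime(2021, 1, 1, 0, 10), 1.0),
    (datetime(2021, 1, 1, 0, 40), 3.0),
    (datetime(2021, 1, 1, 1, 5), 10.0),
]
result = downsample_mean(obs, every=timedelta(hours=1), origin=datetime(2021, 1, 1))
for start, value in result.items():
    print(start.strftime("%H:%M"), value)  # 00:00 2.0, then 01:00 10.0
```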
@@ -25,6 +29,14 @@

Let's use the following examples to understand what this means.

```python
{{#include ../examples/time_series/dynamic_ds.py:5:}}
```

```python
{{#include ../examples/time_series/dynamic_groupby.py:5:}}
```

## Groupby Dynamic

In the code below, we create a `date range` covering 2021 with a step of one **day** (`"1d"`) and turn it into a `DataFrame`.
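As a rough plain-Python illustration of this setup, the sketch below builds the daily dates of 2021 and assigns each day to a calendar-month window, counting the days per window. The monthly window size is an assumption made purely for illustration, and the code is not how `Polars` implements dynamic grouping.

```python
from collections import Counter
from datetime import date, timedelta

# A daily ("1d") date range covering all of 2021.
start, end = date(2021, 1, 1), date(2021, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Assign each day to the window starting on the first of its month, then count.
days_per_window = Counter(d.replace(day=1) for d in days)

for window_start, n_days in sorted(days_per_window.items()):
    print(window_start.isoformat(), n_days)
```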
@@ -129,13 +141,9 @@ print(out)

```python
{{#include ../examples/time_series/dynamic_groupby.py:4:}}
print(out)
print(df)
```

```text
{{#include ../outputs/time_series/dyn_gb.txt}}
```

## Upsampling

> This section is still being written.
