Commit

Merge pull request pola-rs#122 from linjing-lab/main
update docs
linjing-lab authored Sep 22, 2023
2 parents c080741 + 432dead commit 0fe8cd0
Showing 8 changed files with 12 additions and 14 deletions.
Binary file added user_guide/data/db-benchmark.png
10 changes: 6 additions & 4 deletions user_guide/src/SUMMARY.md
@@ -7,6 +7,7 @@
 - [Contexts](dsl/contexts.md)
 - [GroupBy](dsl/groupby.md)
 - [Folds](dsl/folds.md)
+- [Custom functions](dsl/custom_functions.md)
 - [Window functions](dsl/window_functions.md)
 - [Numpy universal functions](dsl/numpy.md)
 - [Examples](notebooks/introduction_polars.md)
@@ -17,6 +18,7 @@
 - [Coming from Pandas](coming_from_pandas.md)
 - [Coming from Apache Spark](coming_from_spark.md)
 - [Time series](timeseries/intro.md)
+- [Examples](time-series.md)
 - [Scope of use](howcani/intro.md)
 - [IO](howcani/io/intro.md)
 - [CSV files](howcani/io/csv.md)
@@ -29,10 +31,10 @@
 - [Interoperability](howcani/interop/intro.md)
 - [Arrow](howcani/interop/arrow.md)
 - [Numpy](howcani/interop/numpy.md)
-- [Data processing](howcani/data/intro.md)
-- [Processing strings](howcani/data/strings.md)
-- [Timestamp parsing](howcani/data/timestamps.md)
-- [Processing DataFrames](howcani/df/intro.md)
+- [Data](howcani/data/intro.md)
+- [Strings](howcani/data/strings.md)
+- [Timestamps](howcani/data/timestamps.md)
+- [DataFrames](howcani/df/intro.md)
 - [Selection](howcani/df/row_col_selection.md)
 - [Common](howcani/df/common-manipulations.md)
 - [GroupBy](howcani/df/groupby.md)
2 changes: 1 addition & 1 deletion user_guide/src/coming_from_spark.md
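@@ -4,7 +4,7 @@

 A `Spark` `DataFrame` is analogous to a collection of rows, whereas a `Polars` `DataFrame` is closer to a collection of columns. This means you can combine columns in `Polars` in ways that are not possible in `Spark`, because `Spark` preserves the relationship of the data in each row.

-Take a look at the sample dataset below
+Consider the sample dataset below

 ```python
 import polars as pl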
2 changes: 1 addition & 1 deletion user_guide/src/howcani/data/intro.md
@@ -1 +1 @@
-# Data processing
+# Data
2 changes: 1 addition & 1 deletion user_guide/src/howcani/data/strings.md
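@@ -1,4 +1,4 @@
-# Processing strings
+# Strings

 Thanks to its `Arrow` backend, `Polars` string operations are much faster than the same operations performed with `NumPy` or `Pandas`. In the latter, strings are stored as `Python` objects. While traversing the `np.array` or the `pd.Series`, the CPU has to follow every string pointer and jump to many random memory locations, which is very cache-inefficient. In `Polars` (via the `Arrow` data structure), strings are contiguous in memory, so traversal is cache-optimal and predictable for the CPU.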
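
The contiguous layout described in the paragraph above can be illustrated with a minimal sketch. It is plain Python, not the Polars or Arrow API, and the helper names `pack_strings` and `get_string` are hypothetical. It assumes the usual Arrow-style arrangement of a single byte buffer plus an offsets array, in contrast to one separately allocated object per string.

```python
# Illustrative sketch only: plain Python, not the Polars or Arrow API.
# All string values live in one contiguous byte buffer; an offsets list
# records where each value starts and ends.

def pack_strings(values):
    """Pack a list of str into (data, offsets); hypothetical helper."""
    data = bytearray()
    offsets = [0]
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))  # end of this value, start of the next
    return bytes(data), offsets


def get_string(data, offsets, i):
    """Read the i-th string back by slicing the contiguous buffer."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


data, offsets = pack_strings(["foo", "", "polars"])
third = get_string(data, offsets, 2)
```

Reading value `i` only needs two neighbouring offsets and one slice of the same buffer, which is why a sequential scan touches memory in order instead of chasing a pointer per string.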

2 changes: 1 addition & 1 deletion user_guide/src/howcani/data/timestamps.md
@@ -1,4 +1,4 @@
-# Timestamp parsing
+# Timestamps

 `Polars` provides `4` time data types:

4 changes: 2 additions & 2 deletions user_guide/src/introduction.md
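@@ -31,7 +31,7 @@

 Polars is very fast, and is in fact one of the best-performing solutions available. See the results in h2oai's db-benchmark. The figure below shows the largest dataset that produced results.

-![](https://www.ritchievink.com/img/post-35-polars-0.15/db-benchmark.png)
+![](../data/db-benchmark.png)

 ### Current status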

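@@ -48,7 +48,7 @@ Polars is very fast, and is in fact one of the best-performing solutions
 - Bitmask optimizations
 - Efficient algorithms
 - Very fast IO
-- Its csv and parquet  readers are among the fastest in existence
+- Its csv and parquet readers are among the fastest in existence
 - [Query optimizations](optimizations/lazy/intro.md)
 - Predicate pushdown
 - Scan-level filtering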
4 changes: 0 additions & 4 deletions user_guide/src/time-series.md
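@@ -1,9 +1,5 @@
 # Time series

-> The `Time series` article is under construction
-We are still working on this page, but there are already some examples here showing how to use `groupby_dynamic` to group by time window.
-
 ```python
 import polars as pl
 from datetime import datetime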
