remove duplicate content for local cache part (#5535)
 remove duplicate content from doc
gaoyangxiaozhu authored Apr 26, 2024
1 parent 731c17e commit c191753
Showing 3 changed files with 22 additions and 26 deletions.
14 changes: 1 addition & 13 deletions docs/get-started/VeloxABFS.md
@@ -20,16 +20,4 @@ spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net XXXXXX

# Local Caching support

Velox supports a local cache when reading data from HDFS/S3/ABFS. With this feature, Velox can asynchronously cache the data on local disk when reading from remote storage and future read requests on previously cached blocks will be serviced from local cache files. To enable the local caching feature, the following configurations are required:

```
spark.gluten.sql.columnar.backend.velox.cacheEnabled // enable or disable velox cache, default false.
spark.gluten.sql.columnar.backend.velox.memCacheSize // the total size of in-mem cache, default is 128MB.
spark.gluten.sql.columnar.backend.velox.ssdCachePath // the folder to store the cache files, default is "/tmp".
spark.gluten.sql.columnar.backend.velox.ssdCacheSize // the total size of the SSD cache, default is 128MB. Velox will do in-mem cache only if this value is 0.
spark.gluten.sql.columnar.backend.velox.ssdCacheShards // the shards of the SSD cache, default is 1.
spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads for cache promoting, default is 1. Velox will try to do "read-ahead" if this value is bigger than 1
spark.gluten.sql.columnar.backend.velox.ssdODirect // enable or disable O_DIRECT on cache write, default false.
```

It's recommended to mount SSDs to the cache path to get the best performance of local caching. Cache files will be written to "spark.gluten.sql.columnar.backend.velox.cachePath", with UUID based suffix, e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten cannot reuse older caches for now, and the old cache files are left after Spark context shutdown.
Velox supports a local cache when reading data from ABFS. Please refer to the [Velox Local Cache](VeloxLocalCache.md) page for detailed configuration.
20 changes: 20 additions & 0 deletions docs/get-started/VeloxLocalCache.md
@@ -0,0 +1,20 @@
---
layout: page
title: Velox Local Caching
nav_order: 7
parent: Getting-Started
---

Velox supports a local cache when reading data from HDFS/S3/ABFS. With this feature, Velox asynchronously caches data on local disk as it reads from remote storage, and later reads of previously cached blocks are served from the local cache files. To enable the local caching feature, set the following configurations:

```
spark.gluten.sql.columnar.backend.velox.cacheEnabled // enable or disable velox cache, default false.
spark.gluten.sql.columnar.backend.velox.memCacheSize // the total size of in-mem cache, default is 128MB.
spark.gluten.sql.columnar.backend.velox.ssdCachePath // the folder to store the cache files, default is "/tmp".
spark.gluten.sql.columnar.backend.velox.ssdCacheSize // the total size of the SSD cache, default is 128MB. Velox will do in-mem cache only if this value is 0.
spark.gluten.sql.columnar.backend.velox.ssdCacheShards // the shards of the SSD cache, default is 1.
spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads for cache promoting, default is 1. Velox will try to do "read-ahead" if this value is greater than 1.
spark.gluten.sql.columnar.backend.velox.ssdODirect // enable or disable O_DIRECT on cache write, default false.
```

We recommend mounting SSDs at the cache path for the best local caching performance. Cache files are written under the path set by "spark.gluten.sql.columnar.backend.velox.ssdCachePath", with a UUID-based suffix, e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten cannot reuse older caches for now, and old cache files remain on disk after the Spark context shuts down.
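
As a rough illustration of how the settings listed above fit together, the following Python sketch collects them into a dictionary and renders them as `--conf` arguments for `spark-submit`. The helper names (`local_cache_conf`, `to_spark_submit_args`), the example path `/mnt/ssd/gluten-cache`, and the size strings are assumptions made for this example, not part of Gluten; check the accepted value formats against the Gluten version in use.

```python
import shlex

# Common prefix of the Velox cache options listed above.
PREFIX = "spark.gluten.sql.columnar.backend.velox."


def local_cache_conf(ssd_cache_path, mem_cache_size="128MB",
                     ssd_cache_size="128MB", io_threads=1):
    """Build a dict of local-cache settings (hypothetical helper).

    The size strings mirror the defaults quoted in the list above; the
    exact value format Gluten accepts is an assumption here.
    """
    return {
        PREFIX + "cacheEnabled": "true",
        PREFIX + "memCacheSize": mem_cache_size,
        PREFIX + "ssdCachePath": ssd_cache_path,
        PREFIX + "ssdCacheSize": ssd_cache_size,
        PREFIX + "ssdCacheIOThreads": str(io_threads),
    }


def to_spark_submit_args(conf):
    """Turn the settings into a flat list of spark-submit arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "/mnt/ssd/gluten-cache" is a made-up mount point for an SSD.
    conf = local_cache_conf("/mnt/ssd/gluten-cache", io_threads=2)
    print(" ".join(shlex.quote(a) for a in to_spark_submit_args(conf)))
```

The same key/value pairs could instead go into `spark-defaults.conf` or be set on a `SparkConf` object; the dictionary form is used here only to keep the sketch free of a Spark dependency.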
14 changes: 1 addition & 13 deletions docs/get-started/VeloxS3.md
@@ -58,16 +58,4 @@ You can change log granularity of AWS C++ SDK by setting the `spark.gluten.velox

# Local Caching support

Velox supports a local cache when reading data from HDFS/S3. The feature is very useful if remote storage is slow, e.g., reading from a public S3 bucket and stronger performance is desired. With this feature, Velox can asynchronously cache the data on local disk when reading from remote storage, and the future reading requests on already cached blocks will be serviced from local cache files. To enable the local caching feature, below configurations are required:

```
spark.gluten.sql.columnar.backend.velox.cacheEnabled // enable or disable velox cache, default false.
spark.gluten.sql.columnar.backend.velox.memCacheSize // the total size of in-mem cache, default is 128MB.
spark.gluten.sql.columnar.backend.velox.ssdCachePath // the folder to store the cache files, default is "/tmp".
spark.gluten.sql.columnar.backend.velox.ssdCacheSize // the total size of the SSD cache, default is 128MB. Velox will do in-mem cache only if this value is 0.
spark.gluten.sql.columnar.backend.velox.ssdCacheShards // the shards of the SSD cache, default is 1.
spark.gluten.sql.columnar.backend.velox.ssdCacheIOThreads // the IO threads for cache promoting, default is 1. Velox will try to do "read-ahead" if this value is bigger than 1
spark.gluten.sql.columnar.backend.velox.ssdODirect // enable or disable O_DIRECT on cache write, default false.
```

It's recommended to mount SSDs to the cache path to get the best performance of local caching. On the start up of Spark context, the cache files will be allocated under "spark.gluten.sql.columnar.backend.velox.cachePath", with UUID based suffix, e.g. "/tmp/cache.13e8ab65-3af4-46ac-8d28-ff99b2a9ec9b0". Gluten is not able to reuse older caches for now, and the old cache files are left there after Spark context shutdown.
Velox supports a local cache when reading data from S3. Please refer to the [Velox Local Cache](VeloxLocalCache.md) page for detailed configuration.
