[BEAM-7607] Per user request, making maxFilesPerBundle public (#9160)
* Per user request, making maxFilesPerBundle public

* Adding documentation.

* Apply spotless
pabloem authored Aug 5, 2019
1 parent 913f065 commit 08d0146
Showing 1 changed file with 11 additions and 2 deletions.
@@ -2019,8 +2019,17 @@ public Write<T> withTestServices(BigQueryServices testServices) {
       return toBuilder().setBigQueryServices(testServices).build();
     }
 
-    @VisibleForTesting
-    Write<T> withMaxFilesPerBundle(int maxFilesPerBundle) {
+    /**
+     * Control how many files will be written concurrently by a single worker when using BigQuery
+     * load jobs before spilling to a shuffle. When data comes into this transform, it is written to
+     * one file per destination per worker. When there are more files than maxFilesPerBundle
+     * (DEFAULT: 20), the data is shuffled (i.e. Grouped By Destination), and written to files
+     * one-by-one-per-worker. This flag sets the maximum number of files that a single worker can
+     * write concurrently before shuffling the data. This flag should be used with caution. Setting
+     * a high number can increase the memory pressure on workers, and setting a low number can make
+     * a pipeline slower (due to the need to shuffle data).
+     */
+    public Write<T> withMaxFilesPerBundle(int maxFilesPerBundle) {
       checkArgument(
           maxFilesPerBundle > 0, "maxFilesPerBundle must be > 0, but was: %s", maxFilesPerBundle);
       return toBuilder().setMaxFilesPerBundle(maxFilesPerBundle).build();
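
The spill rule described in the new Javadoc can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Beam's implementation: the class `MaxFilesPerBundleSketch`, the `BundleResult` type, and the `processBundle` helper are hypothetical names, rows are modeled as `{destination, payload}` string pairs, and a "file" is just an in-memory list. It assumes a worker opens one file per destination until it holds `maxFilesPerBundle` files, and that rows for any further destination go to the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the spill rule; not Beam's actual code. */
final class MaxFilesPerBundleSketch {

  /** Outcome for one bundle: rows written to open files vs. rows sent to the shuffle. */
  static final class BundleResult {
    // destination -> rows written to that destination's open "file"
    final Map<String, List<String>> openFiles = new LinkedHashMap<>();
    // rows that could not get a file and would be grouped by destination later
    final List<String[]> spilled = new ArrayList<>();
  }

  /** Each row is a {destination, payload} pair (an assumption made for this sketch). */
  static BundleResult processBundle(List<String[]> rows, int maxFilesPerBundle) {
    // Same precondition as the checkArgument call in the diff.
    if (maxFilesPerBundle <= 0) {
      throw new IllegalArgumentException(
          "maxFilesPerBundle must be > 0, but was: " + maxFilesPerBundle);
    }
    BundleResult result = new BundleResult();
    for (String[] row : rows) {
      String destination = row[0];
      List<String> file = result.openFiles.get(destination);
      // Open a new file only while the worker is below its concurrent-file limit.
      if (file == null && result.openFiles.size() < maxFilesPerBundle) {
        file = new ArrayList<>();
        result.openFiles.put(destination, file);
      }
      if (file != null) {
        file.add(row[1]);
      } else {
        // Limit reached for a new destination: spill the row to the shuffle.
        result.spilled.add(row);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String[]> rows = Arrays.asList(
        new String[] {"table_a", "r1"},
        new String[] {"table_b", "r2"},
        new String[] {"table_c", "r3"},
        new String[] {"table_a", "r4"});
    BundleResult result = processBundle(rows, 2);
    System.out.println("open files: " + result.openFiles.keySet());
    System.out.println("spilled rows: " + result.spilled.size());
  }
}
```

The model makes the trade-off in the Javadoc visible: a larger limit keeps more files (and their buffers) open per worker, while a smaller limit pushes more rows into the shuffle. In a real pipeline the now-public setter would presumably be chained onto a write, for example `BigQueryIO.writeTableRows().withMaxFilesPerBundle(50)`, where the value 50 is purely illustrative.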