Configuration Properties
SparkRDMA has several runtime properties that can be set alongside other Spark properties:
Property Name | Default/Min/Max | Description |
---|---|---|
spark.shuffle.rdma.driverPort | random/1025/65535 | Port the RDMA driver instance will listen on. |
spark.shuffle.rdma.executorPort | random/1025/65535 | Port the RDMA executor instances will listen on. |
spark.shuffle.rdma.portMaxRetries | 16/1/1000 | Maximum number of attempts to bind to an RDMA port before failing. Each retry will increment the previously attempted port number by 1. This value applies to both the RDMA driverPort and RDMA executorPort. |
spark.shuffle.rdma.cpuList | All CPUs/--/-- | The list of CPUs to be used by the RDMA services for thread creation and event processing. It is recommended to use only the CPU cores associated with the NUMA node that the Mellanox NIC is attached to. The parameter is specified as a comma-separated list and can also take hyphenated ranges. Invalid syntax will result in reverting to the default value. Examples: 1,3,5 or 1-5 or 1-4,10-12 |
spark.shuffle.rdma.useOdp | false | On-Demand Paging (ODP) is a technique that eases memory registration: applications do not need to pin down the underlying physical pages of the address space or track the validity of the mappings. Instead, the HCA (Host Channel Adapter) requests the latest translations from the OS when pages are not present, and the OS invalidates translations that are no longer valid due to non-present pages or mapping changes. |
spark.shuffle.rdma.collectOdpStats | true | Collect and report ODP statistics |
spark.shuffle.rdma.device.num | 0 | Device number used to read ODP statistics from sysfs (/sys/class/infiniband_verbs/uverbs$DEVICE_NUMBER/). Only relevant when spark.shuffle.rdma.useOdp=true and spark.shuffle.rdma.collectOdpStats=true. |
spark.shuffle.rdma.preAllocateBuffers | --/--/-- | Comma-separated list of buffer size:buffer count pairs to pre-allocate, e.g. 4k:1000,16k:500 |
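
These properties are set like any other Spark property, for example in spark-defaults.conf or per application with spark-submit --conf. The snippet below is a minimal sketch; the port numbers, CPU range, and buffer pool sizes are placeholders, not recommendations:

```
# spark-defaults.conf -- illustrative values only
# An ordinary Spark property, set alongside the SparkRDMA ones:
spark.executor.memory                  8g
# Placeholder ports; any free port in 1025-65535 works:
spark.shuffle.rdma.driverPort          35000
spark.shuffle.rdma.executorPort        35100
# Example: restrict RDMA threads to the cores on the NIC's NUMA node
spark.shuffle.rdma.cpuList             0-7
# Example buffer pool: 1000 x 4 KB buffers and 500 x 16 KB buffers
spark.shuffle.rdma.preAllocateBuffers  4k:1000,16k:500
```

The same properties can also be passed on the command line, e.g. `spark-submit --conf spark.shuffle.rdma.cpuList=0-7 ...`.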
Property Name | Default/Min/Max | Description |
---|---|---|
spark.shuffle.rdma.recvQueueDepth | 1024/256/65535 | The maximum number of outstanding receive work requests that can be posted to the QP (Queue Pair). |
spark.shuffle.rdma.sendQueueDepth | 4096/256/65535 | The maximum number of outstanding send work requests that can be posted to the QP. |
spark.shuffle.rdma.recvWrSize | 4k/2k/1m | The size (in bytes) of the buffers used to receive data from a SEND operation. |
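
As a sketch, the queue depths and receive buffer size can be raised together when the defaults are too small for a given workload; the values below are placeholders within the documented min/max ranges, not recommendations:

```
# Illustrative QP sizing (placeholders within the documented ranges)
spark.shuffle.rdma.recvQueueDepth  2048
spark.shuffle.rdma.sendQueueDepth  8192
spark.shuffle.rdma.recvWrSize      8k
```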
Property Name | Default/Min/Max | Description |
---|---|---|
spark.shuffle.rdma.rdmaCmEventTimeout | 20000/-1/60000 | The amount of time to wait (in milliseconds) for RDMA CM events before failing. A value of -1 means to wait forever. |
spark.shuffle.rdma.teardownListenTimeout | 50/-1/60000 | The amount of time to wait (in milliseconds) for RDMA disconnect events before failing. A value of -1 means to wait forever. |
spark.shuffle.rdma.resolvePathTimeout | 2000/-1/60000 | The amount of time to wait (in milliseconds) for RDMA resolve address and resolve route events before failing. A value of -1 means to wait forever. |
spark.shuffle.rdma.maxConnectionAttempts | 5/1/100 | Maximum number of attempts to set up remote connections before failing a task. |
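
For example, on a fabric where connection setup is slow, the timeouts and retry count can be relaxed; the values below are placeholders only, and -1 disables a timeout entirely as noted above:

```
# Illustrative timeout/retry settings (placeholders only)
# Wait up to 40 s for RDMA CM events:
spark.shuffle.rdma.rdmaCmEventTimeout     40000
# Wait up to 5 s for address/route resolution:
spark.shuffle.rdma.resolvePathTimeout     5000
# Allow more connection attempts before failing a task:
spark.shuffle.rdma.maxConnectionAttempts  10
```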
Property Name | Default/Min/Max | Description |
---|---|---|
spark.shuffle.rdma.shuffleWriteBlockSize | 8m/4k/512m | The storage block size used for the shuffle writer. When using the "ChunkedPartitionAgg" writer method, it is the size of each memory buffer used to store shuffle write data. In "Wrapper" mode, it is the size of each file mapping, e.g. a 120MB file is broken down into 8MB file mappings. |
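
As a sketch (the value is a placeholder), raising the write block size means fewer, larger buffers or file mappings per map output: with 16m, a 128MB file in "Wrapper" mode is split into eight mappings instead of sixteen.

```
# Illustrative shuffle write block size (placeholder value)
spark.shuffle.rdma.shuffleWriteBlockSize  16m
```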
Property Name | Default/Min/Max | Description |
---|---|---|
spark.shuffle.rdma.shuffleReadBlockSize | 256k/0/512m | The transfer size to be used for block fetches on shuffle read operations. The SparkRDMA layer will aggregate the blocks into a single buffer until it reaches this size. When set to "0", no aggregation will be performed on the reader side. |
spark.shuffle.rdma.maxBytesInFlight | 64m/128k/100g | Maximum bytes that shuffle read operations will attempt to fetch at any given moment. If this threshold is reached, then fetches will resume only once outstanding requests complete. |
spark.shuffle.rdma.partitionLocationFetchTimeout | 30000/1000/MAX_INT | The amount of time to wait (in milliseconds) for fetching shuffle metadata. |
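
A read-side sketch with placeholder values: a larger read block size aggregates more blocks into each fetch, while maxBytesInFlight caps the total amount outstanding at any moment.

```
# Illustrative shuffle read tuning (placeholder values)
# Aggregate fetched blocks into buffers of up to 512 KB:
spark.shuffle.rdma.shuffleReadBlockSize           512k
# Allow up to 128 MB of outstanding fetches:
spark.shuffle.rdma.maxBytesInFlight               128m
# Wait up to 60 s for shuffle partition metadata:
spark.shuffle.rdma.partitionLocationFetchTimeout  60000
```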