🐛 Describe the bug

The example `examples/multi_gpu/distributed_sampling.py`, which demonstrates distributed sampling with multiple GPUs, has a bug in how it splits the training indices across ranks:

```python
# Split indices into `world_size` many chunks:
train_idx = data.train_mask.nonzero(as_tuple=False).view(-1)
train_idx = train_idx.split(train_idx.size(0) // world_size)[rank]
```

This code uses floor division to compute the chunk size, but because the size is rounded down, `split` can produce more than `world_size` chunks (e.g., `world_size + 1`). To fix this issue, use ceiling division instead:

```python
import math

# Split indices into `world_size` many chunks:
train_idx = data.train_mask.nonzero(as_tuple=False).view(-1)
train_idx = train_idx.split(math.ceil(train_idx.size(0) / world_size))[rank]
```

This ensures that we always end up with exactly `world_size` chunks.
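To make the off-by-one concrete, here is a minimal, self-contained sketch (not part of the example script; the index count of 10 and `world_size = 4` are made up for illustration) comparing how many chunks the floor-divided and ceiling-divided chunk sizes produce:

```python
import math

import torch

world_size = 4
train_idx = torch.arange(10)  # hypothetical: 10 training indices, 4 ranks

# Buggy: chunk size 10 // 4 == 2 gives 5 chunks, one more than there are
# ranks, so the indices in the last chunk are never assigned to any rank.
floor_chunks = train_idx.split(train_idx.size(0) // world_size)
print(len(floor_chunks))  # 5

# Fixed: chunk size ceil(10 / 4) == 3 gives 4 chunks, exactly one per rank.
ceil_chunks = train_idx.split(math.ceil(train_idx.size(0) / world_size))
print(len(ceil_chunks))  # 4
```

As an aside, `torch.tensor_split(train_idx, world_size)[rank]` would also guarantee exactly `world_size` pieces, but the `math.ceil` fix above stays closest to the original code.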
Versions
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.0
[pip3] torch_geometric==2.4.0
[pip3] torch-scatter==2.1.2+pt21cu118
[pip3] torch-sparse==0.6.18+pt21cu118
[pip3] torchaudio==2.1.0
[pip3] torchvision==0.16.0
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] cuda-cudart 11.8.89 0 nvidia
[conda] cuda-cupti 11.8.87 0 nvidia
[conda] cuda-libraries 11.8.0 0 nvidia
[conda] cuda-nvrtc 11.8.89 0 nvidia
[conda] cuda-nvtx 11.8.86 0 nvidia
[conda] cuda-runtime 11.8.0 0 nvidia
[conda] cudatoolkit 11.8.0 h6a678d5_0
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] libcublas 11.11.3.6 0 nvidia
[conda] libcufft 10.9.0.58 0 nvidia
[conda] libcurand 10.3.5.147 0 nvidia
[conda] libcusolver 11.4.1.48 0 nvidia
[conda] libcusparse 11.7.5.86 0 nvidia
[conda] libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.10 py310h5eee18b_0
[conda] mkl_random 1.2.7 py310h1128e8f_0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pytorch 2.1.0 py3.10_cuda11.8_cudnn8.7.0_0 pytorch
[conda] pytorch-cuda 11.8 h7e8668a_6 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-geometric 2.4.0 pypi_0 pypi
[conda] torch-scatter 2.1.2+pt21cu118 pypi_0 pypi
[conda] torch-sparse 0.6.18+pt21cu118 pypi_0 pypi
[conda] torchaudio 2.1.0 py310_cu118 pytorch
[conda] torchtriton 2.1.0 py310 pytorch
[conda] torchvision 0.16.0 py310_cu118 pytorch