spconv could not take advantage of gpu acceleration #704

Open
yigetan0909 opened this issue Jun 13, 2024 · 0 comments

My test code is as follows:

import torch
import torch.nn as nn
import time
import spconv.pytorch as spconv

x = torch.zeros(64, 16, 124, 124, dtype=torch.float16).cuda()
for i in range(10):
    x[0, 0, i, 0] = 1
x1 = x.to(dtype=torch.float32)
cv1 = nn.Conv2d(16, 16, 3, 1, 1).half().cuda()
cv2 = nn.Conv2d(16, 16, 3, 1, 1).cuda()
cv3 = spconv.SubMConv2d(16, 16, 3, 1, padding=1, indice_key="asd", algo=spconv.ConvAlgo.Native).half().cuda()
cv4 = spconv.SubMConv2d(16, 16, 3, 1, padding=1, indice_key="asd", algo=spconv.ConvAlgo.Native).cuda()

s = x.permute(0, 2, 3, 1)
s = spconv.SparseConvTensor.from_dense(s)
s1 = x1.permute(0, 2, 3, 1)
s1 = spconv.SparseConvTensor.from_dense(s1)

for i in range(10):
    a = time.time()
    y1 = cv3(s)
    b = time.time()
    print(b-a)

for i in range(10):
    a = time.time()
    y1 = cv4(s1)
    b = time.time()
    print(b-a)

# On a machine with a relatively weak GPU
# Ordinary (dense) convolution
# FP16:
#   first call: 0.022843360900878906
#   last call:  7.104873657226562e-05
# FP32:
#   first call: 0.0012712478637695312
#   last call:  8.654594421386719e-05
# spconv
# FP16:
#   first call: 0.07234716415405273
#   last call:  0.0004432201385498047
# FP32:
#   first call: 0.0012712478637695312
#   last call:  0.00042891502380371094
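As a side note, the test input above is extremely sparse: the snippet sets only 10 elements of the whole 64×16×124×124 tensor to 1. A quick calculation (values taken directly from the snippet) shows the density, which may mean per-call overhead rather than arithmetic dominates these timings:

```python
# Density of the test input: the snippet sets 10 elements of a
# 64 x 16 x 124 x 124 tensor to 1; everything else stays zero.
total = 64 * 16 * 124 * 124   # 15,745,024 elements
nonzero = 10
density = nonzero / total
print(f"density = {density:.2e}")  # roughly 6.4e-07
```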

However, when I ran the same test on the CPU, the results show that the GPU provides no acceleration:

# Running on the CPU
# spconv
# FP16:
#   first call: 0.08111023902893066
#   last call:  0.00044608116149902344
# FP32:
#   first call: 0.0016925334930419922
#   last call:  0.00042366981506347656

Did I make a mistake in how I run the test? I sincerely hope to get an answer. Thank you for your help.

Supplementary description:

  1. The above results were obtained on a laptop; the same behavior also occurs when the GPU is an A6000.
  2. I tried changing `algo`, but the behavior persists with the default algo as well.
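One possible flaw in my timing (an assumption on my part, not something I have verified): CUDA kernels launch asynchronously, so `time.time()` around `cv3(s)` may measure only the launch, not the kernel itself. A minimal re-timing helper that synchronizes before each clock read would look like this (`bench` is a hypothetical helper, not part of spconv or torch):

```python
import time

def bench(fn, iters=10, sync=None):
    """Time fn() `iters` times. If `sync` is given (e.g.
    torch.cuda.synchronize), call it before and after fn() so
    asynchronously launched GPU work is finished before each
    clock read."""
    times = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return times
```

For example, `bench(lambda: cv3(s), sync=torch.cuda.synchronize)` on the GPU versus `bench(lambda: cv4(s1))` on the CPU.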