Hi, could you please help me resolve an issue with the rocm/rocm-terminal image?
I'm running the container with the command:
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --privileged=true rocm/rocm-terminal:5.4.2
When I run rocminfo or rocm-smi inside the container, the GPU does not seem to be initialized:
rocm-user@edfc4c0977ff:~$ rocminfo
ROCk module is NOT loaded, possibly no GPU devices
rocm-user@edfc4c0977ff:~$ rocm-smi
cat: /sys/module/amdgpu/initstate: No such file or directory
ERROR:root:Driver not initialized (amdgpu not found in modules)
Indeed, /sys/module/amdgpu is not present in the container.
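To compare what the host and the container can each see, a minimal sketch like the one below could be run in both places. The function name check_rocm_paths is hypothetical, and the path list is only an assumption taken from the docker command and the rocm-smi error above, not an official list of ROCm requirements.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ROCm): print whether each given path
# exists, and return non-zero if at least one of them is missing.
check_rocm_paths() {
    local missing=0
    local p
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "present: $p"
        else
            echo "MISSING: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Paths assumed from this report: the two device nodes passed to docker
# and the sysfs file that rocm-smi tried to read.
check_rocm_paths /dev/kfd /dev/dri /sys/module/amdgpu/initstate || true
```

Running it once on the host and once inside the container shows which of these entries disappear when crossing into the container.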
I have to use --privileged=true; otherwise I get this error message:
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal:5.4.2
docker: Error response from daemon: error gathering device information while adding custom device "/dev/kfd": no such file or directory.
ERRO[0000] error waiting for container:
On the host everything seems to be working fine. I have installed rocm-5.4.2 and the modules are present in the system:
rocm-smi
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 30.0c 13.0W 500Mhz 96Mhz 20.0% auto 213.0W 2% 0%
================================================================================
============================= End of ROCm SMI Log ==============================
rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Uuid: CPU-XX
Marketing Name: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4600
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16315820(0xf8f5ac) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16315820(0xf8f5ac) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16315820(0xf8f5ac) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1030
Uuid: GPU-18937fe3730afec6
Marketing Name: AMD Radeon PRO W6800
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 4096(0x1000) KB
L3: 131072(0x20000) KB
Chip ID: 29603(0x73a3)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2555
BDFID: 768
Internal Node ID: 1
Compute Unit: 60
SIMDs per CU: 2
Shader Engines: 8
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 31440896(0x1dfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
I tried fresh installations of Ubuntu 22.04 and 20.04 and got the same results.
Are there any motherboard or CPU requirements for this to work? The system I'm testing the container on isn't new.
Motherboard:
Thanks in advance for your help.