MOFED Pod CrashLoopBackOff State #830
Comments
Hi @ruta-04, did you configure cluster-wide-entitlement? |
Yes, I added the cluster-wide-entitlement and ran the test pod provided in the instructions, which gave the following output. It matches the example output.
|
Any update on this? |
It looks like the entitlement is set up OK. Did you restart MOFED after setting up the entitlement? BTW, in version 24.1, entitlement is not needed, as compilation is done with DTK. |
I set up the entitlement before starting the MOFED pods.
Anything else we can try?
|
@e0ne I remember that we encountered an issue where the Red Hat subscription should be of a certain type. Do you recall something related? |
Any update on this one?
|
Would you be able to use a more recent Network Operator release? |
I am using NVIDIA Network Operator 23.10.0 on OpenShift Container Platform (RHCOS 4.14).
Creating the NicClusterPolicy spins up MOFED pods, which keep crashing (running, but in a not-ready state).
`
[ansible@csah-pri entitlement]$ oc logs mofed-rhcos4.14-ds-ncj2n
Unsetting driver ready state
No OFED driver found for kernel 5.14.0-284.40.1.el9_2.x86_64
Enabling RHOCP and EUS RPM repos...
ID="rhcos"
VERSION_ID="4.14"
RHEL_VERSION="9.2"
Updating Subscription Management repositories.
Unable to read consumer identity
subscription-manager is operating in container mode.
Updating Subscription Management repositories.
Unable to read consumer identity
subscription-manager is operating in container mode.
cuda 589 B/s | 3.5 kB 00:06
cuda 294 kB/s | 1.2 MB 00:04
Red Hat Enterprise Linux 9 for x86_64 - AppStre 0.0 B/s | 0 B 00:10
Errors during downloading metadata for repository 'rhel-9-for-x86_64-appstream-rpms':
Error: Failed to download metadata for repo 'rhel-9-for-x86_64-appstream-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Updating Subscription Management repositories.
Unable to read consumer identity
subscription-manager is operating in container mode.
Updating Subscription Management repositories.
Unable to read consumer identity
subscription-manager is operating in container mode.
Updating Subscription Management repositories.
Unable to read consumer identity
subscription-manager is operating in container mode.
cuda 589 B/s | 3.5 kB 00:06
Red Hat Enterprise Linux 9 for x86_64 - AppStre 5.0 MB/s | 24 MB 00:04
Red Hat Enterprise Linux 9 for x86_64 - BaseOS 13 MB/s | 16 MB 00:01
Red Hat Enterprise Linux 9 for x86_64 - BaseOS 11 MB/s | 14 MB 00:01
Red Hat Universal Base Image 9 (RPMs) - BaseOS 3.1 kB/s | 3.8 kB 00:01
Red Hat Universal Base Image 9 (RPMs) - BaseOS 802 kB/s | 515 kB 00:00
Red Hat Universal Base Image 9 (RPMs) - AppStre 22 kB/s | 4.2 kB 00:00
Red Hat Universal Base Image 9 (RPMs) - AppStre 2.3 MB/s | 1.8 MB 00:00
Red Hat Universal Base Image 9 (RPMs) - CodeRea 22 kB/s | 4.2 kB 00:00
Red Hat Universal Base Image 9 (RPMs) - CodeRea 345 kB/s | 192 kB 00:00
Metadata cache created.
Installing dependencies
Error in POSTTRANS scriptlet in rpm package kernel-core
Installed:
cryptsetup-libs-2.6.0-3.el9.x86_64
device-mapper-9:1.02.195-3.el9.x86_64
device-mapper-libs-9:1.02.195-3.el9.x86_64
dracut-057-44.git20230822.el9.x86_64
kbd-2.4.0-9.el9.x86_64
kbd-legacy-2.4.0-9.el9.noarch
kbd-misc-2.4.0-9.el9.noarch
kernel-5.14.0-284.40.1.el9_2.x86_64
kernel-core-5.14.0-284.40.1.el9_2.x86_64
kernel-modules-5.14.0-284.40.1.el9_2.x86_64
kernel-modules-core-5.14.0-284.40.1.el9_2.x86_64
kpartx-0.8.7-22.el9.x86_64
libkcapi-1.3.1-3.el9.x86_64
libkcapi-hmaccalc-1.3.1-3.el9.x86_64
linux-firmware-20230310-138.el9_2.noarch
linux-firmware-whence-20230310-138.el9_2.noarch
pigz-2.5-4.el9.x86_64
systemd-udev-252-18.el9.x86_64
Downgraded:
elfutils-0.188-3.el9.x86_64
elfutils-debuginfod-client-0.188-3.el9.x86_64
elfutils-libelf-0.188-3.el9.x86_64
elfutils-libs-0.188-3.el9.x86_64
Installed:
createrepo_c-0.20.1-1.el9.x86_64
createrepo_c-libs-0.20.1-1.el9.x86_64
elfutils-libelf-devel-0.188-3.el9.x86_64
kernel-rpm-macros-185-12.el9.noarch
numactl-libs-2.0.16-1.el9.x86_64
zlib-devel-1.2.11-40.el9.x86_64
Installing Linux kernel headers...
Error: Unable to find a match: kernel-headers-5.14.0-284.40.1.el9_2.x86_64 kernel-devel-5.14.0-284.40.1.el9_2.x86_64
Command "dnf -q -y --releasever=9.2 install kernel-headers-5.14.0-284.40.1.el9_2.x86_64 kernel-devel-5.14.0-284.40.1.el9_2.x86_64" failed with exit code: 1
Terminate event caught
Terminating container
Unsetting driver ready state
Keeping currently loaded Mellanox OFED Driver...`
I can't seem to pinpoint the issue here. Is it that the OS's kernel version is not currently supported by the OFED driver?
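
For anyone triaging a similar log, here is a minimal Python sketch that pulls out the three lines that seem to matter most: the kernel the container is building for, any repository whose metadata failed to download, and the packages `dnf` could not find. The function name `summarize_mofed_log` and the regular expressions are illustrative assumptions based only on the log text above; they are not part of the Network Operator or the MOFED container.

```python
import re


def summarize_mofed_log(log_text):
    """Extract the kernel version, failed repos, and unmatched packages.

    The patterns are assumptions derived from the log pasted in this issue.
    """
    kernel = re.search(r"No OFED driver found for kernel (\S+)", log_text)
    missing = re.search(r"Error: Unable to find a match: (.+)", log_text)
    failed_repos = re.findall(
        r"Failed to download metadata for repo '([^']+)'", log_text
    )
    return {
        "kernel": kernel.group(1) if kernel else None,
        "missing_packages": missing.group(1).split() if missing else [],
        "failed_repos": failed_repos,
    }


# Three lines copied from the log in this issue.
sample = """No OFED driver found for kernel 5.14.0-284.40.1.el9_2.x86_64
Error: Failed to download metadata for repo 'rhel-9-for-x86_64-appstream-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Error: Unable to find a match: kernel-headers-5.14.0-284.40.1.el9_2.x86_64 kernel-devel-5.14.0-284.40.1.el9_2.x86_64
"""

summary = summarize_mofed_log(sample)
print(summary)
```

In this log the packages that could not be matched are `kernel-headers` and `kernel-devel` for the running kernel, so the failure is at the dependency-install step rather than in the OFED build itself.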