Merge pull request openshift#990 from SchSeba/sync-2024-08-20
d/s: Sync 2024 08 20
openshift-merge-bot[bot] committed Aug 21, 2024
2 parents 6b540ec + ba76887 commit 8b4771e
Showing 31 changed files with 691 additions and 824 deletions.
4 changes: 3 additions & 1 deletion Dockerfile.sriov-network-config-daemon
@@ -5,7 +5,9 @@ RUN make _build-sriov-network-config-daemon BIN_PATH=build/_output/cmd

FROM quay.io/centos/centos:stream9
ARG MSTFLINT=mstflint
-RUN ARCH_DEP_PKGS=$(if [ "$(uname -m)" != "s390x" ]; then echo -n ${MSTFLINT} ; fi) && yum -y install hwdata $ARCH_DEP_PKGS && yum clean all
+# We have to ensure that pciutils is installed. This package is needed for mstfwreset to succeed.
+# xref pkg/vendors/mellanox/mellanox.go#L150
+RUN ARCH_DEP_PKGS=$(if [ "$(uname -m)" != "s390x" ]; then echo -n ${MSTFLINT} ; fi) && yum -y install hwdata pciutils $ARCH_DEP_PKGS && yum clean all
LABEL io.k8s.display-name="sriov-network-config-daemon" \
io.k8s.description="This is a daemon that manage and config sriov network devices in Kubernetes cluster"
COPY --from=builder /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/build/_output/cmd/sriov-network-config-daemon /usr/bin/
37 changes: 27 additions & 10 deletions README.md
@@ -327,19 +327,37 @@ spec:
node-role.kubernetes.io/worker: ""
```

-### Resource Injector Policy
+## Feature Gates

-By default, the Resource injector webhook has a failed policy of ignored, this was implemented to not block pod creation
-in case the webhook is not available.
+Feature gates are used to enable or disable specific features in the operator.

-with a feature introduced in Kubernetes 1.28(Beta) called [MatchConditions](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-matchconditions)
-we can move the webhook failed policy to be Fail. In this case the operator configured the Mutating webhook for the resource
-injector only on pods with the secondary network annotation of `k8s.v1.cni.cncf.io/networks`.
-It's possible to enable the feature with a FeatureGate via the SriovOperatorConfig object
+> **NOTE**: As features mature and graduate to stable status, default settings may change, and feature gates might be removed in future releases. Keep this in mind when configuring feature gates and ensure your environment is compatible with any updates.

-> **NOTE**: the feature is disabled by default
+### Available Feature Gates

-**Example**:
+1. **Parallel NIC Configuration** (`parallelNicConfig`)
+- **Description:** Allows the configuration of NICs in parallel, which can potentially reduce the time required for network setup.
+- **Default:** Disabled
+
+2. **Resource Injector Match Condition** (`resourceInjectorMatchCondition`)
+- **Description:** Switches the resource injector's webhook failure policy from "Ignore" to "Fail" by utilizing the `MatchConditions` feature introduced in Kubernetes 1.28. This ensures the webhook only targets pods with the `k8s.v1.cni.cncf.io/networks` annotation, improving reliability without affecting other pods.
+- **Default:** Disabled
+
+3. **Metrics Exporter** (`metricsExporter`)
+- **Description:** Enables the metrics exporter on the same node where the config-daemon is running. This helps in collecting and exporting metrics related to SR-IOV network devices.
+- **Default:** Disabled
+
+4. **Manage Software Bridges** (`manageSoftwareBridges`)
+- **Description:** Allows the operator to manage software bridges. This feature gate is useful for environments where bridge management is required.
+- **Default:** Disabled
+
+5. **Mellanox Firmware Reset** (`mellanoxFirmwareReset`)
+- **Description:** Enables the firmware reset via `mstfwreset` before a system reboot. This feature is specific to Mellanox network devices and is used to ensure that the firmware is properly reset during system maintenance.
+- **Default:** Disabled
+
+### Enabling Feature Gates
+
+To enable a feature gate, add it to your configuration file or command line with the desired state. For example, to enable the `resourceInjectorMatchCondition` feature gate, you would specify:

```yaml
apiVersion: sriovnetwork.openshift.io/v1
@@ -348,7 +366,6 @@ metadata:
name: default
namespace: sriov-network-operator
spec:
-...
featureGates:
resourceInjectorMatchCondition: true
...
1 change: 1 addition & 0 deletions bindata/manifests/daemon/daemonset.yaml
@@ -24,6 +24,7 @@ spec:
annotations:
kubectl.kubernetes.io/default-container: sriov-network-config-daemon
target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
+openshift.io/required-scc: privileged
spec:
hostNetwork: true
hostPID: true
2 changes: 2 additions & 0 deletions bindata/manifests/operator-webhook/server.yaml
@@ -24,6 +24,8 @@ spec:
target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
labels:
app: operator-webhook
+annotations:
+openshift.io/required-scc: restricted-v2
spec:
securityContext:
runAsNonRoot: true
2 changes: 2 additions & 0 deletions bindata/manifests/webhook/server.yaml
@@ -27,6 +27,8 @@ spec:
component: network
type: infra
openshift.io/component: network
+annotations:
+openshift.io/required-scc: restricted-v2
spec:
securityContext:
runAsNonRoot: true
4 changes: 2 additions & 2 deletions cmd/sriov-network-config-daemon/service.go
@@ -146,9 +146,9 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
return fmt.Errorf("failed to remove sriov result file: %v", err)
}

-_, err := hostHelpers.TryEnableRdma()
+_, err := hostHelpers.CheckRDMAEnabled()
if err != nil {
-setupLog.Error(err, "warning, failed to enable RDMA")
+setupLog.Error(err, "warning, failed to check RDMA state")
}
hostHelpers.TryEnableTun()
hostHelpers.TryEnableVhostNet()
6 changes: 3 additions & 3 deletions cmd/sriov-network-config-daemon/service_test.go
@@ -158,7 +158,7 @@ var _ = Describe("Service", func() {
"/etc/sriov-operator/sriov-interface-result.yaml": []byte("something"),
},
})
-hostHelpers.EXPECT().TryEnableRdma().Return(true, nil)
+hostHelpers.EXPECT().CheckRDMAEnabled().Return(true, nil)
hostHelpers.EXPECT().TryEnableTun().Return()
hostHelpers.EXPECT().TryEnableVhostNet().Return()
hostHelpers.EXPECT().DiscoverSriovDevices(hostHelpers).Return([]sriovnetworkv1.InterfaceExt{{
@@ -183,7 +183,7 @@ var _ = Describe("Service", func() {
"/etc/sriov-operator/sriov-interface-result.yaml": []byte("something"),
},
})
-hostHelpers.EXPECT().TryEnableRdma().Return(true, nil)
+hostHelpers.EXPECT().CheckRDMAEnabled().Return(true, nil)
hostHelpers.EXPECT().TryEnableTun().Return()
hostHelpers.EXPECT().TryEnableVhostNet().Return()

@@ -211,7 +211,7 @@ var _ = Describe("Service", func() {
"/etc/sriov-operator/sriov-interface-result.yaml": []byte("something"),
},
})
-hostHelpers.EXPECT().TryEnableRdma().Return(true, nil)
+hostHelpers.EXPECT().CheckRDMAEnabled().Return(true, nil)
hostHelpers.EXPECT().TryEnableTun().Return()
hostHelpers.EXPECT().TryEnableVhostNet().Return()
hostHelpers.EXPECT().DiscoverSriovDevices(hostHelpers).Return([]sriovnetworkv1.InterfaceExt{{
15 changes: 15 additions & 0 deletions cmd/sriov-network-config-daemon/start.go
@@ -26,6 +26,7 @@ import (

"github.com/spf13/cobra"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/kubernetes/scheme"
"k8s.io/client-go/rest"
@@ -41,6 +42,7 @@ import (
snclientset "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/client/clientset/versioned"
"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/consts"
"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon"
+"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/featuregate"
"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/helper"
snolog "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/log"
"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/platforms"
@@ -276,6 +278,18 @@ func runStartCmd(cmd *cobra.Command, args []string) error {
}
go nodeWriter.Run(stopCh, refreshCh, syncCh)

+// Init feature gates once to prevent race conditions.
+defaultConfig := &sriovnetworkv1.SriovOperatorConfig{}
+err = kClient.Get(context.Background(), types.NamespacedName{Namespace: vars.Namespace, Name: consts.DefaultConfigName}, defaultConfig)
+if err != nil {
+log.Log.Error(err, "Failed to get default SriovOperatorConfig object")
+return err
+}
+featureGates := featuregate.New()
+featureGates.Init(defaultConfig.Spec.FeatureGates)
+vars.MlxPluginFwReset = featureGates.IsEnabled(consts.MellanoxFirmwareResetFeatureGate)
+log.Log.Info("Enabled featureGates", "featureGates", featureGates.String())
+
setupLog.V(0).Info("Starting SriovNetworkConfigDaemon")
err = daemon.New(
kClient,
@@ -288,6 +302,7 @@ func runStartCmd(cmd *cobra.Command, args []string) error {
syncCh,
refreshCh,
eventRecorder,
+featureGates,
startOpts.disabledPlugins,
).Run(stopCh, exitCh)
if err != nil {
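
Note on the `start.go` hunks above: the daemon now builds one feature-gate object at startup and passes it to `daemon.New`. The following is a minimal sketch of what such an object could look like, assuming a plain `map[string]bool` guarded by a mutex. The type `FeatureGates` and the constructor `NewFeatureGates` are hypothetical names for illustration and are not the operator's actual `featuregate` package.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// FeatureGates is a hypothetical, simplified stand-in for the operator's
// featuregate package; it is not the real implementation.
type FeatureGates struct {
	mu    sync.RWMutex
	gates map[string]bool
}

// NewFeatureGates returns an empty set of gates (everything disabled).
func NewFeatureGates() *FeatureGates {
	return &FeatureGates{gates: map[string]bool{}}
}

// Init replaces the current state with a copy of the supplied gates,
// so later changes to the caller's map do not leak in.
func (fg *FeatureGates) Init(gates map[string]bool) {
	fg.mu.Lock()
	defer fg.mu.Unlock()
	fg.gates = make(map[string]bool, len(gates))
	for name, enabled := range gates {
		fg.gates[name] = enabled
	}
}

// IsEnabled reports whether a gate is on; unknown gates count as disabled.
func (fg *FeatureGates) IsEnabled(name string) bool {
	fg.mu.RLock()
	defer fg.mu.RUnlock()
	return fg.gates[name]
}

// String renders the gates in sorted order so log output is stable.
func (fg *FeatureGates) String() string {
	fg.mu.RLock()
	defer fg.mu.RUnlock()
	names := make([]string, 0, len(fg.gates))
	for name := range fg.gates {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, fmt.Sprintf("%s:%t", name, fg.gates[name]))
	}
	return strings.Join(parts, ", ")
}

func main() {
	fg := NewFeatureGates()
	fg.Init(map[string]bool{"mellanoxFirmwareReset": true, "parallelNicConfig": false})
	fmt.Println(fg.IsEnabled("mellanoxFirmwareReset"))
	fmt.Println(fg.IsEnabled("metricsExporter"))
	fmt.Println(fg.String())
}
```

Initializing once and only reading afterwards, as the commit's comment suggests, keeps concurrent readers from observing a half-updated map; the lock in this sketch is a conservative extra.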
1 change: 1 addition & 0 deletions config/manager/manager.yaml
@@ -21,6 +21,7 @@ spec:
metadata:
annotations:
target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
+openshift.io/required-scc: restricted-v2
labels:
name: sriov-network-operator
spec:
2 changes: 2 additions & 0 deletions deploy/operator.yaml
@@ -15,6 +15,8 @@ spec:
metadata:
labels:
name: sriov-network-operator
+annotations:
+openshift.io/required-scc: restricted-v2
spec:
affinity:
nodeAffinity:
@@ -15,6 +15,8 @@ spec:
maxUnavailable: 33%
template:
metadata:
+annotations:
+openshift.io/required-scc: restricted-v2
labels:
name: sriov-network-operator
spec:
1 change: 1 addition & 0 deletions doc/quickstart.md
@@ -5,6 +5,7 @@
1. A supported SRIOV hardware on the cluster nodes. Supported models can be found [here](https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/doc/supported-hardware.md).
2. Kubernetes or Openshift cluster running on bare metal nodes.
3. Multus-cni is deployed as default CNI plugin, and there is a default CNI plugin (flannel, openshift-sdn etc.) available for Multus-cni.
+4. On RedHat Enterprise Linux and Ubuntu operating systems, the `rdma-core` package must be installed to support RDMA resource provisioning. On RedHat CoreOS the package installation is not required.

## Installation

4 changes: 2 additions & 2 deletions go.mod
@@ -30,6 +30,8 @@ require (
github.com/pkg/errors v0.9.1
github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.68.0
github.com/prometheus-operator/prometheus-operator/pkg/client v0.68.0
+github.com/prometheus/client_model v0.5.0
+github.com/prometheus/common v0.45.0
github.com/safchain/ethtool v0.3.0
github.com/spf13/cobra v1.7.0
github.com/stretchr/testify v1.8.4
@@ -125,8 +127,6 @@ require (
github.com/peterbourgon/diskv v2.0.1+incompatible // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_golang v1.17.0 // indirect
-github.com/prometheus/client_model v0.5.0 // indirect
-github.com/prometheus/common v0.45.0 // indirect
github.com/prometheus/procfs v0.12.0 // indirect
github.com/robfig/cron v1.2.0 // indirect
github.com/rogpeppe/go-internal v1.10.0 // indirect
31 changes: 31 additions & 0 deletions hack/deploy-operator-helm.sh
@@ -0,0 +1,31 @@
+#!/usr/bin/env bash
+set -xeo pipefail
+
+here="$(dirname "$(readlink --canonicalize "${BASH_SOURCE[0]}")")"
+root="$(readlink --canonicalize "$here/..")"
+
+export ADMISSION_CONTROLLERS_ENABLED=true
+export ADMISSION_CONTROLLERS_CERTIFICATES_CERT_MANAGER_ENABLED=true
+export NAMESPACE="sriov-network-operator"
+export OPERATOR_NAMESPACE="sriov-network-operator"
+
+source hack/env.sh
+
+HELM_MODE=${HELM_MODE:-install}
+
+HELM_VALUES_OPTS="\
+--set images.operator=${SRIOV_NETWORK_OPERATOR_IMAGE} \
+--set images.sriovConfigDaemon=${SRIOV_NETWORK_CONFIG_DAEMON_IMAGE} \
+--set images.sriovCni=${SRIOV_CNI_IMAGE} \
+--set images.sriovDevicePlugin=${SRIOV_DEVICE_PLUGIN_IMAGE} \
+--set images.resourcesInjector=${NETWORK_RESOURCES_INJECTOR_IMAGE} \
+--set images.webhook=${SRIOV_NETWORK_WEBHOOK_IMAGE} \
+--set operator.admissionControllers.enabled=${ADMISSION_CONTROLLERS_ENABLED} \
+--set operator.admissionControllers.certificates.certManager.enabled=${ADMISSION_CONTROLLERS_CERTIFICATES_CERT_MANAGER_ENABLED} \
+--set sriovOperatorConfig.deploy=true"
+
+PATH=$PATH:${root}/bin
+make helm
+helm ${HELM_MODE} -n ${NAMESPACE} --create-namespace \
+$HELM_VALUES_OPTS \
+--wait sriov-network-operator ./deployment/sriov-network-operator-chart
39 changes: 9 additions & 30 deletions hack/run-e2e-conformance-virtual-cluster.sh
@@ -15,6 +15,13 @@ root="$(readlink --canonicalize "$here/..")"
NUM_OF_WORKERS=${NUM_OF_WORKERS:-2}
total_number_of_nodes=$((1 + NUM_OF_WORKERS))

+## Global configuration
+export NAMESPACE="sriov-network-operator"
+export OPERATOR_NAMESPACE="sriov-network-operator"
+export SKIP_VAR_SET=""
+export OPERATOR_EXEC=kubectl
+export CLUSTER_HAS_EMULATED_PF=TRUE
+
if [ "$NUM_OF_WORKERS" -lt 2 ]; then
echo "Min number of workers is 2"
exit 1
@@ -364,36 +371,8 @@ do
ATTEMPTS=$((ATTEMPTS+1))
done

-
-source hack/env.sh
-
-export ADMISSION_CONTROLLERS_ENABLED=true
-export ADMISSION_CONTROLLERS_CERTIFICATES_CERT_MANAGER_ENABLED=true
-export SKIP_VAR_SET=""
-export NAMESPACE="sriov-network-operator"
-export OPERATOR_NAMESPACE="sriov-network-operator"
-export CNI_BIN_PATH=/opt/cni/bin
-export OPERATOR_EXEC=kubectl
-export CLUSTER_HAS_EMULATED_PF=TRUE
-
-
-HELM_VALUES_OPTS="\
---set images.operator=${SRIOV_NETWORK_OPERATOR_IMAGE} \
---set images.sriovConfigDaemon=${SRIOV_NETWORK_CONFIG_DAEMON_IMAGE} \
---set images.sriovCni=${SRIOV_CNI_IMAGE} \
---set images.sriovDevicePlugin=${SRIOV_DEVICE_PLUGIN_IMAGE} \
---set images.resourcesInjector=${NETWORK_RESOURCES_INJECTOR_IMAGE} \
---set images.webhook=${SRIOV_NETWORK_WEBHOOK_IMAGE} \
---set operator.admissionControllers.enabled=${ADMISSION_CONTROLLERS_ENABLED} \
---set operator.admissionControllers.certificates.certManager.enabled=${ADMISSION_CONTROLLERS_CERTIFICATES_CERT_MANAGER_ENABLED} \
---set sriovOperatorConfig.deploy=true"
-
-PATH=$PATH:${root}/bin
-make helm
-helm install -n ${NAMESPACE} --create-namespace \
-$HELM_VALUES_OPTS \
---wait sriov-network-operator ./deployment/sriov-network-operator-chart
-
+# Deploy the sriov operator via helm
+hack/deploy-operator-helm.sh

echo "## create certificates for webhook"
cat <<EOF | kubectl apply -f -
17 changes: 12 additions & 5 deletions hack/virtual-cluster-redeploy.sh
@@ -19,10 +19,14 @@ if [ $CLUSTER_TYPE == "openshift" ]; then
echo ${auth} > registry-login.conf

internal_registry="image-registry.openshift-image-registry.svc:5000"
-pass=$( jq .\"$internal_registry\".password registry-login.conf )
+pass=$( jq .\"image-registry.openshift-image-registry.svc:5000\".auth registry-login.conf )
+pass=`echo ${pass:1:-1} | base64 -d`
+
+# dockercfg password is in the form `<token>:password`. We need to trim the `<token>:` prefix
+pass=${pass#"<token>:"}

registry="default-route-openshift-image-registry.apps.${cluster_name}.${domain_name}"
-podman login -u serviceaccount -p ${pass:1:-1} $registry --tls-verify=false
+podman login -u serviceaccount -p ${pass} $registry --tls-verify=false

export SRIOV_NETWORK_OPERATOR_IMAGE="$registry/$NAMESPACE/sriov-network-operator:latest"
export SRIOV_NETWORK_CONFIG_DAEMON_IMAGE="$registry/$NAMESPACE/sriov-network-config-daemon:latest"
@@ -44,6 +48,7 @@ else
fi

export ADMISSION_CONTROLLERS_ENABLED=true
+export OPERATOR_LEADER_ELECTION_ENABLE=true
export SKIP_VAR_SET=""
export OPERATOR_NAMESPACE=$NAMESPACE
export OPERATOR_EXEC=kubectl
@@ -67,9 +72,11 @@ if [ $CLUSTER_TYPE == "openshift" ]; then
export SRIOV_NETWORK_OPERATOR_IMAGE="image-registry.openshift-image-registry.svc:5000/$NAMESPACE/sriov-network-operator:latest"
export SRIOV_NETWORK_CONFIG_DAEMON_IMAGE="image-registry.openshift-image-registry.svc:5000/$NAMESPACE/sriov-network-config-daemon:latest"
export SRIOV_NETWORK_WEBHOOK_IMAGE="image-registry.openshift-image-registry.svc:5000/$NAMESPACE/sriov-network-operator-webhook:latest"
+echo "## deploying SRIOV Network Operator"
+hack/deploy-setup.sh $NAMESPACE
+else
+export HELM_MODE=upgrade
+hack/deploy-operator-helm.sh
fi

-echo "## deploying SRIOV Network Operator"
-hack/deploy-setup.sh $NAMESPACE
-
kubectl -n ${NAMESPACE} delete po --all
3 changes: 3 additions & 0 deletions pkg/consts/constants.go
@@ -139,6 +139,9 @@ const (
// ManageSoftwareBridgesFeatureGate: enables management of software bridges by the operator
ManageSoftwareBridgesFeatureGate = "manageSoftwareBridges"

+// MellanoxFirmwareResetFeatureGate: enables the firmware reset via mstfwreset before a reboot
+MellanoxFirmwareResetFeatureGate = "mellanoxFirmwareReset"
+
// The path to the file on the host filesystem that contains the IB GUID distribution for IB VFs
InfinibandGUIDConfigFilePath = SriovConfBasePath + "/infiniband/guids"
)