-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design doc for the externally-manage-pf support #476
design doc for the externally-manage-pf support #476
Conversation
Thanks for your PR,
To skip the vendors CIs use one of:
|
Pull Request Test Coverage Report for Build 5725351357
💛 - Coveralls |
LGTM |
de40793
to
81d3b31
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
doc/design/externally-manage-pf.md
Outdated
|
||
The feature is needed to allow the operator to only configure a subset of virtual functions. | ||
This allows a third party component like nmstate, kubernetes-nmstate, NetworkManager to handle the creation | ||
and the usage of the virtual functions on the system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by usage in the system ?
doc/design/externally-manage-pf.md
Outdated
|
||
### Use Cases | ||
|
||
* As a user I want to use a virtual function for SDN network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why cant you do it today ? i think this needs more clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, why isnt using sriov-operator in systemd mode good enough for openshift ovn-k hw offload use case ?
because they ended up not requesting an sriov resource ? because they didnt want cluster network operator to depend on sriov operator ? (maybe it should for this case IMO, but i dont have the full context...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I start working on the systemd I was thinking this will be a solution for both HW-offload and any request to use a vf as the primary interface.
But at the end the use-case for the systemd mode if you ask me is to have a faster boot/sriov configuration when you request a large number of virtual functions.
Even with systemd you have some limitations that only the externally manage can handle like:
- Installation time - meaning you want to use the vf even before you deploy k8s because after the deployment moving the IP address from the PF to the virtual function is super complicated. (even for vanilla deployment not OCP)
- same for also storage you may want a storage connection via a VF for the baremetal machine via a VF even before you deploy k8s on that machine.
The request is to be able and have parity with just creating vlan interfaces from the PF to split the host traffic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets extend the usecases then to reflect that.
i.e for SDN the network need to be configured before k8s is deployed and these VFs should be available at system startup before pods start running.
also lets add storage use-case where you need VFs to be configured and available for the storage subsystem before k8s is deployed / pods spinning up at system startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/design/externally-manage-pf.md
Outdated
|
||
* Allow the SR-IOV operator to handle the configure and pod allocation of a subset of virtual functions. | ||
* Allow the user to Allocate the number of virtual functions he wants for the system and the subset he wants for pods | ||
* Not resetting the numOfVfs for PFs that the operator didn't configure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that the operator didint configure -> that are externally managed.
doc/design/externally-manage-pf.md
Outdated
|
||
Create a flow in the SR-IOV operator where the user can request a configuration for a subset of virtual functions. | ||
|
||
The operator will first validate and the requested PF contains the requested amount of virtual functions allocated, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove "and" ?
doc/design/externally-manage-pf.md
Outdated
Create a flow in the SR-IOV operator where the user can request a configuration for a subset of virtual functions. | ||
|
||
The operator will first validate and the requested PF contains the requested amount of virtual functions allocated, | ||
Then the operator will configure the subset of virtual functions with the request driver and will update the device plugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about MTU, GUID, MAC, driver unbind/bind for rdma?
can you add a list of the current operations sriov operator (and config daemon) performs on PF and VF
and then list the set of operations it will perform on a VF that belongs to externally manager PF ?
doc/design/externally-manage-pf.md
Outdated
``` | ||
|
||
#### Validation | ||
The SR-IOV operator will do a validation webhook to check if the requested `numVfs` is equal to what the user allocate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
until today, did we prevent the creation of SriovNetworkNodePolicy because an sriovNetworkNodeState did not have status ?
wont this affect UX, say i have a provisioning infra where i deploy sriov operator then create its CRs
i would now need to split between the two steps, add custom wait logic to make sure i can create the CR, this is not automation friendly.
also what happens on scale-up of the cluster ? (after you already have a bunch of externally managed policies)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the case already today if you enable the webhook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the case today ? can you provide some examples (im not too familiar with the webhook flows)
81d3b31
to
dcb3119
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
Hi @adrianchiris let me know if I manage to answer all the comments. |
doc/design/externally-manage-pf.md
Outdated
## Summary | ||
|
||
Allow the SR-IOV operator to configure and allocate a subset of virtual functions from a physical function not | ||
configured by the operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...not configured by the operator": I think this summary is a bit confusing for a new reader. I think what is meant is by "...from a physical function that is configured externally from SR-IOV network operator" Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/design/externally-manage-pf.md
Outdated
|
||
## Summary | ||
|
||
Allow the SR-IOV operator to configure and allocate a subset of virtual functions from a physical function not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: SR-IOV network operator throughout the document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/design/externally-manage-pf.md
Outdated
|
||
The feature is needed to allow the operator to only configure a subset of virtual functions. | ||
This allows a third party component like nmstate, kubernetes-nmstate, NetworkManager to handle the creation | ||
and the usage of the virtual functions on the system.Some of the examples are using the virtual function as the primary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Missing "space" between "." and "Some".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/design/externally-manage-pf.md
Outdated
|
||
### Use Cases | ||
|
||
* As a user I want to use a virtual function for SDN network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets extend the usecases then to reflect that.
i.e for SDN the network need to be configured before k8s is deployed and these VFs should be available at system startup before pods start running.
also lets add storage use-case where you need VFs to be configured and available for the storage subsystem before k8s is deployed / pods spinning up at system startup.
doc/design/externally-manage-pf.md
Outdated
#### Webhook | ||
For the webhook we add more validations when the policy contains `ExternallyManaged: true` | ||
* `numVfs` in the policy equal is equal or lower the number of virtual functions on the system | ||
* `MTU` in the policy equals the MTU we discover on the PF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to validate link type as well IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right
doc/design/externally-manage-pf.md
Outdated
|
||
### Test Plan | ||
|
||
* Should not allow to create a policy if there are no vfs configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a policy with externallyManaged==true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done that was the name of the tests I already did on the PR :)
doc/design/externally-manage-pf.md
Outdated
This is where most of the changes for this feature are implemented. | ||
|
||
First step we will do a validation same as on the webhook to check the PF have everything we need to apply the requested | ||
policy, by checking the `numVfs` and `MTU`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linkType as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/design/externally-manage-pf.md
Outdated
First step we will do a validation same as on the webhook to check the PF have everything we need to apply the requested | ||
policy, by checking the `numVfs` and `MTU`. | ||
Next config-daemon will skip all the PF configuration like `numVfs` and `MTU`. he will only preform the virtual function | ||
driver allocation, administrative mac allocation and MTU. Last step as always will be to reset the device plugin so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allocation -> binding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need to update this part with RDMA bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dcb3119
to
6ae4a8a
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
pfNames: ["ens3f0#5-9"] | ||
nodeSelector: | ||
node-role.kubernetes.io/worker: "" | ||
numVfs: 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the goals is "Not resetting the numOfVfs for PFs that are externally managed" then should we somehow block out this parameter? We shouldn't need to provide this when we can get the number of VFs created from sysfs sriov_numvfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no we want to have this so we can still do validation.
with the externally manage the numVfs MUST be lower or equal to the number of vfs configured in the system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the numVfs must be equal since the significance of numVfs is configuring the number of VFs via sysfs sriov_numvfs. I can't see a reason which numVfs < sysfs sriov_numvfs.
Wouldn't we always want to have numVfs == sysfs sriov_numvfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can also be lower then sysfs sriov_numvfs
to allow increasing the num of vfs
|
||
### Use Cases | ||
|
||
* As a user I want to use a virtual function for SDN network, for SDN the network need to be configured before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this particular use case, it is similar to what we have for OvS Hardware Offload. Do you expect for this use case that the use sriov-network-operator in systemd would be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it depend.
the user can select what he want.
if he want to have the SDN from day 0 (installation) then use the externallyManage if he want to configure this after the cluster is ready (and the SDN support it like for HWoffload) then use the systemd mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it maybe helpful to put a note in the document saying that systemd mode is recommended for some of the example use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done I added as a non goal for this feature
doc/design/externally-manage-pf.md
Outdated
The SR-IOV network operator will do a validation webhook to check if the requested `numVfs` is equal to what the user allocate | ||
if not it will reject the policy creation. | ||
|
||
The SR-IOV network operator will do a validation webhook to check if the requested MTU is equal to what exist on the PF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should block out any setting of the MTU when externallyManaged = true. In the code we should automatically set the MTU of the VF from the PF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we allow the user to select a different MTU for the VF that must be lower or equal the one from the PF. (because we don't change the MTU on the PF in this mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought if the VF MTU is lower than the PF MTU it can cause issues. From the text, highlighted here, it seems that the MTU on the VF must be equal to the MTU on the PF which is also my expectations.
So the user has no choice in MTU values? It is a "must" for the MTU of VF = MTU of PF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this sentence should be changed and validation webhook should check that the configured MTU is less/eq to MTU configured on PF.
also im not aware of any limitation where VF MTU < PF MTU
|
||
#### Webhook | ||
For the webhook we add more validations when the policy contains `ExternallyManaged: true` | ||
* `numVfs` in the policy equal is equal or lower the number of virtual functions on the system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a better UX if the user no longer needs to provide these values for externally managed. One less configuration that needs to be copied and pasted from another configuration (e.g. nmstate). It would be better to just blindly inherit these values. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will like to have this one because we do a lot of validation and stuff on the code with it when we have multiple policies.
we can change this in the future if we see customer complaining
doc/design/externally-manage-pf.md
Outdated
|
||
First step we will do a validation same as on the webhook to check the PF have everything we need to apply the requested | ||
policy, by checking the `numVfs`, `MTU` and `LinkType`. | ||
Next config-daemon will skip all the PF configuration like `numVfs`, `MTU` and `LinkType`. he will only preform the virtual function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SP: "preform" to "perform"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
6ae4a8a
to
a99ef33
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
doc/design/externally-manage-pf.md
Outdated
|
||
### Goals | ||
|
||
* Allow the SR-IOV network operator to handle the configure and pod allocation of a subset of virtual functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: configure -> configuration
of a subset of virtual functions
of some or all virtual functions
i suggest to add at the end:
while PF configuration are managed by an external entity.
doc/design/externally-manage-pf.md
Outdated
|
||
* Allow the SR-IOV network operator to handle the configure and pod allocation of a subset of virtual functions | ||
* Allow the user to Allocate the number of virtual functions he wants for the system and the subset he wants for pods | ||
* Not resetting the numOfVfs for PFs that are externally managed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think its a goal, but an implementation detail that emerges from the fact that some virtual functions are reserved for system use, hence the operator must not reset the number of VFs.
doc/design/externally-manage-pf.md
Outdated
|
||
## Proposal | ||
|
||
Create a sub-flow in the SR-IOV network operator where the user can request a configuration for a subset of virtual functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a subset of virtual functions
for all/subset of virtual functions
doc/design/externally-manage-pf.md
Outdated
|
||
The operator will first validate the requested PF contains the requested amount of virtual functions allocated, it | ||
will also validate the requested MTU is configured as expected on the PF. | ||
The `sriovNetworkNodeState.status.SyncStatus` field will be report a `Failed` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is not the case ...
doc/design/externally-manage-pf.md
Outdated
2. Configure the MTU on the PF | ||
3. Copy the Administrative mac address from the VFs | ||
4. Bind the right driver for the VF | ||
5. Create SR-IOV device plugin pools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- restart sriov network device plugin
device plugin configs are updated by controller regardless of daemon flow IIRC
doc/design/externally-manage-pf.md
Outdated
The SR-IOV network operator will do a validation webhook to check if the requested `numVfs` is equal to what the user allocate | ||
if not it will reject the policy creation. | ||
|
||
The SR-IOV network operator will do a validation webhook to check if the requested MTU is equal to what exist on the PF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this sentence should be changed and validation webhook should check that the configured MTU is less/eq to MTU configured on PF.
also im not aware of any limitation where VF MTU < PF MTU
doc/design/externally-manage-pf.md
Outdated
|
||
### Upgrade & Downgrade considerations | ||
|
||
The feature supports both Upgrade and Downgrade as we are introducing a new field in the API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
downgrade will cause the operator to treat an externally managed PF as non externally managed and actually configure PF, this may cause conflicts in the system.
imo the only concern here is documenting this.
doc/design/externally-manage-pf.md
Outdated
Last step as always will be to reset the device plugin so | ||
kubelet will be able to discover the SR-IOV devices. | ||
|
||
The config-daemon will also save on the node a cache of the last applied policy. this is needed to be able and understand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cache -> cache (file)
doc/design/externally-manage-pf.md
Outdated
kubelet will be able to discover the SR-IOV devices. | ||
|
||
The config-daemon will also save on the node a cache of the last applied policy. this is needed to be able and understand | ||
if we need to reset the PF configuration(`ExternallyManaged` was false) or not when o policy is removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove "o"
a99ef33
to
a3a9b16
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
Hi @adrianchiris please give it another look |
doc/design/externally-manage-pf.md
Outdated
Last step as always will be to reset the device plugin so | ||
kubelet will be able to discover the SR-IOV devices. | ||
|
||
The config-daemon will also save on the node a cache (file) of the last applied policy. this is needed to be able and understand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of such behaviour but this could be usefull for other features like systemd mode, drain controller, so it would be good implement this in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if possible, I would like to leave as part of this PR as it's already integrated.
|
||
### Test Plan | ||
|
||
* Should not allow to create a policy with externallyManaged true if there are no vfs configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to support a case when VFs will be configured later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do it. BUT the user will need to disable the webhook because that is not the right way to do it.
the network must be ready before kubelet start of not then just let the operator create the vfs.
or use systemd mode
a3a9b16
to
4d1cf3d
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM, just a few minor comments
once addressed consider having an LGTM from my side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments
Signed-off-by: Sebastian Sch <[email protected]>
4d1cf3d
to
2307c9f
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.