Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix adding flex counter to wrong context #1421

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

byu343
Copy link
Contributor

@byu343 byu343 commented Sep 18, 2024

The counters for syncd (switch chip) were attempted to be added to gbsyncd (gearbox phys), and vice versa.
This issue is introduced by #1362 When setting the redis attribute
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every contexts (both syncd and gbsyncd). However, the counters to initialize could only exist in one context.

The fix is to check that the target switch id exists in the context; if not, skip the operation.

Before the fix, we see the error below in syslog
ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now

After the fix, on warning like below is printed:
WARNING swss#orchagent: :- containsSwitch: context phys failed to find switch oid:0x21000000000000

The counters for syncd (switch chip) were attempted to be added to
gbsyncd (gearbox phys), and vice versa.
This issue is introduced by
sonic-net#1362
When setting the redis attribute
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER, the operation is applied to every
contexts (both syncd and gbsyncd). However, the counters to initialize
could only exist in one context.

The fix is to check that the target switch id exists in the context; if
not, skip the operation.
Copy link
Collaborator

@kcudnik kcudnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meet code cover requirements

{
SWSS_LOG_ENTER();

if (!m_switchContainer->contains(switchId)){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ on new line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lib/RedisRemoteSaiInterface.cpp Show resolved Hide resolved
Comment on lines +234 to +243
if (objectType == SAI_OBJECT_TYPE_SWITCH && objectId != SAI_NULL_OBJECT_ID &&
(attr->id == SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER ||
attr->id == SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP))
{
if (!kvp.second->m_redisSai->containsSwitch(objectId))
{
continue;
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not guarantee that the context is actually gbsyncd or syncd, no what basis this decision is made here ? from my perspective therre should be some other attribute passed to indicated which syncd it is or based on switch created attributes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When setting attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER or SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP using sai_switch_api, the switch Id is set to gSwitchId or gearbox_oid, e.g., https://github.com/sonic-net/sonic-swss/blob/master/orchagent/saihelper.cpp#L878.

Before my change, code around here was looping through all the context using "for (auto& kvp: m_contextMap)" and setting the attributes to all contexts. My change is to check if the switch id passed through could match the context. Assumption is that gearbox_oid is only available in the gbsyncd context, switch chip oid is only available in the syncd context, which are always the case from my understanding.

There is some description in PR #1362:

Gearbox flex counter database
Pass the Phy OID, an OID of a SAI switch object in syntax, when calling the SAI set API to set the extended attributes. By doing so, the SAI redis objects can choose in which context the SAI API call should be invoked and the corresponding gearbox syncd docker container will handle it.

From my understanding, my PR here is to implement "choose in which context the SAI API call should be invoke based on OID of a SAI switch object".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your assumption is not enough, since you are assuming those 2 attributes will be passed only on grbsyncd OID, but if you pass other OID for other switch it will also execute it, sairedis should know internally whether those attributes belong to grsyncd, maybe based on create_switch

anyway, if you know outside the sairedis whether those attrbiutes should be passed or not, just don't pass them for non grbsyncd, and let that api loop throu all oids

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about the comment. Let me explain the original issue without this change. Could you please take a look and give some idea on where to change to fix the issue if you still think the current location is not appropriate?

There are two contexts: (A) is syncd for the ASIC switch chip, and (B) is gbsyncd for multiple gearbox chips.

When the system is enabling counter for switch chip of context (A), it executes

sai_switch_api->set_switch_attribute(gSwitchId, &attr);

in https://github.com/sonic-net/sonic-swss/blob/master/orchagent/saihelper.cpp#L878
with SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP in the attr.

Then because of

for (auto& kvp: m_contextMap)

in https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L232,
it loops through both context (A) and context (B) and runs the set operation on both. The parameters in the original set_switch_attribute is for ASIC switch chip, thus only relevant to context (A). As it also runs on context (B) we saw the errors like below in syslog that we want to get rid of (the port VID is actually in context (A))

ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now

Then I proposed that instead of running on both, probably it should only run on context (A), and that is through checking the objectId from set_switch_attribute belongs to which context. Also the TODO line in https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L227-L228 seems also implying we should run set on only one context under certain situation.

Above example is for enabling counter on ASIC switch chip of context (A). Meanwhile, when system is enabling counters on gearbox chips of context (B), we found similar error from syncd complaining the port VID is not found in syncd (actually it belongs to gbsyncd).

ERR syncd#syncd: :- processFlexCounterEvent: port VID oid:0x1010000000001, was not found (probably port was removed/splitted) and will remove from counters now

For you comments

anyway, if you know outside the sairedis whether those attributes should be passed or not, just don't pass them for non gbsyncd, and let that api loop through all oids

it seems hard for "don't pass them for non gbsyncd" with the existing code, as that executing on all contexts is done by sairedis: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L232. Note the loop is only for redis attribute: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L204

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i get the problem fully now, and easiest solution for that is this: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L227, apply only for specific context based on switch ID, instead of hard codding specific attributes, and apply to all contexts if switchid is nullobjectid

@byu343 byu343 force-pushed the flex-counter-context branch 2 times, most recently from 8941ffc to 0266f34 Compare October 3, 2024 05:33
lib/Sai.cpp Outdated
sai_status_t status = kvp.second->m_redisSai->set(objectType, objectId, attr);

success &= (status == SAI_STATUS_SUCCESS);
executed = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this flag is redundant, your condition on line 243 is exactly the same as in 236, please remove this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Normally, setting SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP and
SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER should provide a valid switch id
and the setting is only applied to the context with the switch id.
Fall back to loop all contexts if SAI_NULL_OBJECT_ID is provided.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants