Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix adding flex counter to wrong context #1421

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions lib/RedisRemoteSaiInterface.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2049,6 +2049,21 @@ sai_switch_notifications_t RedisRemoteSaiInterface::syncProcessNotification(
return { };
}

bool RedisRemoteSaiInterface::containsSwitch(
_In_ sai_object_id_t switchId) const
{
SWSS_LOG_ENTER();

if (!m_switchContainer->contains(switchId))
{
SWSS_LOG_WARN("context %s failed to find switch %s",
m_contextConfig->m_name.c_str(), sai_serialize_object_id(switchId).c_str());
return false;
}

return true;
kcudnik marked this conversation as resolved.
Show resolved Hide resolved
}

const std::map<sai_object_id_t, swss::TableDump>& RedisRemoteSaiInterface::getTableDump() const
{
SWSS_LOG_ENTER();
Expand Down
3 changes: 3 additions & 0 deletions lib/RedisRemoteSaiInterface.h
Original file line number Diff line number Diff line change
Expand Up @@ -227,6 +227,9 @@ namespace sairedis

const std::map<sai_object_id_t, swss::TableDump>& getTableDump() const;

bool containsSwitch(
_In_ sai_object_id_t switchId) const;

private: // QUAD API helpers

sai_status_t create(
Expand Down
10 changes: 10 additions & 0 deletions lib/Sai.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -231,6 +231,16 @@ sai_status_t Sai::set(

for (auto& kvp: m_contextMap)
{
if (objectType == SAI_OBJECT_TYPE_SWITCH && objectId != SAI_NULL_OBJECT_ID &&
(attr->id == SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER ||
attr->id == SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP))
{
if (!kvp.second->m_redisSai->containsSwitch(objectId))
{
continue;
}
}

Comment on lines +234 to +243
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not guarantee that the context is actually gbsyncd or syncd, no what basis this decision is made here ? from my perspective therre should be some other attribute passed to indicated which syncd it is or based on switch created attributes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When setting attribute SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER or SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP using sai_switch_api, the switch Id is set to gSwitchId or gearbox_oid, e.g., https://github.com/sonic-net/sonic-swss/blob/master/orchagent/saihelper.cpp#L878.

Before my change, code around here was looping through all the context using "for (auto& kvp: m_contextMap)" and setting the attributes to all contexts. My change is to check if the switch id passed through could match the context. Assumption is that gearbox_oid is only available in the gbsyncd context, switch chip oid is only available in the syncd context, which are always the case from my understanding.

There is some description in PR #1362:

Gearbox flex counter database
Pass the Phy OID, an OID of a SAI switch object in syntax, when calling the SAI set API to set the extended attributes. By doing so, the SAI redis objects can choose in which context the SAI API call should be invoked and the corresponding gearbox syncd docker container will handle it.

From my understanding, my PR here is to implement "choose in which context the SAI API call should be invoke based on OID of a SAI switch object".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your assumption is not enough, since you are assuming those 2 attributes will be passed only on grbsyncd OID, but if you pass other OID for other switch it will also execute it, sairedis should know internally whether those attributes belong to grsyncd, maybe based on create_switch

anyway, if you know outside the sairedis whether those attrbiutes should be passed or not, just don't pass them for non grbsyncd, and let that api loop throu all oids

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about the comment. Let me explain the original issue without this change. Could you please take a look and give some idea on where to change to fix the issue if you still think the current location is not appropriate?

There are two contexts: (A) is syncd for the ASIC switch chip, and (B) is gbsyncd for multiple gearbox chips.

When the system is enabling counter for switch chip of context (A), it executes

sai_switch_api->set_switch_attribute(gSwitchId, &attr);

in https://github.com/sonic-net/sonic-swss/blob/master/orchagent/saihelper.cpp#L878
with SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP in the attr.

Then because of

for (auto& kvp: m_contextMap)

in https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L232,
it loops through both context (A) and context (B) and runs the set operation on both. The parameters in the original set_switch_attribute is for ASIC switch chip, thus only relevant to context (A). As it also runs on context (B) we saw the errors like below in syslog that we want to get rid of (the port VID is actually in context (A))

ERR gbsyncd#syncd: :- processFlexCounterEvent: port VID oid:0x1000000000002, was not found (probably port was removed/splitted) and will remove from counters now

Then I proposed that instead of running on both, probably it should only run on context (A), and that is through checking the objectId from set_switch_attribute belongs to which context. Also the TODO line in https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L227-L228 seems also implying we should run set on only one context under certain situation.

Above example is for enabling counter on ASIC switch chip of context (A). Meanwhile, when system is enabling counters on gearbox chips of context (B), we found similar error from syncd complaining the port VID is not found in syncd (actually it belongs to gbsyncd).

ERR syncd#syncd: :- processFlexCounterEvent: port VID oid:0x1010000000001, was not found (probably port was removed/splitted) and will remove from counters now

For you comments

anyway, if you know outside the sairedis whether those attributes should be passed or not, just don't pass them for non gbsyncd, and let that api loop through all oids

it seems hard for "don't pass them for non gbsyncd" with the existing code, as that executing on all contexts is done by sairedis: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L232. Note the loop is only for redis attribute: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L204

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i get the problem fully now, and easiest solution for that is this: https://github.com/sonic-net/sonic-sairedis/blob/master/lib/Sai.cpp#L227, apply only for specific context based on switch ID, instead of hard codding specific attributes, and apply to all contexts if switchid is nullobjectid

sai_status_t status = kvp.second->m_redisSai->set(objectType, objectId, attr);

success &= (status == SAI_STATUS_SUCCESS);
Expand Down
16 changes: 10 additions & 6 deletions syncd/tests/TestSyncdMlnx.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -284,10 +284,14 @@ TEST_F(SyncdMlnxTest, portBulkAddRemove)
attr.id = SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP;
attr.value.ptr = (void*)&flexCounterGroupParam;

status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, SAI_NULL_OBJECT_ID, &attr);
// Failed to create if the switchId is invalid for the context
status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, m_switchId + 1, &attr);
std::vector<swss::FieldValueTuple> fvVector, fvVectorExpected;
ASSERT_FALSE(m_flexCounterGroupTable->get("PORT_STAT_COUNTER", fvVector));

status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, m_switchId, &attr);
ASSERT_EQ(status, SAI_STATUS_SUCCESS);

std::vector<swss::FieldValueTuple> fvVector, fvVectorExpected;
ASSERT_TRUE(m_flexCounterGroupTable->get("PORT_STAT_COUNTER", fvVector));
fvVectorExpected.emplace_back(POLL_INTERVAL_FIELD, poll_interval);
fvVectorExpected.emplace_back(STATS_MODE_FIELD, stats_mode);
Expand Down Expand Up @@ -315,15 +319,15 @@ TEST_F(SyncdMlnxTest, portBulkAddRemove)
attr.id = SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER;
attr.value.ptr = (void*)&flexCounterParam;

status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, SAI_NULL_OBJECT_ID, &attr);
status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, m_switchId, &attr);
ASSERT_EQ(status, SAI_STATUS_FAILURE);
ASSERT_FALSE(m_flexCounterTable->get(key, fvVector));

// Try with a good key
key = "PORT_STAT_COUNTER:" + sai_serialize_object_id(oidList[0]);
flexCounterParam.counter_key.list = (int8_t*)const_cast<char *>(key.c_str());
flexCounterParam.counter_key.count = (uint32_t)key.length();
status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, SAI_NULL_OBJECT_ID, &attr);
status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, m_switchId, &attr);
ASSERT_EQ(status, SAI_STATUS_SUCCESS);

ASSERT_TRUE(m_flexCounterTable->get(key, fvVector));
Expand All @@ -336,7 +340,7 @@ TEST_F(SyncdMlnxTest, portBulkAddRemove)
flexCounterParam.counter_field_name.count = 0;

// 3. Stop counter polling for the port
status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, SAI_NULL_OBJECT_ID, &attr);
status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, m_switchId, &attr);
ASSERT_EQ(status, SAI_STATUS_SUCCESS);
ASSERT_FALSE(m_flexCounterTable->get(key, fvVector));

Expand All @@ -351,7 +355,7 @@ TEST_F(SyncdMlnxTest, portBulkAddRemove)
attr.id = SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP;
attr.value.ptr = (void*)&flexCounterGroupParam;

status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, SAI_NULL_OBJECT_ID, &attr);
status = m_sairedis->set(SAI_OBJECT_TYPE_SWITCH, m_switchId, &attr);
ASSERT_EQ(status, SAI_STATUS_SUCCESS);
ASSERT_FALSE(m_flexCounterTable->get(key, fvVector));

Expand Down
Loading