[21293] Fix destruction data-race on participant removal in intra-process #5034

Merged
merged 14 commits into from
Oct 29, 2024

@Mario-DL Mario-DL commented Jul 7, 2024

Description

This PR addresses a data race that occurs in stressed intra-process scenarios when an EDP writer tries to use its pointer to the local reader of a participant that has already been removed. This happens because the writer's participant has not yet received the other participant's disposal, which still has to travel through the transport.

Some flaky CI tests have already been identified as related to this issue.

The proposed solution introduces a new state in the reader's LocalReaderViewStatus, in which the reader notifies that it is inactive as soon as it is being destroyed and no one is using it.
On the other side, the local writers that used to hold raw pointers to the reader now hold a LocalReaderPointer, which wraps the reader's raw pointer plus the view. An internal counter now tracks the number of references.
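
A minimal sketch of the mechanism described above, not the code added by this PR. `GuardedPointer`, `View`, `FakeReader` and `deliver` are hypothetical names chosen for illustration, and the sketch assumes a plain mutex plus condition variable is an acceptable way to track references. The owner deactivates the wrapper before destroying the reader, and that call only returns once no user holds a view.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>

// Hypothetical wrapper (illustrative only): a raw pointer plus an
// active flag and a counter of the views currently using it.
template<typename T>
class GuardedPointer
{
public:

    explicit GuardedPointer(T* ptr) : ptr_(ptr) {}

    // RAII view: takes a reference on construction if the pointer is
    // still active, and releases it on destruction.
    class View
    {
    public:

        explicit View(const std::shared_ptr<GuardedPointer<T>>& owner)
            : owner_(owner)
            , ptr_(owner_ && owner_->acquire() ? owner_->ptr_ : nullptr)
        {
        }

        ~View()
        {
            if (ptr_ != nullptr)
            {
                owner_->release();
            }
        }

        View(const View&) = delete;
        View& operator =(const View&) = delete;

        // True only when the pointee may be used safely.
        explicit operator bool() const { return ptr_ != nullptr; }
        T* operator ->() const { return ptr_; }

    private:

        std::shared_ptr<GuardedPointer<T>> owner_;
        T* ptr_;
    };

    // Called by the owner right before destroying the pointee:
    // rejects new views and blocks until existing ones are gone.
    void deactivate()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        active_ = false;
        cv_.wait(lock, [this] { return users_ == 0; });
    }

private:

    bool acquire()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!active_)
        {
            return false;
        }
        ++users_;
        return true;
    }

    void release()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--users_ == 0)
        {
            cv_.notify_all();
        }
    }

    T* ptr_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t users_ = 0;
    bool active_ = true;
};

// Stand-in for a reader; counts the samples handed to it.
struct FakeReader
{
    int deliveries = 0;
};

// Writer-side access: returns false instead of touching a reader
// that has already been deactivated.
bool deliver(const std::shared_ptr<GuardedPointer<FakeReader>>& handle)
{
    GuardedPointer<FakeReader>::View view(handle);
    if (!view)
    {
        return false;
    }
    ++view->deliveries;
    return true;
}
```

In this sketch a writer keeps only the shared handle. Every access goes through a short-lived `View`, so a writer that has not yet learned about the removal gets a failed delivery instead of a dangling pointer.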

Thanks @MiguelCompany for helping with the final solution design.

Note: the test may be launched with --retest-until-fail 20 or so in order to reproduce the issue. To make it fail more frequently, reviewers can launch the colcon test with the taskset -c 0,1 prefix, which puts the test under more stress.

@Mergifyio backport 3.1.x 3.0.x 2.14.x 2.10.x

Contributor Checklist

  • Commit messages follow the project guidelines.
  • The code follows the style guidelines of this project.
  • Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added; the added tests pass locally
  • Any new/modified methods have been properly documented using Doxygen.
  • N/A Any new configuration API has an equivalent XML API (with the corresponding XSD extension)
  • Changes are backport compatible: they do NOT break ABI nor change library core behavior.
  • Changes are API compatible.
  • N/A New feature has been added to the versions.md file (if applicable).
  • N/A New feature has been documented/Current behavior is correctly described in the documentation.
  • Applicable backports have been included in the description.

Reviewer Checklist

  • The PR has a milestone assigned.
  • The title and description correctly express the PR's purpose.
  • Check contributor checklist is correct.
  • If this is a critical bug fix, backports to the critical-only supported branches have been requested.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated to the changes.

@Mario-DL Mario-DL added this to the v3.0.0 milestone Jul 7, 2024
@EduPonz EduPonz modified the milestones: v3.0.0, v3.0.1 Jul 19, 2024
@Mario-DL Mario-DL modified the milestones: v3.0.1, v3.0.2 Sep 4, 2024
@MiguelCompany MiguelCompany modified the milestones: v3.0.2, v3.1.0 Oct 2, 2024
@rsanchez15 rsanchez15 modified the milestones: v3.1.0, v3.1.1 Oct 3, 2024
@Mario-DL Mario-DL marked this pull request as ready for review October 3, 2024 11:06
@Mario-DL Mario-DL requested review from richiprosima and removed request for richiprosima October 3, 2024 11:07
@github-actions github-actions bot added the ci-pending PR which CI is running label Oct 3, 2024
@Mario-DL Mario-DL requested review from richiprosima and removed request for richiprosima October 3, 2024 11:10
@Mario-DL Mario-DL requested review from richiprosima and removed request for richiprosima October 3, 2024 20:08
@Mario-DL Mario-DL requested review from richiprosima and removed request for richiprosima October 3, 2024 21:40
@MiguelCompany MiguelCompany self-requested a review October 11, 2024 09:25
@Mario-DL Mario-DL requested review from MiguelCompany and removed request for MiguelCompany October 14, 2024 16:41
…calReaderPointer> and properly calls local_actions_on_reader_removed()

Signed-off-by: Mario Dominguez <[email protected]>
…when accessing local reader

Signed-off-by: Mario Dominguez <[email protected]>

@MiguelCompany MiguelCompany left a comment

LGTM with green CI 🏅

@MiguelCompany MiguelCompany added ready-to-merge Ready to be merged. CI and changes have been reviewed and approved. and removed ci-pending PR which CI is running labels Oct 29, 2024
@MiguelCompany MiguelCompany merged commit 456e45f into master Oct 29, 2024
17 checks passed
@MiguelCompany MiguelCompany deleted the hotfix/21293 branch October 29, 2024 13:30
@MiguelCompany

@Mergifyio backport 3.1.x 3.0.x 2.14.x


mergify bot commented Oct 29, 2024

backport 3.1.x 3.0.x 2.14.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Oct 29, 2024
)

* Refs #21293: Add BB test

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: Reinforce test to fail more frequently

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: Add RefCountedPointer.hpp to utils

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: Add unittests for RefCountedPointer

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: LocalReaderPointer.hpp

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: BaseReader aggregates LocalReaderPointer

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: ReaderLocator aggregates LocalReaderPointer

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: RTPSDomainImpl::find_local_reader returns a shared_ptr<LocalReaderPointer> and properly calls local_actions_on_reader_removed()

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: RTPSWriters properly using LocalReaderPointer::Instance when accessing local reader

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: Linter

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: Fix windows warnings

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: Address Miguel's review

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: Apply last comment

Signed-off-by: Mario Dominguez <[email protected]>

* Refs #21293: NIT

Signed-off-by: Mario Dominguez <[email protected]>

---------

Signed-off-by: Mario Dominguez <[email protected]>
(cherry picked from commit 456e45f)
mergify bot pushed a commit that referenced this pull request Oct 29, 2024
)

(cherry picked from commit 456e45f)
mergify bot pushed a commit that referenced this pull request Oct 29, 2024
)

(cherry picked from commit 456e45f)

# Conflicts:
#	include/fastdds/rtps/writer/ReaderLocator.h
#	include/fastdds/rtps/writer/ReaderProxy.h
#	src/cpp/rtps/RTPSDomain.cpp
#	src/cpp/rtps/RTPSDomainImpl.hpp
#	src/cpp/rtps/participant/RTPSParticipantImpl.cpp
#	src/cpp/rtps/participant/RTPSParticipantImpl.h
#	src/cpp/rtps/reader/BaseReader.cpp
#	src/cpp/rtps/reader/BaseReader.hpp
#	src/cpp/rtps/writer/ReaderLocator.cpp
#	src/cpp/rtps/writer/StatefulWriter.cpp
#	src/cpp/rtps/writer/StatelessWriter.cpp
#	test/blackbox/common/DDSBlackboxTestsBasic.cpp
#	test/mock/rtps/ReaderLocator/fastdds/rtps/writer/ReaderLocator.h
#	test/unittest/utils/CMakeLists.txt
MiguelCompany pushed a commit that referenced this pull request Oct 30, 2024
) (#5366)

(cherry picked from commit 456e45f)

Co-authored-by: Mario Domínguez López <[email protected]>
MiguelCompany pushed a commit that referenced this pull request Oct 30, 2024
) (#5365)

(cherry picked from commit 456e45f)

Co-authored-by: Mario Domínguez López <[email protected]>