[NBS] Allow broken devices to unstuck from the broken state #2125

Open
komarevtsev-d opened this issue Sep 25, 2024 · 2 comments
Assignees
Labels
blockstore Add this label to run only cloud/blockstore build and tests on PR

Comments

@komarevtsev-d
Collaborator

Currently, once a device becomes broken (from the perspective of a nonreplicated partition), NBS never retries requests to it, so a remount or a restart is needed to recover.

Another oddity is that there is a time interval during which the disk is already broken but we do not yet return an error to the user. We could probably either remove this interval completely or allow requests to be sent to agents while in this state.

@komarevtsev-d komarevtsev-d added the blockstore Add this label to run only cloud/blockstore build and tests on PR label Sep 25, 2024
@komarevtsev-d komarevtsev-d self-assigned this Sep 25, 2024
@komarevtsev-d komarevtsev-d changed the title [NBS] Allow broken to device to become good [NBS] Allow broken devices to unstuck from the broken state Sep 25, 2024
@drbasic
Collaborator

drbasic commented Sep 25, 2024

We'll do it like this:

  1. Until the MaxTimedOutDeviceStateDuration timeout expires, we answer all requests with E_TIMEOUT.
  2. After that, we keep sending requests to the disk agent, but answer with E_IO_SILENT.
  3. After another 5 minutes, we start answering with E_IO, but keep sending requests to the disk agent.
  4. If the disk agent comes back to life, we respond to the client's requests.

That is, if the filesystem broke and fsck was run on it, everything will recover without intervention from the on-call engineer.
For mirror disks, everything will start repairing itself as well.
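
The escalation described in the steps above can be sketched roughly as follows. This is only an illustrative Python model, not NBS code: the class `DeviceErrorPolicy`, its fields, the method `on_agent_response`, and the `"OK"` result are hypothetical names invented for the example. It also assumes that every client request is forwarded to the disk agent in all phases and that only the error reported to the client changes with time; the comment does not say whether requests are forwarded during the first phase.

```python
from dataclasses import dataclass
from typing import Optional

# Error names taken from the comment; "OK" is a placeholder for success.
E_TIMEOUT = "E_TIMEOUT"
E_IO_SILENT = "E_IO_SILENT"
E_IO = "E_IO"
OK = "OK"


@dataclass
class DeviceErrorPolicy:
    """Hypothetical model of which result the client sees for one device."""

    # Stands in for MaxTimedOutDeviceStateDuration, in seconds.
    max_timed_out_duration: float
    # The "another 5 minutes" from step 3, in seconds.
    silent_io_duration: float = 300.0
    # Time of the first failed request in the current outage, if any.
    first_timeout_at: Optional[float] = None

    def on_agent_response(self, now: float, ok: bool) -> str:
        """Map the outcome of a request sent to the disk agent to a client result."""
        if ok:
            # Step 4: the agent answered, so the device leaves the broken state.
            self.first_timeout_at = None
            return OK
        if self.first_timeout_at is None:
            self.first_timeout_at = now
        elapsed = now - self.first_timeout_at
        if elapsed < self.max_timed_out_duration:
            return E_TIMEOUT  # step 1
        if elapsed < self.max_timed_out_duration + self.silent_io_duration:
            return E_IO_SILENT  # step 2
        return E_IO  # step 3


policy = DeviceErrorPolicy(max_timed_out_duration=30.0)
for t, ok in [(0.0, False), (30.0, False), (330.0, False), (400.0, True)]:
    print(t, policy.on_agent_response(t, ok))
```

Because the outage timer is cleared on the first successful response, a later failure in this model starts again from E_TIMEOUT rather than resuming at E_IO.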

@drbasic
Collaborator

drbasic commented Sep 26, 2024

Related ticket: #720

2 participants