merge to stable-23-3 (#265)
* Count broken partitions instead of broken disks for placement groups

[NBS] The PlacementGroupsWithRecentlyBrokenTwoOrMoreDisks sensor works incorrectly with partition placement groups. It fires when 2 or more disks break, but those disks may all belong to a single partition, which is harmless.
This PR changes the logic for partition placement groups: the alert now fires when 2 or more partitions are broken. For spread placement groups each disk counts as its own partition, so their logic is unchanged.

* NBSNEBIUS-86: implemented garbage FreshDeviceIds filtration in TVolumeState (#206)

* NBSNEBIUS-86: the volume monpage no longer reports replication as in progress if the volume config contains only garbage FreshDeviceIds

* NBSNEBIUS-86: moved FreshDeviceIds filtration to TVolumeState::Reset and removed it from the monpage code and removed the corresponding check from TMirrorPartitionState

* NBSNEBIUS-86: rendering raw volume config (pretty-printed proto) on volume monpage (#220)

* NBSNEBIUS-86: rendering raw volume config (pretty-printed proto) on volume monpage

* NBSNEBIUS-86: TDiskConfigs in DiskRegistry now store event history, and the history is displayed on the DiskRegistry monpage. Currently only ReplaceDevice events are added to the history; other events will be added in the next PRs (#233)

* NBSNEBIUS-86: writing migration and replication-related events to disk history (#252)

* NBSNEBIUS-86: fixed stupid bug: FilteredFreshDeviceIds filtration should not drop all FreshDeviceIds for volumes created after 2023-08-30 (#250)

* NBSNEBIUS-86: fixed stupid bug: FilteredFreshDeviceIds filtration should not drop all FreshDeviceIds for volumes created after 2023-08-30

* NBSNEBIUS-86: moved FreshDeviceIds filtration to a separate func

---------

Co-authored-by: dvrazumov <[email protected]>
qkrorlqr and dvrazumov authored Jan 26, 2024
1 parent d71b1a7 commit 6e384b5
Showing 19 changed files with 720 additions and 142 deletions.
@@ -56,6 +56,7 @@ void TDiskRegistryActor::ExecuteMarkReplacementDevice(
TDiskRegistryDatabase db(tx.DB);

args.Error = State->MarkReplacementDevice(
ctx.Now(),
db,
args.DiskId,
args.DeviceId,
@@ -710,6 +710,33 @@ void TDiskRegistryActor::RenderDiskHtmlInfo(
}

GenerateVolumeActionsJS(out);

TAG(TH3) {
out << "History";
}

TABLE_SORTABLE_CLASS("table table-bordered") {
TABLEHEAD() {
TABLER() {
TABLEH() { out << "Timestamp"; }
TABLEH() { out << "Message"; }
}

for (const auto& hi: info.History) {
TABLER() {
TABLED() {
out << TInstant::MicroSeconds(hi.GetTimestamp())
<< " (" << hi.GetTimestamp() << ")";
}
TABLED() {
PRE() {
out << hi.GetMessage();
}
}
}
}
}
}
}
}
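The hunk above renders each history entry as a microsecond timestamp plus a free-form message. A minimal sketch of how such a history could be stored and appended to, assuming a plain struct in place of the real proto message. `THistoryItem`, `TDiskInfo`, `AddHistoryItem` and `RenderHistory` below are hypothetical stand-ins, not the NBS types, and the guess that `ctx.Now()` is passed to `MarkReplacementDevice` in order to timestamp such entries is an assumption:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the history item that the monpage reads via
// GetTimestamp() / GetMessage(); the real type is a proto message.
struct THistoryItem
{
    uint64_t TimestampUs = 0;   // microseconds since epoch
    std::string Message;
};

// Hypothetical stand-in for the per-disk info that carries the history.
struct TDiskInfo
{
    std::vector<THistoryItem> History;
};

// Appends one event to the disk history (e.g. on a ReplaceDevice event).
void AddHistoryItem(TDiskInfo& info, uint64_t nowUs, std::string message)
{
    info.History.push_back({nowUs, std::move(message)});
}

// Plain-text analogue of the two-column Timestamp/Message table.
std::string RenderHistory(const TDiskInfo& info)
{
    std::ostringstream out;
    out << "Timestamp\tMessage\n";
    for (const auto& hi: info.History) {
        out << hi.TimestampUs << "\t" << hi.Message << "\n";
    }
    return out.str();
}
```

Keeping the message as free-form text matches the `PRE()` rendering in the hunk, which prints the message verbatim.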

@@ -1424,11 +1451,11 @@ void TDiskRegistryActor::RenderPlacementGroupList(
auto it = brokenGroups.find(x.first);

if (it != brokenGroups.end()) {
- if (it->second.RecentlyBrokenDiskCount == 1) {
+ if (it->second.Recently.GetBrokenPartitionsCount() == 1) {
out << "<font color=yellow>&#9632;&nbsp;</font>";
}

- if (it->second.RecentlyBrokenDiskCount > 1) {
+ if (it->second.Recently.GetBrokenPartitionsCount() > 1) {
out << "<font color=red>&#9632;&nbsp;</font>";
}
}
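The hunk above switches the monpage indicator from `RecentlyBrokenDiskCount` to `Recently.GetBrokenPartitionsCount()`. A minimal sketch of the counting rule described in the commit message: for partition groups, broken disks that share a partition count once, and for spread groups every broken disk counts as its own partition. All names below (`EPlacementStrategy`, `TBrokenDisk`, `CountBrokenPartitions`, `IndicatorColor`) are hypothetical and are not taken from the NBS sources:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical placement strategies, mirroring the two kinds of groups
// named in the commit message.
enum class EPlacementStrategy { Spread, Partition };

// Hypothetical description of one broken disk in a placement group.
struct TBrokenDisk
{
    std::string DiskId;
    unsigned PartitionIndex;   // meaningful only for partition groups
};

// Counts broken partitions: distinct partition indices for partition
// groups, distinct disks for spread groups.
size_t CountBrokenPartitions(
    EPlacementStrategy strategy,
    const std::vector<TBrokenDisk>& brokenDisks)
{
    if (strategy == EPlacementStrategy::Spread) {
        std::set<std::string> disks;
        for (const auto& d: brokenDisks) {
            disks.insert(d.DiskId);
        }
        return disks.size();
    }

    std::set<unsigned> partitions;
    for (const auto& d: brokenDisks) {
        partitions.insert(d.PartitionIndex);
    }
    return partitions.size();
}

// Same thresholds as the monpage hunk: 1 -> yellow, more than 1 -> red,
// 0 -> no indicator.
const char* IndicatorColor(size_t brokenPartitions)
{
    if (brokenPartitions == 0) {
        return "";
    }
    return brokenPartitions == 1 ? "yellow" : "red";
}
```

Under this rule two broken disks in the same partition produce a yellow indicator rather than a red one, which is the false alarm the commit message sets out to remove.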
@@ -50,6 +50,14 @@ void TDiskRegistrySelfCounters::Init(
counters->GetCounter("PlacementGroupsWithBrokenSingleDisk");
PlacementGroupsWithBrokenTwoOrMoreDisks =
counters->GetCounter("PlacementGroupsWithBrokenTwoOrMoreDisks");
PlacementGroupsWithRecentlyBrokenSinglePartition =
counters->GetCounter("PlacementGroupsWithRecentlyBrokenSinglePartition");
PlacementGroupsWithRecentlyBrokenTwoOrMorePartitions =
counters->GetCounter("PlacementGroupsWithRecentlyBrokenTwoOrMorePartitions");
PlacementGroupsWithBrokenSinglePartition =
counters->GetCounter("PlacementGroupsWithBrokenSinglePartition");
PlacementGroupsWithBrokenTwoOrMorePartitions =
counters->GetCounter("PlacementGroupsWithBrokenTwoOrMorePartitions");
MeanTimeBetweenFailures =
counters->GetCounter("MeanTimeBetweenFailures");
AutomaticallyReplacedDevices =
@@ -68,10 +68,17 @@ struct TDiskRegistrySelfCounters
TCounterPtr Mirror3DisksMinus1;
TCounterPtr Mirror3DisksMinus2;
TCounterPtr Mirror3DisksMinus3;
// TODO(dvrazumov): "*Disk*" counters are replaced with "*Partitions" counters.
// They are left for compatibility and should be removed later (NBSNEBIUS-26)
TCounterPtr PlacementGroupsWithRecentlyBrokenSingleDisk;
TCounterPtr PlacementGroupsWithRecentlyBrokenTwoOrMoreDisks;
TCounterPtr PlacementGroupsWithBrokenSingleDisk;
TCounterPtr PlacementGroupsWithBrokenTwoOrMoreDisks;
// remove above ^^^
TCounterPtr PlacementGroupsWithRecentlyBrokenSinglePartition;
TCounterPtr PlacementGroupsWithRecentlyBrokenTwoOrMorePartitions;
TCounterPtr PlacementGroupsWithBrokenSinglePartition;
TCounterPtr PlacementGroupsWithBrokenTwoOrMorePartitions;
TCounterPtr MeanTimeBetweenFailures;
TCounterPtr AutomaticallyReplacedDevices;
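
The new `*SinglePartition` and `*TwoOrMorePartitions` counters suggest that placement groups are bucketed by their number of broken partitions. The code that fills the counters is not part of the hunks shown here, so the following is only a sketch of one plausible aggregation. `TPartitionCounters` and `AggregateBrokenGroups` are hypothetical names, and plain integers stand in for `TCounterPtr`:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-integer analogue of one pair of counters, e.g.
// PlacementGroupsWithBrokenSinglePartition and
// PlacementGroupsWithBrokenTwoOrMorePartitions.
struct TPartitionCounters
{
    int64_t SinglePartition = 0;
    int64_t TwoOrMorePartitions = 0;
};

// Takes the number of broken partitions for each placement group and
// buckets the groups: exactly one broken partition, or two and more.
// Groups without broken partitions contribute to neither counter.
TPartitionCounters AggregateBrokenGroups(
    const std::vector<size_t>& brokenPartitionsPerGroup)
{
    TPartitionCounters counters;
    for (size_t n: brokenPartitionsPerGroup) {
        if (n == 1) {
            ++counters.SinglePartition;
        } else if (n > 1) {
            ++counters.TwoOrMorePartitions;
        }
    }
    return counters;
}
```

The same aggregation would presumably run twice, once over all broken partitions and once over the recently broken ones, to fill both counter pairs.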
