Add searchbp metrics to Performance Analyzer #5390

Merged
merged 5 commits into from
Feb 13, 2024
184 changes: 165 additions & 19 deletions _monitoring-your-cluster/pa/reference.md
@@ -743,27 +743,173 @@
</tbody>
</table>

## Relevant dimensions: `NodeID`, `searchbp_mode`

<table>
<thead style="text-align: left">
<tr>
<th>Metric</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>SearchBP_Shard_Stats_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_LimitReachedCount
</td>
<td>The number of times when the cancellable task total exceeded the set cancellation threshold at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_Heap_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation because of excessive heap usage since the node last restarted at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_Heap_Usage_CurrentMax
</td>
<td>The maximum heap usage for tasks currently running at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_Heap_Usage_RollingAvg
</td>
<td>The rolling average heap usage for the _n_ most recent tasks at the shard task level. The default value for _n_ is 100.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_CPU_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation because of excessive CPU usage since the node last restarted at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_CPU_Usage_CurrentMax
</td>
<td>The maximum CPU time for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_CPU_Usage_CurrentAvg
</td>
<td>The average CPU time for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_ElaspedTime_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation because of excessive time elapsed since the node last restarted at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_ElaspedTime_Usage_CurrentMax
</td>
<td>The maximum time elapsed for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_ElaspedTime_Usage_CurrentAvg
</td>
<td>The average time elapsed for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>Searchbp_Task_Stats_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_LimitReachedCount
</td>
<td>The number of times when the cancellable task total exceeded the set cancellation threshold at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_Heap_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation because of excessive heap usage since the node last restarted at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_Heap_Usage_CurrentMax
</td>
<td>The maximum heap usage for tasks currently running at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_Heap_Usage_RollingAvg
</td>
<td>The rolling average heap usage for the _n_ most recent tasks at the search task level. The default value for _n_ is 10.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_CPU_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation because of excessive CPU usage since the node last restarted at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_CPU_Usage_CurrentMax
</td>
<td>The maximum CPU time for all tasks currently running on the node at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_CPU_Usage_CurrentAvg
</td>
<td>The average CPU time for all tasks currently running on the node at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_ElaspedTime_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation because of excessive time elapsed since the node last restarted at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_ElaspedTime_Usage_CurrentMax
</td>
<td>The maximum time elapsed for all tasks currently running on the node at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_ElaspedTime_Usage_CurrentAvg
</td>
<td>The average time elapsed for all tasks currently running on the node at the search task level.
</td>
</tr>
</tbody>
</table>
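
The `RollingAvg` metrics above are described as an average over the _n_ most recent tasks. The following Python sketch illustrates that idea, assuming a simple moving average over a fixed-size window. The class name `RollingAverage` and the sample values are hypothetical and are not taken from the Performance Analyzer or OpenSearch source.

```python
from collections import deque


class RollingAverage:
    """Simple moving average over the n most recent observations (illustrative only)."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # A bounded deque drops the oldest sample once the window is full.
        self._window = deque(maxlen=window_size)
        self._total = 0.0

    def record(self, value: float) -> float:
        # If the window is full, subtract the sample that is about to be evicted.
        if len(self._window) == self._window.maxlen:
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self.average

    @property
    def average(self) -> float:
        # An empty window reports 0.0 rather than dividing by zero.
        return self._total / len(self._window) if self._window else 0.0


# Hypothetical heap usage samples (in bytes) for four completed tasks.
avg = RollingAverage(window_size=3)
for heap_bytes in (100, 200, 300, 400):
    avg.record(heap_bytes)

# Only the last three samples (200, 300, 400) contribute to the average.
print(avg.average)
```

With a window of 3, the first sample falls out of the window once the fourth arrives, so older tasks stop influencing the reported value. A larger window (for example, the shard-level default of 100) smooths the value more than a smaller one (the search-task-level default of 10).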


## Dimensions reference

| Dimension | Return values |
|----------------------|-------------------------------------------------|
| `ShardID` | The ID of the shard, for example, `1`. |
| `IndexName` | The name of the index, for example, `my-index`. |
| `Operation` | The type of operation, for example, `shardbulk`. |
| `ShardRole` | The shard role, for example, `primary` or `replica`. |
| `Exception` | OpenSearch exceptions, for example, `org.opensearch.index_not_found_exception`. |
| `Indices` | The list of indexes in the request URL. |
| `HTTPRespCode` | The response code from OpenSearch, for example, `200`. |
| `MemType` | The memory type, for example, `totYoungGC`, `totFullGC`, `Survivor`, `PermGen`, `OldGen`, `Eden`, `NonHeap`, or `Heap`. |
| `DiskName` | The name of the disk, for example, `sda1`. |
| `DestAddr` | The destination address, for example, `010015AC`. |
| `Direction` | The direction, for example, `in` or `out`. |
| `ThreadPoolType` | The OpenSearch thread pools, for example, `index`, `search`, or `snapshot`. |
| `CBType` | The circuit breaker type, for example, `accounting`, `fielddata`, `in_flight_requests`, `parent`, or `request`. |
| `ClusterManagerTaskInsertOrder`| The order in which the task was inserted, for example, `3691`. |
| `ClusterManagerTaskPriority` | The priority of the task, for example, `URGENT`. OpenSearch executes higher-priority tasks before lower-priority ones, regardless of `insert_order`. |
| `ClusterManagerTaskType` | The task type, for example, `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, or `Update snapshot state`. |
| `ClusterManagerTaskMetadata` | The metadata for the task (if any). |
| `CacheType` | The cache type, for example, `Field_Data_Cache`, `Shard_Request_Cache`, or `Node_Query_Cache`. |
| `NodeID` | The ID of the node. |
| `Searchbp_mode` | The search backpressure mode, for example, `monitor_only` (default), `enforced`, or `disabled`. |
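
To show how metrics and dimensions fit together in a request, the following Python sketch builds a Performance Analyzer metrics URL and flattens a response into one dictionary per record. It assumes the general Performance Analyzer API shape (port `9600`, the `metrics`, `agg`, `dim`, and `nodes` query parameters, and a per-node `fields`/`records` payload). The helper names, the sample payload, and the use of `NodeID` and `searchbp_mode` as `dim` values are illustrative assumptions, not confirmed behavior for these metrics.

```python
from urllib.parse import urlencode


def build_metrics_url(metrics, aggregations, dimensions,
                      host="localhost", port=9600, all_nodes=True):
    """Build a Performance Analyzer metrics URL (hypothetical helper)."""
    # Assumption: the API expects one aggregation per requested metric.
    if len(metrics) != len(aggregations):
        raise ValueError("provide one aggregation per metric")
    params = {
        "metrics": ",".join(metrics),
        "agg": ",".join(aggregations),
        "dim": ",".join(dimensions),
    }
    if all_nodes:
        params["nodes"] = "all"
    # Keep commas unescaped so that the lists stay readable in the URL.
    query = urlencode(params, safe=",")
    return f"http://{host}:{port}/_plugins/_performanceanalyzer/metrics?{query}"


def records_as_dicts(node_payload):
    """Pair each record's values with the field names from the same payload."""
    names = [field["name"] for field in node_payload["data"]["fields"]]
    return [dict(zip(names, record)) for record in node_payload["data"]["records"]]


url = build_metrics_url(
    metrics=["SearchBP_Shard_Stats_CancellationCount"],
    aggregations=["sum"],
    dimensions=["NodeID", "searchbp_mode"],
)
print(url)

# Hypothetical payload for a single node, shaped like a Performance Analyzer response.
sample_node_payload = {
    "timestamp": 1707800000000,
    "data": {
        "fields": [
            {"name": "NodeID", "type": "VARCHAR"},
            {"name": "searchbp_mode", "type": "VARCHAR"},
            {"name": "SearchBP_Shard_Stats_CancellationCount", "type": "DOUBLE"},
        ],
        "records": [["node-1", "monitor_only", 0.0]],
    },
}
rows = records_as_dicts(sample_node_payload)
print(rows)
```

Flattening the records this way makes it easy to filter by a dimension value, for example, keeping only rows in which `searchbp_mode` is `enforced`.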