
[NEW] Introduce slot level metrics to Valkey cluster #20

Closed
PingXie opened this issue Mar 25, 2024 · 44 comments
Labels
cluster major-decision-pending Major decision pending by TSC team

Comments

@PingXie
Member

PingXie commented Mar 25, 2024

I’m revisiting the feature proposal we discussed in redis/redis#10472, which aims at providing metrics at the slot level. Despite the substantial effort and detailed discussions back then, we didn’t land this feature. I believe it’s worth reconsidering, given the potential benefits and previous interest.

@kyle-yh-kim @zuiderkwast @madolson

@madolson
Member

I fully agree!

@madolson
Member

I've also re-added my favorite creature comfort. @placeholderkv/core-team thoughts?

@zuiderkwast
Contributor

Sure, metrics seem fine. I don't have strong opinions about it, only that I think fixing the cluster consistency problems is more important than metrics.

@zuiderkwast
Contributor

redis/redis#11432

@madolson
Member

I think our big initial play should be a cluster overhaul. I think a lot of us want it, and it makes the most compelling sense as the big "next major feature".

@kyle-yh-kim
Contributor

Good to hear back on this thread; I hope you all have been doing well.

Where we left off

In total, there were 3 proposed metrics under the CLUSTER SLOT-STATS command group:

  1. key_count
  2. cpu_usec
  3. memory_bytes

Next steps

memory_bytes is the most complex of the three, but this shouldn't stop us from implementing the first two metrics to gain some momentum.

I will open two PRs for key_count and cpu_usec in the coming days. These PRs will be based on the existing PRs for key_count and cpu_usec in the Redis repository.

As for the CLUSTER SLOT-STATS command format, below is the latest development we agreed upon. Lengthy discussion and rationale can be found here and here.

```
CLUSTER SLOT-STATS
[SLOTSRANGE start-slot end-slot [start-slot end-slot ...]]|
[ORDERBY column [LIMIT limit] [ASC|DESC]]
```
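For illustration, queries under this format might look like the following (hypothetical invocations; using key_count as an ORDERBY column is an assumption based on the proposed metrics above):

```
CLUSTER SLOT-STATS SLOTSRANGE 0 100 5000 5100
CLUSTER SLOT-STATS ORDERBY key_count LIMIT 3 DESC
```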

@hwware
Member

hwware commented Mar 27, 2024

This is great, and I'd prefer to add this feature to CLUSTER INFO.

@madolson
Member

This is great, and I'd prefer to add this feature to CLUSTER INFO.

Why CLUSTER INFO? It's a free-form field, I guess; it could be a new sub-info field, I suppose.

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Mar 27, 2024
@kyle-yh-kim
Contributor

Thanks for chiming in. Personally, I'm opposed to CLUSTER INFO. We could perhaps add aggregated information under CLUSTER INFO, but not the slot-level metrics themselves.

Imagine dumping ~16384 slot-level metrics under CLUSTER INFO. This would unnecessarily bloat the info string when the user may only have wanted to check cluster_state:ok.

A new command dedicated to querying slot-level metrics, in this case CLUSTER SLOT-STATS, is more suitable. For reference, below is the latest command format we agreed on.

```
CLUSTER SLOT-STATS
[SLOTSRANGE start-slot end-slot [start-slot end-slot ...]]|
[ORDERBY column [LIMIT limit] [ASC|DESC]]
```

I'll wait for the core team to finalize this decision, before opening the PRs.

@zuiderkwast
Contributor

@kyle-yh-kim Yeah CLUSTER SLOT-STATS. We're a bit overloaded with the forking stuff, new core team, new project, etc. but I think we want this for our next release. There was already a lot of review done and I think it was almost ready to merge. Do you want to bring over your PR?

@PingXie PingXie moved this to Todo in Valkey 8.0 Apr 15, 2024
@PingXie PingXie changed the title from "Introduce slot level metrics to Redis cluster" to "[NEW] Introduce slot level metrics to Redis cluster" Apr 20, 2024
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 22, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key_count metric. cpu_usec (approved) and
memory_bytes (pending-approval) metrics will soon follow after the
merger of this PR.
@kyle-yh-kim
Contributor

Ignore my spam references above; I was reviewing the diff manually over the GitHub UI.

PR has now been opened; #351

This PR is part one of three upcoming PRs:

  1. CLUSTER SLOT-STATS command introduction, with key-count support --> This PR.
  2. cpu-usec support
  3. memory-bytes support

kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Apr 30, 2024
The command provides detailed slot usage statistics upon invocation,
with initial support for key-count metric. cpu-usec (approved) and
memory-bytes (pending-approval) metrics will soon follow after the
merger of this PR.

Signed-off-by: Kyle Kim <[email protected]>
@kyle-yh-kim
Contributor

Moving ahead, I would like to resume our conversation on per-slot memory metrics. I'd argue this is the most important per-slot metric of all, as it enables smoother horizontal scaling given accurate memory tracking at per-slot granularity.

Last time, we converged on a high-level strategy of "online analysis" (amortizing the memory tracking cost across mutative commands, rather than offline RDB snapshot analysis / forking a process), as well as on its performance and memory impact. The following conclusion was drawn before the issue was put on hold by the then open-source Redis core team.

Overall this data seems really good to me. There is the separate project for improving main memory efficiency of the dictionary, so if these two features are released together it might not be noticeable.

Source: redis/redis#10472 (comment)

As for modules, I mentioned in detail here that this feature should be kept opt-in to maintain backwards compatibility. Opt-in modules will be required to accurately track their value sizes and to call a newly introduced hook, RM_ModuleUpdateMemorySlotStats(), upon mutation, signaling valkey-server to register the memory gain / loss from the module's registered write commands.

If we are still aligned on this strategy, I will start on the implementation and open incremental PRs following the merge of the above CLUSTER SLOT-STATS command PR #351.

@kyle-yh-kim
Contributor

Based on Madelyn's latest comment:

  1. Defer the decision about memory usage since it was contentious.

The memory metric is of greatest interest to us, since it would enable smoother horizontal scaling given accurate information on each slot's memory consumption.

When possible, I'd like to understand the core team's concerns with the proposed design in more detail. Once the concerns are shared, I will evaluate alternative options.

One thing I can state with certainty is that we've put a lot of time and effort into this technical design. Ultimately, no solution comes free of charge; it all boils down to tradeoff decisions (performance, memory, and maintainability).

@zuiderkwast
Contributor

Hi Kyle! I have two concerns:

  1. Tracking memory for each data structure seems to add considerable complexity. For dict, we'd need keySize and valueSize callbacks in dictType. For quicklist, it's just a size per quicklist I suppose, since the nodes already have a size, but what about compressed nodes? For rax and skiplist, I'd like to see a simple description of how to handle each of these, to understand the complexity.
  2. Any performance degradation, or did you say it's a config? When off, there's no performance degradation?

Memory usage is not a concern to me, since we don't need any new structures to track the memory for the single-allocation data structures (string, listpack, etc.). Modules are no great concern either, since I'd imagine it's no disaster if this metric isn't 100% accurate.

What about alternative approaches? Can we check the total memory usage before and after each command? We know which slot each command operated on.

@kyle-yh-kim
Contributor

Thanks for your prompt response. My response is attached below.

Complexity in memory tracking

  1. quicklist compression
    zmalloc_size(node->entry) is called before and after compression to assess the difference. The two hook points are 1) __quicklistCompressNode() and 2) __quicklistDecompressNode(). The difference is then accumulated into quicklist->alloc_bytes, where alloc_bytes is a newly introduced size_t field that tracks the quicklist's allocation bytes.

  2. zskiplist
    zskiplistNode holds two major memory allocations: 1) node->ele and 2) node->level[], both of which can easily be introspected through zmalloc_size(). Similar to dict and quicklist, the lowest common hook points are chosen so that the change is minimally invasive; it can be accomplished with two-line changes in 1) zslInsert() and 2) zslDeleteNode().

  3. rax
    There exists an open OSS PR which tracks rax allocation size in its header. The change isn't complex, as we simply add or subtract zmalloc_size(raxNode) per mutation, for which there are about 20 touch points. This effort can be resumed in the Valkey project.

Performance degradation

The configuration is based on server.cluster_enabled. If enabled, per-slot memory is aggregated; otherwise, the code path is bypassed.

The aggregation comes in two layers:

  1. Track accurate memory usage of each Redis key-value entry.
  2. Aggregate memory usage at the per-slot level, given that each entry’s memory usage can be tracked.

Right now, the proposal for CMD (cluster-mode-disabled) is to bypass only the second aggregation and retain the first. This way, both CMD and CME (cluster-mode-enabled) will have O(1) accurate MEMORY USAGE.

More on performance benchmarking can be found here. In the worst-case scenario for CMD, the performance degradation may reach ~1% TPS. For an average workload of 8:2 R/W, the degradation is negligible.

Alternative approaches

Can we check the total memory usage before and after each command? We know which slot each command operated on.

Yes, this was the very first design candidate we ideated. Initially, we expected it to be as simple as subtracting zmalloc_used_memory() before the command from zmalloc_used_memory() after it. However, it carried far greater complexity, for the following reasons:

  • Maintenance and hard-to-follow logic. At first glance, this approach seems simple to implement. However, zmalloc context switches from customer key-space to other intents (including but not limited to: 1. transient / temporary allocations, 2. Redis administration, 3. client input / output buffers) can occur at all depths of a mutative Redis command's call stack. Out of all zmalloc operations, we must isolate those relevant to customer key-space. Thus, for every mutative Redis command, we must first completely map out these context-switching windows, and then maintain them whenever a new zmalloc is introduced within those windows.
    • The 2nd candidate solves this maintenance problem by confining all size tracking to the internal data-structure files, such as rax.c, dict.c, quicklist.c, and so on. The size tracking will not creep into other depths of the call stack.
    • Down the road, if a bug is introduced, the 1st candidate will require sweeping across all zmalloc operations at all depths of the call stack. For the 2nd candidate, we may simply refer to the specific internal data-structure file.
  • Complex and invasive, as zmalloc cannot be relied upon in all cases.
    • For example, in order to get the relevant slot number, the input must first be parsed. However, parsing the input itself requires zmalloc. We run into a cyclic dependency: zmalloc needs the slot number to increment the right counter, but the slot number can only be obtained once the key has been parsed via zmalloc. To mitigate this, we could temporarily save the sizes of these allocations and apply them once the slot number is parsed and the request succeeds. But then we need a way to carry this additional temporary state, either through another global variable or an additional argument across all call stacks.
    • Another example: robj values are conditionally re-created after the initial parsing (createStringObjectFromLongLongWithOptions()), so the size of the initially parsed value may or may not need to be disregarded from the slot metrics array. This requires another layer of consideration.
    • After a few edge-case considerations, the implementation ends up touching multiple signatures and a growing number of global variables.

We’ve also investigated various “offline” approaches, such as 1) a background thread, 2) a cron job, and 3) forking, all of which were not preferred due to their unbounded scanning cost, as well as recency lag.

This was discussed at length in the other threads, here and here.

@zuiderkwast
Contributor

Thanks! Yes, I have seen those threads before but I didn't follow this carefully back then. :)

OK, so memory is tracked even in standalone mode, and it has almost 1% throughput impact for standalone and nearly 2% in cluster mode. This makes me think we should add a config for it and wrap all of these updates in an if, like `if (server.memory_tracking) { d->size += zmalloc_size(p); }`. If the config is off, CPU branch prediction will make sure this kind of if statement costs essentially nothing to execute.

Why? I think speed is more important than metrics for some users. 1% is not that much but it adds up.

kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Jul 1, 2024
…alkey-io#20).

The metric tracks network ingress bytes under per-slot context,
by reverse calculation of c->argv_len_sum and c->argc, stored
under a newly introduced field c->net_input_bytes_curr_cmd.

Signed-off-by: Kyle Kim <[email protected]>
@kyle-yh-kim
Contributor

kyle-yh-kim commented Jul 1, 2024

PR for per-slot Network bytes-in metric has been opened; #720

The metric tracks network ingress bytes under per-slot context, by reverse calculation of c->argv_len_sum and c->argc, stored under a newly introduced field c->net_input_bytes_curr_cmd.

Similar to the CPU metric PR, the first revision contains only the implementation changes, for initial feedback purposes, with perf testing pending. Integration tests are not yet up to date and are thus failing; this will be followed up soon.

@kyle-yh-kim
Contributor

Performance benchmarking results are attached below. They will help us decide whether to enable or disable the per-slot metrics by default for all instances with CME (cluster-mode-enabled). For CMD (cluster-mode-disabled) instances, the performance penalty below does not apply.

Performance benchmarking summary

With both cpu-usec and network-bytes-in metrics enabled, we can note a reduction of 0.70% in TPS.

| | Naive | With cpu-usec | Percentage diff |
| --- | --- | --- | --- |
| p50 (ms) | 2.183 | 2.206 | 1.05% |
| p90 (ms) | 3.357 | 3.369 | 0.36% |
| p99 (ms) | 3.966 | 4.006 | 1.02% |
| TPS | 158280 | 157179 | -0.70% |

Appendix: Test setup

Server setup

  • 1 server (r6g.xlarge), pre-filled with 3 million keys, 512 bytes each.

Traffic generator setup

  • 8 traffic generators (m6g.large) running on separate ARM instances.
  • Each traffic generator running the following command (50 clients, SET command, 512 bytes data size), pinning server CPU at 100%.

```
./valkey-benchmark -h ${TARGET_IP} -c 50 -r 3000000 -n 100000000 -t set -d 514
```

kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Jul 10, 2024
…alkey-io#20).

The metric tracks network egress bytes under per-slot context,
by hooking onto COB buffer mutations.

The metric can be viewed by calling the CLUSTER SLOT-STATS command,
with sample response attached below;

```
127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
   2) 1) "key-count"
      2) (integer) 1
      3) "network-bytes-out"
      4) (integer) 175
```

Signed-off-by: Kyle Kim <[email protected]>
@kyle-yh-kim
Contributor

PR for per-slot Network bytes-out metric has been opened; #771

This concludes the opening of all three per-slot metrics PRs targeted for Valkey 8.0 rc1, which are now pending review / approval from the core team.

@madolson
Member

@valkey-io/core-team We think there should be a config since there is a small performance impact. Here are the options for naming:

  1. cluster-slots-command-metrics for (cpu, network-in, network-out) and cluster-slot-data-metrics (for memory).
  2. cluster-slots-operation-metrics for (cpu, network-in, network-out) and cluster-slot-data-metrics (for memory).
  3. cluster-slot-stats-network-enabled, cluster-slot-stats-cpu-enabled, cluster-slot-stats-memory-enabled. (This is not in valkey 8, we can finalize the name later)
  4. cluster-slot-stats-enabled with a separate future config name for memory.

Please also give input if the config should be mutable or immutable.

@zuiderkwast
Contributor

I vote 4. cluster-slot-stats-enabled, mutable.

The future config for memory should not be cluster-specific. Name idea: memory-tracking-enabled. Apart from cluster slot-stats, it would make MEMORY USAGE, MEMORY STATS and other info exact (avoid sampling). It should be immutable (since it's non-trivial to make it mutable).

@PingXie
Member Author

PingXie commented Jul 16, 2024

The future config for memory should not be cluster-specific. Name idea: memory-tracking-enabled. Apart from cluster slot-stats, it would make MEMORY USAGE, MEMORY STATS and other info exact (avoid sampling).

@zuiderkwast, my understanding of the cluster use case is to find the "big" slot(s) with a large memory footprint. Is the non-cluster use case here about eventually finding "big keys"?

@zuiderkwast
Contributor

@zuiderkwast, my understanding of the cluster use case is to find the "big" slot(s) with a large memory footprint. Is the non-cluster use case here about eventually finding "big keys"?

Yes, it can be used for that too; valkey-cli --memkeys can definitely benefit. (It's using MEMORY USAGE.)

It's more useful to enable it in a cluster than in standalone mode, but it's not useless in standalone mode. If we keep track of memory per key, then aggregating it per slot is very cheap (presumably), so I don't think we need yet another config for memory per slot.

@PingXie
Member Author

PingXie commented Jul 16, 2024

Got it. Option 4 sounds good to me and the user needs to enable both memory-tracking (future) and cluster-slot-stats-enabled (8.0) to get the memory stats.

@madolson
Member

My preference is 3 -> 4, so I'm OK with 4.

@madolson
Member

@kyle-yh-kim Can you update this PR to use a config with the name cluster-slot-stats-enabled? We can sort out an updated name later, but it would be good to get all of the naming out of the way. For now, make the config mutable.

@kyle-yh-kim
Contributor

Sure. I believe our latest decision was to disable the config by default. The following line will do the trick.

```c
/* config.c */
createBoolConfig("cluster-slot-stats-enabled", NULL, MODIFIABLE_CONFIG, server.cluster_slot_stats_enabled, 0, NULL, NULL),
```

The three per-slot metrics PRs have now been updated to include the above config, alongside the previously missing TCL integration tests.

This concludes all planned changes for the three PRs targeted for Valkey 8.0 rc1, now pending review / approval from the core team.

madolson added a commit that referenced this issue Jul 23, 2024
…712)

The metric tracks cpu time in micro-seconds, sharing the same value as
`INFO COMMANDSTATS`, aggregated under per-slot context.

---------

Signed-off-by: Kyle Kim <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
kyle-yh-kim added a commit to kyle-yh-kim/valkey that referenced this issue Jul 24, 2024
…alkey-io#20).

The metric tracks network egress bytes under per-slot context,
by hooking onto COB buffer mutations.

The metric can be viewed by calling the CLUSTER SLOT-STATS command,
with sample response attached below;

```
127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
    2) 1) "key-count"
       2) (integer) 0
       3) "cpu-usec"
       4) (integer) 0
       5) "network-bytes-in"
       6) (integer) 0
       7) "network-bytes-out"
       8) (integer) 0
```

Signed-off-by: Kyle Kim <[email protected]>
hwware pushed a commit to hwware/valkey that referenced this issue Jul 25, 2024
…io#20). (valkey-io#712)

The metric tracks cpu time in micro-seconds, sharing the same value as
`INFO COMMANDSTATS`, aggregated under per-slot context.

---------

Signed-off-by: Kyle Kim <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
madolson added a commit that referenced this issue Jul 26, 2024
…ER SLOT-STATS command (#20) (#720)

Adds two new metrics for per-slot statistics, network-bytes-in and
network-bytes-out. The network bytes are inclusive of replication bytes
but exclude other types of network traffic such as clusterbus traffic.

#### network-bytes-in
The metric tracks network ingress bytes under per-slot context, by
reverse calculation of `c->argv_len_sum` and `c->argc`, stored under a
newly introduced field `c->net_input_bytes_curr_cmd`.

#### network-bytes-out
The metric tracks network egress bytes under per-slot context, by
hooking onto COB buffer mutations.

#### sample response
Both metrics are reported under the `CLUSTER SLOT-STATS` command.
```
127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
    2) 1) "key-count"
       2) (integer) 0
       3) "cpu-usec"
       4) (integer) 0
       5) "network-bytes-in"
       6) (integer) 0
       7) "network-bytes-out"
       8) (integer) 0
```

---------

Signed-off-by: Kyle Kim <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
@madolson madolson moved this from In Progress to Done in Valkey 8.0 Jul 26, 2024
@madolson
Member

The four components for Valkey 8.0 are now merged. We will follow up with memory in Valkey 8.2.
