
[BUG] ISM force_merge on datastream index #1255

Open
disaster37 opened this issue Sep 16, 2024 · 4 comments
Labels
bug (Something isn't working)

Comments


disaster37 commented Sep 16, 2024

What is the bug?

On OpenSearch 2.16.0

I created an ISM policy with a force_merge step that forces the index down to one segment after the datastream index has rolled over and moved to a warm node. The step always ends in a timeout.
After setting the ISM log level to DEBUG, I get the following log:

{"type": "json_logger", "timestamp": "2024-09-16T14:04:13,248Z", "level": "DEBUG", "component": "o.o.i.i.s.f.WaitForForceMergeStep", "cluster.name": "logmanagement2-rec", "node.name": "opensearch-data-os-2", "message": "Force merge still running on [.ds-logs-log-default-000617] with [2] shards containing unmerged segments", "cluster.uuid": "ZbghcuYqTtWRmCHMd4tbyw", "node.id": "cYyrcay5QPS7_zi0HxvyJg"  }

How can one reproduce the bug?

  1. Create a new OpenSearch cluster with hot and warm tiers
  2. Create an index template that allows creating datastream indices (a request sketch follows the JSON below):
{
  "index_patterns": [
    "logs-*"
  ],
  "priority": "500",
  "data_stream": {
    "timestamp_field": {
      "name": "@timestamp"
    }
  },
  "name": "template_log",
  "template": {}
}
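The template dump above looks like API output (the name field belongs in the URL path, not the body). A hypothetical request to create it, assuming a local unsecured endpoint:

# Create the composable index template; the template name goes in the URL
curl -s -X PUT "http://localhost:9200/_index_template/template_log" \
  -H 'Content-Type: application/json' \
  -d '{
    "index_patterns": ["logs-*"],
    "priority": 500,
    "data_stream": {
      "timestamp_field": { "name": "@timestamp" }
    },
    "template": {}
  }'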
  3. Create the datastream index logs-log-default (a request sketch follows the policy JSON below)
  4. Create the ISM policy:
{
    "id": "policy-log",
    "seqNo": 2848481,
    "primaryTerm": 23,
    "policy": {
        "policy_id": "policy-log",
        "description": "Policy for logs index",
        "last_updated_time": 1725961147454,
        "schema_version": 21,
        "error_notification": null,
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "rollover": {
                            "min_index_age": "1d",
                            "min_primary_shard_size": "5gb",
                            "copy_alias": false
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "warm",
                        "conditions": {
                            "min_index_age": "1d"
                        }
                    }
                ]
            },
            {
                "name": "warm",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "read_only": {}
                    },
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "allocation": {
                            "require": {
                                "temp": "warm"
                            },
                            "include": {},
                            "exclude": {},
                            "wait_for": false
                        }
                    },
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "index_priority": {
                            "priority": 50
                        }
                    },
                    {
                        "timeout": "1d",
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "force_merge": {
                            "max_num_segments": 1
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "2d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": [
            {
                "index_patterns": [
                    "logs-log-*"
                ],
                "priority": 100,
                "last_updated_time": 1725961147454
            }
        ]
    }
}
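The policy dump above also looks like API output: id, seqNo, and primaryTerm are response metadata, and only the inner policy object is sent when creating the policy via PUT _plugins/_ism/policies/policy-log. With the ism_template in place, policy-log is auto-attached to new backing indices matching logs-log-*. Step 3 as a request, under the same endpoint assumption:

# Create the data stream explicitly (it is also created implicitly by
# indexing a first document into logs-log-default)
curl -s -X PUT "http://localhost:9200/_data_stream/logs-log-default"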

Wait for the force_merge step. The force_merge step always ends in a timeout.

What is the expected behavior?
Force merge runs successfully and results in one segment per shard.

What is your host/environment?

OpenSearch 2.16.0

disaster37 added the bug and untriaged labels Sep 16, 2024
disaster37 (Author) commented

I finally found the relevant log on the data node that hosts the last shard with unmerged segments:
"Caused by: java.io.IOException: No space left on device"

disaster37 (Author) commented

I think the force_merge step should estimate the target size and check whether there is sufficient space on the node.
And in any case, the step should fail with the "no space left on device" error instead of failing with an action timeout.
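One way to approximate such a check by hand: compare each shard's store size against the free space on its node, since a force merge rewrites segments and can temporarily need roughly the shard's size in extra headroom (a sketch, same endpoint assumption):

# Per-shard store size and host node for the stuck backing index
curl -s "http://localhost:9200/_cat/shards/.ds-logs-log-default-000617?v&h=index,shard,prirep,store,node"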

bharath-techie commented

@disaster37 did you try the explain API to get information about the policy failure?
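For reference, the ISM explain call for the backing index from the logs would look something like this (endpoint assumed as above):

# Show the managed index's current state, action, step status, and failure info
curl -s "http://localhost:9200/_plugins/_ism/explain/.ds-logs-log-default-000617?pretty"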

dblock removed the untriaged label Oct 7, 2024
dblock (Member) commented Oct 7, 2024

[Catch All Triage - 1, 2, 3, 4]
