[BUG] ISM Rollover is stuck waiting for a rollover #974

Closed
kinseii opened this issue Oct 3, 2023 · 3 comments
Labels
bug Something isn't working untriaged


kinseii commented Oct 3, 2023

Describe the bug
The first rollover worked fine: every index rolled over from suffix 000001 to 000002. Since then, ISM has spent several days waiting for the rollover condition to be met, but the reported index age never increases:

{
    "message": "Pending rollover of index [index=fluent-bit-xxx-yyyy-zzzzz-000002]",
    "conditions": {
        "min_index_age": {
            "condition": "1d",
            "current": "9.5h",
            "creationDate": 1696051868987
        }
    }
}

The "current" value has stayed at 9.5h ever since.
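To make the mismatch concrete, here is a minimal Python sketch that recomputes the index age from the "creationDate" value above and compares it with the "1d" condition. The helper names (`duration_to_ms`, `rollover_due`) and the simplified unit table are assumptions made for illustration; this is not how the ISM plugin itself parses or evaluates conditions.

```python
import re

# Simplified unit table: an assumption for this sketch, not the plugin's parser.
_UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000}


def duration_to_ms(text):
    """Convert a duration such as '1d' or '3h' to milliseconds."""
    match = re.fullmatch(r"(\d+)([dhms])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def rollover_due(creation_date_ms, now_ms, min_index_age):
    """Return True when the index is at least min_index_age old at now_ms."""
    return now_ms - creation_date_ms >= duration_to_ms(min_index_age)


CREATION_MS = 1696051868987  # "creationDate" from the message above

# Age as last reported by ISM (9.5h): the condition is not yet met.
reported_now = CREATION_MS + int(9.5 * 3_600_000)
print(rollover_due(CREATION_MS, reported_now, "1d"))  # False

# Three days after creation, the same check passes.
later_now = CREATION_MS + 3 * 86_400_000
print(rollover_due(CREATION_MS, later_now, "1d"))  # True
```

If the wall-clock age passes the condition while the reported "current" value stays frozen, that suggests the rollover step is no longer being re-evaluated, rather than that the condition is genuinely unmet.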

ISM Policy:

{
    "id": "ism-policy-fluent-bit-common",
    "seqNo": 1,
    "primaryTerm": 1,
    "policy": {
        "policy_id": "ism-policy-fluent-bit-common",
        "description": "Workflow for HOT, WARM, COOL and DELETE states",
        "last_updated_time": 1695961699024,
        "schema_version": 18,
        "error_notification": {
            "channel": {
                "id": "notification-channels"
            },
            "message_template": {
                "source": "The index {{ctx.index}} failed during ISM policy execution",
                "lang": "mustache"
            }
        },
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "notification": {
                            "channel": {
                                "id": "notification-channels"
                            },
                            "message_template": {
                                "source": "The index: {{ctx.index}} has been relocating to a WARM state",
                                "lang": "mustache"
                            }
                        }
                    },
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "rollover": {
                            "min_index_age": "1d"
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "warm",
                        "conditions": {
                            "min_rollover_age": "30d"
                        }
                    }
                ]
            },
            {
                "name": "warm",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "notification": {
                            "channel": {
                                "id": "notification-channels"
                            },
                            "message_template": {
                                "source": "The index: {{ctx.index}} has been relocating to a COOL state",
                                "lang": "mustache"
                            }
                        }
                    },
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "allocation": {
                            "require": {
                                "temp": "warm"
                            },
                            "include": {},
                            "exclude": {},
                            "wait_for": false
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "cool",
                        "conditions": {
                            "min_rollover_age": "120d"
                        }
                    }
                ]
            },
            {
                "name": "cool",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "notification": {
                            "channel": {
                                "id": "notification-channels"
                            },
                            "message_template": {
                                "source": "The index: {{ctx.index}} has been relocating to a DELETE state",
                                "lang": "mustache"
                            }
                        }
                    },
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "allocation": {
                            "require": {
                                "temp": "warm"
                            },
                            "include": {},
                            "exclude": {},
                            "wait_for": false
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_rollover_age": "750d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "notification": {
                            "channel": {
                                "id": "notification-channels"
                            },
                            "message_template": {
                                "source": "The index {{ctx.index}} has been deleting",
                                "lang": "mustache"
                            }
                        }
                    },
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": [
            {
                "index_patterns": [
                    "fluent-bit-*"
                ],
                "priority": 100,
                "last_updated_time": 1695961699024
            }
        ]
    }
}

I didn't find any related errors in the logs; the policy just keeps waiting for the rollover age to be reached. On another cluster with a shorter rollover age (3h), the same setup works fine.

Expected behavior
Rollover should happen once min_index_age is reached, without getting stuck.

Plugins

opensearch-alerting                  2.9.0.0
opensearch-anomaly-detection         2.9.0.0
opensearch-asynchronous-search       2.9.0.0
opensearch-cross-cluster-replication 2.9.0.0
opensearch-geospatial                2.9.0.0
opensearch-index-management          2.9.0.0
opensearch-job-scheduler             2.9.0.0
opensearch-knn                       2.9.0.0
opensearch-ml                        2.9.0.0
opensearch-neural-search             2.9.0.0
opensearch-notifications             2.9.0.0
opensearch-notifications-core        2.9.0.0
opensearch-observability             2.9.0.0
opensearch-performance-analyzer      2.9.0.0
opensearch-reports-scheduler         2.9.0.0
opensearch-security                  2.9.0.0
opensearch-security-analytics        2.9.0.0
opensearch-sql                       2.9.0.0
repository-s3                        2.9.0

Host/Environment (please complete the following information):

  • OS: Azure k8s service v1.26.6 with Ubuntu nodes
  • OpenSearch version 2.9.0
@kinseii kinseii added bug Something isn't working untriaged labels Oct 3, 2023
@dblock dblock transferred this issue from opensearch-project/OpenSearch Oct 3, 2023

kinseii commented Oct 3, 2023

I tried to roll over the index manually by switching the write alias, but the policy did not pick up the change:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "fluent-bit-xxx-yyyy-zzzzzz-000002",
        "alias": "fluent-bit-xxx-yyyy-zzzzzz",
        "is_write_index": false
      }
    }, {
      "add": {
        "index": "fluent-bit-xxx-yyyy-zzzzzz-000003",
        "alias": "fluent-bit-xxx-yyyy-zzzzzz",
        "is_write_index": true
      }
    }
  ]
}

The status says it's initializing, but 15 hours have passed and it still isn't working :(
image


kinseii commented Oct 3, 2023

I tried recreating the ISM policy, but nothing changed. The explain output for the index is still the same:

{
  "fluent-bit-xxx-yyyy-zzzzz-000002": {
    "index.plugins.index_state_management.policy_id": "ism-policy-fluent-bit-common",
    "index.opendistro.index_state_management.policy_id": "ism-policy-fluent-bit-common",
    "index": "fluent-bit-xxx-yyyy-zzzzz-000002",
    "index_uuid": "D60hVglyRBaMHpSkt1yXZw",
    "policy_id": "ism-policy-fluent-bit-common",
    "policy_seq_no": -2,
    "policy_primary_term": 0,
    "rolled_over": false,
    "index_creation_date": 1696051868987,
    "state": {
      "name": "hot",
      "start_time": 1696052311588
    },
    "action": {
      "name": "rollover",
      "start_time": 1696053135698,
      "index": 1,
      "failed": false,
      "consumed_retries": 0,
      "last_retry_time": 0
    },
    "step": {
      "name": "attempt_rollover",
      "start_time": 1696053135698,
      "step_status": "condition_not_met"
    },
    "retry_info": {
      "failed": false,
      "consumed_retries": 0
    },
    "info": {
      "message": "Pending rollover of index [index=fluent-bit-xxx-yyyy-zzzzz-000002]",
      "conditions": {
        "min_index_age": {
          "condition": "1d",
          "current": "9.5h",
          "creationDate": 1696051868987
        }
      }
    },
    "enabled": true
  },
  "total_managed_indices": 1
}
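The epoch-millisecond timestamps in this output are hard to read at a glance. The following Python sketch condenses a response of this shape into one row per managed index, with the step start time converted to UTC. The function names are hypothetical, and the sample dict is a trimmed copy of the output above.

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert epoch milliseconds to an ISO 8601 UTC string (whole seconds)."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc).isoformat()


def summarize_explain(explain):
    """Return one summary dict per managed index in an explain-style response."""
    rows = []
    for name, meta in explain.items():
        if not isinstance(meta, dict):
            continue  # skips counters such as "total_managed_indices"
        step = meta.get("step", {})
        rows.append({
            "index": name,
            "state": meta.get("state", {}).get("name"),
            "action": meta.get("action", {}).get("name"),
            "step_status": step.get("step_status"),
            "step_started": ms_to_utc(step["start_time"]) if "start_time" in step else None,
        })
    return rows


# Trimmed copy of the output shown above.
sample = {
    "fluent-bit-xxx-yyyy-zzzzz-000002": {
        "state": {"name": "hot", "start_time": 1696052311588},
        "action": {"name": "rollover", "start_time": 1696053135698},
        "step": {
            "name": "attempt_rollover",
            "start_time": 1696053135698,
            "step_status": "condition_not_met",
        },
    },
    "total_managed_indices": 1,
}

for row in summarize_explain(sample):
    print(row)
```

For this sample the step start time resolves to 2023-09-30, three days before this comment, which matches the observation that the step has not progressed since.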


kinseii commented Oct 6, 2023

The problem turned out to be that one of the indexes was in red status.
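Since the cause here was a red index, checking index health is a reasonable early step when a rollover appears stuck. Below is a minimal Python sketch that lists red indexes from the body of `GET _cluster/health?level=indices`. The function names, the base URL, and the sample index names are assumptions for illustration, and authentication (needed on a cluster with the security plugin enabled) is left out.

```python
import json
from urllib.request import urlopen


def red_indices(health):
    """Return the sorted names of indexes whose status is 'red'.

    `health` is the parsed JSON body of GET _cluster/health?level=indices.
    """
    indices = health.get("indices", {})
    return sorted(name for name, info in indices.items() if info.get("status") == "red")


def fetch_health(base_url):
    """Fetch per-index health. The URL is a placeholder and no auth is sent."""
    with urlopen(f"{base_url}/_cluster/health?level=indices") as resp:
        return json.load(resp)


# Hypothetical sample shaped like the health response; index names are made up.
sample_health = {
    "status": "red",
    "indices": {
        "example-logs-000001": {"status": "green"},
        "example-logs-000002": {"status": "red"},
        "example-metrics-000001": {"status": "yellow"},
    },
}

print(red_indices(sample_health))  # ['example-logs-000002']
```

Against a live cluster, the call would look like `red_indices(fetch_health("http://localhost:9200"))`, with the host adjusted to the environment.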

@kinseii kinseii closed this as completed Oct 6, 2023