Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Porter stem #446

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

ebraminio
Copy link

@ebraminio ebraminio commented Jul 22, 2024

@dblock Following #444 let's add a test for a currently available filter.

Actually I wasn't familiar with the testing system and so and it was fun!

This one currently fails with the following as I haven't figured out how to hook any filter. As far as I know I should configure an analyzer but not sure how and couldn't find something similar so I can learn from but the whole thing looks nice!

FAILED  _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
    FAILED  CHAPTERS
        FAILED  Search with a match query field.
            PASSED  PARAMETERS
                PASSED  index
            PASSED  REQUEST BODY
            PASSED  RESPONSE STATUS
            FAILED  RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
            PASSED  RESPONSE PAYLOAD SCHEMA

@ebraminio
Copy link
Author

ebraminio commented Jul 22, 2024

@ebraminio ebraminio marked this pull request as draft July 22, 2024 20:50
@dblock
Copy link
Member

dblock commented Jul 22, 2024

@ebraminio Run with -- --verbose how does it fail?

You may need a refresh for search to consistently return results.

Copy link

github-actions bot commented Jul 22, 2024

Changes Analysis

Commit SHA: 3b60a47
Comparing To SHA: bf2772a

API Changes

Summary

NO CHANGES

Report

The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/10061450423/artifacts/1731274121

API Coverage

Before After Δ
Covered (%) 490 (47.99 %) 490 (47.99 %) 0 (0 %)
Uncovered (%) 531 (52.01 %) 531 (52.01 %) 0 (0 %)
Unknown 24 24 0

@dblock
Copy link
Member

dblock commented Jul 22, 2024

For the spell checker, either use an existing movie or add to .cspell.

See other lint failures & stuff, we can skip changelog for tests.

@dblock dblock added the skip-changelog No need to update CHANGELOG. label Jul 22, 2024
@ebraminio
Copy link
Author

ebraminio commented Jul 23, 2024

@ebraminio Run with -- --verbose how does it fail?

The problem is I expect it to current fail as I don't how to configure the index yet to use porter_stem yet and it fails but on your CI it doesn't fail which is weird.

FAILED  _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
    PASSED  PROLOGUES
        PASSED  POST /movies/_doc
    FAILED  CHAPTERS
        FAILED  Search with a match query field.
            PASSED  PARAMETERS
                PASSED  index
            PASSED  REQUEST BODY
            PASSED  RESPONSE STATUS
            FAILED  RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
            PASSED  RESPONSE PAYLOAD SCHEMA
    PASSED  EPILOGUES
        PASSED  DELETE /movies

[INFO] => POST /movies/_doc ({
  "refresh": true
}) [application/json] | {
  "director": "Bennett Miller",
  "title": "Moneyball",
  "year": 2011
}
[INFO] <= 201 (application/json; charset=UTF-8) | {
  "_index": "movies",
  "_id": "ViQf3pAB2iv_XREEwTRb",
  "_version": 1,
  "result": "created",
  "forced_refresh": true,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
[INFO] => POST /movies/_search ({
  "seq_no_primary_term": true
}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "movies",
        "_id": "ViQf3pAB2iv_XREEwTRb",
        "_seq_no": 0,
        "_primary_term": 1,
        "_score": 1,
        "_source": {
          "director": "Bennett Miller",
          "title": "Moneyball",
          "year": 2011
        }
      }
    ]
  }
}
[INFO] {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "movies",
        "_id": "ViQf3pAB2iv_XREEwTRb",
        "_seq_no": 0,
        "_primary_term": 1,
        "_score": 1,
        "_source": {
          "director": "Bennett Miller",
          "title": "Moneyball",
          "year": 2011
        }
      }
    ]
  }
}
[INFO] => DELETE /movies ({}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "acknowledged": true
}

@dblock
Copy link
Member

dblock commented Jul 23, 2024

The problem is I expect it to current fail as I don't how to configure the index yet to use porter_stem yet and it fails but on your CI it doesn't fail which is weird.

I don't see a mention of the porter-stem in your test, what am I missing?

@ebraminio
Copy link
Author

I don't see a mention of the porter-stem in your test, what am I missing?

Just that I don't know how best I should reference porter_stem yet, I was even trying to set the language to porter2 with no luck (honestly I don't even know what Porter language is) but if I figure this out I can add test for this and for the merged patch and can even hook another stemmer in OpenSearch but this time there is an established example on how best it can be done.

@dblock
Copy link
Member

dblock commented Jul 23, 2024

I couldn't find an OpenSearch doc, but I think it's the same as in https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-porterstem-tokenfilter.html.

Do open an issue in https://github.com/opensearch-project/documentation-website to document how to use token filters.

@ebraminio
Copy link
Author

ebraminio commented Jul 23, 2024

Thank you so much for the help, apparently I'm getting somewhere but not there yet, I tried swapping consolingly with consolingli which is brought from https://github.com/opensearch-project/OpenSearch/blob/6227dc6ae70d82b7826f8f08bcc57b277c254056/modules/analysis-common/src/test/java/org/opensearch/analysis/common/StemmerTokenFilterFactoryTests.java#L83 maybe I just don't know something about the language, will try persian_stem also,

[INFO] => PUT /movies ({}) [application/json] | {
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "porter_stem"
          ]
        }
      }
    }
  }
}
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "movies"
}
[INFO] => POST /movies/_doc ({
  "refresh": true
}) [application/json] | {
  "director": "Consolingly",
  "title": "Moneyball",
  "year": 2011
}
[INFO] <= 201 (application/json; charset=UTF-8) | {
  "_index": "movies",
  "_id": "LbYe4JAB6G__aQMm1-87",
  "_version": 1,
  "result": "created",
  "forced_refresh": true,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
[INFO] => POST /movies/_search ({}) [application/json] | {
  "size": 1,
  "query": {
    "match": {
      "director": "Consolingli"
    }
  }
}
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
[INFO] {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
[INFO] => DELETE /movies ({}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "acknowledged": true
}

FAILED  _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
    PASSED  PROLOGUES
        PASSED  PUT /movies
        PASSED  POST /movies/_doc
    FAILED  CHAPTERS
        FAILED  Search with a match query field.
            PASSED  PARAMETERS
                PASSED  index
            PASSED  REQUEST BODY
            PASSED  RESPONSE STATUS
            FAILED  RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
            PASSED  RESPONSE PAYLOAD SCHEMA
    PASSED  EPILOGUES
        PASSED  DELETE /movies

@dblock
Copy link
Member

dblock commented Jul 23, 2024

It's getting somewhere! I am not sure why the filter isn't working, but i you can get any filter working that would be a useful contribution.

We should start organizing things better too, so I would move tests/_core/search/porter_stem_token_filter.yaml to something like tests/_core/search/tokenizers/filters/porter_stem.yaml, this way it will be easy to copy-paste a test for another one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip-changelog No need to update CHANGELOG.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants