Different output than the one shown on the README #2

Open
danielizham opened this issue Sep 18, 2022 · 4 comments

Comments

@danielizham

Hello there,

I have found that the model is not giving the results I expect based on my reading of the documentation on GitHub and Hugging Face.

When I tried the sentence 'رغم الهدنة .. معارك قره باغ متواصلة وأذربيجان تعلن سيطرتها على مزيد من القرى', I got:

[[], [{'entity': 'B-LOCATION', 'score': 0.99871314, 'index': 2, 'word': 'قر', 'start': 7, 'end': 9}, {'entity': 'B-LOCATION', 'score': 0.998519, 'index': 3, 'word': '##ه', 'start': 9, 'end': 10}, {'entity': 'I-LOCATION', 'score': 0.9986701, 'index': 4, 'word': 'باغ', 'start': 11, 'end': 14}]]

when what is expected is the following:

{"text": "رغم الهدنة . . معارك قره باغ متواصلة و أذربيجان تعلن سيطرتها على مزيد من القرى", "entities": [{"type": "LOCATION", "entity": "قره باغ", "start_offset": 21, "end_offset": 28}, {"type": "ORGANIZATION", "entity": "أذربيجان", "start_offset": 39, "end_offset": 47}]}

In terms of the format, I can modify my code or the source code to process the data the way I want. But the quality of the detected entities is worse, and some common entities are missed (e.g. أذربيجان). I have run this with multiple examples and the results are the same. Has the model changed?
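
On the format difference only: the raw output above is token-level (word pieces such as 'قر' and '##ه' with B-/I- tags), while the expected output is span-level. Below is a minimal sketch of how such tokens could be grouped into spans. The function name `merge_token_entities` and the merging rules are assumptions made for illustration, not the repository's own post-processing, and the offsets stay relative to whatever segment the tokens came from.

```python
# Token-level predictions copied from the output quoted in this issue.
sample_tokens = [
    {"entity": "B-LOCATION", "score": 0.99871314, "index": 2,
     "word": "قر", "start": 7, "end": 9},
    {"entity": "B-LOCATION", "score": 0.998519, "index": 3,
     "word": "##ه", "start": 9, "end": 10},
    {"entity": "I-LOCATION", "score": 0.9986701, "index": 4,
     "word": "باغ", "start": 11, "end": 14},
]


def merge_token_entities(tokens):
    """Group word-piece predictions into entity spans (hypothetical helper).

    Assumed rules: a token extends the previous span when it has the same
    entity type, is either a '##' continuation piece or carries an I- tag,
    and starts at most one character after the previous span ends.
    """
    spans = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if not label:
            # Tags without a type (e.g. "O") are not entities; skip them.
            continue
        word = tok["word"]
        is_subword = word.startswith("##")
        piece = word[2:] if is_subword else word
        last = spans[-1] if spans else None
        continues = (
            last is not None
            and last["type"] == label
            and (is_subword or prefix == "I")
            and tok["start"] - last["end_offset"] <= 1
        )
        if continues:
            # No separator for adjacent pieces, one space between words.
            sep = "" if tok["start"] == last["end_offset"] else " "
            last["entity"] += sep + piece
            last["end_offset"] = tok["end"]
        else:
            spans.append({
                "type": label,
                "entity": piece,
                "start_offset": tok["start"],
                "end_offset": tok["end"],
            })
    return spans


merged = merge_token_entities(sample_tokens)
print(merged)
```

On the three tokens quoted above, this yields a single LOCATION span "قره باغ" with offsets 7 to 14. It only reshapes the output; it does not recover entities the model failed to tag, such as أذربيجان.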

@hatmimoha
Owner

hatmimoha commented Sep 19, 2022 via email

@danielizham
Author

Thank you for your response. The output now has the expected format. However, I noticed that words that are not named entities are being tagged as PERSON, and the actual named entities are labelled incorrectly:

{"text": "رغم الهدنة . . معارك قره باغ متواصلة و أذربيجان تعلن سيطرتها على مزيد من القرى", "entities": [{"type": "PERSON", "entity": "رغم", "start_offset": 1, "end_offset": 3}, {"type": "PERSON", "entity": "الهدنة", "start_offset": 4, "end_offset": 10}, {"type": "PERSON", "entity": ".", "start_offset": 11, "end_offset": 12}, {"type": "PERSON", "entity": ".", "start_offset": 13, "end_offset": 14}, {"type": "PERSON", "entity": "معارك", "start_offset": 15, "end_offset": 20}, {"type": "LOCATION", "entity": "قره", "start_offset": 21, "end_offset": 24}, {"type": "DATE", "entity": "باغ", "start_offset": 25, "end_offset": 28}, {"type": "PERSON", "entity": "متواصلة", "start_offset": 29, "end_offset": 36}, {"type": "PERSON", "entity": "و", "start_offset": 37, "end_offset": 38}, {"type": "PERSON", "entity": "أذربيجان", "start_offset": 39, "end_offset": 47}, {"type": "PERSON", "entity": "تعلن", "start_offset": 48, "end_offset": 52}, {"type": "PERSON", "entity": "سيطرتها", "start_offset": 53, "end_offset": 60}, {"type": "PERSON", "entity": "على", "start_offset": 61, "end_offset": 64}, {"type": "PERSON", "entity": "مزيد", "start_offset": 65, "end_offset": 69}, {"type": "PERSON", "entity": "من", "start_offset": 70, "end_offset": 72}, {"type": "PERSON", "entity": "القرى", "start_offset": 73, "end_offset": 78}]}

Is this due to the outdated model as you mentioned?

@hatmimoha
Owner

hatmimoha commented Sep 22, 2022 via email

@hatmimoha
Owner

hatmimoha commented Nov 29, 2022 via email
