Different output than the one shown on the README #2

danielizham · 2022-09-18T15:48:04Z

Hello there,

I have found that the model is not giving the results I expect based on my reading of the documentation on GitHub and Hugging Face.

When I tried the sentence 'رغم الهدنة .. معارك قره باغ متواصلة وأذربيجان تعلن سيطرتها على مزيد من القرى', I got:

[[], [{'entity': 'B-LOCATION', 'score': 0.99871314, 'index': 2, 'word': 'قر', 'start': 7, 'end': 9}, {'entity': 'B-LOCATION', 'score': 0.998519, 'index': 3, 'word': '##ه', 'start': 9, 'end': 10}, {'entity': 'I-LOCATION', 'score': 0.9986701, 'index': 4, 'word': 'باغ', 'start': 11, 'end': 14}]]

when what is expected is the following:

{"text": "رغم الهدنة . . معارك قره باغ متواصلة و أذربيجان تعلن سيطرتها على مزيد من القرى", "entities": [{"type": "LOCATION", "entity": "قره باغ", "start_offset": 21, "end_offset": 28}, {"type": "ORGANIZATION", "entity": "أذربيجان", "start_offset": 39, "end_offset": 47}]}

In terms of the format, I can modify my code or the source code so that I can process the data as I want to. But the extracted entities are not as good, and some common entities have been missed (e.g. أذربيجان). I have run this with multiple examples and the results are the same. Has the model changed?


hatmimoha · 2022-09-19T10:21:49Z

Hello Daniel, Indeed, the code does not provide the intended result. I have updated the git repository and added the postprocessing function. You need to update your copy of the repository to get the latest code. I have not updated the transformer model yet; that should be done in a few months. Thank you for your feedback! Have a nice day, Mohamed Hatmi
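A minimal sketch of what such a postprocessing step might look like, assuming token-level output in the shape shown earlier in this thread (`entity`, `word`, `start`, `end`). The function name `merge_tokens`, the merging rules, and the shortened example text are illustrative assumptions, not the repository's actual code:

```python
def merge_tokens(text, tokens):
    """Merge token-level BIO predictions into entity spans.

    Assumed rules (hypothetical, for illustration only):
    - a word piece starting with "##" always extends the previous span,
      whatever tag it carries;
    - an "I-" tag extends the previous span if the entity type matches;
    - anything else starts a new span.
    """
    entities = []
    for tok in tokens:
        if tok["entity"] == "O":
            continue  # skip non-entity tokens if the caller passed them in
        prefix, _, etype = tok["entity"].partition("-")
        is_subword = tok["word"].startswith("##")
        same_type_inside = (
            prefix == "I" and bool(entities) and entities[-1]["type"] == etype
        )
        if entities and (is_subword or same_type_inside):
            entities[-1]["end_offset"] = tok["end"]
        else:
            entities.append(
                {"type": etype, "start_offset": tok["start"], "end_offset": tok["end"]}
            )
    # Recover the surface form from the character offsets.
    for ent in entities:
        ent["entity"] = text[ent["start_offset"]:ent["end_offset"]]
    return entities


# Shortened example text; the offsets match the raw output quoted in this thread.
text = " معارك قره باغ متواصلة"
tokens = [
    {"entity": "B-LOCATION", "word": "قر", "start": 7, "end": 9},
    {"entity": "B-LOCATION", "word": "##ه", "start": 9, "end": 10},
    {"entity": "I-LOCATION", "word": "باغ", "start": 11, "end": 14},
]
merged = merge_tokens(text, tokens)
print(merged)
```

Slicing the original text by offsets, rather than joining the word pieces, keeps the spacing of the input intact.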


danielizham · 2022-09-20T06:23:49Z

Thank you for your response. The output looks nicer and has the expected format now. However, I noticed that words that are not named entities are being tagged as PERSON, and the actual named entities are labelled wrongly:

{"text": "رغم الهدنة . . معارك قره باغ متواصلة و أذربيجان تعلن سيطرتها على مزيد من القرى", "entities": [{"type": "PERSON", "entity": "رغم", "start_offset": 1, "end_offset": 3}, {"type": "PERSON", "entity": "الهدنة", "start_offset": 4, "end_offset": 10}, {"type": "PERSON", "entity": ".", "start_offset": 11, "end_offset": 12}, {"type": "PERSON", "entity": ".", "start_offset": 13, "end_offset": 14}, {"type": "PERSON", "entity": "معارك", "start_offset": 15, "end_offset": 20}, {"type": "LOCATION", "entity": "قره", "start_offset": 21, "end_offset": 24}, {"type": "DATE", "entity": "باغ", "start_offset": 25, "end_offset": 28}, {"type": "PERSON", "entity": "متواصلة", "start_offset": 29, "end_offset": 36}, {"type": "PERSON", "entity": "و", "start_offset": 37, "end_offset": 38}, {"type": "PERSON", "entity": "أذربيجان", "start_offset": 39, "end_offset": 47}, {"type": "PERSON", "entity": "تعلن", "start_offset": 48, "end_offset": 52}, {"type": "PERSON", "entity": "سيطرتها", "start_offset": 53, "end_offset": 60}, {"type": "PERSON", "entity": "على", "start_offset": 61, "end_offset": 64}, {"type": "PERSON", "entity": "مزيد", "start_offset": 65, "end_offset": 69}, {"type": "PERSON", "entity": "من", "start_offset": 70, "end_offset": 72}, {"type": "PERSON", "entity": "القرى", "start_offset": 73, "end_offset": 78}]}

Is this due to the outdated model as you mentioned?

hatmimoha · 2022-09-22T10:20:26Z

Hello Daniel, Indeed, something went wrong. I think it is related to the mapping between the output of the model and the labels. I will check it and get back to you. Thank you for your feedback. Have a nice day
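To illustrate the kind of mapping problem suspected here, a small sketch of how decoding predicted class ids with a differently ordered label list mislabels every token. Both label lists and the predicted ids are hypothetical; they are not taken from the model or the repository:

```python
def decode(pred_ids, id2label):
    """Turn predicted class ids into label strings."""
    return [id2label[i] for i in pred_ids]


def mismatched_ids(id2label_a, id2label_b):
    """Return the class ids whose label differs between two mappings."""
    return [i for i in sorted(id2label_a) if id2label_a[i] != id2label_b.get(i)]


# Hypothetical label order used when the model was trained.
train_labels = ["O", "B-PERSON", "I-PERSON", "B-LOCATION", "I-LOCATION"]
# Hypothetical (wrong) order used by the decoding code: "O" moved to the end.
wrong_labels = ["B-PERSON", "I-PERSON", "B-LOCATION", "I-LOCATION", "O"]

train_id2label = dict(enumerate(train_labels))
wrong_id2label = dict(enumerate(wrong_labels))

# Hypothetical per-token predictions for: non-entity, location start,
# location continuation, non-entity.
pred_ids = [0, 3, 4, 0]

print(decode(pred_ids, train_id2label))
print(decode(pred_ids, wrong_id2label))
print(mismatched_ids(train_id2label, wrong_id2label))
```

With the wrong order, the id for non-entities decodes to a PERSON tag, which resembles the symptom reported above. In Hugging Face Transformers the mapping is usually stored in the model config as `id2label`, so comparing it against the label list used by the postprocessing code is one way to check for this.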


hatmimoha · 2022-11-29T10:57:09Z

Hello Daniel, I have updated the Arabic NER model. It should be working correctly now. Have a nice day

