Different output than the one shown on the README #2
Comments
Hello Daniel,
Indeed, the code did not produce the intended result. I updated the Git
repository and added the postprocessing function.
You need to pull the latest code to pick up the change.
I have not updated the transformer model yet; that should be done in a few months.
Thank you for your feedback!
Have a nice day
Mohamed Hatmi
On Sun, Sep 18, 2022 at 5:48 PM Daniel Izham ***@***.***> wrote:
Hello there,
I have found that the model is not giving the results I expect based on my
reading of the documentation on GitHub and Hugging Face.
When I tried the sentence 'رغم الهدنة .. معارك قره باغ متواصلة وأذربيجان
تعلن سيطرتها على مزيد من القرى', I got:
[[], [{'entity': 'B-LOCATION', 'score': 0.99871314, 'index': 2, 'word': 'قر', 'start': 7, 'end': 9}, {'entity': 'B-LOCATION', 'score': 0.998519, 'index': 3, 'word': '##ه', 'start': 9, 'end': 10}, {'entity': 'I-LOCATION', 'score': 0.9986701, 'index': 4, 'word': 'باغ', 'start': 11, 'end': 14}]]
when what is expected is the following:
{"text": "رغم الهدنة . . معارك قره باغ متواصلة و أذربيجان تعلن سيطرتها على مزيد من القرى", "entities": [{"type": "LOCATION", "entity": "قره باغ", "start_offset": 21, "end_offset": 28}, {"type": "ORGANIZATION", "entity": "أذربيجان", "start_offset": 39, "end_offset": 47}]}
As for the format, I can modify my code or the source code to process the
data the way I want. But the detected entities are not as good as expected,
and some common entities have been missed (e.g. أذربيجان). I have run this
with multiple examples and the results are the same. Has the model
changed?
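The format difference between the two outputs above is mostly a matter of aggregation: the first is per-token (including a WordPiece continuation, `##ه`), the second is per-entity. A minimal sketch of that grouping step follows. It is not the repository's postprocessing function, which is not shown in this thread; `merge_entities` and the `segment` string are hypothetical, and the segment is reconstructed on the assumption that the reported offsets are relative to the second chunk of the input.

```python
def merge_entities(tokens, text):
    """Group token-level NER predictions into entity spans.

    `tokens` is a list of dicts with the keys seen in the reported output:
    'entity' (e.g. 'B-LOCATION'), 'word', 'index', 'start', 'end'.
    """
    entities = []
    current = None      # span being built, if any
    last_index = None   # token index of the previous token in that span
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if not label:   # an 'O' tag has no type part: close the open span
            current = None
            continue
        adjacent = last_index is not None and tok["index"] == last_index + 1
        is_subword = tok["word"].startswith("##")
        if (current is not None and adjacent
                and current["type"] == label
                and (is_subword or prefix == "I")):
            # Continuation: either an I- tag or a '##' word piece
            current["end_offset"] = tok["end"]
        else:
            current = {"type": label,
                       "start_offset": tok["start"],
                       "end_offset": tok["end"]}
            entities.append(current)
        last_index = tok["index"]
    for ent in entities:
        ent["entity"] = text[ent["start_offset"]:ent["end_offset"]]
    return entities


# Assumed segment: offsets 7..14 in the report only line up with this substring.
segment = " معارك قره باغ متواصلة"
tokens = [
    {"entity": "B-LOCATION", "index": 2, "word": "قر", "start": 7, "end": 9},
    {"entity": "B-LOCATION", "index": 3, "word": "##ه", "start": 9, "end": 10},
    {"entity": "I-LOCATION", "index": 4, "word": "باغ", "start": 11, "end": 14},
]
result = merge_entities(tokens, segment)
print(result)
```

Grouping of this kind only fixes the output format; it cannot recover an entity the model did not tag, such as أذربيجان here. The `transformers` token-classification pipeline also accepts an `aggregation_strategy` argument that performs similar grouping, though the thread does not say whether it was tried with this model.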
|
Thank you for your response. The output looks nicer and as expected now. However, I noticed that the words that are not named entities are being detected as PERSON and the named entities are labelled wrongly:
Is this due to the outdated model as you mentioned?
|
Hello Daniel,
Indeed, something went wrong. I think it is related to the mapping between
the output of the model and the labels. I will check it and get back to you.
Thank you for your feedback
Have a nice day
On Tue, Sep 20, 2022 at 8:24 AM Daniel Izham ***@***.***> wrote:
Thank you for your response. The output looks nicer and as expected now.
However, I noticed that the words that are not named entities are being
detected as PERSON and the named entities are labelled wrongly:
{"text": "رغم الهدنة . . معارك قره باغ متواصلة و أذربيجان تعلن سيطرتها على مزيد من القرى", "entities": [{"type": "PERSON", "entity": "رغم", "start_offset": 1, "end_offset": 3}, {"type": "PERSON", "entity": "الهدنة", "start_offset": 4, "end_offset": 10}, {"type": "PERSON", "entity": ".", "start_offset": 11, "end_offset": 12}, {"type": "PERSON", "entity": ".", "start_offset": 13, "end_offset": 14}, {"type": "PERSON", "entity": "معارك", "start_offset": 15, "end_offset": 20}, {"type": "LOCATION", "entity": "قره", "start_offset": 21, "end_offset": 24}, {"type": "DATE", "entity": "باغ", "start_offset": 25, "end_offset": 28}, {"type": "PERSON", "entity": "متواصلة", "start_offset": 29, "end_offset": 36}, {"type": "PERSON", "entity": "و", "start_offset": 37, "end_offset": 38}, {"type": "PERSON", "entity": "أذربيجان", "start_offset": 39, "end_offset": 47}, {"type": "PERSON", "entity": "تعلن", "start_offset": 48, "end_offset": 52}, {"type": "PERSON", "entity": "سيطرتها", "start_offset": 53, "end_offset": 60}, {"type": "PERSON", "entity": "على", "start_offset": 61, "end_offset": 64}, {"type": "PERSON", "entity": "مزيد", "start_offset": 65, "end_offset": 69}, {"type": "PERSON", "entity": "من", "start_offset": 70, "end_offset": 72}, {"type": "PERSON", "entity": "القرى", "start_offset": 73, "end_offset": 78}]}
Is this due to the outdated model as you mentioned?
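A label-mapping mismatch of the kind suspected in the reply above could produce this sort of systematic error, where almost every non-entity token comes back as PERSON. The sketch below illustrates the mechanism only. The label lists, the predicted ids, and the helper names are all hypothetical and are not the model's actual label set.

```python
# Hypothetical label order used when the model was trained.
TRAIN_LABELS = ["O", "B-PERSON", "I-PERSON", "B-LOCATION", "I-LOCATION"]

# Hypothetical order used at inference time: the same labels, but with
# "O" moved to the end, so every class id points at the wrong name.
SHIFTED_LABELS = ["B-PERSON", "I-PERSON", "B-LOCATION", "I-LOCATION", "O"]


def decode(pred_ids, labels):
    """Map predicted class ids to label names using the given label list."""
    return [labels[i] for i in pred_ids]


def mismatched_ids(expected, actual):
    """Return the class ids whose label name differs between two lists."""
    if len(expected) != len(actual):
        raise ValueError("label lists differ in length")
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


# Made-up class ids for four tokens: two non-entities, then a two-token location.
pred_ids = [0, 0, 3, 4]

correct = decode(pred_ids, TRAIN_LABELS)
wrong = decode(pred_ids, SHIFTED_LABELS)
bad_ids = mismatched_ids(TRAIN_LABELS, SHIFTED_LABELS)

print(correct)
print(wrong)
print(bad_ids)
```

With the shifted list, the id that meant `O` during training is read as `B-PERSON`, so ordinary words are reported as people even though the model's raw predictions are unchanged. For Hugging Face models the mapping normally lives in the model config as `id2label`, so comparing that dictionary against the training label order is one way to check for this; the thread does not confirm that this was the actual cause.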
|
Hello Daniel,
I updated the Arabic NER model. It should be working correctly now.
Have a nice day
|