How to prepare dataset for training the model? #63
Comments
Hi, there are multiple annotation tools for entities and relations, for example brat. However, you will probably need to convert data annotated with such a tool to the format used by SpERT.
Hi! I've developed a parser that converts brat standoff to the SpERT format. It loses some data, because brat standoff is complex and the data SpERT requires is simple, but it is better than nothing. Ask me if you want it, and I'll try to document how to use it.
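For readers who want a starting point before that parser is shared, here is a minimal sketch of such a conversion. It is not the parser mentioned above. It assumes SpERT reads a JSON list of documents, each with `tokens`, `entities` (token-level `start` and exclusive `end`) and `relations` (`head` and `tail` as indices into `entities`); check this against the example datasets in the repository. The function name `brat_to_spert` and the type names in the sample are hypothetical. Only contiguous text-bound annotations (`T` lines) and binary relations (`R` lines) are handled; events, attributes, notes, and discontinuous spans are dropped, which is one way data gets lost.

```python
import json
import re


def brat_to_spert(text, ann):
    """Convert one brat document (raw text + .ann content) to a SpERT-style dict.

    Assumed target layout (verify against the SpERT example data):
    {"tokens": [...], "entities": [{"type", "start", "end"}],
     "relations": [{"type", "head", "tail"}]}
    """
    # Whitespace tokenization, keeping the character offsets of each token.
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tokens = [text[s:e] for s, e in spans]

    entities, index_of, raw_relations = [], {}, []
    for line in ann.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        if line.startswith("T"):
            if ";" in fields[1]:
                continue  # discontinuous span: dropped in this sketch
            etype, start, end = fields[1].split()
            start, end = int(start), int(end)
            # Tokens whose character range overlaps the annotated span.
            covered = [i for i, (s, e) in enumerate(spans) if s < end and e > start]
            if not covered:
                continue
            index_of[fields[0]] = len(entities)
            entities.append(
                {"type": etype, "start": covered[0], "end": covered[-1] + 1}
            )
        elif line.startswith("R"):
            rtype, arg1, arg2 = fields[1].split()[:3]
            raw_relations.append((rtype, arg1.split(":")[1], arg2.split(":")[1]))

    # Resolve brat IDs (T1, T2, ...) to positions in the entity list.
    relations = [
        {"type": rtype, "head": index_of[h], "tail": index_of[t]}
        for rtype, h, t in raw_relations
        if h in index_of and t in index_of
    ]
    return {"tokens": tokens, "entities": entities, "relations": relations}


# Illustrative input; the entity and relation type names are made up.
text = "Sony was founded in Tokyo ."
ann = (
    "T1\tOrganization 0 4\tSony\n"
    "T2\tLocation 20 25\tTokyo\n"
    "R1\tLocatedIn Arg1:T1 Arg2:T2\n"
)
doc = brat_to_spert(text, ann)
print(json.dumps([doc], indent=2))
```

Because tokens are split on whitespace only, punctuation attached to a word (for example `Tokyo.`) ends up inside the entity span; a real converter would probably use a proper tokenizer and snap annotation boundaries to it.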
@markus-eberts Thanks for answering. I would like to know more about the data side. I have very long paragraphs, some with more than 1024 tokens. Does SpERT have a restriction on input length in tokens, or does it use a sliding window to handle long paragraphs for entity and relation extraction? I used the spaCy relation extraction model, but it failed on long-distance relations, where the two entities are some distance apart. Does SpERT have the same problem, or can it handle relations between entities that are far from each other?
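This thread does not say whether SpERT itself windows long inputs. Standard BERT encoders accept at most 512 subword tokens, so one common workaround is to split long paragraphs into sentences before training. The sketch below shows that idea under assumptions: documents are dicts with `tokens`, `entities` (token `start`, exclusive `end`) and `relations` (`head`/`tail` indices into `entities`), sentences end at a `.`, `!` or `?` token, and the function name `split_into_sentences` is hypothetical. Relations whose two entities fall into different sentences are dropped, so this does not solve long-distance relation extraction; it only keeps inputs short.

```python
def split_into_sentences(doc, enders=(".", "!", "?")):
    """Split one SpERT-style document into sentence-level documents."""
    tokens = doc["tokens"]

    # Sentence boundaries as [lo, hi) token ranges.
    bounds, start = [], 0
    for i, tok in enumerate(tokens):
        if tok in enders:
            bounds.append((start, i + 1))
            start = i + 1
    if start < len(tokens):
        bounds.append((start, len(tokens)))  # trailing text without an ender

    pieces = []
    for lo, hi in bounds:
        # Keep entities that lie fully inside this sentence and remember
        # how their old index maps to the new, sentence-local index.
        remap, entities = {}, []
        for old_index, ent in enumerate(doc["entities"]):
            if lo <= ent["start"] and ent["end"] <= hi:
                remap[old_index] = len(entities)
                entities.append(
                    {"type": ent["type"], "start": ent["start"] - lo, "end": ent["end"] - lo}
                )
        # Keep relations only when both arguments survived in this sentence.
        relations = [
            {"type": rel["type"], "head": remap[rel["head"]], "tail": remap[rel["tail"]]}
            for rel in doc["relations"]
            if rel["head"] in remap and rel["tail"] in remap
        ]
        pieces.append({"tokens": tokens[lo:hi], "entities": entities, "relations": relations})
    return pieces
```

If cross-sentence relations matter for your data, a window of several sentences (still below the encoder limit) would lose fewer of them than single sentences.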
Thank you so much! It would be great if you could do so; it would benefit the audience. Did you train SpERT on your own dataset? How was the performance? A descriptive, step-by-step guide to the implementation would also be very helpful!
@karndeepsingh I'm currently working with SpERT for my CS final thesis, testing different clinical datasets and trying to improve the model. So far I've parsed the whole BioNLP 2011 task corpus, and in theory I can parse any corpus in brat standoff format. On BioNLP 2011, SpERT performs quite well, better than expected to be honest. Over the next few weeks I'll clean up the code a bit and document it.
@markus-eberts @Kerman-Sanjuan Thanks
@karndeepsingh Yes, no problem. The usage is a bit tricky and it has some limitations at the moment, but I can explain and improve it.
@Kerman-Sanjuan Hi, could you please send it to my email address [email protected]? Thanks a lot!
@Kerman-Sanjuan Hi! I've been following this thread, since I'm really interested in converting my brat annotations (like @Yolalapi and @karndeepsingh). If someone has already done this work, it could save a lot of time for me and for the whole community. Could you kindly share with me (or with the entire community) the code that parses brat annotations into JSON? Thank you in advance for your help and your time. Pierpaolo
Hi, thanks for sharing this awesome work. I have a few questions; please help me understand:
I have a set of text paragraphs and want to extract entities and the relations between them. How should I prepare my dataset for the NER and relation extraction model from these paragraphs? What format should I follow?
If you could recommend a tool, or any way to prepare or annotate the data in the format the model expects, it would be a great help.
Thanks.