Hi. I want to fine-tune a model on a dataset in which some examples contain no entities, in order to reduce false positives. I tried adding examples like this to the dataset:
{'tokenized_text': ['In', 'this', 'year', '.'], 'ner': []},
This gives the following error:
Traceback (most recent call last):
File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/finetune-balanced-remove-short-orgs-empty-ner.py", line 59, in <module>
trainer.train(num_epochs=25)
File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/trainer.py", line 213, in train
total_loss = self.model(batch)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/gliner/model.py", line 141, in forward
logits_label = scores.view(-1, num_classes)
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
I also tried this format:
{'tokenized_text': ['In', 'this', 'year', '.'], 'ner': [[]]},
which gives this error:
Traceback (most recent call last):
File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/finetune-balanced-remove-short-orgs-empty-ner.py", line 59, in <module>
trainer.train(num_epochs=25)
File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/trainer.py", line 208, in train
for batch_idx, batch in progress_bar:
File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.10/dist-packages/accelerate/data_loader.py", line 464, in __iter__
next_batch = next(dataloader_iter)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 678, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 83, in <lambda>
return DataLoader(data, collate_fn=lambda x: self.collate_fn(x, entity_types), **kwargs)
File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 67, in collate_fn
class_to_ids, id_to_classes = self.batch_generate_class_mappings(batch_list)
File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 42, in batch_generate_class_mappings
negs = self.get_negatives(batch_list, 100)
File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 34, in get_negatives
types = set([el[-1] for el in b['ner']])
File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 34, in <listcomp>
types = set([el[-1] for el in b['ner']])
IndexError: list index out of range
Is there any way to fix this?
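For reference, the second traceback can be reproduced without GLiNER. The sketch below copies only the expression shown at `data.py` line 34; the function names `collect_types` and `drop_empty_spans` are hypothetical helpers written for this illustration and are not part of the library. It shows that `'ner': []` passes through that line, while `'ner': [[]]` fails because `el[-1]` is evaluated on an empty list.

```python
def collect_types(batch_list):
    """Hypothetical stand-in for the expression in the traceback:
    collect the last element (the label) of every span in the batch."""
    types = set()
    for b in batch_list:
        types.update(el[-1] for el in b['ner'])
    return types


def drop_empty_spans(example):
    """Hypothetical helper: return a copy of the example with empty
    span entries removed, so that [[]] becomes []."""
    return {**example, 'ner': [el for el in example['ner'] if el]}


empty_list = {'tokenized_text': ['In', 'this', 'year', '.'], 'ner': []}
nested_empty = {'tokenized_text': ['In', 'this', 'year', '.'], 'ner': [[]]}

# 'ner': [] has no spans to iterate over, so no label is read.
types_ok = collect_types([empty_list])

# 'ner': [[]] contains one empty span, and el[-1] on [] raises IndexError.
try:
    collect_types([nested_empty])
    raised = False
except IndexError:
    raised = True

# After dropping empty spans, the second format matches the first.
normalized = drop_empty_spans(nested_empty)
types_normalized = collect_types([normalized])
```

Normalizing `[[]]` to `[]` only sidesteps the `IndexError`. The first traceback, where the target shape is `[-1, 0]`, suggests that a batch with zero classes still fails inside the model's forward pass, so this is not a complete fix.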
AnnaKholkina changed the title from "Train with balance data" to "Train on data without entities" on Jul 2, 2024