Train on data without entities #139

AnnaKholkina · 2024-07-02T14:29:29Z

Hi. I want to fine-tune a model on a dataset in which some examples contain no entities (to reduce false positives). I tried including examples like this in the dataset:
{'tokenized_text': ['In', 'this', 'year', '.'], 'ner': []},
And I get this error:

Traceback (most recent call last):
  File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/finetune-balanced-remove-short-orgs-empty-ner.py", line 59, in <module>
    trainer.train(num_epochs=25)
  File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/trainer.py", line 213, in train
    total_loss = self.model(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gliner/model.py", line 141, in forward
    logits_label = scores.view(-1, num_classes)
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

Or this format:
{'tokenized_text': ['In', 'this', 'year', '.'], 'ner': [[]]},
And I get this error:

Traceback (most recent call last):
  File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/finetune-balanced-remove-short-orgs-empty-ner.py", line 59, in <module>
    trainer.train(num_epochs=25)
  File "/home/jovyan/work/dev/ner/gliner/GLiNER/examples/finetuning/trainer.py", line 208, in train
    for batch_idx, batch in progress_bar:
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/accelerate/data_loader.py", line 464, in __iter__
    next_batch = next(dataloader_iter)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 678, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 83, in <lambda>
    return DataLoader(data, collate_fn=lambda x: self.collate_fn(x, entity_types), **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 67, in collate_fn
    class_to_ids, id_to_classes = self.batch_generate_class_mappings(batch_list)
  File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 42, in batch_generate_class_mappings
    negs = self.get_negatives(batch_list, 100)
  File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 34, in get_negatives
    types = set([el[-1] for el in b['ner']])
  File "/usr/local/lib/python3.10/dist-packages/gliner/modules/data.py", line 34, in <listcomp>
    types = set([el[-1] for el in b['ner']])
IndexError: list index out of range

Is there any way to fix this?


urchade · 2024-07-03T14:27:22Z

You cannot train the model without any entity types. The model needs entity types to compute the matching scores.

If the list of named entities is empty, you can predefine the list of labels under the key "label":

{'tokenized_text': ['In', 'this', 'year', '.'], 'ner': [], 'label': ["person", "org"]}
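To apply this to a whole dataset, a small preprocessing step along the following lines may help. This is only a sketch: `add_fallback_labels` and the `fallback_labels` argument are hypothetical names, not part of GLiNER, and it assumes the installed GLiNER version reads the "label" key as described above. It also drops empty spans such as `[[]]`, since the second traceback shows that `el[-1]` fails on them.

```python
from typing import Any, Dict, List


def add_fallback_labels(
    examples: List[Dict[str, Any]],
    fallback_labels: List[str],
) -> List[Dict[str, Any]]:
    """Return copies of the examples in which entity-free items carry a 'label' list.

    Hypothetical helper, not part of GLiNER.
    """
    prepared = []
    for example in examples:
        item = dict(example)  # shallow copy so the input is not modified
        # Remove malformed empty spans such as [[]]; el[-1] raises IndexError on them.
        item["ner"] = [span for span in item.get("ner", []) if span]
        # Assumption: GLiNER uses the "label" key when "ner" is empty.
        if not item["ner"] and "label" not in item:
            item["label"] = list(fallback_labels)
        prepared.append(item)
    return prepared


data = [
    {"tokenized_text": ["In", "this", "year", "."], "ner": []},
    {"tokenized_text": ["In", "this", "year", "."], "ner": [[]]},
    {"tokenized_text": ["Anna", "works", "at", "Acme"],
     "ner": [[0, 0, "person"], [3, 3, "org"]]},
]
prepared = add_fallback_labels(data, ["person", "org"])
```

Examples that already contain entities are left unchanged, so the label set for those is still derived from their annotations.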

AnnaKholkina changed the title ~~Train with balance data~~ Train on data without entities Jul 2, 2024

KameniAlexNea mentioned this issue Sep 11, 2024

RuntimeError: The input size 0, plus negative padding 0 and 0 resulted in a negative output size, which is invalid. Check dimension 1 of your input. #188

Open
