
Error while Evaluating #29

Closed
Izorar opened this issue Mar 16, 2022 · 10 comments

Izorar commented Mar 16, 2022

Hello @yusanshi,
There is an issue with indexing in the evaluate file precisely line 262:

  File "src/evaluate.py", line 262, in <listcomp>
    int(news[0].split('-')[1]) for news in minibatch['impressions']
IndexError: list index out of range

This issue happens on MIND large dataset
Thanks

@yusanshi
Owner

That's because the test set of the MIND large dataset has no labels, so the test file doesn't follow the format the code expects. See #11 and msnews/MIND#8.

... and it seems that there is nothing we can do 😢
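For illustration (the news IDs below are made up, not taken from the dataset): dev-set impressions carry a click-label suffix like `N55689-1`, while the large test set's impressions are bare IDs, which is exactly what makes the list comprehension in evaluate.py fail:

```python
# Made-up news IDs, just to illustrate the two formats.
dev_impression = "N55689-1 N35729-0".split()   # MIND dev set: ID-label pairs
test_impression = "N55689 N35729".split()      # MIND large test set: no labels

# Mirrors the failing list comprehension in evaluate.py.
labels = [int(news.split('-')[1]) for news in dev_impression]
print(labels)  # [1, 0]

try:
    [int(news.split('-')[1]) for news in test_impression]
except IndexError as e:
    print("IndexError:", e)  # "N55689".split('-') has only one element
```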


Izorar commented Mar 16, 2022

@yusanshi Does it mean the labels were not released by the authors of the dataset/task?


Izorar commented Mar 16, 2022

And if it was not released, how do we evaluate? Thanks

@yusanshi
Owner

> Does it mean the labels were not released by the authors of the dataset/task?

Exactly.


> And if it was not released, how do we evaluate?

Please see https://msnews.github.io/ and https://competitions.codalab.org/competitions/24122#participate. Basically, you need to upload the inference results to the online evaluation platform. To generate the results in the required file format, you need to make some changes to the code. An earlier version of the repo has some code that may be helpful:

if generate_txt:
    answer_file.write(
        f"{minibatch['impression_id'][0]} {str(list(value2rank(impression).values())).replace(' ','')}\n"
    )
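For context, `value2rank` is not shown above; it maps each candidate news item's score to its rank (1 = highest score). A minimal sketch of what such a helper could look like — the actual implementation in the old repo version may differ, and the news IDs, scores, and the literal impression ID `42` here are made up:

```python
def value2rank(d):
    # Map each key to the rank of its value (1 = highest score).
    # A sketch of the helper used in the snippet above.
    values = list(d.values())
    ranks = [sorted(values, reverse=True).index(v) + 1 for v in values]
    return {k: rank for k, rank in zip(d.keys(), ranks)}

impression = {'N1': 0.9, 'N2': 0.1, 'N3': 0.5}  # made-up scores per news ID
line = f"42 {str(list(value2rank(impression).values())).replace(' ', '')}"
print(line)  # 42 [1,3,2]
```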


Izorar commented Mar 17, 2022

Thanks. I am going to try that out.


Izorar commented Mar 17, 2022

There is still an issue with the model:

Traceback (most recent call last):
  File "src/evaluate.py", line 289, in <module>
    './data/test/prediction.txt')
  File "/home/izorar/.local/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "src/evaluate.py", line 185, in evaluate
    news_vector = model.get_news_vector(minibatch)
  File "/home/izorar/news-recommendation/src/model/TANR/__init__.py", line 82, in get_news_vector
    return self.news_encoder(news)
  File "/home/izorar/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/izorar/news-recommendation/src/model/TANR/news_encoder.py", line 40, in forward
    title_vector = F.dropout(self.word_embedding(news['title'].to(device)),
AttributeError: 'list' object has no attribute 'to'

@yusanshi
Owner

It looks like you're using an old version of the code? Please use the latest code, i.e., the newest commit on the master branch. Otherwise, I don't know which code you're using, so I can't give any suggestions.

@yusanshi
Owner

AttributeError: 'list' object has no attribute 'to' means that news_encoder.py assumes news['title'] is a torch tensor, not a Python list. The conversion from list to tensor should have been done elsewhere (dataset.py), so one possible reason is a bug in the old version of dataset.py. Please git pull to apply all the changes. If this happens again, we can investigate it further.
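To illustrate the point: a minimal sketch (with made-up title data) of the list-to-tensor conversion that dataset.py is expected to perform before the encoder calls `.to(device)`:

```python
import torch

# Made-up title data: word indices for two news titles.
news = {'title': [[1, 2, 3], [4, 5, 6]]}

# A plain Python list has no .to() method, hence the AttributeError.
assert not hasattr(news['title'], 'to')

# The conversion dataset.py is expected to perform before the encoder runs:
news['title'] = torch.tensor(news['title'])
assert hasattr(news['title'], 'to')  # tensors can be moved to a device
```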


Izorar commented Mar 17, 2022

Thanks @yusanshi. I have done that already. I am working on writing the results to a file as you suggested, following the format specified by the organizers, but I seem to be lost.
The problem is still the labels, precisely this line:

            y_list = [
                int(news[0].split('-')[1]) for news in minibatch['impressions']
            ]

in the evaluate file


yusanshi commented Mar 17, 2022

Since we have no labels, the following code makes no sense:

y_pred_list = list(impression.values())
y_list = [
    int(news[0].split('-')[1]) for news in minibatch['impressions']
]
auc = roc_auc_score(y_list, y_pred_list)
mrr = mrr_score(y_list, y_pred_list)
ndcg5 = ndcg_score(y_list, y_pred_list, 5)
ndcg10 = ndcg_score(y_list, y_pred_list, 10)
aucs.append(auc)
mrrs.append(mrr)
ndcg5s.append(ndcg5)
ndcg10s.append(ndcg10)

Simply removing them should be OK.
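An alternative to removing them outright is to guard the label parsing so the same code runs on both the labeled dev set and the unlabeled test set. A hedged sketch — the helper name is mine, and it assumes each impression entry is a plain `'NewsID-Label'` string on the dev set and a bare `'NewsID'` on the test set (the repo's actual minibatch structure may wrap these differently):

```python
def parse_labels(impressions):
    """Return the click labels if present (dev set), else None (test set).

    A sketch, not the repo's actual code: each entry is assumed to be a
    'NewsID-Label' string on the dev set and a bare 'NewsID' on the test set.
    """
    if all('-' in news for news in impressions):
        return [int(news.split('-')[1]) for news in impressions]
    return None

print(parse_labels(['N55689-1', 'N35729-0']))  # [1, 0]
print(parse_labels(['N55689', 'N35729']))      # None
```

When `parse_labels` returns None, the metric computation (AUC, MRR, nDCG) is skipped and only the prediction file is written.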
