cort-predict-raw runs on python2 but not python3.5 #17

Open
bennytieu opened this issue Apr 27, 2017 · 4 comments
Comments

@bennytieu

bennytieu commented Apr 27, 2017

I was trying to run cort-predict-raw with the following command:

python3.5 /usr/local/bin/cort-predict-raw -in ~/data/pilot_44_docs/*.txt \
    -model models/model-pair-train.obj \
    -extractor cort.coreference.approaches.mention_ranking.extract_substructures \
    -perceptron cort.coreference.approaches.mention_ranking.RankingPerceptron \
    -clusterer cort.coreference.clusterer.all_ante \
    -corenlp ~/systems/stanford/stanford-corenlp-full-2016-10-31

and got the following error message:

Traceback (most recent call last):
  File "/usr/local/bin/cort-predict-raw", line 136, in <module>
    doc.system_mentions = mention_extractor.extract_system_mentions(doc)
  File "/usr/local/lib/python3.5/dist-packages/cort/core/mention_extractor.py", line 36, in extract_system_mentions
    for span in __extract_system_mention_spans(document)]
  File "/usr/local/lib/python3.5/dist-packages/cort/core/mention_extractor.py", line 36, in <listcomp>
    for span in __extract_system_mention_spans(document)]
  File "/usr/local/lib/python3.5/dist-packages/cort/core/mentions.py", line 126, in from_document
    i, sentence_span = document.get_sentence_id_and_span(span)
TypeError: 'NoneType' object is not iterable
2017-04-27 09:17:06,058 WARNING Killing subprocess 14154
2017-04-27 09:17:06,395 INFO Subprocess seems to be stopped, exit code -9

It works without a problem with Python 2, though. I'm running this on Ubuntu 16.04.
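
For readers unfamiliar with this error: the TypeError in the traceback above is what Python raises when code tries to unpack `None` into two names, which suggests that `get_sentence_id_and_span` returned `None` for some span. The sketch below reproduces the pattern with a hypothetical stand-in function; its name, signature, and logic are assumptions for illustration and are not taken from cort's source.

```python
def get_sentence_id_and_span(span, sentence_spans):
    """Hypothetical stand-in: return (index, sentence_span), or None if no match."""
    for i, (start, end) in enumerate(sentence_spans):
        if start <= span[0] and span[1] <= end:
            return i, (start, end)
    return None  # no sentence fully contains the span


sentence_spans = [(0, 5), (6, 8)]

# A span that lies inside one sentence unpacks without trouble.
i, sentence_span = get_sentence_id_and_span((1, 2), sentence_spans)

# A span that fits no sentence yields None, and unpacking None raises TypeError.
try:
    i2, span2 = get_sentence_id_and_span((4, 7), sentence_spans)
    unpack_failed = False
except TypeError:
    unpack_failed = True
```

The exact wording of the TypeError message differs between Python versions, so the sketch only checks the exception type.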

@smartschat
Owner

Can you isolate (and post) the document which causes the error message?

@bennytieu
Author

I have isolated it to this string:

Contact for company: Sven Svensson 212 584 5242
[email protected].

I'm guessing the sequence of numbers is at fault. Single numbers are fine: for example, other documents contain years like 2017 and cause no problems.

This example works:

Contact for company: Sven Svensson 584 5242
[email protected].

@smartschat
Owner

I did some debugging. The first example is tokenized as ['Contact', 'for', 'company', ':', 'Sven', 'Svensson', '212Â\xa0584Â\xa05242', '[email protected]', '.']. I suspect that the TypeError happens because some representation I rely on handles the numbers as individual tokens. I will not be able to fix this right now. Is using Python 2 an option for you?
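
A small sketch of what the token above may be showing, under two assumptions that the thread does not confirm: that the tokenizer joined the three number groups with a non-breaking space (U+00A0), and that some later step splits text on whitespace. The `Â\xa0` pattern is what U+00A0 looks like when its UTF-8 bytes are decoded as Latin-1, and Python 3's `str.split()` treats U+00A0 as whitespace, so one token can turn into three.

```python
# Assumed original token: number groups joined by U+00A0 (non-breaking space).
nbsp_token = "212\u00a0584\u00a05242"

# UTF-8 bytes of U+00A0 are C2 A0; read as Latin-1 they become "Â" plus U+00A0.
mojibake = nbsp_token.encode("utf-8").decode("latin-1")

# In Python 3, str.split() with no arguments splits on Unicode whitespace,
# which includes U+00A0, so the single token falls apart into three pieces.
pieces = mojibake.split()

print(repr(mojibake))
print(pieces)
```

If something like this happens inside the pipeline, the token count seen by one component would no longer match the count seen by another, which would fit the suspicion above that the numbers end up handled as individual tokens. In Python 2, splitting a byte string typically does not treat these bytes as whitespace, which may explain the difference in behaviour.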

@bennytieu
Author

I will try to run on Python 2 in the meantime, or just skip this special case. I'm doing a study on efficiency, so running it with Python 3 would be ideal. Thank you for your quick reply!
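
For anyone wanting to skip the special case on Python 3, one possible approach is to catch the TypeError per document and continue with the rest. This is only a sketch: `process_all` and `process_one` are hypothetical names, and `process_one` stands for whatever per-document pipeline you call; neither is part of cort.

```python
def process_all(paths, process_one):
    """Run process_one on each path; collect results and skipped documents."""
    results = {}
    skipped = []
    for path in paths:
        try:
            results[path] = process_one(path)
        except TypeError as err:
            # Record the failing document instead of aborting the whole run.
            skipped.append((path, str(err)))
    return results, skipped


def fake_process_one(path):
    """Hypothetical stand-in for the real pipeline; fails on one document."""
    if path == "bad.txt":
        raise TypeError("'NoneType' object is not iterable")
    return len(path)


results, skipped = process_all(["a.txt", "bad.txt", "cc.txt"], fake_process_one)
```

Catching only TypeError keeps other failures visible, and the `skipped` list makes it easy to report which documents were left out of a timing study.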
