
Running amidict or ami-dictionary in Windows 10 #62

Open
Priya-Jk-15 opened this issue Jun 16, 2020 · 21 comments

@Priya-Jk-15
Collaborator

I am trying to create a dictionary using amidict commands in Windows 10.

I have installed AMI and verified the installation with ami --help. I have also used getpapers to download papers and ami search to sort the papers against the required dictionaries.

Now I am trying to run amidict to create a new dictionary. I was able to run amidict --help and it listed the subcommands (as per the FAQ: https://github.com/petermr/openVirus/wiki/FAQ ).

But when I gave the command

amidict create --terms thymol menthol --dictionary myterprenes --directory Dictionary --outformats xml,html

as a test of creating a dictionary from the tigr2ess tutorial ( https://github.com/petermr/tigr2ess/blob/master/dictionaries/TUTORIAL.md ), I got the following no directory given output:

Generic values (DictionaryCreationTool)
====================
-v to see generic values
Specific values (DictionaryCreationTool)
====================
java.lang.RuntimeException: no directory given
        at org.contentmine.ami.tools.dictionary.DictionaryCreationTool.createDictionary(DictionaryCreationTool.java:265)
[...]

I have also tried creating a new directory and running the same command as above, but the output was the same no directory given error.

What shall I change in the syntax to create a new dictionary? Kindly guide me.

@petermr
Owner

petermr commented Jun 16, 2020

Thanks
I think this is at least a documentation bug (or worse) and I have to fix it.

@petermr
Owner

petermr commented Jun 16, 2020

Try the following:

  • put the terms in a file (e.g. terpenes.txt), one per line:
menthol
thymol

then run:

amidict -v --dictionary myterpenes --directory junkdir --input junkterms.txt create --informat list --outformats xml,html

and get

Generic values (DictionaryCreationTool)
================================
--testString: null (default)
--wikilinks: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@701fc37a (default)
--datacols: null (default)
--hrefcols: null (default)
--informat: list (matched)
--linkcol: null (default)
--namecol: null (default)
--outformats: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$DictionaryFileFormat;@4148db48 (matched)
--query: 10 (default)
--template: null (default)
--termcol: null (default)
--termfile: null (default)
--terms: null (default)
--wptype: null (default)
--help: false (default)
--version: false (default)

Specific values (DictionaryCreationTool)
================================
--testString: null (default)
--wikilinks: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@701fc37a (default)
--datacols: null (default)
--hrefcols: null (default)
--informat: list (matched)
--linkcol: null (default)
--namecol: null (default)
--outformats: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$DictionaryFileFormat;@4148db48 (matched)
--query: 10 (default)
--template: null (default)
--termcol: null (default)
--termfile: null (default)
--terms: null (default)
--wptype: null (default)
--help: false (default)
--version: false (default)
N 2; T 2
[Fatal Error] :2214:5: The element type "input" must be terminated by the matching end-tag "</input>".
0    [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: menthol; cannot parse/read stream: 
0 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: menthol; cannot parse/read stream: 
[Fatal Error] :1298:5: The element type "input" must be terminated by the matching end-tag "</input>".
187  [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: thymol; cannot parse/read stream: 
187 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: thymol; cannot parse/read stream: 
++>> myterpenes
>> dict myterpenes
writing dictionary to /Users/pm286/projects/junk/junkdir/myterpenes.xml
writing dictionary to /Users/pm286/projects/junk/junkdir/myterpenes.html

This creates the dictionaries (actually we can probably drop the html)

The Wikipedia errors mean that the format of the Wikipedia page has changed and we need to change the code. This is tedious and common with remote sites.
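The failure mode can be reproduced offline: Wikipedia serves HTML5, where void elements such as `<input>` take no closing tag, so a strict XML parser rejects the page with exactly the "must be terminated by the matching end-tag" error in the logs above, while a lenient HTML parser accepts it. A minimal sketch (the snippet and class name are mine, not from the ami code):

```python
# Reproduce the parse failure: an HTML5 void element breaks strict XML parsing.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment of a Wikipedia page: <input> has no end tag in HTML5.
snippet = '<div><input type="hidden" name="q"><span>menthol</span></div>'

# Strict XML parsing fails on the unclosed <input>.
try:
    ET.fromstring(snippet)
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print("well-formed XML?", xml_ok)  # well-formed XML? False

# A lenient HTML parser accepts the same markup without complaint.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(snippet)
print(collector.tags)  # ['div', 'input', 'span']
```

This suggests the fix is to parse the downloaded pages with an HTML-aware parser (or tidy them to XHTML) before handing them to the XML layer.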

@Priya-Jk-15
Collaborator Author

Where should I open that file to put the terms? Should I open it in openVirus/Dictionaries?

@petermr
Owner

petermr commented Jun 16, 2020

Kareena has already correctly created
https://github.com/petermr/openVirus/blob/master/dictionaries/virus

I think you will need
https://github.com/petermr/openVirus/blob/master/dictionaries/disease

We should probably have a 6th directory:
https://github.com/petermr/openVirus/blob/master/dictionaries/test

where anyone can create and test small dictionaries

@AmbrineH
Collaborator

While running the command as per your suggestion @petermr, the dictionary is created, but there are multiple errors and some values are left out: e.g. in my case there were 180 input countries, while the output .xml file had only 117 entries.

The errors look something like this:

Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
[Fatal Error] :1825:5: The element type "input" must be terminated by the matching end-tag "</input>".
1142608 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: yuan dynasty; cannot parse/read stream:
1142608 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: yuan dynasty; cannot parse/read stream:
[Fatal Error] :186:5: The element type "input" must be terminated by the matching end-tag "</input>".
<186/5>badline >                </div>
                </div>
Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
[Fatal Error] :1712:5: The element type "input" must be terminated by the matching end-tag "</input>".
1147681 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zambia; cannot parse/read stream:
1147681 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zambia; cannot parse/read stream:
[Fatal Error] :186:5: The element type "input" must be terminated by the matching end-tag "</input>".
<186/5>badline >                </div>
                </div>
Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
[Fatal Error] :2431:5: The element type "input" must be terminated by the matching end-tag "</input>".
1153027 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zimbabwe; cannot parse/read stream:
1153027 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zimbabwe; cannot parse/read stream:
[Fatal Error] :186:5: The element type "input" must be terminated by the matching end-tag "</input>".
<186/5>badline >                </div>
                </div>
Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>> country
>> dict country
writing dictionary to C:\Users\eless\country\country.xml
writing dictionary to C:\Users\eless\country\country.html
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Missing wikipedia:

county of nassau; kingdom of aksum; kingdom of the netherlands; nassau-hadamar; northern mariana islands; polska ludowa; principality of iberia; q11908127;
q21076477; q30904761; rattanakosin kingdom; sahrawi arab democratic republic; saint lucia; sovereign military order of malta; são tomé and príncipe; the gambia;

Missing wikidata:

@richardofsussex
Collaborator

Hi, these sound like simple XML parsing errors, suggesting that your input is not well-formed XML. Use something like https://www.xmlvalidation.com/, or post it here for me to check.
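For a quick local check, a few lines of Python do the same job as an online validator (a sketch, assuming you have the suspect markup as a string or file; the function name is mine):

```python
# Offline well-formedness check, an alternative to https://www.xmlvalidation.com/.
import xml.etree.ElementTree as ET

def check_well_formed(text: str):
    """Return (True, None) if `text` parses as XML, else (False, reason)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

# A well-formed fragment passes; an HTML-style unclosed <input> fails.
print(check_well_formed("<dictionary><entry term='thymol'/></dictionary>"))
print(check_well_formed("<div><input type='hidden'></div>"))
```

The second call reports the same "mismatched tag" class of error that appears in the ami logs above.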

@AmbrineH
Collaborator

The input was provided as a .txt file, as indicated earlier in this thread. Do I need to convert it to XML first?

@petermr
Owner

petermr commented Jun 17, 2020 via email

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

@petermr where are you looking up these terms? Can't we access a data source that is well-formed XML, e.g. by using an Accept header?

@petermr
Owner

petermr commented Jun 17, 2020 via email

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

Any scope for using Wikipedia's Linked Data twin dbpedia? (Not sure from the above exactly what you are searching for.)

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

Then why are you searching in Wikipedia?

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

OK, well the advantage of dbpedia is that you can put SPARQL queries to it, and get back machine-processible responses. Of course, it may not have the content you are searching for: it's just the content of the info boxes. If you give me a specific query requirement, I can look into how well dbpedia might be able to answer it.
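As an illustration of the suggestion above, a term lookup against DBpedia's public SPARQL endpoint might look like this (a sketch: the endpoint URL is DBpedia's real one, but the query shape and helper names are my assumptions about what the dictionary lookup would need):

```python
# Hypothetical DBpedia SPARQL lookup: label -> resource URIs.
import json
import urllib.parse
import urllib.request

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

def build_label_query(term: str, lang: str = "en") -> str:
    """SPARQL query for resources whose rdfs:label matches `term`."""
    escaped = term.replace('"', '\\"')
    return (
        "SELECT ?resource WHERE { "
        f'?resource rdfs:label "{escaped}"@{lang} . '
        "} LIMIT 5"
    )

def query_dbpedia(term: str):
    """Run the query against the live endpoint (requires network access)."""
    params = urllib.parse.urlencode(
        {"query": build_label_query(term), "format": "application/json"}
    )
    with urllib.request.urlopen(f"{DBPEDIA_SPARQL}?{params}") as resp:
        return json.load(resp)["results"]["bindings"]

if __name__ == "__main__":
    print(build_label_query("Thymol"))
```

The point is that the response is guaranteed JSON (or XML, if requested), so none of the end-tag parsing errors above can occur.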

@petermr
Owner

petermr commented Jun 18, 2020 via email

@richardofsussex
Collaborator

Wikidata is a separate exercise (with its own problems of data consistency). My point is that dbpedia is in lockstep with (= is automatically extracted from) Wikipedia, so if you're searching for a Wikipedia page with a title which matches your dictionary term, you could just as well be searching for the equivalent 'page' in dbpedia. By doing that, you can establish whether the page exists (and therefore whether the corresponding Wikipedia page exists) and get back a reliable machine-processible response. Which brings me back to my original question: what information is there in the corresponding Wikipedia page which we need, and which we can't get from dbpedia?
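The existence check described above can be sketched in a few lines (the URI pattern follows DBpedia's real convention of Wikipedia-style titles; the helper names are mine, and the probe itself needs network access):

```python
# Hypothetical existence check: map a term to its DBpedia resource URI and probe it.
import urllib.error
import urllib.parse
import urllib.request

def dbpedia_resource_uri(term: str) -> str:
    """DBpedia resource URIs follow the Wikipedia title convention:
    first letter capitalised, spaces as underscores."""
    title = term.strip().replace(" ", "_")
    title = title[:1].upper() + title[1:]
    return "https://dbpedia.org/resource/" + urllib.parse.quote(title)

def page_exists(term: str) -> bool:
    """Probe the resource URI; a 200 response suggests the resource,
    and hence the matching Wikipedia article, exists."""
    req = urllib.request.Request(dbpedia_resource_uri(term), method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

print(dbpedia_resource_uri("yuan dynasty"))  # https://dbpedia.org/resource/Yuan_dynasty
```

This would flag terms like "yuan dynasty" or "zambia" from the failing run above without ever parsing Wikipedia HTML.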

@petermr
Owner

petermr commented Jun 19, 2020 via email

@richardofsussex
Collaborator

Peter, please re-read my comments, noting that I am simply trying to address the problems reported above that Wikipedia responses are not processible. I am not suggesting replacing WD by DBP!
