
Running amidict or ami-dictionary in Windows 10 #62

Open
Priya-Jk-15 opened this issue Jun 16, 2020 · 21 comments

@Priya-Jk-15
Collaborator

I am trying to create a dictionary using amidict commands in Windows 10.

I have installed AMI and verified the installation with ami --help. I have also used getpapers to download papers and ami search to sort the papers against the required dictionaries.

Now I am trying to run amidict to create a new dictionary. I was able to run amidict --help and it listed the subcommands (as per the FAQ: https://github.com/petermr/openVirus/wiki/FAQ ).

But when I gave the command

amidict create --terms thymol menthol --dictionary myterprenes --directory Dictionary --outformats xml,html

as a test of creating a dictionary from the tigr2ess tutorial ( https://github.com/petermr/tigr2ess/blob/master/dictionaries/TUTORIAL.md ), I got the following no directory given output:

Generic values (DictionaryCreationTool)
====================
-v to see generic values
Specific values (DictionaryCreationTool)
====================
java.lang.RuntimeException: no directory given
        at org.contentmine.ami.tools.dictionary.DictionaryCreationTool.createDictionary(DictionaryCreationTool.java:265)
[...]

I have also tried creating a new directory and running the same command as above, but the output was the same no directory given error.

What shall I change in the syntax to create a new dictionary? Kindly guide me.

@petermr
Owner

petermr commented Jun 16, 2020

Thanks
I think this is at least a documentation bug (or worse) and I have to fix it.

@petermr
Owner

petermr commented Jun 16, 2020

Try the following:

  • put the terms in a file (e.g. terpenes.txt), one per line:
menthol
thymol

then run:

amidict -v --dictionary myterpenes --directory junkdir --input junkterms.txt create --informat list --outformats xml,html

and get

Generic values (DictionaryCreationTool)
================================
--testString: null (default)
--wikilinks: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@701fc37a (default)
--datacols: null (default)
--hrefcols: null (default)
--informat: list (matched)
--linkcol: null (default)
--namecol: null (default)
--outformats: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$DictionaryFileFormat;@4148db48 (matched)
--query: 10 (default)
--template: null (default)
--termcol: null (default)
--termfile: null (default)
--terms: null (default)
--wptype: null (default)
--help: false (default)
--version: false (default)

Specific values (DictionaryCreationTool)
================================
--testString: null (default)
--wikilinks: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@701fc37a (default)
--datacols: null (default)
--hrefcols: null (default)
--informat: list (matched)
--linkcol: null (default)
--namecol: null (default)
--outformats: [Lorg.contentmine.ami.tools.AbstractAMIDictTool$DictionaryFileFormat;@4148db48 (matched)
--query: 10 (default)
--template: null (default)
--termcol: null (default)
--termfile: null (default)
--terms: null (default)
--wptype: null (default)
--help: false (default)
--version: false (default)
N 2; T 2
[Fatal Error] :2214:5: The element type "input" must be terminated by the matching end-tag "</input>".
0    [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: menthol; cannot parse/read stream: 
0 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: menthol; cannot parse/read stream: 
[Fatal Error] :1298:5: The element type "input" must be terminated by the matching end-tag "</input>".
187  [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: thymol; cannot parse/read stream: 
187 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: thymol; cannot parse/read stream: 
++>> myterpenes
>> dict myterpenes
writing dictionary to /Users/pm286/projects/junk/junkdir/myterpenes.xml
writing dictionary to /Users/pm286/projects/junk/junkdir/myterpenes.html

This creates the dictionaries (actually we can probably drop the html)

The Wikipedia errors mean that the format of the Wikipedia page has changed and we need to change the code. This is tedious and common with remote sites.
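The failure mode can be reproduced offline: Wikipedia serves HTML5, where void elements such as `<input>` take no closing tag, so a strict XML parser rejects the page with exactly the "must be terminated by the matching end-tag" error in the logs above, while a lenient HTML parser accepts it. A minimal sketch (the snippet and class name are mine, not from the ami code):

```python
# Reproduce the parse failure: an HTML5 void element breaks strict XML parsing.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment of a Wikipedia page: <input> has no end tag in HTML5.
snippet = '<div><input type="hidden" name="q"><span>menthol</span></div>'

# Strict XML parsing fails on the unclosed <input>.
try:
    ET.fromstring(snippet)
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print("well-formed XML?", xml_ok)  # well-formed XML? False

# A lenient HTML parser accepts the same markup without complaint.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(snippet)
print(collector.tags)  # ['div', 'input', 'span']
```

This suggests the fix is to parse the downloaded pages with an HTML-aware parser (or tidy them to XHTML) before handing them to the XML layer.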

@Priya-Jk-15
Collaborator Author

Where should I open that file to put the terms? Should I open it in openVirus/Dictionaries?

@petermr
Owner

petermr commented Jun 16, 2020

Kareena has already correctly created
https://github.com/petermr/openVirus/blob/master/dictionaries/virus

I think you will need
https://github.com/petermr/openVirus/blob/master/dictionaries/disease

We should probably have a 6th directory:
https://github.com/petermr/openVirus/blob/master/dictionaries/test

where anyone can create and test small dictionaries

@AmbrineH
Collaborator

While running the command as per your suggestion @petermr, the dictionary is created, but there are multiple errors and some values are left out: e.g. in my case there were 180 input countries, while the output .xml file had only 117 entries.

The errors look something like this:

Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
[Fatal Error] :1825:5: The element type "input" must be terminated by the matching end-tag "</input>".
1142608 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: yuan dynasty; cannot parse/read stream:
1142608 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: yuan dynasty; cannot parse/read stream:
[Fatal Error] :186:5: The element type "input" must be terminated by the matching end-tag "</input>".
<186/5>badline >                </div>
                </div>
Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
[Fatal Error] :1712:5: The element type "input" must be terminated by the matching end-tag "</input>".
1147681 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zambia; cannot parse/read stream:
1147681 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zambia; cannot parse/read stream:
[Fatal Error] :186:5: The element type "input" must be terminated by the matching end-tag "</input>".
<186/5>badline >                </div>
                </div>
Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
[Fatal Error] :2431:5: The element type "input" must be terminated by the matching end-tag "</input>".
1153027 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zimbabwe; cannot parse/read stream:
1153027 [main] ERROR org.contentmine.ami.tools.AbstractAMIDictTool  - cannot parse wikipedia page for: zimbabwe; cannot parse/read stream:
[Fatal Error] :186:5: The element type "input" must be terminated by the matching end-tag "</input>".
<186/5>badline >                </div>
                </div>
Cannot add entry: nu.xom.ParsingException: The element type "input" must be terminated by the matching end-tag "</input>". at line 186, column 5
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>> country
>> dict country
writing dictionary to C:\Users\eless\country\country.xml
writing dictionary to C:\Users\eless\country\country.html
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Missing wikipedia:

county of nassau; kingdom of aksum; kingdom of the netherlands; nassau-hadamar; northern mariana islands; polska ludowa; principality of iberia; q11908127;
q21076477; q30904761; rattanakosin kingdom; sahrawi arab democratic republic; saint lucia; sovereign military order of malta; são tomé and príncipe; the gambia;

Missing wikidata:

@richardofsussex
Collaborator

Hi, these sound like simple XML parsing errors, suggesting that your input is not well-formed XML. Use something like https://www.xmlvalidation.com/, or post it here for me to check.
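For a quick local check, a few lines of Python do the same job as an online validator (a sketch, assuming you have the suspect markup as a string or file; the function name is mine):

```python
# Offline well-formedness check, an alternative to https://www.xmlvalidation.com/.
import xml.etree.ElementTree as ET

def check_well_formed(text: str):
    """Return (True, None) if `text` parses as XML, else (False, reason)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

# A well-formed fragment passes; an HTML-style unclosed <input> fails.
print(check_well_formed("<dictionary><entry term='thymol'/></dictionary>"))
print(check_well_formed("<div><input type='hidden'></div>"))
```

The second call reports the same "mismatched tag" class of error that appears in the ami logs above.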

@AmbrineH
Collaborator

The input was provided as a .txt file, as indicated earlier in this thread. Do I need to convert it to XML first?

@petermr
Owner

petermr commented Jun 17, 2020 via email

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

@petermr where are you looking up these terms? Can't we access a data source that is well-formed XML, e.g. by using an Accept header?

@petermr
Owner

petermr commented Jun 17, 2020 via email

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

Any scope for using Wikipedia's Linked Data twin dbpedia? (Not sure from the above exactly what you are searching for.)

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

Then why are you searching in Wikipedia?

@petermr
Owner

petermr commented Jun 17, 2020 via email

@richardofsussex
Collaborator

OK, well the advantage of dbpedia is that you can put SPARQL queries to it, and get back machine-processible responses. Of course, it may not have the content you are searching for: it's just the content of the info boxes. If you give me a specific query requirement, I can look into how well dbpedia might be able to answer it.
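As an illustration of the suggestion above, a term lookup against DBpedia's public SPARQL endpoint might look like this (a sketch: the endpoint URL is DBpedia's real one, but the query shape and helper names are my assumptions about what the dictionary lookup would need):

```python
# Hypothetical DBpedia SPARQL lookup: label -> resource URIs.
import json
import urllib.parse
import urllib.request

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

def build_label_query(term: str, lang: str = "en") -> str:
    """SPARQL query for resources whose rdfs:label matches `term`."""
    escaped = term.replace('"', '\\"')
    return (
        "SELECT ?resource WHERE { "
        f'?resource rdfs:label "{escaped}"@{lang} . '
        "} LIMIT 5"
    )

def query_dbpedia(term: str):
    """Run the query against the live endpoint (requires network access)."""
    params = urllib.parse.urlencode(
        {"query": build_label_query(term), "format": "application/json"}
    )
    with urllib.request.urlopen(f"{DBPEDIA_SPARQL}?{params}") as resp:
        return json.load(resp)["results"]["bindings"]

if __name__ == "__main__":
    print(build_label_query("Thymol"))
```

The point is that the response is guaranteed JSON (or XML, if requested), so none of the end-tag parsing errors above can occur.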

@petermr
Owner

petermr commented Jun 18, 2020 via email

@richardofsussex
Collaborator

Wikidata is a separate exercise (with its own problems of data consistency). My point is that dbpedia is in lockstep with (= is automatically extracted from) Wikipedia, so if you're searching for a Wikipedia page with a title which matches your dictionary term, you could just as well be searching for the equivalent 'page' in dbpedia. By doing that, you can establish whether the page exists (and therefore whether the corresponding Wikipedia page exists) and get back a reliable machine-processible response. Which brings me back to my original question: what information is there in the corresponding Wikipedia page which we need, and which we can't get from dbpedia?
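The existence check described above can be sketched in a few lines (the URI pattern follows DBpedia's real convention of Wikipedia-style titles; the helper names are mine, and the probe itself needs network access):

```python
# Hypothetical existence check: map a term to its DBpedia resource URI and probe it.
import urllib.error
import urllib.parse
import urllib.request

def dbpedia_resource_uri(term: str) -> str:
    """DBpedia resource URIs follow the Wikipedia title convention:
    first letter capitalised, spaces as underscores."""
    title = term.strip().replace(" ", "_")
    title = title[:1].upper() + title[1:]
    return "https://dbpedia.org/resource/" + urllib.parse.quote(title)

def page_exists(term: str) -> bool:
    """Probe the resource URI; a 200 response suggests the resource,
    and hence the matching Wikipedia article, exists."""
    req = urllib.request.Request(dbpedia_resource_uri(term), method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

print(dbpedia_resource_uri("yuan dynasty"))  # https://dbpedia.org/resource/Yuan_dynasty
```

This would flag terms like "yuan dynasty" or "zambia" from the failing run above without ever parsing Wikipedia HTML.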

@petermr
Owner

petermr commented Jun 19, 2020 via email

@richardofsussex
Collaborator

Peter, please re-read my comments, noting that I am simply trying to address the problems reported above that Wikipedia responses are not processible. I am not suggesting replacing WD by DBP!
