ApertiumRDF

Datasets and results used when writing "Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries" (paper presented at LREC 2016).

New-data

This directory contains the 'cycle computation' data and scripts for all English nouns in the Apertium RDF data:

EN-nouns.txt: 15,630 English nouns taken from the Apertium RDF data. (multiwords removed)

EN-dict.txt: 15,630 English nouns + their context (set of translation pairs).

Targets-EN.txt: 24,356 potential targets generated by cycle computation, plus some figures (see the format description below).

getData.py: Python script used to get data from the Apertium RDF SPARQL server (http://linguistic.linkeddata.es/sparql).

calculateCycles.py: Python script used for cycle calculation (used to generate Targets-EN.txt)

ApertiumRDF-GraphContexts.ipynb: Jupyter notebook analysing the context graphs.

ApertiumRDF-PotentialTargtes.ipynb: Jupyter notebook analysing the generated potential targets.
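To make the idea of cycle computation concrete, here is a minimal sketch and not the code in calculateCycles.py. It assumes translation pairs form an undirected graph, enumerates the simple cycles through a source word, and reports as potential targets the nodes that occur in those cycles but are not directly linked to the source. The function names, the `max_len` cut-off, and the toy translation pairs are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph from translation pairs (assumed symmetric)."""
    graph: Dict[str, Set[str]] = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def cycles_through(graph: Dict[str, Set[str]], source: str,
                   max_len: int = 6) -> List[List[str]]:
    """Enumerate simple cycles that start and end at `source`.

    `max_len` (a hypothetical cut-off) bounds the number of distinct nodes.
    Each cycle is found once per direction of traversal.
    """
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        for nxt in sorted(graph[node]):
            if nxt == source and len(path) >= 3:
                found.append(path + [source])
            elif nxt not in path and len(path) < max_len:
                walk(nxt, path + [nxt])

    walk(source, [source])
    return found


def potential_targets(graph: Dict[str, Set[str]], source: str,
                      max_len: int = 6) -> Set[str]:
    """Nodes that share a cycle with `source` but are not linked to it."""
    in_cycles = {n for cycle in cycles_through(graph, source, max_len)
                 for n in cycle}
    return in_cycles - graph[source] - {source}


# Toy data, invented for illustration only (not taken from Apertium RDF).
toy_pairs = [
    ("house@en", "casa@es"),
    ("casa@es", "casa@ca"),
    ("casa@ca", "maison@fr"),
    ("maison@fr", "house@en"),
]
toy_graph = build_graph(toy_pairs)
toy_cycles = cycles_through(toy_graph, "house@en")
toy_targets = potential_targets(toy_graph, "house@en")
```

In this toy graph the only node that shares a cycle with the source without being linked to it is the Catalan entry, so it is the single potential target.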

Targets-EN.txt format

Targets-EN.txt is a CSV file with the following columns:

Word: the source English word.
Cycles: the number of cycles containing source & target words.
Uniq Cycles: number of 'unique' cycles with source & target words (abcda = acdba).
Nodes: number of nodes in the Word's graph (the local context for Word).
Edges: number of edges in the Word's graph (the local context for Word).
Known Targets: number of already known targets for Word in the Apertium data.
Potential Targets: number of potential targets for Word (nodes in cycles not linked to Word).
Graph Density: graph density (density of the context).
Potential Target: the potential target.
Lan: indicates whether there is another Target word in the same language.
Score: the cycle's density.
InC: the number of cycles the Target word occurs in with the same score.
length: the length of the cycle.
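The density and 'unique cycle' figures above can be illustrated with a short sketch. It rests on three assumptions that the README does not confirm: that density is the standard undirected graph density 2E / (N(N - 1)), that Score is the density of the subgraph induced by the nodes of a cycle, and that two cycles count as the same 'unique' cycle when they visit the same set of nodes (which is how the example abcda = acdba reads). All function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def graph_density(n_nodes: int, n_edges: int) -> float:
    """Density of an undirected simple graph: 2E / (N (N - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


def unique_cycles(cycles: Iterable[Iterable[str]]) -> Set[frozenset]:
    """Collapse cycles that visit the same set of nodes (assumed criterion)."""
    return {frozenset(cycle) for cycle in cycles}


def cycle_density(graph: Dict[str, Set[str]], cycle: Iterable[str]) -> float:
    """Density of the subgraph induced by the nodes of `cycle` (assumed Score)."""
    nodes = set(cycle)
    # Every undirected edge is seen from both endpoints, hence the division.
    edges = sum(1 for a in nodes for b in graph[a] if b in nodes) // 2
    return graph_density(len(nodes), edges)


# A four-node square: edges a-b, b-c, c-d, d-a (toy data).
square = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "a"},
}

square_density = graph_density(4, 4)
square_score = cycle_density(square, "abcda")
collapsed = unique_cycles(["abcda", "acdba"])
```

Under these assumptions, a cycle whose nodes are also connected by extra 'shortcut' edges scores higher than a bare ring, up to 1.0 for a fully connected set of nodes.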

How to generate the data

$ python getData.py en EN-nouns.txt > EN-dict.txt (*)

$ python calculateCycles.py EN-dict.txt en v > Targets-EN.txt

(*) getData.py generates one dict for each input word. You need to 'join' all the dicts into a single one before running the calculateCycles.py script.
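The 'join' step could look roughly like the sketch below. The output format of getData.py is not documented here, so this assumes, purely for illustration, that each non-empty line is a Python dict literal keyed by the source word. The `join_dicts` helper and the sample lines are hypothetical and should be adapted to the real output.

```python
import ast
from typing import Any, Dict, Iterable


def join_dicts(lines: Iterable[str]) -> Dict[str, Any]:
    """Merge one-dict-per-line text into a single dict.

    Assumption: every non-empty line is a valid Python dict literal.
    Later entries overwrite earlier ones that share a key.
    """
    merged: Dict[str, Any] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        merged.update(ast.literal_eval(line))
    return merged


# Invented sample lines standing in for the per-word output.
sample_lines = [
    "{'house': [('house@en', 'casa@es')]}",
    "",
    "{'dog': [('dog@en', 'perro@es')]}",
]
joined = join_dicts(sample_lines)
```

Using `ast.literal_eval` instead of `eval` keeps the step safe on untrusted text, since it only accepts literals.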
