Evaluation based on NER #3

Open
alexanderpanchenko opened this issue Nov 30, 2015 · 0 comments

Motivation

A preliminary evaluation of clustering quality based on a named entity recognition task. Deadline: 16 December.

Implementation

  1. Manually select from the results the clusters that correspond to:

    • names
    • surnames
    • cities
    • countries
    • names of programming languages and technologies, e.g. "javascript"
    • companies
    • fruits and vegetables

    To select clusters, look for keywords that are unambiguous, e.g. Pepsi, Javascript, or Robert.

  2. Create an ElasticSearch index containing all of these clusters. Add the corresponding category to each cluster as an attribute. Each category can have several attributes.

  3. Download the texts (the XML files reuters.xml and 500news.xml) from https://github.com/AKSW/n3-collection

  4. Parse the XML files to extract the plain text.

  5. For each word in the text, retrieve from ElasticSearch the clusters it belongs to and assign the corresponding category to the word. Example of the output format:

    Darmstadt  CITY  NamedEntityInText
    is 
    a 
    nice 
    city.  CITY  NamedEntityInText
    John  NAME  NamedEntityInText
    Smith  SURNAME  NamedEntityInText
    is 
    a 
    well-known
    lawyer.
    
  6. For each occurrence of a tag in the text, manually judge whether it is correct, then compute precision as the number of correct tags divided by the number of all assigned tags.
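
Steps 4 to 6 could be sketched roughly as follows. This is only an illustration under several assumptions: `CLUSTER_INDEX` is a hypothetical in-memory stand-in for the ElasticSearch index from step 2, the sample XML layout is made up (the real reuters.xml and 500news.xml may be structured differently), and the function names are invented for this example. The correctness judgements passed to `precision` would come from the manual review.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the ElasticSearch index: word -> set of categories.
CLUSTER_INDEX = {
    "darmstadt": {"CITY"},
    "john": {"NAME"},
    "smith": {"SURNAME"},
}


def plain_text(xml_string):
    """Step 4: concatenate all text nodes of an XML document into plain text."""
    root = ET.fromstring(xml_string)
    # Join every text fragment, then normalise runs of whitespace.
    return " ".join(" ".join(root.itertext()).split())


def tag_words(text, index):
    """Step 5: look up each whitespace-separated word and attach its categories."""
    tagged = []
    for word in text.split():
        # Strip surrounding punctuation and lowercase before the lookup.
        key = word.strip(".,;:!?\"'()").lower()
        tagged.append((word, sorted(index.get(key, ()))))
    return tagged


def precision(judgements):
    """Step 6: correct tags divided by all assigned tags.

    `judgements` holds one boolean per assigned tag (True = judged correct).
    """
    if not judgements:
        return 0.0
    return sum(judgements) / len(judgements)


# Made-up sample document; not the actual layout of the n3-collection files.
SAMPLE_XML = (
    "<doc><text>Darmstadt is a nice city.</text>"
    "<text>John Smith is well-known.</text></doc>"
)

tagged = tag_words(plain_text(SAMPLE_XML), CLUSTER_INDEX)
for word, categories in tagged:
    # One word per line, followed by its categories (if any).
    print("  ".join([word] + categories))
```

In a real run, the dictionary lookup in `tag_words` would be replaced by a query against the ElasticSearch index, and a word belonging to clusters of several categories would receive all of them.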
