Learning semantic relations with distributional similarity

tudarmstadt-lt/sensim

 
 

sensim

Learning semantic relations with distributional similarity.

Contents

  • A natural language processing pipeline based on DKPro Core which utilizes Pig for local or Hadoop-based execution.

  • Annotation: segmentation, part-of-speech tagging, lemmatization, and dependency parsing based on Stanford NLP. Any of these components can be replaced with alternative implementations and models, e.g., the Stanford parser can be swapped for the Berkeley parser, or a PCFG model for an RNN model. This list provides an overview of available models.

  • Feature extraction: all subtrees of a dependency parse that involve two tokens of a specific type are extracted as features. The token type is specified generically: implemented options are common nouns, proper nouns, and named entities, but other token types can be added easily. Features are weighted using Lexicographer's Mutual Information (LMI)[1].

  • Classification with logistic regression as implemented in scikit-learn; see simsets for details.

  • Clustering with Chinese Whispers. Extrinsic cluster evaluation with various measures; see clustering_utils and evaluate_cw_clustering.

  • Some ROOT[2] code to plot histograms of many samples and/or dimensions.

  • Evaluation for both classification and clustering is done using the BLESS data set.
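The feature weighting step can be illustrated with a minimal sketch. It assumes the common definition of Lexicographer's Mutual Information, LMI(i, f) = n(i, f) · log2(n(i, f) · N / (n(i) · n(f))), where n(i, f) is the co-occurrence count of an item and a feature and N is the total number of observations. The function name `lmi_scores`, the toy token pairs, and the feature strings are hypothetical and not taken from this repository; the pipeline's actual implementation may differ in details such as the log base or how counts are collected.

```python
import math
from collections import Counter


def lmi_scores(observations):
    """Return {(item, feature): LMI} for an iterable of (item, feature) pairs.

    Assumes LMI = n(i, f) * log2(n(i, f) * N / (n(i) * n(f))), i.e. pointwise
    mutual information scaled by the joint count. This is a sketch, not the
    repository's code.
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    item_counts = Counter(item for item, _ in observations)
    feature_counts = Counter(feature for _, feature in observations)
    total = len(observations)

    scores = {}
    for (item, feature), n_if in pair_counts.items():
        pmi = math.log2(n_if * total / (item_counts[item] * feature_counts[feature]))
        scores[(item, feature)] = n_if * pmi
    return scores


# Hypothetical toy data: each item is a token pair, each feature a
# string standing in for a dependency subtree that connects the two tokens.
observations = [
    (("dog", "animal"), "X <-nsubj- is -attr-> Y"),
    (("dog", "animal"), "X <-nsubj- is -attr-> Y"),
    (("dog", "animal"), "X <-conj- Y"),
    (("cat", "dog"), "X <-conj- Y"),
]

scores = lmi_scores(observations)
for key in sorted(scores, key=scores.get, reverse=True):
    print(key, round(scores[key], 3))
```

Scaling the pointwise mutual information by the joint count reduces the tendency of plain PMI to overrate rare item-feature combinations, which is why LMI is often preferred for ranking features.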

Run it on a Hadoop cluster in MapReduce mode

cd sensim
mvn package -Dmaven.test.skip=true -Phadoop-job
cd src/main/pig/
pig -P <propertyfile> -m <parameterfile> pipeline.pig &> <logfile>

Run locally using Pig directly

# As above, but replace the last line with:
pig -x local -P properties -m parameters pipeline.pig

Run locally using a JUnit test

Import this Maven project into the IDE of your choice and run the method testCoreNLPAnnotator() in CoreNLPAnnotatorTest.java.

References

[1] http://wortschatz.uni-leipzig.de/~sbordag/papers/BordagMC08.pdf
[2] http://root.cern.ch
