diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 0000000..102df24
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,17 @@
+cff-version: 1.2.0
+title: emlangkit
+message: "If you use this software, please cite it as below."
+type: software
+authors:
+  - given-names: Olaf
+    family-names: Lipinski
+    email: o.lipinski@soton.ac.uk
+    affiliation: University of Southampton
+    orcid: 'https://orcid.org/0000-0002-2023-7617'
+url: 'https://github.com/olipinski/emlangkit'
+keywords:
+  - artificial intelligence
+  - reinforcement learning
+  - emergent communication
+license: MIT
+version: 0.0.1
diff --git a/README.md b/README.md
index d827264..26f5c01 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,87 @@
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
 # Emergent Language Analysis Toolkit
+
+This toolkit aims to collect all metrics currently used in emergent communication research into one place.
+The usage should be convenient and the inputs should be standardised, to ease adoption and spread of these metrics.
+
+## Installation
+
+To install emlangkit, run `pip install emlangkit`.
+
+Automatic tests are run for Python 3.9, 3.10, 3.11.
+
+## Usage
+
+All metrics are available through the `Language` class in `emlangkit.Language`.
+This class accepts two numpy arrays as inputs - messages and observations.
+These are then used, with some possible speedups, to calculate any requested metric,
+as shown in the example below:
+
+```python
+import numpy as np
+from emlangkit import Language
+
+messages = np.array(
+    [[1, 2, 0, 3, 4], [1, 2, 2, 3, 4], [1, 2, 2, 3, 0], [1, 0, 0, 1, 2]]
+)
+observations = np.array([[4, 4], [4, 3], [3, 2], [1, 4]])
+
+lang = Language(messages=messages, observations=observations)
+
+score, p_value = lang.topsim()
+
+# Mutual information already requires both language and observation entropy
+mi = lang.mutual_information()
+
+# So this call uses less computation
+lang_entropy = lang.language_entropy()
+```
+
+## Metrics
+
+The currently available metrics and the modules that implement them are listed below.
+
+- Entropy (`emlangkit.metrics.entropy`)
+- Mutual Information (`emlangkit.metrics.mutual_information`)
+- Topographic Similarity (`emlangkit.metrics.topsim`)
+- Positional Disentanglement (`emlangkit.metrics.posdis`)
+- Bag-of-Words Disentanglement (`emlangkit.metrics.bosdis`)
+
+## Contributing
+
+All pull requests are welcome! Please make sure to install pre-commit and
+run the tests with pytest before submitting a PR. Additionally, if a lot of new code is
+added, please also add the relevant tests.
+
+## Related Libraries
+
+This is a non-exhaustive list of libraries related to EC research. Please feel
+free to open a PR to add to it!
+
+- EGG - https://github.com/facebookresearch/EGG
+- Harris' Articulation Scheme - https://github.com/wedddy0707/HarrisSegmentation
+- CGI -
+  https://github.com/wedddy0707/categorial_grammar_induction_of_emergent_language
+
+## Sources
+
+Most of the base metrics are inspired by or taken from either
+[EGG](https://github.com/facebookresearch/EGG), or code from the paper
+"Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of
+Compositional Communication"
+[here](https://proceedings.neurips.cc/paper/2021/hash/c2839bed26321da8b466c80a032e4714-Abstract.html).
+
+## Citation
+
+If you find emlangkit useful in your work, please cite it as below:
+
+```
+@software{lipinski_emlangkit_2023,
+  title = {emlangkit: Emergent Language Analysis Toolkit},
+  url = {https://github.com/olipinski/emlangkit},
+  author = {Lipinski, Olaf},
+  year = {2023}
+}
+```
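
The README's usage example notes that mutual information already requires both the language and the observation entropy. The sketch below illustrates why, using the textbook identity I(M; O) = H(M) + H(O) - H(M, O) on the same example data. It is a standard-library illustration only and is not emlangkit's implementation; the function names `entropy_bits` and `mutual_information_bits` are hypothetical, and treating each whole row as a single symbol is an assumption made for simplicity.

```python
import math
from collections import Counter


def entropy_bits(rows):
    """Shannon entropy (in bits) of the empirical distribution over whole rows.

    Hypothetical helper: each row (a message or an observation) is treated
    as one discrete symbol.
    """
    counts = Counter(tuple(row) for row in rows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information_bits(messages, observations):
    """Estimate I(M; O) = H(M) + H(O) - H(M, O) from paired rows."""
    # The joint variable is the message row concatenated with its observation row.
    joint = [tuple(m) + tuple(o) for m, o in zip(messages, observations)]
    return entropy_bits(messages) + entropy_bits(observations) - entropy_bits(joint)


# Same values as the README example, written as plain lists.
messages = [[1, 2, 0, 3, 4], [1, 2, 2, 3, 4], [1, 2, 2, 3, 0], [1, 0, 0, 1, 2]]
observations = [[4, 4], [4, 3], [3, 2], [1, 4]]

h_messages = entropy_bits(messages)
mi = mutual_information_bits(messages, observations)
print(f"H(messages) = {h_messages:.3f} bits, I(M; O) = {mi:.3f} bits")
```

Because the mutual information is built from the two marginal entropies, a class that stores those intermediate values can answer a later entropy request without recomputing it, which is the saving the README comment describes.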