Update README and citation

Signed-off-by: Olaf Lipinski <[email protected]>
olipinski · Nov 2, 2023 · 4c5a8bb
1 parent 0319b2d
Showing 2 changed files with 103 additions and 0 deletions.
diff --git a/CITATION.cff b/CITATION.cff
@@ -0,0 +1,17 @@
+cff-version: 1.2.0
+title: emlangkit
+message: "If you use this software, please cite it as below."
+type: software
+authors:
+  - given-names: Olaf
+    family-names: Lipinski
+    email: [email protected]
+    affiliation: University of Southampton
+    orcid: 'https://orcid.org/0000-0002-2023-7617'
+url: 'https://github.com/olipinski/emlangkit'
+keywords:
+  - artificial intelligence
+  - reinforcement learning
+  - emergent communication
+license: MIT
+version: 0.0.1
diff --git a/README.md b/README.md
@@ -1 +1,87 @@
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
 # Emergent Language Analysis Toolkit
+
+This toolkit aims to collect all metrics currently used in emergent communication research in one place.
+It aims for convenient usage and standardised inputs, to ease the adoption and spread of these metrics.
+
+## Installation
+
+To install emlangkit, run `pip install emlangkit`.
+
+Automated tests are run for Python 3.9, 3.10, and 3.11.
+
+## Usage
+
+All metrics are available through the `Language` class in `emlangkit.Language`.
+This class accepts two NumPy arrays as inputs: messages and observations. These
+are then used, with some possible speedups, to calculate any requested metric,
+as in the example below:
+
+```python
+import numpy as np
+from emlangkit import Language
+
+messages = np.array(
+    [[1, 2, 0, 3, 4], [1, 2, 2, 3, 4], [1, 2, 2, 3, 0], [1, 0, 0, 1, 2]]
+)
+observations = np.array([[4, 4], [4, 3], [3, 2], [1, 4]])
+
+lang = Language(messages=messages, observations=observations)
+
+score, p_value = lang.topsim()
+
+# Mutual information already requires both language and observation entropy
+mi = lang.mutual_information()
+
+# So this call uses less computation
+lang_entropy = lang.language_entropy()
+```
+
+## Metrics
+
+The currently available metrics and the modules that implement them are listed below.
+
+- Entropy (`emlangkit.metrics.entropy`)
+- Mutual Information (`emlangkit.metrics.mutual_information`)
+- Topographic Similarity (`emlangkit.metrics.topsim`)
+- Positional Disentanglement (`emlangkit.metrics.posdis`)
+- Bag-of-Symbols Disentanglement (`emlangkit.metrics.bosdis`)
+
+## Contributing
+
+All pull requests are welcome! Please make sure to install pre-commit and run
+the tests with pytest before submitting a PR. If you add a substantial amount
+of new code, please also add the relevant tests.
+
+## Related Libraries
+
+This is a non-exhaustive list of libraries related to emergent communication
+(EC) research. Please feel free to open a PR to add to it!
+
+- EGG - https://github.com/facebookresearch/EGG
+- Harris' Articulation Scheme - https://github.com/wedddy0707/HarrisSegmentation
+- CGI -
+  https://github.com/wedddy0707/categorial_grammar_induction_of_emergent_language
+
+## Sources
+
+Most of the base metrics are inspired by or taken from either
+[EGG](https://github.com/facebookresearch/EGG) or the code accompanying the
+paper "Catalytic Role Of Noise And Necessity Of Inductive Biases In The
+Emergence Of Compositional Communication" (paper available
+[here](https://proceedings.neurips.cc/paper/2021/hash/c2839bed26321da8b466c80a032e4714-Abstract.html)).
+
+## Citation
+
+If you find emlangkit useful in your work, please cite it as below:
+
+```
+@software{lipinski_emlangkit_2023,
+        title = {emlangkit: Emergent Language Analysis Toolkit},
+        url = {https://github.com/olipinski/emlangkit},
+        author = {Lipinski, Olaf},
+        year = {2023}
+}
+```
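
Note on the README usage example above: its comment says that mutual information already requires both the language and the observation entropy. The sketch below illustrates why, using the identity I(M; O) = H(M) + H(O) - H(M, O). It is a minimal standard-library illustration and not emlangkit's implementation: the helper names `entropy` and `mutual_information` are hypothetical, and it assumes each whole message and each whole observation is treated as a single outcome, which may differ from how the library counts symbols.

```python
import math
from collections import Counter


def entropy(rows):
    """Shannon entropy in bits of the empirical distribution over rows.

    Assumption: each row (converted to a tuple) counts as one outcome.
    """
    counts = Counter(tuple(row) for row in rows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(messages, observations):
    """I(M; O) = H(M) + H(O) - H(M, O), estimated from paired samples."""
    # Pair each message with its observation to form the joint outcomes.
    joint = [(tuple(m), tuple(o)) for m, o in zip(messages, observations)]
    return entropy(messages) + entropy(observations) - entropy(joint)


# Same toy data as the README example, as plain lists.
messages = [[1, 2, 0, 3, 4], [1, 2, 2, 3, 4], [1, 2, 2, 3, 0], [1, 0, 0, 1, 2]]
observations = [[4, 4], [4, 3], [3, 2], [1, 4]]

h_messages = entropy(messages)
h_observations = entropy(observations)
mi = mutual_information(messages, observations)
```

Under this whole-row assumption, all four messages and all four observations in the toy data are distinct, so each entropy is 2 bits and the mutual information is also 2 bits. Because both entropies are intermediate results of the mutual information computation, a class that keeps them around can answer a later entropy request with less work, which is consistent with the "possible speedups" the README mentions.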