This repository has been archived by the owner on May 29, 2020. It is now read-only.

Add optimizations for maximum entropy tokenization #5

Open
nyxtom opened this issue Dec 19, 2012 · 1 comment

Comments

@nyxtom

nyxtom commented Dec 19, 2012

Depending on the version of OpenNLP and which fork or port of it you use (in your language of choice), you can extend tokenization with pre-processing steps that preserve hashtags, URLs, @mentions, email addresses, emoticons, etc. as single tokens. It would be nice if chalk supported these kinds of features.
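
To make the request concrete, here is a minimal sketch in Python of what such a pre-pass could look like. It is not chalk's or OpenNLP's API: the names `PROTECTED`, `simple_fallback`, and `tokenize` are hypothetical, the patterns are deliberately rough, and `simple_fallback` only stands in for whatever maximum entropy tokenizer would handle the remaining text.

```python
import re
from typing import Callable, List

# Hypothetical patterns for spans that should survive as single tokens.
# They are intentionally simple; for example, a URL followed directly by
# a full stop would swallow the full stop.
PROTECTED = re.compile(
    r"""
    https?://\S+                              # URLs
  | [\w.+-]+@[\w-]+(?:\.[\w-]+)+              # email addresses
  | (?<!\w)@\w+                               # @mentions
  | (?<!\w)\#\w+                              # hashtags
  | (?<!\w)[:;=][-']?[)(DPp/\\|](?!\w)        # a few western emoticons
    """,
    re.VERBOSE,
)


def simple_fallback(text: str) -> List[str]:
    """Stand-in for a statistical tokenizer: words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tokenize(text: str, fallback: Callable[[str], List[str]] = simple_fallback) -> List[str]:
    """Keep protected spans whole and send everything between them to `fallback`."""
    tokens: List[str] = []
    pos = 0
    for match in PROTECTED.finditer(text):
        tokens.extend(fallback(text[pos:match.start()]))  # text before the span
        tokens.append(match.group(0))                     # the span itself, untouched
        pos = match.end()
    tokens.extend(fallback(text[pos:]))                   # trailing text
    return tokens


print(tokenize("Loving #nlp with @jason :) see http://example.com/a?b=1 or mail me@example.org!"))
# ['Loving', '#nlp', 'with', '@jason', ':)', 'see', 'http://example.com/a?b=1',
#  'or', 'mail', 'me@example.org', '!']
```

Passing the tokenizer in as `fallback` keeps the pre-pass independent of the model behind it, so the same protected-span logic could sit in front of a trained tokenizer.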

@jasonbaldridge
Member

Sorry for the delay on this. I'll look into it as I refactor things. Any help from others welcome!
