four-letter-words

An anomaly-detecting autoencoder for detecting fake four-letter words

Autoencoders use a neural network which takes an input, passes the
activation through a low-dimensional "bottleneck" layer,
and then tries to reconstruct the original input from this sparse low-dimensional representation.

This is commonly used for dimensionality reduction, but it also
poses another interesting use-case: anomaly detection. Autoencoders
are trained by minimising the reconstruction error between the
input and output, so by training the network on a well-represented
set of inputs, the reconstruction error on the trained network
should be higher for atypical or unusual examples.

This autoencoder is trained on a corpus of (real, intelligible)
four-letter words. The intention is that it should provide a low
reconstruction error for "real-looking" four-letter words
(including actually real words like "ball" or "fish", but also
words with a similar construction to natural-sounding four-letter
words such as "pone" or "woog"), while providing a high
reconstruction error for words which are not real or
natural-sounding (such as "jqxo" or "wwov").

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data		data
helpers		helpers
models		models
notebooks		notebooks
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
evaluate_model.py		evaluate_model.py
generate_corpus.py		generate_corpus.py
train_convnet.py		train_convnet.py
train_model.py		train_model.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

four-letter-words

About

Releases

Packages

Languages

License

rikkhill/four-letter-words

Folders and files

Latest commit

History

Repository files navigation

four-letter-words

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages