testscript
Michal Koutny edited this page Apr 18, 2011 · 5 revisions
-
works with letter models
-
input file format
- UTF-8 encoding
- one sentence per line
- each sentence consists of space-separated words
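A minimal sketch of reading this input format. The function names `parse_sentences` and `read_sentences` are hypothetical, and skipping blank lines is an assumption that the page does not state.

```python
def parse_sentences(lines):
    """Turn an iterable of text lines into a list of sentences (lists of words)."""
    sentences = []
    for line in lines:
        words = line.split()  # split on whitespace, drops the trailing newline
        if words:             # assumption: blank lines are ignored
            sentences.append(words)
    return sentences


def read_sentences(path):
    """Read a UTF-8 file with one sentence per line."""
    with open(path, encoding="utf-8") as handle:
        return parse_sentences(handle)
```

A letter model would then typically be fed the characters of each sentence, for example `" ".join(words)`.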
-
output (in textual form)
- cross entropy of the chosen model over the given file(s)
- average number of keystrokes per character
  - based on an input model that uses only the arrow and Enter keys
  - the number of keystrokes needed is (o_c + 1), where (o_c) is the position of the character in the list sorted by probability; the extra one is for Enter
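The two measures can be sketched as follows. This is only an illustration under several assumptions: the model is represented by a hypothetical function `predict(history)` that returns a dictionary mapping every possible next character to a nonzero probability, the cross entropy is measured in bits per character, and (o_c) is counted from zero so that the most probable character costs a single keystroke (Enter only). Ties in probability are broken alphabetically, which is also an assumption.

```python
import math


def evaluate(text, predict):
    """Return (cross entropy in bits per char, average keystrokes per char)."""
    if not text:
        raise ValueError("text must not be empty")
    total_bits = 0.0
    total_keys = 0
    for i, ch in enumerate(text):
        dist = predict(text[:i])            # distribution over the next character
        total_bits += -math.log2(dist[ch])  # assumes dist[ch] > 0
        # Characters sorted by descending probability, ties broken alphabetically.
        ranked = sorted(dist, key=lambda c: (-dist[c], c))
        o_c = ranked.index(ch)              # zero-based position (assumption)
        total_keys += o_c + 1               # arrow presses plus one Enter
    return total_bits / len(text), total_keys / len(text)


def fixed_model(history):
    """Toy model that ignores the history; for illustration only."""
    return {"a": 0.5, "b": 0.25, "c": 0.25}
```

For the toy model, `evaluate("ab", fixed_model)` charges 1 bit and 1 keystroke for `a`, and 2 bits and 2 keystrokes for `b`, so both averages are 1.5.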
-
modes of operation
- the name of the model is a parameter
- simple
  - the model should already be trained
  - calculates the measures for each filename given as a CLI argument
  - for multiple files, also calculates the mean and variance
- leaving-one-out
  - (N) is a parameter
  - the model must support receiving training data
  - the number of files must be either one or a multiple of (N)
  - a single file must have at least (N) lines
  - input data are divided into (N) groups (units are either files or lines)
  - each test leaves one group out
    - this is the held-out data (test data). Note: clarify the difference.
    - the rest is used as training data
  - the output is then analogous to simple mode with multiple files
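The leaving-one-out procedure can be sketched as below. The page does not say how units are assigned to groups or which variance is meant, so the contiguous, near-equal split and the sample variance (`statistics.variance`) are assumptions; the names `split_into_groups`, `leave_one_out` and `summarize` are hypothetical.

```python
import statistics


def split_into_groups(units, n):
    """Divide units (files or lines) into n contiguous, near-equal groups."""
    if len(units) < n:
        raise ValueError("need at least n units")
    size, extra = divmod(len(units), n)
    groups, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        groups.append(units[start:end])
        start = end
    return groups


def leave_one_out(units, n):
    """Yield (training, held_out) pairs, leaving one group out per test."""
    groups = split_into_groups(units, n)
    for i, held_out in enumerate(groups):
        training = [u for j, g in enumerate(groups) if j != i for u in g]
        yield training, held_out


def summarize(values):
    """Mean and sample variance of one measure across the tests."""
    return statistics.mean(values), statistics.variance(values)
```

Each test would train the model on `training`, compute the measures on `held_out`, and pass the per-test results to `summarize`, mirroring the output of simple mode with multiple files.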