Document the --stoplist flag
hrs committed May 23, 2023
1 parent 609ebe7 commit 89f557c
Showing 2 changed files with 19 additions and 6 deletions.
8 changes: 5 additions & 3 deletions README.md
@@ -64,6 +64,9 @@ algorithm, you'll almost certainly want to use the `--no-stoplist` and
`--no-stemming` flags if your documents are written in another language
(including source code).

+Optionally, you can use the `--stoplist` flag to provide a custom stoplist. A
+custom stoplist is just a text file of words to ignore, separated by whitespace.
+
## Installation

The easiest thing is probably to grab a [compiled binary][] appropriate to your
@@ -85,9 +88,8 @@ Or just:
$ go install github.com/hrs/docsim/docsim@latest
```

-Note that using `go install` that doesn't include the [`man` page][], which you
-can optionally install manually by copying into e.g.
-`/usr/local/share/man/man1`.
+Note that using `go install` doesn't include the [`man` page][], which you can
+optionally install manually by copying into e.g. `/usr/local/share/man/man1`.

[`man` page]: ./man/docsim.1

17 changes: 14 additions & 3 deletions man/docsim.1
@@ -18,8 +18,8 @@ between every document and the query.
.PP
docsim works best with English text documents, since the included stoplist and
stemming algorithm both operate on English words. Results won't be quite as
-good, but those features can be disabled with the \fV\-\-no\-stemming\fR and
-\fV\-\-no\-stoplist\fR flags:
+good, but those features can be disabled with the \fB\-\-no\-stemming\fR and
+\fB\-\-no\-stoplist\fR flags:
.IP
.nf
\f[C]
@@ -28,14 +28,22 @@ docsim --query demando.txt --no-stoplist --no-stemming ~/esperanto_notes
.fi
.PP
Similarly, the same flags can be applied to a repository of source code.
-Operators (like \fV;\fR and \fV+=\fR) won't factor into the comparison, but
+Operators (like \fB;\fR and \fB+=\fR) won't factor into the comparison, but
identifiers and keywords will.
.IP
.nf
\f[C]
docsim --query main.c --no-stoplist --no-stemming **/*.c
\f[R]
.fi
+.PP
+However, docsim does support custom stoplists with the \fB\-\-stoplist\fR flag.
+.IP
+.nf
+\f[C]
+docsim --query demando.txt --stoplist ~/esperanto_stoplist.txt --no-stemming ~/esperanto_notes
+\f[R]
+.fi
.SH OPTIONS
.TP
.BR \-\-best\-first
@@ -80,6 +88,9 @@ excluded from textual analysis.
Generally stoplists are filled with common words (like "the" and "because" in
English, or "char" and "struct" in C) that don't carry significant semantic
value.
+.PP
+Note that if the \fB\-\-no\-stoplist\fR flag is also set it will supersede this
+one and the custom stoplist will be ignored.
.RE
.TP
.BR \-\-verbose
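
For readers who want to see the documented behavior in miniature, here is a rough Go sketch of what the new text describes: a stoplist file is split on whitespace, and `--no-stoplist` takes precedence over `--stoplist`. This is not docsim's actual code. The function names `parseStoplist`, `effectiveStoplist`, and `filterTerms` are hypothetical, and the sketch assumes exact string matching; whether docsim lowercases or stems stoplist entries is not stated in this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStoplist splits the contents of a stoplist file on any run of
// whitespace (spaces, tabs, newlines) and returns the words as a set.
// Hypothetical helper for illustration, not taken from docsim.
func parseStoplist(contents string) map[string]bool {
	words := make(map[string]bool)
	for _, w := range strings.Fields(contents) {
		words[w] = true
	}
	return words
}

// effectiveStoplist models the precedence described in the man page:
// when noStoplist is set, the custom stoplist is ignored entirely.
func effectiveStoplist(custom map[string]bool, noStoplist bool) map[string]bool {
	if noStoplist {
		return map[string]bool{}
	}
	return custom
}

// filterTerms returns the terms that do not appear in the stoplist.
// Assumption: matching is exact and case-sensitive.
func filterTerms(terms []string, stop map[string]bool) []string {
	kept := make([]string, 0, len(terms))
	for _, t := range terms {
		if !stop[t] {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	// Words may be separated by any mix of spaces, tabs, and newlines.
	stop := parseStoplist("la kaj\nde\ten")
	terms := []string{"la", "hundo", "kaj", "kato"}

	// Custom stoplist in effect: stop words are dropped.
	fmt.Println(filterTerms(terms, effectiveStoplist(stop, false)))

	// --no-stoplist also set: the custom stoplist is ignored.
	fmt.Println(filterTerms(terms, effectiveStoplist(stop, true)))
}
```

Splitting with `strings.Fields` means the stoplist file needs no particular layout: one word per line and several words per line are treated the same way.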
