Commit
Fix spelling errors
ahornace committed Jul 19, 2018
1 parent c8a4e3d commit 090d92f
Showing 4 changed files with 15 additions and 15 deletions.
4 changes: 2 additions & 2 deletions text/chap01.tex
@@ -16,7 +16,7 @@ \section{Overview}
\begin{itemize}
\item Support for multiple projects. \textbf{Project} is a directory containing source files. Most commonly, it is a directory
containing the source files for one software project; thus, the name.
- \item Support for authentization and authorization (\cite{OpengrokAuthLayer}). For instance, by using LDAP
+ \item Support for authentication and authorization (\cite{OpengrokAuthLayer}). For instance, by using LDAP
\footnote{\url{https://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol}}.
\item Support for multiple version control systems\footnote{\url{https://en.wikipedia.org/wiki/Version\_control}},
e.g. git, mercurial, etc.
@@ -125,7 +125,7 @@ \subsubsection{Configuration}
\subsubsection{REST API}
\label{opengrok_rest}

- Opengrok provides REST API support. This is a relatively new feature. Before that, OpenGrok had known a concept of
+ OpenGrok provides REST API support. This is a relatively new feature. Before that, OpenGrok had known a concept of
\textit{Messages} – custom serialization of Java objects passed to the Web application via a custom port.
So far, most of the REST API calls can only be made from the machine on which OpenGrok runs.
This is mainly because these REST API calls are meant as a means of communication between the Indexer and Web application
14 changes: 7 additions & 7 deletions text/chap02.tex
@@ -6,7 +6,7 @@ \chapter{Analysis}
\begin{itemize}
\item \ref{general_architecture} \textbf{General Architecture} – explains the chosen suggester architecture and
how it could be combined into the overall OpenGrok architecture.
- \item \ref{opengrok_modifications} \textbf{Opengrok Modifications} – describes the major modifications that had
+ \item \ref{opengrok_modifications} \textbf{OpenGrok Modifications} – describes the major modifications that had
to be made in OpenGrok code to enable suggester functionality.
\item \ref{suggester_module} \textbf{Suggester} – provides a detailed explanation of how the suggester functionality
was implemented.
@@ -72,7 +72,7 @@ \subsubsection{Showing the Suggestions}
\label{showing_suggestions}

The Suggester needs to detect that the user pressed a key while an input for which it is enabled is selected. Upon
- detecting this change, it needs to process the data, send it to the backend part of the software, processs the returned
+ detecting this change, it needs to process the data, send it to the backend part of the software, process the returned
result and show it to
the user. All this should be as quick as possible so the user considers it to be seamless.
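Hedged sketch of the keystroke handling above (the suggester's actual UI code lives in the OpenGrok web application's JavaScript; the `Debouncer` class and its method names are invented for illustration). Coalescing rapid keystrokes so that only the latest prefix triggers a backend request is one common way to keep the interaction seamless:

```java
import java.util.*;
import java.util.concurrent.*;

public class Debouncer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    /** Schedules the task; a newer call within delayMs cancels the older one,
     *  so only one backend request is sent per pause in typing. */
    public synchronized void submit(Runnable task, long delayMs) {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(task, delayMs, TimeUnit.MILLISECONDS);
    }

    /** Stops the scheduler, letting the last scheduled task finish first. */
    public void close() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Debouncer d = new Debouncer();
        List<String> sent = Collections.synchronizedList(new ArrayList<>());
        // Three rapid keystrokes – only the last prefix reaches the backend.
        d.submit(() -> sent.add("su"), 100);
        d.submit(() -> sent.add("sug"), 100);
        d.submit(() -> sent.add("sugg"), 100);
        d.close();
        System.out.println(sent); // only the final prefix survives the debounce
    }
}
```

The delay is a trade-off: too short and intermediate prefixes still hit the backend, too long and the suggestions no longer feel instantaneous.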

@@ -533,12 +533,12 @@ \subsection{Wildcard Query}
The specific case of \textit{prefix*} is covered in the previous Section \ref{prefix_query}. Therefore,
all the other cases of wildcard queries will be covered in this section.
The implementation of WFST cannot be used because of its nature.
- There is no way to efficiently search in WFST tokens for the query of type \textit{*sufffix}. The required result is
+ There is no way to efficiently search in WFST tokens for the query of type \textit{*suffix}. The required result is
the same as for the prefix query: to find the terms which are accepted by the query with the top score. However, the data
structure which could achieve this for a generic wildcard query with the WFST performance is not known to the author.
Nonetheless, the Lucene evaluation of wildcard queries could be leveraged. An automaton specific to the wildcard query is
created using the Lucene automaton implementation, replacing \texttt{?} to accept any character and \texttt{*} to accept
- any string. Then the terms are filtered using this automaton. The implemenation is slower than WFST because all the terms
+ any string. Then the terms are filtered using this automaton. The implementation is slower than WFST because all the terms
need to be filtered once to check whether they are accepted by the automaton, and then filtered a second time based on
their score.
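The two-pass evaluation above can be sketched with the standard library alone. This is a simplification, not the Lucene-based implementation: the automaton is approximated by a regular expression, and `toPattern`, `suggest`, and the in-memory score map are assumptions made for the sketch.

```java
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WildcardSuggest {
    /** Builds a matcher for a wildcard query: '?' accepts any character, '*' any string. */
    static Pattern toPattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') sb.append('.');
            else if (c == '*') sb.append(".*");
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    /** Pass 1: keep terms accepted by the pattern; pass 2: order the survivors by score. */
    static List<String> suggest(Map<String, Integer> termScores, String wildcard, int n) {
        Pattern p = toPattern(wildcard);
        return termScores.entrySet().stream()
                .filter(e -> p.matcher(e.getKey()).matches())                  // acceptance pass
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue())) // score pass
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> terms = new HashMap<>();
        terms.put("prefix", 5);
        terms.put("suffix", 9);
        terms.put("infix", 7);
        terms.put("fixed", 1);
        // Terms ending in "fix", best score first.
        System.out.println(suggest(terms, "*fix", 2));
    }
}
```

The sketch makes the cost structure visible: every term is touched in the first pass regardless of score, which is exactly why this path is slower than the WFST lookup used for plain prefixes.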

@@ -931,7 +931,7 @@ \subsection{Promoting Suggestions Based on the Previous Searches}
\textbf{Chosen solution} – \textit{nearest completion} and thus \textit{hybrid completion} are very intriguing and could
improve the suggestions by a large margin. However, they would need to be adapted to Lucene and the implementation
might not be completely straightforward. Therefore, the basic implementation of the suggester will only include the
- \textit{most popular completion}. Implementation of the \textit{nearest completion} is a very promising canditate for
+ \textit{most popular completion}. Implementation of the \textit{nearest completion} is a very promising candidate for
future extensions.
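A minimal sketch of the chosen \textit{most popular completion} idea, under the assumption that promoting a suggestion simply means ranking candidates by how often they were searched for before (the class and method names are mine; the real implementation persists the counts and folds them into the WFST weights):

```java
import java.util.*;
import java.util.stream.Collectors;

public class MostPopularCompletion {
    // Frequency of past searches; persisted in the real implementation, in-memory here.
    private final Map<String, Integer> searchCounts = new HashMap<>();

    /** Records that a term was searched for, so it can be promoted later. */
    public void onSearch(String term) {
        searchCounts.merge(term, 1, Integer::sum);
    }

    /** Orders candidate completions of a prefix by past popularity, ties alphabetically. */
    public List<String> complete(String prefix, Collection<String> terms, int n) {
        return terms.stream()
                .filter(t -> t.startsWith(prefix))
                .sorted(Comparator.comparingInt((String t) -> searchCounts.getOrDefault(t, 0))
                        .reversed()
                        .thenComparing(Comparator.naturalOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MostPopularCompletion mpc = new MostPopularCompletion();
        mpc.onSearch("const");
        mpc.onSearch("const");
        mpc.onSearch("container");
        List<String> terms = Arrays.asList("con", "const", "container", "context");
        // "const" comes first because it was searched for most often.
        System.out.println(mpc.complete("con", terms, 3));
    }
}
```

Nearest completion would replace the raw count with a similarity score against previous queries, which is the promising future extension mentioned above.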

\subsubsection{Most Popular Completion – Simple Queries}
@@ -949,7 +949,7 @@ \subsubsection{Most Popular Completion – Simple Queries}
efficiency. There are multiple options for achieving this functionality:
\begin{itemize}
\item \textbf{Java Map} implementation with concurrent access, e.g. \textit{ConcurrentHashMap}. This map could be stored
- on the disk periodically to fulfil the persistency requirement. This solution has a few drawbacks:
+ on the disk periodically to fulfill the persistency requirement. This solution has a few drawbacks:
\begin{itemize}
\item Loss of recent data after restart/crash.
\item The data are held in memory. The size of the data is non-trivial, e.g.
@@ -1113,7 +1113,7 @@ \subsubsection{Most Popular Completion – Simple Queries}
The memory usage increased by approximately $22$ \% for the \textit{English words} dataset. However, it can be
almost doubled, as can be seen in the \textit{Linux kernel} dataset, where an approximately $92$ \% size increase can be noted.
The graph \ref{enc_comp} also shows the case when the encoding would use \textit{long} datatype. Although Lucene's \textit{Lookup}
- interface specificies \textit{long} datatype, WFST implementation supports only \textit{int} so far.
+ interface specifies \textit{long} datatype, WFST implementation supports only \textit{int} so far.

\item Lucene's WFST implementation does not have a notion of nodes – the data are stored only in arcs. The arcs that
start in the root node might be stored in memory directly and therefore are not encoded in a byte array since these
2 changes: 1 addition & 1 deletion text/chap04.tex
@@ -156,7 +156,7 @@ \section{Use as a Separate Library}
\item Remove the \textit{projectsEnabled} parameter from the constructor. It is OpenGrok specific.
\item Overload method \textit{search(List\textless NamedIndexReader\textgreater, SuggesterQuery, Query)} to provide
the possibility to search without the need for the list of \textit{IndexReader} variables. They are provided now to better
- faciliate the resource reuse. However, they are not needed and could be created from the index paths specified in
+ facilitate the resource reuse. However, they are not needed and could be created from the index paths specified in
the \textit{init(Collection\textless NamedIndexDir\textgreater)} method. The overloaded method could have the following signature:
\textit{search(List\textless String\textgreater, SuggesterQuery, Query)} which would only specify index names.
\item Provide a default parser which would be able to create \textit{SuggesterQuery} instances. This could be a
10 changes: 5 additions & 5 deletions text/chap05.tex
@@ -57,11 +57,11 @@ \section{Impact on Hardware Requirements}
is a lookup in the WFST data structure which is optimized for this kind of scenario. However, in other cases,
index searches are performed which can consume a lot of CPU time.
\item \textbf{Memory} – the WFST data structures are held in memory. Although their memory footprint is very low,
- one data structure needs to be created per Lucene field per project which can sum up to a signifcant value.
+ one data structure needs to be created per Lucene field per project which can sum up to a significant value.
Also, data for most popular completion are stored in the Chronicle Map implementation which translates to additional
memory consumption.
\item \textbf{Disk} – the WFST data structures are stored on the disk to provide a quick startup.
- The data for most popular completion need to be stored as well. The comparison of disk consumptions for different datasets
+ The data for most popular completion need to be stored as well. The comparison of disk consumption for different datasets
can be seen in Figure \ref{comp_suggester_size}. The data show what percentage of the index size the suggester
data take. The data were measured on a machine with the operating system
macOS\footnote{\url{https://en.wikipedia.org/wiki/MacOS}} and
@@ -143,7 +143,7 @@ \section{Impact on the Demo Instance}
applications in a Tomcat instance. Of those, the most significant are:
\begin{itemize}
\item \textbf{JMX\footnote{\url{https://en.wikipedia.org/wiki/Java\_Management\_Extensions}} remote} –
- Tomcat provides possiblity to manage and monitor the applications via JMX. Many applications use this functionality
+ Tomcat provides possibility to manage and monitor the applications via JMX. Many applications use this functionality
to their advantage.
\item \textbf{JavaMelody}\footnote{\url{https://github.com/javamelody/javamelody}} – can be added to the project as a
dependency and creates a simple page with monitoring information available at \textit{application\_URI/monitoring}.
@@ -240,7 +240,7 @@ \subsection{Simple prefix query across all projects}
\item Prefix \texttt{c} in \textit{full} field.
\end{enumerate}
The requests specified all 22 projects. The results can be seen in Figure \ref{load_test_prefix_fig}. The slow startup
- can be noted; however, later requests were taking only a few miliseconds on average.
+ can be noted; however, later requests were taking only a few milliseconds on average.

\begin{figure}[htbp]
\centering
@@ -2199,7 +2199,7 @@ \subsection{Simple prefix query across all projects}

\subsection{Worst case across all projects}
\label{load_worst}
- Query \texttt{". $\vert$"} was used where $\vert$ represents a caret positon. Term $.$ occurs in $205,943$ files. All these files
+ Query \texttt{". $\vert$"} was used where $\vert$ represents a caret position. Term $.$ occurs in $205,943$ files. All these files
need to be checked for the $.$ position and then all terms are traversed to check if they occur at the positions next to the
$.$ term. The time threshold was set to the default value of $2$ seconds. The test case sends this request $1,000$ times,
linearly distributed over a $10$ second time period. It can be noted that the system was not able to satisfy the requests;