From 090d92f08b815139390fd5e858a00b8ea6099b26 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adam=20Horn=C3=A1=C4=8Dek?=
Date: Thu, 19 Jul 2018 13:17:49 +0200
Subject: [PATCH] Fix spelling errors

---
 text/chap01.tex |  4 ++--
 text/chap02.tex | 14 +++++++-------
 text/chap04.tex |  2 +-
 text/chap05.tex | 10 +++++-----
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/text/chap01.tex b/text/chap01.tex
index 62320a5..374e7dc 100644
--- a/text/chap01.tex
+++ b/text/chap01.tex
@@ -16,7 +16,7 @@ \section{Overview}
 \begin{itemize}
     \item Support for multiple projects. \textbf{Project} is a directory containing source files. Most commonly, it is
     a directory containing the source files for one software project; thus, the name.
-    \item Support for authentization and authorization (\cite{OpengrokAuthLayer}). For instance, by using LDAP
+    \item Support for authentication and authorization (\cite{OpengrokAuthLayer}). For instance, by using LDAP
     \footnote{\url{https://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol}}.
     \item Support for multiple version control systems\footnote{\url{https://en.wikipedia.org/wiki/Version\_control}},
     e.g. git, mercurial, etc.
@@ -125,7 +125,7 @@ \subsubsection{Configuration}

 \subsubsection{REST API}
 \label{opengrok_rest}
-Opengrok provides REST API support. This is a relatively new feature. Before that, OpenGrok had known a concept of
+OpenGrok provides REST API support. This is a relatively new feature. Before that, OpenGrok had known a concept of
 \textit{Messages} – custom serialization of Java objects passed to the Web application via a custom port.
 So far, most of the REST API calls can be only made from the machine on which the OpenGrok runs.
 This is mainly because these REST API calls are meant as a means of communication between the Indexer and Web application
diff --git a/text/chap02.tex b/text/chap02.tex
index a9cc7ec..4cd6bfd 100644
--- a/text/chap02.tex
+++ b/text/chap02.tex
@@ -6,7 +6,7 @@ \chapter{Analysis}
 \begin{itemize}
     \item \ref{general_architecture} \textbf{General Architecture} – explains the chosen suggester architecture and
     how it could be combined into the overall OpenGrok architecture.
-    \item \ref{opengrok_modifications} \textbf{Opengrok Modifications} – describes the major modifications that had
+    \item \ref{opengrok_modifications} \textbf{OpenGrok Modifications} – describes the major modifications that had
     to be made in OpenGrok code to enable suggester functionality.
     \item \ref{suggester_module} \textbf{Suggester} – provides detailed explanation of how the suggester functionality
     was implemented.
@@ -72,7 +72,7 @@ \subsubsection{Showing the Suggestions}
 \label{showing_suggestions}
 The Suggester needs to detect if the user pressed a key while having a specific input selected for which it is enabled. Upon
-detecting this change, it needs to process the data, send it to the backend part of the software, processs the returned
+detecting this change, it needs to process the data, send it to the backend part of the software, process the returned
 result and show it to the user.
 All this should be as quick as possible so the user considers it to be seamless.
@@ -533,12 +533,12 @@ \subsection{Wildcard Query}
 The specific case of \textit{prefix*} is covered in the previous Section \ref{prefix_query}. Therefore, all the other
 cases of wildcard queries will be covered in this section.
 The implementation of WFST cannot be used because of its nature.
-There is no way to efficiently search in WFST tokens for the query of type \textit{*sufffix}. The required result is
+There is no way to efficiently search in WFST tokens for the query of type \textit{*suffix}. The required result is
 the same as for the prefix query: to find the terms which are accepted by the query with the top score. However, the
 data structure which could achieve this for generic wildcard query with the WFST performance is not known to the author.
 Nonetheless, the Lucene evaluation of wildcard queries could be leveraged. An automaton specific to the wildcard query
 is created by using the Lucene automaton implementation by replacing \texttt{?} to accept any character and \texttt{*} to accept
-any string. Then the terms are filtered using this automaton. The implemenation is slower than WFST because all the terms
+any string. Then the terms are filtered using this automaton. The implementation is slower than WFST because all the terms
 need to be filtered once if they are accepted by the automaton then they need to be filtered for the second time based
 on their score.
@@ -931,7 +931,7 @@ \subsection{Promoting Suggestions Based on the Previous Searches}
 \textbf{Chosen solution} – \textit{nearest completion} and thus \textit{hybrid completion} are very intriguing and
 could improve the suggestions by a big margin. However, they would need to be adapted to Lucene and the implementation
 might not be completely straightforward. Therefore, the basic implementation of suggester will only include the
-\textit{most popular completion}. Implementation of the \textit{nearest completion} is a very promising canditate for
+\textit{most popular completion}. Implementation of the \textit{nearest completion} is a very promising candidate for
 future extensions.

 \subsubsection{Most Popular Completion – Simple Queries}
@@ -949,7 +949,7 @@ \subsubsection{Most Popular Completion – Simple Queries}
 efficiency. There are multiple options how to achieve this functionality:
 \begin{itemize}
     \item \textbf{Java Map} implementation with concurrent access, e.g. \textit{ConcurrentHashMap}. This map could be stored
-    on the disk periodically to fulfil the persistency requirement. This solution has a few drawbacks:
+    on the disk periodically to fulfill the persistency requirement. This solution has a few drawbacks:
     \begin{itemize}
         \item Loss of recent data after restart/crash.
         \item The data are held in memory. The size of the data is non-trivial, e.g.
@@ -1113,7 +1113,7 @@ \subsubsection{Most Popular Completion – Simple Queries}
     The memory usage increased by approximately $22$ \% for \textit{English words} dataset. However, it can be almost
     doubled as can be seen on \textit{Linux kernel} dataset where approximately $92$ \% size increase can be noted.
     The graph \ref{enc_comp} also shows the case when the encoding would use \textit{long} datatype. Although Lucene's \textit{Lookup}
-    interface specificies \textit{long} datatype, WFST implementation supports only \textit{int} so far.
+    interface specifies \textit{long} datatype, WFST implementation supports only \textit{int} so far.

     \item Lucene's WFST implementation does not know the notion of nodes – the data are stored only in arcs. The arcs
     that start in the root node might be stored in memory directly and therefore are not encoded in a byte array since these
diff --git a/text/chap04.tex b/text/chap04.tex
index 0b04eca..6670bd8 100644
--- a/text/chap04.tex
+++ b/text/chap04.tex
@@ -156,7 +156,7 @@ \section{Use as a Separate Library}
     \item Remove the \textit{projectsEnabled} parameter from the constructor. It is OpenGrok specific.
     \item Overload method \textit{search(List\textless NamedIndexReader\textgreater, SuggesterQuery, Query)} to provide
     possibility to search without the need for the list of \textit{IndexReader} variables. They are provided now to better
-    faciliate the resource reuse. However, they are not needed and could be created from the index paths specified in
+    facilitate the resource reuse. However, they are not needed and could be created from the index paths specified in
The overloaded method could have the following signature: \textit{search(List\textless String\textgreater, SuggesterQuery, Query)} which would only specify index names. \item Provide a default parser which would be able to create \textit{SuggesterQuery} instances. This could be a diff --git a/text/chap05.tex b/text/chap05.tex index a320145..51ed10f 100644 --- a/text/chap05.tex +++ b/text/chap05.tex @@ -57,11 +57,11 @@ \section{Impact on Hardware Requirements} is a lookup in the WFST data structure which is optimized for this kind of scenarios. However, in other cases index searches are performed which can consume a lot of CPU time. \item \textbf{Memory} – the WFST data structures are held in memory. Although their memory footprint is very low, - one data structure needs to be created per Lucene field per project which can sum up to a signifcant value. + one data structure needs to be created per Lucene field per project which can sum up to a significant value. Also, data for most popular completion are stored in the Chronicle Map implementation which translates to additional memory consumption. \item \textbf{Disk} – the WFST data structures are stored on the disk to provide a quick startup. - The data for most popular completion need to be stored as well. The comparison of disk consumptions for different datasets + The data for most popular completion need to be stored as well. The comparison of disk consumption for different datasets can be seen in the Figure \ref{comp_suggester_size}. The data show how much percentage of the index size the suggester data take. The data were measured on the machine with the operating system macOS\footnote{\url{https://en.wikipedia.org/wiki/MacOS}} and @@ -143,7 +143,7 @@ \section{Impact on the Demo Instance} applications in a Tomcat instance. 
From those the most significant are: \begin{itemize} \item \textbf{JMX\footnote{\url{https://en.wikipedia.org/wiki/Java\_Management\_Extensions}} remote} – - Tomcat provides possiblity to manage and monitor the applications via JMX. Many applications use this functionality + Tomcat provides possibility to manage and monitor the applications via JMX. Many applications use this functionality to their advantage. \item \textbf{JavaMelody}\footnote{\url{https://github.com/javamelody/javamelody}} – can be added to the project as a dependency and creates a simple page with monitoring information available at \textit{application\_URI/monitoring}. @@ -240,7 +240,7 @@ \subsection{Simple prefix query across all projects} \item Prefix \texttt{c} in \textit{full} field. \end{enumerate} The requests specified all 22 projects. The results can be seen in the Figure \ref{load_test_prefix_fig}. The slow startup -can be noted; however, later requests were taking only a few miliseconds on average. +can be noted; however, later requests were taking only a few milliseconds on average. \begin{figure}[htbp] \centering @@ -2199,7 +2199,7 @@ \subsection{Simple prefix query across all projects} \subsection{Worst case across all projects} \label{load_worst} -Query \texttt{". $\vert$"} was used where $\vert$ represents a caret positon. Term $.$ occurs in $205,943$ files. All these files +Query \texttt{". $\vert$"} was used where $\vert$ represents a caret position. Term $.$ occurs in $205,943$ files. All these files need to be checked for the $.$ position and then all terms are traversed to check if they occur at the positions next to the $.$ term. Time threshold was set to the default value: $2$ seconds. The test case sends this request $1,000$ times linearly distributed during a $10$ second time period. It can be noted that the system was not able to satisfy the requests;