Program documentation first iteration

ahornace · Jul 15, 2018 · 5c55d74
1 parent 286d616

Showing 3 changed files with 107 additions and 7 deletions.
diff --git a/text/chap02.tex b/text/chap02.tex
@@ -1169,6 +1169,7 @@ \subsubsection{Most Popular Completion – Complex Queries}
 As a result, the complex query is only dismantled into simple term queries which are used to update the popularity term counts.
 
 \subsubsection{Increasing Search Counts for Specific Terms}
+\label{increasing_search_counts}
 Administrators might already have some search data stored. These data could be used to initialize or update the
 popularity term counts.
 
@@ -1179,7 +1180,7 @@ \subsubsection{Increasing Search Counts for Specific Terms}
             \item Target endpoint:
             \begin{itemize}
                 \item Request method: POST
-                \item Endpoint: /suggest/add
+                \item Endpoint: /suggest/init/queries
                 \item Media type: application/json
             \end{itemize}
             \item Example data:
@@ -1192,7 +1193,7 @@ \subsubsection{Increasing Search Counts for Specific Terms}
             \item Target endpoint:
             \begin{itemize}
                 \item Request method: POST
-                \item Endpoint: /suggest/add
+                \item Endpoint: /suggest/init/raw
                 \item Media type: application/json
             \end{itemize}
             \item Example data:
@@ -1295,3 +1296,6 @@ \subsection{Tolerating Errors}
 distance $k$ of $s$. $s$ is called a \textit{k-extension}. Then, for some prefix it is possible to consider its
 k-extensions up to some constant $k$. The authors presented trie-based and q-gram-based algorithms. However, this approach
 would require additional data structures and does not work for other types of queries. Therefore, it is considered a possible extension.
+
+\subsection{Retrieving Popularity Data}
+\label{retrieving_popularity_data}
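The two initialization endpoints changed above (/suggest/init/queries and /suggest/init/raw) both accept a POST request with an application/json body. The following Java 11 sketch only shows how such a request might be built with java.net.http. Several details are assumptions not taken from the thesis: the base URL (a local deployment at context path /source with the API under /api/v1, guessed from the api.v1 package name), the class and method names, and the "[]" placeholder body; the real payload must follow the example data given for the respective endpoint.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SuggesterInitClient {

    // Assumption: base URL of OpenGrok's REST API in a local deployment; adjust as needed.
    static final String BASE_URL = "http://localhost:8080/source/api/v1";

    // Builds a POST request with a JSON body for one of the init endpoints,
    // i.e. "/suggest/init/queries" or "/suggest/init/raw".
    static HttpRequest buildInitRequest(String endpoint, String json) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends the request to a running web application and returns the HTTP status code.
    static int send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Placeholder body: the real payload must follow the endpoint's example data.
        HttpRequest request = buildInitRequest("/suggest/init/queries", "[]");
        // Only prints the request line; call send(request) against a live server.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Building the request separately from sending it keeps the payload construction testable without a running OpenGrok instance.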
diff --git a/text/chap04.tex b/text/chap04.tex
@@ -6,7 +6,7 @@ \chapter{Program Documentation}
 \section{Overview}
 As mentioned, suggestions are retrieved by a REST API call made to the Web module. This module processes the data and
 invokes the Suggester module to return the suggestions. A simplified diagram showing the main interactions between the
-objects and modules can be seen in the figure \ref{programmer_sequence}.
+objects and modules can be seen in Figure~\ref{programmer_sequence}.
 \begin{figure}[htbp]
     \centering
     \includegraphics[width=145mm]{../img/programmer_sequence.pdf}
@@ -15,18 +15,113 @@ \section{Overview}
 \end{figure}
 
 \section{Web module}
+Suggester support was added as a part of OpenGrok's existing REST API. The \textit{SuggesterController}
+class serves as a REST API endpoint for suggester-related queries. The other required classes are located in the
+\textit{org.opensolaris.opengrok.web.api.v1.suggester} package. Classes which are auto-detected by the Jersey
+implementation are located in the \textit{org.opensolaris.opengrok.web.api.v1.suggester.provider} package and are annotated
+with the \textit{@Provider}\footnote{\url{https://docs.oracle.com/javaee/7/api/javax/ws/rs/ext/Provider.html}} annotation.
+The communication with the Suggester module is abstracted by the \textit{SuggesterService} interface and implemented in the
+\textit{SuggesterServiceImpl} class. The implementation is injected into the classes which require the
+\textit{SuggesterService} by means of the
+\textit{@Inject}\footnote{\url{https://docs.oracle.com/javaee/7/api/javax/inject/Inject.html}} annotation. If a
+component outside the REST API part of the application needs to access the \textit{SuggesterService},
+it can retrieve it by invoking the \textit{getDefault()} method of the \textit{SuggesterServiceFactory} class.
+
+\subsubsection{Configuration Change}
+The suggester data structures need to be refreshed when the configuration changes because many basic OpenGrok
+properties might have changed, for instance:
+\begin{itemize}
+    \item \textit{dataRoot}, which specifies where OpenGrok stores its data.
+    \item Suggester configuration.
+    \item Projects.
+\end{itemize}
+
+It is not trivial to detect all the changes, and checking specific values for changes would not be feasible from the
+modifiability point of view. Therefore, \textit{SuggesterServiceImpl} closes the underlying suggester data structures
+and tries to reinitialize the suggester with the new configuration. If the configuration has not changed in a major
+aspect (a major change being, e.g., a different \textit{dataRoot} or newly added projects), the Suggester module just
+reloads the data from disk. Reloading from disk should be fast (in our experience, less than 1\,s).
 
 \section{Suggester module}
+The Suggester module is located in the \textit{suggester} directory in the root of the project. The main class which
+exposes the suggester functionality is the \textit{Suggester} class. Its public API is described in Section~\ref{public_api}.
 
 \subsubsection{Public API}
+\label{public_api}
+The public API consists of the following methods:
+\begin{itemize}
+    \item \textit{init(Collection\textless Path\textgreater)} – initializes all the data structures based on the paths
+    to the indexes.
+    \item \textit{remove(Iterable\textless String\textgreater)} – removes all the data structures stored for the
+    specified names.
+    \item \textit{search(List\textless NamedIndexReader\textgreater, SuggesterQuery, Query)} –
+    searches the suggester data and returns suggestions for the provided queries.
+    \item \textit{onSearch(Iterable\textless String\textgreater, Query)} –
+    signals that the query was searched and updates the data structures for the most popular completion.
+    \item \textit{increaseSearchCount(String, Term, int)} – increases the search count for the specified project and term
+    by the specified value. Negative values are not allowed.
+    \item \textit{getSearchCounts(String, String, int, int)} – returns popularity data for the specified project and field.
+    \item \textit{close()} – closes the suggester data structures and releases the associated resources.
+\end{itemize}
+
+\subsubsection{Suggester Data}
+The following tree shows the typical content of \textit{dataRoot} for a multi-project OpenGrok setup, including the suggester data.
+\dirtree{%
+.1 \textit{dataRoot}\DTcomment{OpenGrok's data root specified in configuration}.
+.2 historycache\DTcomment{history data for files}.
+.3 project1\DTcomment{history data for files of \textit{project1}}.
+.4 {…}.
+.3 {…}.
+.2 index\DTcomment{index data}.
+.3 project1\DTcomment{index data for \textit{project1}}.
+.4 {…}.
+.3 {…}.
+.2 statistics.json\DTcomment{stored statistics data}.
+.2 suggester\DTcomment{suggester data}.
+.3 project1\DTcomment{suggester data for \textit{project1}}.
+.4 \{field\}\_map.cfg\DTcomment{stored Chronicle Map configuration}.
+.4 \{field\}\_search\_count.db\DTcomment{stored Chronicle Map}.
+.4 \{field\}.wfst\DTcomment{stored WFST data structure}.
+.4 version.txt\DTcomment{version of the suggester data}.
+.3 {…}.
+.2 timestamp\DTcomment{timestamp of last indexing}.
+.2 xref\DTcomment{pre-generated HTML files}.
+.3 project1\DTcomment{pre-generated HTML files for \textit{project1}}.
+.4 {…}.
+.3 {…}.
+}
 
-\subsubsection{Suggester data}
+In the case of a single-project OpenGrok setup, there is no \textit{project1} directory as in the previous example; the
+data are stored directly in the \textit{historycache}, \textit{index}, \textit{suggester} and \textit{xref} directories.
 
-\subsubsection{Detecting index version}
+The \textit{\{field\}} placeholder represents the Lucene field whose data the file contains, e.g. \textit{full}.
+If some data are corrupted and the suggester is not able to read them, the best solution might be the following
+procedure:
+\begin{enumerate}
+    \item If possible, back up the popularity data as described in Section~\ref{retrieving_popularity_data} (not needed
+    if the popularity data are not important or the feature is turned off).
+    \item Stop the web application.
+    \item Remove either the corrupted data or the whole suggester directory.
+    \item Start the web application again.
+    \item Initialize the popularity data as described in Section~\ref{increasing_search_counts}. Some modifications of
+    the backed-up data would be needed; if the need arises, an automatic tool might be created for this purpose (not
+    needed if the popularity data are not important or the feature is turned off).
+\end{enumerate}
 
-\subsubsection{Configuration change}
+\subsubsection{Detecting Index Version}
+The Indexer might run while the web application is turned off. In that case, it is not possible to notify the
+suggester about the change, so the suggester needs to detect it by itself. The aforementioned \textit{version.txt} file
+serves exactly this purpose. It contains a number which specifies the generation of the last index commit
+for which the suggester data were created.
+Upon detecting that the data version does not match the index version, the suggester rebuilds its data by itself.
 
-\subsubsection{Data structures abstractions}
+\subsubsection{Data Structure Abstractions}
+The implementation defines interfaces for the data structures so that their implementations might change without the
+need to modify other parts of the Suggester module. The abstract data structures are:
+\begin{itemize}
+    \item \textbf{PopularityMap} – abstraction for the most popular completion.
+    \item \textbf{IntsHolder} – abstraction for holding a set of positions in a document used for phrase query evaluation.
+\end{itemize}
 
 \section{Testing}
 Unit tests\footnote{\url{https://en.wikipedia.org/wiki/Unit_testing}} were written for the Suggester module to test
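The version check described in the Detecting Index Version subsection of chap04.tex above can be sketched as follows. This is a minimal, dependency-free illustration, not the module's code: the class and method names are hypothetical, the file is assumed to hold a single decimal number, and the index generation is passed in as a parameter. With Lucene, the generation of the latest commit can be obtained, e.g., via SegmentInfos.readLatestCommit(directory).getGeneration().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public final class SuggesterVersionCheck {

    // File name taken from the suggester data layout described in the text.
    static final String VERSION_FILE = "version.txt";

    // Hypothetical helper: returns true if the suggester data in suggesterDir were built for a
    // different index generation, or if the version file is missing or unreadable.
    static boolean needsRebuild(Path suggesterDir, long indexGeneration) {
        Path versionFile = suggesterDir.resolve(VERSION_FILE);
        if (!Files.isRegularFile(versionFile)) {
            return true; // no data built yet
        }
        try {
            String content = new String(Files.readAllBytes(versionFile), StandardCharsets.UTF_8).trim();
            return Long.parseLong(content) != indexGeneration;
        } catch (IOException | NumberFormatException e) {
            return true; // corrupted or unreadable version file: rebuild
        }
    }

    // Hypothetical helper: records the index generation after the data were (re)built.
    static void writeVersion(Path suggesterDir, long indexGeneration) throws IOException {
        Files.createDirectories(suggesterDir);
        Files.write(suggesterDir.resolve(VERSION_FILE),
                Long.toString(indexGeneration).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("suggester");
        System.out.println(needsRebuild(dir, 5)); // true: no version file yet
        writeVersion(dir, 5);
        System.out.println(needsRebuild(dir, 5)); // false: generations match
        System.out.println(needsRebuild(dir, 6)); // true: the Indexer committed a newer generation
    }
}
```

Treating an unreadable file as a mismatch errs on the side of rebuilding, which matches the goal of never serving suggestions from stale data.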

diff --git a/text/thesis.tex b/text/thesis.tex
@@ -53,6 +53,7 @@
 
 % custom packages
 \usepackage{pgfplots} % plots
+\usepackage{dirtree}
 
 %%% Basic information on the thesis
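Returning to the Data Structures Abstractions subsection of chap04.tex above: the thesis names the PopularityMap abstraction but does not list its methods. The sketch below is therefore hypothetical throughout (method names and signatures are assumptions). It only illustrates the idea of hiding the storage behind an interface, with a throwaway in-memory implementation that also enforces the "negative values are not allowed" rule stated for increaseSearchCount. According to the directory listing above, the real data are persisted as Chronicle Map files per field.

```java
import java.util.HashMap;
import java.util.Map;

public class PopularityMapDemo {

    // Hypothetical shape of the PopularityMap abstraction; method names are assumptions.
    interface PopularityMap extends AutoCloseable {
        int get(String term);                    // search count of the term, 0 if unknown
        void increment(String term, int value);  // adds value; negative values are rejected
        @Override
        void close();                            // narrowed: throws no checked exception
    }

    // In-memory implementation for illustration only; not persisted to disk.
    static final class InMemoryPopularityMap implements PopularityMap {
        private final Map<String, Integer> counts = new HashMap<>();

        @Override
        public int get(String term) {
            return counts.getOrDefault(term, 0);
        }

        @Override
        public void increment(String term, int value) {
            if (value < 0) {
                throw new IllegalArgumentException("Negative value not allowed: " + value);
            }
            counts.merge(term, value, Integer::sum);
        }

        @Override
        public void close() {
            counts.clear();
        }
    }

    public static void main(String[] args) {
        try (PopularityMap map = new InMemoryPopularityMap()) {
            map.increment("main", 3);
            map.increment("main", 2);
            System.out.println(map.get("main"));    // 5
            System.out.println(map.get("unknown")); // 0
        }
    }
}
```

Code that depends only on the interface would keep working if the in-memory map were swapped for a persistent implementation, which is the modifiability argument the subsection makes.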