Program documentation first iteration
ahornace committed Jul 15, 2018
1 parent 286d616 commit 5c55d74
Showing 3 changed files with 107 additions and 7 deletions.
8 changes: 6 additions & 2 deletions text/chap02.tex
@@ -1169,6 +1169,7 @@ \subsubsection{Most Popular Completion – Complex Queries}
As a result, the complex query is only dismantled into simple term queries which are used to update the popularity term counts.

\subsubsection{Increasing Search Counts for Specific Terms}
\label{increasing_search_counts}
Administrators might already have some search data stored. These data could be used to initialize or update the
popularity term counts.

@@ -1179,7 +1180,7 @@ \subsubsection{Increasing Search Counts for Specific Terms}
\item Target endpoint:
\begin{itemize}
\item Request method: POST
-\item Endpoint: /suggest/add
+\item Endpoint: /suggest/init/queries
\item Media type: application/json
\end{itemize}
\item Example data:
@@ -1192,7 +1193,7 @@ \subsubsection{Increasing Search Counts for Specific Terms}
\item Target endpoint:
\begin{itemize}
\item Request method: POST
-\item Endpoint: /suggest/add
+\item Endpoint: /suggest/init/raw
\item Media type: application/json
\end{itemize}
\item Example data:
@@ -1295,3 +1296,6 @@ \subsection{Tolerating Errors}
distance $k$ of $s$. $s$ is called a \textit{k-extension}. Then, for some prefix it is possible to consider its
k-extensions up to some constant $k$. The authors presented trie-based and q-gram-based algorithms. However, this would
require additional data structures and does not work for other types of queries. Therefore, it is considered a possible extension.

\subsection{Retrieving Popularity Data}
\label{retrieving_popularity_data}
105 changes: 100 additions & 5 deletions text/chap04.tex
@@ -6,7 +6,7 @@ \chapter{Program Documentation}
\section{Overview}
As mentioned, suggestions are retrieved by a REST API call made to the Web module. This module processes the data and
invokes the Suggester module to return the suggestions. A simplified diagram which shows the main interactions between the
-objects and modules can be seen in the figure \ref{programmer_sequence}.
+objects and modules can be seen in Figure \ref{programmer_sequence}.
\begin{figure}[htbp]
\centering
\includegraphics[width=145mm]{../img/programmer_sequence.pdf}
@@ -15,18 +15,113 @@ \section{Overview}
\end{figure}

\section{Web module}
Suggester support was added as a part of OpenGrok's already existing REST API. The \textit{SuggesterController}
class serves as a REST API endpoint for suggester-related queries. Other needed classes are located in the
\textit{org.opensolaris.opengrok.web.api.v1.suggester} package. Classes which are auto-detected by the Jersey
implementation are located in the \textit{org.opensolaris.opengrok.web.api.v1.suggester.provider} package and are annotated
with the \textit{@Provider}\footnote{\url{https://docs.oracle.com/javaee/7/api/javax/ws/rs/ext/Provider.html}} annotation.
The communication with the Suggester module is abstracted in the \textit{SuggesterService} interface and implemented in
the \textit{SuggesterServiceImpl} class. The implementation is injected into the classes which require the
\textit{SuggesterService} by means of the
\textit{@Inject}\footnote{\url{https://docs.oracle.com/javaee/7/api/javax/inject/Inject.html}} annotation. If a
component outside the REST API part of the application needs to access the \textit{SuggesterService},
it can retrieve it by invoking the \textit{getDefault()} method of the \textit{SuggesterServiceFactory} class.

\subsubsection{Configuration Change}
The suggester data structures need to be refreshed on configuration change.
Many basic OpenGrok properties might have changed, for instance:
\begin{itemize}
\item \textit{dataRoot} which specifies where OpenGrok stores the data.
\item Suggester configuration.
\item Projects.
\end{itemize}

It is not trivial to notice all the changes, and checking specific values for change would hurt modifiability.
Therefore, \textit{SuggesterServiceImpl} closes the underlying suggester data structures
and tries to reinitialize the suggester with the new configuration. If the configuration has not changed in a major
aspect (e.g. a different \textit{dataRoot} or newly added projects), then the Suggester module just reloads the data
from disk. Reloading from disk should be fast (from experience, less than $1$\,s).
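
The close-and-reinitialize approach described above can be illustrated with a minimal sketch. It is not the actual \textit{SuggesterServiceImpl} code; the classes \textit{SketchSuggester} and \textit{RefreshingService} and the method \textit{onConfigurationChange} are hypothetical names introduced only for this illustration, and the configuration is reduced to a single data root path.

```java
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the suggester; it only remembers whether it was closed. */
final class SketchSuggester implements AutoCloseable {
    final Path dataRoot;
    boolean closed;

    SketchSuggester(Path dataRoot) {
        this.dataRoot = dataRoot;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical service that refreshes the suggester on every configuration change. */
final class RefreshingService {
    private final ReentrantLock lock = new ReentrantLock();
    private SketchSuggester current;

    RefreshingService(Path dataRoot) {
        current = new SketchSuggester(dataRoot);
    }

    /**
     * Does not compare individual configuration values. The old instance is
     * closed and a new one is created from the new configuration.
     */
    void onConfigurationChange(Path newDataRoot) {
        lock.lock();
        try {
            current.close();
            current = new SketchSuggester(newDataRoot);
        } finally {
            lock.unlock();
        }
    }

    SketchSuggester current() {
        lock.lock();
        try {
            return current;
        } finally {
            lock.unlock();
        }
    }
}
```

The lock is an assumption of the sketch: it keeps a caller from observing a closed instance while the replacement is being created.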

\section{Suggester module}
The Suggester module is located under the \textit{suggester} directory of the root of the project. The main class which
exposes the suggester functionality is the \textit{Suggester} class. Its public API is described in \ref{public_api}.

\subsubsection{Public API}
\label{public_api}
The public API consists of the following methods:
\begin{itemize}
\item \textit{init(Collection\textless Path\textgreater)} – initializes all the data structures based on the paths
to the indexes.
\item \textit{remove(Iterable\textless String\textgreater)} – removes all the data structures stored for the
specified names.
\item \textit{search(List\textless NamedIndexReader\textgreater, SuggesterQuery, Query)} –
searches the suggester data and returns suggestions for the provided queries.
\item \textit{onSearch(Iterable\textless String\textgreater, Query)} –
signals that the query was searched and updates the data structures for the most popular completion.
\item \textit{increaseSearchCount(String, Term, int)} – increases the search count for the specified project and term by the
specified value. Negative values are not allowed.
\item \textit{getSearchCounts(String, String, int, int)} – returns popularity data for the specified project and field.
\item \textit{close()} – closes the suggester data structures and shuts down its other functionality.
\end{itemize}
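
The contract of \textit{increaseSearchCount} (counts only grow, negative values are rejected) can be sketched as follows. This is an in-memory illustration under stated assumptions, not the Suggester implementation: the class \textit{SearchCountsSketch} and the method \textit{getSearchCount} are hypothetical, and a plain \textit{String} stands in for the Lucene \textit{Term}.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the per-project popularity counts. */
final class SearchCountsSketch {
    // project name -> (term text -> search count)
    private final Map<String, Map<String, Integer>> counts = new ConcurrentHashMap<>();

    /** Adds value to the count of the term in the project; negative values are rejected. */
    void increaseSearchCount(String project, String term, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("Negative values are not allowed: " + value);
        }
        counts.computeIfAbsent(project, p -> new ConcurrentHashMap<>())
              .merge(term, value, Integer::sum);
    }

    /** Returns the stored count, or 0 if the project or term is unknown. */
    int getSearchCount(String project, String term) {
        return counts.getOrDefault(project, Collections.<String, Integer>emptyMap())
                     .getOrDefault(term, 0);
    }
}
```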

\subsubsection{Suggester Data}
The following shows the typical \textit{dataRoot} content for a multi-project OpenGrok setup along with the suggester data.
\dirtree{%
.1 \textit{dataRoot}\DTcomment{OpenGrok's data root specified in configuration}.
.2 historycache\DTcomment{history data for files}.
.3 project1\DTcomment{history data for files of \textit{project1}}.
.4 {…}.
.3 {…}.
.2 index\DTcomment{index data}.
.3 project1\DTcomment{index data for \textit{project1}}.
.4 {…}.
.3 {…}.
.2 statistics.json\DTcomment{stored statistics data}.
.2 suggester\DTcomment{suggester data}.
.3 project1\DTcomment{suggester data for \textit{project1}}.
.4 \{field\}\_map.cfg\DTcomment{stored Chronicle Map configuration}.
.4 \{field\}\_search\_count.db\DTcomment{stored Chronicle Map}.
.4 \{field\}.wfst\DTcomment{stored WFST data structure}.
.4 version.txt\DTcomment{version of the suggester data}.
.3 {…}.
.2 timestamp\DTcomment{timestamp of last indexing}.
.2 xref\DTcomment{pre-generated HTML files}.
.3 project1\DTcomment{pre-generated HTML files for \textit{project1}}.
.4 {…}.
.3 {…}.
}

-\subsubsection{Suggester data}
In the case of a single-project OpenGrok setup, there is no \textit{project1} directory as shown in the previous example;
the data are stored directly in the directories \textit{historycache, index, suggester} and \textit{xref}.

-\subsubsection{Detecting index version}
The \textit{\{field\}} represents the Lucene field for which the files contain the data, e.g. \textit{full}.
If some data are corrupted and the suggester is not able to read them, the best solution might be the following:
\begin{enumerate}
\item If possible, back up the popularity data as specified in \ref{retrieving_popularity_data}. (Not needed
if the popularity data are not important or are turned off.)
\item Stop the web application.
\item Remove either the corrupted data or the whole suggester directory.
\item Start the web application again.
\item Initialize the popularity data as specified in \ref{increasing_search_counts}. Some data modifications would
be needed; however, if the need arises, an automatic tool might be created for this purpose. (Not needed
if the popularity data are not important or are turned off.)
\end{enumerate}

-\subsubsection{Configuration change}
+\subsubsection{Detecting Index Version}
The Indexer might run even if the Web application is turned off. In that case, it would not be possible to notify the
suggester about the change. As a result, the suggester needs to detect the change by itself. The
aforementioned file \textit{version.txt} serves exactly this purpose. It contains a number which specifies the generation
of the last index commit for which the data were created.
Upon detecting that the data version does not match the index version, the suggester rebuilds its data by itself.
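
The version check described above can be sketched as follows. This is a simplified illustration, not the Suggester code: the class \textit{VersionCheck} and its methods are hypothetical, the index commit generation is passed in as a plain number instead of being read from Lucene, and it is an assumption of the sketch that \textit{version.txt} holds a single decimal number.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper comparing the stored data version with the index generation. */
final class VersionCheck {
    private static final String VERSION_FILE = "version.txt";

    /** Returns true if the version file is missing, not a number, or differs from the index. */
    static boolean needsRebuild(Path suggesterDir, long indexGeneration) throws IOException {
        Path versionFile = suggesterDir.resolve(VERSION_FILE);
        if (!Files.exists(versionFile)) {
            return true;
        }
        String content = new String(Files.readAllBytes(versionFile), StandardCharsets.UTF_8).trim();
        try {
            return Long.parseLong(content) != indexGeneration;
        } catch (NumberFormatException e) {
            // Unreadable version data is treated the same way as a mismatch.
            return true;
        }
    }

    /** Stores the generation for which the suggester data were built. */
    static void storeVersion(Path suggesterDir, long indexGeneration) throws IOException {
        Files.createDirectories(suggesterDir);
        Files.write(suggesterDir.resolve(VERSION_FILE),
                Long.toString(indexGeneration).getBytes(StandardCharsets.UTF_8));
    }
}
```

Treating a missing or unparsable file as a mismatch is a design choice of the sketch: it errs on the side of rebuilding rather than serving data of unknown origin.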

-\subsubsection{Data structures abstractions}
+\subsubsection{Data Structures Abstractions}
The implementation contains interfaces for the data structures, so their implementation might change without the need to
modify other parts of the Suggester module. Abstract data structures:
\begin{itemize}
\item \textbf{PopularityMap} – abstraction for the most popular completion data.
\item \textbf{IntsHolder} – abstraction for holding a set of positions in a document used for Phrase Query evaluation.
\end{itemize}
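
The benefit of such abstractions can be shown with a small sketch in the spirit of \textit{IntsHolder}. The interface \textit{PositionSet}, its two implementations and the helper \textit{PhraseCheck} are hypothetical and do not reproduce the actual interfaces of the Suggester module; they only illustrate that the calling code stays the same when the implementation is swapped.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical abstraction over a set of term positions in a document. */
interface PositionSet {
    void add(int position);

    boolean has(int position);
}

/** Implementation backed by a hash set; suitable for few, sparse positions. */
final class HashPositionSet implements PositionSet {
    private final Set<Integer> positions = new HashSet<>();

    @Override
    public void add(int position) {
        positions.add(position);
    }

    @Override
    public boolean has(int position) {
        return positions.contains(position);
    }
}

/** Implementation backed by a bit set; only non-negative positions can be stored. */
final class BitPositionSet implements PositionSet {
    private final BitSet bits = new BitSet();

    @Override
    public void add(int position) {
        bits.set(position);
    }

    @Override
    public boolean has(int position) {
        return position >= 0 && bits.get(position);
    }
}

/** Caller that depends only on the abstraction, not on a concrete implementation. */
final class PhraseCheck {
    /** Returns true if some position p of the first term is followed by the second term at p + 1. */
    static boolean adjacent(int[] firstTermPositions, PositionSet secondTermPositions) {
        for (int p : firstTermPositions) {
            if (secondTermPositions.has(p + 1)) {
                return true;
            }
        }
        return false;
    }
}
```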

\section{Testing}
Unit tests\footnote{\url{https://en.wikipedia.org/wiki/Unit_testing}} were written for the Suggester module to test
1 change: 1 addition & 0 deletions text/thesis.tex
@@ -53,6 +53,7 @@

% custom packages
\usepackage{pgfplots} % plots
\usepackage{dirtree}

%%% Basic information on the thesis
