
rmacd/synology-lucene-client


synology-lucene-client

Summary

This is a REST service that allows the built-in Lucene++ index on Synology NAS instances to be queried remotely.

The output can then be consumed by a front end such as synology-lucene-client-ui, which lets you search for and retrieve files from the browser.

This is an alpha release. There are probably issues with threading and deadlocks. Caveat emptor.

Deep dive

Lucene file types: an intro

The Synology file index service creates a number of files under /<volume>/<share>/@eaDir/[email protected]. Inspecting the files shows that they are generated by Lucene++ (v3.0.7 on my system). A summary of these file types, taken from https://lucene.apache.org/core/2_9_4/fileformats.html, is given below:

| Name | Extension | Brief Description |
|---|---|---|
| Segments File | segments.gen, segments_N | Stores information about segments |
| Lock File | write.lock | The Write lock prevents multiple IndexWriters from writing to the same file. |
| Compound File | .cfs | An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles. |
| Fields | .fnm | Stores information about the fields |
| Field Index | .fdx | Contains pointers to field data |
| Field Data | .fdt | The stored fields for documents |
| Term Infos | .tis | Part of the term dictionary, stores term info |
| Term Info Index | .tii | The index into the Term Infos file |
| Frequencies | .frq | Contains the list of docs which contain each term along with frequency |
| Positions | .prx | Stores position information about where a term occurs in the index |
| Norms | .nrm | Encodes length and boost factors for docs and fields |
| Term Vector Index | .tvx | Stores offset into the document data file |
| Term Vector Documents | .tvd | Contains information about each document that has term vectors |
| Term Vector Fields | .tvf | The field level info about term vectors |
| Deleted Documents | .del | Info about what files are deleted |
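The table above can also be used to sanity-check an index directory copied off the NAS. Below is a minimal Python sketch (not part of this repository) that counts the files in a directory by Lucene file type. It assumes you point it at the directory that directly contains the index files, and it only knows the names and extensions listed above, so anything else is reported as "Unknown".

```python
import os
from collections import Counter

# Extension -> file type, transcribed from the Lucene 2.9.4 file-format table.
# Assumption: Lucene++ 3.0.7 indices use the same names; unlisted files map to "Unknown".
LUCENE_FILE_TYPES = {
    ".cfs": "Compound File",
    ".fnm": "Fields",
    ".fdx": "Field Index",
    ".fdt": "Field Data",
    ".tis": "Term Infos",
    ".tii": "Term Info Index",
    ".frq": "Frequencies",
    ".prx": "Positions",
    ".nrm": "Norms",
    ".tvx": "Term Vector Index",
    ".tvd": "Term Vector Documents",
    ".tvf": "Term Vector Fields",
    ".del": "Deleted Documents",
}


def classify(filename: str) -> str:
    """Return the Lucene file type for a single index file name."""
    # Segments and lock files are identified by name rather than by extension.
    if filename == "segments.gen" or filename.startswith("segments_"):
        return "Segments File"
    if filename == "write.lock":
        return "Lock File"
    _, ext = os.path.splitext(filename)
    return LUCENE_FILE_TYPES.get(ext, "Unknown")


def summarize_index(index_dir: str) -> Counter:
    """Count the regular files in index_dir by Lucene file type (subdirectories are ignored)."""
    return Counter(
        classify(name)
        for name in os.listdir(index_dir)
        if os.path.isfile(os.path.join(index_dir, name))
    )
```

For example, `summarize_index("/indices/dropbox")` run inside the container described under Deploying would show at a glance whether the index is stored as compound files (.cfs) or as separate per-component files.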

Luke 4.3.0 can be used to read the Lucene++ indices, which lets us extract the fields stored in the index. I've mapped them as follows (note the last column, where I've tried to describe what is stored in each field):

| Field name within Synology Lucene | Name within this library | Description / format |
|---|---|---|
| SYNODriveFileID | driveFileID | empty |
| SYNODriveFileLabel | driveFileLabel | empty |
| SYNODriveFileStar | driveFileStar | empty |
| SYNOMDAcquisitionMake | acquisitionMake | empty |
| SYNOMDAcquisitionModel | acquisitionModel | empty |
| SYNOMDAttributeChangeDate | attributeChangeDate | Unix epoch sec |
| SYNOMDAuthors | authors | eg Jane Doe |
| SYNOMDCity | city | empty |
| SYNOMDContentModificationDate | contentModificationDate | Unix epoch sec |
| SYNOMDContributors | contributors | empty |
| SYNOMDCopyright | copyright | empty |
| SYNOMDCountry | country | empty |
| SYNOMDCoverage | coverage | empty |
| SYNOMDCreator | creator | eg Microsoft Office 2010 |
| SYNOMDDateAdded | dateAdded | Unix epoch sec |
| SYNOMDDescription | description | empty |
| SYNOMDDisplayName | displayName | File name without path |
| SYNOMDDocInfo.SYNOMDPageLengthVector | docInfo | Character count per page eg 1280 1820 ... |
| SYNOMDExtension | extension | eg docx |
| SYNOMDFSContentChangeDate | fsContentChangeDate | Unix epoch sec |
| SYNOMDFSCreationDate | fsCreationDate | Unix epoch sec |
| SYNOMDFSName | fsName | File name without path |
| SYNOMDFSSize | fsSize | Size in bytes |
| SYNOMDFinderOpenDate | finderOpenDate | Unix epoch sec |
| SYNOMDHeadline | headline | empty |
| SYNOMDIdentifier | identifier | empty |
| SYNOMDIsDir | isDir | String y / n |
| SYNOMDKeywords | keywords | empty |
| SYNOMDKind | kind | eg docx |
| SYNOMDLanguages | languages | empty |
| SYNOMDLastUsedDate | lastUsedDate | Unix epoch sec |
| SYNOMDLogicalSize | logicalSize | Size in bytes |
| SYNOMDOwnerGroupID | ownerGroupID | Unix GID |
| SYNOMDOwnerUserID | ownerUserID | Unix UID |
| SYNOMDParent | parent | eg /volume1/sharename |
| SYNOMDPath | path | Full path to file eg /volume1/sharename/file.docx |
| SYNOMDPrivilege | privilege | Unix privs string eg rwxrwx--- |
| SYNOMDPublishers | publishers | empty |
| SYNOMDRights | rights | empty |
| SYNOMDSearchAncestor | searchAncestor | empty |
| SYNOMDSearchFileName | searchFileName | empty |
| SYNOMDTextContent | textContent | Full text of document |
| SYNOMDTitle | title | empty |
| SYNOMDWildcard | wildcard | empty |
| SYNOStateOrProvince | stateOrProvince | empty |
| _SYNOMDFinderLabel | sysFinderLabel | eg 0 |
| _SYNOMDGroupId | sysGroupId | Unix GID |
| _SYNOMDUserTags | sysUserTags | empty |
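To illustrate how the formats in the last column can be turned into typed values, here is a hedged Python sketch (not the library's own code) that renames a subset of the Synology fields and converts their values. It assumes a document's stored fields arrive as a dict of strings keyed by the Synology field name, that "y" in SYNOMDIsDir means "is a directory", and that epoch timestamps should be read as UTC; `decode_fields` and `FIELD_MAP` are hypothetical names.

```python
from datetime import datetime, timezone

# Hypothetical subset of the Synology -> library field-name mapping from the table.
FIELD_MAP = {
    "SYNOMDPath": "path",
    "SYNOMDFSName": "fsName",
    "SYNOMDFSSize": "fsSize",
    "SYNOMDIsDir": "isDir",
    "SYNOMDExtension": "extension",
    "SYNOMDContentModificationDate": "contentModificationDate",
    "SYNOMDOwnerUserID": "ownerUserID",
    "SYNOMDPrivilege": "privilege",
    "SYNOMDDocInfo.SYNOMDPageLengthVector": "docInfo",
}

EPOCH_FIELDS = {"contentModificationDate"}  # Unix epoch seconds
INT_FIELDS = {"fsSize", "ownerUserID"}      # bytes / Unix UID


def decode_fields(raw: dict[str, str]) -> dict:
    """Rename Synology fields and convert their string values.

    Assumption: every stored value is a string. Fields missing from
    FIELD_MAP, and fields with empty values, are dropped.
    """
    out = {}
    for syno_name, value in raw.items():
        name = FIELD_MAP.get(syno_name)
        if name is None or value == "":
            continue
        if name in EPOCH_FIELDS:
            # Interpreted as UTC; the index itself does not say which zone applies.
            out[name] = datetime.fromtimestamp(int(value), tz=timezone.utc)
        elif name in INT_FIELDS:
            out[name] = int(value)
        elif name == "isDir":
            out[name] = value == "y"
        elif name == "docInfo":
            # Space-separated character count per page, eg "1280 1820".
            out[name] = [int(n) for n in value.split()]
        else:
            out[name] = value
    return out
```

Dropping the many fields documented as "empty" keeps the decoded record small; extending `FIELD_MAP` with the remaining rows of the table is mechanical.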

Deploying

The REST API provides two endpoints: /search (for performing the search) and /get (for retrieving a document).
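A minimal Python client for /search might look like the sketch below. It is illustrative only and not part of this repository: `hostname` is a placeholder for your NAS, the `q` parameter is the one used in the curl example further down, and /get is left out because its parameters are not documented here.

```python
import json
import urllib.parse
import urllib.request

# Placeholder base URL; port 18080 matches the port exposed by the container.
BASE_URL = "http://hostname:18080"


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a /search URL, percent-encoding the query into the q parameter."""
    return f"{base_url}/search?{urllib.parse.urlencode({'q': query})}"


def search(query: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Call /search and decode the JSON response body."""
    with urllib.request.urlopen(search_url(query, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Using `urlencode` rather than string concatenation means multi-word queries such as "annual report" are sent as `q=annual+report` instead of producing a malformed URL.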

Overview

  • Runs in Docker
  • Exposes search interface on port 18080

Build locally

docker build . -t synosearch

Configure

To list the candidate share paths within volumeX (its top-level directories, excluding those whose names begin with "@"):

find /volumeX -maxdepth 1 -mindepth 1 -type d ! -name "@*"
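If you want to script the deployment for several shares, the same listing can be produced from Python. `list_shares` below is a hypothetical helper, not part of this project. It mirrors the find command by returning the direct subdirectories of the volume whose names do not start with "@", without following symlinks (as `find -type d` does by default). Unlike find, it sorts the result.

```python
import os


def list_shares(volume: str) -> list[str]:
    """Return top-level directories of a volume, skipping @-prefixed entries.

    Mirrors: find /volumeX -maxdepth 1 -mindepth 1 -type d ! -name "@*"
    The result is sorted by path; find itself returns entries unsorted.
    """
    return sorted(
        entry.path
        for entry in os.scandir(volume)
        if entry.is_dir(follow_symlinks=False) and not entry.name.startswith("@")
    )
```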

Important: do not deploy this version on any production system. The /get endpoint returns any document without permission checks, and no filtering is currently applied to the search.

To run and expose a share /volume1/dropbox, execute as follows:

nas$> docker run -p 18080:18080 \
  -v /volume1/dropbox:/volume1/dropbox:ro \
  -v /volume1/dropbox/@eaDir/[email protected]:/indices/dropbox:ro \
  synosearch -index /indices/dropbox

Then you should be able to hit the /search endpoint:

curl "http://hostname:18080/search?q=example" | jq '.'

This will then return search results as JSON:

{
  "hits": [
    {
      "path": "/volume1/dropbox/ExampleNotificationPlugin.java",
      "fs_size": 995,
      "score": 1.72329,
      "extension": "java"
    }
  ],
  "total_hits": 1
}
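On the client side, a response shaped like the one above can be decoded into typed records. This is a minimal Python sketch under the assumption that every hit carries the `path`, `fs_size` and `score` keys shown in the sample (`extension` is treated as optional). `Hit` and `parse_results` are hypothetical names, not part of the service.

```python
import json
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    fs_size: int
    score: float
    extension: str


def parse_results(body: str) -> tuple[int, list[Hit]]:
    """Decode a /search response into (total_hits, hits sorted by descending score)."""
    data = json.loads(body)
    hits = [
        Hit(h["path"], int(h["fs_size"]), float(h["score"]), h.get("extension", ""))
        for h in data.get("hits", [])
    ]
    hits.sort(key=lambda h: h.score, reverse=True)
    # Fall back to the number of returned hits if total_hits is absent.
    return int(data.get("total_hits", len(hits))), hits
```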

Then you can run the synology-lucene-client-ui on top of this to provide a friendly UI.
