Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snap improvements #4298

Open
wants to merge 25 commits into
base: main
Choose a base branch
from
Open

Conversation

brlin-tw
Copy link
Contributor

@brlin-tw brlin-tw commented Aug 9, 2024

This pull request introduces various improvements to the snap recipe, please review.

This patch mitigates risk of supply chain attack via tainted source
release packages(e.g. CVE-2024-3094).

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
This patch addresses the following Snapcraft linter warning:

```
- library: libopenjp2.so.7: unused library 'usr/lib/x86_64-linux-gnu/libo
penjp2.so.2.4.0'. (https://snapcraft.io/docs/linters-library)
```

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
* Reduce part pull time by limiting the history depth to clone
* Reduce part build time by disable components that we don't need.
* Reduce overall snap size by only shipping files that are needed in
 runtime.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
This is the current latest version.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
The application is still runnable due to some of the packages are
already staged by the Leptonica part, however it should still be
declared for clarity.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
This change reduces package size by 44%(73637888 -> 40894464).

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
Currently this feature isn't exposed to the host, dropping the
dependencies to reduce built package size.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
Previously this is implicitly configured as the build dependency isn't
available and the training tools build target isn't called.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
The zlib library is Leptonica's dependency, not Tesseract's.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
These should be Leptonica part's build dependencies.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
… tesseract and leptonica part

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
@stweil
Copy link
Contributor

stweil commented Aug 9, 2024

Maybe libcurl and libarchive could be added, too. Both add more features to tesseract.

@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 9, 2024

@stweil

Maybe libcurl and libarchive could be added, too. Both add more features to tesseract.

I have plans to add them as well, will do.

…ble snap publishing

This scriptlet allows stable builds to be always available for the
downstream snap publisher to promote to the `stable` channel on
the Snap Store.

Higher risk builds(e.g. beta/edge) will only occur when the lower risk
builds(rc/stable) are promoted.

Refer the following Snapcraft Forum topic for more info:

    Selective-checkout: Check out the tagged release revision if it isn't promoted to the stable channel - doc - snapcraft.io
    https://forum.snapcraft.io/t/selective-checkout-check-out-the-tagged-release-revision-if-it-isnt-promoted-to-the-stable-channel/10617

Here's a snippet of the scriptlet in action:

```
:: selective-checkout: INFO: Detecting stable releases...
:: selective-checkout: determine_stable_release_details: DEBUG: Last stable release tag determines to be "5.4.1".
:: selective-checkout: determine_stable_release_details: DEBUG: Last stable version determines to be "5.4.1".
:: selective-checkout: snap_query_version: DEBUG: Checking what snap revisions are available for the tesseract snap at the stable release channel...
:: selective-checkout: determine_stable_release_details: DEBUG: Last stable version on the snap store determines to be "".
:: selective-checkout: INFO: Detecting release candidate releases...
:: selective-checkout: DEBUG: Last release candidate release tag determines to be "5.4.0-rc2".
:: selective-checkout: DEBUG: Last release candidate version determines to be "5.4.0-rc2".
:: selective-checkout: snap_query_version: DEBUG: Checking what snap revisions are available for the tesseract snap at the candidate release channel...
:: selective-checkout: DEBUG: Last release candidate version on the snap store determines to be "".
:: selective-checkout: INFO: Detecting beta releases...
:: selective-checkout: DEBUG: Last beta release tag determines to be "5.0.0-beta-20210916".
:: selective-checkout: DEBUG: Last beta version determines to be "5.0.0-beta-20210916".
:: selective-checkout: snap_query_version: DEBUG: Checking what snap revisions are available for the tesseract snap at the beta release channel...
:: selective-checkout: DEBUG: Last beta version on the snap store determines to be "".
:: selective-checkout: determine_which_version_to_build: DEBUG: Stable version(5.4.1) is newer than the one on the Snap Store(none), stable release will be built.
:: selective-checkout: Info: The last tagged stable release(5.4.1) hasn't been promoted to the stable channel() on the Snap Store yet, checking out 5.4.1.
```

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 9, 2024

@stweil

Maybe libcurl and libarchive could be added, too. Both add more features to tesseract.

I have made the requested change, please review.

@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 9, 2024

I noticed that the snap would fail to launch due to link failure:

/snap/tesseract/x1/usr/local/bin/tesseract: error while loading shared libraries: libtesseract.so.5: cannot open shared object file: No such file or directory

Will push a fix soon.

Libraries under $SNAP/usr/local/lib isn't searched by the dynamic linker
by default.

Refer-to: Staging - Build and staging dependencies - doc - snapcraft.io <https://forum.snapcraft.io/t/build-and-staging-dependencies/11451>
Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
This patch addresses the following linter warnings:

```
- library: libicuio.so.70: unused library 'usr/lib/x86_64-linux-gnu/libicuio.so.70.1'. (https://snapcraft.io/docs/linters-library)
- library: libicutest.so.70: unused library 'usr/lib/x86_64-linux-gnu/libicutest.so.70.1'. (https://snapcraft.io/docs/linters-library)
- library: libicutu.so.70: unused library 'usr/lib/x86_64-linux-gnu/libicutu.so.70.1'. (https://snapcraft.io/docs/linters-library
- library: libicui18n.so.70: unused library 'usr/lib/x86_64-linux-gnu/libicui18n.so.70.1'. (https://snapcraft.io/docs/linters-library)
```

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
…act part

libcurl functionality requires network access.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
@egorpugin
Copy link
Contributor

Do we really need this selective-checkout file?

@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 9, 2024

I noticed that the snap would fail to launch due to link failure:

/snap/tesseract/x1/usr/local/bin/tesseract: error while loading shared libraries: libtesseract.so.5: cannot open shared object file: No such file or directory

This problem is fixed in the current branch tip.

I also fixed broken libcurl functionality due to missing network snapd security confinement interface connection, please review.

@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 9, 2024

@egorpugin

Do we really need this selective-checkout file?

This is required for the downstream snap publisher(currently me, but it can be anyone that has store collaborator access) to have stable(r) snap builds available to promote to the stable(r) channels on the Snap Store. Currently, the snap build is done by periodically(every 6 hours IIRC) importing this repository to the mirror repo on Launchpad, which has a snap package recipe configured to be built automatically whenever the content of the mirror changes. As the build trigger isn't based on release tags we cannot ensure that beta/rc/stable builds will be available to be promoted to the beta/candidate/stable channels on the store backstage unless the Snapcraft part pull logic is customized.

Here is an example of what the snap promote operation looks like:

Screenshot

It is fine to drop this file if the snap maintainer/collaborator has the Git repository's write access and is willing to manually set the release tag to build(e.g. via the source-tag property of the tesseract part) in the snapcraft.yaml as part of the release process.

@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 10, 2024

I found a problem that the Tesseract config files shipped within the snap are not loaded due to the lack of data directory fallback logic(the TESSDATA_PREFIX environment variable is set to $SNAP_USER_COMMON to allow the user to install trained data files, however the config files are installed in the readonly $SNAP/usr/local/share/tessdata directory).

tesseract ../eurotext.png - -l eng hocr
read_params_file: Can't open hocr
openat(AT_FDCWD, "/home/brlin/snap/tesseract/common/configs/hocr", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/brlin/snap/tesseract/common/tessconfigs/hocr", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "hocr", O_RDONLY)      = -1 ENOENT (No such file or directory)
write(2, "read_params_file: Can't open hoc"..., 34read_params_file: Can't open hocr
) = 34

Related code: CCUtil::main_setup function of tesseract/src/ccutil/ccutil.cpp at main · tesseract-ocr/tesseract.

I'll make a patch to handle the fallback in the meantime, but it may be desired to implement the fallback logic in the main code(User specified path -> TESSDATA_PREFIX -> hardcoded datadir during the build configuration process).

This patch fixes configuration files in the readonly snap filesystem
not loadable by Tesseract due to missing datadir fallback logic.

If the config file with the same name doesn't appear in the
$SNAP_USER_COMMON directory, the one in the $SNAP/usr/local/share/tessdata
directory will be loaded, allow Tesseract to work properly.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 10, 2024

I'll make a patch to handle the fallback in the meantime

A patch(bdccad) is implemented to workaround this issue, I can confirm read-only config files can be accessed normally. I don't really speak C++ so please do review.

@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 10, 2024

I noticed several new problems while researching the config loading issue:

  • Currently the snap hardcodes the TESSDATA_PREFIX environment variable to SNAP_USER_COMMON, we should avoid doing so to allow users to set the variable to any accessible, preferable path to have a consistent experience with the native distribution.
  • The TESSDATA_PREFIX hardcoded datadir macro isn't set to a valid value(should be $SNAP/usr/local/share instead of /usr/local/share, this isn't currently used as it is overridden by the hardcoded TESSDATA_PREFIX environment variable, but fixing it can simplify the aforementioned fallback config loading patch which hardcodes the entire path.)

I'll address these issues in the next couple of days.

…t variable

This patch addresses the following Snapcraft linter warnings:

```
CRAFT_ARCH_TRIPLET is deprecated, use CRAFT_ARCH_TRIPLET_BUILD_{ON|FOR}
```

Refer-to: core22 | Parts environment variables | Snapcraft documentation <https://snapcraft.io/docs/parts-environment-variables#heading--snapcraft-configuration-core22>
Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
@egorpugin
Copy link
Contributor

@brlin-tw Is it possible to make tesseract-snap repo and put all snap related files there?

@stweil What do you think?

@brlin-tw
Copy link
Contributor Author

@egorpugin

@brlin-tw Is it possible to make tesseract-snap repo and put all snap related files there?

It's definitely possible, though in this case, I'll have to figure out how to trigger an automatic snap build whenever the main repo has new changes. I'll look into it.

This patch fixes the incorrect compiled-in datadir path and as a
side-effect, simplifies the fallback config loading patch.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
This patch allows the user to store Tesseract data in custom paths
using the TESSDATA_PREFIX environment variable.

Previously it is hardcoded to $SNAP_USER_COMMON.

Signed-off-by: 林博仁(Buo-ren Lin) <[email protected]>
@brlin-tw
Copy link
Contributor Author

  • Currently the snap hardcodes the TESSDATA_PREFIX environment variable to SNAP_USER_COMMON, we should avoid doing so to allow users to set the variable to any accessible, preferable path to have a consistent experience with the native distribution.

  • The TESSDATA_PREFIX hardcoded datadir macro isn't set to a valid value(should be $SNAP/usr/local/share instead of /usr/local/share, this isn't currently used as it is overridden by the hardcoded TESSDATA_PREFIX environment variable, but fixing it can simplify the aforementioned fallback config loading patch which hardcodes the entire path.)

These issues are fixed in the current tip of the branch.

@brlin-tw
Copy link
Contributor Author

brlin-tw commented Aug 23, 2024

Hello, apologies for the ignorance. Here are some progress I'm doing at the moment:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants