This repository has been archived by the owner on Sep 4, 2023. It is now read-only.

Native messaging #105

Closed
wants to merge 87 commits into from
451ed6f
Select HTML nodes
jelmervdl Jan 25, 2022
9f652fd
The essential bit of getting HTML restore to work
jelmervdl Jan 26, 2022
267d24a
Bit better chunking heuristic
jelmervdl Jan 26, 2022
d25fc22
Quick hack to see which elements get send
jelmervdl Jan 26, 2022
598925f
Smaller translation batches
jelmervdl Jan 26, 2022
3ab0cbf
Patch to use testbench build of bergamot-translator
jelmervdl Jan 26, 2022
de45e9e
Try to set up automated build
jelmervdl Jan 26, 2022
59756c5
Attempt to fix build
jelmervdl Jan 26, 2022
78a8e56
Another attempt to fix build
jelmervdl Jan 26, 2022
05086b6
Try v9 again
jelmervdl Jan 26, 2022
2349d70
Remove an macOS-ism in favour of a bash-ism
jelmervdl Jan 27, 2022
2d8c0bb
More element-specific handling
jelmervdl Jan 27, 2022
d469580
Use shipped WASM binary
jelmervdl Jan 27, 2022
1d15003
Fix edge case where empty translation batch was submitted
jelmervdl Jan 27, 2022
c74a507
Add more elements that can be translated
jelmervdl Jan 27, 2022
c7384da
Run workflow under more conditions
jelmervdl Jan 27, 2022
1c9c49f
Use non-experimental language detection
jelmervdl Jan 28, 2022
2b75a74
Replace translation bar experimental UI with just a page popup
jelmervdl Jan 28, 2022
585cae5
Add signing & rename to avoid any name clashes
jelmervdl Jan 28, 2022
d2cca60
Did this break again?
jelmervdl Jan 28, 2022
88552bb
Only sign versioned builds
jelmervdl Jan 28, 2022
8635627
Fix version number adding
jelmervdl Jan 28, 2022
ac306ef
Merge remote-tracking branch 'mozilla/main'
jelmervdl Feb 1, 2022
8e9e844
Remove the telemetry bits
jelmervdl Feb 1, 2022
dbe700c
Rewrite TranslationWorker to run in background_script
jelmervdl Feb 2, 2022
8a52cce
Useful errors plz
jelmervdl Feb 2, 2022
29c1d5d
Revert back to using idle callback
jelmervdl Feb 3, 2022
3857f17
Add priorities to the queue
jelmervdl Feb 3, 2022
36b7440
Add some more performance benchmarking
jelmervdl Feb 3, 2022
ad957cf
Move the heavy lifting into a WebWorker
jelmervdl Feb 3, 2022
83bf025
Multiple workers!
jelmervdl Feb 3, 2022
af9165a
Document the code more
jelmervdl Feb 3, 2022
83a32c6
Fix mistake in tab pruning
jelmervdl Feb 4, 2022
bbca3f3
Rename TranslationWorker and TranslationHelper files to their class n…
jelmervdl Feb 4, 2022
049ff0d
Describe architecture of dev
jelmervdl Feb 4, 2022
37f88bb
Fully replace the UI with a meagre popup
jelmervdl Feb 4, 2022
f6fcafa
💩
jelmervdl Feb 4, 2022
53f7527
Hopefully fix pivoting
jelmervdl Feb 4, 2022
775a837
Remove unsupported files
jelmervdl Feb 4, 2022
89d67d3
I don't like linters
jelmervdl Feb 4, 2022
347f584
Rename mediator
jelmervdl Feb 4, 2022
e9d143d
Use bergamot-translator's updated WASM build pipeline
jelmervdl Feb 4, 2022
e50ca5c
Progress updates in the UI
jelmervdl Feb 4, 2022
79d791c
Be a little bit better at suggesting languages
jelmervdl Feb 4, 2022
691384b
Take suggestions of the page's language from the <html lang> attribute
jelmervdl Feb 4, 2022
b99be53
Add a "debug" toggle to the popup
jelmervdl Feb 4, 2022
c5b8645
Aggressive InPageTranslation debugging
jelmervdl Feb 4, 2022
01d0e00
Handle page navigation inside an existing tab better
jelmervdl Feb 4, 2022
959634a
Try more tag inclusion/exclusion
jelmervdl Feb 4, 2022
fe0beeb
Revert "Use bergamot-translator's updated WASM build pipeline"
jelmervdl Feb 4, 2022
334b01d
Write a bit about TranslationHelper
jelmervdl Feb 4, 2022
39bc7f5
Noticed some elements, like the image caption on wikipedia, not being…
jelmervdl Feb 5, 2022
a2d6ad5
Highlight elements that were but should not be translated
jelmervdl Feb 5, 2022
9e93897
Rename build
jelmervdl Feb 5, 2022
2ab000f
Forgot a pipe op
jelmervdl Feb 5, 2022
2935195
Bit more specific about what tags are actually translated while they …
jelmervdl Feb 5, 2022
75833d3
Remove unused variable
jelmervdl Feb 5, 2022
227f336
Restore test() function
jelmervdl Feb 5, 2022
0b7da85
Also reset counters when navigating to a new page
jelmervdl Feb 5, 2022
71eb606
Use the shared state to toggle translation
jelmervdl Feb 7, 2022
dd82c82
Remove the HTML allowlist in favour of just a HTML blocklist
jelmervdl Feb 7, 2022
1c5db0c
Allow long refs, fix regex
jelmervdl Feb 7, 2022
5da3f88
Remove console.log from lots of places and silence non-errors
jelmervdl Feb 7, 2022
402beae
How about we upload the signed file instead
jelmervdl Feb 7, 2022
a435ba0
Revert "Revert "Use bergamot-translator's updated WASM build pipeline""
jelmervdl Feb 7, 2022
911ecb7
Properly fix workflow
jelmervdl Feb 7, 2022
a27f824
Update WASM build
jelmervdl Feb 7, 2022
ad5c18c
Experiment with HTML reunification
jelmervdl Feb 8, 2022
46946d3
Try to fix mutation observer to also look out for text node changes
jelmervdl Feb 8, 2022
fa8ec20
Add notes to HTML merge strategy
jelmervdl Feb 8, 2022
db5afdc
Use WeakSet for translated node set
jelmervdl Feb 8, 2022
fdc61e9
I only use one build, so only build that one plz
jelmervdl Feb 8, 2022
ae537aa
Move ccache
jelmervdl Feb 8, 2022
ae33571
Update bergamot-translator to use unmerged HTML improvements
jelmervdl Feb 8, 2022
43401f0
Save built ccache into GitHub cache
jerinphilip Feb 8, 2022
3b2ef1d
Push latest development version
jelmervdl Feb 10, 2022
8dd9cbe
Merge pull request #1 from jerinphilip/dev
jelmervdl Feb 10, 2022
ce37664
Update manifest
jelmervdl Feb 10, 2022
9ac8076
Je suis un idiot
jelmervdl Feb 10, 2022
362d695
Fix bad_alloc in bergamot-translator
jelmervdl Feb 10, 2022
4b8a4d6
Update bergamot-translator to latest html improvements branch
jelmervdl Feb 11, 2022
f460c40
Fix case where HTML merge would leave old textnodes in the document
jelmervdl Feb 14, 2022
859c68a
Exclude more elements because they do not guarantee that `Element.inn…
jelmervdl Feb 14, 2022
049beae
Fix bug in exclusion based on `lang` attribute
jelmervdl Feb 14, 2022
349af3d
Update bergamot-translator
jelmervdl Feb 14, 2022
5316b30
Remove HTML from batching key
jelmervdl Feb 14, 2022
17d28c9
Initial implementation of calling out to translateLocally for transla…
jelmervdl Feb 15, 2022
7 changes: 7 additions & 0 deletions .github/deploy-tag.sh
@@ -0,0 +1,7 @@
#!/bin/bash
cd extension
mv manifest.json{,.bak}
< manifest.json.bak \
sed -r 's/"version": ".+"/"version": "'$(echo "$GITHUB_REF_NAME" | sed -r 's/^(.+\/)?v(.+)$/\2/')'"/' \
| sed -r 's/"version_name": ".+"/"version_name": "'"$GITHUB_SHA"'"/' \
> manifest.json
12 changes: 12 additions & 0 deletions .github/deploy-wasm.sh
@@ -0,0 +1,12 @@
#!/bin/bash
WASM_BASE=$1/bergamot-translator-worker
WASM_BIN=$WASM_BASE.wasm
WASM_JS=$WASM_BASE.js

{
echo -e "function loadEmscriptenGlueCode(Module) {\n"
sed -r 's/^(.+)$/ \1/g' < $WASM_JS | sed -e 's/[[:space:]]*$//'
echo -n -e "\n\n return { addOnPreMain, Module };\n}"
} > extension/controller/translation/bergamot-translator-worker.js

mv $WASM_BIN extension/controller/translation/bergamot-translator-worker.wasm
21 changes: 0 additions & 21 deletions .github/workflows/build_main.yml

This file was deleted.

19 changes: 0 additions & 19 deletions .github/workflows/build_pr.yml

This file was deleted.

146 changes: 139 additions & 7 deletions .github/workflows/publish_artifact.yml
@@ -1,26 +1,158 @@
name: Publish release on tag
name: Build release

on:
push:
branches:
- main
tags:
- "v*.*.*"
pull_request:
branches:
- main

env:
emsdk_version: 2.0.9 # For use in emscripten build
ccache_basedir: ${{ github.workspace }}
ccache_dir: "${{ github.workspace }}/.ccache"
ccache_compilercheck: content
ccache_compress: 'true'
ccache_compresslevel: 9
ccache_maxsize: 200M
ccache_cmake: -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache

jobs:
build:
build-wasm:
name: "emscripten"
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
with:
submodules: recursive
- name: Set ccache environment for emcc
run: |
# We are hardcoding this to mtime instead of env pickup. Rest use content.
echo "CCACHE_COMPILER_CHECK=mtime" >> $GITHUB_ENV
echo "CCACHE_BASEDIR=${{ env.ccache_basedir }}" >> $GITHUB_ENV
echo "CCACHE_COMPRESS=${{ env.ccache_compress }}" >> $GITHUB_ENV
echo "CCACHE_COMPRESSLEVEL=${{ env.ccache_compresslevel }}" >> $GITHUB_ENV
echo "CCACHE_DIR=${{ env.ccache_dir }}" >> $GITHUB_ENV
echo "CCACHE_MAXSIZE=${{ env.ccache_maxsize }}" >> $GITHUB_ENV
# https://emscripten.org/docs/compiling/Building-Projects.html#using-a-compiler-wrapper
echo "EM_COMPILER_WRAPPER=ccache" >> $GITHUB_ENV
- name: Generate ccache_vars for ccache based on machine
shell: bash
id: ccache_vars
run: |-
echo "::set-output name=hash::$(echo ${{ env.ccache_compilercheck }})"
echo "::set-output name=timestamp::$(date '+%Y-%m-%dT%H.%M.%S')"
# This needs to be run before setup, so ccache build caching doesn't complain.
- name: Obtain emsdk sources
run: |
git clone --depth 1 https://github.com/emscripten-core/emsdk.git
- name: Cache-op for build-cache through ccache
uses: actions/cache@v2
with:
path: |
${{ env.ccache_dir }}
${{ github.workspace }}/emsdk/ccache/git-emscripten_64bit/
key: ccache-${{ github.job }}-${{ env.emsdk_version }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}-${{ steps.ccache_vars.outputs.timestamp }}
restore-keys: |-
ccache-${{ github.job }}-${{ env.emsdk_version }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}
ccache-${{ github.job }}-${{ env.emsdk_version }}-${{ steps.ccache_vars.outputs.hash }}
ccache-${{ github.job }}-${{ env.emsdk_version }}
- name: Setup Emscripten toolchain
run: |
(cd emsdk && ./emsdk install ${{ env.emsdk_version }} ccache-git-emscripten-64bit)
(cd emsdk && ./emsdk activate ${{ env.emsdk_version }} ccache-git-emscripten-64bit)
# mtime of this file is checked by ccache, we set it to avoid cache misses.
touch -m -d '1 Jan 2021 12:00' emsdk/.emscripten
# These need to be done in the activated shell.
eval $(./emsdk/emsdk construct_env \
| sed 's/export PATH=\(.*\);/echo \1 >> $GITHUB_PATH;/' \
| sed 's/export \(.*\);/echo \1 >> $GITHUB_ENV;/' );
# This looks more permanent than version pinned, so keeping temporarily to avoid failures.
echo "${{ github.workspace }}/emsdk/ccache/git-emscripten_64bit/bin" >> $GITHUB_PATH
- name: Verify Emscripten setup
run: |
emcc --version
emcmake cmake --version
emmake make --version
- name: ccache prolog
run: |-
ccache -s # Print current cache stats
ccache -z # Zero cache entry
# WORMHOLE=off
- name: "Configure builds for WORMHOLE=off"
working-directory: bergamot-translator
run: |
mkdir -p build-wasm-without-wormhole
cd build-wasm-without-wormhole
emcmake cmake -DCOMPILE_WASM=on -DWORMHOLE=off ..
- name: "Compile with WORMHOLE=off"
working-directory: bergamot-translator/build-wasm-without-wormhole
run: |
emmake make -j2
- name: ccache epilog
run: |
ccache -s # Print current cache stats
- name: Import GEMM library from a separate wasm module
working-directory: bergamot-translator/build-wasm-without-wormhole
run: bash ../wasm/patch-artifacts-import-gemm-module.sh
- name: Upload wasm artifact
uses: actions/upload-artifact@v2
with:
name: wasm-artefacts
if-no-files-found: error
path: |
# Without wormhole
${{github.workspace}}/bergamot-translator/build-wasm-without-wormhole/bergamot-translator-worker.js
${{github.workspace}}/bergamot-translator/build-wasm-without-wormhole/bergamot-translator-worker.wasm

build-extension:
runs-on: ubuntu-latest
needs:
- build-wasm
steps:
- uses: actions/checkout@v2
- name: Use Node.js
uses: actions/setup-node@v1
with:
node-version: '17.x'
- name: Download wasm build
uses: actions/download-artifact@v2
- name: Import wasm build
run: bash .github/deploy-wasm.sh wasm-artefacts
- name: Write version number
if: startsWith(github.ref, 'refs/tags/v')
run: bash .github/deploy-tag.sh
- name: Install dependencies
run: npm install
- name: Run linter
run: npm run lint:js
continue-on-error: true
- name: Build
run: npm run build
run: npm run build
- name: Update GitHub prerelease
uses: marvinpinto/action-automatic-releases@latest
with:
repo_token: ${{ secrets.GITHUB_TOKEN }}
automatic_release_tag: development-latest
prerelease: true
title: "Latest Development Build"
files: |
./web-ext-artifacts/bergamot_translations.xpi
- name: Sign
if: startsWith(github.ref, 'refs/tags/v')
env:
WEB_EXT_API_KEY: ${{ secrets.WEB_EXT_API_KEY }}
WEB_EXT_API_SECRET: ${{ secrets.WEB_EXT_API_SECRET }}
run: |
npm run sign
mv web-ext-artifacts/experimental_bergamot_translations-*.xpi web-ext-artifacts/bergamot_translations-signed.xpi
- name: Create Release
id: create_release
if: startsWith(github.ref, 'refs/tags/v')
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -29,13 +161,13 @@ jobs:
release_name: Release ${{ github.ref }}
draft: false
prerelease: true
- name: Upload Release Asset
id: upload-release-asset
- name: Upload extension
if: startsWith(github.ref, 'refs/tags/v')
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ steps.create_release.outputs.upload_url }} # This pulls from the CREATE RELEASE step above, referencing its ID to get its outputs object, which includes an `upload_url`. See this blog post for more info: https://jasonet.co/posts/new-features-of-github-actions/#passing-data-to-future-steps
asset_path: ./web-ext-artifacts/firefox_translations.xpi
asset_name: firefox_translations.xpi
asset_path: ./web-ext-artifacts/bergamot_translations-signed.xpi
asset_name: bergamot_translations.xpi
asset_content_type: application/zip
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,2 +1,4 @@
node_modules
web-ext-artifacts
web-ext-artifacts
extension/controller/translation/bergamot-translator-worker.wasm
extension/controller/translation/bergamot-translator-worker.js
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "bergamot-translator"]
path = bergamot-translator
url = https://github.com/browsermt/bergamot-translator.git
127 changes: 127 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,127 @@
# Components

## content-script
Entrypoint: contentScript.js

Responsibilities:
- Text sample extraction
- In-page translation processing

## page-action popup
Entrypoint: static/popup.html

Responsibilities:
- UI to trigger translation
- UI to update on translation progress

Notes:
- Does not keep state: the popup resets every time it closes or loses focus.

## background-script
Entrypoint: controller/backgroundScript.js

Responsibilities:
- Maintaining the translation state per tab
- Maintain one instance of the translation engine and manage its queues
- Manage downloading & loading of translation models on demand
- Language detection on given samples

Tab state:
```
{
id: Number, // tab id
from: String, // BCP47
to: String, // BCP47
models: [
{
from: String, // BCP47
to: String // BCP47
}
], // All available from->to pairs, sorted by relevance
frames: {
[Number]: Port
}
}
```
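
As a minimal sketch of how such a per-tab record might be created and filled in by the background-script. The helper names `createTabState` and `registerFrame` are hypothetical and not taken from the extension, and a plain object stands in for the `Port`:

```javascript
// Hypothetical helpers; the field names follow the tab state listed above.
function createTabState(id) {
  return {
    id,          // tab id
    from: null,  // BCP47 source language, unknown until detection has run
    to: null,    // BCP47 target language
    models: [],  // available {from, to} pairs, sorted by relevance
    frames: {},  // frame id -> Port of the content-script in that frame
  };
}

// Remember the port of a frame so it can be messaged later.
function registerFrame(tabState, frameId, port) {
  tabState.frames[frameId] = port;
  return tabState;
}

// A plain object stands in for a real runtime Port here.
const tab = registerFrame(createTabState(42), 0, { name: "content-script" });
console.log(Object.keys(tab.frames).length); // 1
```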

# States
This is a list of all possible states a page (or the mediator) can be in.

Todo: Not covered is a way to return to the untranslated page. That might be tricky to implement if we're swapping DOM nodes and all that magic. For now I'll assume the user can just reload the page to get the untranslated page back.

## PAGE_LOADING
Page is still in a loading state. Initial state.

Next state:
- content-script will sample some text from the page once it has loaded and switch states to PAGE_LOADED.

## PAGE_LOADED
Page is loaded.

Actions:
- content-script will connect with background-script
- content-script will send sample to background-script

Next state:
- background-script will switch to TRANSLATION_AVAILABLE or TRANSLATION_NOT_AVAILABLE based on the languages detected in the sample and available translation models.
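
The decision described above could look roughly like the sketch below. It assumes a direct model from the detected language to the navigator's language is required (pivoting through a third language is ignored), and the `decideState` name and the "relevance" ordering are illustrative assumptions, not the extension's code:

```javascript
// Hypothetical sketch: `detected` is the language found in the sample,
// `target` the navigator's language, `models` the known {from, to} pairs.
function decideState(detected, target, models) {
  const direct = models.some((m) => m.from === detected && m.to === target);
  if (!direct) {
    return { state: "TRANSLATION_NOT_AVAILABLE" };
  }
  // Assumed notion of "relevance": pairs sharing the detected source or the
  // target language move to the front; ties keep their original order.
  const score = (m) => Number(m.from === detected) + Number(m.to === target);
  const sorted = [...models].sort((a, b) => score(b) - score(a));
  return { state: "TRANSLATION_AVAILABLE", from: detected, to: target, models: sorted };
}

const models = [
  { from: "de", to: "en" },
  { from: "es", to: "en" },
];
console.log(decideState("es", "en", models).state); // TRANSLATION_AVAILABLE
console.log(decideState("fr", "en", models).state); // TRANSLATION_NOT_AVAILABLE
```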

## TRANSLATION_AVAILABLE
The language detector detected the page's language, and a translation model for it is available.

Fields: {
from: String, // BCP47, detected from language
to: String, // BCP47, navigator target language
models: [
{
from: String, // BCP47
to: String // BCP47
}
] // All available from->to pairs, sorted by relevance
}

Actions:
- background-script tells page action icon to show

Next state:
- page-action popup can be opened to select language and trigger translation, which will switch to TRANSLATION_IN_PROGRESS

## TRANSLATION_NOT_AVAILABLE
There is no translation model available to translate this page from the detected language to the navigator's language.

Next state:
- None

## TRANSLATION_IN_PROGRESS

Actions:
- popup will show a progress bar based on chunks-queued and chunks-total. When chunks-queued is 0, it will show "done" (or something similar).
- content-script will start submitting chunks of HTML or text to background-script for translation
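
A small sketch of how the popup might turn the two counters into a label. The function name `progressLabel` and the exact wording of the label are assumptions:

```javascript
// Hypothetical helper: derive the popup's progress label from the number of
// chunks still queued and the total number of chunks submitted so far.
function progressLabel(chunksQueued, chunksTotal) {
  if (chunksQueued === 0) {
    return "done"; // nothing left in the queue (also covers chunksTotal === 0)
  }
  const finished = chunksTotal - chunksQueued;
  return `${Math.round((finished / chunksTotal) * 100)}%`;
}

console.log(progressLabel(25, 100)); // 75%
console.log(progressLabel(0, 100));  // done
```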

Next state:
- background-script will switch to TRANSLATION_IN_PROGRESS (but with updated fields) while it is handling translation requests
- background-script can switch to TRANSLATION_ERROR when an (irrecoverable) error occurs during translation
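
The transitions listed in this section can be summarised as a small lookup table. This is only a sketch of the described behaviour: the `TRANSITIONS` table and `transition` helper are hypothetical, and TRANSLATION_ERROR is included even though it is marked as not used yet:

```javascript
// Allowed next states for each state, as described in this section.
const TRANSITIONS = {
  PAGE_LOADING: ["PAGE_LOADED"],
  PAGE_LOADED: ["TRANSLATION_AVAILABLE", "TRANSLATION_NOT_AVAILABLE"],
  TRANSLATION_AVAILABLE: ["TRANSLATION_IN_PROGRESS"],
  TRANSLATION_NOT_AVAILABLE: [],
  TRANSLATION_IN_PROGRESS: ["TRANSLATION_IN_PROGRESS", "TRANSLATION_ERROR"],
  TRANSLATION_ERROR: [],
};

// Return a new state object, or throw if the transition is not listed.
function transition(current, next, fields = {}) {
  const allowed = TRANSITIONS[current.state] || [];
  if (!allowed.includes(next)) {
    throw new Error(`Invalid transition ${current.state} -> ${next}`);
  }
  return { ...current, ...fields, state: next };
}

let page = { state: "PAGE_LOADING" };
page = transition(page, "PAGE_LOADED");
page = transition(page, "TRANSLATION_AVAILABLE", { from: "de", to: "en" });
console.log(page.state); // TRANSLATION_AVAILABLE
```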

## Not used: TRANSLATION_ERROR

## Not used: TRANSLATION_FINISHED

# TranslationHelper & Co

Translations are done by Marian, wrapped in bergamot-translator, which is compiled to WASM. bergamot-translator exposes a `BlockingService` which you can give a bunch of sentences and a translation model, and it will start crunching.

A couple of things to take into account:

- BlockingService blocks while crunching, so it needs to be put in a WebWorker to avoid blocking the current thread.
- BlockingService needs a bunch of text to translate to reach peak performance. If you give it too few chunks at a time, it will run slowly.
- It does not do threads (in WASM at least, yet?)

TranslationHelper hides those limitations behind a nice interface:

- It provides a simple `translate()` method that you give some text, a source and target language, and it will give you a promise of a translation back.
- It has the option to cancel and prioritize translation requests. You can use these features to optimise for latency like TranslateLocally does!
- It does the grouping of translation requests for you
- It handles downloading translation model data, initialisation, etc. You just need to worry about calling `translate()` and waiting a bit.
- It can use multiple TranslationWorkers in parallel (threads!)
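
To illustrate the queueing idea (promises, priorities, cancellation and grouping), here is a minimal sketch. The `TranslationQueue` class, its method signatures and the "lower number is more urgent" convention are all assumptions made for illustration and not TranslationHelper's actual API. A synchronous callback stands in for the engine:

```javascript
// Hypothetical sketch of the queueing idea; not TranslationHelper's code.
class TranslationQueue {
  constructor(translateBatch) {
    this.translateBatch = translateBatch; // (from, to, texts) => translations
    this.pending = [];
  }

  // Queue a request; lower `priority` numbers are handled first (assumption).
  translate(text, from, to, priority = 0) {
    return new Promise((resolve, reject) => {
      this.pending.push({ text, from, to, priority, resolve, reject });
    });
  }

  // Drop queued requests matching a predicate, rejecting their promises.
  cancel(predicate) {
    const cancelled = this.pending.filter(predicate);
    this.pending = this.pending.filter((request) => !predicate(request));
    cancelled.forEach((request) => request.reject(new Error("cancelled")));
  }

  // Take the most urgent request plus everything sharing its language pair
  // and hand them to the engine as one batch. Returns the batch size.
  flush() {
    if (this.pending.length === 0) return 0;
    this.pending.sort((a, b) => a.priority - b.priority);
    const { from, to } = this.pending[0];
    const batch = this.pending.filter((r) => r.from === from && r.to === to);
    this.pending = this.pending.filter((r) => !batch.includes(r));
    const results = this.translateBatch(from, to, batch.map((r) => r.text));
    batch.forEach((request, i) => request.resolve(results[i]));
    return batch.length;
  }
}

// Stand-in engine that only tags the text with the language pair.
const queue = new TranslationQueue((from, to, texts) =>
  texts.map((text) => `[${from}>${to}] ${text}`));
queue.translate("Hallo", "de", "en", 1).then(console.log);
queue.translate("Welt", "de", "en", 0).then(console.log);
queue.flush(); // both requests go to the engine as a single batch
```

Grouping per language pair keeps each batch usable with a single model, while sorting by priority first lets urgent requests (for example, text currently in view) jump the queue.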

TranslationWorker provides an interface for TranslationHelper to work with the bergamot-translator WASM binary through a message passing interface. It translates the weird emscripten pointer types into native JavaScript types that can be shared through message passing, and takes care of loading and initialisation of the bergamot-translator-native types.
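
A message passing interface like this usually needs a way to match replies to requests. The sketch below shows one common approach using request ids. The `WorkerProxy` class and the `{id, name, args}` / `{id, result, error}` message shapes are assumptions, not the extension's real protocol, and a plain callback stands in for `Worker.postMessage` so the example runs on its own:

```javascript
// Hypothetical sketch of request/response matching over message passing.
class WorkerProxy {
  constructor(postMessage) {
    this.postMessage = postMessage; // stands in for worker.postMessage
    this.nextId = 0;
    this.pending = new Map(); // request id -> {resolve, reject}
  }

  // Send a call to the worker and return a promise for its reply.
  call(name, args) {
    const id = ++this.nextId;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.postMessage({ id, name, args });
    });
  }

  // Feed every message coming back from the worker into this method.
  onMessage({ id, result, error }) {
    const request = this.pending.get(id);
    if (!request) return; // unknown or already settled request
    this.pending.delete(id);
    if (error !== undefined) request.reject(new Error(error));
    else request.resolve(result);
  }
}

// Fake worker that answers immediately, only so the sketch is self-contained.
const proxy = new WorkerProxy((message) => {
  proxy.onMessage({ id: message.id, result: message.args.text.toUpperCase() });
});
proxy.call("translate", { text: "hallo" }).then(console.log); // HALLO
```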
