Setting HTML options towards ease of usage #270

jerinphilip · 2021-11-30T18:10:02Z

So, there are two options for alignment which HTML is dependent on - the first which decides if a (Marian) model is alignment capable or not decided at model-construction, the second which decides if Response - a bergamot primitive should deal with alignments or not effects of which appear at runtime on a translate call.

It is safe to assume Mozilla always will need models capable of HTML translation (a few plain-text without HTML=false options).
What alignment: soft serves is the purpose of a switch to save some computation on the construction of Alignments. This in Marian is managed by the configuration. Currently, firefox through bindings also manages this through configuration.
a. Are there clients who would like to retain an alignment: none setting? Perhaps translateLocally? If not we might default to this always being true and incur any additional computation costs (Updated configuration for html text translation to work in wasm test page #269 (comment)).
We can read the configuration at runtime at the translate(...) call and check if it is capable of providing alignments. This can be used in throwing an error or abort, essentially telling the client "you tried to translate HTML without giving me alignments". We cannot write (set/unset, because we also support a multithreaded setting and the model config is shared among multiple translation requests).
@jelmervdl can have HTML pipeline cry if no alignments. Currently, it provides an illusion that things work if no alignment matrix, we should consider this a bug, given we have something superior to length-based tag transfer which @kpu originally requested.
ResponseOptions is a plain-old-datatype that makes things easy for bindings(?). We can change it to a class with constraints on construction (HTML => alignment true always).
We can at call time edit ResponseOptions.alignment = true if ResponseOptions.HTML = true as we get it with the translation-request. There will be some duplication for a few upcoming PRs, but this is OK.

6 seems to be the easiest from a Mozilla UX point of view, w.r.t not bothering about a parameter named alignment when all they want is HTML translation.

2a will improve the at model construction UX.

Requesting stakeholders to comment, offer any alternate suggestions.

The text was updated successfully, but these errors were encountered:

jerinphilip · 2021-11-30T18:51:01Z

There is also a very specific solution for 2a. The string-based constructor is only used by WebAssembly. We can keep a function that always sets the return from parseOptionsFromString to have alignment: soft.

bergamot-translator/src/translator/translation_model.h

Lines 49 to 50 in e8fd01e

    
           TranslationModel(const std::string& config, MemoryBundle&& memory, size_t replicas = 1) 
        
               : TranslationModel(parseOptionsFromString(config, /*validate=*/false), std::move(memory), replicas){};

abhi-agg · 2021-12-01T10:29:44Z

Adding the rest of the stakeholders to be able to comment/offer suggestions @andrenatal @XapaJIaMnu

It is safe to assume Mozilla always will need models capable of HTML translation

Right on that. Firefox will always call bergamot-translator to perform html-text translation.

2. What alignment: soft serves is the purpose of a switch to save some computation on the construction of Alignments. This in Marian is managed by the configuration. Currently, firefox through bindings also manages this through configuration.
a. Are there clients who would like to retain an alignment: none setting? Perhaps translateLocally? If not we might default to this always being true and incur any additional computation costs (

For Firefox, we don't mind setting alignment: soft explicitly if we have to (since it seems to have speed penalty, it is fine to not set alignment: soft as default and expect users to set this during translator construction). We don't have any strong preference here 👍🏾

3. We can read the configuration at runtime at the translate(...) call and check if it is capable of providing alignments. This can be used in throwing an error or abort, essentially telling the client "you tried to translate HTML without giving me alignments". We cannot write (set/unset, because we also support a multithreaded setting and the model config is shared among multiple translation requests).

Passing ResponseOptions::HTML = true in ResponseOptions of the translate() request for html text translation can throw error if alignment: soft wasn't provided during translator construction. This is what you imply here. Right?

5. ResponseOptions is a plain-old-datatype that makes things easy for bindings(?). We can change it to a class with constraints on construction (HTML => alignment true always).
6. We can at call time edit ResponseOptions.alignment = true if ResponseOptions.HTML = true as we get it with the translation-request. There will be some duplication for a few upcoming PRs, but this is OK.

As I mentioned in #266 (comment), we are misinterpreting the ResponseOptions::alignment option and mixing it somehow with ResponseOptions::HTML (correct me if I am wrong here).

ResponseOptions::alignment: The purpose (when bergamot-translator was doing only plain-text translation) of this option was to return the alignments b/w source and target tokens in the Response so that the users can re-insert html tags themselves.

ResponseOptions::HTML : An option to enable html text translation (introduced recently)

Now that bergamot-translator supports html text translation, we can very well argue whether ResponseOptions::alignment should even exist or not. Some might want to do plain-text translation and do the tag re-insertion themselves and therefore, it might be fine to still keep this option. We don't have strong preference here.

But, please let's not associate/mix ResponseOptions::HTML with ResponseOptions::alignment option as purposes of both are different as far as the API is concerned. (I can imagine that setting anyone of ResponseOptions::HTML or ResponseOptions::alignment to true will require alignment: soft internally)

Please let me know if I missed something here 👍🏾

jerinphilip · 2021-12-01T13:43:37Z

@abhi-agg: Thank you for providing the preferences. I'll try to answer quoting a few queries below.

Passing ResponseOptions::HTML = true in ResponseOptions of the translate() request for HTML text translation can throw error if alignment: soft wasn't provided during translator construction. This is what you imply here. Right?

Yes.

As I mentioned in #266 (comment), we are misinterpreting the ResponseOptions::alignment option and mixing it somehow with ResponseOptions::HTML

I don't fully understand the above, but let me try to add more context which I hope will make things clear. HTML is dependent on alignments. Right now we allow both (HTML, alignments) to be set in a way by the client that is not consistent (HTML=true, alignments=false) and there is no means of recovery - which is the problem. I can sanitize ResponseOptions as demonstrated in 65a14cf (#271), which could be a solution. Options conditioned on other options in full generality is a lot of additional code, I think.

Now that bergamot-translator supports HTML text translation, we can very well argue whether ResponseOptions::alignment should even exist or not.

I believe bergamot-translator as a machine-translator library adds some value other than just an intermediary between Marian-dev and extension. Bergamot's non-HTML pipeline translation capabilities are still intact. There are two more clients which use these now at least - translateLocally and lemonade. There are more applications to alignment for clients who may need it. One quick example is perhaps building something like https://twitter.com/marian_nmt/status/1465710492752646151. I understand translateLocally uses only alignments and not HTML translate at the moment. For an idea of explainable translation to a firefox user, we will be using alignments in a similar setting to show which word mapped to which - if that feature ever comes to implementation.

But, please let's not associate/mix ResponseOptions::HTML with ResponseOptions::alignment option as purposes of both are different as far as the API is concerned.

The intentions resonate with me. But allow me to defer perfecting this until #236 which now needs to support HTML as well gets in. I understand from the current state of main that you've accepted the tradeoff off having two extra key-values over perfect API UX. #271 is something that I expect will not cause delays for me in #236 and follow-up synchronization if you want something urgently. I believe it's best to set the sanitization or defaults once we know how HTML gets moved around for purposes of #236.

kpu · 2021-12-01T13:53:09Z

Kill hard alignments.

jerinphilip · 2022-03-07T23:03:34Z

#296 fixed having both alignments and HTML on. The config UX changes are too cumbersome and I think there's signal enough now (from aborts provided from HTML pipeline):

bergamot-translator/src/translator/html.cpp

Lines 518 to 519 in ab7f84f

    
           ABORT_UNLESS(hasAlignments(response), 
        
                        "Response object does not contain alignments. TranslationModel or ResponseOptions is misconfigured?");

Other bergamot-devs also appear to enjoy the config tweakability - from #361 (comment):

I'm pretty happy with having a std::string (or the Yaml datastructure that marian uses) as the configuration. That way bergamot-translator doesn't need code changes to be synced with every update to marian.

Closing this as resolved.

jerinphilip changed the title ~~Setting HTML options to ease of UX~~ Setting HTML options towards ease of usage Nov 30, 2021

jerinphilip mentioned this issue Nov 30, 2021

Defaults for TranslationModel and sanitize for ResponseOptions #271

Closed

abhi-agg mentioned this issue Dec 1, 2021

Updated configuration for html text translation to work in wasm test page #269

Merged

jerinphilip mentioned this issue Dec 1, 2021

Better command-line with isolation for both Services and co-located defaults and parsing #252

Merged

jerinphilip mentioned this issue Jan 2, 2022

Have alignments placed if HTML is on #296

Merged

jerinphilip closed this as completed Mar 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Setting HTML options towards ease of usage #270

Setting HTML options towards ease of usage #270

jerinphilip commented Nov 30, 2021 •

edited

Loading

jerinphilip commented Nov 30, 2021 •

edited

Loading

abhi-agg commented Dec 1, 2021

jerinphilip commented Dec 1, 2021

kpu commented Dec 1, 2021

jerinphilip commented Mar 7, 2022 •

edited

Loading

Setting HTML options towards ease of usage #270

Setting HTML options towards ease of usage #270

Comments

jerinphilip commented Nov 30, 2021 • edited Loading

jerinphilip commented Nov 30, 2021 • edited Loading

abhi-agg commented Dec 1, 2021

jerinphilip commented Dec 1, 2021

kpu commented Dec 1, 2021

jerinphilip commented Mar 7, 2022 • edited Loading

jerinphilip commented Nov 30, 2021 •

edited

Loading

jerinphilip commented Nov 30, 2021 •

edited

Loading

jerinphilip commented Mar 7, 2022 •

edited

Loading