Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting HTML options towards ease of usage #270

Closed
jerinphilip opened this issue Nov 30, 2021 · 5 comments
Closed

Setting HTML options towards ease of usage #270

jerinphilip opened this issue Nov 30, 2021 · 5 comments

Comments

@jerinphilip
Copy link
Contributor

jerinphilip commented Nov 30, 2021

So, there are two options for alignment which HTML is dependent on - the first which decides if a (Marian) model is alignment capable or not decided at model-construction, the second which decides if Response - a bergamot primitive should deal with alignments or not effects of which appear at runtime on a translate call.

  1. It is safe to assume Mozilla always will need models capable of HTML translation (a few plain-text without HTML=false options).
  2. What alignment: soft serves is the purpose of a switch to save some computation on the construction of Alignments. This in Marian is managed by the configuration. Currently, firefox through bindings also manages this through configuration.
    a. Are there clients who would like to retain an alignment: none setting? Perhaps translateLocally? If not we might default to this always being true and incur any additional computation costs (Updated configuration for html text translation to work in wasm test page #269 (comment)).
  3. We can read the configuration at runtime at the translate(...) call and check if it is capable of providing alignments. This can be used in throwing an error or abort, essentially telling the client "you tried to translate HTML without giving me alignments". We cannot write (set/unset, because we also support a multithreaded setting and the model config is shared among multiple translation requests).
  4. @jelmervdl can have HTML pipeline cry if no alignments. Currently, it provides an illusion that things work if no alignment matrix, we should consider this a bug, given we have something superior to length-based tag transfer which @kpu originally requested.
  5. ResponseOptions is a plain-old-datatype that makes things easy for bindings(?). We can change it to a class with constraints on construction (HTML => alignment true always).
  6. We can at call time edit ResponseOptions.alignment = true if ResponseOptions.HTML = true as we get it with the translation-request. There will be some duplication for a few upcoming PRs, but this is OK.

6 seems to be the easiest from a Mozilla UX point of view, w.r.t not bothering about a parameter named alignment when all they want is HTML translation.

2a will improve the at model construction UX.

Requesting stakeholders to comment, offer any alternate suggestions.

@jerinphilip jerinphilip changed the title Setting HTML options to ease of UX Setting HTML options towards ease of usage Nov 30, 2021
@jerinphilip
Copy link
Contributor Author

jerinphilip commented Nov 30, 2021

There is also a very specific solution for 2a. The string-based constructor is only used by WebAssembly. We can keep a function that always sets the return from parseOptionsFromString to have alignment: soft.

TranslationModel(const std::string& config, MemoryBundle&& memory, size_t replicas = 1)
: TranslationModel(parseOptionsFromString(config, /*validate=*/false), std::move(memory), replicas){};

@abhi-agg
Copy link
Contributor

abhi-agg commented Dec 1, 2021

Adding the rest of the stakeholders to be able to comment/offer suggestions @andrenatal @XapaJIaMnu

  1. It is safe to assume Mozilla always will need models capable of HTML translation

Right on that. Firefox will always call bergamot-translator to perform html-text translation.

2. What alignment: soft serves is the purpose of a switch to save some computation on the construction of Alignments. This in Marian is managed by the configuration. Currently, firefox through bindings also manages this through configuration.
a. Are there clients who would like to retain an alignment: none setting? Perhaps translateLocally? If not we might default to this always being true and incur any additional computation costs (

For Firefox, we don't mind setting alignment: soft explicitly if we have to (since it seems to have speed penalty, it is fine to not set alignment: soft as default and expect users to set this during translator construction). We don't have any strong preference here 👍🏾

3. We can read the configuration at runtime at the translate(...) call and check if it is capable of providing alignments. This can be used in throwing an error or abort, essentially telling the client "you tried to translate HTML without giving me alignments". We cannot write (set/unset, because we also support a multithreaded setting and the model config is shared among multiple translation requests).

Passing ResponseOptions::HTML = true in ResponseOptions of the translate() request for html text translation can throw error if alignment: soft wasn't provided during translator construction. This is what you imply here. Right?

5. ResponseOptions is a plain-old-datatype that makes things easy for bindings(?). We can change it to a class with constraints on construction (HTML => alignment true always).
6. We can at call time edit ResponseOptions.alignment = true if ResponseOptions.HTML = true as we get it with the translation-request. There will be some duplication for a few upcoming PRs, but this is OK.

As I mentioned in #266 (comment), we are misinterpreting the ResponseOptions::alignment option and mixing it somehow with ResponseOptions::HTML (correct me if I am wrong here).

ResponseOptions::alignment: The purpose (when bergamot-translator was doing only plain-text translation) of this option was to return the alignments b/w source and target tokens in the Response so that the users can re-insert html tags themselves.

ResponseOptions::HTML : An option to enable html text translation (introduced recently)

Now that bergamot-translator supports html text translation, we can very well argue whether ResponseOptions::alignment should even exist or not. Some might want to do plain-text translation and do the tag re-insertion themselves and therefore, it might be fine to still keep this option. We don't have strong preference here.

But, please let's not associate/mix ResponseOptions::HTML with ResponseOptions::alignment option as purposes of both are different as far as the API is concerned. (I can imagine that setting anyone of ResponseOptions::HTML or ResponseOptions::alignment to true will require alignment: soft internally)

Please let me know if I missed something here 👍🏾

@jerinphilip
Copy link
Contributor Author

@abhi-agg: Thank you for providing the preferences. I'll try to answer quoting a few queries below.

Passing ResponseOptions::HTML = true in ResponseOptions of the translate() request for HTML text translation can throw error if alignment: soft wasn't provided during translator construction. This is what you imply here. Right?

Yes.

As I mentioned in #266 (comment), we are misinterpreting the ResponseOptions::alignment option and mixing it somehow with ResponseOptions::HTML

I don't fully understand the above, but let me try to add more context which I hope will make things clear. HTML is dependent on alignments. Right now we allow both (HTML, alignments) to be set in a way by the client that is not consistent (HTML=true, alignments=false) and there is no means of recovery - which is the problem. I can sanitize ResponseOptions as demonstrated in 65a14cf (#271), which could be a solution. Options conditioned on other options in full generality is a lot of additional code, I think.

Now that bergamot-translator supports HTML text translation, we can very well argue whether ResponseOptions::alignment should even exist or not.

I believe bergamot-translator as a machine-translator library adds some value other than just an intermediary between Marian-dev and extension. Bergamot's non-HTML pipeline translation capabilities are still intact. There are two more clients which use these now at least - translateLocally and lemonade. There are more applications to alignment for clients who may need it. One quick example is perhaps building something like https://twitter.com/marian_nmt/status/1465710492752646151. I understand translateLocally uses only alignments and not HTML translate at the moment. For an idea of explainable translation to a firefox user, we will be using alignments in a similar setting to show which word mapped to which - if that feature ever comes to implementation.

But, please let's not associate/mix ResponseOptions::HTML with ResponseOptions::alignment option as purposes of both are different as far as the API is concerned.

The intentions resonate with me. But allow me to defer perfecting this until #236 which now needs to support HTML as well gets in. I understand from the current state of main that you've accepted the tradeoff off having two extra key-values over perfect API UX. #271 is something that I expect will not cause delays for me in #236 and follow-up synchronization if you want something urgently. I believe it's best to set the sanitization or defaults once we know how HTML gets moved around for purposes of #236.

@kpu
Copy link
Member

kpu commented Dec 1, 2021

Kill hard alignments.

@jerinphilip
Copy link
Contributor Author

jerinphilip commented Mar 7, 2022

#296 fixed having both alignments and HTML on. The config UX changes are too cumbersome and I think there's signal enough now (from aborts provided from HTML pipeline):

ABORT_UNLESS(hasAlignments(response),
"Response object does not contain alignments. TranslationModel or ResponseOptions is misconfigured?");

Other bergamot-devs also appear to enjoy the config tweakability - from #361 (comment):

I'm pretty happy with having a std::string (or the Yaml datastructure that marian uses) as the configuration. That way bergamot-translator doesn't need code changes to be synced with every update to marian.

Closing this as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants