Spaces before XML tags are randomly removed in the translation and quotes around the tags might end up within the tag #48

Open
funnel20 opened this issue Jun 18, 2024 · 0 comments

funnel20 commented Jun 18, 2024

Describe the bug
When using the latest version (1.13.0) of the DeepL Node.js library (deepl-node), I noticed a problem with XML tag handling.
With the source text Please start your '<x id=p1>Basic</x>' plan by clicking the button '<x id=p2>Accept</x>'., the translation changes the spacing and the quote placement around the <x></x> tags.

To Reproduce
Steps to reproduce the behavior:

  1. Integrate the API in a Node.js project (a working translator instance is assumed)
  2. Call:
    const result = await translator.translateText("Please start your '<x id=p1>Basic</x>' plan by clicking the button '<x id=p2>Accept</x>'.", "en", "de", { tagHandling: 'xml' });
  3. The console output with the German translation:
    Bitte starten Sie Ihren<x id=p1>'Basic</x>'-Plan, indem Sie auf die Schaltfläche<x id=p2>'Akzeptieren</x>' klicken.
    
  4. Analysis of translation syntax:
  • The original English tags are surrounded by single quotes: '<x id=p1>Basic</x>', while in the German output the opening quote is moved within the tags: <x id=p1>'Basic</x>'
  • The original English opening tags have a space in front of them: your '<x id=p1>Basic</x>', while in the German output the opening tag is directly concatenated to the previous word: Ihren<x id=p1>'Basic</x>'
    The expected output should be:
    Bitte starten Sie Ihren '<x id=p1>Basic</x>'-Plan, indem Sie auf die Schaltfläche '<x id=p2>Akzeptieren</x>' klicken.
    
  5. Adding the parameters preserveFormatting: true, outlineDetection: true and nonSplittingTags: ['x'], individually or in any combination, produces the same German output string.
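
For anyone who wants to repeat the last step, here is a minimal sketch of how the option combinations could be enumerated and run. The helper names optionCombinations and translateWithAllCombinations are made up for illustration and are not part of deepl-node; translator is assumed to be an already constructed deepl-node Translator.

```javascript
// The three extra options tried in the last step of the reproduction.
const extraOptions = {
  preserveFormatting: true,
  outlineDetection: true,
  nonSplittingTags: ["x"],
};

// Hypothetical helper: builds every subset of the extra options
// on top of the base option tagHandling: "xml" (8 combinations in total).
function optionCombinations() {
  const keys = Object.keys(extraOptions);
  const combos = [];
  for (let mask = 0; mask < (1 << keys.length); mask++) {
    const options = { tagHandling: "xml" };
    keys.forEach((key, i) => {
      if (mask & (1 << i)) options[key] = extraOptions[key];
    });
    combos.push(options);
  }
  return combos;
}

// Hypothetical helper: `translator` is assumed to be a deepl-node Translator
// created elsewhere. Returns the translated text for each combination.
async function translateWithAllCombinations(translator, text) {
  const outputs = [];
  for (const options of optionCombinations()) {
    const result = await translator.translateText(text, "en", "de", options);
    outputs.push({ options, text: result.text });
  }
  return outputs;
}
```

If the report above holds, every entry returned by translateWithAllCombinations should contain the same German string.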

Expected behavior
It's expected that formatting characters (like spaces) and other non-translatable characters (like quotes) around tags are maintained, especially when option preserveFormatting is set to true.
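
The two symptoms can also be checked mechanically. The following is a rough sketch only: findTagIssues is a made-up helper, not part of deepl-node, and it assumes the tag is always named x as in this report.

```javascript
// Hypothetical helper: reports, for every <x ...>...</x> element in a
// translated string, whether the opening tag is glued to the preceding
// word and whether a quote has moved inside the element.
function findTagIssues(translated) {
  const issues = [];
  const tagPattern = /<x\b[^>]*>([\s\S]*?)<\/x>/g;
  let match;
  while ((match = tagPattern.exec(translated)) !== null) {
    const before = translated.slice(0, match.index);
    const content = match[1];
    // Symptom 1: a letter or digit directly precedes the opening tag.
    if (/[\p{L}\p{N}]$/u.test(before)) {
      issues.push({ tag: match[0], issue: "missing-space-before-tag" });
    }
    // Symptom 2: the element content starts with a quote character.
    if (/^['"]/.test(content)) {
      issues.push({ tag: match[0], issue: "quote-inside-tag" });
    }
  }
  return issues;
}

// Actual and expected outputs, copied from the steps to reproduce.
const actual =
  "Bitte starten Sie Ihren<x id=p1>'Basic</x>'-Plan, indem Sie auf die Schaltfläche<x id=p2>'Akzeptieren</x>' klicken.";
const expected =
  "Bitte starten Sie Ihren '<x id=p1>Basic</x>'-Plan, indem Sie auf die Schaltfläche '<x id=p2>Akzeptieren</x>' klicken.";

console.log(findTagIssues(actual));   // both symptoms, for both tags
console.log(findTagIssues(expected)); // no issues
```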

Update
After creating this post I did some more testing. The quotes appear to be the trigger: the issue occurs with single quotes and with double quotes alike.
However, when removing the quotes around the XML tags:

const result = await translator.translateText("Please start your <x id=p1>Basic</x> plan by clicking the button <x id=p2>Accept</x>.", "en", "de", { tagHandling: 'xml' });

The output maintains the spaces around the tags ✅:

Bitte starten Sie Ihren <x id=p1>Basic-Plan</x>, indem Sie auf die Schaltfläche <x id=p2>Akzeptieren</x> klicken.

Update 2
After creating this post I noticed that I didn't use quotes for the value of attribute id (see table "With Attributes" at https://developers.deepl.com/docs/xml-and-html-handling/xml). So basically my input string was malformed XML.

However, when applying quotes around p1 and p2, the API still returns the same erroneous output:

const result = await translator.translateText(`Please start your '<x id="p1">Basic</x>' plan by clicking the button '<x id="p2">Accept</x>'.`, "en", "de", { tagHandling: 'xml' });

Question
Why doesn't the API handle quotes around XML tags properly?

Screenshots
N/A

Desktop (please complete the following information):

  • OS: macOS 14.5

Additional context

  • npm deepl-node 1.13.0
  • NodeJS 16.6.0