[question] Why bsl language is not supported? #3

Open
nixel2007 opened this issue Oct 30, 2023 · 10 comments

Comments

@nixel2007

nixel2007 commented Oct 30, 2023

Hello! I've noticed that BSL language is excluded. Could you explain why? Does it use some feature that is not supported in libprisma? We could try to adjust the grammar to make it work with your lib.

Thanks in advance!

@exclued

exclued commented Nov 1, 2023

It seems to have something to do with unsupported UTF-16 characters in the language grammar.

From generate.js:

    function sanitize(pattern) {
        // Unsupported:
        // UTF-16 ranges

Compare this with the BSL definition in https://github.com/PrismJS/prism/blob/master/components/prism-bsl.js
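
To make that comparison concrete, a small script can list the `\uXXXX` escapes that a pattern contains. This is only a sketch: `findUnicodeEscapes` is a hypothetical helper, not something from generate.js, and the sample pattern is illustrative rather than copied from prism-bsl.js.

```javascript
// Hypothetical helper (not part of generate.js): list the \uXXXX escapes
// that appear in the source text of a regular expression.
// Caveat: an escaped backslash followed by "uXXXX" would also be reported.
function findUnicodeEscapes(regex) {
    const matches = regex.source.match(/\\u[0-9a-fA-F]{4}/g);
    return matches === null ? [] : matches;
}

// Illustrative pattern in the style of a Cyrillic-aware keyword rule.
const sample = /\b(?:\u0435\u0441\u043b\u0438|if)\b/i;
console.log(findUnicodeEscapes(sample));
```

Running something like this over each pattern of a grammar would show which rules depend on such escapes.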

@nixel2007
Author

nixel2007 commented Nov 1, 2023

Thanks, I'll take a look. In most cases these UTF-16 escape sequences can be simplified to [а-яё], which can be encoded in UTF-8.
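
A minimal sketch of that simplification, assuming the escapes in question are plain `\uXXXX` sequences outside the surrogate range. `unescapeBmp` is a hypothetical name, and whether the resulting literal characters are accepted by libprisma is not established here.

```javascript
// Hypothetical helper: rewrite \uXXXX escapes as literal characters.
// ASCII escapes are left alone so that no regex metacharacter is introduced,
// and surrogate halves (U+D800..U+DFFF) are left alone because they are not
// characters on their own.
function unescapeBmp(source) {
    return source.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) => {
        const code = parseInt(hex, 16);
        const isSurrogate = code >= 0xd800 && code <= 0xdfff;
        if (code < 0x80 || isSurrogate) {
            return match;
        }
        return String.fromCharCode(code);
    });
}

const escaped = String.raw`[\u0430-\u044f\u0451]`;
console.log(unescapeBmp(escaped)); // [а-яё]
```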

@nixel2007
Author

nixel2007 commented Nov 1, 2023

Prism.js v1 is not merging any PRs these days. Would you accept a .patch file for the BSL grammar, plus additions to generate.js and the GitHub workflow to apply the patch in place?

@nixel2007
Author

nixel2007 commented Nov 27, 2023

Hello, team! I'm looking into the recently introduced way to include the BSL language in libprisma. I'd like to clarify what "UTF-16 ranges" in sanitize means. Do you not support \uXXXX sequences at all, or only some specific range (\uD800 and above, for example)?
If \uXXXX sequences are not supported at all, is there any way to include Cyrillic letters in a regex? Will a plain [а-яё] work?

/cc @FrayxRulez

@nixel2007
Author

Anyone? :)

@TelegramMessenger TelegramMessenger deleted a comment from mm8191 Sep 20, 2024
@FrayxRulez
Collaborator

Hi @nixel2007, sorry for the delay.
Please check the Boost.Regex specification regarding UTF-16 support, as there are limitations in several areas (we use string and not wstring, and Boost's regex syntax is slightly different from, and more limited than, the JS one).
It's important that this works on all platforms; UNIX and Windows behave differently when dealing with strings.
You can probably test this by generating a new grammars.dat file with only bsl enabled (you'll have to edit the generation script) and trying to load it with the library. You'll see that Boost crashes right away while trying to interpret the patterns.
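
One way to see why the string versus wstring distinction matters: in UTF-8 every Cyrillic letter occupies two bytes, so an engine that matches one narrow `char` at a time would see a class such as [а-яё] as a set of bytes rather than letters. The sketch below only prints the encodings. How Boost.Regex actually treats such a class in libprisma's configuration is an assumption to verify against the Boost documentation. `utf8Hex` is a hypothetical helper.

```javascript
// Show the UTF-8 bytes of a string as space-separated hex.
// TextEncoder is a standard API in Node.js and browsers and always emits UTF-8.
function utf8Hex(text) {
    const bytes = new TextEncoder().encode(text);
    return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

for (const ch of ["a", "а", "я", "ё"]) {
    console.log(ch, "->", utf8Hex(ch));
}
// a -> 61
// а -> d0 b0
// я -> d1 8f
// ё -> d1 91
```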

@nixel2007
Author

Hi, @FrayxRulez!

Thanks for your input, it's clear. I'll take a look.

@bapho-bush

bapho-bush commented Sep 22, 2024

we use string and not wstring, and Boost regex syntax is slightly different and more limited than JS one

Could you please explain why you don't use std::wstring and boost::wregex?

@mm8191

mm8191 commented Sep 25, 2024

I don't remember exactly what the message was about, but my tokens and assets have been lost. What should I do? Who is responsible? Should I message Pavel Durov?

@mm8191

mm8191 commented Sep 25, 2024 via email
