using yara regex rule to scan chinese character, error #1952

hanggao481 · 2023-08-17T10:09:32Z

How can I use a YARA regex rule to scan for Chinese characters? What is the reason for the incorrect matches shown below?

Describe the bug
My YARA rule:
rule AsianCharacter : general
{
  strings:
    $chinese = /[\u8fd9]/
  condition:
    $chinese
}

Match result:
0x1cd:$chinese: u
0x1d2:$chinese: f
0x1dd:$chinese: 8

Expected behavior
Expected match result:
0x1cd:$chinese: 这

Note:
The Unicode code point of "这" is U+8FD9 (\u8fd9).

hanggao481 · 2023-08-18T02:37:16Z

Another example: I want to scan for Chinese characters with a YARA regex rule like the one below:
rule AsianCharacter : general
{
  strings:
    $chinese = /[\u4e00-\u9fa5]/
  condition:
    $chinese
}
Problem:
It does not match any Chinese characters.

vthib · 2023-08-20T10:14:44Z

YARA has no Unicode handling in strings, and the \u escape does not exist. What you wrote is actually equivalent to [u8fd9], which matches any one of those five ASCII bytes (u, 8, f, d, 9).

If you want to search for a non-ASCII character, you need to search for the bytes that make up its encoding in the files you scan. For UTF-8 files, that would mean something like this:

rule AsianCharacter : general
{
  strings:
    $chinese = /\xe8\xbf\x99/
  condition:
    $chinese
}

For UTF-16 encoding, I guess something like /\x8f\xd9/ for big-endian, or /\xd9\x8f/ for little-endian.
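
To make the per-encoding idea concrete, here is a minimal sketch that looks for "这" (U+8FD9) in three common encodings at once. The rule name and string identifiers are made up for illustration. It assumes the files contain the character as plain UTF-8 or UTF-16 text; the two-byte UTF-16 patterns are very short, so YARA may warn about them and they are likely to produce false positives in binary data.

```yara
// Hypothetical rule name; the byte values are the encodings of U+8FD9.
rule ChineseCharZhe : general
{
  strings:
    $utf8    = /\xe8\xbf\x99/   // UTF-8: E8 BF 99
    $utf16le = /\xd9\x8f/       // UTF-16 little-endian: D9 8F
    $utf16be = /\x8f\xd9/       // UTF-16 big-endian: 8F D9
  condition:
    any of them
}
```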

Note that because you have to match the bytes of a specific encoding, you cannot use code point ranges like the one in your second example.
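
A code point range cannot be written directly, but for one fixed encoding it can be approximated with byte ranges. The following is a rough sketch for UTF-8 only, assuming the target range is U+4E00 to U+9FA5 (encoded as E4 B8 80 through E9 BE A5). The rule name is hypothetical, and the pattern operates on raw bytes, so it can also match byte sequences that are not text at all.

```yara
// Hypothetical rule: UTF-8 byte ranges covering U+4E00..U+9FA5.
rule CjkUnifiedUtf8 : general
{
  strings:
    // E4 B8..BF xx  -> U+4E00..U+4FFF
    // E5..E8 xx xx  -> U+5000..U+8FFF
    // E9 80..BD xx  -> U+9000..U+9F7F
    // E9 BE 80..A5  -> U+9F80..U+9FA5
    $cjk = /\xe4[\xb8-\xbf][\x80-\xbf]|[\xe5-\xe8][\x80-\xbf][\x80-\xbf]|\xe9[\x80-\xbd][\x80-\xbf]|\xe9\xbe[\x80-\xa5]/
  condition:
    $cjk
}
```

A UTF-16 variant would need its own byte ranges, with the byte order depending on endianness.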

gaohang · 2023-09-04T01:27:47Z

Thanks. Is there any way to use YARA to match Chinese characters in general? That is, can a range of Unicode code points be written in a YARA regex the way it can in other regex engines, e.g. [\u4e00-\u9fa5]?

hanggao481 added the bug label Aug 17, 2023

dh-orko mentioned this issue Aug 18, 2023

another example: I want to scan Chinese character by regex yara rules as beloww: #1953

Open
