Ideas to improve "yr fmt" #192

wxsBSD · 2024-09-09T19:36:47Z

plusvic · 2024-09-09T19:53:22Z

I like those, and I'd add a few more:

Optional indentation of rule body:

rule foo {
strings:
  $a = ...
condition:
  $a
}

vs

rule foo {
  strings:
    $a = ...
  condition:
    $a
}

Optional alignment of metadata and patterns.

rule foo {
  meta: 
    short = "..."
    very_long = "..."
  strings:
    $short = "..."
    $very_long = "..."
  condition:
    all of them
}

vs

rule foo {
  meta: 
    short     = "..."
    very_long = "..."
  strings:
    $short     = "..."
    $very_long = "..."
  condition:
    all of them
}
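
As a rough illustration of the alignment option (a Python sketch, not the yara-x-fmt implementation; `align_assignments` is a hypothetical helper), the left-hand sides can be padded to the width of the longest name. It assumes every line has the form `name = value`, and it leaves re-indenting to the caller:

```python
def align_assignments(lines):
    """Pad the left-hand side of `name = value` lines to a common width."""
    # Split only at the first "=", so values containing "=" stay intact.
    pairs = [line.split("=", 1) for line in lines]
    width = max(len(name.strip()) for name, _ in pairs)
    return [f"{name.strip():<{width}} = {value.strip()}" for name, value in pairs]


for line in align_assignments(['short = "..."', 'very_long = "..."']):
    print(line)
# short     = "..."
# very_long = "..."
```

The same helper works for patterns, because `$short` and `$very_long` are just longer names.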

My plan was to have a TOML configuration file not only for the code formatter, but for the CLI as a whole, with code formatting as one of its sections. This crate could be useful for finding the user's home directory: https://crates.io/crates/home

wxsBSD · 2024-09-09T20:05:18Z

I haven't looked into it yet, but if possible I think all of this should be done in a library so it can be easily used by other tools. This way we get consistent formatting behavior across a variety of tools if they all use the library. I don't know what the equivalent of "libyara" is in YARA-X but it would make sense to expose it there, I think.

plusvic · 2024-09-09T20:35:29Z

> I haven't looked into it yet, but if possible I think all of this should be done in a library so it can be easily used by other tools. This way we get consistent formatting behavior across a variety of tools if they all use the library. I don't know what the equivalent of "libyara" is in YARA-X but it would make sense to expose it there, I think.

Yes, of course. The config file will be used only by the CLI, for setting the right configuration for the yara-x-fmt crate. The Formatter object in that crate could have methods for adjusting each setting.

import-pandas-as-numpy · 2024-09-10T02:23:20Z

Control over 'segmentation' of hex identifiers.

It seems reasonable to me that someone might want control over newline behavior in hex identifiers, and a default of 16 seems like a sensible value to settle on for this behavior (if not already implemented).
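
A minimal Python sketch of what such segmentation could look like, assuming "16" means 16 whitespace-separated tokens per line (the unit is my assumption, and `wrap_hex_pattern` is a hypothetical name, not a yara-x-fmt function):

```python
def wrap_hex_pattern(tokens, per_line=16):
    """Split hex-pattern tokens into rows of at most `per_line` tokens."""
    # Tokens are treated as opaque strings (bytes, wildcards such as "??").
    # Jumps or alternatives that contain spaces would need a real tokenizer.
    return [
        " ".join(tokens[i:i + per_line])
        for i in range(0, len(tokens), per_line)
    ]


tokens = "4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 ?? 00".split()
for row in wrap_hex_pattern(tokens):
    print(row)
# 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00
# B8 00 ?? 00
```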

plusvic · 2024-09-10T06:51:11Z

Introduction of line breaks at appropriate places, especially in the case of rule conditions, is a pending improvement too. That's a bit more complicated; it's not low-hanging fruit like the rest. At the moment the formatter lets you control how the lines are broken.

tlansec · 2024-09-16T08:31:41Z

I just spotted this and thought I'd add a few things we implement externally to YARA that might be worth considering in a tool like this:

Enforce certain metadata fields (e.g. they must exist and be of type x)
Line breaks between strings using different prefixes, e.g.

strings:
   $s1 = "abc"
   $s2 = "def"
   
   $t1 = "ghi"
   $t2 = "jkl"
...

Alphabetically sort metadata.
Remove trailing spaces after defined strings. $s = "abc" <-last space should get stripped here.

I'm sure there's more, but I can't think of any right now.

wxsBSD · 2024-10-12T00:40:58Z

> Enforce certain metadata fields (e.g. they must exist and be of type x)

This one is veering more into what I would consider a linter than a formatter. Typically, as long as the rule is syntactically valid the formatter should not care about the specific content of the rule. However, with that said I'm thinking it may be useful to combine this idea into the formatter.

I'm thinking of a config that would look like this (Python-like syntax below; I'd have to figure out how to do it in TOML):

rule.meta.required_fields = [("description", "string"), ("date", "integer")]

The idea would be a list of tuples that define the field name and required type. The linter could then use this to ensure the fields exist and are of the correct type. The default would have to be an empty list.

The idea of bringing in linter ideas here can also be extended to other interesting things that people like to do when writing rules. We could have something like:

rule.name_regex = "FOO"

With this you could define a regex that must match the rule name. Some places encode things in the rule name and having a way to enforce it in the linter would be nice.
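
A small Python sketch of such a check, under two assumptions of mine: the regex must match the whole rule name (the setting could just as well use a partial match), and the naming convention shown is made up for the example.

```python
import re


def check_rule_names(names, pattern):
    """Return the rule names that do not fully match `pattern`."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name) is None]


# Hypothetical convention: CATEGORY_Platform_Description
NAME_REGEX = r"(APT|CRIME|HKTL)_[A-Za-z0-9]+_[A-Za-z0-9_]+"
print(check_rule_names(["APT_Win_Backdoor", "foo"], NAME_REGEX))
# ['foo']
```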

What are your thoughts on these @plusvic?

> Line breaks between strings using different prefixes, e.g.

I've added this and the required fields one to the list at the top of this thread for tracking.

Neo23x0 · 2024-10-12T07:21:08Z

> Introduction of line breaks at appropriate places, especially in the case of rule conditions, is a pending improvement too. That's a bit more complicated; it's not low-hanging fruit like the rest.

Formatting rule conditions is indeed more complex. I've outlined the best approach to handle this in my YARA Style Guide. For instance, many beginners tend to place "and" or "or" at the end of a line:

rule RULE_NAME : TAGS {
    ...
    condition:
        uint16(0) == 0x5a4d and
        filesize < 300KB and
        pe.number_of_signatures == 0 and
        all of ($s*) and
        not 1 of ($fp*)
}

However, through experience, I've found that placing these operators at the start of the line improves readability and reduces the time required to understand the condition:

rule RULE_NAME : TAGS {
    ...
    condition:
        uint16(0) == 0x5a4d
        and filesize < 300KB
        and pe.number_of_signatures == 0
        and all of ($s*)
        and not 1 of ($fp*)
}
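
For illustration, the rewrite from the first style to the second can be sketched as a purely line-based transformation in Python. This is a simplification and not how a real formatter would do it: it ignores strings and comments, it drops an operator left dangling on the last line, and `leading_operators` is a hypothetical name.

```python
OPERATORS = ("and", "or")


def leading_operators(lines):
    """Move a trailing `and`/`or` from the end of a line to the start of the next one."""
    result = []
    carried = ""  # operator removed from the previous line, if any
    for raw in lines:
        text = raw.strip()
        head, _, last = text.rpartition(" ")
        if last in OPERATORS and head:
            text, trailing = head.rstrip(), last
        else:
            trailing = ""
        result.append(f"{carried} {text}" if carried else text)
        carried = trailing
    return result


condition = [
    "uint16(0) == 0x5a4d and",
    "filesize < 300KB and ",
    "all of ($s*) and",
    "not 1 of ($fp*)",
]
for line in leading_operators(condition):
    print(line)
# uint16(0) == 0x5a4d
# and filesize < 300KB
# and all of ($s*)
# and not 1 of ($fp*)
```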

> Line breaks between strings using different prefixes, e.g.

This is trickier than it might seem:

   strings:
      $sa1 = "byte[] buff = System.Convert.FromBase64String(Request.Form[\"password\"]);    " ascii fullword
      $sa2 = "<% }if(Request.Form[\"password\"]!=null){" ascii fullword
      $ss1 = "MethodInfo method = type.GetMethod(\"exec\");" ascii fullword

The challenge is that the prefix might not always be obvious. For instance, is it "s," "sa," or "ss"? It's not always straightforward to apply consistent line breaks.
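
To make the ambiguity concrete, here is a short Python sketch with two plausible prefix heuristics (both are my own assumptions, not existing formatter behavior). They group the same three identifiers differently:

```python
import re
from itertools import groupby


def by_first_letter(identifier):
    """Prefix is the first character after the dollar sign: "$sa1" -> "s"."""
    return identifier[1:2]


def by_non_digit_stem(identifier):
    """Prefix is everything before the trailing digits: "$sa1" -> "sa"."""
    return re.sub(r"\d+$", "", identifier[1:])


def group(identifiers, key):
    """Group consecutive identifiers that share the same prefix."""
    return [list(items) for _, items in groupby(identifiers, key=key)]


ids = ["$sa1", "$sa2", "$ss1"]
print(group(ids, by_first_letter))    # [['$sa1', '$sa2', '$ss1']]
print(group(ids, by_non_digit_stem))  # [['$sa1', '$sa2'], ['$ss1']]
```

A formatter would insert a blank line between groups, so the first heuristic leaves the block untouched while the second splits it.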

> Alphabetically sort metadata.

I wouldn’t recommend sorting metadata alphabetically. While it might seem neat, it’s more important to order metadata by relevance. Just as on an ID card or CV, the key details should appear first. For example, description should come before anything else, and less critical information like hash values can go towards the end. Sorting alphabetically would place less relevant fields like author at the top, even though they are not essential for understanding the rule.
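
As a sketch of what relevance ordering could mean in practice (Python, with a made-up priority list; `order_by_relevance` is hypothetical), known fields come first in a configured order and unknown fields keep their original relative order:

```python
# Hypothetical priority list; each team would choose its own.
PRIORITY = ["description", "date", "author", "hash"]


def order_by_relevance(fields, priority=PRIORITY):
    """Sort field names by their position in `priority`; unknown names go last."""
    rank = {name: index for index, name in enumerate(priority)}
    # sorted() is stable, so unknown fields keep their original relative order.
    return sorted(fields, key=lambda name: rank.get(name, len(priority)))


fields = ["author", "hash", "description", "date"]
print(order_by_relevance(fields))  # ['description', 'date', 'author', 'hash']
print(sorted(fields))              # ['author', 'date', 'description', 'hash']
```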

> With this you could define a regex that must match the rule name. Some places encode things in the rule name and having a way to enforce it in the linter would be nice.

While my style guide provides recommendations for naming YARA rules, I’m hesitant to enforce a strict naming convention. Internally, our team does apply such standards to maintain consistency in our rule sets. However, outside of a specific team or context, I’d prefer to offer guidance rather than impose mandatory enforcement. Consistency is valuable, but flexibility also has its place.

tlansec · 2024-10-14T08:08:40Z

Hello!

I think that for any formatter these things can be optional, so even if some people might not use them, I think having the option to use them isn't necessarily a bad thing. To give some rationale for some of the points discussed above:

Alphabetically sort metadata.

As in Python (isort), this makes it easiest to figure out what metadata is or is not present, and means I can keyword scan down the left. For example, if I'm looking for an "author" field, I can quickly find its position regardless of who wrote the rule.

operators at EOL vs beginning

I think it's OK either way. The main thing for me with this is how easy it is to comment out parts of the condition to test the rule without them. Perhaps due to muscle memory, I find the EOL operator easier to read. In the example you listed, where the rule consists of a series of and clauses anyway, I would argue it's easier to read with the operator at the end of the line, as you can just scan the left-hand side -- you'll find that rules are often written this way. If you are mixing and and or, you'll need brackets for the rule to work anyway, in which case the brackets do the legwork in terms of understanding how the rule works.

Cheers,
Tom

wxsBSD mentioned this issue Sep 29, 2024

feat: Implement toml config for fmt command. #205

Merged

wxsBSD mentioned this issue Oct 12, 2024

feat: More fmt improvements #224

Merged
