Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas to improve "yr fmt" #192

Open
7 of 12 tasks
wxsBSD opened this issue Sep 9, 2024 · 9 comments
Open
7 of 12 tasks

Ideas to improve "yr fmt" #192

wxsBSD opened this issue Sep 9, 2024 · 9 comments

Comments

@wxsBSD
Copy link
Contributor

wxsBSD commented Sep 9, 2024

I haven't dug into it in detail yet but as discussed elsewhere, we should add various options to allow people to control the formatting (and even other things, like variable names, maybe?) in "yr fmt" - I'm happy to do the work here but want to open this issue to get input on it.

For example, I'm thinking of things like

  • Use tabs vs spaces
  • Controllable number of spaces for identation
  • Curly brace on new line for rule declaration (eg: rule foo {\n vs rule foo\n{)
  • Newline after section headers
  • How loops should be formatted (eg: for any i in (0..1): (\n vs for any i in (0..1):\n()
  • Optional indentation of rule body
  • Optional alignment of metadata and patterns
  • Alphabetize metadata
  • Alphabetize strings
  • Remove trailing whitespace
  • Enforce certain metadata fields (e.g. they must exist and be of type x)
  • Line breaks between strings using different prefixes, e.g.

I'm sure there are plenty more here to be found, especially as I dig into the code, but hoping to collect your thoughts @plusvic (and others) on ways to improve "yr fmt" so I can go off and start implementing.

EDIT: I'm putting the things mentioned below into the list above so I can edit it to mark the ones I have done.

@plusvic
Copy link
Member

plusvic commented Sep 9, 2024

I like those and add a few more:

  • Optional indentation of rule body:
rule foo {
strings:
  $a = ...
condition:
  $a
}

vs

rule foo {
  strings:
    $a = ...
  condition:
    $a
}
  • Optional alignment of metadata and patterns.
rule foo {
  meta: 
    short = "..."
    very_long = "..."
  strings:
    $short = "..."
    $very_long = "..."
  condition:
    all of them
}

vs

rule foo {
  meta: 
    short     = "..."
    very_long = "..."
  strings:
    $short     = "..."
    $very_long = "..."
  condition:
    all of them
}

My plan was having a TOML configuration file not only for the code formatter, but for the CLI as a whole, where code formatting is a section. This crate could be useful for finding the user's home directory: https://crates.io/crates/home

@wxsBSD
Copy link
Contributor Author

wxsBSD commented Sep 9, 2024

I haven't looked into it yet, but if possible I think all of this should be done in a library so it can be easily used by other tools. This way we get consistent formatting behavior across a variety of tools if they all use the library. I don't know what the equivalent of "libyara" is in YARA-X but it would make sense to expose it there, I think.

@plusvic
Copy link
Member

plusvic commented Sep 9, 2024

I haven't looked into it yet, but if possible I think all of this should be done in a library so it can be easily used by other tools. This way we get consistent formatting behavior across a variety of tools if they all use the library. I don't know what the equivalent of "libyara" is in YARA-X but it would make sense to expose it there, I think.

Yes, of course. The config file will be used only by the CLI, for setting the right configuration for the yara-x-fmt crate. The Formatter object in that crate could have methods for adjusting each setting.

@import-pandas-as-numpy
Copy link

Control over 'segmentation' of hex identifiers.

Seems reasonable to me that someone might want to have control over newline behavior in hex identifiers, and a rational default of 16 seems non-contemptible to settle on for this behavior (if not already implemented.)

@plusvic
Copy link
Member

plusvic commented Sep 10, 2024

Introduction of line breaks at appropriate places, specially in the case of rule condition, is a pending improvement too. That's a bit more complicated, it's not a low hanging fruit like the rest. At his moment the formatter let you control how the lines are broken.

@tlansec
Copy link

tlansec commented Sep 16, 2024

I just spotted this nd thought I'd add a few things we implement externally to YARA that might be worth considering in a tool like this:

  • Enforce certain metadata fields (e.g. they must exist and be of type x)
  • Line breaks between strings using different prefixes, e.g.
strings:
   $s1 = "abc"
   $s2 = "def"
   
   $t1 = "ghi"
   $t2 = "jkl"
...
  • Alphabeticaly sort metadata.
  • Remove trailing spaces after defined strings. $s = "abc" <-last space should get stripped here.

I'm sure there's more, but I can't think of any right now.

@wxsBSD
Copy link
Contributor Author

wxsBSD commented Oct 12, 2024

  • Enforce certain metadata fields (e.g. they must exist and be of type x)

This one is veering more into what I would consider a linter than a formatter. Typically, as long as the rule is syntactically valid the formatter should not care about the specific content of the rule. However, with that said I'm thinking it may be useful to combine this idea into the formatter.

I'm thinking of a config that would be like (python-like syntax below, I'd have to figure out how to do it in toml):

rule.meta.required_fields = [("description", "string"), ("date", "integer")]

The idea would be a list of tuples that define the field name and required type. The linter could then use this to ensure the fields exist and are of the correct type. The default would have to be an empty list.

The idea of bringing in linter ideas here can also be extended to other interesting things that people like to do when writing rules. We could have something like:

rule.name_regex = "FOO"

With this you could define a regex that must match the rule name. Some places encode things in the rule name and having a way to enforce it in the linter would be nice.

What are your thoughts on these @plusvic?

  • Line breaks between strings using different prefixes, e.g.

I've added this and the required fields one to the list at the top of this thread for tracking.

@Neo23x0
Copy link

Neo23x0 commented Oct 12, 2024

Introduction of line breaks at appropriate places, specially in the case of rule condition, is a pending improvement too. That's a bit more complicated, it's not a low hanging fruit like the rest.

Formatting rule conditions is indeed more complex. I've outlined the best approach to handle this in my YARA Style Guide. For instance, many beginners tend to place "and" or "or" at the end of a line:

rule RULE_NAME : TAGS {
    ...
    condition:
        uint16(0) == 0x5a4d and
        filesize < 300KB and 
        pe.number_of_signature == 0 and 
        all of ($s*) and 
        not 1 of ($fp*) and 
}

However, through experience, I've found that placing these operators at the start of the line improves readability and reduces the time required to understand the condition:

rule RULE_NAME : TAGS {
    ...
    condition:
        uint16(0) == 0x5a4d 
        and filesize < 300KB 
        and pe.number_of_signature == 0
        and all of ($s*)
        and not 1 of ($fp*)
}

Line breaks between strings using different prefixes, e.g.

This is trickier than it might seem:

   strings:
      $sa1 = "byte[] buff = System.Convert.FromBase64String(Request.Form[\"password\"]);    " ascii fullword
      $sa2 = "<% }if(Request.Form[\"password\"]!=null){" ascii fullword
      $ss1 = "MethodInfo method = type.GetMethod(\"exec\");" ascii fullword

The challenge is that the prefix might not always be obvious. For instance, is it "s," "sa," or "ss"? It's not always straightforward to apply consistent line breaks.

Alphabetically sort metadata.

I wouldn’t recommend sorting metadata alphabetically. While it might seem neat, it’s more important to order metadata by relevance. Just as on an ID card or CV, the key details should appear first. For example, description should come before anything else, and less critical information like hash values can go towards the end. Sorting alphabetically would place less relevant fields like author at the top, even though they are not essential for understanding the rule.

With this you could define a regex that must match the rule name. Some places encode things in the rule name and having a way to enforce it in the linter would be nice.

While my style guide provides recommendations for naming YARA rules, I’m hesitant to enforce a strict naming convention. Internally, our team does apply such standards to maintain consistency in our rule sets. However, outside of a specific team or context, I’d prefer to offer guidance rather than impose mandatory enforcement. Consistency is valuable, but flexibility also has its place.

@tlansec
Copy link

tlansec commented Oct 14, 2024

Hello!

I think that for any formatter these things can be optional, so even if some people might not use them, I think having the option to use them isn't necessarily a bad thing. To give some rationale for some of the points discussed above:

Alphabetically sort metadata.

As in Python (isort), this makes it easiest to figure out what metadata is or is not present, and means I can keyword scan down the left. e.g. I'm looking for an "author" field, I'd quickly find its position regardless of who wrote the rule.

operators at EOL vs beginning

I think it's OK either way. The main thing for me for with this is how easy it is to comment out parts of the condition to test the rule without them. Perhaps due to muscle-memory I find the EOL operator easier to read. In the example you listed where the rule consists of a series of and clauses anyway, I would argue its easier to read with the operator at the end of the line as you can just scan the left hand side -- you'll find that often rules are written this way. If you are mixing and and or you'll need brackets for the rule to work anyway, in which case the brackets do the leg-work in terms of understanding how the rule works.

Cheers,
Tom

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants