Replies: 5 comments
-
Hi @plusvic, and thanks for the comment! I took the liberty of moving the discussion here. I did know about YARA-X, yes, and had seen you talk about Rust on Twitter as well, but I hadn't looked into it much.

For boreal, the only part really missing is performance improvements. Was condition evaluation a big factor in the end in YARA? I might need to revisit this, but there are a lot of big gains to be had in the string scanning implementation, with many open questions to answer. Do you know how you plan to implement string scanning in YARA-X?
-
Regarding the weight of condition evaluation in overall performance, it depends a lot on the rules. In the vast majority of rules the scanning phase outweighs the condition evaluation by a large factor, but with certain rules that depend heavily on loops (even nested loops) the condition evaluation becomes the bottleneck. In VirusTotal we had this problem many times, and even had to forbid rules with loops like

Regarding the plans for implementing string scanning in YARA-X, the first important goal is fixing the issue that you solved with boreal too: not scanning the file unless strictly required. The idea is to start evaluating the condition first and trigger the scanning phase lazily, when the condition needs to know whether some pattern is present. At that moment it looks for all the patterns at the same time, as in the current implementation.

I plan to use the same technique for string scanning: atom extraction + Aho-Corasick. But I want to introduce a few improvements, like eliminating duplicated patterns (something that also helps in our VirusTotal use case, as we have a lot of duplicated patterns). I explored the possibility of using https://github.com/intel/hyperscan, but it has some major drawbacks that make it unsuitable for YARA's needs.

Some other goals for YARA-X:
In this discussion there are many other ideas for the long term:

If you want to collaborate on YARA-X let me know; there are a few areas in which we could find ways of working in parallel. Also, an overall review of the code would be great. I'm new to Rust, this is my first project with the language, and I would like to hear about other ways of doing things.

And finally, out of curiosity, what moved you to start the
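
The lazy scanning and pattern deduplication ideas described in this comment can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `LazyScanner`, `add_pattern`, `is_present`, and the `scans` counter are hypothetical names, not YARA-X or boreal APIs, and the naive substring search stands in for atom extraction + Aho-Corasick.

```rust
use std::collections::HashMap;

/// Hypothetical scanner, for illustration only (not the YARA-X design).
pub struct LazyScanner<'a> {
    data: &'a [u8],
    /// Unique patterns; duplicates across rules share a single entry.
    patterns: Vec<Vec<u8>>,
    /// Maps pattern bytes to their index in `patterns`.
    index: HashMap<Vec<u8>, usize>,
    /// Filled by the first query; `None` means no scan has happened yet.
    found: Option<Vec<bool>>,
    /// Number of scans performed, kept only to make the laziness observable.
    pub scans: usize,
}

impl<'a> LazyScanner<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        LazyScanner {
            data,
            patterns: Vec::new(),
            index: HashMap::new(),
            found: None,
            scans: 0,
        }
    }

    /// Registers a pattern and returns its id; a duplicate reuses the existing id.
    pub fn add_pattern(&mut self, pattern: &[u8]) -> usize {
        if let Some(&id) = self.index.get(pattern) {
            return id;
        }
        let id = self.patterns.len();
        self.patterns.push(pattern.to_vec());
        self.index.insert(pattern.to_vec(), id);
        // A new pattern invalidates any cached scan results.
        self.found = None;
        id
    }

    /// Number of distinct patterns that a scan has to look for.
    pub fn unique_patterns(&self) -> usize {
        self.patterns.len()
    }

    /// Returns whether the pattern is present. The first call scans the data
    /// for all patterns at once; later calls reuse the cached results.
    pub fn is_present(&mut self, id: usize) -> bool {
        if self.found.is_none() {
            self.scans += 1;
            let data = self.data;
            let found: Vec<bool> = self.patterns.iter().map(|p| contains(data, p)).collect();
            self.found = Some(found);
        }
        self.found.as_ref().map_or(false, |f| f[id])
    }
}

/// Naive substring search, used here as a stand-in for a real multi-pattern engine.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}
```

With this shape, a condition such as `filesize > 100 && scanner.is_present(id)` never touches the data when the left-hand side is false, because `&&` short-circuits before the first `is_present` call. Registering the same pattern from two rules yields the same id, so the scan searches for it only once.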
-
I've been looking around and found this comment in the README file:
I'm curious about this because I'm using

Also, thanks to
-
Thanks for the explanation, I can see how rules using

I also had a look at hyperscan, as I originally intended to try it, and I ended up dropping it as well, although I don't remember the exact details :D It could be interesting to revisit this; it shouldn't be too hard to try it in boreal.

It definitely looks like our projects share a lot of goals! I spent quite some time trying to find a design for slimmer modules, and originally went with one design that I dropped. I currently have a half-working solution, but have some ideas on how it could be improved. It could definitely be interesting to share ideas on this. I also share almost all of the other goals you mentioned.

I guess the main difference in goals between the two projects is compatibility with existing rules. I designed mine for full compatibility with YARA, even to a fault, so that it can be used in place of it with full confidence that it will work. I feel that this is needed to help people make the change, but it also brings all the existing issues of YARA into the project. From what I can tell, you plan to make YARA-X an evolution of YARA that would remove some syntax or reject some rules in order to improve it? That definitely makes sense after so much time working on YARA.

Thanks for the link to VirusTotal/yara#1781, I did not know about it, there are a lot of great ideas in it.

As for my motivations, I had wanted to do a well-sized, non-trivial personal Rust project for some time, and YARA (which I used at work) seemed like a great project to rewrite: non-trivial parsing, wide usage (which gives a lot of testing opportunities), opportunities to improve performance, and quite an open subject to experiment on (how to scan for all the strings and evaluate all the rules as fast as possible). There is no plan to use it in larger projects on my end; it is a personal project, although my goal is to see it used by other people.

I'll look into the YARA-X code in more detail. It is a bit of a shame we ended up starting the two projects in parallel, given we have most of the same goals. Maybe there are opportunities to share efforts.

Finally, about the mmap paragraph. There are a lot of issues with using mmap in Rust; BurntSushi mentions some here: https://users.rust-lang.org/t/how-unsafe-is-mmap/19635/5 (the whole thread is interesting). This is the reason the
-
If I recall correctly, the main issue I found with Hyperscan was that it is designed for returning the ending offset of each match. It has an option for returning the left-most possible offset where the match starts, but you won't get all the possible offsets where a matching pattern starts, and that's essential for supporting

Regarding backward compatibility, I didn't mention it, but that's also a goal for YARA-X. Without backward compatibility it would be very hard for users to adopt YARA-X, so my plan is to achieve 99% backward compatibility with existing rules (API compatibility is not a goal, and the CLI won't be a drop-in replacement for the existing one). I won't pursue 100% compatibility, which means that certain rules may fail to compile in YARA-X or work differently, but only in extreme cases where existing rules are actually wrong or rely on implementation details. For example, in this thread we are talking about a case where the user was using

About mmap... the curious thing is that yesterday I did some tests by modifying
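
To make the distinction between ending offsets, the left-most start, and all possible starts concrete, here is a small sketch. It does not use Hyperscan at all: the two functions are hypothetical, hand-written matchers for the single pattern `a+b` (one or more `a` followed by `b`), chosen only because one ending offset can then correspond to several starting offsets.

```rust
/// Returns every offset at which a match of `a+b` can start.
/// Hypothetical helper for illustration; quadratic in the worst case.
pub fn all_match_starts(data: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    for start in 0..data.len() {
        if data[start] != b'a' {
            continue;
        }
        // Skip the run of 'a' bytes beginning at `start`.
        let mut i = start;
        while i < data.len() && data[i] == b'a' {
            i += 1;
        }
        // The run must be followed directly by a 'b' for a match to exist.
        if i < data.len() && data[i] == b'b' {
            starts.push(start);
        }
    }
    starts
}

/// Returns the exclusive end offset of every match of `a+b`.
/// A match ends right after any 'b' that is preceded by an 'a'.
pub fn match_ends(data: &[u8]) -> Vec<usize> {
    let mut ends = Vec::new();
    for i in 1..data.len() {
        if data[i] == b'b' && data[i - 1] == b'a' {
            ends.push(i + 1);
        }
    }
    ends
}
```

For the input `aaab`, `match_ends` reports the single end offset 4, and the left-most start is 0, but `all_match_starts` reports 0, 1, and 2. An engine that exposes only the first two pieces of information cannot recover the starts at 1 and 2 without additional work.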
-
@vthib, by following a link in YARA's repository I landed here and discovered this project. I don't know if you have seen https://github.com/VirusTotal/yara-x, but the goals are very similar: a complete implementation of YARA in Rust that solves some of the pain points experienced by YARA users (like slow rules that should actually be fast) and developers (like the parser being tightly coupled with YARA itself, which prevents its reuse).
The approach, however, is different. YARA-X converts rule conditions into WebAssembly code, which in turn gets translated into native code. This makes conditions that rely heavily on loops much faster. The goal is for condition evaluation to be 10x faster than the existing YARA on average. I see that you seem to have experience in Rust (more than myself, I'm gradually learning Rust with the YARA-X project), so if you would like to contribute to YARA-X in some way you are very welcome. The project is still immature and unstable, but I've made a lot of progress already, and have implemented most of the condition evaluation logic.
Originally posted by @plusvic in #6 (comment)