Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement Lib/Set.hpp as a wrapper of std::unordered_set #551

Open
inpefess opened this issue May 8, 2024 · 9 comments
Open

Reimplement Lib/Set.hpp as a wrapper of std::unordered_set #551

inpefess opened this issue May 8, 2024 · 9 comments

Comments

@inpefess
Copy link
Contributor

inpefess commented May 8, 2024

Why?

And why have original basic container implementations instead of standard ones?

Why not replace Vampire's set with std::unordered_set?

Many more changes at once. This issue proposes to begin with only one file (Lib/Set.hpp).

We have Vampire's set rewritten as a wrapper around std::unordered_set, so what?

We can replace the wrapper's methods calls with standard ones of the wrapped set case by case wherever they appear in the code (some parts might need more effort to refactor than others). Ideally, this process ends with a total replacement of the original implementation by the standard.

We can also encourage new code to use standard sets.

If such an approach works, we can do the same with other containers.

Why start with sets?

We already have unit tests for this class.

@inpefess
Copy link
Contributor Author

inpefess commented May 8, 2024

I can work on it if there are no objections against the very idea.

@mezpusz
Copy link
Contributor

mezpusz commented May 8, 2024

I'm not against the idea, just wanted to note that Vampire's Set is closer to std::unordered_set than std::set to my knowledge, using hashing rather than red-black trees/heaps/whatnot. Also note that we use open addressing, and STL could be using something more sophisticated (and more efficient). But in general, especially in critical code such as in TermSharing, we should be careful with the change.

@MichaelRawson
Copy link
Contributor

I am very excited to see this kind of change proposed! Thanks for filing this. If you are sufficiently brave to tackle this please go ahead and try it once there are no more comments here, otherwise I will do it at some point.

It may be that either

  1. It is not possible to use std::unordered_set to implement Lib::Set because of the interface and how it is used.
  2. We lose performance this way.

The former can probably be solved by changing how we use sets on a case-by-case basis. The latter I could forgive if the penalty is minimal, as in my experience the straight-line speed rarely translates to many solved problems.

Another thorny aspect may be the Hash parameter, which is used somewhat inconsistently. This can and should be cleaned up anyway.

@JakobR
Copy link
Member

JakobR commented May 8, 2024

When I last tested std::unordered_map, it was quite a bit slower than Vampire's DHMap, at least on my machine. I would assume the implementations for sets are similar. (It seems to be a common criticism of std::unordered_map that it is relatively slow because of some API requirements of the C++ standard. But I don't remember the details.)

@inpefess inpefess changed the title Reimplement Lib/Set.hpp as a wrapper of std::set Reimplement Lib/Set.hpp as a wrapper of std::unordered_set May 8, 2024
@inpefess
Copy link
Contributor Author

inpefess commented May 8, 2024

Great! Thank you for your comments. I will give it a try.

@MichaelRawson
Copy link
Contributor

A solution if std::unordered_map turns out to be too slow is to rewrite the container as simply as possible. We only need one set/map structure, not two, and I personally doubt double-hashing (the "DH" in DHMap/DHSet) is worth the extra complexity.

@inpefess
Copy link
Contributor Author

inpefess commented May 9, 2024

I see at least one non-standard method, rawFindOrInsert

* Checks whether a value with a given hashCode is in the map.
, giving a user a surprising amount of liberty (to use different hash functions for different items stored, for example). Happily, it appears only in Kernel/Term.cpp (e.g.
Term::termHash(function, [&](auto i){ return args[i]; }, arity),
), and I don't see any per-item hash functions there. I assume this issue to be blocked by that. I will propose a possible refactoring in a separate issue, e.g. defining a pertinent hash function at a moment of terms set creation instead of passing hash functions when inserting.

@MichaelRawson
Copy link
Contributor

MichaelRawson commented May 9, 2024

Fine by me - go straight to PR if you like, rawFindOrInsert is not ideal and I would be gladly rid of it.

@quickbeam123
Copy link
Collaborator

quickbeam123 commented May 15, 2024

This is definitely a nice exercise!

However, please try to approach this locally first (if at all possible - maybe focusing on an expected hot spot), so that you don't spend too much time on it, before we can compare the performance.

I wouldn't be too happy to accept a large refactor which on top of things slows Vampire down a bit, just to be closer to using standard classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants