
On the GC vs noGC divide #34

Open
MarvinHannott opened this issue Jan 30, 2024 · 18 comments

@MarvinHannott

I think this divide is completely avoidable if data structures just accept an allocator argument. That would also make it much easier to mix strategies. Sure, that might mean that builtin dynamic arrays and hashmaps have to go because they don't have a destructor. But to be honest, thanks to operator overloading, library solutions are just as ergonomic. In practice you don't feel any difference. And maybe it would be a good thing to have fewer primitives builtin.
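For illustration, a container that accepts its allocator might look roughly like this minimal sketch (the `MallocAllocator` and `List` names and the `allocate`/`deallocate` interface are all invented here, not an existing library API):

```d
import core.stdc.stdlib : free, malloc;

// Invented allocator shape: anything with allocate/deallocate would do.
struct MallocAllocator
{
    void* allocate(size_t n) { return malloc(n); }
    void deallocate(void* p) { free(p); }
}

// A growable array that takes its allocation strategy as a parameter
// instead of assuming the GC.
struct List(T, Alloc)
{
    private T* data;
    private size_t len, cap;
    private Alloc* alloc;

    this(Alloc* a) { alloc = a; }

    void push(T value)
    {
        if (len == cap)
        {
            cap = cap ? cap * 2 : 4;
            auto next = cast(T*) alloc.allocate(cap * T.sizeof);
            next[0 .. len] = data[0 .. len];
            alloc.deallocate(data);
            data = next;
        }
        data[len++] = value;
    }

    // The destructor that builtin dynamic arrays don't have.
    ~this() { if (alloc) alloc.deallocate(data); }
}
```

Usage would be something like `auto m = MallocAllocator(); auto xs = List!(int, MallocAllocator)(&m); xs.push(1);`, with the strategy mixed and matched per container.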

@Connor-GH

perhaps D's operator new should take the allocator as a default argument.

C c = new C(); currently gets lowered to something like C *c = _d_newitemT!C;
C c = new C(); should get lowered to something like C *c = _d_newitemT!C(/*defaultAllocator: &GC.qalloc */);
specifying a new allocator (for nogc or embedded) would look like C c = new(C(), &my_malloc);. The syntax is up for debate, since new is a sensitive construct to change or overload in the language, but please let me know what you think.

Also, a global allocator would be set like the following: override systemAlloc = /* something */;. The structure of the /* something */ can be determined at a later date, but it seems that this would be a good use of overloading override.

@adamdruppe
Contributor

Yeah, I've been thinking a little about this again too, what I'm leaning toward right now is a push/pop global allocator facility. I'll write more later, gotta run again.

@adamdruppe
Contributor

out of time again but remind me later. the idea is you can push/pop things to work with an arena allocator while still being able to force the full GC for something you know will escape. emscripten is my goal rn, got a druntime running w/o a c library on plain linux too, but this is pretty soon on the list so ill write about it in... idk a week or two if nothing else comes up first

@Connor-GH

Would it be too much to ask to have an opNew or would this open up too many (broken) doors?

You technically can overload new on bare metal right now using _d_newitemT and such, but that isn't documented, like at all. You aren't meant to do it, I believe.

opNew would be placed in class Object, and any class that wants to use new has to implement it. Otherwise it's a compiler error: opNew(T)() is not implemented for struct <struct name>. Dynamic arrays don't have to be a first-class language item for nogc, but it'd be nice if they were turned on if opNew is implemented in Object or something (this part is to figure out later).

@adamdruppe
Contributor

opNew I say is a bad idea. D actually used to have it, and it was removed (this is the reason you could pass parameters to new, like new(x, y) MyClass(a, b); the x, y params went to opNew)

void main() {
        Object o = new(5, 3) Object;
}

is a parse error nowadays, but try it with D1 and you'll get opnew.d(2): Error: no allocator for Object. Been so long, I think it was deprecated even before D 1.0, but here's how you'd use it:

import std.c.stdlib;
class A {
        new(size_t sz, int a, int b) {
                return malloc(sz);
        }
}

void main() {
        A o = new(5, 3) A;
}

The new thing has a mandatory size_t argument, which is the compiler telling you how much memory it needs for the object. The other arguments are optional. There is a corresponding delete you could overload too.

Basically a clone of C++'s thing, just without the option of a global overload. (D never does global overloads.)

All that remains of this in current day D is the ability to @disable new(); in an object.

(if you've never tried D1, i do kinda suggest trying it, not just for this historical curiosity, but also knowing how freaking fast it can compile. It is a nice little language. I like modern day D better, but that old D really wasn't bad.)

Anyway, this was removed - a decision i agree with - because it doesn't really work, and even if it did, the use cases are iffy:

  • If you don't pair new with delete, you'll likely leak memory. In C++, it is specified that new must be paired with delete, so this is pretty common. In D, this is not the case - in fact, D deprecated delete! You're not expected to use it at all. So it would be a big surprise if that rule was different for some classes but not others.

  • Overloading new on a per-class basis is kinda weird. How many cases can you think of where the class definition knows better than the usage point as to how it will be used, and thus how best to allocate it?

  • Suppose you do have a case like that. The per-class definition is still pretty limited to something that fits that new/delete paradigm. There's not much that really works here: doing malloc and free works, but the default new/delete do that same thing anyway (maybe even literally calling the same allocator!). So what do you gain from that? The only other real option available is some kind of object pool with a freelist. And indeed, if you ask C++ programmers how they use overloaded new/delete, they'll almost always say that's exactly how it is used, since that's just about the only practical use that fits the rules and provides a concrete benefit. (feel free to correct me if you have another use...)
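For concreteness, the object-pool-with-a-freelist pattern looks roughly like this when written as an ordinary type instead of an operator overload (a sketch; the names here are invented):

```d
struct Node { int value; Node* next; }

// Sketch of the freelist pattern that per-class new/delete overloads
// are almost always used for in C++.
struct NodePool
{
    private Node* freeList;

    Node* acquire(int value)
    {
        Node* n = freeList;
        if (n !is null)
            freeList = n.next; // reuse a previously released node
        else
            n = new Node;      // fall back to the normal allocator
        n.value = value;
        n.next = null;
        return n;
    }

    void release(Node* n)
    {
        n.next = freeList;     // push back onto the freelist
        freeList = n;
    }
}
```

The pool hands out recycled nodes in O(1), but nothing here ever returns memory to the allocator early, which is about the only concrete win this pattern offers.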

Given those constraints, and D's general lack of delete calls at all, I agree with the decision to remove it since it didn't do much good and if used, had the potential to do a fair amount of bad (memory leaks, or GC use-after-free troubles when the allocator didn't scan it, etc.)

What we do instead in D is you can @disable new() to prod people away from that operator, then instead point them to a different factory function.

class A {
        @disable/*("Use `A.make` instead")*/ new();

        static A make() {
                return null; // FIXME: impl
        }
}

void main() {
        auto a = new A;
}
opnew.d(10): Error: cannot allocate `class A` with `new` because it is annotated with `@disable new()`

That commented string is not supported... but should be. I need to add that to the work list, but even without it, the disabled new at least prompts people to check the documentation. For example, here: https://opendlang.org/library/arsd.simpleaudio.AudioOutputThread.html the first note you see is "DO NOT USE NEW ON THIS" and it tells you what to do instead. Having the error message say it would be nicer, but at least the error prevents you from making the mistake, and the next natural place to check, the docs, tells you what to do.

I'm pretty happy with this right now for the case where you want to do something different for specific objects. (You might note it is fairly rare that I actually implement some other thing; usually the answer is to just disable it in favor of a stack construction.)

But what about places where you can't edit the source? You're right you can overload the _d_newitemT but that's not really supported, those interfaces can break at random. The documented way to do it is to implement a new instance of the GC interface, like this one:

https://github.com/opendlang/opend/blob/master/druntime/src/core/internal/gc/impl/manual/gc.d

GCs currently must be set before calling main and not changed after that (indeed, druntime frees its list of options), so it is kinda silly that it goes through a runtime interface, but that's how you'd do it for something like a bare metal target without memory to support the conservative GC. (Note that the conservative GC does work just fine on bare metal; it only really needs malloc/free analogs and is otherwise internally pretty contained. OpenD's new (FreeStanding) druntime already supports it like this too; the main thing it costs you is extra memory.)

I'm fairly happy with this too. You still have the trouble of not typically calling delete, but at least it is a global rule for your program that you know, not differing by random types that might have been overloaded by libraries, so you can handle it if you choose to globally use the manual GC... and it will be easier if you can swap in a pool...

.... because swapping it could be really useful. Consider a case like this:

string fancyProcess() pure;
/*
     // it has big implementation that does tons of internal work
     // but you have no control over this, you can't edit the source
     // and it might call other functions that you also can't edit
*/

void main() {
     import std.stdio : writeln;
     writeln(fancyProcess());
}

Or heck:

void requestHandler(Cgi cgi) {
    // do stuff knowing none of it survives beyond this call
}

For these usage points, you know all the memory used in a block of code can be freed at once; it is a nice place to use an arena allocator of some sort. You set it at the beginning, then free everything all at once when you're finished, similar to automatic stack variables when the function ends. But since you can't modify the called functions, the most realistic way to do this is with some kind of global overload.

What I propose is something like:


void main() {
     import core.memory;
     import std.stdio : writeln;

     GC.push(Arena()); // route all `new`, concatenation, array allocations, etc. through the Arena object
     writeln(fancyProcess());
     GC.pop(); // go back to the previous GC, decrementing the ref count of the popped thing
}

When the Arena refcount goes back to zero, it can free everything at once, done automatically here on the call to pop.

Now, what if the function needs something to survive long term? They can push the main GC back on for a while:

void implementation_thing() {
    import core.memory;
    GC.push(GC()); // explicitly go back to the main GC for a while
    globalCache[x.idup] = new Thing;
    GC.pop(); // go back to whatever the parent had set
}

So that's your escape when you know something must have a long lifetime, even when the user might have overloaded it.

(Instead of push/pop calls, it might be a RAII thing, or a function that takes a lambda with the overridden GC present inside. This way it can automatically set up scope exit for you).
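For example, with the hypothetical GC.push/GC.pop from above, the RAII flavor falls out of existing language features via scope(exit):

```d
void requestHandler()
{
    import core.memory;

    GC.push(Arena());     // hypothetical API from this proposal
    scope(exit) GC.pop(); // guaranteed to run on return or throw

    doAllTheWork();       // everything it allocates lands in the arena
}
```

scope(exit) already gives the exception safety that a dedicated RAII guard or a lambda-taking function would otherwise have to provide.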

This would be useful both in a bare metal case and in regular programs where you need to adapt some existing code to the new usage. The downside is that if a function does a global cache thing when you've overridden it and they didn't know, you end up with a use-after-free situation... that's why I said pure in that one function, but for the most part, you'd be doing this global thing at your own risk.

Have any thoughts on this?

@Connor-GH

(D never does global overloads.)

This kind of implies then that it would have to be either a compiler builtin or a keyword/trait/etc.

(if you've never tried D1, i do kinda suggest trying it, not just for this historical curiosity, but also knowing how freaking fast it can compile. It is a nice little language. I like modern day D better, but that old D really wasn't bad.)

wait.. how much are we talking? It's far too late for me to go back to D1 for osdev, but I am very interested in fast-compiling languages. dmd with no optimizations currently compiles as fast as or faster than gcc at -O0 (tested with a hello world program, with dmd in betterC). I have some cases where D is actually faster to compile than C as well. If it in any way competes with tcc, I see it as a win. TCC should be the end goal.

There's not much that really work here - doing malloc and free works, but the default new/delete do that same thing anyway (maybe even literally calling the same allocator!). So what do you gain from that?

Syntactic sugar (and maybe some safety). e.g. MyStruct *ms = new MyStruct(6); vs the user-facing side allocating memory, fighting with pointers, etc. Would a random D programmer tell you how to manually allocate memory for a struct to be put on the heap? They'd likely respond with "why don't you just put it on the stack?", to which I say that these structs can get very expensive, very fast. Eventually, though, they probably will arrive at the right answer, but probably not from just one try.

The documented way to do it is to implement a new instance of the GC interface

This requires a class and an interface. This is a chicken-and-egg problem.

(Note that the conservative GC does work just find on bare metal, it only really needs malloc/free analogs and is otherwise internally pretty contained. OpenD's new version (FreeStanding) druntime already supports it like this too, the main thing it costs you is extra memory.)

Don't tempt me to do a garbage-collected kernel and write a paper on it, because now I really want to.

Now, what if the function needs something to survive long term? They can push the main GC back on for a while...

Now this is getting messy. In kernelspace, if something lives "long term", that means we never free it. You don't ever free the memory used by the various platform-dependent structures because they are constantly used throughout the system's life. Take this example from SerenityOS: https://github.com/SerenityOS/serenity/blob/6123113255b7aec0d7f9e81041c3229f534b49ba/Kernel/Devices/PCISerialDevice.cpp#L36

Downside is if a function does a global cache thing when you've overridden it

Also a threading issue, but I'm not terribly worried about this because I wrap things in locks.

@maxsamukha
Contributor

Have any thoughts on this?

  1. Per-thread or global, or both (if the thread's stack is empty, use global)?
  2. Maybe worth taking a look at Jai's "context" thingy https://github.com/Ivo-Balbaert/The_Way_to_Jai/blob/main/book/25A_Context.md. They do a similar thing, but lumped everything (allocator, logger, thread id, user data, etc) into one place.

@adamdruppe
Contributor

yeah, thread local, don't wanna have to take the lock to make the switch. (one of the biggest problems with the current GC is a lot of things take the global lock. steve and amaury upstream are working on changing that, im following their progress (it is on the SDC repository: https://github.com/snazzy-d/sdc/commits/master/ and ill pull here when it shows results too...))

that jai thing looks similar, druntime does a lot of that already too, just more scattered, keeping it close is a good idea. and he also does push/pop allocators so that's a good sign im not completely off track lol

@maxsamukha
Contributor

(it is on the SDC repository: https://github.com/snazzy-d/sdc/commits/master/ and ill pull here when it shows results too...))

Yeah, I'm aware of that. It's nice.

that jai thing looks similar, druntime does a lot of that already too, just more scattered, keeping it close is a good idea. and he also does push/pop allocators so that's a good sign im not completely off track lol

It seems you can push/pop the entire context or just allocators. The macros also do poor man's RAII by automatically popping at scope exit.

@adamdruppe
Contributor

wait.. how much are we talking?

My RTS game does a full recompile and link in 1/10th of a second in D1.

D2 can replicate this too; in fact I think most of the slowness in a standard D2 build is linking Phobos rather than anything else. My current freestanding D2 program - which uses druntime mind you - has similar timing: 96 milliseconds with ldc (no optimizations enabled). Though that's just a sample program and not a whole working game!

Still very promising and makes me think we can have those good times again. Of course, compiling (not linking) something like arsd.minigui eats 580ms, so i don't think we'll ever get amazing, but to be fair, minigui is by itself bigger than that old game... (the game is about 15k lines, all D dependencies included except phobos; minigui is 54k lines including D dependencies except druntime (minigui does not actually require phobos!))

anyway, moving on, it isn't that important, just i've never seen a D1 program compile more slowly than a D2 program and i liked it.

MyStruct *ms = new MyStruct(6); vs the user-facing side allocating memory, fighting with pointers, etc.

The alternative to a struct-provided opNew is a struct-provided factory, so the user side would look like MyStruct* ms = MyStruct.alloc(6); (or whatever you call it) and the documentation/error message should tell them immediately about this solution.

Don't tempt me to do a garbage-collected kernel and write a paper on it, because now I really want to.

do it! Actually I think a kernel providing GC as a service to all applications is potentially interesting for a few reasons - it has some global knowledge about available memory on the system, dirty pages for knowing what to scan, etc. to schedule it smarter than an application.

I think Microsoft Research did some studies about it.

In kernelspace, if something lives "long term", that means we never free it

Yeah, in those cases you'd similarly avoid the arena/pool...

@Connor-GH

MyStruct *ms = new MyStruct(6); vs the user-facing side allocating memory, fighting with pointers, etc.

The alternative to a struct-provided opNew is a struct-provided factory, so the user side would look like MyStruct* ms = MyStruct.alloc(6); (or whatever you call it) and the documentation/error message should tell them immediately about this solution.

This, although not first pioneered in that language, is a popular choice in the Rust ecosystem. I really thought about this approach, but .alloc would be a static method, which means it wouldn't be able to autofree in the destructor. Even if it could, RAII-style, why not then put the allocation in the constructor? This is a real-world example of a nogc struct that has to be built from the ground up: https://github.com/Connor-GH/xv6-public/blob/relics/kernel/d/kobject.d#L78-L125. inherit is just a cheap macro that does an alias this. Here it is being used: https://github.com/Connor-GH/xv6-public/blob/relics/kernel/d/kobject.d#L152. new would provide some annotation here that "hey, this allocates", because that is very important in kernelspace. Ideally, I could do new without having to manually call delete, similar to garbage collecting, except it acts more like lifetime analysis since the delete calls would be inserted at compile time.

do it! Actually I think a kernel providing GC as a service to all applications is potentially interesting for a few reasons - it has some global knowledge about available memory on the system, dirty pages for knowing what to scan, etc. to schedule it smarter than an application.

I mean that the kmalloc implementation will be a garbage collector. Currently in xv6, userspace malloc calls sbrk, which calls growproc, which calls allocuvm, which calls kpage_alloc(), which, as you may guess, allocates a page. The more I think about this though, the worse of an idea it becomes: a large, fat guard in the way of every allocation, stopping the world sometimes to collect memory.....sigh. Yeah, scratch that. I can just deal with memory lifetimes but first I need the proper features in D.

@adamdruppe
Contributor

I really thought about this approach, but .alloc would be a static method, which means it wouldn't be able to autofree in the destructor.

You can make the alloc function return a smart pointer or similar that deallocates the rest in its dtor. And yeah, once you have two pieces like that, it could indeed alloc in its constructor too, but since D has no default constructors, the named static method is more reliable for it. It works ok for something like your KArray, which has to have elements to allocate anyway, but it's less reliable for smaller individual objects.

Destructors aren't called on pointers without either a wrapper or explicit call anyway.... we could prolly make scope Struct* s = new Struct; automatically call .destroy(s) at end of scope though, but the compiler is also liable to just move that scope new to a stack alloc anyway which you're trying to avoid, so again, not sure it is reliable for your use case.
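A rough sketch of that smart-pointer-returning factory, with all names invented here and error handling omitted:

```d
import core.stdc.stdlib : free, malloc;

struct Thing
{
    int value;

    // Factory returns an owning handle instead of a raw pointer.
    static ThingHandle make(int value)
    {
        auto p = cast(Thing*) malloc(Thing.sizeof);
        p.value = value;
        return ThingHandle(p);
    }
}

struct ThingHandle
{
    Thing* ptr;
    alias ptr this;        // lets the handle be used like a Thing*
    @disable this(this);   // unique ownership: copying is a compile error
    ~this() { free(ptr); } // auto-free when the handle leaves scope
}
```

So `auto t = Thing.make(6);` reads like the raw-pointer version at the use site, but the free happens in the handle's dtor rather than requiring a delete.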

I mean that the kmalloc implementation will be a garbage collector.

You might be able to make it work better using MMU protected pages, but yeah idk the kernel space is a different zone.

You might be able to make use of the "manual gc" which hooks new into your custom kalloc and depends on you to delete it though.

@Connor-GH

D has no default constructors...

I didn't say a default constructor. I meant a constructor for structs in the sense of just this with parameters.

You might be able to make it work better using MMU protected pages, but yeah idk the kernel space is a different zone.

you overestimate my allocator severely. It literally just page-maps memory up to 224M, and userspace steals memory from the kernel as it needs to. (Not used pages, just free ones. Yes, a rogue process could cause a kernel panic, but I could so easily make an OOM killer. The problem is that I don't have e820 memory maps, or 64-bit support for that matter.)

we could prolly make scope Struct* s = new Struct; automatically call .destroy(s) at end of scope though

This seems pretty ideal. I can implement destroy(typeof(this)) in KObject and have it get inherit-macro'd all the way up to other objects.

the compiler is also liable to just move that scope new to a stack alloc anyway which you're trying to avoid

If I say "new", I want this to be a heap-allocated object, period. The only exception is compile time objects. How hard could this be? I would probably be willing to implement it myself if the code around new isn't total garbage from ancient cruft gathering.

@adamdruppe
Contributor

i don't think it'd be hard to change the definition of new to always call the allocator, but the optimizer might still inline and dead-code-eliminate things too. the problem is that it is a breaking change to existing code, so it'd be kinda iffy to change at this point.

@maxsamukha
Contributor

One problem with new/delete is that delete needs to be called if the constructor fails. D didn't do it automatically, and people didn't care to do it manually.

@Connor-GH

So there's this:

Apparently registerGCFactory exists if you overload interface GC, but in order to register it, you need to pass an initialization function pointer that calls new on that class... it's a chicken-and-egg problem.

I did try taking this approach and making a manual GC, but it was not only difficult but also not easy to understand, and it adds a lot of cruft.

@adamdruppe
Contributor

The chicken and egg problem is not really a problem - eggs obviously came first, since many other animals can also lay eggs.

for example, the static-saurus was known to lay eggs in the CTFEassic period, long before the first chicken was around.

The malloc-gator, still around to our current run time, is also capable of laying eggs.

Though I'd note that there is already a manual GC; you can see it for yourself in the Museum of druntime, core/impl/gc/manual. It wasn't functional at all until we did some repairs to its decayed structure, but it looks like a pretty sturdy skeleton now.

@Connor-GH

druntime core/impl/gc/manual.

core.internal.gc.impl.manual
