-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
My two cents #1
Comments
The idea is actually to have something that does not allocate anything apart from the TCP connection structure itself. No strings would be stored, but would be available as some kind of input range. It may end up having to make little compromises in that regard, but anyway that would be the goal. But the high-level API would then indeed be built with the goals that you outlined (which is already the idea for the current implementation, but there are still a number of exceptions).
Enforcement is not the goal here, if you want you can of course always override the safety measures. The real idea here just is to make it (close to) impossible to make accidental mistakes that open up security holes. As long as we rely on the GC that is not so problematic, but the moment we start to do manual memory management, something like DIP1000 or an approximation thereof using API design facilities is essential in that regard. The main problem is that using IMO, the tradeoff of this implicit approach is pretty good, in the sense that it makes it seamless to escape request data to a background task or for "permanent" storage, for example in an instance or global variable of some kind, while still keeping the risk of accidentally producing a performance hazard low (the request data actually has to outlive the handler, just copying it around will not do any harm). The reference counting is thread-local, so while the performance implications need to be observed carefully, it's unlikely to have a big impact. Like for the current issue with If the automatic approach turns out to be harmless w.r.t. performance, I'd pick that, to avoid the rift that is otherwise created for all the existing code. Getting people to make such big upgrades is sometimes rather difficult and even now there are still many that for example use the 0.7.x vibe.d branch due to the mild breaking changes that 0.8.x introduces.
The thing to keep in mind is that if the templated stream is handed over into user code without a proxy of some sort with a fixed type, then the number of combinations may still amplify there. But apart from that I agree, these combinations will only be there once for the whole application, so like this it's nothing to worry about yet.
True, the high-level API will need this in all likeliness. A possibility for the low-level API would still be to either invert control or to define the request handler callbacks as template alias arguments to let them accept different types for those streams. This may be a good candidate for a quick benchmark experiment. (I really disagree with the "only dump programmers make mistakes" philosophy - reality simply tells a different story, namely that such mistakes are made frequently in practice, and in the end it doesn't really matter how good the programmer was. Since the implications of making such a mistake can be disastrous for public services, I'm convinced that we should do everything that we can to mechanically make it as difficult as possible to make them.)
I'm not really happy with the outlined draft either. The idea is just to enforce a proper order one way or another, because such mistakes surfaced multiple times in the project's history. But I'd value the general usability of the API higher than statically enforcing this, so if there is no nice solution to it in D, I don't mind letting this requirement go.
Another possible approach could be to use the return value for the response, as some frameworks do (and also the |
This is pretty much impossible due to HTTP. No apriori-known length, tonns of ambiguity in the spec, streams. State is required, it's size is dynamic. State lies on the heap or fiber stack (wich is on the heap as well). Allocation is imperative. But it should be efficient and non-excessive.
Reference-counted variables have reference-counted wrapper type. It will break user code just like the 'scoped' attribute or RequestScoped wrapper will. You are either explicitly reference-counted, or you are in GC realm. Or are you calling GC implicit?
If it's thread-local, you just introduce different front for mistakes while thinking that you are reducing it. Also, think about this example: And GC will be faster, it's ought to. GC pauses for whole process took a small fracture of a time RC inside vibe-d took in that test we talked about.
I agree, it should be proxied in external API.
All three approaches (GC, scoped, RC) are practically safe and make it nontrivial to dereference incorrect pointer. All three allow to shoot yourself in the foot in D. Last two of them require user code rewrite, the first one is implemented currently. RC problem domain is the situation when object lifetime is unknown. That is not the case for a lot of state in HTTPRequest\Response. All stream cascades die when request is served. It doesn't matter if you RC them, or place them on GC - no data should be read from the socket or written to it after the request is served. Lifetime is strictly defined and static. Their memory lifetime, if guarded by RC, just creates the illusion that this object makes any sense even after the request is served, wich is a lie. The fact that their memory can still exists due to RC is a logical mistake on it's own. If the type system\language support is not to your liking, maybe descriptor approach will satisfy you. Just for example, I dunno: struct req_id
{
package void* ptr; // used internally by the framework
const ulong id; // additional safeguard to make sure the same address is not occupied by another request after some time
}
// returns GC-allocated copy of the actual header value wich lies on task-local buffer
string getRequestHeader(req_id req, scope string key) @safe;
bool setResponseHeader(req_id req, string key, string value) @safe;
size_t writeResponseBody(req_id req, scope ubyte[] data) @safe @nogc;
// finalizes request, after this call this req will always throw in any framework functions
void finalizeResponse(req_id req) @safe; |
While stack allocated memory is technically also dynamically allocated, in reality it just reuses the same fixed buffer for each run, so I'm excluding that from the list. Apart from that, only the lower level stream types that are defined in terms of other libraries (TLS and ZLIB) have uncontrolled dynamic allocations. Everything else, at least at the basic HTTP protocol level gets away with small fixed size buffers. Only higher level concepts may need to store dynamically sized data that would then simply be transient in the low level API.
Consider this:
With reference counted structs, the code above could stay exactly like it is, while with
Almost everything in vibe.d is thread-local and it's part of the type system to disallow moving thread-local data between threads (even if Druntime and Phobos unfortunately provide some means to circumvent this, but vibe.d actually enforces it as far as possible). This is mainly a question of performance and having to synchronizing everything would simply yield unacceptable performance.
You can simply use a pay-for-what-you-use approach here. The strings would initially reside in a single buffer. But once (and only then) a string is still referenced after the handler is left, it would be re-allocated individually, while the request buffer gets freed. I have to admit though that D's movable struct semantics make this less efficient that it could be, which could be a deciding factor. Regarding the safety of the three approaches - the GC approach is in theory perfectly safe (apart from a few practical holes), and you are right that the other two approaches require additional type system support (DIP1000) to really be safe. But it simply is my assumption that this support will materialize over the course of this year, so the RC approach would be more backwards compatible and reasonably safe for now, while the But the ID based approach will most likely be slower than the RC approach, because of the additional indirection and the need to manage IDs (while they were for free in the form of an anready allocated pointer in the other approaches), and then it still escapes memory just like the other approaches. The moment you receive a reference to the underlying data (e.g. |
void handleRequest(HTTPServerRequest req, HTTPServerResponse res) {
res.writeBody("Hello");
}
Escaping either request or response object is not normal, it is rare case. Escaping the string from the header of 'req' is however pretty common. Do you plan to escape it as RC-string? Because if yes - it will break interfaces (it's not a string type anymore, at most it will be a string subtype), and it will have separate semantics for duplication and scope-checked usage. If not - then it is once again a mystery for me, why wrap objects like streams wich rarely (never) escape, and not wrap request body or headers, wich are common to escape\outlive request.
Remember "the truth about shared"? It's still very applicable. =)
Does not sound possible with D structs, nor does it sound feasible. Were you meaning, that when response is finalized, all structs that wrap the escaped object, on all stacks of all threads and fibers, and every instance of such reference on the heap get the updated reference? Sound really bad. And if you make every such reference a class, you get mandatory double dereferencing on every usage of every string. Such object should have been GC-allocated from the start.
My point was to always escape only gc-allocated copies, but I do not like it just as much as you do, it's bad as well, I agree. |
Agreed.
Yes, it would have to, and would be implemented as a sub type (i.e. still not safe until DIP1000 is there). But you are right, this may be a critical flaw of this approach - either the conversion to
Yes, So the essential property is just that as long as you don't (ab)use the holes in
It certainly is possible and feasible, but I would tend to agree that it's probably not possible to actually make it perform better than GC allocating in the first place. A trivial way to implement it would be to use an ID based approach (IDs staying alive until all references are gone, but point to different memory after the handler leaves). Alternatively, keeping a doubly linked list of all references would work in C++, with the advantage that there are no additional dereference costs, while the reference copy cost increases. Doesn't work in D, though, anyway. So summing this together, the most promising approach so far appears to be to rely on |
Agreed. |
I'm been working on the approach, at least in my application code, to try to make all allocations request local and use DIP1000 to make sure nothing escapes. To me it sounds very good, at least in theory, all allocations, both library and application, were request local. If all the data of a request was allocated on a stack buffer or thread local buffer in the form of a static array and the use it as a bump-the-pointer allocator it should be very efficient. When freeing the memory the pointer/counter just needs to be reset to its initial state. Using To me I think it would be very rare to escape any request data and I would be fine by explicitly copy the data in the few cases it would be needed. If significant performance improvements can be show, I think people would be ok with breaking the API, i.e. need to add |
In continuation of our discussion here: vibe-d/vibe-core#48
Well, it can't be completely avoided. I'd rather formulate in like this:
I'm strongly against implicit copying. Marked with scope Non-copyable HTTPRequest and HTTPResponse structures that implement Stream interfaces themselves (no easily-accessable socket leakage this way, so no refcounting needed under the hood) as parameters to user callback, all "dangerous" fields and properties return something like
You cannot force user to write memory safe code. D does not support this. No, DIP1000 + "@safe" is not an enforcement. When something cannot be enforced, it is better to provide convenient and non-encumbering conventions, documentation and examples.
encoding - transport
encoding - compression - transport
encoding - compression - encryption - transport
Looks rather modest, templates should not be that terrible.
Let's say dynamic dispatch between streams in the pipeline was collapsed, but this is the end. You need dynamic dispatch for HTTPInterchange.BodyReader\Writer, one way or another. Class will allow socket leak (again, only if you think a programmer is a baboon. Seriously, there is no hope if you escape a variable of heavily-documented type scope RequestScoped!BodyWriter). Or you can emulate VFT in a struct through function pointers, wich will keep uncopyability, but will look questionable.
I also don't think unified HTTPInterchange object is something good. Chained calls with staged callbacks:
I think the current form of HTTPServerRequestHandlerS callback is very nice.
The text was updated successfully, but these errors were encountered: