A big part of FCS analysis time is spent in GC #13039
8 comments · 14 replies
-
I'd encourage a focus on performance, not allocations. Allocations happen; they are not necessarily bad. Often the gains come from avoiding running code at all rather than from removing allocations. Historically the biggest gains by @TIHan have been from removing transient LOH allocations. Those are really important to find and nail down.
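For context, here is a minimal sketch (not actual compiler code) of what a transient LOH allocation looks like and how pooling avoids it; the buffer size and the `read` callback are illustrative assumptions:

```fsharp
open System.Buffers

// Arrays over ~85,000 bytes land on the Large Object Heap, so creating
// a fresh large buffer per operation gives the GC expensive work.
let processFileNaive (read: byte[] -> int) =
    let buffer = Array.zeroCreate<byte> 200_000   // transient LOH allocation
    read buffer |> ignore

// Renting from the shared pool reuses large buffers across calls.
let processFilePooled (read: byte[] -> int) =
    let buffer = ArrayPool<byte>.Shared.Rent 200_000
    try
        read buffer |> ignore
    finally
        ArrayPool<byte>.Shared.Return buffer
```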
-
I agree that performance in general should be a high priority. Allocations aren't bad until there are too many of them. We often see about 30% of the time spent in GC, and that does indeed look bad: it's tens of seconds to minutes that our users have to wait for analysis to finish in an IDE.

Yes, that would improve analysis time, but if the remaining code still allocates too much, it would still spend a considerable amount of time in GC. I think we should tackle both of these things.

@dsyme I agree that in most cases allocations aren't that bad, but in the case of FCS analysis we'd like to cut every wasted second, since we expect the analysis to be as fast as possible (and our users do as well). Could you reconsider the importance of reducing allocations? I think we should try not to ignore this problem.
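To make the "time spent in GC" numbers reproducible outside a profiler, here is a minimal in-process sketch; `GC.GetTotalPauseDuration()` requires .NET 7 or later, and `analyse` stands in for a real FCS analysis call:

```fsharp
open System
open System.Diagnostics

// Measure what fraction of one analysis run was spent paused for GC.
let measureGcShare (analyse: unit -> unit) =
    let pauseBefore = GC.GetTotalPauseDuration ()
    let sw = Stopwatch.StartNew ()
    analyse ()
    sw.Stop ()
    let gcPause = GC.GetTotalPauseDuration () - pauseBefore
    printfn "total %O, GC pause %O (%.1f%%)"
        sw.Elapsed gcPause
        (100.0 * gcPause.TotalMilliseconds / sw.Elapsed.TotalMilliseconds)
```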
-
Here's another recent snapshot, where GC took 26% of the time, about 6 seconds out of ~22 seconds. It looks similar to this in most of the snapshots we look at.
-
One thing I'd like to add: I believe `node` should move away from async to some custom lightweight state-machine type. I tried using a funky `() -> Task<'a>` type instead of async and it really cleaned up compiler stack traces; with async it's impossible to see what's going on because of async's bindings everywhere. I couldn't get this `() -> Task` thing to work properly, though, because of CancellationTokens and the async-aware behaviour of existing code, but I believe a custom type could fix this problem.

Also, there are places where F# emits ETW events, and even though the event can be skipped, the string concatenation and allocations still happen.

Many of the allocations in the compiler come from lists, tuples and lambdas/closures. Maybe a custom allocation-free seq builder could help. Also, array and list equality using the core `=` operator allocates; a custom equality operator for internal usage might be better.
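On the ETW point, the usual fix is to guard payload construction behind `IsEnabled()` so the concatenation never runs when nobody is listening. A minimal sketch, with a hypothetical event source rather than the compiler's actual one:

```fsharp
open System.Diagnostics.Tracing

[<EventSource(Name = "Demo-FSharp-Compiler")>]
type CompilerEvents() =
    inherit EventSource()
    static member val Log = new CompilerEvents()
    [<Event(1)>]
    member this.FileChecked(message: string) = this.WriteEvent(1, message)

let logFileChecked (fileName: string) (durationMs: int64) =
    let log = CompilerEvents.Log
    // The allocating concatenation runs only when a listener is attached.
    if log.IsEnabled () then
        log.FileChecked (fileName + " checked in " + string durationMs + "ms")
```

The same "avoid running the code at all" idea applies to the `=` point: a hand-rolled comparison loop sidesteps the boxing that generic structural equality can introduce for arrays of value-type elements.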
-
@auduchinok The point I'm making is that it's not allocations that matter, it's performance. If we find places where reducing allocations gives better performance, that's great. But don't take the approach that reducing allocations is a good in itself: there's a long history of trying to do that in the compiler, and it rarely gave measurable benefits (because Gen1 allocations and GC are rarely the bottleneck), and plenty of times the allocation reductions led to either more complicated code, more copying, or less data-structure sharing. LOH allocations were a significant exception to this rule, plus of course "stupid" allocations in tight loops.

@En3Tho Yes, agreed. https://github.com/TheAngryByrd/IcedTasks shows how to define async-like cold-start tasks that pass cancellation tokens explicitly, and likewise synchronous code that passes cancellation tokens. We should make an internal copy of this that gives better debugging for async/cancellable code.
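For reference, the cold-start shape IcedTasks builds on is, roughly, a function from a cancellation token to a task. A hand-written sketch of the idea (the `bind` helper is illustrative, not IcedTasks' actual implementation):

```fsharp
open System.Threading
open System.Threading.Tasks

// Nothing runs until a CancellationToken is supplied, and the token is
// threaded through every bind explicitly rather than captured by async's
// ambient machinery, which keeps stack traces much flatter.
type CancellableTask<'T> = CancellationToken -> Task<'T>

let bind (f: 'T -> CancellableTask<'U>) (body: CancellableTask<'T>) : CancellableTask<'U> =
    fun ct ->
        task {
            ct.ThrowIfCancellationRequested ()
            let! x = body ct
            return! f x ct
        }
```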
-
On (imm)arrays:
Other:
I don't agree with only looking for bad cases and then optimizing those. I think there are a lot of classes of performance hygiene that do not increase complexity or result in significantly more copying and less sharing.
-
I agree with the sentiment @auduchinok and others present. While I completely agree that allocations per se are nothing bad, I think everyone who mentions the goal of reducing allocations means speeding up overall performance by reducing the total time blocked by GC.

As mentioned in #12526, I think working on a framework and CI for benchmarking the FSharp.Compiler.Service codebase would be very useful mid- and long-term. It would help support these kinds of discussions, where there is no obvious answer, and back them with easily available, objective data.

Naively, and without any results to prove it, I would think that widely using more optimized (time- and memory-wise) data structures in the codebase could have a number of benefits.
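A minimal sketch of what one benchmark in such a suite could look like, assuming BenchmarkDotNet; the benchmark body is a placeholder, not a real FSharp.Compiler.Service call:

```fsharp
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

[<MemoryDiagnoser>]   // report allocated bytes and GC counts per operation
type AnalysisBenchmarks() =
    [<Benchmark>]
    member _.ParseAndCheckSampleProject() =
        // stand-in for e.g. FSharpChecker.ParseAndCheckProject on a fixed input
        System.Threading.Thread.Sleep 0

[<EntryPoint>]
let main _ =
    BenchmarkRunner.Run<AnalysisBenchmarks>() |> ignore
    0
```

Running something like this in CI on every PR would turn "does this change reduce GC pressure?" into a number rather than a debate.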
-
Here's the issue in the performance repo: dotnet/performance#2457
-
I'm looking at various perf snapshots and seeing that in many of them about 20–30% of the time FCS takes to analyse a project graph is spent in GC. Here's the latest example, where GC takes 41 seconds:
Could we work on finding a systematic approach that would allow us to reduce allocations where possible? There are a number of possible things to try.
Some of these things aren't very idiomatic in a functional language, but when we're talking about a compiler and an editor analysis engine, I think better performance should matter more than being completely idiomatic.
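As one hedged illustration of that trade-off (my example, not necessarily one the author had in mind): struct tuples and `ValueOption` keep intermediate results off the heap, at the cost of copy-by-value semantics and a more imperative style:

```fsharp
// 'Some (i, x)' would allocate both an option cell and a reference tuple;
// ValueSome with a struct tuple allocates nothing on the heap.
let tryFindWithIndex (predicate: 'T -> bool) (xs: 'T[]) : ValueOption<struct (int * 'T)> =
    let mutable i = 0
    let mutable result = ValueNone
    while result.IsNone && i < xs.Length do
        if predicate xs.[i] then
            result <- ValueSome (struct (i, xs.[i]))
        i <- i + 1
    result
```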