Replies: 2 comments 2 replies
-
I wouldn’t have known you were French if you didn’t say so. I have no problem understanding what you’re saying any more than another English speaker, and I can guarantee that your English is far better than my French. 😊 I’ll share my thoughts, but I think we really need to get Andrew (@lums658) involved with this because he has far more experience than I with graph algorithms. He has made changes to the P1709 proposal around algorithms, but they're not available yet. #1 When you say “dijkstra should be considered like other traversals”, are you saying it should be a View like we’ve done for DFS & BFS? One thing we’re looking at is to have dijkstra_shortest_paths, and dijkstra_shortest_distances, where the shortest_paths allows the caller to get the shortest path, while shortest_distances doesn’t include that. He has also pointed out that if the path length is always constant (e.g. 1), BFS is sufficient. Is this along the lines of what you are saying? #2a: Abstracting values outside of the vertices and edges using a customization point is an interesting thought that hasn't come up before. For your idea, it’s specifically for algorithm use and doesn’t refer to the user-defined values. For that to be useful, there would need to be common functionality that many algorithms could use. A consequence is that increases the size of the API we would have to get approved, but it isn’t a deal breaker. Are there other things besides colors that might use that? #2b: “requiring the vertices of a graph to be consecutive integers starting at zero is too strict”. I’ve been advocating for sparse vertex id’s, even non-numeric id’s, because I have experienced situations where it would be useful. The reality we face is that the proposal is going to be very large, which will make it harder to get approved. Keeping the proposal focused on integral, consecutive vertex id’s helps keep it constrained while still giving value for the initial step. One of my side projects is the dynamic_graph, which I hope to extend to support sparse id’s (e.g. in a map) and string id’s to validate the design will support them. It will likely require algorithm specializations to accommodate them. My main concern is having something in the design that will prevent us from allowing the features in the future. That will allow us to make future proposals to allow them. #3: Andrew has said he’d like to have a BFS algorithm that can be used in different contexts, with the ability to define the queue that is used as a parameter. If that were available, I think that addresses your concern about priority_queue vs. binary_heap. For the P1709 standard proposal, I think we ought to use the priority_queue because it’s already in the standard. That doesn’t mean we couldn’t use your d_ary_heap, and it sounds like it would be nice if we could make it a default for our library, but it would have to wait a while before we could propose it for the standard. #4: You’re getting into the optimization of the algorithm with this. I think performance and optimization are important, but right now getting the API defined for all the functions is a higher priority for me. Assuming we can get P1709 approved, individual standard library developers may do their own implementation, even if we provide a full reference implementation (something I just found out). That doesn’t mean we shouldn't do what you say, but that it may not translate into the final implementations in the standard libraries. #5: A change Andrew has in process for dijkstra is to include parameters for combine (plus) and compare (less). He hasn’t included a concept for zero, so that may be of interest to him. I think he’s thinking in the same direction as you. I hope this makes sense to you, and that I haven’t strayed from your intent. I'll try to get Andrew to comment. |
Beta Was this translation helpful? Give feedback.
-
#1. You can't really do Dijkstra as a traversal in the same way as BFS since the same vertex can be visited multiple times in Dijkstra. You have to compute the solution to completion to have the actual shortest paths. One useful interface (for any path algorithm) would be an adapter that turned the predecessor array computed by a shorter paths algorithm into a traversal. We haven't contemplated such a thing to this point but now that you mention it... #2. We had this issue in Boost.Graph -- it actually came up quite a while after Boost.Graph was released, which seems to indicate that contiguous integers are actually by far the most common use case. There are a couple of approaches to sparsifying indices, and the best probably depends on specific use case. Most straightforward is to use an indirection vector and relabel vertex numbers in the graph. That retains constant time lookup, as opposed to, say, using a map. But again, context is important. We have generalized the notion of "attaching" properties to edges / vertices. In general best performance is in using implicit vertices and edges (they are just indices that are not materialized) and then look up properties in, say, an array. We created "property maps" in Boost.Graph to provide a uniform syntax for having properties as part of a node/edge data structure as well as having them as an indexed array with an implicit vertex/edge (vertex_descriptor / edge_descriptor). Using an array was generally much more efficient (this is the struct-of-arrays vs array-of-structs question -- struct of arrays is almost always more efficient -- particular on modern architectures and even more particularly with GPUs -- YMMV of course). And the property map machinery was extremely complex and confusing. #3. The interface for std::graph won't specify any particular priority queue for the implementation, so that a vendor could use something like, say, a Fibonacci heap (which tends to work well for Dijkstra's algorithm). There will also be a variant that allows the user to specify a custom heap type. Using algorithms available from the standard library makes development of a model implementation much more efficient. #4. Yes. #5. We haven't introduced the notion of a semiring yet, though that does seem to be a common generalization for many things in graphs and databases. The "GraphBLAS" characterize their graph algorithms in terms of operations over semi rings. It is also important to note that the standard library, although it does searching, sorting, inner products, and so forth, does not specify things like monoid or semi rings as actual abstractions at the algorithm interfaces. Making that machinery an actual requirement for specifying graph algorithms is probably not necessary. In terms of performance -- one thing that is an absolute non-negotiable principle is that generalization at the cost of performance is not acceptable and that users should never pay for what they don't use. Adding semi rings and so forth can be nice, but 99% of users (IMO) won't need that generality. Similarly, an interface should not be so general that it becomes more work to use it than to just write the algorithm yourself. These were some of the issues with Boost.Graph that were often reported. |
Beta Was this translation helpful? Give feedback.
-
Hi all, I've also been carrying a project of a graph library using C++20 functionalities here: https://github.com/fhamonic/melon. I spent much time implementing the Dijkstra's algorithm, hereafter dijkstra, in the seek of genericity and performance. I open up this to discuss API design and assess how my implementation can be ported to graph-v2.
Here are interesting points of discussion to me:
It seems to me that dijkstra should be considered like other traversals since it maintains a queue of vertices and pop a vertex at each iteration (by the way, dijkstra with unitary weights is exactly a breadth first search). The traversal algorithms may also propose to retrieve the depth of vertices I think so. Users may want to do computation using the distances between the vertices of the graph without requiring to store them (this is also suggested by a remark of Andrew in
dijkstra_clrs
: "Do we want to allow null distance? What about if both are null? Still run algorithm at all?").A problem I faced in my graph library is abstracting the way data can be attached to vertices and edges. I think this should be delegated outside the algorithm and be a customization point. For example, in
bfs_base
, colors are attached to vertices with a std::vertor of sizenb_vertices
and I think that requiring the vertices of a graph to be consecutive integers starting at zero is too strict. By the way, customizing the allocator of this specific vector can be interesting.Using std::priority_queue for dijkstra is not ideal. For the dijkstra algorithm the classical data structure used is a binary heap, which is exactly a std::priority_queue allowing to increase or decrease the priority of a vertex instead of adding an entry. This can be done by adding the requirement of the methods:
promote
, anddemote
together with a methodpriority
that returns the current distance estimation of a vertex. I have a good implementation of heaps of arbitrary degree (not only 2) using template meta-programming that would do the work. Selecting the degree can be advantageous depending on the CPU cache (on mine 8-heaps are performing the best). I suggest then importing thed_ary_heap
class of my library.For dijkstra we may want to access explicitly the inner ranges of the graph instead of doing
views::incidence(g, uid, weight)
. Indeed, if the edges range is acontiguous_range
or aiota_view
, we can prefetch the edges, targets and weights between theQ.top()
andQ.pop()
instructions.Q.pop()
is one of the most expensive operation of the Dijkstra's algorithm and there is plenty of time here to pefetch data (it is at least 10% free performances on big graphs). However I fear there is no standard way of doing it...The Dijkstra's algorithm works for other mathematical structures than shortest paths. For example, when the edges are weighted with probabilities, we can slightly modify dijkstra to compute the "most reliable path", that is, the path whose product of the probabilities of its edges is the highest. By the way, with the same kind of modification, we can adapt dijkstra into the Prim's algorithm for computing minimum spanning trees. This only requires specifying a function
plus
for composing the weight of arcs along a path, a functionless
for comparing the weights of paths, a constantzero
that makes identity when applied toplus
and absorbant forless
and a constantinfinity
that make identity forless
. This is called atropical_semiring
and I also suggest adding this in template parameters (or traits, along with the priority_queue type).I hope this isn't too messy and sorry if my english is a little awkward, I'm french lol.
Beta Was this translation helpful? Give feedback.
All reactions