-
Notifications
You must be signed in to change notification settings - Fork 2
Representational Limits
Faunus elements, vertices, and edges all implement Hadoop’s Writable
interface. Certain assumptions are made in order to ensure an efficient representation while at the same time supporting what is expected to be the most realistic use cases. This section outlines the representational assumptions/limits currently in Faunus.
Note that these limits are theoretical and that physical limits may be reached well before theoretical limits are hit. For instance, one important physical limitation to be wary of is the fact that a single vertex is represented in memory. A vertex is its id, properties, incident edges, and path information. Any MapReduce job making use of FaunusVertex
must be able to hold the entire vertex in memory.
-
id: a vertex id is a positive
long
value and therefore, a graph in Faunus can not have more than 9,223,372,036,854,775,807 vertices. -
properties: the size of the properties map is denoted by a positive
short
and therefore there can not exist more than 32,767 properties per vertex. -
edges:
-
unique labels: edges are indexed by their label using a
short
and therefore, there can not be more than 32,767 unique labels for the incoming (or outgoing) edges of a vertex. -
total edges: the edge size for any one label is represented by an
int
and therefore, for any direction and any label, there can not be more than 2,147,483,647 edges.
-
unique labels: edges are indexed by their label using a
- paths: See note below.
-
id: an edge id is a positive
long
value and therefore, a graph in Faunus can not have more than 9,223,372,036,854,775,807 edges. -
properties: the size of the properties map is denoted by a positive
short
and therefore there can not exist more than 32,767 properties per edge. - paths: See note below.
NOTE ON PATHS: When traversing a graph, it is important to know how many traversers are at any one element (vertex or edge) at any step in the computation. For most computations, this is simply a count and in fact, Faunus represents the number of traversers at an element with a long
value. However, there are some graph traversals that require not only how many traversers are at a particular element, but for each traverser, what elements have they touched prior to the current element. This representation can not be captured by a single long
. Instead, a list of a list of long
ids is used to represent such historic path information. For traversals that make use of path histories, it is very easy to run into space and time issues. There are few steps that incur this type of cost and they are articulated in Gremlin steps.