Skip to content
This repository has been archived by the owner on Aug 19, 2020. It is now read-only.

Representational Limits

okram edited this page Sep 22, 2012 · 5 revisions

Faunus elements, vertices, and edges all implement Hadoop’s Writable interface. Certain assumptions are made in order to ensure an efficient representation while at the same time supporting what is expected to be the most realistic use cases. This section outlines the representational assumptions/limits currently in Faunus.

Note that these limits are theoretical and that physical limits may be reached well before theoretical limits are hit. For instance, one important physical limitation to be wary of is the fact that a single vertex is represented in memory. A vertex is its id, properties, incident edges, and path information. Any MapReduce job making use of FaunusVertex must be able to hold the entire vertex in memory.


Faunus Vertex

  • id: a vertex id is a positive long value and therefore, a graph in Faunus can not have more than 9,223,372,036,854,775,807 vertices.
  • properties: the size of the properties map is denoted by a positive short and therefore there can not exist more than 32,767 properties per vertex.
  • edges:
    • unique labels: edges are indexed by their label using a short and therefore, there can not be more than 32,767 unique labels for the incoming (or outgoing) edges of a vertex.
    • total edges: the edge size for any one label is represented by an int and therefore, for any direction and any label, there can not be more than 2,147,483,647 edges.
  • paths: See note below.

Faunus Edge

  • id: an edge id is a positive long value and therefore, a graph in Faunus can not have more than 9,223,372,036,854,775,807 edges.
  • properties: the size of the properties map is denoted by a positive short and therefore there can not exist more than 32,767 properties per edge.
  • paths: See note below.

NOTE ON PATHS: When traversing a graph, it is important to know how many traversers are at any one element (vertex or edge) at any step in the computation. For most computations, this is simply a count and in fact, Faunus represents the number of traversers at an element with a long value. However, there are some graph traversals that require not only how many traversers are at a particular element, but for each traverser, what elements have they touched prior to the current element. This representation can not be captured by a single long. Instead, a list of a list of long ids is used to represent such historic path information. For traversals that make use of path histories, it is very easy to run into space and time issues. There are few steps that incur this type of cost and they are articulated in Gremlin steps.