Changes tree and such

Jump to bottom

Constantine edited this page Feb 8, 2017 · 5 revisions

This pages describes how changes to file are presented in internals of Spade. Changes describe basic actions (insert, replace and erase) that have been done to files in a project. The reason one might need such a tree instead of simply writing changes to file immediately is because other than having a nice undo tree we don't want to or might not be able to write to files directly. Changes can be applied to data starting at certain node -- both backwards and forward in time.

Changes are represented as a tree. Each change is a child node of each previous node. There are 3 atomic actions each change can be, those are: insert, replace and erase. Each change carries some binary data with certain meaning (explained down below). Summing up, each change consists of its hash, parent change hash, position (relative to parent change) where change happened in file, its type (insert, replace erase) and change data with some kind of meaning.

Three types of changes

There are 3 types of actions that spade works with. Those are:

insert
replace
erase

insert is an action of inserting data. data field in a change carries is data that is being inserted. Example: "\x00\x08\x0f" insert of bytes "\x01\x02" at position 1 gives us result "\x00\x01\x02\x08\x0f".

replace is an action of replacing data with another data. The data carried in a change is bit-difference (xor) between data that was there and new data that has been put. Position change happened at may not be further than length of file - length of data replaced. Replace data may not be longer than length of file it is being applied to. Example: "\x03" replace with "\x01" yields us data "\x02" (xor between 3 and 1). Replacing "\x00\x01" with "\x00\x01\x02" is illegal and so is an attempt to replace anything beyond file boundaries: "\x00\x01" with "\x01\x00" at position 1.

erase is an action of erasing data. It acts pretty much as insert, but reversed, but data field for erase carries data that has been removed instead of inserted.

Looking up path between two changes

During work on a file tree may grow large and it will be mainly left-weighted. Sometimes user might want to quickly switch from one change to the other so we need to find a path in history from one change to another or apply all changes from the beginning. While applying changes from file beginning is pretty straightforward, it is rather slower than the first way in many use cases. So far the only case it is suitable is when we are switching from one long branch to another short one where distance between our change and that branch is larger than distance between file's head and that branch.

Since the last topic is now cleared up, let's get to the path finding point. In a tree there always is exactly one path between two nodes. We have two ways of finding a path between two nodes: first one is to take one change, do depth search for second change from its point, if we fail to find any, we go one level up and repeat. This is fairly expensive and requires loading of whole tree in memory though at expense of just one SELECT query. The second way is to take both changes, trace their parent nodes up to root and see which nodes they share. The first node they both share that we find is the one that is the point we turn around in tree and start going towards the target change. We are going to use the second method because of its simplicity and it can be achieved at expense many small SELECTs queries which is fine in our case, since our database is local.