This repository has been archived by the owner on Oct 7, 2022. It is now read-only.

Tracking changes

Nicolas Sebrecht edited this page Apr 1, 2016 · 15 revisions

A syncer needs to track changes in order to know what changed on either side and propagate it to the other side. For example:

  • new mail arrived;
  • new flag added.

In order to know what changed, offlineimap performs a 3-way merge of:

  • the current left state;
  • the current right state;
  • the latest sync state.
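The 3-way merge can be sketched as follows. This is only an illustration, not the offlineimap or imapfw code: it assumes each state is a plain dict mapping a message identifier to a set of flags, and the names `changes_since` and `three_way_merge` are hypothetical. Conflicts (the same message changed on both sides) are deliberately not handled.

```python
def changes_since(current, previous):
    """List what changed on one side since the latest sync state.

    Both arguments are assumed to be dicts: {message id: frozenset of flags}.
    """
    changes = []
    # Present now but unknown at the last sync: new mail arrived.
    for uid in sorted(current.keys() - previous.keys()):
        changes.append(("add", uid, current[uid]))
    # Known at the last sync but gone now: the message was deleted.
    for uid in sorted(previous.keys() - current.keys()):
        changes.append(("delete", uid, None))
    # Present in both but with different flags: flags were changed.
    for uid in sorted(current.keys() & previous.keys()):
        if current[uid] != previous[uid]:
            changes.append(("flags", uid, current[uid]))
    return changes


def three_way_merge(left, right, previous):
    """Changes seen on one side are the ones to apply on the other side."""
    return {
        "apply_on_right": changes_since(left, previous),
        "apply_on_left": changes_since(right, previous),
    }
```

Without the latest sync state, a message present only on the left could not be told apart from a message deleted on the right; this is why the third input is needed.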

In imapfw, providing the "latest sync state" information is the role of the "state" backend. offlineimap calls this information the "local cache".

In imapfw we intend to support more than one state backend. For example, the offlineimap action requires handling the legacy sqlite database, and mbsync compatibility requires handling mbsync's own database. A backend storing the data within the Maildir would be great, too (e.g. in a maildir/mailbox/.state path).

Glossary

  • storage: database (Maildir, IMAP, etc).
  • driver, D: enables basic operations on a storage.
  • stateController, sC: controller for a driver to help syncing.
  • state Backend, sB: a driver for the previous sync state implementing support for one format.
  • current state: the data we get via a driver at some point in time.
  • previous sync state, PSS: data recorded at the previous sync, serving as the base for the next sync.

The current architecture

To allow asynchronous I/O, imapfw relies on workers. The engine queries the drivers.


  {worker}                        {worker}                         {worker}
+----------+                    +----------+                     +----------+
|          |      (drives)      |          |      (drives)       |          |
|  driver  |<-------------------|  engine  +-------------------->|  driver  |
|          |                    |          |                     |          |
+----------+                    +----------+                     +----------+

Each driver is a "full flavored" driver, with all the chained controllers. The most important objects passed between the engine and the drivers are the messages (see imapfw/types/message.py).
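The layout above can be illustrated with a toy model. This is a sketch under loud assumptions, not imapfw's worker code: workers are modelled as plain threads talking through queues, and `DriverWorker`, `engine`, and the "list"/"stop" requests are hypothetical names.

```python
import queue
import threading


class DriverWorker(threading.Thread):
    """Toy driver running in its own worker; serves requests from a queue."""

    def __init__(self, name, messages):
        super().__init__(name=name, daemon=True)
        self.requests = queue.Queue()
        self.responses = queue.Queue()
        self._messages = messages  # stand-in for a real storage

    def run(self):
        while True:
            request = self.requests.get()
            if request == "stop":
                break
            if request == "list":
                self.responses.put(sorted(self._messages))


def engine(left, right):
    """Drive both drivers: send every request first, then collect replies."""
    workers = (left, right)
    for worker in workers:
        worker.start()
    # Both requests are queued before waiting, so the drivers work concurrently.
    for worker in workers:
        worker.requests.put("list")
    listing = {w.name: w.responses.get(timeout=5) for w in workers}
    for worker in workers:
        worker.requests.put("stop")
        worker.join(timeout=5)
    return listing
```

The engine never touches a storage directly; it only exchanges requests and results with the drivers, which is what lets each side block on its own I/O.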

Implementation design

If required, Message objects can do more than just hold email data and metadata. The drivers are designed to provide these high-level Message objects, so they are available very early and are already meant to travel from one end to the other.

Message objects have collection support via the Messages class.

The idea is to enable the Messages and Message objects to support Python comparison, either directly or by casting the types. While still subject to change and discussion, a more high-level type might help to compare messages:

syncMessage = SyncMessage(message)
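One way such a cast could support comparison is sketched below. The `Message` class here is a minimal stand-in, not the real class from imapfw/types/message.py, and the `uid` and `flags` attributes as well as the `same_flags` method are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Minimal stand-in for a message: an identifier and a set of flags."""
    uid: int
    flags: frozenset = frozenset()


class SyncMessage:
    """Hypothetical wrapper comparing messages for syncing purposes."""

    def __init__(self, message):
        self.message = message

    def __eq__(self, other):
        # Two sync messages are "the same mail" when their identifiers match,
        # whatever their flags are.
        if not isinstance(other, SyncMessage):
            return NotImplemented
        return self.message.uid == other.message.uid

    def __hash__(self):
        # Consistent with __eq__, so instances can be used in sets and dicts.
        return hash(self.message.uid)

    def same_flags(self, other):
        """Tell whether both sides carry identical flags."""
        return self.message.flags == other.message.flags
```

With equality and hashing defined this way, collections of messages can be compared with ordinary set operations, and flag differences are checked separately.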

Known pre-requisites for the state backend

  • The state depends on the combination of both the left and right repositories. Changing the repository on one side is not expected (or must be handled correctly) because the "previous state" only makes sense with respect to both sides.
  • All state backends must share the same internal API so that introducing new backends is easy.
  • Backends are supposed to be concerned only with the underlying database format (text, sqlite, etc).
  • With a correct state backend we should be able to avoid unnecessary remote requests.
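A shared internal API with format-specific backends could look like the sketch below. Nothing here is the actual imapfw API: `StateBackend`, `JsonStateBackend`, and the `load`/`save` methods are hypothetical, and JSON is used only as an example of a text format.

```python
import json
from abc import ABC, abstractmethod


class StateBackend(ABC):
    """Hypothetical API shared by all backends; only the format differs."""

    @abstractmethod
    def load(self):
        """Return the previous sync state as {message id: frozenset of flags}."""

    @abstractmethod
    def save(self, state):
        """Record the given state as the base for the next sync."""


class JsonStateBackend(StateBackend):
    """Example backend storing the state in a JSON text file."""

    def __init__(self, path):
        self._path = path

    def load(self):
        try:
            with open(self._path, encoding="utf-8") as handle:
                raw = json.load(handle)
        except FileNotFoundError:
            return {}  # first sync: no previous state recorded yet
        # JSON keys are strings and sets are stored as lists; convert back.
        return {int(uid): frozenset(flags) for uid, flags in raw.items()}

    def save(self, state):
        raw = {str(uid): sorted(flags) for uid, flags in state.items()}
        with open(self._path, "w", encoding="utf-8") as handle:
            json.dump(raw, handle)
```

A sqlite or mbsync-compatible backend would implement the same two methods, so the rest of the code would not need to know which format is in use.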

Designs

  • sync-01, the 3-way merge approach (like in offlineimap)
  • sync-02, state worker and controller
    • sync-07, two state workers and controllers
    • sync-08, two log state workers and controllers
  • sync-03, one side state controller
  • sync-04, local state on each driver
  • sync-05, state for remotes
  • sync-06, no merge

Requires further thinking

  • Failures on writes for the drivers and the state backend.
  • Required internal APIs.
  • Events/messages overheads.
  • Synchronization locks and performance penalties.
  • Simplicity for each component.
  • Pros and cons of each approach.
  • Do we use UIDs? What about IMAP-to-IMAP syncs? What about non-IMAP drivers?