Email aggregation
To cut down on the number of e-mails Hilary sends, the following is proposed:
Each user can declare an e-mail preference. The options are:
- Don't send me e-mails
- Send me a daily aggregate overview (if there is anything to send)
- Immediate
The first two are self-explanatory, but the immediate option needs some explanation. When this option is selected, the user receives an e-mail shortly after the activity occurs, although a slight delay may be introduced so that related activities can be aggregated.
Each activity type can declare, via ActivityAPI.registerActivityType, how long the e-mail aggregator should hold off on sending e-mails.
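As a rough illustration of a per-type hold-off, the sketch below uses a small in-memory registry. It is not the real ActivityAPI: the function signature and the standalone registry are assumptions, the `mailTimeout` property name is borrowed from the pseudocode further down, and the timeout values are invented.

```javascript
// Hypothetical in-memory registry; this is NOT the real Hilary ActivityAPI.
const activityTypes = new Map();

// Register an activity type together with its e-mail hold-off (in milliseconds).
function registerActivityType(name, options) {
  if (activityTypes.has(name)) {
    throw new Error(`Activity type "${name}" is already registered`);
  }
  const mailTimeout = options && options.mailTimeout;
  if (!Number.isInteger(mailTimeout) || mailTimeout < 0) {
    throw new Error('mailTimeout must be a non-negative integer');
  }
  activityTypes.set(name, { mailTimeout });
}

// Look up how long the aggregator should wait for a given activity type.
function getMailTimeout(name) {
  const type = activityTypes.get(name);
  if (!type) {
    throw new Error(`Unknown activity type "${name}"`);
  }
  return type.mailTimeout;
}

// Assumed values: comments are mailed quickly, content-shares may aggregate longer.
registerActivityType('comment', { mailTimeout: 2 * 60 * 1000 });
registerActivityType('content-share', { mailTimeout: 10 * 60 * 1000 });
```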
The aggregation rules are as follows:
- The first activity determines when the next e-mail will be sent out
- All activities that occur between the first activity and the moment the e-mail is sent should aggregate into that e-mail
- An e-mail can contain multiple activity types
In all the examples, there are 2 configured activity types:
- comments (green)
- content-shares with a longer aggregation time (blue)
For example, imagine the following situation:
- 7 activities occur
  - 3 comments
  - 4 content shares
Each increment on the t-axis is a possible point where Hilary will send e-mails. The lines above the axis are activities. The width of an activity determines how long it can aggregate with other activities. In the above example, you can see that Hilary will send 2 e-mails:
- At t = 4:
  - One aggregate containing 4 content-shares
  - One comment activity
- At t = 6:
  - One aggregate containing 2 comments
Now imagine a second situation:
- 4 activities occur
  - 1 comment
  - 3 content shares
Because it's the first activity that determines when the next mail point will be, 2 e-mails will be sent out:
- At t = 3:
  - One comment
  - One content-share
- At t = 6:
  - One aggregate containing 2 content-shares
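The aggregation rules can be sketched as a small simulation for a single user. The hold-off values (2 increments for comments, 3 for content-shares) and the activity timings are assumptions chosen so that the outcome matches the second example; the function name `planMails` is hypothetical.

```javascript
// Hold-off per activity type, in t-axis increments (assumed values).
const HOLD_OFF = { comment: 2, 'content-share': 3 };

// Plan the e-mails for one user, given activities sorted by time.
function planMails(activities) {
  const mails = [];
  let pending = null;
  for (const activity of activities) {
    // A pending mail that was due at or before this activity has already gone out.
    if (pending && activity.t >= pending.sendAt) {
      pending = null;
    }
    if (!pending) {
      // The first activity determines when the next e-mail is sent.
      pending = { sendAt: activity.t + HOLD_OFF[activity.type], activities: [] };
      mails.push(pending);
    }
    // Every activity before the send point joins the same e-mail, whatever its type.
    pending.activities.push(activity.type);
  }
  return mails;
}

const mails = planMails([
  { t: 0, type: 'content-share' },
  { t: 2, type: 'comment' },
  { t: 3, type: 'content-share' },
  { t: 5, type: 'content-share' }
]);
console.log(mails.map((mail) => `t = ${mail.sendAt}: ${mail.activities.join(', ')}`));
```

With these assumed timings the plan contains two e-mails: one at t = 3 with a content-share and a comment, and one at t = 6 with two content-shares.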
Much like activity aggregation, e-mail aggregation would run on a time-based interval. This would be a relatively small interval, e.g. every 5 minutes.
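One way to line scheduled deliveries up with such an interval is to round each delivery time up to the next interval boundary, so that an e-mail is never sent before its hold-off has elapsed. This rounding is an assumption and not something the proposal prescribes; `toMailSlot` and `MAIL_INTERVAL_MS` are hypothetical names.

```javascript
// Assumed aggregator interval: every 5 minutes.
const MAIL_INTERVAL_MS = 5 * 60 * 1000;

// Round a delivery timestamp UP to the next interval boundary.
// A timestamp that already sits on a boundary is returned unchanged.
function toMailSlot(timestampMs, intervalMs = MAIL_INTERVAL_MS) {
  return Math.ceil(timestampMs / intervalMs) * intervalMs;
}

console.log(toMailSlot(1));      // 300000
console.log(toMailSlot(300000)); // 300000
console.log(toMailSlot(300001)); // 600000
```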
The data needs to be stored in such a way that multiple nodes can perform e-mail aggregation and sending. If we consider losing e-mails when a machine drops out acceptable, we can probably get away with storing this information in Redis:
- Who we need to send an e-mail to at point t:
    oae-activity:mail:#bucket:#t = set(users who we need to mail at t = #t)
- What an e-mail for a user should contain:
    oae-activity:mail:#bucket:#t:#userIdA = set(IDs of activities that need to go into this mail)
    oae-activity:mail:#bucket:#t:#userIdB = set(IDs of activities that need to go into this mail)
    ..
- What the next e-mail point is for a user:
    oae-activity:mail:#bucket:next = { userIdA => 5, userIdB => 7, ... }
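The key layout above could be captured in a few helper functions. This is only a sketch: the helper names are hypothetical, and it assumes the first two keys hold Redis sets while the `next` key holds a Redis hash that maps user IDs to delivery points.

```javascript
const PREFIX = 'oae-activity:mail';

// Set of users who need an e-mail at point t.
const usersKey = (bucket, t) => `${PREFIX}:${bucket}:${t}`;

// Set of activity IDs that go into one user's e-mail at point t.
const activitiesKey = (bucket, t, userId) => `${PREFIX}:${bucket}:${t}:${userId}`;

// Hash of userId => next e-mail point, one per bucket.
const nextKey = (bucket) => `${PREFIX}:${bucket}:next`;

console.log(usersKey(3, 5));               // oae-activity:mail:3:5
console.log(activitiesKey(3, 5, 'userA')); // oae-activity:mail:3:5:userA
console.log(nextKey(3));                   // oae-activity:mail:3:next
```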
When an activity gets delivered to an activity stream that requires an e-mail, we need to do the following (in pseudocode):

    queueMailActivity = function(userId, activity):
        // Get the mail preferences for this user as it might not be necessary to queue anything
        mailPreferences = MailPreferencesDAO.getPreferences(userId)
        if mailPreferences.never:
            // We're done here
            return

        // A user always goes in the same bucket
        bucket = _getBucket(userId)

        // Determine if we've already scheduled a delivery for this user
        nextScheduledDelivery = Redis.get(oae-activity:mail:#bucket:next[#userId])

        // If the user already had an activity scheduled, we can add this one in the set
        if nextScheduledDelivery is defined:
            Redis.insertInSet(oae-activity:mail:#bucket:#nextScheduledDelivery:#userId, activity.id)

        // If the user had no mail scheduled yet, we need to figure out for when we should schedule it.
        // This depends a bit on the mail preferences
        else:
            // If the user wants mail "immediately" and this is the first activity that triggers an e-mail
            // we need to get the timeout from the registered activity type and schedule delivery in that timeout
            if mailPreferences.immediate:
                mailTimeout = ActivityRegistry.getActivityType(activity.type).mailTimeout
                nextScheduledDelivery = now + mailTimeout

            // If the user prefers daily aggregates, we schedule mail delivery for a fixed point during the day
            elif mailPreferences.daily:
                nextScheduledDelivery = 0

            // Remember the next e-mail point for this user so that later activities join the same mail
            Redis.set(oae-activity:mail:#bucket:next[#userId], nextScheduledDelivery)

            // Schedule the user for mail delivery at that point in time
            Redis.insertInSet(oae-activity:mail:#bucket:#nextScheduledDelivery, userId)

            // Add the activity for that user at that point in time
            Redis.insertInSet(oae-activity:mail:#bucket:#nextScheduledDelivery:#userId, activity.id)
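A runnable approximation of this pseudocode is sketched below, with in-memory Maps standing in for Redis. Several things are assumptions made for illustration: the bucket count and the hashing in `getBucket`, the timeout values, passing the preference and the current time in as arguments instead of reading them from MailPreferencesDAO and a clock, and recording the next delivery point for the user once it has been chosen.

```javascript
// In-memory stand-ins for the Redis structures (assumption: sets plus one hash per bucket).
const sets = new Map();         // key -> Set of members
const nextByBucket = new Map(); // bucket -> Map(userId -> next e-mail point)

const NUMBER_OF_BUCKETS = 4;                              // assumed
const MAIL_TIMEOUTS = { comment: 2, 'content-share': 3 }; // assumed, in abstract time units
const DAILY_DELIVERY_POINT = 0;                           // placeholder, as in the pseudocode

function addToSet(key, value) {
  if (!sets.has(key)) {
    sets.set(key, new Set());
  }
  sets.get(key).add(value);
}

// A user always lands in the same bucket: sum of character codes modulo the bucket count.
function getBucket(userId) {
  let sum = 0;
  for (const ch of userId) {
    sum += ch.charCodeAt(0);
  }
  return sum % NUMBER_OF_BUCKETS;
}

// preference is one of 'never', 'daily' or 'immediate'. Returns the delivery point, or null.
function queueMailActivity(userId, preference, activity, now) {
  if (preference === 'never') {
    return null;
  }
  const bucket = getBucket(userId);
  if (!nextByBucket.has(bucket)) {
    nextByBucket.set(bucket, new Map());
  }
  const next = nextByBucket.get(bucket);

  let t = next.get(userId);
  if (t === undefined) {
    // No mail scheduled yet: the first activity determines the delivery point.
    t = preference === 'immediate' ? now + MAIL_TIMEOUTS[activity.type] : DAILY_DELIVERY_POINT;
    next.set(userId, t);
    addToSet(`oae-activity:mail:${bucket}:${t}`, userId);
  }
  // In both cases the activity joins the mail at the scheduled point.
  addToSet(`oae-activity:mail:${bucket}:${t}:${userId}`, activity.id);
  return t;
}

const first = queueMailActivity('userA', 'immediate', { id: 'a1', type: 'content-share' }, 0);
const second = queueMailActivity('userA', 'immediate', { id: 'a2', type: 'comment' }, 2);
console.log(first, second); // 3 3
```

The second activity joins the e-mail scheduled by the first one, so both calls return the same delivery point.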
The email aggregator should:
- Grab an unallocated bucket number
- Get the set of users that need e-mails at that point in time
- Grab all the activity IDs for those users that need an e-mail at that point in time
- Grab all the activities from the users' activity streams in Cassandra (this could be skipped if we decided to store the full activity in Redis instead)
- Perform aggregation on the collected activities (as per existing rules)
- Format the e-mail
- Possibly embed any preview images (maybe out-of-scope here)
- Clear all the values from Redis that are no longer relevant. A Redis TTL could perhaps be used (where appropriate) to avoid this step
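The collection and clean-up steps of the aggregator might look roughly like the sketch below. It covers only the Redis-side bookkeeping, with in-memory Maps standing in for Redis; fetching activities from Cassandra, aggregating, formatting and sending are left out. The `collectMails` name and the shape of `store` are assumptions.

```javascript
// store.sets: Map(key -> Set), store.next: Map(bucket -> Map(userId -> next e-mail point))
function collectMails(store, bucket, t) {
  const usersKey = `oae-activity:mail:${bucket}:${t}`;
  const users = store.sets.get(usersKey) || new Set();
  const next = store.next.get(bucket);
  const mails = [];

  for (const userId of users) {
    // Grab the activity IDs queued for this user at this point in time.
    const activitiesKey = `${usersKey}:${userId}`;
    const activityIds = [...(store.sets.get(activitiesKey) || [])];
    mails.push({ userId, activityIds });

    // Clear the values that are no longer relevant.
    store.sets.delete(activitiesKey);
    if (next && next.get(userId) === t) {
      next.delete(userId);
    }
  }
  store.sets.delete(usersKey);
  return mails;
}

const store = {
  sets: new Map([
    ['oae-activity:mail:1:5', new Set(['userA', 'userB'])],
    ['oae-activity:mail:1:5:userA', new Set(['a1', 'a2'])],
    ['oae-activity:mail:1:5:userB', new Set(['a3'])]
  ]),
  next: new Map([[1, new Map([['userA', 5], ['userB', 5], ['userC', 7]])]])
};

const due = collectMails(store, 1, 5);
console.log(due);
```

After the call, `due` holds one entry per user with the queued activity IDs, the three keys for t = 5 are gone, and only `userC` (scheduled for t = 7) remains in the `next` map.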