Implement ContentSteering #1172
base: dev
Impressive work! Let's assume the Content Steering priority switches from CDN A to CDN B while a segment request is pending on CDN A, and that the request on A then fails. In that case, is the request restarted immediately on CDN B (and, if so, is the count of attempts on CDN B cleared?), or is it delayed according to its exponential backoff state on CDN B?
I think you meant "monotonically" ;-)
@lfaureyt Thanks!
I have to re-check and test that this is what is really going on, but I would say that the count of attempts on CDN B for that segment (exponential backoff is still per-segment) is not reset until the segment has been loaded. Also, once CDNs referenced in a steering manifest have failed at least once for a given segment, the algorithm behind the CDN choice becomes a little more complex: the remaining backoff time is taken into account first, then the steering manifest's prioritization (CDNs unlisted in that manifest are still not requested).
Status: It should work with the current draft of the Content Steering specification for DASH contents. There are still some missing features (proxy handling, bandwidth reporting...) but the main chunk of the logic should already be there.
Preliminary notes
What is Content Steering?
Content Steering is a mechanism that lets the server side prioritize some CDNs over others for a given content, and thus deterministically redirect the requests made by several player instances.
One use case is to adaptively redistribute load between multiple CDNs while playback is still going on on users' devices, though there are several other use cases that can rely on this mechanism.
This mechanism is standardized and is associated with the chosen streaming protocol: HLS now includes a chapter and attributes for it, and the DASH-IF is currently drafting another one for DASH, based on the HLS specification (though slightly different).
It is the latter that this PR is trying to implement.
DASH's Content Steering mechanism works by declaring the presence of a "DASH Content Steering Manifest", or "DCSM", requestable through a URL which returns a JSON document giving the current priorities.
This DCSM has its own "TTL" (time to live) which is the time in seconds after which it should be refreshed.
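As a rough illustration of the TTL rule, the sketch below computes when a fetched steering manifest should next be refreshed. The `SteeringManifest` shape and its property names (`ttlSeconds`, `priorities`) are hypothetical: they are neither the JSON keys of the DASH-IF draft nor RxPlayer's internal types.

```typescript
/** Hypothetical, already-parsed form of a Content Steering Manifest. */
interface SteeringManifest {
  /** Time to live, in seconds, after which the manifest should be refreshed. */
  ttlSeconds: number;
  /** ServiceLocation identifiers, most prioritized first. */
  priorities: string[];
}

/** Timestamp (in milliseconds) at which the manifest should be re-requested. */
function getNextRefreshTime(manifest: SteeringManifest, fetchedAtMs: number): number {
  return fetchedAtMs + manifest.ttlSeconds * 1000;
}

/** Whether the manifest's TTL has elapsed at `nowMs`. */
function shouldRefresh(manifest: SteeringManifest, fetchedAtMs: number, nowMs: number): boolean {
  return nowMs >= getNextRefreshTime(manifest, fetchedAtMs);
}
```

Both timestamps are expected to come from the same clock, for example `performance.now()`.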
Implementation
The implementation was unexpectedly complex. I will start by describing it at a higher level before going into the details.
Macro-architecture
The idea was to add a `CdnPrioritizer` class in the `fetchers`' code, whose role would be to put in order the CDNs that should be requested for each segment.
That `CdnPrioritizer` would also handle the refreshing logic of DASH's Content Steering Manifest, through a new fetcher element: the `SteeringManifestFetcher`.
Here is how the different blocks depend on one another:
CDN identification
The different ways to access a content, called "ServiceLocations" in DASH's Content Steering spec (but which we loosely call the available "CDNs" in the current implementation), need to be clearly identified here to allow easy re-prioritization.
However, in the old RxPlayer code, those ServiceLocations were not clearly identified and grouped.
Instead, each segment was directly associated with one or several absolute URLs, with no relation created between segments. For example, detecting whether two segments shared a common ServiceLocation/base URL was difficult to do without resorting to substring comparison.
This caused implementation difficulties when it came to prioritization handling and "downgrading" (our term for when a specific ServiceLocation is avoided for some time due to an observed issue with it).
The proposed implementation now only associates a relative URL with each segment, corresponding to the segment's unique filename. The parts common to all segments of a given `Representation` (the "ServiceLocations") are moved to the `Representation` level instead, through a property called `cdnMetadata`.
As a special case, the segment's relative URL can be set to `null` or to the empty string when the `Representation`'s URL(s) found in `cdnMetadata` are sufficient to load the data.
This only works if all ServiceLocations follow a logic of concatenation between a per-ServiceLocation base URL and a segment's common relative URL. Thankfully, this appears for now to always be the case in transport protocols where multiple ServiceLocations for a given resource are possible.
We could also have added a ServiceLocation-identifying property to each segment's URL and kept those URLs absolute, but it seemed less practical while I was writing it.
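The concatenation scheme described above could be sketched as follows. The `CdnMetadata` shape, its `baseUrl` property and both helper functions are illustrative assumptions, not the PR's actual code, and the example URLs are made up.

```typescript
/** Hypothetical shape of one ServiceLocation entry at the Representation level. */
interface CdnMetadata {
  /** Identifier of the ServiceLocation, if one is known. */
  id: string | undefined;
  /** Base URL shared by all segments served from this ServiceLocation. */
  baseUrl: string;
}

/**
 * Build the full URL of a segment for a given ServiceLocation.
 * A `null` or empty relative URL means the base URL alone is sufficient.
 */
function buildSegmentUrl(cdn: CdnMetadata, segmentUrl: string | null): string {
  if (segmentUrl === null || segmentUrl === "") {
    return cdn.baseUrl;
  }
  return cdn.baseUrl + segmentUrl;
}

/** List every URL a segment may be requested from, in the given CDN order. */
function getCandidateUrls(cdns: CdnMetadata[], segmentUrl: string | null): string[] {
  return cdns.map((cdn) => buildSegmentUrl(cdn, segmentUrl));
}
```

With this layout, checking whether two segments share a ServiceLocation becomes a comparison of `id` values rather than a substring comparison on absolute URLs.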
The `cdnMetadata` property present on `Representation`s takes the form of an array of all detected ServiceLocations. Each element of this array contains information on a single available ServiceLocation:
- `id`, used for identification purposes, for example when compared with the output of a Content Steering Manifest.
  This is based on the value of the `serviceLocation` attribute of the `<BaseURL>` element found in the MPD.
Handling of the `queryBeforeStart` attribute
The MPD may indicate that the Content Steering Manifest should either be requested before any segment, or may be loaded later so that playback can begin sooner.
This is done through an MPD attribute on the `<ContentSteering>` element, called `queryBeforeStart`.
Handling this attribute has been somewhat of a pain, because its before-or-not nature means that, under the current RxPlayer architecture, it could not always be cleanly and opaquely handled in the Manifest-parsing logic.
If the request needed to be performed after (or in parallel with) the first segment loads, we had to involve some other core logic in starting and handling this request.
I finally decided to only handle this initial fetch in one place (through the `fetchers`' `CdnPrioritizer`) and not repeat it in the Manifest-parsing code, for simplicity's sake.
However, I then observed a new problem: we had to communicate in some way when the segments can actually be loaded:
- right away if `queryBeforeStart` is not set or set to `false`
- only once the Content Steering Manifest has been fetched if the `queryBeforeStart` attribute is set to `true`
This could easily be done through a new event, but I disliked the opt-in nature of adding an event listener for this: forgetting it would be very easy to do and would be a big-enough bug.
What I preferred to do is to make the `CdnPrioritizer`'s callback, used to prioritize ServiceLocations between one another, asynchronous: if the Content Steering Manifest was already fetched, or if `queryBeforeStart` was not set or set to `false`, it returns directly. But if `queryBeforeStart` was set to `true` and the Content Steering Manifest was not yet fetched, it awaits the end of that request before giving an educated answer.
I prefer that solution because it opaquely forces the right `queryBeforeStart` implementation whenever a `CdnPrioritizer` is used to order ServiceLocations. This is even nicer when considering that the `CdnPrioritizer` is also the class fetching and refreshing the Content Steering Manifest, meaning that forgetting to use it would also mean not relying on a Content Steering Manifest anyway.
This also means that no outside block needs to understand this intricacy: only the `CdnPrioritizer` does, which is also one of the [very rare] blocks implementing most of the Content Steering mechanisms.
Handling of the refreshing logic
The refreshing logic of the Content Steering Manifest is also performed by the `CdnPrioritizer`.
The implementation is fairly simple: once the previous Steering Manifest's TTL (in seconds) has elapsed, we refresh it.
There is additional logic for when a `<ContentSteering>` element appears or disappears after an MPD update, but what to do in those cases appeared relatively straightforward.
In large part because of this refreshing logic, I also had to implement a system of events on the `CdnPrioritizer`:
- A "warnings" event is sent when a Content Steering Manifest request or parsing error arose, so it can be translated into a player event through our API.
- More importantly, a `priorityChange` event has been added, for when the order of priorities between ServiceLocations changed.
This was added to work around a subtle but complex-enough situation where the priority between ServiceLocations changes while the player is waiting to retry a segment request through a ServiceLocation that is no longer prioritized.
More details in the next section.
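To make the asynchronous ordering callback and the two events more concrete, here is a minimal sketch. The class name `SketchCdnPrioritizer`, its method names and the injected `fetchPriorities` function are all hypothetical and much simpler than the real `CdnPrioritizer`: TTL-based timers are left out (the caller is assumed to call `refresh()` once the TTL elapses), and emitting a priority change on the very first load is a choice made for this sketch only.

```typescript
type PriorityListener = () => void;
type WarningListener = (error: unknown) => void;

class SketchCdnPrioritizer {
  private priorities: string[] | null = null;
  private firstFetch: Promise<void> | null = null;
  private priorityListeners: PriorityListener[] = [];
  private warningListeners: WarningListener[] = [];

  constructor(
    // Hypothetical function fetching and parsing the steering manifest,
    // resolving with ServiceLocation ids ordered by priority.
    private readonly fetchPriorities: () => Promise<string[]>,
    private readonly queryBeforeStart: boolean
  ) {}

  onPriorityChange(cb: PriorityListener): void { this.priorityListeners.push(cb); }
  onWarning(cb: WarningListener): void { this.warningListeners.push(cb); }

  /** Fetch or refresh the steering manifest. Never rejects: errors become warnings. */
  refresh(): Promise<void> {
    const pending = this.fetchPriorities().then(
      (next) => {
        const prev = this.priorities;
        this.priorities = next;
        const changed = prev === null || prev.length !== next.length ||
          prev.some((id, i) => id !== next[i]);
        if (changed) { this.priorityListeners.forEach((cb) => cb()); }
      },
      (error: unknown) => { this.warningListeners.forEach((cb) => cb(error)); }
    );
    if (this.firstFetch === null) { this.firstFetch = pending; }
    return pending;
  }

  /** Order the given ServiceLocation ids, waiting for the first fetch if required. */
  async getOrder(cdnIds: string[]): Promise<string[]> {
    if (this.priorities === null && this.queryBeforeStart) {
      await (this.firstFetch ?? this.refresh());
    }
    const steered = this.priorities;
    if (steered === null) { return cdnIds.slice(); } // no steering info: keep the given order
    // ServiceLocations unlisted in the steering manifest are left out.
    return steered.filter((id) => cdnIds.indexOf(id) !== -1);
  }
}
```

Because `getOrder` is the only way to obtain an order, a caller cannot forget to wait for the initial steering request when `queryBeforeStart` is `true`.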
Request scheduling modifications
Another specificity to take into account was how the Content Steering mechanism interacts with our request scheduling logic, especially with what we call the "exponential backoff".
This concept designates the notion that we might want to wait for some delay before re-attempting a request that previously failed on a server, progressively raising that delay after each consecutive unsuccessful attempt to avoid overwhelming the server.
When considering multiple servers for each resource and - even more complex - when considering that the priority between those can change while a delay is being awaited, properly handling this exponential backoff mechanism became a little more complex.
What I ended up doing was to register in an object, for each CDN, the (monotonically increasing) timestamp at which the last request was made for a particular resource, alongside the number of attempts already made on that same CDN.
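A minimal sketch of such a per-CDN record could look like this. The class and method names, as well as the `BASE_DELAY_MS` and `MAX_DELAY_MS` values, are assumptions made for illustration (jitter is also left out). Following the discussion in the comments, the CDN with the smallest remaining backoff is picked first, and the priority order only breaks ties.

```typescript
interface AttemptInfo {
  /** Number of failed attempts on this CDN for the current resource. */
  attempts: number;
  /** Monotonic timestamp (ms) of the last attempt on this CDN. */
  lastAttemptMs: number;
}

const BASE_DELAY_MS = 200; // assumed initial delay
const MAX_DELAY_MS = 3000; // assumed delay ceiling

/** Tracks exponential backoff independently for each CDN, for one resource. */
class PerCdnBackoff {
  private infos = new Map<string, AttemptInfo>();

  registerFailure(cdnId: string, nowMs: number): void {
    const prev = this.infos.get(cdnId);
    this.infos.set(cdnId, { attempts: (prev?.attempts ?? 0) + 1, lastAttemptMs: nowMs });
  }

  /** Time left to wait before this CDN may be requested again. */
  getRemainingDelay(cdnId: string, nowMs: number): number {
    const info = this.infos.get(cdnId);
    if (info === undefined) { return 0; }
    const delay = Math.min(BASE_DELAY_MS * Math.pow(2, info.attempts - 1), MAX_DELAY_MS);
    return Math.max(0, info.lastAttemptMs + delay - nowMs);
  }

  /**
   * Pick the CDN to request next among `orderedCdnIds` (most prioritized first):
   * smallest remaining delay first, priority order as a tie-breaker.
   */
  pickNext(orderedCdnIds: string[], nowMs: number): { cdnId: string; waitMs: number } | undefined {
    let best: { cdnId: string; waitMs: number } | undefined;
    for (const cdnId of orderedCdnIds) {
      const waitMs = this.getRemainingDelay(cdnId, nowMs);
      if (best === undefined || waitMs < best.waitMs) { best = { cdnId, waitMs }; }
    }
    return best;
  }

  /** Forget all attempts, for example once the resource has been loaded. */
  clear(): void { this.infos.clear(); }
}
```

Since the state is only timestamps and counters, a pending wait can be dropped and `pickNext` called again with a new order whenever priorities change.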
This way, exponential backoff can be applied per-CDN and can even be interrupted and restarted at any time if the priority between CDNs changes in the meantime. Such a change of priority is signaled by the `CdnPrioritizer` through its `priorityChange` event.
Moreover, CDNs on which the request fails are temporarily "downgraded" - meaning moved to the end of the priority list - for a period of time equal to the Steering Manifest's TTL (as specified in DASH's Content Steering spec) - or for 60 seconds if no such TTL exists.
This also automatically and nicely allows testing the second-highest-priority CDN when a request through the first one fails, and still allows looping over the list once all CDNs are downgraded.
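The downgrading rule can be sketched as a small helper that reorders an already prioritized list. The `DowngradeList` class and its method names are hypothetical. Ordering downgraded CDNs by the time their downgrade expires is an assumption of this sketch: it is one simple way to get the looping behavior when all downgrades share the same duration.

```typescript
const DEFAULT_DOWNGRADE_MS = 60000; // used when the steering manifest has no TTL

/** Remembers which CDNs are temporarily downgraded, and until when. */
class DowngradeList {
  private until = new Map<string, number>();

  /** Downgrade `cdnId` for the steering manifest's TTL, or 60 seconds without one. */
  downgrade(cdnId: string, nowMs: number, ttlSeconds?: number): void {
    const duration = ttlSeconds !== undefined ? ttlSeconds * 1000 : DEFAULT_DOWNGRADE_MS;
    this.until.set(cdnId, nowMs + duration);
  }

  /**
   * Move currently downgraded CDNs to the end of `orderedCdnIds`.
   * Non-downgraded CDNs keep their order; downgraded ones are sorted by
   * expiry time, soonest first.
   */
  applyTo(orderedCdnIds: string[], nowMs: number): string[] {
    const healthy: string[] = [];
    const downgraded: Array<{ id: string; until: number }> = [];
    for (const id of orderedCdnIds) {
      const end = this.until.get(id);
      if (end !== undefined && end > nowMs) {
        downgraded.push({ id, until: end });
      } else {
        healthy.push(id);
      }
    }
    downgraded.sort((x, y) => x.until - y.until);
    return healthy.concat(downgraded.map((d) => d.id));
  }
}
```

For example, with the order `["a", "b", "c"]`, a failure on `a` yields `["b", "c", "a"]`. Once `b` and `c` have failed as well, the list is back to `["a", "b", "c"]`, so requests loop over all CDNs again.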