# Code Anatomy on TLFragmenter, TLFIFOFixer, TLWidthWidget and TLSourceShrinker #3084
## TLFragmenter
The TLFragmenter is used to break a large transaction into smaller ones so that the downstream manager can accept it. Specifically, a downstream device may declare a maximum transfer size; it cannot accept any transaction whose request `size` exceeds that maximum. The reasons why a device limits its acceptable transfer size vary: perhaps the device does not support TileLink burst transactions (in that case the `minSize` of the TLFragmenter should equal the out port's `beatBytes`), or the limit comes from functional reasons (in which case the `minSize` of the TLFragmenter may be larger than the out port's `beatBytes`). When such a device is connected to a bus whose masters may initiate larger requests, a TLFragmenter widget is needed. For a specific request, the fragmenter decides whether the request should be fragmented (`val aFrag = Mux(aOrig > limit, limit, aOrig)`); if so, it divides the original request into a series of requests of a manager-acceptable `size` and sends each of them downwards, recording some auxiliary information to make sure all sub-chunked requests have been sent.

For a specific request, the destination manager is determined (`val find = manager.findFast(edgeIn.address(in_a.bits))`), and the maximum transfer size that manager supports for this operation (decided by `in_a.bits.opcode`) is calculated. The fragmenter uses either that size, or `lgMinSize` when `alwaysMin` is `true.B`, to decide whether fragmentation is needed. `alwaysMin` indicates that we should always use the `minSize` specified by configuration as the fragmentation width (and also as the criterion for whether fragmentation is needed), instead of the maximum transfer size the destination manager supports for the specific operation.
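To make the limit choice concrete, here is a tiny self-contained Chisel sketch of my own (the module, port names and widths are hypothetical, not the actual TLFragmenter source); it only mirrors the `aFrag`/`limit` selection quoted above.

```scala
import chisel3._

// Sketch only: the per-request size limit is either the configured minSize (when alwaysMin is set)
// or the destination manager's per-opcode maximum; requests above the limit are clamped, i.e. fragmented.
class FragLimitSketch(lgMinSize: Int, alwaysMin: Boolean) extends Module {
  val io = IO(new Bundle {
    val maxLgSupported = Input(UInt(4.W))   // log2 of the selected manager's max transfer for this opcode
    val aOrig          = Input(UInt(4.W))   // log2 of the original request size
    val aFrag          = Output(UInt(4.W))  // log2 of the size actually sent downstream
    val needFrag       = Output(Bool())     // whether this request gets fragmented at all
  })
  val limit = if (alwaysMin) lgMinSize.U else io.maxLgSupported
  io.aFrag    := Mux(io.aOrig > limit, limit, io.aOrig)   // mirrors: val aFrag = Mux(aOrig > limit, limit, aOrig)
  io.needFrag := io.aOrig > limit
}
```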
Some key signals the fragmenter maintains are as follows. `aFragnum` encodes the fragmentation index in a subtle way; I am amazed and at the same time confused by the calculation details. It indexes a specific chunk among all fragments, and `aFragnum` being 0 indicates the last fragment sent downwards. The dance among `old_gennum1`, `new_gennum` and `gennum` is esoteric and I will skip delineating it for now. In any case, the size of a fragment is a multiple of `minSize`, while `minSize` itself is a multiple of `beatBytes`. In a specific example documented in the codebase (max=256, min=8, beat=4 and a device supporting 16), a Get/Put of 64 bytes is fragmented into 64/16 = 4 chunks, and the corresponding fragmentation index (`aFragnum`) sequence is 6, 4, 2, 0.
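To convince myself where 6, 4, 2, 0 comes from, here is a plain-Scala back-of-the-envelope calculation (my own illustration, not code from the repository): each 16-byte fragment equals two `minSize` (8-byte) units, and `aFragnum` counts how many `minSize` units are still outstanding after the current fragment.

```scala
object FragnumExample extends App {
  val origBytes = 64   // original Get/Put size
  val fragBytes = 16   // what the device accepts per transaction
  val minBytes  = 8    // minSize of the TLFragmenter
  // remaining minSize units after each fragment: 6, 4, 2, 0 (0 marks the last fragment)
  val fragnums = ((origBytes / fragBytes - 1) to 0 by -1).map(_ * fragBytes / minBytes)
  println(fragnums.mkString(", "))   // prints: 6, 4, 2, 0
}
```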
The fragmenter sends each fragment with an appropriately adjusted A channel: `out.a.bits.size` is now the (log) size of a fragment, and `address` must also be adjusted to point at the sub-chunk (I skip the esoteric calculation for now). The rationale for `out.a.bits.source := Cat(Seq(in_a.bits.source) ++ aFull ++ Seq(aToggle.asUInt, aFragnum))` is well documented in comments in the codebase. A Repeater is placed along the A channel so that once a request is accepted on channel A, it is held in effect until the last fragment when the request carries no data. For a data-carrying request like Put, the data field changes every beat and the client has to hold the request until all data beats have been sent (that is the client's duty), so there is no need for the Repeater to hold data-carrying requests. The repeat criterion is therefore `repeater.io.repeat := !aHasData && aFragnum =/= UInt(0)`.
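To make the Repeater's role concrete, here is a toy repeater of my own (much simpler than rocket-chip's `util.Repeater`, and the names are mine): once a beat is accepted while `repeat` is high it is stored and replayed on the output, so a data-less A request can be re-issued for every fragment without the client having to hold it.

```scala
import chisel3._
import chisel3.util._

class MiniRepeater[T <: Data](gen: T) extends Module {
  val io = IO(new Bundle {
    val repeat = Input(Bool())             // keep replaying the stored beat
    val enq    = Flipped(Decoupled(gen))
    val deq    = Decoupled(gen)
  })
  val full  = RegInit(false.B)
  val saved = Reg(gen)
  io.deq.valid := io.enq.valid || full
  io.enq.ready := io.deq.ready && !full
  io.deq.bits  := Mux(full, saved, io.enq.bits)
  when (io.enq.fire && io.repeat)  { full := true.B; saved := io.enq.bits }
  when (io.deq.fire && !io.repeat) { full := false.B }
}
```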
It's very important to note that unless specified explicitly (see the TLWidthWidget case: `managerFn = { case m => m.v1copy(beatBytes = innerBeatBytes) }`), the actual width of a TileLink interface is normally configured by the managers; therefore the `in` port of the TLFragmenter has the same `beatBytes` width as the `out` port.
When a D response arrives at the `out` port, we need to send it back to the `in` port with some fields correctly modified. For the `source` field: `in.d.bits.source := out.d.bits.source >> addedBits`. The `size` field sent back on `in.d` should be the original size specified by the `in.a` request: `in.d.bits.size := Mux(dFirst, dFirst_size, dOrig)` (`dOrig` is the registered value of `dFirst_size`: `when(dFirst){dOrig := dFirst_size}`). The original size (`dFirst_size`) is calculated from the index of the very first fragment sent via `out.a`: `val dFirst_size = OH1ToUInt((dFragnum << log2Ceil(minSize)) | dsizeOH1)`. Since the responses of all managers are mutually FIFO, the `dFragnum` observed when `dFirst` is `true.B` (i.e., on the first response beat coming back on `out.d`) is the index of the very first fragment sent via `out.a`. The fragment index to which each `out.d` response belongs is obtained by `val dFragnum = out.d.bits.source(fragmentBits-1, 0)`.
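A quick sanity check of my own (plain Scala, not from the codebase) that the formula really recovers the original size in the running example: the first fragment of the 64-byte request carries `aFragnum = 6`, the fragment itself is 16 bytes, and the formula gives back log2(64) = 6.

```scala
object DFirstSizeExample extends App {
  def log2Ceil(x: Int): Int  = 32 - Integer.numberOfLeadingZeros(x - 1)
  def OH1ToUInt(m: Int): Int = Integer.bitCount(m)   // a low-bit mask ("one-hot minus one") back to log2
  val minSize  = 8
  val dFragnum = 6          // fragment index echoed back in out.d.bits.source on the first D beat
  val dsizeOH1 = 16 - 1     // mask form of the 16-byte fragment size
  println(OH1ToUInt((dFragnum << log2Ceil(minSize)) | dsizeOH1))   // prints 6, i.e. the original 64 bytes
}
```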
If the D channel response carries data, every beat must be sent back to `in.d`, otherwise there would be unacceptable data loss. However, if the D channel response is data-less, the number of D channel responses equals the number of fragments sent via `out.a`, yet we only need to send one single D response back through `in.d` to echo the single original `in.a` request. Therefore we may `drop` some of the `out.d` responses: `val drop = !dHasData && !Mux(doEarlyAck, dFirst, dLast)`. `doEarlyAck` indicates whether a multi-fragment `Put` should be acknowledged on the first fragment or the last; if `doEarlyAck` is asserted, we ack to `in.d` on the very first `out.d` response and drop all the upcoming ones. Normally, we ack to `in.d` when the last `out.d` response comes.
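My reading of the resulting handshake, as a small self-contained sketch (signal names follow the discussion, the wrapper module is mine): `in.d` only sees the beat selected by `doEarlyAck`, while `out.d` is always drained so the downstream manager is never back-pressured by beats we intend to drop.

```scala
import chisel3._

class DropSketch(doEarlyAck: Boolean) extends Module {
  val io = IO(new Bundle {
    val dHasData  = Input(Bool())
    val dFirst    = Input(Bool())
    val dLast     = Input(Bool())
    val outDValid = Input(Bool())
    val inDReady  = Input(Bool())
    val inDValid  = Output(Bool())
    val outDReady = Output(Bool())
  })
  // drop every data-less beat except the one we use as the acknowledgement
  val drop = !io.dHasData && !(if (doEarlyAck) io.dFirst else io.dLast)
  io.inDValid  := io.outDValid && !drop
  io.outDReady := io.inDReady  ||  drop
}
```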
In terms of the TileLink error signals `corrupt` and `denied`: no extra handling is needed for `corrupt`, because `corrupt` is only meaningful when data is present and it is beat-specific, and every beat of a data-carrying response is sent back to `in.d` anyway. For `denied`, we should register the `denied` signal of the dropped `out.d` responses and send the OR of that registered value and `out.d.bits.denied` back to `in.d`. Also note that normally a manager is not allowed to deny a `Get` request: because the TLFragmenter splits the initial request, if the `out.d` responses of some fragments were denied while others were not, it would be hard to specify what to reply on `in.d`. This commit introduced `holdFirstDeny` so that the TLFragmenter takes the `denied` signal of the very first beat of a burst response as representative of the whole burst. Consequently, the `denied` logic for `Put` and `Get` responses differs, and the two cases never overlap since `dHasData` dichotomizes them.

A final note: if minSize == maxSize, the maximum transfer size the upstream clients will send downwards is already a transfer size that all downstream managers can support, so no fragmentation logic is needed. Check this PR.
## TLFIFOFixer
Normally, a TileLink manager that declares a FIFO domain (fifo id) must ensure that all requests to that domain from clients which have requested FIFO ordering see their responses in order. But a manager can only control its own response order; when we want a collection of managers to return their responses in FIFO order, that is where the TLFIFOFixer comes into play. The TLFIFOFixer selects downstream managers using a specific `policy` (`TLFIFOFixer.Policy`) and applies FIFO ordering constraints to the responses from all selected managers toward a specific client.

The TLFIFOFixer uses the configured `policy` to dichotomize all managers into `flatManagers` and `keepManagers`: `flatManagers` contains all managers that satisfy the policy, while `keepManagers` are the excluded ones. The TLFIFOFixer also maps manager fifo ids using two different schemes, `fixMap` and `splatMap`:
`fixMap`: All managers which fulfill the `TLFIFOFixer.Policy` (`flatManagers`) have their `fifoId` mapped to 0 in `fixMap` (even when a manager initially had no fifo id assigned). All other managers (`keepManagers`) which have a `fifoId` specified (rather than `None`) map their `fifoId` to consecutive integers starting from 1 in `fixMap`; managers in `keepManagers` which do not have a `fifoId` remain `None` in `fixMap`. Note that if a manager in `keepManagers` shares a fifo domain with managers in `flatManagers`, its `fifoId` is mapped like the ones in `flatManagers`, that is to 0 (`val keepDomains = Set(keepManagers.flatMap(_.fifoId):_*) -- flatDomains`).
`splatMap`: In `splatMap`, the fifo id remains `None` for every manager that did not originally have one. All managers (with a fifo id) in `flatManagers` map their fifo id to consecutive integers starting from 0. Managers in `keepManagers` which share a fifo id with ones in `flatManagers` map their fifo id the same way as the `flatManagers` ones (integers starting from 0). The fifo id of all other managers in `keepManagers` is `None` in `splatMap`.
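A made-up example of my own (these managers and numbers are not from the codebase) of how the two maps could come out under the rules above:

```scala
object FifoMapExample extends App {
  // M0: fifoId Some(3), selected by the policy       M1: fifoId None, selected by the policy
  // M2: fifoId Some(3), NOT selected (keepManagers)  M3: fifoId Some(7), NOT selected (keepManagers)
  val fixMap   = Map("M0" -> Some(0), "M1" -> Some(0), "M2" -> Some(0), "M3" -> Some(1))
  val splatMap = Map("M0" -> Some(0), "M1" -> None,    "M2" -> Some(0), "M3" -> None)
  // M2 shares domain 3 with the selected M0, so it is treated like a flat manager in both maps;
  // M3 keeps its own domain, renumbered to 1 in fixMap, and stays None in splatMap.
  println(fixMap); println(splatMap)
}
```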
Looking at `managerFn`, clients will see the managers selected by the policy (including managers in `keepManagers` that share a fifo domain with managers in `flatManagers`; whenever we mention "selected managers" below, those managers are included) as sharing one fifo domain (fifo id 0). In short, the TLFIFOFixer creates a unified fifo domain for all selected managers and makes sure responses from all of those managers to a specific client come back in FIFO order. It is worth noting that for a client which occupies only a single sourceId (only one item in its IdRange), only one outstanding transaction is allowed at a time, so no fifo-fixing is necessary for it.

For a specific request, the TLFIFOFixer decides whether the request goes to a policy-selected manager (by checking whether the destination manager has its fifo id mapped to 0 in `fixMap`); all selected managers should respond to requests in FIFO order, and it is the TLFIFOFixer's duty to maintain this order. `!a_notFIFO` being `true.B` indicates that the request goes to a selected manager and therefore expects its responses to come back in FIFO order. The way the TLFIFOFixer maintains FIFO ordering among all selected managers is that it serializes requests to the selected managers and normally (exceptions are described below) allows only one outstanding transaction from a specific client to the selected managers. Specifically, for each client the TLFIFOFixer maintains one bit per assigned sourceId, recording whether there is an outstanding transaction using that source ID that goes to a selected manager.
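Roughly, I picture the tracking like the sketch below (my own simplification; the real code also qualifies the updates with first/last-beat conditions):

```scala
import chisel3._
import chisel3.util._

class FlightTrackerSketch(numSourceIds: Int) extends Module {
  val io = IO(new Bundle {
    val aFire    = Input(Bool())                          // in.a accepted this cycle
    val aSource  = Input(UInt(log2Ceil(numSourceIds).W))
    val aNotFIFO = Input(Bool())                          // request targets a non-selected manager
    val dFire    = Input(Bool())                          // matching in.d response completed
    val dSource  = Input(UInt(log2Ceil(numSourceIds).W))
    val flight   = Output(Vec(numSourceIds, Bool()))
  })
  val flight = RegInit(VecInit(Seq.fill(numSourceIds)(false.B)))
  when (io.aFire && !io.aNotFIFO) { flight(io.aSource) := true.B }   // set on a FIFO-ordered request
  when (io.dFire)                 { flight(io.dSource) := false.B }  // clear when its response returns
  io.flight := flight
}
```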
If a client already has an outstanding request to the selected managers (`val track = flight.slice(c.sourceId.start, c.sourceId.end).reduce(_ || _)` being asserted), a new request from that same client is only allowed to go forward in the following scenarios: `a_notFIFO` is asserted (note that `a_noDomain` is also asserted for managers not selected by the policy, but `a_notFIFO` is asserted in that case as well, so such requests are not serialized); or the destinations (managers) of the two requests share the same fifoId, in which case the TLFIFOFixer presumes that either both requests go to the same manager, or they go to different managers but there is downstream logic that maintains FIFO ordering of their responses. In either case the TLFIFOFixer does not serialize the second request and lets it flow downwards.
Specifically, note the `stall` signal: for a specific client (selected by `val a_sel = c.sourceId.contains(in.a.bits.source)`) that requests FIFO ordering for its responses (`c.requestFifo` asserted) and has more than one sourceId assigned (`c.sourceId.size > 1`; the rationale was given above), the TLFIFOFixer tracks whether any previously issued request to the selected managers is still in flight (`val track = flight.slice(c.sourceId.start, c.sourceId.end)`) together with the registered `fifoId` of that previous destination manager (`val id = RegEnable(a_id, in.a.fire() && a_sel && !a_notFIFO)`). If the fifo id (`a_id`) of the manager targeted by the current request differs from that registered id, or the current request goes to a manager that has no fifoId assigned, then it is the TLFIFOFixer's duty, not the downstream managers', to maintain FIFO ordering, so the new request is stalled.
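Boiled down, the rule as I understand it (a paraphrase of the description above, not the verbatim source) is:

```scala
import chisel3._

object FifoStallSketch {
  // Stall a FIFO-requesting client (with more than one sourceId) when it already has a FIFO request
  // in flight and the new request targets a different fifo domain, or a manager with no domain at all.
  def stall(aSel: Bool, anyInFlight: Bool, aNoDomain: Bool, prevId: UInt, aId: UInt): Bool =
    aSel && anyInFlight && (aNoDomain || prevId =/= aId)
}
```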
We are almost done with the TLFIFOFixer. One last thing to note: when calculating `val id = RegEnable(a_id, in.a.fire() && a_sel && !a_notFIFO)` and `a_id`, we actually use the compacted version of the managers. That is, the actual `fifoId`s of the downstream managers have been remapped via `splatMap` so that they vary over a small integer range, but I do not clearly know the rationale for this. I will update this post once I figure it out. TODO

## TLWidthWidget
Normally, the width of a TileLink interface is configured by the managers. When we want to connect clients and managers whose interfaces have different widths (that is, different `data` field sizes, `beatBytes`), a TLWidthWidget is needed. I used to be confused about the difference between TLWidthWidget and TLFragmenter. As far as I know, TLFragmenter is used to fragment the original request because the downstream managers themselves may not support the original request size (`maxSize` >= `aOrig` >= `aFrag` >= `minSize` >= `beatBytes`). For example, if a device cannot handle a request with size=4 (16 bytes), then when connecting it to a bus whose masters may initiate requests of up to size=16, a TLFragmenter is needed, even though the data lanes (`beatBytes`) of the TLFragmenter's in and out interfaces are the same. TLWidthWidget, however, handles the situation where the physical data lanes of client and manager differ while the manager can accept requests of arbitrary size; in that situation a TLWidthWidget is needed whereas a TLFragmenter is not.
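A hypothetical hookup of my own that shows where each adapter sits (the device, bus, sizes and package paths are placeholders and vary across rocket-chip versions; this is a sketch, not a recipe): the TLWidthWidget fixes the data-lane mismatch, while the TLFragmenter fixes the transfer-size mismatch.

```scala
// Package paths follow a recent rocket-chip; older versions use freechips.rocketchip.config.Parameters.
import org.chipsalliance.cde.config.Parameters
import freechips.rocketchip.diplomacy._
import freechips.rocketchip.tilelink._

class NarrowPeripheralAttach(implicit p: Parameters) extends LazyModule {
  val xbar   = TLXbar()                                                         // 16-byte-wide bus
  val device = LazyModule(new TLRAM(AddressSet(0x10000, 0xfff), beatBytes = 4)) // 4-byte-wide device
  device.node :=
    TLFragmenter(minSize = 4, maxSize = 256) :=  // cap transfer sizes at something the device supports
    TLWidthWidget(16) :=                         // step the 16-byte bus data lane down to 4 bytes
    xbar
  lazy val module = new LazyModuleImp(this) { }
}
```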
There are two scenarios the TLWidthWidget needs to consider: inport.beatBytes > outport.beatBytes and inport.beatBytes < outport.beatBytes (when the two `beatBytes` are equal, nothing more needs to be done, just `in <> out`). It is worth noting that when a request carries no data, the TLWidthWidget simply sends the original request downwards in both cases, since the width of every other TileLink field (except `mask`) is identical between clients and managers; this is guaranteed by diplomacy negotiation (note that in the normal case without a TLWidthWidget, the `beatBytes` between client and manager is also identical and decided by the manager). On the other hand, if the request does carry data and inport.beatBytes > outport.beatBytes, we need to divide the original transaction into separate sub-transactions, each carrying the sub-chunk of data that fits in outport.beatBytes; the mask field also has to be adjusted to cover the current data chunk, and all other fields are unchanged.

Specifically, the `in` port signals are held until the last sub-transaction has been sent. The commit message says this was to allow preemption; I have trouble understanding how this relates to preemption. Maybe the Repeater registers the request so that the initiating client can lower valid and do something else? This commit changed the logic slightly so that some registers in WidthWidget are saved, which again confuses me. The Repeater registers all fields of the `in` port request, including the `data` field. Maybe the point is that the first data beat is taken from the direct input instead of the registered copy, so that downstream tooling can notice this and prune the unnecessary Repeater registers? In any case, the reason the first data beat can come straight from the in port rather than from the registered copy is that the client has to hold the request until the interface is ready; `ready` is asserted exactly when the downstream device can accept the request, so the first beat is guaranteed valid in the very cycle the client<->TLWidthWidget interface first fires.
In the `split` method, the signal `limit` is the count of sub-transactions that an original data-carrying request is divided into. It is calculated from the original transaction's `size` and from the `beatBytes` of the TLWidthWidget's client and manager interfaces. It is worth noting that when 2^`size` exceeds the `in` port's `beatBytes` (so the client sends a burst), `limit` is just `ratio` rather than 2^`size`/outBytes; `val keepBits = log2Ceil(inBytes)` imposes a maximum on `limit` (namely `ratio`). The TLWidthWidget only converts transactions between different `beatBytes`; it does not care how many beats a burst request consists of. The client may send a burst of beats, and the TLWidthWidget simply splits each individual beat based on the `beatBytes` of the in and out interfaces. An important note: 2^`size` is always a multiple of the `in` port's `beatBytes` for a burst request, so `limit` stays at `ratio` for every beat of the burst; this is why the value of the `size` field (which, per the TileLink protocol, is unchanged across the beats of a transaction) can be used to calculate `limit`.
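A quick plain-Scala illustration of my own (numbers are made up) of how many out-beats one in-beat turns into; the hardware encodes this as `limit`, which as far as I can tell is this count minus one, used as a mask over `count`.

```scala
object SplitLimitExample extends App {
  val inBytes  = 16                  // in-port data lane
  val outBytes = 4                   // out-port data lane
  val ratio    = inBytes / outBytes  // 4
  // out-beats produced per in-beat, capped at ratio no matter how large the burst is
  def pieces(lgSize: Int): Int = math.min(1 << lgSize, inBytes) / outBytes
  println(pieces(2))  // 4-byte request  -> 1 (already fits the out lane)
  println(pieces(3))  // 8-byte request  -> 2
  println(pieces(4))  // 16-byte request -> 4 (= ratio)
  println(pieces(6))  // 64-byte burst   -> still 4 per in-beat; the burst keeps its multiple in-beats
}
```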
The TLWidthWidget sends the sub-transactions one by one; the `helper` method inside `split` selects the right chunk of `data` (and `mask`) for a given sub-request, and its implementation is straightforward. Beyond that, we also have to decide the starting index from which the data chunk of a sub-transaction is taken, which is indexed by `address(keepBits-1, dropBits)`. Normally `address(keepBits-1, dropBits)` is 0 for a burst request (2^`size` exceeds the `in` port beatBytes), so the starting index (`sel`) is 0. For requests whose 2^`size` is smaller than the `in` port's `beatBytes`, the starting index where the valid data lies is `address(keepBits-1, dropBits)`, since the data block sent downwards is the size of the `out` port's `beatBytes`. Consequently the starting index of the data chunk for a sub-transaction is `val index = sel | count`.
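My own simplified picture of the data path in `split` (not the verbatim `helper`): the wide in-beat is viewed as `ratio` narrow lanes, and `index = sel | count` picks which lane goes out.

```scala
import chisel3._
import chisel3.util._

class SliceSketch(inBytes: Int, outBytes: Int) extends Module {
  private val ratio = inBytes / outBytes
  val io = IO(new Bundle {
    val wideData = Input(UInt((inBytes * 8).W))
    val index    = Input(UInt(log2Ceil(ratio).W))    // = sel | count
    val narrow   = Output(UInt((outBytes * 8).W))
  })
  // lane i holds bytes [i*outBytes, (i+1)*outBytes) of the wide beat
  val lanes = VecInit(Seq.tabulate(ratio) { i => io.wideData((i + 1) * outBytes * 8 - 1, i * outBytes * 8) })
  io.narrow := lanes(io.index)
}
```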
Now let's look at the inport.beatBytes < outport.beatBytes case. In this situation we need to collect the in port beats, merge them, and send them downwards through the wider out port interface; the corresponding logic lives in the `merge` method. The logic in the `merge` case is much like `split`, only more or less in the opposite direction. We have to work out how many sub-transactions (here, the sub-transactions are the original transactions arriving at the `in` port) we should wait for; the maximum count is decided by `limit`. We merge them into a single transaction whose data fits into the larger outport.beatBytes and send it through the `out` port (`in.ready := out.ready || !last` and `out.valid := in.valid && last`). In the merge case, if an original sub-transaction has its `corrupt` field asserted, we must assert the corresponding `corrupt` in the merged transaction. For a given sub-transaction, we have to decide the correct location its data chunk should be merged into (`val enable = Seq.tabulate(ratio) { i => !((count ^ i.U) & limit).orR }`). `val rdata = Reg(Vec(ratio-1, chiselTypeOf(idata)))` is defined to temporarily store the data of the earlier sub-transactions. Note that the data carried by the last sub-transaction does not have to be registered: when the last sub-transaction fires, we send the merged transaction through the `out` port, and the last sub-transaction holds `valid` high until it can be handled (that is, until the merged transaction can be sent through the out port). The corresponding implementation is a little subtle; the dance between `odata`, `rdata`, `pdata` and `mdata` is just not intuitive to me. I need to improve my Chisel mentality.
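Here is how I currently picture the essence of that dance, as a heavily simplified stand-alone sketch of my own (it drops the `enable`/`mdata` masking and the corrupt handling, so it is not the actual WidthWidget code): the first `ratio-1` narrow beats are parked in registers, while the last beat is taken live off the input in the same cycle the wide beat goes out.

```scala
import chisel3._
import chisel3.util._

class MergeSketch(outBytes: Int, ratio: Int) extends Module {
  require(ratio >= 2)
  val io = IO(new Bundle {
    val in  = Flipped(Decoupled(UInt((outBytes * 8).W)))   // narrow beats from the in port
    val out = Decoupled(UInt((outBytes * 8 * ratio).W))    // one wide beat per `ratio` narrow beats
  })
  val count = RegInit(0.U(log2Ceil(ratio).W))
  val last  = count === (ratio - 1).U
  val rdata = Reg(Vec(ratio - 1, UInt((outBytes * 8).W)))  // parked beats (all but the last)
  when (io.in.fire) {
    count := Mux(last, 0.U, count + 1.U)
    when (!last) { rdata(count) := io.in.bits }
  }
  io.out.valid := io.in.valid && last                      // mirrors out.valid := in.valid && last
  io.in.ready  := io.out.ready || !last                    // mirrors in.ready  := out.ready || !last
  io.out.bits  := Cat((rdata :+ io.in.bits).reverse)       // live last beat completes the wide word
}
```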
We are almost done with the TLWidthWidget, but there are some extra notes. The TLWidthWidget works not only for the `A` and `D` channels; the `split` and `merge` logic can be applied to any data-carrying channel, including `B` and `C`. In the `split` case, when deciding the starting index `sel`, we need `address(keepBits-1, dropBits)`; however, there is no address field on the TileLink D channel. A comment in the original codebase clarifies this issue clearly. The TLWidthWidget has a clear separation of concerns: the `splice` method acts as the holistic hub for each TileLink channel, and `split` or `merge` is applied inside `splice` based on the `beatBytes` difference between the in and out ports of that specific channel. The TLWidthWidget generalizes over the `A`, `B`, `C` and `D` channels along two axes: 1) which side is the in and out port for that channel (for example, the `in` port interface for the `D` channel is TLWidthWidget <-> manager, whereas the `in` port interface for the `A` channel is TLWidthWidget <-> client), and 2) the `beatBytes` difference between that in and out port.
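Structurally, I read the dispatch like this (my paraphrase of the shape of `splice`, not its code):

```scala
object SpliceSketch {
  sealed trait Action
  case object Passthrough extends Action  // same width: out <> in
  case object Split       extends Action  // wide -> narrow: one in-beat fans out into ratio out-beats
  case object Merge       extends Action  // narrow -> wide: ratio in-beats collapse into one out-beat
  // Only the negotiated beatBytes of the two sides of a given channel decide what happens to it.
  def choose(inBeatBytes: Int, outBeatBytes: Int): Action =
    if (inBeatBytes == outBeatBytes) Passthrough
    else if (inBeatBytes > outBeatBytes) Split
    else Merge
}
```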
## TLSourceShrinker
I will skip the details of the TLSourceShrinker; its rationale is clarified clearly in the Chipyard documentation.
The implementation is straightforward. When `maxInFlight` (the maximum number of source IDs that will be sent from the TLSourceShrinker to the manager) is larger than `client.endSourceId`, no source shrinking is needed. Also note that the TLSourceShrinker only works for TL-UL and TL-UH transactions; Acquires cannot pass through this adapter, which makes Probes impossible. Inside the TLSourceShrinker, a `val sourceIdMap = Mem(maxInFlight, in.a.bits.source)` tracks the sourceId mapping: when a client fires channel A, the original sourceId is registered at a free index (`nextFree`) in `sourceIdMap`, and the `in.A` request is sent out through `out.A` using that index as its sourceId. Once a D-channel response fires, it is sent back through `in.D` with the original sourceId restored so that it reaches the original initiator. If all slots in `sourceIdMap` are occupied (`full`), a newly initiated in.A request is blocked until a slot becomes available (`nextFree`). Also note that the TLSourceShrinker supports a `sourceId` bypass when the manager supports zero-latency responses (`minLatency == 0`: the response may come back in the same cycle the request is initiated); if the D channel response's `sourceId` matches the one in the pending A channel request, `in.a.bits.source` is bypassed directly to `in.d.bits.source`. Note that `minLatency == 0` does not guarantee a same-cycle response, so we still need to register the original sourceId.
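My sketch of that corner case (a simplification of my own, not the actual TLSourceShrinker source): when the D response shows up in the very cycle its A request is being allocated a slot, the `Mem` has not been written yet, so the original source is forwarded combinationally instead of being read back.

```scala
import chisel3._
import chisel3.util._

class ShrinkBypassSketch(maxInFlight: Int, origSourceBits: Int, minLatency: Int) extends Module {
  val io = IO(new Bundle {
    val aFire       = Input(Bool())                         // shrunken A request accepted downstream
    val aSource     = Input(UInt(origSourceBits.W))         // its original (wide) source id
    val allocSlot   = Input(UInt(log2Ceil(maxInFlight).W))  // slot chosen for it (nextFree)
    val dSource     = Input(UInt(log2Ceil(maxInFlight).W))  // shrunken source id on the D response
    val dOrigSource = Output(UInt(origSourceBits.W))        // original source id restored on in.d
  })
  val sourceIdMap = Mem(maxInFlight, UInt(origSourceBits.W))
  when (io.aFire) { sourceIdMap.write(io.allocSlot, io.aSource) }
  // same-cycle case: the map entry is being written right now, so bypass the original id directly
  val bypass = (minLatency == 0).B && io.aFire && io.dSource === io.allocSlot
  io.dOrigSource := Mux(bypass, io.aSource, sourceIdMap.read(io.dSource))
}
```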