# Code Anatomy on TLFragmenter, TLFIFOFixer, TLWidthWidget and TLSourceShrinker #3084
## TLFragmenter
The TLFragmenter is used to break a large transaction into smaller ones so that the downstream manager can accept it. Specifically, a downstream device may declare a maximum transfer size; it cannot accept any transaction whose request `size` exceeds that maximum. The reasons why a device limits its acceptable transfer size vary: perhaps the device does not support TileLink burst transactions (in that case the `minSize` of the TLFragmenter should equal the out port's `beatBytes`), or the limit comes from functional reasons (in which case the `minSize` of the TLFragmenter may be larger than the out port's `beatBytes`). When such a device is connected to a bus whose masters may initiate larger requests, a TLFragmenter widget is needed. For a specific request, the fragmenter decides whether the request should be fragmented (`val aFrag = Mux(aOrig > limit, limit, aOrig)`); if so, it divides the original request into a series of requests of a manager-acceptable `size` and sends each of them downwards, recording some auxiliary information to make sure all sub-chunked requests have been sent.

For a specific request, the destination manager is determined (`val find = manager.findFast(edgeIn.address(in_a.bits))`), and the maximum transfer size that manager supports for this operation (decided by `in_a.bits.opcode`) is calculated. The fragmenter uses either that size, or `lgMinSize` when `alwaysMin` is `true.B`, to decide whether fragmentation is needed. `alwaysMin` indicates that we should always use the `minSize` specified by configuration as the fragmentation width (and also as the criterion for whether fragmentation is needed), instead of the maximum transfer size the destination manager supports for the specific operation.
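To make the limit choice concrete, here is a tiny self-contained Chisel sketch of my own (the module, port names and widths are hypothetical, not the actual TLFragmenter source); it only mirrors the `aFrag`/`limit` selection quoted above.

```scala
import chisel3._

// Sketch only: the per-request size limit is either the configured minSize (when alwaysMin is set)
// or the destination manager's per-opcode maximum; requests above the limit are clamped, i.e. fragmented.
class FragLimitSketch(lgMinSize: Int, alwaysMin: Boolean) extends Module {
  val io = IO(new Bundle {
    val maxLgSupported = Input(UInt(4.W))   // log2 of the selected manager's max transfer for this opcode
    val aOrig          = Input(UInt(4.W))   // log2 of the original request size
    val aFrag          = Output(UInt(4.W))  // log2 of the size actually sent downstream
    val needFrag       = Output(Bool())     // whether this request gets fragmented at all
  })
  val limit = if (alwaysMin) lgMinSize.U else io.maxLgSupported
  io.aFrag    := Mux(io.aOrig > limit, limit, io.aOrig)   // mirrors: val aFrag = Mux(aOrig > limit, limit, aOrig)
  io.needFrag := io.aOrig > limit
}
```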
Some key signals the fragmenter maintains are as follows. `aFragnum` encodes the fragmentation index in a subtle way; I am amazed and at the same time confused by the calculation details. It indexes a specific chunk among all fragments, and `aFragnum` being 0 indicates the last fragment sent downwards. The dance among `old_gennum1`, `new_gennum` and `gennum` is esoteric and I will skip delineating it for now. In any case, the size of a fragment is a multiple of `minSize`, while `minSize` itself is a multiple of `beatBytes`. In a specific example documented in the codebase (max=256, min=8, beat=4 and a device supporting 16), a Get/Put of 64 bytes is fragmented into 64/16 = 4 chunks, and the corresponding fragmentation index (`aFragnum`) sequence is 6, 4, 2, 0.
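To convince myself where 6, 4, 2, 0 comes from, here is a plain-Scala back-of-the-envelope calculation (my own illustration, not code from the repository): each 16-byte fragment equals two `minSize` (8-byte) units, and `aFragnum` counts how many `minSize` units are still outstanding after the current fragment.

```scala
object FragnumExample extends App {
  val origBytes = 64   // original Get/Put size
  val fragBytes = 16   // what the device accepts per transaction
  val minBytes  = 8    // minSize of the TLFragmenter
  // remaining minSize units after each fragment: 6, 4, 2, 0 (0 marks the last fragment)
  val fragnums = ((origBytes / fragBytes - 1) to 0 by -1).map(_ * fragBytes / minBytes)
  println(fragnums.mkString(", "))   // prints: 6, 4, 2, 0
}
```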
The fragmenter sends each fragment with an appropriately adjusted A channel: `out.a.bits.size` is now the (log) size of a fragment, and `address` must also be adjusted to point at the sub-chunk (I skip the esoteric calculation for now). The rationale for `out.a.bits.source := Cat(Seq(in_a.bits.source) ++ aFull ++ Seq(aToggle.asUInt, aFragnum))` is well documented in comments in the codebase. A Repeater is placed along the A channel so that once a request is accepted on channel A, it is held in effect until the last fragment when the request carries no data. For a data-carrying request like Put, the data field changes every beat and the client has to hold the request until all data beats have been sent (that is the client's duty), so there is no need for the Repeater to hold data-carrying requests. The repeat criterion is therefore `repeater.io.repeat := !aHasData && aFragnum =/= UInt(0)`.
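To make the Repeater's role concrete, here is a toy repeater of my own (much simpler than rocket-chip's `util.Repeater`, and the names are mine): once a beat is accepted while `repeat` is high it is stored and replayed on the output, so a data-less A request can be re-issued for every fragment without the client having to hold it.

```scala
import chisel3._
import chisel3.util._

class MiniRepeater[T <: Data](gen: T) extends Module {
  val io = IO(new Bundle {
    val repeat = Input(Bool())             // keep replaying the stored beat
    val enq    = Flipped(Decoupled(gen))
    val deq    = Decoupled(gen)
  })
  val full  = RegInit(false.B)
  val saved = Reg(gen)
  io.deq.valid := io.enq.valid || full
  io.enq.ready := io.deq.ready && !full
  io.deq.bits  := Mux(full, saved, io.enq.bits)
  when (io.enq.fire && io.repeat)  { full := true.B; saved := io.enq.bits }
  when (io.deq.fire && !io.repeat) { full := false.B }
}
```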
It's very important to note that unless specified explicitly (see the TLWidthWidget case: `managerFn = { case m => m.v1copy(beatBytes = innerBeatBytes) }`), the actual width of a TileLink interface is normally configured by the managers; therefore the `in` port of the TLFragmenter has the same `beatBytes` width as the `out` port.
When a D response arrives at the `out` port, we need to send it back to the `in` port with some fields correctly modified. For the `source` field: `in.d.bits.source := out.d.bits.source >> addedBits`. The `size` field sent back on `in.d` should be the original size specified by the `in.a` request: `in.d.bits.size := Mux(dFirst, dFirst_size, dOrig)` (`dOrig` is the registered value of `dFirst_size`: `when(dFirst){dOrig := dFirst_size}`). The original size (`dFirst_size`) is calculated from the index of the very first fragment sent via `out.a`: `val dFirst_size = OH1ToUInt((dFragnum << log2Ceil(minSize)) | dsizeOH1)`. Since the responses of all managers are mutually FIFO, the `dFragnum` observed when `dFirst` is `true.B` (i.e., on the first response beat coming back on `out.d`) is the index of the very first fragment sent via `out.a`. The fragment index to which each `out.d` response belongs is obtained by `val dFragnum = out.d.bits.source(fragmentBits-1, 0)`.
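A quick sanity check of my own (plain Scala, not from the codebase) that the formula really recovers the original size in the running example: the first fragment of the 64-byte request carries `aFragnum = 6`, the fragment itself is 16 bytes, and the formula gives back log2(64) = 6.

```scala
object DFirstSizeExample extends App {
  def log2Ceil(x: Int): Int  = 32 - Integer.numberOfLeadingZeros(x - 1)
  def OH1ToUInt(m: Int): Int = Integer.bitCount(m)   // a low-bit mask ("one-hot minus one") back to log2
  val minSize  = 8
  val dFragnum = 6          // fragment index echoed back in out.d.bits.source on the first D beat
  val dsizeOH1 = 16 - 1     // mask form of the 16-byte fragment size
  println(OH1ToUInt((dFragnum << log2Ceil(minSize)) | dsizeOH1))   // prints 6, i.e. the original 64 bytes
}
```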
If the D channel response carries data, every beat must be sent back to `in.d`, otherwise there would be unacceptable data loss. However, if the D channel response is data-less, the number of D channel responses equals the number of fragments sent via `out.a`, yet we only need to send one single D response back through `in.d` to echo the single original `in.a` request. Therefore we may `drop` some of the `out.d` responses: `val drop = !dHasData && !Mux(doEarlyAck, dFirst, dLast)`. `doEarlyAck` indicates whether a multi-fragment `Put` should be acknowledged on the first fragment or the last; if `doEarlyAck` is asserted, we ack to `in.d` on the very first `out.d` response and drop all the upcoming ones. Normally, we ack to `in.d` when the last `out.d` response comes.
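My reading of the resulting handshake, as a small self-contained sketch (signal names follow the discussion, the wrapper module is mine): `in.d` only sees the beat selected by `doEarlyAck`, while `out.d` is always drained so the downstream manager is never back-pressured by beats we intend to drop.

```scala
import chisel3._

class DropSketch(doEarlyAck: Boolean) extends Module {
  val io = IO(new Bundle {
    val dHasData  = Input(Bool())
    val dFirst    = Input(Bool())
    val dLast     = Input(Bool())
    val outDValid = Input(Bool())
    val inDReady  = Input(Bool())
    val inDValid  = Output(Bool())
    val outDReady = Output(Bool())
  })
  // drop every data-less beat except the one we use as the acknowledgement
  val drop = !io.dHasData && !(if (doEarlyAck) io.dFirst else io.dLast)
  io.inDValid  := io.outDValid && !drop
  io.outDReady := io.inDReady  ||  drop
}
```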
In terms of the TileLink error signals `corrupt` and `denied`: no extra handling is needed for `corrupt`, because `corrupt` is only meaningful when data is present and it is beat-specific, and every beat of a data-carrying response is sent back to `in.d` anyway. For `denied`, we should register the `denied` signal of the dropped `out.d` responses and send the OR of that registered value and `out.d.bits.denied` back to `in.d`. Also note that normally a manager is not allowed to deny a `Get` request: because the TLFragmenter splits the initial request, if the `out.d` responses of some fragments were denied while others were not, it would be hard to specify what to reply on `in.d`. This commit introduced `holdFirstDeny` so that the TLFragmenter takes the `denied` signal of the very first beat of a burst response as representative of the whole burst. Consequently, the `denied` logic for `Put` and `Get` responses differs, and the two cases never overlap since `dHasData` dichotomizes them.

A final note: if minSize == maxSize, the maximum transfer size the upstream clients will send downwards is already a transfer size that all downstream managers can support, so no fragmentation logic is needed. Check this PR.
## TLFIFOFixer
Normally, a TileLink manager that declares a FIFO domain (fifo id) must ensure that all requests to that domain from clients which have requested FIFO ordering see their responses in order. But a manager can only control its own response order; when we want a collection of managers to return their responses in FIFO order, that is where the TLFIFOFixer comes into play. The TLFIFOFixer selects downstream managers using a specific `policy` (`TLFIFOFixer.Policy`) and applies FIFO ordering constraints to the responses from all selected managers toward a specific client.

The TLFIFOFixer uses the configured `policy` to dichotomize all managers into `flatManagers` and `keepManagers`: `flatManagers` contains all managers that satisfy the policy, while `keepManagers` are the excluded ones. The TLFIFOFixer also maps manager fifo ids using two different schemes, `fixMap` and `splatMap`:
`fixMap`: All managers which fulfill the `TLFIFOFixer.Policy` (`flatManagers`) have their `fifoId` mapped to 0 in `fixMap` (even when a manager initially had no fifo id assigned). All other managers (`keepManagers`) which have a `fifoId` specified (rather than `None`) map their `fifoId` to consecutive integers starting from 1 in `fixMap`; managers in `keepManagers` which do not have a `fifoId` remain `None` in `fixMap`. Note that if a manager in `keepManagers` shares a fifo domain with managers in `flatManagers`, its `fifoId` is mapped like the ones in `flatManagers`, that is to 0 (`val keepDomains = Set(keepManagers.flatMap(_.fifoId):_*) -- flatDomains`).
`splatMap`: In `splatMap`, the fifo id remains `None` for every manager that did not originally have one. All managers (with a fifo id) in `flatManagers` map their fifo id to consecutive integers starting from 0. Managers in `keepManagers` which share a fifo id with ones in `flatManagers` map their fifo id the same way as the `flatManagers` ones (integers starting from 0). The fifo id of all other managers in `keepManagers` is `None` in `splatMap`.
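A made-up example of my own (these managers and numbers are not from the codebase) of how the two maps could come out under the rules above:

```scala
object FifoMapExample extends App {
  // M0: fifoId Some(3), selected by the policy       M1: fifoId None, selected by the policy
  // M2: fifoId Some(3), NOT selected (keepManagers)  M3: fifoId Some(7), NOT selected (keepManagers)
  val fixMap   = Map("M0" -> Some(0), "M1" -> Some(0), "M2" -> Some(0), "M3" -> Some(1))
  val splatMap = Map("M0" -> Some(0), "M1" -> None,    "M2" -> Some(0), "M3" -> None)
  // M2 shares domain 3 with the selected M0, so it is treated like a flat manager in both maps;
  // M3 keeps its own domain, renumbered to 1 in fixMap, and stays None in splatMap.
  println(fixMap); println(splatMap)
}
```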
Looking at `managerFn`, clients will see the managers selected by the policy (including managers in `keepManagers` that share a fifo domain with managers in `flatManagers`; whenever we mention "selected managers" below, those managers are included) as sharing one fifo domain (fifo id 0). In short, the TLFIFOFixer creates a unified fifo domain for all selected managers and makes sure responses from all of those managers to a specific client come back in FIFO order. It is worth noting that for a client which occupies only a single sourceId (only one item in its IdRange), only one outstanding transaction is allowed at a time, so no fifo-fixing is necessary for it.

For a specific request, the TLFIFOFixer decides whether the request goes to a policy-selected manager (by checking whether the destination manager has its fifo id mapped to 0 in `fixMap`); all selected managers should respond to requests in FIFO order, and it is the TLFIFOFixer's duty to maintain this order. `!a_notFIFO` being `true.B` indicates that the request goes to a selected manager and therefore expects its responses to come back in FIFO order. The way the TLFIFOFixer maintains FIFO ordering among all selected managers is that it serializes requests to the selected managers and normally (exceptions are described below) allows only one outstanding transaction from a specific client to the selected managers. Specifically, for each client the TLFIFOFixer maintains one bit per assigned sourceId, recording whether there is an outstanding transaction using that source ID that goes to a selected manager.
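Roughly, I picture the tracking like the sketch below (my own simplification; the real code also qualifies the updates with first/last-beat conditions):

```scala
import chisel3._
import chisel3.util._

class FlightTrackerSketch(numSourceIds: Int) extends Module {
  val io = IO(new Bundle {
    val aFire    = Input(Bool())                          // in.a accepted this cycle
    val aSource  = Input(UInt(log2Ceil(numSourceIds).W))
    val aNotFIFO = Input(Bool())                          // request targets a non-selected manager
    val dFire    = Input(Bool())                          // matching in.d response completed
    val dSource  = Input(UInt(log2Ceil(numSourceIds).W))
    val flight   = Output(Vec(numSourceIds, Bool()))
  })
  val flight = RegInit(VecInit(Seq.fill(numSourceIds)(false.B)))
  when (io.aFire && !io.aNotFIFO) { flight(io.aSource) := true.B }   // set on a FIFO-ordered request
  when (io.dFire)                 { flight(io.dSource) := false.B }  // clear when its response returns
  io.flight := flight
}
```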
If a client already has an outstanding request to the selected managers (`val track = flight.slice(c.sourceId.start, c.sourceId.end).reduce(_ || _)` being asserted), a new request from that same client is only allowed to go forward in the following scenarios: `a_notFIFO` is asserted (note that `a_noDomain` is also asserted for managers not selected by the policy, but `a_notFIFO` is asserted in that case as well, so such requests are not serialized); or the destinations (managers) of the two requests share the same fifoId, in which case the TLFIFOFixer presumes that either both requests go to the same manager, or they go to different managers but there is downstream logic that maintains FIFO ordering of their responses. In either case the TLFIFOFixer does not serialize the second request and lets it flow downwards.
Specifically, note the `stall` signal: for a specific client (selected by `val a_sel = c.sourceId.contains(in.a.bits.source)`) that requests FIFO ordering for its responses (`c.requestFifo` asserted) and has more than one sourceId assigned (`c.sourceId.size > 1`; the rationale was given above), the TLFIFOFixer tracks whether any previously issued request to the selected managers is still in flight (`val track = flight.slice(c.sourceId.start, c.sourceId.end)`) together with the registered `fifoId` of that previous destination manager (`val id = RegEnable(a_id, in.a.fire() && a_sel && !a_notFIFO)`). If the fifo id (`a_id`) of the manager targeted by the current request differs from that registered id, or the current request goes to a manager that has no fifoId assigned, then it is the TLFIFOFixer's duty, not the downstream managers', to maintain FIFO ordering, so the new request is stalled.
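Boiled down, the rule as I understand it (a paraphrase of the description above, not the verbatim source) is:

```scala
import chisel3._

object FifoStallSketch {
  // Stall a FIFO-requesting client (with more than one sourceId) when it already has a FIFO request
  // in flight and the new request targets a different fifo domain, or a manager with no domain at all.
  def stall(aSel: Bool, anyInFlight: Bool, aNoDomain: Bool, prevId: UInt, aId: UInt): Bool =
    aSel && anyInFlight && (aNoDomain || prevId =/= aId)
}
```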
We are almost done with the TLFIFOFixer. One last thing to note: when calculating `val id = RegEnable(a_id, in.a.fire() && a_sel && !a_notFIFO)` and `a_id`, we actually use the compacted version of the managers. That is, the actual `fifoId`s of the downstream managers have been remapped via `splatMap` so that they vary over a small integer range, but I do not clearly know the rationale for this. I will update this post once I figure it out. TODO

## TLWidthWidget
Normally, the width of a TileLink interface is configured by the managers. When we want to connect clients and managers whose interfaces have different widths (that is, different `data` field sizes, `beatBytes`), a TLWidthWidget is needed. I used to be confused about the difference between TLWidthWidget and TLFragmenter. As far as I know, TLFragmenter is used to fragment the original request because the downstream managers themselves may not support the original request size (`maxSize` >= `aOrig` >= `aFrag` >= `minSize` >= `beatBytes`). For example, if a device cannot handle a request with size=4 (16 bytes), then when connecting it to a bus whose masters may initiate requests of up to size=16, a TLFragmenter is needed, even though the data lanes (`beatBytes`) of the TLFragmenter's in and out interfaces are the same. TLWidthWidget, however, handles the situation where the physical data lanes of client and manager differ while the manager can accept requests of arbitrary size; in that situation a TLWidthWidget is needed whereas a TLFragmenter is not.
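A hypothetical hookup of my own that shows where each adapter sits (the device, bus, sizes and package paths are placeholders and vary across rocket-chip versions; this is a sketch, not a recipe): the TLWidthWidget fixes the data-lane mismatch, while the TLFragmenter fixes the transfer-size mismatch.

```scala
// Package paths follow a recent rocket-chip; older versions use freechips.rocketchip.config.Parameters.
import org.chipsalliance.cde.config.Parameters
import freechips.rocketchip.diplomacy._
import freechips.rocketchip.tilelink._

class NarrowPeripheralAttach(implicit p: Parameters) extends LazyModule {
  val xbar   = TLXbar()                                                         // 16-byte-wide bus
  val device = LazyModule(new TLRAM(AddressSet(0x10000, 0xfff), beatBytes = 4)) // 4-byte-wide device
  device.node :=
    TLFragmenter(minSize = 4, maxSize = 256) :=  // cap transfer sizes at something the device supports
    TLWidthWidget(16) :=                         // step the 16-byte bus data lane down to 4 bytes
    xbar
  lazy val module = new LazyModuleImp(this) { }
}
```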
There are two scenarios the TLWidthWidget needs to consider: inport.beatBytes > outport.beatBytes and inport.beatBytes < outport.beatBytes (when the two `beatBytes` are equal, nothing more needs to be done, just `in <> out`). It is worth noting that when a request carries no data, the TLWidthWidget simply sends the original request downwards in both cases, since the width of every other TileLink field (except `mask`) is identical between clients and managers; this is guaranteed by diplomacy negotiation (note that in the normal case without a TLWidthWidget, the `beatBytes` between client and manager is also identical and decided by the manager). On the other hand, if the request does carry data and inport.beatBytes > outport.beatBytes, we need to divide the original transaction into separate sub-transactions, each carrying the sub-chunk of data that fits in outport.beatBytes; the mask field also has to be adjusted to cover the current data chunk, and all other fields are unchanged.

Specifically, the `in` port signals are held until the last sub-transaction has been sent. The commit message says this was to allow preemption; I have trouble understanding how this relates to preemption. Maybe the Repeater registers the request so that the initiating client can lower valid and do something else? This commit changed the logic slightly so that some registers in WidthWidget are saved, which again confuses me. The Repeater registers all fields of the `in` port request, including the `data` field. Maybe the point is that the first data beat is taken from the direct input instead of the registered copy, so that downstream tooling can notice this and prune the unnecessary Repeater registers? In any case, the reason the first data beat can come straight from the in port rather than from the registered copy is that the client has to hold the request until the interface is ready; `ready` is asserted exactly when the downstream device can accept the request, so the first beat is guaranteed valid in the very cycle the client<->TLWidthWidget interface first fires.
In the `split` method, the signal `limit` is the count of sub-transactions that an original data-carrying request is divided into. It is calculated from the original transaction's `size` and from the `beatBytes` of the TLWidthWidget's client and manager interfaces. It is worth noting that when 2^`size` exceeds the `in` port's `beatBytes` (so the client sends a burst), `limit` is just `ratio` rather than 2^`size`/outBytes; `val keepBits = log2Ceil(inBytes)` imposes a maximum on `limit` (namely `ratio`). The TLWidthWidget only converts transactions between different `beatBytes`; it does not care how many beats a burst request consists of. The client may send a burst of beats, and the TLWidthWidget simply splits each individual beat based on the `beatBytes` of the in and out interfaces. An important note: 2^`size` is always a multiple of the `in` port's `beatBytes` for a burst request, so `limit` stays at `ratio` for every beat of the burst; this is why the value of the `size` field (which, per the TileLink protocol, is unchanged across the beats of a transaction) can be used to calculate `limit`.
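A quick plain-Scala illustration of my own (numbers are made up) of how many out-beats one in-beat turns into; the hardware encodes this as `limit`, which as far as I can tell is this count minus one, used as a mask over `count`.

```scala
object SplitLimitExample extends App {
  val inBytes  = 16                  // in-port data lane
  val outBytes = 4                   // out-port data lane
  val ratio    = inBytes / outBytes  // 4
  // out-beats produced per in-beat, capped at ratio no matter how large the burst is
  def pieces(lgSize: Int): Int = math.min(1 << lgSize, inBytes) / outBytes
  println(pieces(2))  // 4-byte request  -> 1 (already fits the out lane)
  println(pieces(3))  // 8-byte request  -> 2
  println(pieces(4))  // 16-byte request -> 4 (= ratio)
  println(pieces(6))  // 64-byte burst   -> still 4 per in-beat; the burst keeps its multiple in-beats
}
```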
The TLWidthWidget sends the sub-transactions one by one; the `helper` method inside `split` selects the right chunk of `data` (and `mask`) for a given sub-request, and its implementation is straightforward. Beyond that, we also have to decide the starting index from which the data chunk of a sub-transaction is taken, which is indexed by `address(keepBits-1, dropBits)`. Normally `address(keepBits-1, dropBits)` is 0 for a burst request (2^`size` exceeds the `in` port beatBytes), so the starting index (`sel`) is 0. For requests whose 2^`size` is smaller than the `in` port's `beatBytes`, the starting index where the valid data lies is `address(keepBits-1, dropBits)`, since the data block sent downwards is the size of the `out` port's `beatBytes`. Consequently the starting index of the data chunk for a sub-transaction is `val index = sel | count`.
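My own simplified picture of the data path in `split` (not the verbatim `helper`): the wide in-beat is viewed as `ratio` narrow lanes, and `index = sel | count` picks which lane goes out.

```scala
import chisel3._
import chisel3.util._

class SliceSketch(inBytes: Int, outBytes: Int) extends Module {
  private val ratio = inBytes / outBytes
  val io = IO(new Bundle {
    val wideData = Input(UInt((inBytes * 8).W))
    val index    = Input(UInt(log2Ceil(ratio).W))    // = sel | count
    val narrow   = Output(UInt((outBytes * 8).W))
  })
  // lane i holds bytes [i*outBytes, (i+1)*outBytes) of the wide beat
  val lanes = VecInit(Seq.tabulate(ratio) { i => io.wideData((i + 1) * outBytes * 8 - 1, i * outBytes * 8) })
  io.narrow := lanes(io.index)
}
```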
Now let's look at the inport.beatBytes < outport.beatBytes case. In this situation we need to collect the in port beats, merge them, and send them downwards through the wider out port interface; the corresponding logic lives in the `merge` method. The logic in the `merge` case is much like `split`, only more or less in the opposite direction. We have to work out how many sub-transactions (here, the sub-transactions are the original transactions arriving at the `in` port) we should wait for; the maximum count is decided by `limit`. We merge them into a single transaction whose data fits into the larger outport.beatBytes and send it through the `out` port (`in.ready := out.ready || !last` and `out.valid := in.valid && last`). In the merge case, if an original sub-transaction has its `corrupt` field asserted, we must assert the corresponding `corrupt` in the merged transaction. For a given sub-transaction, we have to decide the correct location its data chunk should be merged into (`val enable = Seq.tabulate(ratio) { i => !((count ^ i.U) & limit).orR }`). `val rdata = Reg(Vec(ratio-1, chiselTypeOf(idata)))` is defined to temporarily store the data of the earlier sub-transactions. Note that the data carried by the last sub-transaction does not have to be registered: when the last sub-transaction fires, we send the merged transaction through the `out` port, and the last sub-transaction holds `valid` high until it can be handled (that is, until the merged transaction can be sent through the out port). The corresponding implementation is a little subtle; the dance between `odata`, `rdata`, `pdata` and `mdata` is just not intuitive to me. I need to improve my Chisel mentality.
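Here is how I currently picture the essence of that dance, as a heavily simplified stand-alone sketch of my own (it drops the `enable`/`mdata` masking and the corrupt handling, so it is not the actual WidthWidget code): the first `ratio-1` narrow beats are parked in registers, while the last beat is taken live off the input in the same cycle the wide beat goes out.

```scala
import chisel3._
import chisel3.util._

class MergeSketch(outBytes: Int, ratio: Int) extends Module {
  require(ratio >= 2)
  val io = IO(new Bundle {
    val in  = Flipped(Decoupled(UInt((outBytes * 8).W)))   // narrow beats from the in port
    val out = Decoupled(UInt((outBytes * 8 * ratio).W))    // one wide beat per `ratio` narrow beats
  })
  val count = RegInit(0.U(log2Ceil(ratio).W))
  val last  = count === (ratio - 1).U
  val rdata = Reg(Vec(ratio - 1, UInt((outBytes * 8).W)))  // parked beats (all but the last)
  when (io.in.fire) {
    count := Mux(last, 0.U, count + 1.U)
    when (!last) { rdata(count) := io.in.bits }
  }
  io.out.valid := io.in.valid && last                      // mirrors out.valid := in.valid && last
  io.in.ready  := io.out.ready || !last                    // mirrors in.ready  := out.ready || !last
  io.out.bits  := Cat((rdata :+ io.in.bits).reverse)       // live last beat completes the wide word
}
```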
We are almost done with the TLWidthWidget, but there are some extra notes. The TLWidthWidget works not only for the `A` and `D` channels; the `split` and `merge` logic can be applied to any data-carrying channel, including `B` and `C`. In the `split` case, when deciding the starting index `sel`, we need `address(keepBits-1, dropBits)`; however, there is no address field on the TileLink D channel. A comment in the original codebase clarifies this issue clearly. The TLWidthWidget has a clear separation of concerns: the `splice` method acts as the holistic hub for each TileLink channel, and `split` or `merge` is applied inside `splice` based on the `beatBytes` difference between the in and out ports of that specific channel. The TLWidthWidget generalizes over the `A`, `B`, `C` and `D` channels along two axes: 1) which side is the in and out port for that channel (for example, the `in` port interface for the `D` channel is TLWidthWidget <-> manager, whereas the `in` port interface for the `A` channel is TLWidthWidget <-> client), and 2) the `beatBytes` difference between that in and out port.
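Structurally, I read the dispatch like this (my paraphrase of the shape of `splice`, not its code):

```scala
object SpliceSketch {
  sealed trait Action
  case object Passthrough extends Action  // same width: out <> in
  case object Split       extends Action  // wide -> narrow: one in-beat fans out into ratio out-beats
  case object Merge       extends Action  // narrow -> wide: ratio in-beats collapse into one out-beat
  // Only the negotiated beatBytes of the two sides of a given channel decide what happens to it.
  def choose(inBeatBytes: Int, outBeatBytes: Int): Action =
    if (inBeatBytes == outBeatBytes) Passthrough
    else if (inBeatBytes > outBeatBytes) Split
    else Merge
}
```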
## TLSourceShrinker
I will skip the details of the TLSourceShrinker; its rationale is clarified clearly in the Chipyard documentation.
The implementation is straightforward. When `maxInFlight` (the maximum number of source IDs that will be sent from the TLSourceShrinker to the manager) is larger than `client.endSourceId`, no source shrinking is needed. Also note that the TLSourceShrinker only works for TL-UL and TL-UH transactions; Acquires cannot pass through this adapter, which makes Probes impossible. Inside the TLSourceShrinker, a `val sourceIdMap = Mem(maxInFlight, in.a.bits.source)` tracks the sourceId mapping: when a client fires channel A, the original sourceId is registered at a free index (`nextFree`) in `sourceIdMap`, and the `in.A` request is sent out through `out.A` using that index as its sourceId. Once a D-channel response fires, it is sent back through `in.D` with the original sourceId restored so that it reaches the original initiator. If all slots in `sourceIdMap` are occupied (`full`), a newly initiated in.A request is blocked until a slot becomes available (`nextFree`). Also note that the TLSourceShrinker supports a `sourceId` bypass when the manager supports zero-latency responses (`minLatency == 0`: the response may come back in the same cycle the request is initiated); if the D channel response's `sourceId` matches the one in the pending A channel request, `in.a.bits.source` is bypassed directly to `in.d.bits.source`. Note that `minLatency == 0` does not guarantee a same-cycle response, so we still need to register the original sourceId.
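My sketch of that corner case (a simplification of my own, not the actual TLSourceShrinker source): when the D response shows up in the very cycle its A request is being allocated a slot, the `Mem` has not been written yet, so the original source is forwarded combinationally instead of being read back.

```scala
import chisel3._
import chisel3.util._

class ShrinkBypassSketch(maxInFlight: Int, origSourceBits: Int, minLatency: Int) extends Module {
  val io = IO(new Bundle {
    val aFire       = Input(Bool())                         // shrunken A request accepted downstream
    val aSource     = Input(UInt(origSourceBits.W))         // its original (wide) source id
    val allocSlot   = Input(UInt(log2Ceil(maxInFlight).W))  // slot chosen for it (nextFree)
    val dSource     = Input(UInt(log2Ceil(maxInFlight).W))  // shrunken source id on the D response
    val dOrigSource = Output(UInt(origSourceBits.W))        // original source id restored on in.d
  })
  val sourceIdMap = Mem(maxInFlight, UInt(origSourceBits.W))
  when (io.aFire) { sourceIdMap.write(io.allocSlot, io.aSource) }
  // same-cycle case: the map entry is being written right now, so bypass the original id directly
  val bypass = (minLatency == 0).B && io.aFire && io.dSource === io.allocSlot
  io.dOrigSource := Mux(bypass, io.aSource, sourceIdMap.read(io.dSource))
}
```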