Multicore Communication

A collection of different communication methods for chip multiprocessors.

This repository is intended to include all the multicore communication work we have done in T-CREST as a standalone repository, to make that work more useful.

We use a simple rd/wr/address/data/rdy interface, which maps directly to the Patmos OCPcore interface (a command is acked in the next clock cycle or later; IO devices need to be ready to accept a command during the ack cycle).

The S4NOC currently has a slightly different interface (no rdy needed).

The repo also contains a Wishbone wrapper.

We plan to provide a bridge to AXI, as AXI itself is not particularly nice to work with.

Usage

This project is published on Maven Central. Add the following line to your build.sbt:

libraryDependencies += "io.github.t-crest" % "soc-comm" % "0.1.5"

Dependency

This project depends on ip-contributions, which is resolved in build.sbt.

Setup

The hardware is described in Chisel and needs only a Java JDK (version 8 or 11) and sbt installed. All other required packages are downloaded automatically by sbt.

On a Mac with OS X, sbt can be installed with Homebrew (assuming that is your package management tool):

brew install sbt

On a Linux machine, install sbt according to the instructions on the sbt download page.

For the Chisel-based tests, a compiler with a gcc-like interface is needed.

Projects

The CPU Interface PipeCon

For this project we define a simple pipelined IO interface, which we name PipeCon, for pipelined connection. The interface consists of the following signals:

class PipeConIO(private val addrWidth: Int) extends Bundle {
  val address = Input(UInt(addrWidth.W))
  val rd = Input(Bool())
  val wr = Input(Bool())
  val rdData = Output(UInt(32.W))
  val wrData = Input(UInt(32.W))
  val wrMask = Input(UInt(4.W))
  val ack = Output(Bool())
}

PipeCon itself is an abstract class, just containing the interface:

abstract class PipeCon(addrWidth: Int) extends Module {
  val io = IO(new Bundle {
    val cpuPort = new PipeConIO(addrWidth)
  })
  val cp = io.cpuPort
}

The following main rules define PipeCon:

There are two transactions: read and write
The transaction command is valid for a single clock cycle
The IO device responds earliest in the following clock cycle with an asserted ack signal
A read result is valid in the clock cycle ack is asserted
An IO device can insert wait cycles by asserting ack later
The CPU may issue a new read or write command in the same cycle ack is asserted

The PipeCon specification is a good fit for pipelined processors: an IO access runs in parallel to the memory stage, which has a read latency of one clock cycle.

This definition is basically the same as the CoreIO from Patmos, which itself is a valid OCP interface. However, as OCP is a big specification and not widely used, we define the simplified version here without reference to OCP.

A read or write command is signaled by an asserted rd or wr. The address and the write data (if it is a write) need to be valid during the command. Commands are only valid for a single cycle. Each command needs to be acknowledged by the IO device with an active ack, at the earliest one cycle after the command. The IO device can also insert wait states by delaying ack. Read data is available together with the ack signal for one clock cycle.
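The rules above can be sketched as a small cycle-level software model. This is plain Scala rather than Chisel, and the names Cmd, Resp, PipeConRegModel, and waitCycles are hypothetical, introduced only for illustration. The model holds a single 32-bit register and ignores address and wrMask; it is a sketch of the handshake, not the implementation in this repo.

```scala
// Plain Scala model of the PipeCon handshake (illustrative names, not repo code).
// One call to tick() corresponds to one clock cycle.
final case class Cmd(rd: Boolean = false, wr: Boolean = false, address: Int = 0, wrData: Int = 0)
final case class Resp(ack: Boolean, rdData: Int)

class PipeConRegModel(waitCycles: Int = 0) {
  private var reg = 0                      // the single device register
  private var pending: Option[Cmd] = None  // command accepted in an earlier cycle
  private var countdown = 0                // remaining wait cycles before ack

  def tick(cmd: Cmd): Resp = {
    // The response in this cycle depends only on state from earlier cycles,
    // so ack comes at the earliest one cycle after the command.
    val resp = pending match {
      case Some(c) if countdown == 0 =>
        if (c.wr) reg = c.wrData
        pending = None
        Resp(ack = true, rdData = if (c.rd) reg else 0)
      case Some(_) =>
        countdown -= 1                     // wait state: ack is delayed
        Resp(ack = false, rdData = 0)
      case None =>
        Resp(ack = false, rdData = 0)
    }
    // A new command may be issued when idle or in the cycle ack is asserted.
    if (cmd.rd || cmd.wr) {
      require(pending.isEmpty, "command issued while a previous one is outstanding")
      pending = Some(cmd)
      countdown = waitCycles
    }
    resp
  }
}
```

With waitCycles = 0 a write followed directly by a read is acked in two consecutive cycles; with waitCycles = 2 the ack arrives three cycles after the command.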

The figure shows such a bus protocol that does not need a combinational reaction of the peripheral device. It is pipelined handshaking, as we propose it in this project. The request from the processor is only a single cycle long. The address bus and the read signal do not need to be driven until the acknowledgment. The ack signal comes at the earliest one clock cycle after the rd command, in clock cycle 3. The first read sequence has one cycle of latency in this example, the same latency as the former example. However, as the request needs to be valid for only one clock cycle, we can pipeline requests. Reads of addresses A2 and A3 can be requested back to back, allowing a throughput of one data word per clock cycle.

The Patmos processor uses an OCP version with exactly this protocol for accessing IO devices (OCPcore). Memory is connected via a burst interface. The Patmos Handbook gives a detailed description of the OCP interfaces used.
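The pipelining described here can be illustrated with a small plain-Scala model of a read-only device with one cycle of latency. The class name PipelinedRom, the addresses, and the exact stimulus timing are assumptions made for this sketch; they are not taken from the figure or from the repo.

```scala
// Plain Scala model of back-to-back PipeCon reads (illustrative, not repo code).
class PipelinedRom(mem: Map[Int, Int]) {
  // Address of the read command registered in the previous cycle, if any.
  private var addrReg: Option[Int] = None

  // One call is one clock cycle; returns (ack, rdData) for this cycle.
  def tick(rd: Boolean, address: Int): (Boolean, Int) = {
    val out = addrReg match {
      case Some(a) => (true, mem.getOrElse(a, 0)) // ack one cycle after the command
      case None    => (false, 0)
    }
    addrReg = if (rd) Some(address) else None     // accept a new command every cycle
    out
  }
}

object PipelinedRomDemo extends App {
  val rom = new PipelinedRom(Map(0x10 -> 1, 0x14 -> 2, 0x18 -> 3))
  // idle, read A1, idle, read A2, read A3 (back to back), idle
  val stimulus = List((false, 0), (true, 0x10), (false, 0), (true, 0x14), (true, 0x18), (false, 0))
  val responses = stimulus.map { case (rd, a) => rom.tick(rd, a) }
  responses.zipWithIndex.foreach { case ((ack, data), cycle) =>
    println(s"cycle $cycle: ack=$ack rdData=$data")
  }
}
```

The two back-to-back reads are acknowledged in two consecutive cycles, which is the one-word-per-cycle throughput the text refers to.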

Ready/Valid Interface

For IO devices with a ready/valid interface (Chisel Decoupled) we provide a standard mapping for the PipeCon, the PipeConRV:

CPU interface to two ready/valid channels (one for transmit/tx, one for receive/rx).
IO mapping as in the classic PC serial port (UART)
0: status (control): bit 0 tx ready, bit 1 rx data available
4: write into txd and read from rxd

Additionally, for the S4NOC we provide the following port:

8: write destination, read source (S4NOC specific)
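The register map above can be sketched as a plain-Scala software model. The names PipeConRVMap and PipeConRVModel, the txDepth parameter, and the behavior on reading an empty rxd or writing a full txd are assumptions made for illustration; the source does not specify them. The S4NOC-specific port at address 8 is left out of this sketch.

```scala
import scala.collection.mutable

// Address map and status bits as listed above (object name is illustrative).
object PipeConRVMap {
  val Status = 0        // bit 0: tx ready, bit 1: rx data available
  val Data = 4          // write: txd, read: rxd
  val TxReadyBit = 0x1
  val RxAvailBit = 0x2
}

// Software model of the mapping onto two queues (not the Chisel PipeConRV).
class PipeConRVModel(txDepth: Int) {
  import PipeConRVMap._
  val tx = mutable.Queue[Int]()  // words waiting on the transmit channel
  val rx = mutable.Queue[Int]()  // words delivered by the receive channel

  def read(addr: Int): Int = addr match {
    case Status =>
      (if (tx.size < txDepth) TxReadyBit else 0) | (if (rx.nonEmpty) RxAvailBit else 0)
    case Data =>
      if (rx.nonEmpty) rx.dequeue() else 0   // assumption: empty rxd reads as 0
    case _ => 0
  }

  def write(addr: Int, data: Int): Unit = addr match {
    case Data =>
      if (tx.size < txDepth) tx += data      // assumption: write is dropped when full
    case _ => ()
  }
}
```

Software would typically poll the status register at address 0 and only write txd when bit 0 is set, or read rxd when bit 1 is set, as with a classic UART.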

S4NOC

The network interface and the S4NOC are written in Chisel, and the source can be found in s4noc.

The tests can be run from the current folder with a plain

sbt test

or from your favorite Scala IDE (e.g., IntelliJ or Eclipse).

To generate the Verilog code with a traffic generator, execute

sbt "runMain s4noc.S4nocTrafficGen n"

where n is the number of cores.

The generated Verilog file can be found in generated/S4nocTrafficGen.v and can be synthesized to provide resource numbers and the maximum clock frequency. An Intel Quartus project is available in quartus.

The performance test is run as an application within the test folder:

sbt "Test / run s4noc.PerformanceTest"

TODO (S4NOC)

NetworkTest and LatencyTest disabled, as they (now) run too long
Share testing code between ideal and concrete NIs
Play with configuration
Check memory FIFO if it is memory in an FPGA
Should also check how much HW the translation is, probably nothing. Max 4 LUTs for a table for 16 cores
Play with FIFO buffer variations
Have Raw tester with Verilator annotation

To analyze memory issues (or, e.g., to increase the heap size with Xmx), use a .sbtopts file with:

-J-XX:+HeapDumpOnOutOfMemoryError
-J-XX:HeapDumpPath=.
-J-Xmx4G

TODO

OCP Wrapper like this:

class S4nocOCPWrapper(nrCores: Int, txFifo: Int, rxFifo: Int) extends CmpDevice(nrCores) {

  val s4noc = Module(new S4noc(nrCores, txFifo, rxFifo))

  for (i <- 0 until nrCores) {

    val resp = Mux(io.cores(i).M.Cmd === OcpCmd.RD || io.cores(i).M.Cmd === OcpCmd.WR,
      OcpResp.DVA, OcpResp.NULL)

    // addresses are in words
    s4noc.io.cpuPorts(i).addr := io.cores(i).M.Addr
    s4noc.io.cpuPorts(i).wrData := io.cores(i).M.Data
    s4noc.io.cpuPorts(i).wr := io.cores(i).M.Cmd === OcpCmd.WR
    s4noc.io.cpuPorts(i).rd := io.cores(i).M.Cmd === OcpCmd.RD
    io.cores(i).S.Data := RegNext(s4noc.io.cpuPorts(i).rdData)
    io.cores(i).S.Resp := RegNext(resp, OcpResp.NULL)
  }
}

Next Paper

Build a standard NoC router for best effort