Commit 985562a: Document atomic operation support

dpretet committed Apr 25, 2024 (1 parent: 5245260)
Showing 8 changed files with 478 additions and 187 deletions.

doc/atomic_ops.md (177 additions, 0 deletions):

## Overview

The aim of this development (based on v1.6.1) is to support the atomic operation instructions.
Atomic operations bring the synchronization techniques required by kernels. The goal for FRISCV is
to be able to boot a kernel like FreeRTOS or Linux (without MMU) and make the core a platform for
real-world use cases.

From [OS dev wiki](https://wiki.osdev.org/Atomic_operation):

An atomic operation is an operation that will always be executed without any other process being
able to read or change state that is read or changed during the operation. It is effectively
executed as a single step, and is an important quality in a number of algorithms that deal with
multiple independent processes, both in synchronization and algorithms that update shared data
without requiring synchronization.

For single core system:

If an operation requires multiple CPU instructions, then it may be interrupted in
the middle of executing. If this results in a context switch (or if the interrupt handler refers
to data that was being used) then atomicity could be compromised. It is possible to use any
standard locking technique (e.g. a spinlock) to prevent this, but may be inefficient. If it is
possible, disabling interrupts may be the most efficient method of ensuring atomicity (although
note that this may increase the worst-case interrupt latency, which could be problematic if it
becomes too long).

For multi core system:

In multiprocessor systems, ensuring atomicity exists is a little harder. It is still possible to
use a lock (e.g. a spinlock) the same as on single processor systems, but merely using a single
instruction or disabling interrupts will not guarantee atomic access. You must also ensure that
no other processor or core in the system attempts to access the data you are working with.

[Wiki Linearizability](https://en.m.wikipedia.org/wiki/Linearizability)

[Wiki Load-link/Store-Conditional](https://en.wikipedia.org/wiki/Load-link/store-conditional)

In summary, an atomic operation can be useful to:
- synchronize threads within a core
- synchronize cores in a SoC
- ensure a memory location can be read then updated in any situation, including exception
  handling, avoiding any hazards

Atomic operations will be implemented in the load/store stage (`memfy`). The dCache stage will also
be updated to better support `ACACHE`, slightly change `AID` handling, and put in place exclusive
access support (a new special routing). Finally, the AXI memory model needs to support this new
access type.

## Implementation

From [Y-Combinator](https://news.ycombinator.com/item?id=27674238)

LR/SC stands for load-reserved/store-conditional, also called load-linked/store-conditional.
In a traditional atomic implementation using Compare-and-Swap, the order of execution is as follows:

1. Read value X into register A.
2. Do computation using register A, creating a new value in register B.
3. Do a compare-and-swap on value X: If X == A, then set X to B. The operation was successful. If X
!= A, another thread changed X while we were using it, so the operation failed. Rollback and
retry.

This suffers from the ABA problem: it does not detect the case where another thread changes X to a
new value C, but then changes it back to A before the compare-and-swap happens.
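The CAS sequence and the ABA failure above can be shown with a minimal behavioral sketch (an illustrative Python model, not the hardware; the `Cell` class and names are assumptions):

```python
class Cell:
    """A single memory location with a modeled compare-and-swap primitive."""

    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        """Atomically: if value == expected, set it to new and report success."""
        if self.value == expected:
            self.value = new
            return True
        return False

x = Cell("A")

# Step 1: read value X into "register A".
snapshot = x.value

# Meanwhile another thread changes X to C, then back to A.
x.value = "C"
x.value = "A"

# Step 3: the CAS still succeeds; it cannot tell X was modified in between.
assert x.compare_and_swap(snapshot, "B") is True
assert x.value == "B"
```

The final assertions are precisely the ABA problem: the intermediate write to C is invisible to the comparison.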

[Google Group](https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/bdiZ9QANeQM?pli=1a)
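By contrast, LR/SC does not suffer from the ABA problem: any intervening store clears the reservation, even if the value is restored. A hedged behavioral sketch (a Python illustration under assumed names, not the RTL):

```python
class Memory:
    """Toy memory with a single load-reserved/store-conditional reservation."""

    def __init__(self):
        self.data = {}
        self.reservation = None  # address currently reserved, or None

    def load_reserved(self, addr):
        self.reservation = addr
        return self.data.get(addr, 0)

    def store(self, addr, value):
        # Any regular store to the reserved address invalidates the reservation.
        if self.reservation == addr:
            self.reservation = None
        self.data[addr] = value

    def store_conditional(self, addr, value):
        """Returns True on success, False if the reservation was lost."""
        if self.reservation == addr:
            self.data[addr] = value
            self.reservation = None
            return True
        return False

mem = Memory()
mem.data[0x100] = "A"

old = mem.load_reserved(0x100)   # reserve the location, read "A"
mem.store(0x100, "C")            # another hart writes C...
mem.store(0x100, "A")            # ...then restores A

# Unlike CAS, the SC fails: the reservation was lost on the first store.
assert mem.store_conditional(0x100, "B") is False
assert mem.data[0x100] == "A"
```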




## RISC-V Specification v1.0 - Chapter 8 - “A” Standard Extension

This extension provides instructions that atomically read-modify-write memory to support
synchronization between multiple RISC-V harts running in the same memory space.

The two forms of atomic instruction provided are load-reserved/store-conditional instruction and
atomic fetch-and-op memory instruction.

### Ordering

The base RISC-V ISA has a relaxed memory model, with the FENCE instruction used to impose additional
ordering constraints. The address space is divided by the execution environment into memory and I/O
domains, and the FENCE instruction provides options to order accesses to one or both of these two
address domains.

To provide more efficient support for release consistency, each atomic instruction has two bits,
aq and rl, used to specify additional memory ordering constraints as viewed by other RISC-V harts.

If both bits are clear, no additional ordering constraints are imposed on the atomic memory
operation.

If only the aq bit is set, the atomic memory operation is treated as an acquire access,
i.e., no following memory operations on this RISC-V hart can be observed to take place before the
acquire memory operation.

=> All following memory instructions must be executed after the AMO.

If only the rl bit is set, the atomic memory operation is treated as a release access, i.e., the
release memory operation cannot be observed to take place before any earlier memory operations on
this RISC-V hart.

=> All preceding memory instructions must be executed before the AMO.

If both the aq and rl bits are set, the atomic memory operation is sequentially consistent and
cannot be observed to happen before any earlier memory operations or after any later memory
operations in the same RISC-V hart and to the same address domain.

=> All preceding memory instructions must be executed before the AMO, and all following ones after it.
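The aq/rl decoding above can be summarized in a tiny helper (an illustration only, not part of the design):

```python
def amo_ordering(aq: bool, rl: bool) -> str:
    """Map the aq/rl bits of an AMO to the ordering constraint they impose."""
    if aq and rl:
        return "sequentially consistent: earlier ops before the AMO, later ops after"
    if aq:
        return "acquire: no later memory op may be observed before the AMO"
    if rl:
        return "release: no earlier memory op may be observed after the AMO"
    return "relaxed: no additional ordering constraints"

assert amo_ordering(False, False).startswith("relaxed")
```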


## Design Plan

- Document and list all AXI usage and limitations in the IP.
- The core's `memfy` and `dCache` stages will be updated regarding `AID` usage. Please refer
  to the [AMBA spec notes](./axi_id_ordering.md) for further details on `AID` usage and the
  ordering model.


### Global Update

- Add `ALOCK` across the core & the platform
- Resize `ALOCK` to 1 bit in the interconnect

### Processing Unit

No changes expected.

### Memfy Unit

When the `memfy` unit receives an atomic operation:
- it reserves its `rs1`/`rs2`/`rd` registers in the processing scheduler
- it issues a read request to a memory register with:
    - a specific `AID` (e.g. `0x50`), dedicated to exclusive accesses
    - `ALOCK=0x1`, making the request an `exclusive access`
    - `ACACHE=0x0`, making the request `non-cachable` and `non-bufferable`, i.e. a `device` access
- it executes the atomic operation
- it issues to memory a request with the same attributes as the read operation:
    - a write request to update the memory register, or
    - a read request to release the memory register
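The sequence above can be sketched as a behavioral model, here for a fetch-and-add (a hedged Python illustration; `FakeBus`, `amo_add`, and the bus API are assumptions, not the RTL, though the `AID`/`ALOCK`/`ACACHE` values come from the list above):

```python
AMO_ID = 0x50  # AID dedicated to exclusive accesses
EXCLUSIVE = {"AID": AMO_ID, "ALOCK": 0x1, "ACACHE": 0x0}

def amo_add(bus, addr, operand):
    """Atomic fetch-and-add: exclusive read, compute, exclusive write."""
    old = bus.read(addr, **EXCLUSIVE)       # exclusive read: reserve the location
    new = old + operand                     # execute the atomic operation
    ok = bus.write(addr, new, **EXCLUSIVE)  # exclusive write: update and release
    return old if ok else None              # retrying on failure is the caller's job

class FakeBus:
    """Trivial single-hart memory model, just enough to run the sketch."""

    def __init__(self):
        self.mem = {}
        self.reserved = None

    def read(self, addr, AID, ALOCK, ACACHE):
        if ALOCK:
            self.reserved = (AID, addr)
        return self.mem.get(addr, 0)

    def write(self, addr, value, AID, ALOCK, ACACHE):
        if ALOCK and self.reserved != (AID, addr):
            return False
        self.mem[addr] = value
        self.reserved = None
        return True

bus = FakeBus()
bus.mem[0x2000] = 5
assert amo_add(bus, 0x2000, 3) == 5   # returns the old value
assert bus.mem[0x2000] == 8           # memory register was updated
```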

### dCache Unit

Needs to support exclusive access:
- An exclusive access is a `device` access (`non-cachable` and `non-bufferable`), with a
  read/write-through policy
- Don't substitute the ID for an exclusive access
- Invalidate the cache line if an exclusive access hits in the cache. Even though the memory map
  should assign proper attributes to a memory cell, this eases software design without requiring
  hardware knowledge
- dCache is not responsible for concurrency between exclusive and regular accesses;
  `memfy` must issue its requests correctly
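The invalidate-on-exclusive-hit rule can be sketched as follows (a behavioral Python illustration under assumed names, not the cache RTL):

```python
class DCache:
    """Toy cache: exclusive accesses bypass it and invalidate a hitting line."""

    def __init__(self):
        self.lines = {}  # addr -> cached value

    def access(self, addr, memory, exclusive=False):
        if exclusive:
            self.lines.pop(addr, None)  # invalidate the line on an exclusive hit
            return memory[addr]         # non-cachable device access: go to memory
        if addr not in self.lines:
            self.lines[addr] = memory[addr]  # regular miss: fill the line
        return self.lines[addr]

mem = {0x10: 7}
cache = DCache()
assert cache.access(0x10, mem) == 7                   # regular access fills the line
mem[0x10] = 9
assert cache.access(0x10, mem, exclusive=True) == 9   # bypasses and invalidates
assert cache.access(0x10, mem) == 9                   # line was dropped, refetched
```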

### AXI Memory

- Upgrade to AXI4
- Support exclusive access, managed by a dedicated LUT:
    - Reserve on a first exclusive access
    - Release on a second one (either a read or a write)
    - Entries are keyed on ID and address
    - Release exclusivity if a non-exclusive write targets a reserved-exclusive address
- Correctly support in-order completion if the same ID is issued multiple times
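The LUT behavior listed above can be sketched as a small monitor (a hedged Python model of this design plan, not the AXI memory RTL; names and the `EXOKAY`/`OKAY` strings are illustrative):

```python
class ExclusiveMonitor:
    """Reservations keyed on (ID, address), following the rules listed above."""

    def __init__(self):
        self.lut = set()  # reserved (axi_id, addr) pairs

    def access(self, axi_id, addr, is_write, exclusive):
        """Return 'EXOKAY' when an exclusive access succeeds, else 'OKAY'."""
        key = (axi_id, addr)
        if exclusive:
            if key in self.lut:
                self.lut.discard(key)  # second exclusive access: release
                return "EXOKAY"
            if not is_write:
                self.lut.add(key)      # first exclusive read: reserve
                return "EXOKAY"
            return "OKAY"              # exclusive write without a reservation fails
        if is_write:
            # A non-exclusive write releases any reservation on that address.
            self.lut = {k for k in self.lut if k[1] != addr}
        return "OKAY"

mon = ExclusiveMonitor()
assert mon.access(0x50, 0x100, is_write=False, exclusive=True) == "EXOKAY"  # reserve
assert mon.access(0x50, 0x100, is_write=True, exclusive=True) == "EXOKAY"   # release
assert mon.access(0x50, 0x100, is_write=True, exclusive=True) == "OKAY"     # no reservation
```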

### Core

- Upgrade interfaces to AXI4

### Platform

- Upgrade to AXI interconnect


## Test Plan

- Verify an atomic operation can't be interrupted while the control unit handles async/sync exceptions
- Check ordering with aq & rl bit combinations
- Use an unaligned address to raise an exception
- Read-exclusive followed by a non-exclusive write, to check exclusivity in RAM
- Concurrent exclusive accesses, to check exclusivity in RAM
- Write applications
- https://begriffs.com/posts/2020-03-23-concurrent-programming.html
- See books / PDFs on the topic of OSes and semaphores
doc/axi_id_ordering.md (71 additions, 0 deletions):
# AMBA AXI ID & Ordering

## AXI Transaction Identifier

### Overview

The AXI protocol includes AXI ID transaction identifiers. A Manager can use these to identify
separate transactions that must be returned in order. All transactions with a given AXI ID value
must remain ordered, but there is no restriction on the ordering of transactions with different ID
values.

A single physical port can support out-of-order transactions by acting as a number of logical ports,
each handling its transactions in order.

By using AXI IDs, a Manager can issue transactions without waiting for earlier transactions to
complete. This can improve system performance, because it enables parallel processing of
transactions.

There is no requirement for Subordinates or Managers to use AXI transaction IDs. Managers and
Subordinates can process one transaction at a time. Transactions are processed in the order they are
issued.
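The per-ID rule can be expressed as a small ordering checker (an illustrative Python sketch, not part of any AXI IP; class and method names are assumptions): responses may interleave across different AXI IDs, but must stay in issue order within one ID.

```python
from collections import defaultdict, deque

class OrderChecker:
    """Track issued transactions per AXI ID and validate response order."""

    def __init__(self):
        self.pending = defaultdict(deque)  # AXI ID -> issued transaction tags

    def issue(self, axi_id, tag):
        self.pending[axi_id].append(tag)

    def respond(self, axi_id, tag):
        """A response is legal only if it is the oldest pending one for its ID."""
        q = self.pending[axi_id]
        if not q:
            return False
        return q.popleft() == tag

chk = OrderChecker()
chk.issue(0, "a0"); chk.issue(0, "a1"); chk.issue(1, "b0")
assert chk.respond(1, "b0")   # a different ID may complete first
assert chk.respond(0, "a0")   # the same ID must complete in issue order
assert chk.respond(0, "a1")
```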

Subordinates are required to reflect, on the appropriate BID or RID response, the AXI ID received
from a Manager.
### Read Data Ordering

The Subordinate must ensure that the RID value of any returned data matches the ARID value of the
address that it is responding to.

The interconnect must ensure that the read data from a sequence of transactions with the same ARID
value targeting different Subordinates is received by the Manager in the order that it issued the
addresses.

The read data reordering depth is the number of addresses pending in the Subordinate that can be
reordered. A Subordinate that processes all transactions in order has a read data reordering depth
of one. The read data reordering depth is a static value that must be specified by the designer of
the Subordinate.

There is no mechanism that a Manager can use to determine the read data reordering depth of a
Subordinate.

### Write data ordering

A Manager must issue write data in the same order that it issues the transaction addresses.

An interconnect that combines write transactions from different Managers must ensure that it
forwards the write data in address order.


### Interconnect use of transaction identifiers

When a Manager is connected to an interconnect, the interconnect appends additional bits to the
ARID, AWID and WID identifiers that are unique to that Manager port. This has two effects:

- Managers do not have to know what ID values are used by other Managers because the interconnect
makes the ID values used by each Manager unique by appending the Manager number to the original
identifier.
- The ID identifier at a Subordinate interface is wider than the ID identifier at a Manager
interface.

For responses, the interconnect uses the additional bits of the xID identifier to determine which
Manager port the response is destined for. The interconnect removes these bits of the xID
identifier before passing the xID value to the correct Manager port.
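The widening and recovery of IDs can be sketched in a few lines (an illustration with an assumed 4-bit Manager-side ID width, not a normative encoding):

```python
ID_WIDTH = 4  # Manager-side AxID width (assumption for illustration)

def to_subordinate(manager_port: int, axid: int) -> int:
    """Widened ID seen at the Subordinate interface: port number prepended."""
    return (manager_port << ID_WIDTH) | axid

def from_subordinate(wide_id: int):
    """Recover (manager_port, original AxID) to route the response back."""
    return wide_id >> ID_WIDTH, wide_id & ((1 << ID_WIDTH) - 1)

wide = to_subordinate(manager_port=2, axid=0x5)
assert wide == 0x25
assert from_subordinate(wide) == (2, 0x5)
```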


#### Master

#### Slave

#### Interconnect