## Overview

The aim of this development (started from v1.6.1) is to support the atomic operation instructions.
Atomic operations provide the synchronization primitives required by kernels. The goal for FRISCV is
to be able to boot a kernel like FreeRTOS or Linux (without MMU) and to make the core a platform for
real-world use cases.

From [OS dev wiki](https://wiki.osdev.org/Atomic_operation):

An atomic operation is an operation that will always be executed without any other process being
able to read or change state that is read or changed during the operation. It is effectively
executed as a single step, and is an important quality in a number of algorithms that deal with
multiple independent processes, both in synchronization and algorithms that update shared data
without requiring synchronization.

For a single-core system:

If an operation requires multiple CPU instructions, then it may be interrupted in
the middle of executing. If this results in a context switch (or if the interrupt handler refers
to data that was being used) then atomicity could be compromised. It is possible to use any
standard locking technique (e.g. a spinlock) to prevent this, but this may be inefficient. If it is
possible, disabling interrupts may be the most efficient method of ensuring atomicity (although
note that this may increase the worst-case interrupt latency, which could be problematic if it
becomes too long).

For a multi-core system:

In multiprocessor systems, ensuring atomicity exists is a little harder. It is still possible to
use a lock (e.g. a spinlock) the same as on single processor systems, but merely using a single
instruction or disabling interrupts will not guarantee atomic access. You must also ensure that
no other processor or core in the system attempts to access the data you are working with.
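
As a concrete illustration of the spinlock technique mentioned above, here is a minimal, hedged
sketch using C11 atomics (not taken from the FRISCV code base; names are illustrative). On RISC-V,
the atomic test-and-set below is the kind of operation the A extension provides.

```c
#include <stdatomic.h>

/* Minimal C11 spinlock sketch (illustrative only, not from the project). */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void)
{
    /* Atomically set the flag; spin while another thread/core already holds it. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;  /* busy-wait */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}
```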

[Wiki Linearizability](https://en.m.wikipedia.org/wiki/Linearizability)

[Wiki Load-link/Store-Conditional](https://en.wikipedia.org/wiki/Load-link/store-conditional)

In summary, an atomic operation can be useful to:
- synchronize threads within a core
- synchronize cores in an SoC
- ensure a memory location can be read-then-updated in any situation, including exception handling,
  without any hazards

Atomic operations will be implemented in the load/store stage (`memfy`). The dCache stage will also
be updated to better support `ACACHE`, slightly change `AID` handling and put in place exclusive
access support (a new special routing). Finally, the AXI memory model needs to support this new
access type.

## Implementation

From [Y-Combinator](https://news.ycombinator.com/item?id=27674238)

LR/SC stands for load-reserved/store-conditional, also called load-linked/store-conditional.
In a traditional atomic implementation using Compare-and-Swap, the order of execution is as follows:

1. Read value X into register A.
2. Do computation using register A, creating a new value in register B.
3. Do a compare-and-swap on value X: If X == A, then set X to B. The operation was successful. If X
   != A, another thread changed X while we were using it, so the operation failed. Rollback and
   retry.

This suffers from the ABA problem: it does not detect the case where another thread changes X to a
new value C, but then changes it back to A before the compare-and-swap happens.
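
A hedged sketch of that compare-and-swap flow, using GCC's `__atomic` builtins (the function name
and the increment operation are illustrative, not from the project):

```c
#include <stdint.h>

/* Sketch of steps 1-3 above with GCC __atomic builtins (illustrative only). */
static uint32_t fetch_and_increment_cas(uint32_t *x)
{
    uint32_t a, b;
    do {
        a = __atomic_load_n(x, __ATOMIC_RELAXED);   /* 1. read X into A       */
        b = a + 1;                                  /* 2. compute new value B */
        /* 3. CAS: if *x still equals A, store B; otherwise retry.
         * A concurrent A -> C -> A change is not detected here (ABA problem). */
    } while (!__atomic_compare_exchange_n(x, &a, b, 0,
                                          __ATOMIC_ACQ_REL, __ATOMIC_RELAXED));
    return a;   /* value read before the update */
}
```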

[Google Group](https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/bdiZ9QANeQM?pli=1a)

## RISC-V Specification v1.0 - Chapter 8 - “A” Standard Extension

Instructions that atomically read-modify-write memory to support synchronization between multiple
RISC-V harts running in the same memory space.

The two forms of atomic instruction provided are load-reserved/store-conditional instructions and
atomic fetch-and-op memory instructions.

### Ordering

The base RISC-V ISA has a relaxed memory model, with the FENCE instruction used to impose additional
ordering constraints. The address space is divided by the execution environment into memory and I/O
domains, and the FENCE instruction provides options to order accesses to one or both of these two
address domains.

To provide more efficient support for release consistency, each atomic instruction has two bits,
aq and rl, used to specify additional memory ordering constraints as viewed by other RISC-V harts.

If both bits are clear, no additional ordering constraints are imposed on the atomic memory
operation.

If only the aq bit is set, the atomic memory operation is treated as an acquire access,
i.e., no following memory operations on this RISC-V hart can be observed to take place before the
acquire memory operation.

=> All following memory instructions must be executed after the AMO.

If only the rl bit is set, the atomic memory operation is treated as a release access, i.e., the
release memory operation cannot be observed to take place before any earlier memory operations on
this RISC-V hart.

=> All earlier memory instructions must be executed before the AMO.

If both the aq and rl bits are set, the atomic memory operation is sequentially consistent and
cannot be observed to happen before any earlier memory operations or after any later memory
operations in the same RISC-V hart and to the same address domain.

=> Earlier memory instructions must be executed before the AMO and later memory instructions after
it.
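
The classic illustration of these bits is a test-and-set spinlock: acquire the lock with an AMO
carrying the aq bit (so later accesses cannot move before it) and release it with an AMO carrying
the rl bit (so earlier accesses cannot move after it). A hedged sketch with GCC inline assembly for
an RV32IA/RV64IA target (not part of this repository):

```c
#include <stdint.h>

/* Spinlock sketch using the aq/rl bits (illustrative, RV32IA/RV64IA inline asm). */
static void lock_acquire(volatile uint32_t *lock)
{
    uint32_t old, one = 1;
    do {
        /* amoswap.w.aq: atomically swap 1 into the lock word; the .aq bit keeps
         * every following memory access after this AMO. */
        __asm__ volatile ("amoswap.w.aq %0, %2, (%1)"
                          : "=r"(old)
                          : "r"(lock), "r"(one)
                          : "memory");
    } while (old != 0);               /* retry while the lock was already held */
}

static void lock_release(volatile uint32_t *lock)
{
    uint32_t old, zero = 0;
    /* amoswap.w.rl: atomically write 0; the .rl bit keeps every earlier
     * memory access before this AMO. */
    __asm__ volatile ("amoswap.w.rl %0, %2, (%1)"
                      : "=r"(old)
                      : "r"(lock), "r"(zero)
                      : "memory");
    (void)old;
}
```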

## Design Plan

- Document and list all AXI usage and limitations in the IP.
- The core, `memfy` and `dCache` stages, will be updated regarding `AID` usage. Please refer
  to the [AMBA spec](./axi_id_ordering.md) for further details on `AID` usage and the ordering model.

### Global Update

- Add `ALOCK` across the core & the platform
- Resize `ALOCK` to 1 bit in the interconnect

### Processing Unit

Nothing is expected to change.

### Memfy Unit

When the `memfy` unit receives an atomic operation:
- it reserves its `rs1`/`rs2`/`rd` registers in the processing scheduler
- it issues a read request to the memory register with:
  - a specific `AID` (e.g. `0x50`), dedicated to exclusive access
  - `ALOCK=0x1`, making the request an `exclusive access`
  - `ACACHE=0x0`, making the request `non-cacheable` and `non-bufferable`, i.e. a `device` access
- it executes the atomic operation
- it issues to memory requests with the same attributes as the read operation (see the sketch below):
  - a write request to update the memory register
  - a read request to release the memory register
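
A hedged C model of the request attributes listed above (struct and function names are invented for
illustration; only the `0x50` AID, `ALOCK` and `ACACHE` values come from this document, nothing here
is taken from the RTL):

```c
#include <stdint.h>

/* Illustrative model of the AXI attributes memfy would drive for an atomic op. */
typedef struct {
    uint32_t addr;   /* target memory register address                     */
    uint8_t  id;     /* AID                                                */
    uint8_t  lock;   /* ALOCK: 1 = exclusive access                        */
    uint8_t  cache;  /* ACACHE: 0 = device (non-cacheable, non-bufferable) */
    uint8_t  write;  /* 0 = read channel, 1 = write channel                */
} axi_req_t;

#define AMO_AID 0x50u   /* dedicated ID for exclusive accesses */

/* Step 1: exclusive read that reserves the location. */
static axi_req_t amo_read(uint32_t addr)
{
    return (axi_req_t){ .addr = addr, .id = AMO_AID, .lock = 1, .cache = 0, .write = 0 };
}

/* Step 2: after executing the operation, issue the follow-up request
 * (write to update, or read to release) with exactly the same attributes. */
static axi_req_t amo_followup(uint32_t addr, uint8_t is_write)
{
    return (axi_req_t){ .addr = addr, .id = AMO_AID, .lock = 1, .cache = 0, .write = is_write };
}
```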

### dCache Unit

Needs to support exclusive access:
- An exclusive access is a `device` access (`non-cacheable` and `non-bufferable`) with a
  read/write-through policy
- Don't replace the ID for exclusive accesses
- Invalidate the cache line if an exclusive access hits in the cache. Even if the memory map should
  ensure proper attributes for a memory cell, this eases software design without requiring hardware
  knowledge
- dCache will not be responsible for concurrency between exclusive and regular accesses;
  `memfy` needs to handle requests correctly

### AXI Memory

- Upgrade to AXI4
- Support exclusive access, managed by a dedicated LUT (see the sketch below)
  - Reserve on the first exclusive access
  - Release on the second one (either a read or a write)
  - Based on ID and address
  - Release exclusivity if a non-exclusive write targets a reserved-exclusive location
- Correctly support in-order responses if the same ID is issued multiple times
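
A hedged C model of the reservation LUT behaviour described above, reduced to a single slot for
clarity (all names and the single-entry depth are assumptions, not the RTL):

```c
#include <stdbool.h>
#include <stdint.h>

/* Single-slot model of the exclusive-access LUT, keyed on (ID, address). */
typedef struct {
    bool     valid;
    uint8_t  id;
    uint32_t addr;
} excl_slot_t;

static excl_slot_t slot;

/* First exclusive access (read): record the reservation. */
static void on_exclusive_read(uint8_t id, uint32_t addr)
{
    slot = (excl_slot_t){ .valid = true, .id = id, .addr = addr };
}

/* Second exclusive access (write): succeeds only if the reservation still
 * matches on ID and address, and releases it in that case. */
static bool on_exclusive_write(uint8_t id, uint32_t addr)
{
    bool ok = slot.valid && slot.id == id && slot.addr == addr;
    if (ok)
        slot.valid = false;
    return ok;               /* true -> EXOKAY, false -> OKAY (store failed) */
}

/* A non-exclusive write to the reserved address clears the reservation. */
static void on_normal_write(uint32_t addr)
{
    if (slot.valid && slot.addr == addr)
        slot.valid = false;
}
```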

### Core

- Upgrade interfaces to AXI4

### Platform

- Upgrade to AXI interconnect

## Test Plan

- An atomic operation can't be interrupted while the control unit handles async/sync exceptions
- Check ordering with combinations of the aq & rl bits
- Use an unaligned address to raise an exception
- Exclusive read followed by a non-exclusive write to check exclusivity in RAM
- Concurrent exclusive accesses to check exclusivity in RAM
- Write applications
  - https://begriffs.com/posts/2020-03-23-concurrent-programming.html
  - see the books / PDFs on OS and semaphore topics
# AMBA AXI ID & Ordering

## AXI Transaction Identifier

### Overview

The AXI protocol includes AXI ID transaction identifiers. A Manager can use these to identify
separate transactions that must be returned in order. All transactions with a given AXI ID value
must remain ordered, but there is no restriction on the ordering of transactions with different ID
values.

A single physical port can support out-of-order transactions by acting as a number of logical ports,
each handling its transactions in order.

By using AXI IDs, a Manager can issue transactions without waiting for earlier transactions to
complete. This can improve system performance, because it enables parallel processing of
transactions.

There is no requirement for Subordinates or Managers to use AXI transaction IDs. Managers and
Subordinates can process one transaction at a time. Transactions are processed in the order they are
issued.

Subordinates are required to reflect the AXI ID received from a Manager on the appropriate BID or
RID response.

### Read Data Ordering

The Subordinate must ensure that the RID value of any returned data matches the ARID value of the
address that it is responding to.

The interconnect must ensure that the read data from a sequence of transactions with the same ARID
value targeting different Subordinates is received by the Manager in the order that it issued the
addresses.

The read data reordering depth is the number of addresses pending in the Subordinate that can be
reordered. A Subordinate that processes all transactions in order has a read data reordering depth
of one. The read data reordering depth is a static value that must be specified by the designer of
the Subordinate.

There is no mechanism that a Manager can use to determine the read data reordering depth of a
Subordinate.

### Write Data Ordering

A Manager must issue write data in the same order that it issues the transaction addresses.

An interconnect that combines write transactions from different Managers must ensure that it
forwards the write data in address order.

### Interconnect Use of Transaction Identifiers

When a Manager is connected to an interconnect, the interconnect appends additional bits to the
ARID, AWID and WID identifiers that are unique to that Manager port. This has two effects:

- Managers do not have to know what ID values are used by other Managers because the interconnect
  makes the ID values used by each Manager unique by appending the Manager number to the original
  identifier.
- The ID identifier at a Subordinate interface is wider than the ID identifier at a Manager
  interface.

For responses, the interconnect uses the additional bits of the xID identifier to determine which
Manager port the response is destined for. The interconnect removes these bits of the xID
identifier before passing the xID value to the correct Manager port.
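
As a small illustration (an assumption about the mechanism, not text from the spec), the ID
extension can be modelled as a concatenation: the interconnect prepends the Manager port number
above the incoming ID bits and strips it again on the response path.

```c
#include <stdint.h>

#define M_ID_WIDTH 4u   /* example Manager-side ID width (assumption) */

/* Request path: prepend the Manager port number above the original ID bits. */
static uint32_t to_subordinate_id(uint32_t port, uint32_t manager_id)
{
    return (port << M_ID_WIDTH) | (manager_id & ((1u << M_ID_WIDTH) - 1u));
}

/* Response path: the upper bits select the destination Manager port... */
static uint32_t response_port(uint32_t subordinate_id)
{
    return subordinate_id >> M_ID_WIDTH;
}

/* ...and are removed before the xID is returned to that Manager. */
static uint32_t to_manager_id(uint32_t subordinate_id)
{
    return subordinate_id & ((1u << M_ID_WIDTH) - 1u);
}
```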

#### Master

#### Slave

#### Interconnect