- Kernel-capable Hart
- Support sets of core configurations in the testbench.
- Support U-mode
- Support PMP/PMA
- https://github.com/eembc/coremark
- Advanced Interrupt controller
- AXI ERR handling
- AXI EXOKAY handling
- Atomic operations
- stage to execute the instruction, controlling ldst Stages
- memfy exposes two interfaces for requests.
- memfy drives back response along register write channel
- memfy supports two tags, one for regular, one for exclusive
- support exclusive access in cache
- support in-order for same tag (don't always substitute)
- exclusive can't be cacheable
- support exclusive access in memory, track the ID with a LUT
- Zc extension
Any new feature should be carefully studied to ensure proper exception and interrupt handling
- Better manage ACACHE attribute
- Correct value driven from memfy
- Use it correctly across the cache
- Read/write allocate based on memory map
- Check impossible combination
- IO map bufferable / non-bufferable
- Make memory mapping of the core with:
- Normal vs device
- Inst vs data zone for cacheability / executability
- Sharable for L2 cache
- Support exception code for memory access error
- Manage write response from cache or interco, don’t wait for the endpoint
- Raise exception also from cache
- Support AXI response
- drive APROT with priv_mode
- raise an exception (which one? a custom mcause?)
- test with mapping outside interconnect memory region
- manage it in a CLIC controller to avoid a custom-spec implementation; it could be used for other purposes later
- Support fine-grain permission over memory range
- RISCV doesn't define privilege permission over PMP region
- raise an exception
- AER-like method to log them: src, address, permission
- trigger an interrupt caught by the PLIC controller
- https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-risc-v.html
- AXI4 + Wrap mode for read
- Support datapath adaptation from memory controller
- Narrow transfer support?
- Gather/merge multiple continuous transactions?
- Clearly define the write-through no-allocate policy
- Write-through doesn't need an eviction buffer: https://stackoverflow.com/questions/23635284/what-is-the-difference-between-eviction-buffer-and-merging-store-buffer-on-arm-c
- Rename the write stage to merging store buffer and try to merge accesses when needed
- https://en.wikipedia.org/wiki/Write_buffer
- Write-back policy saves bandwidth but makes the structure more complex
- New cache associativity (2 / 4 / 8 ways configurable)
- OoO read: miss could be stacked and served later waiting for cache fill and continue reading the next address
- Fully concurrent read / write access (Issue #1)
- Split memfy in load unit & store unit
- Add test for vector table
- Test MSTATUS.TW
- mcountinhibit: stop a specific counter
- Machine Environment Configuration Registers (menvcfg and menvcfgh)
- Machine Configuration Pointer Register (mconfigptr)
- Create a HW test platform
- Analogue pocket
- [C] Cloud
- Add registers to configure the core in platform (use custom CSR)
- Caches
- Interconnect
- processing: scheduling, hazard detection
- Support CLIC controller
- Random peripheral
- UART: Support 9/10 bits & parity
- Deactivate the core with WFI (clock gating)
- Security Extension
- Custom pmpsec CSR
- priv/non-priv
- cacheability
- shareability
- io/mem
- HW isolation by CPU / Thread IDs
- Custom pmpsec CSR
- High-end architecture
- Supervisor mode
- https://danielmangum.com/posts/risc-v-bytes-privilege-levels/
- https://mobile.twitter.com/hasheddan/status/1514581031092899843?s=12&t=MMNTY_iRC48CjykLQBdTkQ
- https://man7.org/linux/man-pages/man2/syscall.2.html
- https://www.youtube.com/watch?app=desktop&v=1-8oYzL_Thk
- https://jborza.com/emulation/2021/04/22/ecalls-and-syscalls.html
- 64 bits support
- Support MMU extension
- Supervisor mode
- Multi-core platform:
- Counters and timers should be reworked
- Nb core configurable
- PLIC controller
- Extended atomic operation support
- Implement a L2 cache stage
- Extended Security / Sandboxing
- Debug Support / JTAG interface / GDB Usage / OpenOCD
- https://tomverbeure.github.io/2021/07/18/VexRiscv-OpenOCD-and-Traps.html
- https://tomverbeure.github.io/2022/02/20/GDBWave-Post-Simulation-RISCV-SW-Debugging.html
- https://github.com/BLangOS/VexRiscV_with_HW-GDB_Server
- Detect address collision in memfy for better performance
- support concurrent r/w in dCache
- merge memfy_opt for memfy core update
- Support different clock for AXI4 memory interface, cache and internal core
- Support ECC bits in core/crossbar
- Rework GPIOs sub-system
- Reduce latency in switching logic
- Add PERROR on the APB, to log on the error reporting bus
- Rework IO APB interconnect
- Fix IO subsystem misrouted
- Fix IO subsystem bridge
- Out of order support in AXI (memfy if not using cache)
- Detect IO requests to forward info for FENCE execution
- Branch prediction
- Rewind pipeline (L0 local cache)
- Pipeline PMP CSR up to MPU setup path and stop the core with csr_ready during few cycles
- Parameter to deactivate hazard detection, save logic and measure gain
- Memfy:
- If not ready, and request present, the FSM can’t drive further data
- Manage RRESP/BRESP in the exception bus
- Support F extension: https://bellard.org/softfp/
- Division
- Save bandwidth by removing dead cycles
- Manage pow2 division by shifting
- Start division from first non-zero digit
- OoO execution with Tomasulo algorithm
- RVV for machine learning
- Move LUI into processing to prepare future extension support
- Read ASM to be sure it's used for processing and not control
- Benchmark waveform doesn’t reveal high usage
- Drop lxt2 waveform
- Create app per benchmark
- Testcase C ASM cache stress
- Update synthesis flow
- Core config
- Run a synthesis test for each core configuration
- Support cache disable in testbench
- Error Logger Interface
- Shared bus for CSRs, privilege mode, event, …
- Stream the event like a write memory error
- log error in a file
- Support GDB: https://tomverbeure.github.io/2021/07/18/VexRiscv-OpenOCD-and-Traps.html
- Update RISCV testsuite sources
- SV Testbench: be able to assert or not a flush req along a new request on the same cycle
- Review the AXI RAM write response timing, for compliance and speed
- Support LiteX: https://github.com/litex-hub/litex-boards, https://pcotret.gitlab.io/blog/processor_in_litex/
- Azure: https://www.xilinx.com/products/boards-and-kits/alveo/cloud-solutions/microsoft-azure.html
- AWS: https://www.xilinx.com/products/design-tools/acceleration-zone/aws.html
- Openlane submission
- Include a DMA in platform
- must respect PMP / PMA
- Next CPU architecture:
- Super scalar architecture
- SIMD architecture
- Vector architecture
- Application to GPGPU area
- Many-core / NoC architecture (power/interrupt consideration)
- Support float16 & float8, more generally low-precision arithmetic like int8...
- Build a testing platform to validate IPs
- Secure platform https://msrc-blog.microsoft.com/2022/09/06/whats-the-smallest-variety-of-cheri/
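
The Division items in the backlog above (power-of-two divisors handled by shifting, and starting the iteration from the first non-zero digit of the dividend) can be illustrated with a small software model. This is only a sketch, assuming an unsigned 32-bit restoring divider; the function name `udiv32` and the returned step count are hypothetical and do not describe the core's actual divider.

```python
def udiv32(dividend: int, divisor: int):
    """Unsigned 32-bit division model returning (quotient, remainder, steps).

    `steps` counts shift/subtract iterations, a rough proxy for cycles
    (assumption: one iteration per cycle).
    """
    mask = 0xFFFFFFFF
    dividend &= mask
    divisor &= mask

    # RISC-V M extension: DIVU by zero returns all ones, REMU returns the dividend
    if divisor == 0:
        return mask, dividend, 0

    # Power-of-two divisor: a shift and a mask replace the whole iteration
    if divisor & (divisor - 1) == 0:
        shift = divisor.bit_length() - 1
        return dividend >> shift, dividend & (divisor - 1), 0

    # Restoring division that skips the leading zeros of the dividend
    quotient, remainder, steps = 0, 0, 0
    for bit in range(dividend.bit_length() - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        steps += 1
    return quotient, remainder, steps
```

With this model, `udiv32(100, 7)` needs 7 iterations instead of 32, and `udiv32(100, 8)` needs none, which is the kind of gain the two backlog items aim for.
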
- v1.6.0: User Mode
- Design
- Support U-mode:
- Previous privilege mode interrupt is stored in xPP to support nested trap
- ECALL moves to M-mode
- MRET moves to U-mode
- Support exceptions
- M-mode instructions executed in U-mode must raise an illegal instruction exception
- Access to M-mode only registers must raise an illegal instruction exception
- ecall code when coming from U-mode in mcause
- Support PMP (Physical Memory Protection)
- Instruction read or data R/W access are checked against PMP to secure the hart
- Address is checked with CSRs pmpcfg
- Up to 16 zones can be defined
- A zone can be readable, writable, executable
- PMP checks are applied to all accesses whose effective privilege mode is S or U, including instruction fetches and data accesses in S and U mode, and data accesses in M-mode when the MPRV bit in mstatus is set and the MPP field in mstatus contains S or U (page 56 & page 23)
- Study PMA (Physical Memory Attribute) (section 3.6)
- define R/W/X and the address matching
- PMA doesn't allow defining IO zones and/or whether a region can be coherent
- WFI:
- if MIE/SIE=1, wait for one of them and trap to m-mode. Resume to mepc=pc+4
- if MIE/SIE=0, wait for any interrupt and move forward
- Support MSTATUS.TW (timeout platform-dependent)
- add FIFO for memory exceptions
- Drive aprot[0] based on privilege mode
- mcounteren: accessibility to lower privilege modes
- Bit x = 1, lower privilege mode can read the counter
- Bit x = 0, lower privilege mode access is forbidden and raises an illegal instruction exception
- [X] Testcases
- Vary the period of the EIRQ
- U-mode
- pass from/to m-mode/u-mode
- try mret in u-mode, needs to fail
- try to access m-mode only CSRs
- Traps
- Do something within a loop with interrupt enabled, data needs to be OK
- WFI in u-mode, interrupt enabled, trapped in m-mode
- WFI in u-mode, interrupt disabled, NOP
- Test load/store misaligned exceptions
- MPU:
- configure registers
- all region configuration mode: NA4 / NAPOT / TOR
- multiple mixed region type and size
- [-] Access exceptions
- execute instruction outside allowed regions (U-mode)
- write data in U-mode
- read data in U-mode
- read data in M-mode with MPRV=1 + MPP=U-mode
- write data in M-mode with MPRV=1 + MPP=U-mode
- execute in M-mode without X + locked region
- locked access to change configuration
- MCOUNTEREN:
- Bit x = 1, lower privilege mode can read the counter
- Bit x = 0, lower privilege mode access is forbidden and raise an illegal instruction exception
- Support U-mode:
- Design
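
The PMP items of this milestone (zones with R/W/X permissions, NA4 / NAPOT / TOR address matching, checks applied to U-mode accesses and to M-mode only on locked regions) can be summarized with a behavioural model. This is a minimal sketch following the RV32 encoding of the RISC-V privileged specification, not the core's RTL; `pmp_range` and `pmp_check` are hypothetical names, accesses are treated as single bytes (no partial-match handling), and MPRV/MPP are assumed to be already folded into the `priv` argument.

```python
R, W, X, L = 0x01, 0x02, 0x04, 0x80      # pmpcfg permission and lock bits
OFF, TOR, NA4, NAPOT = 0, 1, 2, 3        # pmpcfg.A address-matching modes


def pmp_range(cfg, addr, prev_addr):
    """Return the [lo, hi) byte range of one entry, or None if it is OFF."""
    mode = (cfg >> 3) & 0x3
    if mode == OFF:
        return None
    if mode == TOR:
        # Top of range: previous pmpaddr is the lower bound
        return prev_addr << 2, addr << 2
    if mode == NA4:
        return addr << 2, (addr << 2) + 4
    # NAPOT: t trailing ones encode a region of 2**(t + 3) bytes
    t = 0
    while (addr >> t) & 1:
        t += 1
    base = (addr & ~((1 << (t + 1)) - 1)) << 2
    return base, base + (1 << (t + 3))


def pmp_check(entries, byte_addr, access, priv):
    """entries: list of (pmpcfg byte, pmpaddr) in priority order.

    access is R, W or X; priv is the effective privilege, "M" or "U".
    """
    prev = 0
    for cfg, addr in entries:
        rng = pmp_range(cfg, addr, prev)
        prev = addr
        if rng and rng[0] <= byte_addr < rng[1]:
            # Lowest-numbered matching entry decides; M-mode is only
            # constrained by locked entries
            if priv == "M" and not cfg & L:
                return True
            return bool(cfg & access)
    # No match: M-mode succeeds, lower privilege modes fail
    return priv == "M"
```

For example, a 4 KiB read/execute NAPOT zone at `0x80000000` is encoded as `pmpaddr = (0x80000000 >> 2) | ((4096 >> 3) - 1) = 0x200001FF`, and `pmp_check([((NAPOT << 3) | R | X, 0x200001FF)], 0x80000010, W, "U")` is rejected because the zone is not writable.
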
- v1.5.1: maintenance
- Preload jal even if processing is busy
- Print the failing tests, one by one, in the bash output
- Join errors after a test status
- Review readme files
- Review all parameters of each instance and document them
- v1.5.0: Performance measurement and improvement
- Print and save the CSRs for each test, keep track of performance in Git
- IP to probe the bandwidth of the different buses
- CPI measure in benchmark
- Increase the maximum number of outstanding requests (OR) in dCache
- Prefetch read request
- Optimize write pusher to save a cycle
- Optimize Memfy dead cycle (RD write comb & pending request =0 if == 1 & valid)
- Enhance read outstanding requests in MemFy
- No more pending flags in caches, BCH / RCH handshake is used to manage reordering in Memfy
- Enhance completion in OoO
- Save a cycle on RD write in Memfy
- Pending flag to deassert on completion if or=1
- OoO write completion, response needs to come from the destination if IO write
- Support prefetch: if no jump/branch detected in fetched instructions grab the next line, else give a try to fetch the branch address. AXI hint?
- Reduce cache jump
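
The "CPI measure in benchmark" item above boils down to dividing the elapsed cycles by the retired instructions over the measured region. A minimal sketch, assuming the benchmark samples the low 32 bits of the `mcycle` and `minstret` counters before and after the region; the function name `cpi` and the wrap-around handling are illustrative assumptions, not the project's measurement flow.

```python
def cpi(cycle_start, cycle_end, instret_start, instret_end, width=32):
    """Cycles per instruction between two counter samples.

    The subtraction is done modulo 2**width so a single counter
    wrap-around between the two samples is tolerated.
    """
    mask = (1 << width) - 1
    cycles = (cycle_end - cycle_start) & mask
    instret = (instret_end - instret_start) & mask
    if instret == 0:
        raise ValueError("no instruction retired in the measured region")
    return cycles / instret
```

Storing the value per benchmark next to the saved CSR dump would make regressions visible from one commit to the next.
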
- v1.4.0
- Rework Control for faster jump.
- Rework iCache block fetcher to simplify it
- Block fetcher: pass-thru front-end FIFO to reduce latency on jump
- Scheduler to run multiple operations in parallel. ALU can run along LD/ST if no hazard
- CSR executes in a single cycle
- Enhance Memfy outstanding request support
- Add Zihpm
- Fix TX read of UART which is blocking
- Develop dCache
- Uncachable access for IOs region
- Derive from iCache
- Add pusher stage for write access
- APROT[2] for instruction or data hint
- Develop dCache testbench
- Fix lint error code management in CI
- Memfy:
- Support outstanding read/write request
- Don’t block write if AW / W are ready
- Don’t block write until BCH but block any further read if pending write (in-order only)
- Testcase WFI
- Testcase outstanding requests
- Testcase Zicntr
- Add Zicntr
- Rework trace among the modules
- Deactivate trace with define for every module
- AXI RAM model: add a performance mode
- Add unsupported cache setup in core checkers
- Add Github actions
- Support unaligned address in APB sub-system
- Add Clint peripheral
- Output ISA regs on top level for debug purpose
- Create a testbench for iCache
- Support script in App interactive testsuite
- Add C testsuite
- Add Apps testsuite, interactive tb with UART link from Verilator
- Add almost empty/full flags to scfifo
- Ensure interrupt and trap are correctly supported
- Update SVUT to pass extra string to vvp for VPI
- Review flush/reboot in fetcher & memctrl
- Enhance cache reboot when ARID changes. Today just flush the FIFO, could restart the whole fetcher stages
- Make AXI4-lite RAM throttling
- Enhance processing unit (CANCELLED: implementation is too big for too few benefits)
- control checks registers in use by an instruction and knows if it can branch, LUI, AUIPC
- processing clears the tickets once the instruction is finished
- processing knows if an ALU can be used based on the targeted register
- Support multiple ALUs in parallel, different extensions (float, mult/div, ...)
- Better print control status when branching and trapping (MCAUSE info)
- Add Github Actions and deploy CI flow
- Support both Icarus and Verilator in simulation flow
- Add M extension
- Share common sources between ASM and Compliance testsuite
- Testbench supports both CORE and platform configuration
- Develop FRISCV platform including the core, an AXI4 crossbar and peripherals
- Simplify CSR reads/writes, save one cycle to execute an op
- Option to read ISA registers on falling edge, not combinatorial read
- Design a generic pipeline stage for processing front-end
- Support trap and interrupts
- Add clint controller
- First documentation
- Add external IRQ
- Add software IRQ
- Add timer IRQ
- Parse doc and verify the trap handling (MCAUSE / ... fields)
- Support traps
- Convert the ASM testsuite to the riscv-tests format
- Clean up repo after
- Fix an issue when rebooting the cache: it issued an addr=0 request
- Better handle traps on bad instruction
- Support AXI4-lite for data interface
- Pass RISCV compliance
- Study how to use CSR
- Share testbenches and scripts between the environments (use Verilator?)
- Define for SVLogger
- Rename sources to remove rv32 mentions
- Add parameter checks in the top level
- Print state with function and a verbosity level
- Implement a generic logger
- Implement instruction cache
- Support AXI4-lite from control unit
- Pipeline operations
- Support outstanding requests
- Bundle fetcher in a dedicated module
- Configure the testbench for command line
- cache line width define, used to setup bin2hex.py
- cache enabling
- Debug the core
- Reboot fetcher if a new ID comes in
- Write-thru FIFO: If pull and empty, write directly the output not the RAM
- Use AXI4-lite to fetch instruction
- Always forward and define addressing in byte
- Move CSR out of control unit
- Synthesis session: OK for Yosys, needs to use another or lib to map async/sync reset FFD.
- Add a debug interface (UART, JTAG) + DPI
- Add GPIOs
- Implement in-house profiler to check branching, stall time, ...
- Develop top testbench to use asm programs and rely only on RAM to drive instructions and data into the core
- Develop a unit test framework for ASM
- Test memfy
- Test processing + memory
- Test processing vs JAL/JALR
- Test JAL/JALR vs CSRs vs Processing
- Define the architecture of the first testbench. Goal: use C/asm to produce a RAM init file to drive the testcases. Break with an EBREAK instruction
- Understand vexrisc and picorv32 make file
- Write a simple program, compile it, understand the asm
- transform object into hex file
- Understand the toolchain
- Understand the linker description to be able to initialize the processor instruction memory
- Read RISCV unprivileged specification
- Implement control unit and its testbench
- be able to handle ALU halts for long instruction execution
- support branching / system instructions
- support pc correctly
- Implement ALU
- Populate modules' unit tests (control & alu)