Branch Prediction API #139

arupc-vmicro · 2024-01-17T07:22:02Z

A code introducing branch prediction API. Still a draft PR as I am working through a compilation issue.

Please feel free to review and provide feedback.

- Mostly intended to be pseduocode at this stage; compilation not attempted

klingaard · 2024-01-21T20:49:51Z

@bdutro you might have some input...

danbone · 2024-01-23T17:28:27Z

Hi Arup, following up on our discussion last night. Could you help me understand how the API will provide configurability, and support speculative state updates?

For configurability, how do we pass in runtime parameters. Furthermore how can we enable reuse of predictor components.
A simple example could be that we have two different branch predictors, one is a bimodal table (similar to what you have implemented), the other is a global history predictor such as gshare (i.e. a branch history table that indexed through a hash of the global history and the PC). Both predictors have different parameters, and you would likely want to swap them in and out but use the same BTB (and RAS/indirect predictor).

Like the Cache model allows users to change the address decoder, do we need similar methods for changing the components within the branch predictor i.e. the conditional branch predictor, the indirect target predictor, BTB and return address stack.

The other point is about supporting predictors which have internal speculative state. For example a global history predictor has the global history register which needs to be recovered following a flush. Similarly some predictors require meta data about the prediction to be available for the update, TAGE requires the provider table, and alternate prediction for example.

Finally we did speak about how to fit a block based BTB, like you've implemented, into a trace execution model. Branch predictors work in addresses, using trace is difficult because we have to pair instructions with their predictions based on their address. How do we handle malformed trace, or trace that has unexpected change of flow (i.e. interrupt/exception).

EDIT: I think we also need to consider how to collect statistics from the predictor.

danbone · 2024-01-25T17:33:54Z

Knute mentioned looking at other simulators. I've used gem5, but I'm also aware of champsim.

gem5 o3 model
The branch predictor is an instance within the fetch unit. It contains the BTB, RAS, conditional and indirect target predictors. However the BTB, indirect target predictor and RAS are all objects within the branch unit, which can be swapped out, or disabled entirely.

Fetch
    ├── BranchPredictor
    │   ├── btb
    │   ├── ras
    │   ├── indirectBranchPred
    │   └── IndirectPredictor

As for the API you can read the documentation here for details but I'll summarize.

The interface with Fetch is just 3 methods: predict, squash(flush) and update.
Predict is called for each instruction fetched, the instruction itself is passed through the API. The predictor relies on a sequence number (seqNum) to map predictions to squashes/updates, as the updates/history is stored within the class.

The BTB/RAS/Indirect predictor objects can be swapped out in the build configuration as long as they're a subclass of the original implementation.

The conditional predictor on the other hand, must be a subclass of the branch predictor unit itself. Many leave the base predict, squash and update implementations, and instead override the methods referenced in those. Mainly the lookup*, updateHistories*, update (internal one), and squash.

*Both referenced by predict.

ChampSim
In ChampSim (looking at the ooo_cpu) it separates the BTB and BranchPredictor into two separate classes, though the BTB contains the RAS. The ooo_cpu does a prediction for each instruction fetched (as far as I can tell). It also validates the predictions against the trace, mispredictions stall the pipeline until their resolved.

Both the BTB and BranchPredictor take the PC to produce a prediction. The update methods take the PC, target, direction and branch type.

Olympia
I think the makePrediction/doUpdate methods are fine, given how generic they are. But I do think more thought is needed on how to integrate these, providing parameterization and enable statistic collection.

It might be better to have separate objects for the conditional predictor, indirect target, BTB and RAS. With another API defined for each of them, and then a BranchPredict implementation which orchestrates predictions, updates, flushes across those objects.

That single implementation will most likely provide enough flexibility for many use cases. We also need to determine how it fits into the pipeline. My thoughts are that allow the predictor to maintain it's own state for history/pending updates, and like gem5, an ID is used to tag updates to instructions.

makePrediction is called at the start of the pipeline with the fetchPC, the prediction returned contains a unique prediction ID which is attached to the instructions.
doUpdate is called from Fetch upon flushing with some flag inside the Update class to indicate this. The prediction ID is used to rollback history/flush updates. This will also have to handle mispredictions
doUpdate is called from Fetch upon retirement to enable the predictor to commit it's updates.

arupc-vmicro · 2024-01-31T15:59:08Z

For configurability, how do we pass in runtime parameters. Furthermore how can we enable reuse of predictor components.
A simple example could be that we have two different branch predictors, one is a bimodal table (similar to what you have implemented), the other is a global history predictor such as gshare (i.e. a branch history table that indexed through a hash of the global history and the PC). Both predictors have different parameters, and you would likely want to swap them in and out but use the same BTB (and RAS/indirect predictor).

Addressing this part of question/comment: We can do this using ResourceFactories, which are used to configure the tree nodes. As an example, please see files core/CPUFactories.hpp. While sparta units such fetch, decode use a default factory, the unit execute uses a custom factory defined in core/Execute.cpp, which goes through the core extension yaml and creates execution units.

Extending that idea, we can do the following:

use a simple branch predictor or a more complex one (like specified in Add decoupled frontend (+ L1 Instruction Cache) #143) based on a command line parameter
for a more complex branch predictor with sub-predictors, select one of multiple available indirect predictor, based on parameters

arupc-vmicro · 2024-02-07T06:08:42Z

Like the Cache model allows users to change the address decoder, do we need similar methods for changing the components within the branch predictor i.e. the conditional branch predictor, the indirect target predictor, BTB and return address stack.

IMO, the API should be generic enough to support simplest of the predictors, which may not have different sub-predictors as as long as API does not prevent or create undue burden on implementing more complicated predictors with sub-predictors.

The other point is about supporting predictors which have internal speculative state. For example a global history predictor has the global history register which needs to be recovered following a flush. Similarly some predictors require meta data about the prediction to be available for the update, TAGE requires the provider table, and alternate prediction for example.

Can we not use the updatePredictor method with appropriate input (sub-classed from DefaultUpdate or defined separately) to restore/change predictor states, as needed?

Finally we did speak about how to fit a block based BTB, like you've implemented, into a trace execution model. Branch predictors work in addresses, using trace is difficult because we have to pair instructions with their predictions based on their address. How do we handle malformed trace, or trace that has unexpected change of flow (i.e. interrupt/exception).

We do have to add proper mechanism to handle traces. I plan to address this later.

Also, similar to Sparta's cache API, the user of the API can maintain stats, as needed.

arupc-vmicro · 2024-02-07T06:35:31Z

I think the makePrediction/doUpdate methods are fine, given how generic they are. But I do think more thought is needed on how to integrate these, providing parameterization and enable statistic collection.

I agree and issue #143 will give us the opportunity to work on these aspects.

It might be better to have separate objects for the conditional predictor, indirect target, BTB and RAS. With another API defined for each of them, and then a BranchPredict implementation which orchestrates predictions, updates, flushes across those objects.

I am happy to build on top of the currently proposed API, once reviewed and finalized, to create sub-predictor-specific APIs, if that's needed.

My suggestion would be that we work towards implementing #143 using the proposed branch prediction API and Sparta subunits to create, parameterize sub-predictors and combine them. As we work through it, we can discuss how can we keep interfaces generic enough to be APIs

danbone · 2024-02-07T16:22:08Z

On a different note do we need to discuss the directory structure going forward for when others implement their own predictors? Do we need a separate branchpred directory under core?

core/BranchPred.hpp

rajatbhatia1

Please put everything in separate namespace

core/BranchPred.cpp

core/BranchPred.hpp

rajatbhatia1 · 2024-02-07T18:21:04Z

core/BranchPred.hpp

+class BranchPredictorIF
+{
+public:
+    virtual PredictionT getPrediction(const InputT &) = 0;


should be a const return type, to allow compiler to do move optimization
use: virtual const PredictionT getPrediction(const InputT &) = 0;

Can you please explain further why this is needed / recommended ?

This https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f49-dont-return-const-t , seem to suggest that const return type would suppress use of move semantics and is not advisable.

core/BranchPred.cpp

core/BranchPred.hpp

arupc-vmicro · 2024-02-26T05:02:18Z

Please put everything in separate namespace

Have added branch prediction IF and SimplePredictor in the Olympia namespace, for now, in line with what we have currently for LSU, Dispatch etc. I will capture this request in a separate issue to create separate namespaces for different units of Olympia

arupc-vmicro · 2024-02-26T05:03:32Z

On a different note do we need to discuss the directory structure going forward for when others implement their own predictors? Do we need a separate branchpred directory under core?

Let us address this in a new PR when we add other branch predictors.

arupc-vmicro · 2024-02-26T05:37:01Z

Addressed review comments. Please take a look and approve, if looks good. I need at least one approval for merge.

core/BranchPredIF.hpp

danbone · 2024-02-26T08:56:14Z

core/SimpleBranchPred.hpp

+        bool actually_taken = false;
+    };
+
+    class DefaultInput


It doesn't feel right to pollute the olympia namespace with a fairly generically named class "DefaultInput". It's not obvious that this is related to branch predictors, let alone that it's used for the SimpleBranchPredictor.

Goes for the others DefaultUpdate/DefaultPrediction as well?

Can't we place them under SimpleBranchPredictor? Or name them SimpleBranchPredictorUpdate

Added a separate namespace for BranchPredictor within olympia.

arupc-vmicro · 2024-03-14T17:27:56Z

As a reminder, if there is no further comment, please approve so that we can merge this.

arupc-vmicro added 6 commits March 6, 2023 23:08

A first take on Branch Prediction API (finally!)

f41e81d

A draft of branch prediction implemention:

41be0f9

- Mostly intended to be pseduocode at this stage; compilation not attempted

Branch prediction API and an example implementation, per discussion #125

367eb84

Merge branch 'master' into arupc-vmicro/branch_pred_api

d00da67

Adding branch pred to CMake

ac60949

Fix compilation errors

7db3d25

Added tests for branch prediction

948131c

arupc-vmicro requested review from rajatbhatia1 and klingaard February 7, 2024 06:38

arupc-vmicro changed the title ~~Draft: Branch Prediction API~~ Branch Prediction API Feb 7, 2024

arupc-vmicro marked this pull request as ready for review February 7, 2024 06:43