Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branch Prediction API #139

Merged
merged 12 commits into from
Mar 25, 2024
Merged

Branch Prediction API #139

merged 12 commits into from
Mar 25, 2024

Conversation

arupc-vmicro
Copy link
Collaborator

A code introducing branch prediction API. Still a draft PR as I am working through a compilation issue.

Please feel free to review and provide feedback.

@klingaard
Copy link
Collaborator

@bdutro you might have some input...

@danbone
Copy link
Contributor

danbone commented Jan 23, 2024

Hi Arup, following up on our discussion last night. Could you help me understand how the API will provide configurability, and support speculative state updates?

For configurability, how do we pass in runtime parameters. Furthermore how can we enable reuse of predictor components.
A simple example could be that we have two different branch predictors, one is a bimodal table (similar to what you have implemented), the other is a global history predictor such as gshare (i.e. a branch history table that indexed through a hash of the global history and the PC). Both predictors have different parameters, and you would likely want to swap them in and out but use the same BTB (and RAS/indirect predictor).

Like the Cache model allows users to change the address decoder, do we need similar methods for changing the components within the branch predictor i.e. the conditional branch predictor, the indirect target predictor, BTB and return address stack.

The other point is about supporting predictors which have internal speculative state. For example a global history predictor has the global history register which needs to be recovered following a flush. Similarly some predictors require meta data about the prediction to be available for the update, TAGE requires the provider table, and alternate prediction for example.

Finally we did speak about how to fit a block based BTB, like you've implemented, into a trace execution model. Branch predictors work in addresses, using trace is difficult because we have to pair instructions with their predictions based on their address. How do we handle malformed trace, or trace that has unexpected change of flow (i.e. interrupt/exception).

EDIT: I think we also need to consider how to collect statistics from the predictor.

@danbone
Copy link
Contributor

danbone commented Jan 25, 2024

Knute mentioned looking at other simulators. I've used gem5, but I'm also aware of champsim.

gem5 o3 model
The branch predictor is an instance within the fetch unit. It contains the BTB, RAS, conditional and indirect target predictors. However the BTB, indirect target predictor and RAS are all objects within the branch unit, which can be swapped out, or disabled entirely.

Fetch
    ├── BranchPredictor
    │   ├── btb
    │   ├── ras
    │   ├── indirectBranchPred
    │   └── IndirectPredictor

As for the API you can read the documentation here for details but I'll summarize.

The interface with Fetch is just 3 methods: predict, squash(flush) and update.
Predict is called for each instruction fetched, the instruction itself is passed through the API. The predictor relies on a sequence number (seqNum) to map predictions to squashes/updates, as the updates/history is stored within the class.

The BTB/RAS/Indirect predictor objects can be swapped out in the build configuration as long as they're a subclass of the original implementation.

The conditional predictor on the other hand, must be a subclass of the branch predictor unit itself. Many leave the base predict, squash and update implementations, and instead override the methods referenced in those. Mainly the lookup*, updateHistories*, update (internal one), and squash.

*Both referenced by predict.

ChampSim
In ChampSim (looking at the ooo_cpu) it separates the BTB and BranchPredictor into two separate classes, though the BTB contains the RAS. The ooo_cpu does a prediction for each instruction fetched (as far as I can tell). It also validates the predictions against the trace, mispredictions stall the pipeline until their resolved.

Both the BTB and BranchPredictor take the PC to produce a prediction. The update methods take the PC, target, direction and branch type.

Olympia
I think the makePrediction/doUpdate methods are fine, given how generic they are. But I do think more thought is needed on how to integrate these, providing parameterization and enable statistic collection.

It might be better to have separate objects for the conditional predictor, indirect target, BTB and RAS. With another API defined for each of them, and then a BranchPredict implementation which orchestrates predictions, updates, flushes across those objects.

That single implementation will most likely provide enough flexibility for many use cases. We also need to determine how it fits into the pipeline. My thoughts are that allow the predictor to maintain it's own state for history/pending updates, and like gem5, an ID is used to tag updates to instructions.

  • makePrediction is called at the start of the pipeline with the fetchPC, the prediction returned contains a unique prediction ID which is attached to the instructions.
  • doUpdate is called from Fetch upon flushing with some flag inside the Update class to indicate this. The prediction ID is used to rollback history/flush updates. This will also have to handle mispredictions
  • doUpdate is called from Fetch upon retirement to enable the predictor to commit it's updates.

@arupc-vmicro
Copy link
Collaborator Author

For configurability, how do we pass in runtime parameters. Furthermore how can we enable reuse of predictor components.
A simple example could be that we have two different branch predictors, one is a bimodal table (similar to what you have implemented), the other is a global history predictor such as gshare (i.e. a branch history table that indexed through a hash of the global history and the PC). Both predictors have different parameters, and you would likely want to swap them in and out but use the same BTB (and RAS/indirect predictor).

Addressing this part of question/comment: We can do this using ResourceFactories, which are used to configure the tree nodes. As an example, please see files core/CPUFactories.hpp. While sparta units such fetch, decode use a default factory, the unit execute uses a custom factory defined in core/Execute.cpp, which goes through the core extension yaml and creates execution units.

Extending that idea, we can do the following:

  • use a simple branch predictor or a more complex one (like specified in Add decoupled frontend (+ L1 Instruction Cache) #143) based on a command line parameter
  • for a more complex branch predictor with sub-predictors, select one of multiple available indirect predictor, based on parameters

@arupc-vmicro
Copy link
Collaborator Author

arupc-vmicro commented Feb 7, 2024

Like the Cache model allows users to change the address decoder, do we need similar methods for changing the components within the branch predictor i.e. the conditional branch predictor, the indirect target predictor, BTB and return address stack.

IMO, the API should be generic enough to support simplest of the predictors, which may not have different sub-predictors as as long as API does not prevent or create undue burden on implementing more complicated predictors with sub-predictors.

The other point is about supporting predictors which have internal speculative state. For example a global history predictor has the global history register which needs to be recovered following a flush. Similarly some predictors require meta data about the prediction to be available for the update, TAGE requires the provider table, and alternate prediction for example.

Can we not use the updatePredictor method with appropriate input (sub-classed from DefaultUpdate or defined separately) to restore/change predictor states, as needed?

Finally we did speak about how to fit a block based BTB, like you've implemented, into a trace execution model. Branch predictors work in addresses, using trace is difficult because we have to pair instructions with their predictions based on their address. How do we handle malformed trace, or trace that has unexpected change of flow (i.e. interrupt/exception).

We do have to add proper mechanism to handle traces. I plan to address this later.

Also, similar to Sparta's cache API, the user of the API can maintain stats, as needed.

@arupc-vmicro
Copy link
Collaborator Author

I think the makePrediction/doUpdate methods are fine, given how generic they are. But I do think more thought is needed on how to integrate these, providing parameterization and enable statistic collection.

I agree and issue #143 will give us the opportunity to work on these aspects.

It might be better to have separate objects for the conditional predictor, indirect target, BTB and RAS. With another API defined for each of them, and then a BranchPredict implementation which orchestrates predictions, updates, flushes across those objects.

I am happy to build on top of the currently proposed API, once reviewed and finalized, to create sub-predictor-specific APIs, if that's needed.

My suggestion would be that we work towards implementing #143 using the proposed branch prediction API and Sparta subunits to create, parameterize sub-predictors and combine them. As we work through it, we can discuss how can we keep interfaces generic enough to be APIs

@arupc-vmicro arupc-vmicro changed the title Draft: Branch Prediction API Branch Prediction API Feb 7, 2024
@arupc-vmicro arupc-vmicro marked this pull request as ready for review February 7, 2024 06:43
@danbone
Copy link
Contributor

danbone commented Feb 7, 2024

On a different note do we need to discuss the directory structure going forward for when others implement their own predictors? Do we need a separate branchpred directory under core?

core/BranchPred.hpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@rajatbhatia1 rajatbhatia1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put everything in separate namespace

core/BranchPred.cpp Outdated Show resolved Hide resolved
core/BranchPred.cpp Outdated Show resolved Hide resolved
core/BranchPred.cpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
class BranchPredictorIF
{
public:
virtual PredictionT getPrediction(const InputT &) = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a const return type, to allow compiler to do move optimization
use: virtual const PredictionT getPrediction(const InputT &) = 0;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain further why this is needed / recommended ?

This https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f49-dont-return-const-t , seem to suggest that const return type would suppress use of move semantics and is not advisable.

core/BranchPred.cpp Outdated Show resolved Hide resolved
core/BranchPred.cpp Outdated Show resolved Hide resolved
core/BranchPred.cpp Outdated Show resolved Hide resolved
core/BranchPred.cpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
core/BranchPred.hpp Outdated Show resolved Hide resolved
@arupc-vmicro
Copy link
Collaborator Author

Please put everything in separate namespace

Have added branch prediction IF and SimplePredictor in the Olympia namespace, for now, in line with what we have currently for LSU, Dispatch etc. I will capture this request in a separate issue to create separate namespaces for different units of Olympia

@arupc-vmicro
Copy link
Collaborator Author

On a different note do we need to discuss the directory structure going forward for when others implement their own predictors? Do we need a separate branchpred directory under core?

Let us address this in a new PR when we add other branch predictors.

@arupc-vmicro
Copy link
Collaborator Author

Addressed review comments. Please take a look and approve, if looks good. I need at least one approval for merge.

core/BranchPredIF.hpp Show resolved Hide resolved
bool actually_taken = false;
};

class DefaultInput
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't feel right to pollute the olympia namespace with a fairly generically named class "DefaultInput". It's not obvious that this is related to branch predictors, let alone that it's used for the SimpleBranchPredictor.

Goes for the others DefaultUpdate/DefaultPrediction as well?

Can't we place them under SimpleBranchPredictor? Or name them SimpleBranchPredictorUpdate

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a separate namespace for BranchPredictor within olympia.

@arupc-vmicro
Copy link
Collaborator Author

As a reminder, if there is no further comment, please approve so that we can merge this.

@arupc-vmicro arupc-vmicro requested review from arupc and removed request for knute-mips and oms-vmicro March 25, 2024 02:36
@arupc arupc merged commit 8d187b2 into master Mar 25, 2024
4 checks passed
@arupc arupc deleted the arupc-vmicro/branch_pred_api branch March 25, 2024 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants