How to handle hidden state reset? #577

Open
Babylonehy opened this issue Sep 28, 2024 · 4 comments
Comments

@Babylonehy

  1. I pass a BLD (batch, length, dimension) tensor to Mamba, and the output is also BLD. This is for training. Is the hidden state reset every time a new batch is given?
  2. For inference, I pass one image at a time. How do I keep the hidden state until the end of the sequence, and how do I manually reset it for a new sequence?
@gkianfar

@Babylonehy I have the same question about how the hidden state reset is handled. Have you found any answers?

@Hprairie
Contributor

Hprairie commented Nov 5, 2024

You can pass it in as different samples in a batch and the hidden state for each one will be kept different. You can think of it as essentially just doing a different scan for each sample in B for BLD, starting with a hidden state initialized to 0 for each of the samples in the batch.
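To make the per-sample behaviour concrete, here is a minimal sketch using a toy scalar recurrence. It is not Mamba's selective scan and does not use `mamba_ssm`; the names `scan_sequence` and `scan_batch` and the constants `a` and `b` are made up for illustration. It only shows the idea described above: every sample in the batch gets its own scan that starts from a zero hidden state, so nothing leaks between samples or between batches.

```python
from typing import List


def scan_sequence(xs: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Toy recurrence h_t = a * h_{t-1} + b * x_t, with output y_t = h_t."""
    h = 0.0  # the hidden state always starts at zero for a new sequence
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(h)
    return ys


def scan_batch(batch: List[List[float]]) -> List[List[float]]:
    """Run one independent scan per sample (the B axis of a B x L input)."""
    return [scan_sequence(seq) for seq in batch]


batch_1 = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
batch_2 = [[1.0, 0.0, 0.0]]

out_1 = scan_batch(batch_1)
out_2 = scan_batch(batch_2)

# The same sequence gives the same output regardless of which batch it is in
# or which batch was processed before it.
print(out_1[0] == out_2[0])
```

In this toy setting, "resetting" between training batches is not something you do by hand: the state simply never outlives a single call.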

@Babylonehy
Author

> You can pass it in as different samples in a batch and the hidden state for each one will be kept different. You can think of it as essentially just doing a different scan for each sample in B for BLD, starting with a hidden state initialized to 0 for each of the samples in the batch.

Training works fine. The issue is online inference on a video sequence: frames arrive one at a time, so I cannot batch them and B is always 1. I still want the hidden state to carry across frames within the same video sequence.

@Hprairie
Contributor

Hprairie commented Nov 8, 2024

They have an inference mode, which will cache the hidden state for inference. I would take a look at InferenceParams, which will call this function in Mamba.
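The caching idea can be sketched with the same kind of toy scalar recurrence. This is not the `mamba_ssm` API and does not reproduce how `InferenceParams` works internally; `StreamingScan`, `step`, and `reset` are hypothetical names used only to illustrate the pattern: keep the hidden state between calls while frames of one video arrive, and clear it when a new video starts.

```python
from typing import List


class StreamingScan:
    """Toy stateful recurrence h_t = a * h_{t-1} + b * x_t (hypothetical helper)."""

    def __init__(self, a: float = 0.5, b: float = 1.0) -> None:
        self.a = a
        self.b = b
        self.h = 0.0  # cached hidden state, kept between calls to step()

    def reset(self) -> None:
        """Clear the cached state; call this at the start of a new sequence."""
        self.h = 0.0

    def step(self, x: float) -> float:
        """Consume one input (one frame) and update the cached state."""
        self.h = self.a * self.h + self.b * x
        return self.h


def full_scan(xs: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Reference: process the whole sequence at once from a zero state."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(h)
    return ys


video_a = [1.0, 0.0, 2.0]
video_b = [4.0, 0.0]

model = StreamingScan()

# Feed video A one frame at a time; the state carries across frames.
stream_a = [model.step(x) for x in video_a]

# New video: reset first so nothing from video A leaks into video B.
model.reset()
stream_b = [model.step(x) for x in video_b]

print(stream_a == full_scan(video_a))
print(stream_b == full_scan(video_b))
```

Stepping frame by frame with a cached state matches processing the full sequence in one pass, which is the property an inference cache is meant to provide. With the real library, the cached state lives in whatever object the inference mode uses, so the equivalent of `reset` would be starting a fresh cache for each new video; check the library source for the exact mechanism.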

3 participants