-
With a recurrent network, a sequence is processed one element at a time (in order), because each step depends on the output of the previous step. But you would use a batch to process multiple sequences in parallel.
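To make this concrete, here is a minimal NumPy sketch of a vanilla RNN (not the actual RecurrentBlock implementation; the weight names and sizes are made up for illustration). Note that the loop runs only over the Time axis, while all sequences in the Batch axis are updated in parallel at each step via matrix multiplication. The final hidden state is what the quoted passage calls stateNew.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, C, H = 4, 5, 3, 8            # Batch, Time, Channel, hidden size (arbitrary)
x = rng.normal(size=(B, T, C))     # input with (Batch, Time, Channel) shape

# Hypothetical RNN parameters
W_xh = rng.normal(size=(C, H)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(H, H)) * 0.1   # hidden-to-hidden weights
b_h = np.zeros(H)

h = np.zeros((B, H))               # one hidden state per sequence in the batch
for t in range(T):                 # sequential over Time ...
    # ... but parallel over Batch: x[:, t, :] holds step t of ALL sequences
    h = np.tanh(x[:, t, :] @ W_xh + h @ W_hh + b_h)

state_new = h                      # hidden state after the last time step;
                                   # under sequential partitioning it can seed
                                   # the next minibatch's initial state
print(state_new.shape)             # (4, 8)
```

So the Batch axis indexes independent sequences, and the Time axis indexes positions within each sequence; that is why both are needed.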
-
Hi, I'm trying to learn recurrent networks, and a RecurrentBlock expects input with a (Batch, Time, Channel) shape.
My question is about how data for different time steps is organized. Dive into Deep Learning, chapter 8.6, states:
Besides, the updated hidden state (stateNew) returned by rnnLayer refers to the hidden state at the last time step of the minibatch. It can be used to initialize the hidden state for the next minibatch within an epoch in sequential partitioning.
According to this, I would think that each index of the Batch axis represents a different time step, that the highest index is the latest point in time of the batch, and that its hidden state is fed into the subsequent batch at index 0. So what does the second axis, "Time", encode? Isn't (Batch, Channel) enough?
A side question: items of a batch are usually processed in parallel in neural nets, but with the above interpretation of the shape of a RecurrentBlock, are they processed in sequence too, since one sample needs the output of the previous sample as input?