How does DeepAR validation work (in detail)? #3207
Unanswered · Serendipity31 asked this question in Q&A

Replies: 1 comment
- @Serendipity31, if you finally find the answers to any of your questions, please post them here. Very useful questions.
The Issue - I want to understand the validation process in detail
I am struggling to feel confident that I have found the answers to several questions about how validation works in DeepAR (version 0.14.0). This post contains my questions, my attempts to work out the answers, and a hypothetical scenario to help make this all a bit more concrete. If anyone is able to take a look at any of these questions and check/correct my understanding, I would greatly appreciate it.
Hypothetical Scenario - Suppose for the sake of these questions that the `DeepAREstimator` has:
- `prediction_length` = 1
- `context_length` = 2
- `max(lags_seq)` = 3
- `batch_size` = 20

and that the validation data contain 100 series (each long enough that the last split point in my example is t = 48).
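For concreteness, here is a sketch of how this hypothetical estimator might be constructed (assuming the gluonts 0.14 PyTorch API; the `freq` value and the exact `lags_seq` list are illustrative assumptions on my part):

```python
# A sketch of the hypothetical setup, assuming the gluonts 0.14 torch API.
# freq and the exact lags_seq list are illustrative assumptions.
from gluonts.torch.model.deepar import DeepAREstimator

estimator = DeepAREstimator(
    freq="H",                # assumption: hourly data
    prediction_length=1,
    context_length=2,
    lags_seq=[1, 2, 3],      # max(lags_seq) = 3, so past_length = 2 + 3 = 5
    batch_size=20,
)
```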
Question 1: When average validation loss is calculated, what values are included in the average loss calculation?
- Option 1: Only the loss values for the `prediction_length` window (--> 1 loss value per series x 100 series)
- Option 2: The loss values for both the `context_length` and `prediction_length` windows (--> 3 loss values per series x 100 series)

Answer based on my current understanding: Option 2
Reasoning
Since the purpose of observing validation loss is to see how well the model is generalising to unseen data, it would seem like option 1 is the way it should be. And in this thread, lostella's answer suggests the answer to my question is option 1.
However, the [definition] of `validation_step()` does not pass `future_only=True` to `loss()`, so the default value (`future_only=False`) remains. With nothing overriding `future_only=False`, when the loss is calculated, the target values provided to the loss function are a concatenation of `context_target` and `future_target_reshaped`. The concatenation happens on line 569; the estimation of loss values happens on line 579. This really makes it seem like the answer to my question is actually option 2. Is this correct?
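To make the arithmetic concrete, here is a minimal sketch (my own illustration, not the GluonTS source) of what the concatenation described above implies for the number of loss values per series:

```python
# Sketch of the future_only=False behaviour described above (not GluonTS source):
# the loss target concatenates the context and future windows.
import torch

batch_size, context_length, prediction_length = 20, 2, 1

context_target = torch.randn(batch_size, context_length)
future_target_reshaped = torch.randn(batch_size, prediction_length)

# With future_only=False, the loss is averaged over all of these values,
# i.e. context_length + prediction_length = 3 values per series.
target = torch.cat((context_target, future_target_reshaped), dim=1)
print(target.shape)  # torch.Size([20, 3])
```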
Question 2: (Per epoch) For what number of series is validation loss calculated?
- Option 1: All series in the validation data, each exactly once
- Option 2: As many series as fit into `num_batches_per_epoch` batches (even if this means the validation loss for some series is calculated more than once)

Answer based on my current understanding: Option 1
Reasoning
I think option 1 is both what should be happening and what is happening. However, I am struggling to understand `create_validation_data_loader()` [code], and would be grateful for someone to verify my understanding. Here is what I understand about `create_validation_data_loader()`. Within `create_validation_data_loader()`:
- `_create_instance_splitter()` [code] is called with `self.validation_sampler` (for which the default is an instance of `ValidationSplitSampler`)
- the sliced instances are passed to `as_stacked_batches()` [code]
- `as_stacked_batches()` returns an instance of `IterableSlice`
- `IterableSlice` [code] takes two inputs: an iterable version of the dataset and `num_batches_per_epoch` (which will be either an `int` or the default of `None`). Because nothing explicitly establishes a new value for `num_batches_per_epoch`, the default remains.

Therefore, in the hypothetical example, this would result in:
- 5 validation batches (each with 20 sliced series)
- Each series showing up in exactly one of these batches

The PyTorch Lightning trainer also has an argument called `limit_val_batches`, but it defaults to 1.0. Therefore, in my hypothetical example, unless I were to explicitly override this argument, whenever validation loss is calculated, it will be calculated using all 5 validation batches (see the sketch after this list). Is this an accurate description of events?
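For illustration, here is a simplified re-implementation of the `IterableSlice` behaviour as I understand it (an assumption about its spirit, not the actual GluonTS class):

```python
# A simplified re-implementation of the IterableSlice behaviour described
# above (assumption: it matches the real class only in spirit).
import itertools

class IterableSlice:
    """Yield at most `length` items per pass; length=None means no limit."""
    def __init__(self, iterable, length=None):
        self.iterable = iterable
        self.length = length

    def __iter__(self):
        return itertools.islice(self.iterable, self.length)

batches = [f"batch_{i}" for i in range(5)]  # 100 series / batch_size 20 = 5 batches
val_loader = IterableSlice(batches, length=None)  # default: no limit
print(list(val_loader))  # all 5 batches are seen exactly once
```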
Question 3: After how many iterations is the validation loss calculated?
- Option 1: At the end of each epoch (i.e. after every `num_batches_per_epoch` training batches)

Answer based on my current understanding: Option 1
Reasoning
The PyTorch Lightning trainer has a 'flag' called `check_val_every_n_epoch`. This flag takes a default value of 1. Therefore, unless I were to explicitly override this default (and in this example I have not), I would expect that validation loss gets calculated at the end of each epoch. Is this correct?
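A sketch of the two Trainer arguments discussed in Questions 2 and 3, with their defaults (assuming a recent PyTorch Lightning API; the import path varies between Lightning versions):

```python
# Sketch of the relevant Trainer defaults (assumed PyTorch Lightning API;
# use `from pytorch_lightning import Trainer` on older versions).
from lightning.pytorch import Trainer

trainer = Trainer(
    max_epochs=100,
    check_val_every_n_epoch=1,  # default: validate at the end of every epoch
    limit_val_batches=1.0,      # default: use 100% of the validation batches
)
```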
Question 4: When checking validation loss, how many times is the LSTM network unrolled?
- Option 1: Once per time point in the whole available history of the series
- Option 2: `context_length` times (i.e. 2 in my example)
- Option 3: `past_length` times (the sum of `context_length` and `max(lags_seq)` - i.e. 5 in my example)

Answer based on my current understanding: Option 2
Reasoning
Considering option 1
If I look at the definition of `ValidationSplitSampler` [code], it returns an instance of `PredictionSplitSampler` with `min_past = 0` and `min_future = prediction_length`. When given a validation time series, the `__call__` function in `PredictionSplitSampler` [code] returns the last time point for splitting (e.g. 48 in my example).
This then gets used within the `InstanceSplitter` that is returned by `_create_instance_splitter` [code]. More specifically, it gets used in `flatmap_transform()` [code]. This function uses the indices from the `ValidationSplitSampler` in a call to `_split_instance()` [code]. In turn, two objects (`past_piece` and `future_piece`) are returned from a call to `_split_array()` [code]. Within `_split_array()`, `past_piece` goes back `past_length` in time from the point where the data are sliced [code], and `past_length` covers the part of the series made up from `context_length` and `max(lags_seq)`.
Therefore it seems like the validation batches do not include any data older than `past_length`, and so it seems like option 1 cannot be true. That leaves options 2 and 3.
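A minimal sketch (my own illustration, not the GluonTS source) of the past/future split described above, using the example numbers:

```python
# Sketch (not GluonTS source) of the past/future split described above,
# using the example numbers: series length 49, split point 48.
import numpy as np

context_length, prediction_length, max_lag = 2, 1, 3
past_length = context_length + max_lag            # 5 in the example

target = np.arange(49)                            # one validation series
split_idx = 48                                    # index the sampler returns

past_piece = target[split_idx - past_length : split_idx]
future_piece = target[split_idx : split_idx + prediction_length]

print(past_piece)    # [43 44 45 46 47] -> nothing older than past_length survives
print(future_piece)  # [48]
```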
Considering Option 2
The number of times a PyTorch LSTM network unrolls is determined by the sequence length of the input tensor. The input to the LSTM network in DeepAR is called `rnn_input`, and it's the first output from the `prepare_rnn_input()` function [code]. A closer look at `prepare_rnn_input()` shows that the first output is formed from the concatenation of `lags` and `features` [code]. This means that the sequence length of the input tensor that gets passed to the LSTM network must come from `lags`.
`lags` is the output of a call to `lagged_sequence_values()`, which takes as inputs `self.lags_seq`, `prior_input`, and `input` (i.e. the scaled context window) [see here]. The output from `lagged_sequence_values()` is a tensor [code], the shape of which depends on `input` (and not on `prior_input` or `past_length`).
Therefore, although I am not 100% confident that I understand the full shape of `rnn_input`, I think this means that the sequence length passed to the LSTM network during validation is the same as `context_length` (which would make option 2 the answer to this question).
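To illustrate why I think the sequence length comes from `input`, here is a toy re-implementation (an assumption about the spirit of `lagged_sequence_values`, not the actual function):

```python
# Sketch (assumption: mirrors lagged_sequence_values only in spirit) showing
# why the output's sequence length comes from `input`, not prior_input.
import torch

def lagged_values(lag_indices, prior_seq, seq):
    # full history = lag buffer (prior_seq) followed by the context window (seq)
    full = torch.cat((prior_seq, seq), dim=1)
    T = seq.shape[1]
    pieces = []
    for lag in lag_indices:
        end = full.shape[1] - lag
        pieces.append(full[:, end - T : end])  # values `lag` steps behind seq
    return torch.stack(pieces, dim=-1)         # (batch, T, num_lags)

prior_input = torch.randn(20, 3)  # max(lags_seq) = 3 steps of older history
scaled_ctx = torch.randn(20, 2)   # context_length = 2

lags = lagged_values([1, 2, 3], prior_input, scaled_ctx)
print(lags.shape)  # torch.Size([20, 2, 3]) -> sequence length == context_length
```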
Question 5: (Given the answer to Q4), why is the network unrolled `context_length` times during validation, rather than making use of the whole available history of each series?

Answer based on my current understanding: Option 1
Reasoning
I have not managed to find any discussions of anything related to options 2 or 3, so this is more a process-of-elimination guess than a confident assertion based on a deep understanding of the issue.
Is this right? Have I missed something?