The current implementation of `data.py`'s `load_dataset()` instantiates a standard scaler by default.
```python
def load_dataset(dataset_dir, batch_size, val_batch_size=None, test_batch_size=None):
    if val_batch_size is None:
        val_batch_size = batch_size
    if test_batch_size is None:
        test_batch_size = batch_size

    data = {}
    for category in ["train", "val", "test"]:
        cat_data = np.load(os.path.join(dataset_dir, category + ".npz"))
        data["x_" + category] = cat_data["x"]
        data["y_" + category] = cat_data["y"]

    scaler = StandardScaler(data["x_train"][..., 0])
    for category in ["train", "val", "test"]:
        data["x_" + category][..., 0] = scaler.transform(data["x_" + category][..., 0])
        data["y_" + category][..., 0] = scaler.transform(data["y_" + category][..., 0])

    data_train = PaddedDataset(batch_size, data["x_train"], data["y_train"])
    data["train_loader"] = DataLoader(data_train, batch_size, shuffle=True)
    data_val = PaddedDataset(val_batch_size, data["x_val"], data["y_val"])
    data["val_loader"] = DataLoader(data_val, val_batch_size, shuffle=False)
    data_test = PaddedDataset(test_batch_size, data["x_test"], data["y_test"])
    data["test_loader"] = DataLoader(data_test, test_batch_size, shuffle=False)
    data["scaler"] = scaler
    return data
```
The goal is to isolate the scaler from the data loading method, and eventually support more scalers.
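One way to do this is to pull the scaling step out into its own function that accepts a pre-constructed scaler (or `None` to skip scaling). The sketch below is a hypothetical refactor, not the current torchTS API: `scale_splits` is an invented name, and the `StandardScaler` here is a minimal stand-in that assumes a scikit-learn-style `fit`/`transform`/`inverse_transform` interface.

```python
import numpy as np


class StandardScaler:
    """Minimal z-score scaler with an inverse transform (stand-in for data.py's)."""

    def fit(self, data):
        self.mean = data.mean()
        self.std = data.std()
        return self

    def transform(self, data):
        return (data - self.mean) / self.std

    def inverse_transform(self, data):
        return data * self.std + self.mean


def scale_splits(data, scaler=None):
    """Apply an optional scaler to the 0th feature of every split, in place.

    `data` is the dict of x_/y_ arrays built by load_dataset(). Passing
    scaler=None skips scaling entirely; any object exposing fit/transform
    (e.g. a future min-max scaler) can be swapped in.
    """
    if scaler is None:
        return data
    # Fit on the training inputs only, then transform all splits with it,
    # matching what load_dataset() currently does.
    scaler.fit(data["x_train"][..., 0])
    for category in ["train", "val", "test"]:
        data["x_" + category][..., 0] = scaler.transform(data["x_" + category][..., 0])
        data["y_" + category][..., 0] = scaler.transform(data["y_" + category][..., 0])
    data["scaler"] = scaler
    return data
```

With this split, `load_dataset()` would only load arrays and build loaders, and the caller decides which scaler (if any) to apply.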
@klane and @yuqirose Shouldn't the scaler be part of data preprocessing rather than part of the model?
Currently, our `_step` applies a scaler; does it make sense to have the user define it outside the model's step function instead?
```python
def _step(self, batch, batch_idx, num_batches):
    x, y = self.prepare_batch(batch)
    if self.training:
        batches_seen = batch_idx + self.current_epoch * num_batches
    else:
        batches_seen = batch_idx
    pred = self(x, y, batches_seen)
    if self.scaler is not None:
        y = self.scaler.inverse_transform(y)
        pred = self.scaler.inverse_transform(pred)
```
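If the scaler moved out of the model, `_step` could work purely in scaled space and the inverse transform would live in evaluation code instead. A minimal sketch of that alternative, with hypothetical names (`unscaled_mae` is not a torchTS function, and this `StandardScaler` is a stand-in assuming the usual mean/std parameterization):

```python
import numpy as np


class StandardScaler:
    """Minimal stand-in for the scaler stored by load_dataset()."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def inverse_transform(self, data):
        return data * self.std + self.mean


def unscaled_mae(pred, y, scaler=None):
    """Compute MAE in the original units, outside the model.

    The model's _step would return scaled pred/y untouched; only metric
    code like this needs to know a scaler exists at all.
    """
    if scaler is not None:
        pred = scaler.inverse_transform(pred)
        y = scaler.inverse_transform(y)
    return np.abs(pred - y).mean()
```

The trade-off is that the training loss is then computed in scaled space unless the loss function also opts into the inverse transform.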
akashshah59