Regarding scaling of data #5

KarthikaKP · 2019-01-06T06:23:52Z

I have seen that `standardscaler.fit(X)` is being used, which scales the entire dataset. But the usual practice is to fit on the training data only and apply the same mean and standard deviation to the test and validation sets. I am new to this field and don't know how to preprocess time series data. Kindly reply.

Seanny123 · 2019-01-07T02:37:24Z

You are absolutely correct and this is an embarrassing mistake which should be corrected.
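For reference, a minimal sketch of the fit-on-train-only pattern described above. It assumes scikit-learn's `StandardScaler` and a chronological (unshuffled) split; the toy array `X` and the 80% split are hypothetical and not taken from this repository.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy series: 100 time steps, 3 features (not the repo's data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# Chronological split: no shuffling, the test set is the most recent 20%.
train_size = int(0.8 * len(X))
X_train, X_test = X[:train_size], X[train_size:]

# Fit on the training rows only, then reuse the same mean/std for the test rows.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Because the scaler never sees the test rows, the scaled test data will generally not have exactly zero mean and unit variance, which is expected.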

mikel-brostrom · 2020-07-28T08:32:29Z

I'll leave this piece of code here in case somebody needs to solve this issue
and wants to reuse the output scaler to inverse-transform the predictions:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def preprocess_data(dat, col_names, train_percentage):
    # dat: pandas DataFrame ordered in time, e.g. shape (40560, 82)
    # col_names: names of the target columns
    proc_dat = dat.to_numpy()

    # create one dedicated scaler for the input data
    # and one for the output data
    in_data_scaler = MinMaxScaler()
    out_data_scaler = MinMaxScaler()

    # separate targets from features, e.g. (40560, 1) | (40560, 81)
    mask = np.ones(proc_dat.shape[1], dtype=bool)
    dat_cols = list(dat.columns)
    for col_name in col_names:
        mask[dat_cols.index(col_name)] = False

    feats = proc_dat[:, mask]
    targs = proc_dat[:, ~mask]

    # fit the scalers on the first train_size rows (the train set) only
    train_size = int(train_percentage * len(dat))
    in_data_scaler.fit(feats[:train_size, :])
    out_data_scaler.fit(targs[:train_size, :])

    # transform all features and targets with the train-fitted scalers
    feats = in_data_scaler.transform(feats)
    targs = out_data_scaler.transform(targs)

    return feats, targs, in_data_scaler, out_data_scaler
```
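A rough, self-contained sketch of how the returned output scaler could be reused to map predictions back to the original units. The toy target values, the `train_size` of 8, and the `scaled_preds` stand-in for model output are all hypothetical; only `MinMaxScaler.fit`, `transform`, and `inverse_transform` are real scikit-learn calls.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical target column: 10 time steps with values 0, 10, ..., 90.
targs = np.arange(10, dtype=float).reshape(-1, 1) * 10.0
train_size = 8  # assumed split point: first 8 rows are the train set

# Fit the output scaler on the training targets only (min 0, max 70).
out_data_scaler = MinMaxScaler()
out_data_scaler.fit(targs[:train_size])
scaled_targs = out_data_scaler.transform(targs)

# Stand-in for model predictions in scaled space (here: the true test targets).
scaled_preds = scaled_targs[train_size:]

# Map the predictions back to the original units.
preds = out_data_scaler.inverse_transform(scaled_preds)
```

Note that the scaled test targets here exceed 1, because the scaler's range comes from the train set alone. That is a normal consequence of avoiding leakage, and `inverse_transform` still recovers the original values.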
