Fix rom dataobj #2051

Open: wants to merge 7 commits into base: devel

@dylanjm (Collaborator) commented Feb 1, 2023


Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, e.g., #42.)

#731

What are the significant changes in functionality due to this change request?

Implements changes found in #1718

Adds the option to pass training DataSets directly to ROM SupervisedLearning algorithms rather than converting everything to dictionaries.

A flag, needsDictTraining, lets each SupervisedLearning algorithm (SVL) self-identify whether it needs legacy training on dictionaries or can train directly on a DataSet.
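The dispatch this flag enables can be sketched as follows. This is a minimal, standalone illustration under stated assumptions, not RAVEN's implementation: ToyDataSet (a stand-in for a RAVEN DataSet), SupervisedLearningSketch, LegacyROM, DataSetROM, asDict and trainOnDataSet are all hypothetical. Only the flag name needsDictTraining, amITrained, and the method name trainOnDictionary come from this PR, and trainOnDictionary is simplified here to take a single argument.

```python
class ToyDataSet:
  """Hypothetical stand-in for a RAVEN DataSet: named columns of equal length."""
  def __init__(self, columns):
    self.columns = columns

  def asDict(self):
    # Legacy conversion path: copy every variable into a plain dict of lists
    return {name: list(values) for name, values in self.columns.items()}


class SupervisedLearningSketch:
  # Each algorithm self-identifies; the default keeps legacy dictionary training
  needsDictTraining = True

  def __init__(self):
    self.amITrained = False
    self.trainedOn = None

  def train(self, trainingData):
    if self.needsDictTraining:
      # Legacy path: convert the DataSet before handing it to the algorithm
      self.trainOnDictionary(trainingData.asDict())
    else:
      # New path: the algorithm consumes the DataSet as-is
      self.trainOnDataSet(trainingData)
    self.amITrained = True

  def trainOnDictionary(self, tdict):
    self.trainedOn = 'dict'

  def trainOnDataSet(self, dataSet):
    # Hypothetical hook: algorithms that set needsDictTraining = False must provide it
    raise NotImplementedError('needsDictTraining = False requires trainOnDataSet')


class LegacyROM(SupervisedLearningSketch):
  pass  # inherits needsDictTraining = True


class DataSetROM(SupervisedLearningSketch):
  needsDictTraining = False

  def trainOnDataSet(self, dataSet):
    # Record which structure actually reached the algorithm
    self.trainedOn = type(dataSet).__name__
```

With data = ToyDataSet({'x': [0.0, 1.0], 'y': [1.0, 3.0]}), LegacyROM().train(data) trains on a converted dict, while DataSetROM().train(data) receives the ToyDataSet object itself. Keeping the flag as a class attribute means the caller never needs to know which path an algorithm uses.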


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.); see the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development is performed on the internal JobHandler parallel system, a cluster test must be added that sets the <internalParallel> node to True in its XML block.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed or added, is the analytic documentation updated or added?
  • 9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) has been changed, the associated documentation must be reviewed to ensure the text matches the example.

@moosebuild: Job Mingw Test on 7679bed: invalidated by @joshua-cogliati-inl

@dylanjm requested a review from wangcj05 on February 2, 2023 16:28

@wangcj05 (Collaborator) left a comment:

In addition to my inline comments on the code, I have the following comments:

  1. It is not clear to me how to use the DataSet directly as training input. I do not see an example; this may be because I am not familiar with the TSA module. Could you explain it?
  2. I do not see any updated or new tests that check the proposed modifications. Are they covered by the existing TSA tests?

```diff
 else:
   # TODO: The following check may need to be moved to Dummy Class -- wangc 7/30/2018
-  if type(trainingSet).__name__ != 'dict' and trainingSet.type == 'HistorySet':
+  if type(trainingSet) != dict and trainingSet.type == 'HistorySet':
```

Collaborator:

First, could you add a description listing all possible data structures for trainingSet?
Second, could you add checks for each of those data structures?
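One way to address both requests is a single helper that documents the accepted structures and checks them with isinstance instead of comparing types. This is a hypothetical sketch: classifyTrainingSet and ToyDataObject are illustrative names, and the accepted structures (a plain dict, or a data object whose .type is 'PointSet', 'HistorySet' or 'DataSet') are inferred from the check in this hunk rather than taken from RAVEN's full list. Note that the new line, type(trainingSet) != dict, treats a dict subclass such as OrderedDict as a non-dict, so it would fall through to trainingSet.type and raise AttributeError; isinstance avoids that.

```python
class ToyDataObject:
  """Hypothetical stand-in for a RAVEN data object exposing a .type string."""
  def __init__(self, dataType):
    self.type = dataType


def classifyTrainingSet(trainingSet):
  """
    Describe which supported structure trainingSet is.
    Assumed structures (illustrative, not RAVEN's full list):
      - dict: legacy {variableName: values} training dictionary
      - data object whose .type is 'PointSet', 'HistorySet', or 'DataSet'
    @ In, trainingSet, dict or data object, the training data
    @ Out, kind, str, 'dict' or the data object's type name
  """
  # isinstance also accepts dict subclasses (e.g. OrderedDict), unlike type(x) != dict
  if isinstance(trainingSet, dict):
    return 'dict'
  # getattr with a default avoids AttributeError for unexpected inputs
  dataType = getattr(trainingSet, 'type', None)
  if dataType in ('PointSet', 'HistorySet', 'DataSet'):
    return dataType
  raise TypeError(f'Unsupported training set structure: {type(trainingSet).__name__}')
```

The HistorySet-specific branch could then test classifyTrainingSet(trainingSet) == 'HistorySet', and any unsupported input fails early with a clear message.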

Collaborator:

This looks like a specific check for HistorySet alignment, right? I'm not sure we need to sort out all the different approaches to ROMs within this PR, do we? That sounds like a bigger issue.


```python
self._replaceVariablesNamesWithAliasSystem(self.trainingSet, 'inout', False)
```

Collaborator:

Could you check whether this line works with your proposed data structure? In Model.py, this method only accepts a dict or a list as input.

Comment on lines +220 to +352
```python
if self.needsDictTraining:
  self.trainOnDictionary(trainingData, indexMap)
else:
  self.amITrained = True
  self.muAndSigmaFeatures = dict((f, (0,1)) for f in self.features)
```

Collaborator:

These lines are not clear to me. When a DataSet is used for training, I do not see any training process for the ROM. Could you explain it?

Collaborator:

I agree. Was a line missed from the old PR? If I recall correctly, we were directly overloading the "train" method when self.needsDictTraining is False.
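A hypothetical sketch of the overloading approach recalled above, in which a DataSet-capable ROM overrides train itself so that a real fitting step runs before amITrained is set. DataSetROMSketch, normalize and targetMeans are illustrative names; the "fit" is a placeholder (mean of each target), and a plain {variable: values} mapping stands in for the DataSet to keep the example self-contained. The muAndSigmaFeatures line mirrors the hunk: a (0, 1) pair per feature makes the usual (x - mu) / sigma scaling an identity, so features are used unscaled.

```python
def normalize(values, mu, sigma):
  # Standard score used for feature scaling: (x - mu) / sigma
  return [(v - mu) / sigma for v in values]


class DataSetROMSketch:
  """Hypothetical ROM that overrides train() instead of using trainOnDictionary."""
  needsDictTraining = False

  def __init__(self, features):
    self.features = features
    self.amITrained = False
    self.muAndSigmaFeatures = {}
    self.targetMeans = None

  def train(self, dataSet):
    # (0, 1) for every feature turns normalize() into an identity: features are used unscaled
    self.muAndSigmaFeatures = dict((f, (0, 1)) for f in self.features)
    # Placeholder fitting step (mean of each target); a real algorithm would fit its model here
    self.targetMeans = {name: sum(vals) / len(vals)
                        for name, vals in dataSet.items() if name not in self.features}
    # Mark the ROM as trained only after fitting has actually happened
    self.amITrained = True
```

The point of the ordering is that amITrained = True should follow, not replace, the training call; otherwise the ROM reports itself trained with no fitted state.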

```diff
@@ -239,15 +255,15 @@ def train(self, tdict, indexMap=None):
   for feat in self.features:
     for index in indexMap.get(feat, []):
       if index not in needFeatures and index not in needTargets:
-        needFeatures.append(feat)
+        needFeatures.append(index)
```

Collaborator:

Could you add an explanation here for the change?
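A small standalone reproduction suggests why appending index rather than feat reads as a bug fix. The function collectNeededIndices and its appendIndex switch are hypothetical; the loop body mirrors the hunk, and needFeatures is assumed to start as a copy of the feature list because its initialization is outside the hunk. With the old line, the index variable (here 'time') is never requested and the feature is duplicated instead.

```python
def collectNeededIndices(features, targets, indexMap, appendIndex=True):
  """
    Reproduce the hunk's loop: gather index variables required by the features.
    appendIndex=False mimics the old line, needFeatures.append(feat).
    Assumption: needFeatures starts as a copy of the feature list.
  """
  needFeatures = list(features)
  needTargets = list(targets)
  for feat in features:
    for index in indexMap.get(feat, []):
      # Only request an index variable that is not already a feature or target
      if index not in needFeatures and index not in needTargets:
        # Old code appended feat: the index stays missing and feat is duplicated
        needFeatures.append(index if appendIndex else feat)
  return needFeatures
```

For features ['signal'] indexed by 'time', the new line yields ['signal', 'time'] and the old one ['signal', 'signal']. When two features share the same index, the new line adds 'time' only once.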

@moosebuild: Job Mingw Test on f4edc15: invalidated by @joshua-cogliati-inl

computer rebooted

```python
  if oldName in sampledVars:
    value = sampledVars.pop(oldName)
    sampledVars[newName] = value
elif isinstance(sampledVars, list):
```

Collaborator:

I realize originalVariables is a deepcopy of sampledVars, but it would be nice if this chain of isinstance checks tested the same variable instead of two different ones.
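A hypothetical sketch of the restructuring suggested above, in which every isinstance branch tests the same object. The function name replaceWithAliases and the {oldName: newName} alias map are assumptions, not the signature of _replaceVariablesNamesWithAliasSystem in Model.py. The final branch also illustrates the earlier concern that a method handling only dict and list would reject a DataSet.

```python
def replaceWithAliases(sampledVars, aliases):
  """
    Hypothetical sketch: rename variables according to an {oldName: newName} map.
    All isinstance checks test the same object, sampledVars.
    @ In, sampledVars, dict or list, variables to rename (modified in place)
    @ In, aliases, dict, mapping from old names to new names
    @ Out, sampledVars, dict or list, the renamed variables
  """
  if isinstance(sampledVars, dict):
    for oldName, newName in aliases.items():
      if oldName in sampledVars:
        # Move the value under its new key
        value = sampledVars.pop(oldName)
        sampledVars[newName] = value
  elif isinstance(sampledVars, list):
    for i, name in enumerate(sampledVars):
      # Unaliased names are kept unchanged
      sampledVars[i] = aliases.get(name, name)
  else:
    # Any other structure (e.g. a DataSet) is not handled by this sketch
    raise TypeError(f'Expected dict or list, got {type(sampledVars).__name__}')
  return sampledVars
```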

@moosebuild: Job Test qsubs sawtooth on 76a9732: invalidated by @joshua-cogliati-inl

timed out in Test Plugins


4 participants