Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REFACTOR] Eliminate duplication of required input / output List/Set #535

Open
dbwiddis opened this issue Feb 22, 2024 · 7 comments
Open
Labels
good first issue Good for newcomers

Comments

@dbwiddis
Copy link
Member

Is your feature request related to a problem?

The WorkflowValidatorTests class contains a lot of repetitive code that could be refactored to be more general.

This test class was originally written to validate the JSON file. Following #523 we now have an Enum and easier access to programmatically perform these tests, so it can be simplified.

What solution would you like?

The manual creation of a map here can be reduced to a single statement, which I've done as part of #530 .

The Input/Output size checks here could be improved.

Taking one entry as an example:

assertTrue(validator.getWorkflowStepValidators().keySet().contains("create_connector"));
assertEquals(7, validator.getWorkflowStepValidators().get("create_connector").getInputs().size());
assertEquals(1, validator.getWorkflowStepValidators().get("create_connector").getOutputs().size());
  1. The repeated string "create_connector" corresponds to the class name and should be replaced by CreateConnectorStep.NAME.
  2. There's no need to check the presence of the key in the keyset, the next line will throw an NPE when get() returns null and we try to get inputs or outputs.
  3. We should refactor to avoid duplication of the required inputs. Currently each workflow step defines its required inputs inline in the code (as a Set<String>) and separately in the WorkflowStepValidator class (as a List<String>). This duplication should be removed by making the collection a class field with a getter similarly to how NAME is declared.
  4. We don't currently define the outputs in the WorkflowStep implementation classes, but we should do so to keep that declaration closer to where we actually populate the WorkflowData when the step is complete.

What alternatives have you considered?

Leave the code as-is and keep adding to it.

Do you have any additional context?

https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

@dbwiddis dbwiddis added good first issue Good for newcomers untriaged labels Feb 22, 2024
@ajayl83
Copy link

ajayl83 commented Feb 22, 2024

@dbwiddis can i pick this issue?

@ajayl83
Copy link

ajayl83 commented Feb 22, 2024

@dbwiddis can i pick this issue?

i do see other strings are also repeated! and can be as well refactored.

@dbwiddis
Copy link
Member Author

dbwiddis commented Feb 22, 2024

Hey @ajay83 go for it!

@ajayl83
Copy link

ajayl83 commented Feb 22, 2024

@dbwiddis can you kindly elaborate more about point 3 , as per point 3 getInputs currently returns List.copyOf(inputs) and this should be class field as NAME? same for the output.

@dbwiddis
Copy link
Member Author

@dbwiddis can you kindly elaborate more about point 3 , as per point 3 getInputs currently returns List.copyOf(inputs) and this should be class field as NAME? same for the output.

The NAME field is an example: a public static field.

Currently we just do a temporary assignment of required inputs / keys with this assignment (and similar in every step):

Set<String> requiredKeys = Set.of(
NAME_FIELD,
DESCRIPTION_FIELD,
VERSION_FIELD,
PROTOCOL_FIELD,
PARAMETERS_FIELD,
CREDENTIAL_FIELD,
ACTIONS_FIELD
);

Elsewhere when we validate the template we are independently creating a list of exactly the same inputs:

CREATE_CONNECTOR(
CreateConnectorStep.NAME,
List.of(NAME_FIELD, DESCRIPTION_FIELD, VERSION_FIELD, PROTOCOL_FIELD, PARAMETERS_FIELD, CREDENTIAL_FIELD, ACTIONS_FIELD),

My suggestion is to change the requiredKeys assignment to a (public static) class variable for the step, and access it from the WorkflowSteps enum instead of recreating it.

Point 4 would do the same thing for the outputs. They are not currently defined in each workflow step, but we should define them there, as the eventual assignment happens there and it's easier to make sure they are in alignment (we had a recent issue where we discovered a mismatch much later.) This may be a bit more complicated in the cases that we look it up from the resource enum.

Map.ofEntries(Map.entry(resourceName, mlCreateConnectorResponse.getConnectorId())),

@dbwiddis
Copy link
Member Author

dbwiddis commented Mar 8, 2024

Hey @ajayl83 how's it going with this issue? Need any help?

@dbwiddis
Copy link
Member Author

Hey @ajayl83 I'm going to unassign this to let someone else work on it. If you are around and want to re-claim it just let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants