Transformation sequence #356

Open
noahboerger opened this issue Sep 4, 2024 · 4 comments

@noahboerger
Collaborator

noahboerger commented Sep 4, 2024

Currently, Trevas only supports writing transformations in the order in which they are executed. According to the reference manual 2.0 (line 294 ff.): "Not necessarily Transformations need to be written in sequence like a classical software program, in fact they are associated to the Artefacts they calculate, like it happens in the spreadsheets (each spreadsheet’s formula is associated to the cell it calculates)"

Here is an example that is currently working fine in Trevas:

tmp := ds1[filter val > 3];
res := tmp[calc additionalField := 10];

But it should also run without errors when the statements are swapped:

res := tmp[calc additionalField := 10];
tmp := ds1[filter val > 3];

Currently it fails with the following exception:

Occured error

Exception

fr.insee.vtl.engine.exceptions.UndefinedVariableException: undefined variable tmp
                  at fr.insee.vtl.engine.visitors.expression.VarIdVisitor.visitVarID(VarIdVisitor.java:42)
                  at fr.insee.vtl.engine.visitors.expression.VarIdVisitor.visitVarID(VarIdVisitor.java:23)
                  at fr.insee.vtl.parser.VtlParser$VarIDContext.accept(VtlParser.java:9572)
                  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
                  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitVarID(ExpressionVisitor.java:106)
                  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitVarID(ExpressionVisitor.java:41)
                  at fr.insee.vtl.parser.VtlParser$VarIDContext.accept(VtlParser.java:9572)
                  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
                  at fr.insee.vtl.parser.VtlBaseVisitor.visitVarIdExpr(VtlBaseVisitor.java:48)
                  at fr.insee.vtl.parser.VtlParser$VarIdExprContext.accept(VtlParser.java:478)
                  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
                  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitClauseExpr(ExpressionVisitor.java:355)
                  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitClauseExpr(ExpressionVisitor.java:41)
                  at fr.insee.vtl.parser.VtlParser$ClauseExprContext.accept(VtlParser.java:683)
                  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
                  at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitAssignment(AssignmentVisitor.java:51)
                  at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitTemporaryAssignment(AssignmentVisitor.java:59)
                  at fr.insee.vtl.parser.VtlParser$TemporaryAssignmentContext.accept(VtlParser.java:372)
                  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
                  at fr.insee.vtl.engine.VtlScriptEngine.evalStream(VtlScriptEngine.java:263)
                  at fr.insee.vtl.engine.VtlScriptEngine.eval(VtlScriptEngine.java:282)
                  at java.scripting/javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:262)
                  at fr.insee.trevas.jupyter.VtlKernel.eval(VtlKernel.java:305)
                  at io.github.spencerpark.jupyter.kernel.BaseKernel.handleExecuteRequest(BaseKernel.java:334)
                  at io.github.spencerpark.jupyter.channels.ShellChannel.lambda$bind$0(ShellChannel.java:64)
                  at io.github.spencerpark.jupyter.channels.Loop.lambda$new$0(Loop.java:21)
                  at io.github.spencerpark.jupyter.channels.Loop.run(Loop.java:78)

@NicoLaval
Collaborator

Hi @noahboerger,

I ran the example without error.

Are you sure you had the tmp dataset in the bindings?

@noahboerger
Collaborator Author

noahboerger commented Sep 4, 2024

Hi @NicoLaval,

No, I only had the ds1 dataset in the bindings.

The reference manual's example of this behaviour (l. 303 f.) is:

DS_p <- if DS_np >= 0 then DS_np else DS_1 ;
DS_np := ( DS_1 - DS_2 ) * 2 ;

with the additional information "DS_1 and DS_2 are input Data Sets, DS_np is a non persistent result, DS_p is a persistent result" (l. 280)

So, from my point of view, the example I tested is similar to the one provided in the reference manual. The engine itself should then adjust the execution order of the statements, as the reference manual points out (l. 298 ff.): "... not necessarily the Transformations are performed in the same order as they are written, because the order of execution depends on their input-output relationships ..."
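To illustrate what ordering by "input-output relationships" could look like inside an engine, here is a minimal sketch in Java. It is not Trevas code: the `Statement` record and the `TransformationOrder.order` method are hypothetical, and the sketch assumes the target and the referenced variables of each assignment have already been extracted from the parse tree. It orders the statements with Kahn's topological sort and treats any name that no statement produces (such as `ds1`) as coming from the bindings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only (hypothetical types): orders assignments by their dependencies. */
public class TransformationOrder {

    /** Hypothetical representation of one assignment: target := expression(inputs). */
    public record Statement(String target, Set<String> inputs) {}

    public static List<Statement> order(List<Statement> statements) {
        // Map each result name to the index of the statement that computes it.
        Map<String, Integer> producer = new HashMap<>();
        for (int i = 0; i < statements.size(); i++) {
            if (producer.put(statements.get(i).target(), i) != null) {
                throw new IllegalArgumentException(
                        "variable assigned more than once: " + statements.get(i).target());
            }
        }
        // pending[i] = inputs of statement i still to be computed;
        // dependents.get(j) = statements that read the result of statement j.
        int[] pending = new int[statements.size()];
        List<List<Integer>> dependents = new ArrayList<>();
        for (int i = 0; i < statements.size(); i++) {
            dependents.add(new ArrayList<>());
        }
        for (int i = 0; i < statements.size(); i++) {
            for (String input : statements.get(i).inputs()) {
                Integer j = producer.get(input);
                if (j != null) { // inputs without a producer come from the bindings
                    pending[i]++;
                    dependents.get(j).add(i);
                }
            }
        }
        // Kahn's algorithm: start with statements whose inputs are all available.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < statements.size(); i++) {
            if (pending[i] == 0) {
                ready.add(i);
            }
        }
        List<Statement> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            int i = ready.poll();
            result.add(statements.get(i));
            for (int d : dependents.get(i)) {
                if (--pending[d] == 0) {
                    ready.add(d);
                }
            }
        }
        // Statements left over are part of a cycle and can never be computed.
        if (result.size() != statements.size()) {
            throw new IllegalStateException("cyclic dependency between transformations");
        }
        return result;
    }

    public static void main(String[] args) {
        // The swapped script from this issue: res is written before tmp.
        List<Statement> script = List.of(
                new Statement("res", Set.of("tmp")),
                new Statement("tmp", Set.of("ds1")));
        for (Statement s : order(script)) {
            System.out.println(s.target()); // prints tmp, then res
        }
    }
}
```

A cycle such as `a := b; b := a;` leaves some statements with unresolved inputs, so the sketch rejects it instead of guessing an order.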

@hadrienk
Collaborator

hadrienk commented Sep 4, 2024

Although this could be implemented, I would question the utility of such a "variable commutativity".
Do you have a practical use case where this would be useful?

@noahboerger
Collaborator Author

noahboerger commented Sep 5, 2024

Our current idea is to store multiple VTL rules separately from each other and execute them in a single engine run.
These rules may still have some interdependencies and may rely on intermediate results of other rules.
When the engine determines the calculation order by itself, as defined in the VTL standard, we do not need to consider the order when combining the rules into one script.

Additionally, I would not describe the calculation approach defined by the VTL standard as "variable commutativity." Since a result variable should always be final and cannot be overwritten, it should always be clear to the VTL user where a previous result is coming from. "... the VTL follows a functional programming paradigm, which treats computations as the evaluation of mathematical functions, so avoiding changing-state and mutable data in the specification of the calculation algorithm." (User Manual 2.0, l. 1643 ff.)
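Because each result is assigned exactly once, its origin stays unambiguous even when separately stored rules are combined in arbitrary order. The following is a minimal Java sketch of that check, not part of Trevas; the `Rule` record (an identifier plus the result names the rule assigns) and `ResultProvenance.definedBy` are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only (hypothetical types): maps each result to the single rule defining it. */
public class ResultProvenance {

    /** Hypothetical stored rule: an identifier plus the result names it assigns. */
    public record Rule(String id, List<String> targets) {}

    /** Returns result name -> defining rule id; rejects any redefinition. */
    public static Map<String, String> definedBy(List<Rule> rules) {
        Map<String, String> origin = new LinkedHashMap<>();
        for (Rule rule : rules) {
            for (String target : rule.targets()) {
                // putIfAbsent returns the existing rule id if the name is already taken.
                String previous = origin.putIfAbsent(target, rule.id());
                if (previous != null) {
                    throw new IllegalArgumentException("result " + target
                            + " is assigned by both " + previous + " and " + rule.id());
                }
            }
        }
        return origin;
    }

    public static void main(String[] args) {
        // Rules may be listed in any order; provenance does not depend on it.
        List<Rule> rules = List.of(
                new Rule("rule-B", List.of("res")),
                new Rule("rule-A", List.of("tmp")));
        System.out.println(definedBy(rules)); // {res=rule-B, tmp=rule-A}
    }
}
```

Running such a check before ordering the statements would turn an accidental redefinition into an explicit error rather than a silent overwrite.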
