Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use information of operand value to determine type for java locals #789

Draft
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

oxisto
Copy link

@oxisto oxisto commented Dec 31, 2023

This PR uses the already existing information of the operand's value type to determine the type for java locals. Previously, most of the stack / local variables were of "unknown" type.

I am not sure if this catches all the cases where type information about the operand are available, so feel free to add to this PR.

Fixes #635

@oxisto oxisto force-pushed the fix-unknown-locals branch 3 times, most recently from 58d745e to 726b6df Compare December 31, 2023 20:48
This PR uses the already existing information of the operand's value type to determine the type for java locals. Previously, most of the stack variables were of "unknown" type
.
Fixes soot-oss#635
Copy link
Collaborator

@swissiety swissiety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for your contribution, @oxisto !

  • We have to check whether / how this interferes with the BodyInterceptor that is used for typingTypeAssigner

@oxisto
Copy link
Author

oxisto commented Jan 2, 2024

I am currently trying to fix the test, the actual outcome after by patch is:

CastCounterDemos $l0,
Sub1 $stack3,
Super1[] $#l0, $#l1, $l1, $l2,
$l0 := @this: CastCounterDemos,
$l1 = newarray (Super1)[10],
$stack3 = new Sub1,
specialinvoke $stack3.<Sub1: void <init>()>(),
$#l0 = (Super1[]) $l1,
$#l0[0] = $stack3,
$#l1 = (Super1[]) $l1,
$l2 = $#l1[2],
return

But shouldn't the type of $l2 then be of Super1 instead of Super1[]? I am not an expert on Simple but I suspect that $l2 = $#l1[2] should return an index of $#l1 (which is of Super1[]), so it should be its element type, not the array type.

@oxisto
Copy link
Author

oxisto commented Jan 2, 2024

Update: This looks to me to be a bug in JArrayRef. The type is defined as the type of the base, but in my opinion it should be the base's element type. At least that would be consistent with other operands, for example in calls/invokes, where the "type" is the type of the returned function. Or is there another method I should use and the "type" property is something else entirely?

@Override
@Nonnull
public Type getType() {
return base.getType();
}

@oxisto
Copy link
Author

oxisto commented Jan 2, 2024

  • TypeAssigner

Unfortunately, it seems it does. Especially with casting, since the casted variable (which was previously unknown), now still has the type pre-cast.

@oxisto
Copy link
Author

oxisto commented Jan 2, 2024

OK it seems I misunderstood how things work. I was not using these body interceptors at all (they seem to be empty when using JavaClassPathAnalysisInputLocation with only the path argument).

After using BytecodeClassLoadingOptions.Default.bodyInterceptors I was able to get rid of all the unknowns after the type assigner, which is probably the better way of doing things rather than creating the variables with the types. But this is something you need to decide from a design point of view.

It would be good to add a remark to the doc about this or even specificy the BytecodeClassLoadingOptions.Default.bodyInterceptors in the default options when calling JavaClassPathAnalysisInputLocation without a body interceptor argument.

@swissiety
Copy link
Collaborator

1:

But shouldn't the type of $l2 then be of Super1 instead of Super1[]? I am not an expert on Simple but I suspect that $l2 = $#l1[2] should return an index of $#l1 (which is of Super1[]), so it should be its element type, not the array type.

yes $l2 should be the arrays element type ad not the array type itself. was the output from your change or from typeassigner?

2:

After using BytecodeClassLoadingOptions.Default.bodyInterceptors I was able to get rid of all the unknowns after the type assigner, which is probably the better way of doing things rather than creating the variables with the types. But this is something you need to decide from a design point of view.

direct assignment seems logical and maybe reduces the workload of the typpeassigner - that has to be investigated ;)

3:

It would be good to add a remark to the doc about this or even specificy the BytecodeClassLoadingOptions.Default.bodyInterceptors in the default options when calling JavaClassPathAnalysisInputLocation without a body interceptor argument.

as default option is/was the plan :)

@oxisto
Copy link
Author

oxisto commented Jan 2, 2024

1:

But shouldn't the type of $l2 then be of Super1 instead of Super1[]? I am not an expert on Simple but I suspect that $l2 = $#l1[2] should return an index of $#l1 (which is of Super1[]), so it should be its element type, not the array type.

yes $l2 should be the arrays element type ad not the array type itself. was the output from your change or from typeassigner?

It was from my change. It seems that the type resolver does not use the getType function of the operand, but rather has its own type lookup logic located here:

} else if (lhs instanceof JArrayRef) {
visit(((JArrayRef) lhs).getIndex(), PrimitiveType.getInt(), stmt);
ArrayType arrayType = null;
Local base = ((JArrayRef) lhs).getBase();
Type type_base = typing.getType(base);
if (type_base instanceof ArrayType) {
arrayType = (ArrayType) type_base;
} else {
if (rhs instanceof Local) {
Type type_rhs = typing.getType((Local) rhs);
// if base type of lhs is an object-like-type, retrieve its base type from array
// allocation site.
if (type_base != null
&& (Type.isObjectLikeType(type_base)
|| (Type.isObject(type_base) && type_rhs instanceof PrimitiveType))) {
Map<LValue, Collection<Stmt>> defs = Body.collectDefs(graph.getNodes());
Collection<Stmt> defStmts = defs.get(base);
boolean findDef = false;
if (defStmts != null) {
for (Stmt defStmt : defStmts) {
if (defStmt instanceof JAssignStmt) {
Value arrExpr = ((JAssignStmt) defStmt).getRightOp();
if (arrExpr instanceof JNewArrayExpr) {
arrayType = (ArrayType) arrExpr.getType();
findDef = true;
break;
} else if (arrExpr instanceof JNewMultiArrayExpr) {
arrayType = ((JNewMultiArrayExpr) arrExpr).getBaseType();
findDef = true;
break;
}
}
}
}
if (!findDef && type_rhs != null) {
arrayType = Type.createArrayType(type_rhs, 1);
}
}
}
if (arrayType == null && type_base != null) {
arrayType = Type.createArrayType(type_base, 1);
}
}
if (arrayType == null) {
return;
}
type_lhs = arrayType.getElementType();

2:

After using BytecodeClassLoadingOptions.Default.bodyInterceptors I was able to get rid of all the unknowns after the type assigner, which is probably the better way of doing things rather than creating the variables with the types. But this is something you need to decide from a design point of view.

direct assignment seems logical and maybe reduces the workload of the typpeassigner - that has to be investigated ;)

I can continue working on this PR if you want - even though my original problem was actually "solved" by just activating the type assigner. The only issue the PR has remaining is the one described above with the casts - although I am not sure if I can solve this or someone from the soot team would need to take over.

The question would be: Is there really a valid reason not to use the type assigner?

3:

It would be good to add a remark to the doc about this or even specificy the BytecodeClassLoadingOptions.Default.bodyInterceptors in the default options when calling JavaClassPathAnalysisInputLocation without a body interceptor argument.

as default option is/was the plan :)

Sounds reasonable :)

@swissiety
Copy link
Collaborator

TODO

  • check if this (i.e. direct assignment of types) leads to a similar problem that arised in an earlier version of the typeassigner algorithm

@stschott stschott marked this pull request as draft September 5, 2024 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

check LocalVariableNode.signature to avoid creating Locals with unknown Type
2 participants