
proposal: SARP Core Components #1842

Closed
wants to merge 8 commits into from

Conversation

masapr
Contributor

@masapr masapr commented Jul 12, 2023

Project Abstract

This is the follow-up to our initial research proposal, which we delivered here. The goal of this work package is to evaluate different software designs for implementing static code analysis on Substrate pallets with MIRAI. Furthermore, we want to investigate issues and bugs we found in MIRAI in the previous work package.

Pull requests of the previous grant:

Grant level

  • Level 1: Up to $10,000, 2 approvals
  • Level 2: Up to $30,000, 3 approvals
  • Level 3: Unlimited, 5 approvals (for >$100k: Web3 Foundation Council approval)

Application Checklist

  • The application template has been copied and aptly renamed (project_name.md).
  • I have read the application guidelines.
  • Payment details have been provided (bank details via email or BTC, Ethereum (USDC/DAI) or Polkadot/Kusama (USDT) address in the application).
  • The software delivered for this grant will be released under an open-source license specified in the application.
  • The initial PR contains only one commit (squash and force-push if needed).
  • The grant will only be announced once the first milestone has been accepted (see the announcement guidelines).
  • I prefer the discussion of this application to take place in a private Element/Matrix channel. My username is: @_______:matrix.org (change the homeserver if you use a different one)

@CLAassistant

CLAassistant commented Jul 12, 2023

CLA assistant check
All committers have signed the CLA.

Collaborator

@Noc2 Noc2 left a comment


Thanks a lot for the application. Let me also share it with @bhargavbh and @semuelle, who previously evaluated your delivery. I have one piece of initial feedback: the milestone deliveries are currently relatively unspecific, and you could fulfill them relatively quickly. Could you be more specific here, especially since this is already a follow-up grant? The milestone table basically lists the requirements of our contracts and should contain as many details as possible.

@Noc2 Noc2 requested review from semuelle and bhargavbh July 13, 2023 08:18
Contributor

@bhargavbh bhargavbh left a comment


In my opinion, the follow-up grant should clearly have a tool (albeit with limitations) as a deliverable. A follow-up grant on ideating solutions, without a tool that generates the MIRAI annotations for certain bug classes, is not the best way forward. The tool can assume a certain human-in-the-loop model if that makes tricky cases easier to handle. This would involve tweaks on the MIRAI side to make the analysis faster by tracking only the pallet-level dataflow graph.
Feel free to add more technical details in the milestones regarding the approach.

applications/sarp-2-swdesign.md (3 outdated review threads, resolved)
@Noc2 Noc2 added the changes requested label Jul 13, 2023
@masapr
Contributor Author

masapr commented Jul 13, 2023

Thanks for the feedback.

I updated the deliverables to be more concrete. I also included the delivery of a prototype tool.

I didn't incorporate all of your feedback, @bhargavbh. You're giving very valuable input, but my plan was to evaluate these things when we actually do the project, not beforehand. Is that OK for you?

@bhargavbh
Contributor

hi @masapr, thank you for making the changes. I think the tool should be the main deliverable and should list the vulnerability classes it supports (generating the MIRAI annotations automatically), demonstrated by a decent number of examples (ideally real-world, but synthetic examples/tests would suffice). The analysis of approaches would just be a step towards the tool. Could you please restructure the deliverables to reflect this prioritisation by adding more details on what the tool does (there is of course room to modify how it's achieved during the grant)?
Secondly, it's not convincing (from a usability point of view) to hope that the runtime pallets will not use the FRAME macros. Hence, it's better to focus on an approach where the tool works with FRAME macros.
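For readers unfamiliar with MIRAI's annotation style, a rough sketch of the kind of check such a generator might emit (for the "incorrect origin" vulnerability class mentioned in the original RFP) could look as follows. This is purely illustrative: the `Origin` type and `force_set_balance` extrinsic are simplified stand-ins rather than the real FRAME API, and the `precondition!` macro is defined locally as a no-op so the sketch compiles without the `mirai_annotations` crate.

```rust
// Stand-in for the `precondition!` macro from the `mirai_annotations` crate;
// defined as a no-op here so the sketch compiles without MIRAI installed.
macro_rules! precondition {
    ($e:expr) => {
        let _ = &$e; // MIRAI would statically verify this condition at call sites
    };
}

// Simplified, hypothetical origin type (the real FRAME origin is richer).
#[derive(PartialEq)]
enum Origin {
    Root,
    Signed(u64),
}

// Hypothetical privileged extrinsic: must only be callable with the root origin.
fn force_set_balance(origin: &Origin, new_balance: u64) -> Result<u64, &'static str> {
    // The annotation an automatic generator would emit for this bug class:
    precondition!(*origin == Origin::Root);
    match origin {
        Origin::Root => Ok(new_balance),
        _ => Err("BadOrigin"),
    }
}
```

Under a real MIRAI run, the `precondition!` would let the analyzer flag any call path that can reach `force_set_balance` with a non-root origin.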

@masapr
Contributor Author

masapr commented Jul 17, 2023

Hi @bhargavbh

OK, I have now been more concrete on the examples: we'll provide examples for 3 of the 5 vulnerability classes (of the original RFP). I have stated this explicitly in the deliverables now.

Also, I updated the approaches for the software design to include your suggestions, especially the preprocessing/annotation tool. I think it makes sense to look into this first. I still believe, though, that we have to figure out a way for the analysis not to run over specific parts of the code, simply to reduce the complexity. I don't know if this can be achieved by annotations.

To me, the main deliverable is not the tool but the documentation of the software design. The real tool will be delivered later. Here, we only show the feasibility of the approach.

semuelle
semuelle previously approved these changes Jul 28, 2023
Member

@semuelle semuelle left a comment


There were some small improvement suggestions in the evaluation of the last milestone. It would be nice to see them addressed. Other than that, I'm happy to proceed.

@semuelle semuelle added the ready for review and changes requested labels and removed the changes requested label Jul 28, 2023
@bhargavbh
Contributor

Hi @semuelle, Sabine and I had a chat recently on the technical feasibility of the proposal. There are still some high-level exploration and design decisions to be made, which need to be added as technical details to the proposal. Also, the main deliverable should be a tool and not a research report, as a tool is of more utility to the community. Can you please revert to "changes requested" status? Thanks

@semuelle semuelle added the changes requested label and removed the ready for review label Jul 28, 2023
@semuelle semuelle self-assigned this Jul 28, 2023
@Noc2
Collaborator

Noc2 commented Aug 9, 2023

@masapr I just wanted to follow up and ask about its current status.

@masapr
Contributor Author

masapr commented Aug 10, 2023

@Noc2 yes, I can give an update: after the call with @bhargavbh, our aim is a proposal with a clear tool delivery, together with a request for more funding. To make this proposal, we decided to investigate a bit more, to reduce the risk and to be able to make a serious estimate. I should be ready with an adjusted proposal in 1-2 weeks.

It also depends on the outcome of the question I asked at MIRAI

@Noc2 Noc2 added the on hold label Aug 11, 2023
@Noc2
Collaborator

Noc2 commented Aug 11, 2023

Thanks for the update. In this case, I put the application on hold for now.

@masapr masapr force-pushed the master branch 2 times, most recently from 3d704f5 to b6eebb8 Compare August 22, 2023 07:30
@masapr masapr changed the title proposal: SARP software design proposal: SARP Core Components Aug 22, 2023
@masapr
Contributor Author

masapr commented Aug 23, 2023

Hi @Noc2
Thanks for your feedback. I updated the proposal as follows:

  • I included an example in the tutorial.
  • I removed the "Engagement" point. This was a left-over from the previous version; I think with this proposal the usual engagement alongside the milestones is enough.
  • I added ink! smart contracts to the future roadmap. We haven't analyzed this yet, but you're right, this has great potential.

@semuelle
Member

Thanks for the updates, @masapr. I'll share the application with the rest of the committee.

@semuelle semuelle added the ready for review and changes requested labels and removed the changes requested label Aug 25, 2023
Noc2
Noc2 previously approved these changes Aug 25, 2023
Collaborator

@Noc2 Noc2 left a comment


I'm happy to go ahead with it. Thanks for the update.

@bhargavbh
Contributor

Hi @bhargavbh The plan is to mock the most basic functionality, induced by macros and storage. We do not plan to mock regular pallet functions. I don't have literature to point to. There's no deeper reason behind this approach than to reduce the complexity of the call graph that is being analyzed.

Hi @masapr. I am skeptical about the usefulness of the proposed tool, where the developer has to use mock-ups for basic functionalities. How trivial/difficult will this process be? Is it a matter of just replacing the dependencies, or a tussle with the Rust compiler to even get it compiled? It is hard to answer this question for any arbitrary pallet under analysis (which might use other basic functionalities than the provided mock-ups). It seems the problem statement is being adapted to match the tool (MIRAI) rather than the other way around. That said, if the committee is interested in funding exploratory approaches in the area of security tooling, this might be a good first effort and result in interesting insights.

Secondly, from my understanding, the main contribution of M1 is to write mock-ups of basic functionality at the right abstraction level, which can be substituted in various pallets. This is something that could be clarified in the deliverables, which should also mention which pallets will be supported.

@github-actions
Contributor

github-actions bot commented Aug 28, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@masapr
Contributor Author

masapr commented Aug 28, 2023

Hi @bhargavbh
I agree that it is an open question how easy the process with the compiler will be. The idea is to have a fully automated process, so that the user doesn't have to worry about it. This is why we create a copy and then adjust the Cargo.toml to point to the mock libraries. I'm not entirely sure whether this approach is too naïve and we're overlooking something, but I think it could work and should give good results.
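To make the dependency swap concrete, it could be done with a `[patch]` section in the copied pallet's Cargo.toml. This is only a hypothetical sketch: the source URL, crate names, and mock paths are illustrative, not part of the actual proposal.

```toml
# Hypothetical sketch of the described dependency swap: the copied pallet's
# Cargo.toml redirects heavyweight FRAME crates to trivial local mock crates,
# so MIRAI never descends into the real implementations. Paths and the
# source URL are illustrative.
[patch."https://github.com/paritytech/substrate"]
frame-support = { path = "../sarp-mocks/frame-support" }
frame-system = { path = "../sarp-mocks/frame-system" }
```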

I also agree that the result might still be that the complexity is too high because of dependencies (and the variety of those). But to be honest: then it is generally nonsense to do static code analysis on pallet code. The whole idea was that pallet code can be analyzed because it is not too complex in itself and doesn't have too many dependencies.

I updated the proposal text on the deliverables of milestones 1 & 2. Our plan is to have SARP running against the examples provided for the 2 vulnerability classes and to evaluate the tool against FRAME pallets. The actual adjustments needed for the FRAME pallets will then be done in milestone 3.

@masapr
Contributor Author

masapr commented Aug 28, 2023

I have read and hereby sign the Contributor License Agreement.

@takahser takahser self-requested a review August 30, 2023 13:22
Collaborator

@takahser takahser left a comment


I tend to agree with @bhargavbh here - requiring the developer to add annotations manually doesn't sound very straightforward. The interest in using a solution like this might be rather low. Hence, I don't see much value in M1.

Apart from that, I'm also concerned about the limitations that mocking the lib code brings with it.

@masapr
Contributor Author

masapr commented Sep 8, 2023

@takahser we don't want developers to add annotations manually. We cover that aspect in milestone 2.

@masapr
Contributor Author

masapr commented Sep 14, 2023

@takahser to add to my comment above: The value of M1 is to lay the foundation for the final tool and to know if the approach of mocking part of the library code actually delivers the results we hope for.

@bhargavbh
Contributor

Hi @bhargavbh I agree that it is an open question how easy the process with the compiler will be. The idea is to have a fully automated process, so that the user doesn't have to worry about it. This is why we create a copy and then adjust the Cargo.toml to point to the mock libraries. I'm not entirely sure whether this approach is too naïve and we're overlooking something, but I think it could work and should give good results.

Fair enough, worth giving it a shot. Just changing the dependency might work. However, the important question again is how comprehensive the mocks you write are, and in how many pallets they can be substituted to reduce the complexity.

I also agree that the result might still be that the complexity is too high because of dependencies (and the variety of those). But to be honest: then it is generally nonsense to do static code analysis on pallet code. The whole idea was that pallet code can be analyzed because it is not too complex in itself and doesn't have too many dependencies.

It is a bit harsh to conclude that static analysis on pallet code is nonsense because a particular tool times out. If the tool/technique times out, then generally the approach is to try and switch to a different level of abstraction to reduce complexity, one where the property (in our case incorrect origins) can still be expressed.

I updated the proposal text on the deliverables of milestones 1 & 2. Our plan is to have SARP running against the examples provided for the 2 vulnerability classes and to evaluate the tool against FRAME pallets. The actual adjustments needed for the FRAME pallets will then be done in milestone 3.

This basically highlights my point that the mock-ups you write are not universal. They have to be designed for each pallet. Writing mock-ups to bypass all the complex functionalities of pallets is not possible, because you do not know what functionality a (new) pallet uses. At best, this approach can be described as the team writing mock-ups for the 5 pallets and running MIRAI on them, and not really considered a tool. There is a very high likelihood that the mock-ups you write in M1 are not applicable to a new pallet. M2, however, can be seen as a tool in itself. If and when there is a possibility of running an analyser, the auto-generator for annotations can be plugged in.

Collaborator

@takahser takahser left a comment


@masapr I currently see no value in M1, since by itself it's useless. In theory, you could just deliver M1 and cash in USD 47k without ever delivering M2. Hence, I'm not going to approve M1.
Also, I'd like to hear your perspective on @bhargavbh's most recent comment.

@masapr
Contributor Author

masapr commented Sep 14, 2023

This basically highlights my point that the mock-ups you write are not universal. They have to be designed for each pallet. Writing mock-ups to bypass all the complex functionalities of pallets is not possible, because you do not know what functionality a (new) pallet uses. At best, this approach can be described as the team writing mock-ups for the 5 pallets and running MIRAI on them, and not really considered a tool. There is a very high likelihood that the mock-ups you write in M1 are not applicable to a new pallet. M2, however, can be seen as a tool in itself. If and when there is a possibility of running an analyser, the auto-generator for annotations can be plugged in.

Maybe this wasn't clear: we write mocks for the functions introduced by the pallet macros and for the most basic functionality of pallets, like using storage elements. There is no plan (currently) to mock all possible libraries or pallets someone might use. To make it more tangible, we commit ourselves to at least 5 working pallets from the FRAME pallets.
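To make "mocking the most basic functionality, like using storage elements" concrete, here is a minimal sketch. It is hypothetical - the real mocks would have to mirror FRAME's actual `StorageValue` API - but it shows the idea: keep the get/put interface while making the body trivial, so an analyzer never has to descend into the trie-backed runtime storage implementation.

```rust
use std::cell::RefCell;

// Hypothetical mock of a FRAME storage element: same get/put shape as a
// StorageValue, but backed by a thread-local variable instead of the runtime
// storage trie, keeping the call graph the analyzer sees trivial.
thread_local! {
    static BALANCE: RefCell<u64> = RefCell::new(0);
}

struct MockStorageValue;

impl MockStorageValue {
    // Read the stored value (trivial body, nothing for MIRAI to expand).
    fn get() -> u64 {
        BALANCE.with(|v| *v.borrow())
    }
    // Overwrite the stored value.
    fn put(val: u64) {
        BALANCE.with(|v| *v.borrow_mut() = val);
    }
}
```

Pallet code under analysis would call `MockStorageValue::get()`/`put()` exactly as it would call the real storage item, which is what makes the substitution via dependency swap possible.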

It is a bit harsh to conclude that static analysis on pallet code is nonsense because a particular tool times out. If the tool/technique times out, then generally the approach is to try and switch to a different level of abstraction to reduce complexity, one where the property (in our case incorrect origins) can still be expressed.

Sorry for being too harsh. There are two things I meant: 1. It was part of the original RFP: [Runtime Pallets] are usually concise pieces of standalone code with relatively few dependencies and clear specifications, hence tractable targets for performing static analysis and verification. So this was always the assumption. 2. I honestly believe that it is a huge problem to analyze very complex code. So if we introduce large complexity through libraries (and we cannot prevent the tool from analyzing the complex library), then we will always have a problem, no matter the level of abstraction. But I might underestimate the reduction in complexity we gain from choosing the right level of abstraction.

All in all: I believe our approach only works if the assumption:
[Runtime Pallets] are usually concise pieces of standalone code with relatively few dependencies and clear specifications, hence tractable targets for performing static analysis and verification.
holds true. If it holds true, we should be able to achieve a good analysis by mocking the most basic functions used in any pallet.

@masapr
Contributor Author

masapr commented Sep 14, 2023

@masapr I currently see no value in M1, since by itself it's useless. In theory, you could just deliver M1 and cash in USD 47k without ever delivering M2. Hence, I'm not going to approve M1. Also, I'd like to hear your perspective on @bhargavbh's most recent comment.

I could move parts from M1 to M2 and make M1 a smaller, pure research milestone with fewer deliverables. Then the damage would not be as big in case the approach doesn't work.

@bhargavbh
Contributor

Hi @masapr
Would the mock functionalities be selected after fixing the 5 pallets, or the pallets selected after fixing the mock functionality? Either way, how will this be used by a pallet developer who wants to analyse their newly written pallets?

At the fundamental level, the mock-ups are written to reduce the complexity of the analysis, right? How are these functionalities identified? I would assume by looking at the generated MIR graph (on which MIRAI acts). Why not just collapse the complex functionality (which is inlined) directly at the MIR level? The whole exercise of writing mock-ups has the same effect, except that the compressing/flattening of the graph happens at the source-code level, which requires additional replacement steps. By flattening, I mean collapsing the external inline call of the complex functionality and bypassing it.

Either way, the 'mocking functionality' approach may well work, but I do not see the advantages of the mock-ups approach. Both approaches require identifying the complex functionalities that time out, which is gathered at the MIR level and hence easier to rectify/patch at the MIR level.

As I said before, the mock-ups may be interesting for getting more insights into pallet static analysis, and I would be happy to see the approach work. But some reasoning for the rationale behind this approach, in comparison to more conventional ones, would be helpful. To summarise:

  • Do I see direct utility of the tool in the developer community?
    Probably not. If one is supposed to replace pallets with mock functionalities by changing dependencies, this is not going to be used much. Further, maintaining these mocks is even more difficult with ever-changing functionality/pallet pairs.
  • Is it a useful exercise which can provide insights and a first attempt at static analysis?
    Probably yes.

@masapr
Contributor Author

masapr commented Sep 19, 2023

I feel like the whole discussion has gone in the wrong direction. The mocks are just a technical detail of the implementation of the tool. We have no intention for users to write mocks themselves.
After analyzing how MIRAI works and talking with Herman, the main developer of MIRAI, we came to the conclusion to simply write mocks for the code that we don't want to be analyzed. But this happens in the background; the end-user should not see any of this.

We do want to deliver a useful tool here, not perform an exercise. We chose the "5 FRAME pallets" because they are real-world examples that follow a certain standard. We want to give a first proof that the tool is usable in real-world scenarios. (This is not perfect yet, of course, but a good starting point.)

@masapr
Contributor Author

masapr commented Sep 25, 2023

@bhargavbh Honestly, I really don't know what you expect from us. Do you want us to automatically identify on the MIR level which parts of the code are complex and can be left out? I don't see how we could identify those in a generic way. On the other hand, if we know which parts we want to leave out from the analysis (explicitly, after analyzing which parts lead to a certain complexity), we might as well mock them.

Contributor

@keeganquigley keeganquigley left a comment


Thanks for your thorough answers, @masapr, and for the deep dive, @bhargavbh. I'm willing to go forward with it as well.

@semuelle
Member

Hi @masapr. Thanks again for the application and thorough discussion. After internal debate, it's become clear that the application will not find the necessary support in the committee. I know that this isn't the news you were looking for, but I want to stress that we are very thankful for the insightful conversation and the work you put into the initial grant and application. I hope that this doesn't mean the end for the project, and that we will come together again in some capacity. In any case, best of luck going forward and feel free to apply again anytime.

@semuelle semuelle closed this Oct 11, 2023
Labels
ready for review
8 participants