Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize #include #83

Open
stefnotch opened this issue Apr 16, 2024 · 17 comments
Open

Standardize #include #83

stefnotch opened this issue Apr 16, 2024 · 17 comments

Comments

@stefnotch
Copy link
Contributor

stefnotch commented Apr 16, 2024

I've noticed that quite a few projects have started using naga oil. 🎉 This has lead to reasonable feature requests in WGSL language servers and WGSL tools:

Would it be possible to fully document how the #include algorithm works? And, potentially fix any rough edges that it might have at the moment?

I would be willing to help with this. Is there a preferred place for communication? :)

@stefnotch
Copy link
Contributor Author

stefnotch commented Apr 16, 2024

As for the #include algorithm and syntax:

Ideally, we'd write up a specification, and define a WGSL extension for this. This it to avoid having include statements get interpreted differently by different tools.

// Tells language servers which extension is going to be used. 
// Theoretically, someone else could define a different wgsl importing extension, with a different name.
enable wgsl_imports;

 // Now this an unambiguous meaning
#import my_module;

See also gpuweb/gpuweb#4576 (comment) for how to not do it. (me complaining about GLSL tools)

Questions that aren't immediately clear to me from the readme:

  • What is a valid location for a #import? Only at the top of a file? Or inside functions?

    • What's the formal grammar? Should an import inside a comment be excluded? Should a comment be allowed before or inbetween an import? (/* */ #import my_module and #import /* */ my_module)
  • What would happen if there's any code before an #import. It shouldn't leak into the import, should it?

    • e.g. alias vec2f = u32; is probably valid WGSL code, and can very much screw up code that comes after it.
  • What exactly can be imported?

  • How does the generation of the final shader work? I assume a fair bit of of the following is going on

    • Parsing the shaders
    • Tracking the used types
    • Name mangling: Should have a documented algorithm

@robtfm
Copy link
Collaborator

robtfm commented Apr 16, 2024

i'm completely open to adding/exposing features to aid integration directly. i'm not very keen on writing a formal spec, but happy to describe the current implementation (and potentially change the implementation if a spec does evolve).

briefly:

  • What is a valid location for a #import?

anywhere is valid provided the import begins on its own line, but they are explicitly not scoped. imports are collected by the preprocessor and apply for the whole file, regardless of the "scope" of the import, whether it is declared before/after the usage, etc.

  • What's the formal grammar? Should an import inside a comment be excluded? Should a comment be allowed before or inbetween an import? (/* */ #import my_module and #import /* */ my_module)

comments are stripped before preprocessor declarations are parsed, so /* comment */ #import thing is a valid import, // #import thing is not.
a potential exception to this (in #81) is comments within quotes - it would be useful for bevy to treat comment markers in quotes as literals instead of as comments, and since quotes are otherwise unused in wgsl/glsl we are able to choose to specify that.

What would happen if there's any code before an #import. It shouldn't leak into the import, should it?

* e.g. `alias vec2f = u32;` is probably valid WGSL code, and can very much screw up code that comes after it.

it should not affect the imported items. the imports are technically done by building a naga module for each source individually/wihtout context, and then munging them together at the end.

  • What exactly can be imported?

as you've said. other things could potentially be supported, but each would need investigation.

How does the generation of the final shader work? I assume a fair bit of of the following is going on

* Parsing the shaders

yes, individually without context

* Tracking the used types

yep, and variables, functions, consts, etc

* Name mangling: Should have a documented algorithm

we can expose this (we discussed that in another issue i think), but the current scheme is {item}{pre_marker}{b64(module)}{post_marker}, where pre_marker is a magic string that is otherwise disallowed (X_NAGA_OIL_MOD_X iirc), post_marker is X, b64(module) is the base64 encoding of the module's import path, and item is the undecorated identifier being imported.

there are some non-obvious criteria for this to produce valid identifier names in each target language (e.g. double underscores are not allowed in wgsl, naga doesn't handle items ending in digits or underscores well, putting the item first potentially helps with cache lookups, etc), and obviously if you make a proper spec the pre-marker would need to change.

I would be willing to help with this. Is there a preferred place for communication? :)

i'm normally on the bevy discord in rendering-dev (you might need to ping me though), and other people there are more or less familiar with naga oil as well.

@stefnotch
Copy link
Contributor Author

First of all, thank you so much for taking your time to respond to all of this!

i'm completely open to adding/exposing features to aid integration directly. i'm not very keen on writing a formal spec, but happy to describe the current implementation (and potentially change the implementation if a spec does evolve).

Very understandable. I'm willing to do the work of writing up a specification.

briefly:

  • What is a valid location for a #import?

anywhere is valid provided the import begins on its own line, but they are explicitly not scoped. imports are collected by the preprocessor and apply for the whole file, regardless of the "scope" of the import, whether it is declared before/after the usage, etc.

Alrighty. Would it be reasonable if the specification said something along the lines "top of a file only", and naga_oil simply ends up implementing a superset of the specification?
The intention there would be to not add tricky requirements for WGSL parsers that parse everything in one go.

* Name mangling: Should have a documented algorithm

we can expose this (we discussed that in another issue i think), but the current scheme is {item}{pre_marker}{b64(module)}{post_marker}, where pre_marker is a magic string that is otherwise disallowed (X_NAGA_OIL_MOD_X iirc), post_marker is X, b64(module) is the base64 encoding of the module's import path, and item is the undecorated identifier being imported.

there are some non-obvious criteria for this to produce valid identifier names in each target language (e.g. double underscores are not allowed in wgsl, naga doesn't handle items ending in digits or underscores well, putting the item first potentially helps with cache lookups, etc), and obviously if you make a proper spec the pre-marker would need to change.

I think there's a name mangling scheme where one doesn't need a disallowed string. For example, using the ``{item}{pre_marker}{b64(module)}{post_marker}` scheme, we could always decode it by going backwards

  1. Take the post marker
  2. Take the b64 module. The pre_marker uses characters that don't appear in base64, so that's nice.
  3. Take the pre_marker
  4. Whatever is left is definitely the {item}

Now if {item} happens to include the {pre_marker} string, then that's not a problem. It's just a funnily named item, but doesn't cause any mangling or unmangling issues.

i'm normally on the bevy discord in rendering-dev (you might need to ping me though), and other people there are more or less familiar with naga oil as well.

Sweet, I just joined that Discord :) Looking forward to seeing where this goes, hopefully it'll end up being rather useful for the lovely bevy engie.

@robtfm
Copy link
Collaborator

robtfm commented Apr 25, 2024

The problem if you don’t have a magic marker is that users can make a raw item that happens to parse as a decorated item. I don’t see any way to avoid that without denying (part of) the marker.

This could probably be managed with ast context, but I specifically tried to avoid that (naga doesn’t facilitate it currently, and it raises the bar for other language support significantly).

Would it be reasonable if the specification said something along the lines "top of a file only", and naga_oil simply ends up implementing a superset of the specification?

Sounds very reasonable to me.

@atlv24
Copy link

atlv24 commented Apr 25, 2024

are imports preprocessor aware? can you conditionally import a file? do imported files take on the same defines as the first file?

@robtfm
Copy link
Collaborator

robtfm commented Apr 25, 2024

are imports preprocessor aware?

Yes - see eg the import of VertexOutput in bevy’s pbr_fragment.wgsl

can you conditionally import a file?

Yes, as above. Naga_oil also tree-shakes though, so you often don’t need to.

do imported files take on the same defines as the first file?

Yes, there is one set of defines applied through the whole module build, otherwise chaos would ensue.

@stefnotch
Copy link
Contributor Author

stefnotch commented Apr 25, 2024

@robtfm If we mangle every identifier[1], then I think we do not need a magic marker. This does mean mangling everything in the main module as well. This makes specifying the entry point for a WGSL shader a bit more annoying.

I'll properly write it up later, and I'll also look at what Rust and Swift are doing for their name mangling, just so that I don't miss any essential details.

There's also an alternative mangling scheme, which does "escaping" instead. e.g. All identifiers in the main module are valid as is, except if we encounter something that could have a special meaning, then we escape it. Including escaping an escape sequence.
(Similar to how one escapes "quotes \" and backslashes \\ in strings".)
This doesn't quite solve the entry_point specification, because a user could still write something that we have to mangle

@compute
fn main_X_NAGA_OIL_MOD_X() {
}

[1] Yes, every identifier. Globals can be used inside a function? Well, we'd have to mangle every variable inside a function. I suppose the escaping strategy would be useful there, since it only needs to mangle a few things.

@robtfm
Copy link
Collaborator

robtfm commented Apr 25, 2024

escaping won't work with the current status quo, because the modules need to be compiled by naga which enforces wgsl identifier rules (UAX31) - anything we can use, users can use too, unless we choose to deny something to users that's otherwise valid. so practically, escaping would have to be the same as using a magic marker and denying it to users.

mangling everything would work i think. we would also be able to undo that mangling on the top-level names (at least the entry points) once we've built the all pieces to naga, to avoid the ux issue. might be quite a lot slower though..

@stefnotch
Copy link
Contributor Author

stefnotch commented Apr 26, 2024

Counterintuitively, there is an escaping scheme that should work under that constraint. (Very convenient for us)

We pick some escape character, like A (we'll pick a better one, don't worry). And then, we want to

  1. Whenever the user tries to write something special, we escape it
  2. Afterwards add the whole {pre_marker}{b64(module)}{post_marker} at the end of imported stuff

To decode it, we first look at the stuff at the end. Afterwards, we undo the escaping to get the original identifier.

As in, whenever the user writes

  • cat: remains unchanged cat
  • main_X_NAGA_OIL_MOD_X: gets escaped to main_X_NAAGA_OIL_MOD_X
  • AAcatA gets escaped to AAAAcatAA
  • dog from another module first gets escaped, and then we append the stuff dog_X_NAGA_OIL_MOD...
    ...

@stefnotch
Copy link
Contributor Author

@robtfm I've been thinking about #imports for a while, and I have one major question: How does a language server find the relevant files?

Essentially, when a language server sees #import foo, then it should not have to search through all wgsl files in all folders to find the correct one to import. Instead, it should

  • either follow some easy-to-check rule like "look for a file with the name foo.wgsl in a specific folder"
  • or use an import map, which is a file where all shaders are statically defined
  • or something else?

@robtfm
Copy link
Collaborator

robtfm commented Apr 29, 2024

There’s an issue about making a cli which links to a simple one I knocked together. That shows how I would expect the resolution to work, in a lsp context you presumably have a known project root to search from, and additional folders could be configured from plugin settings (at least in vscode, I guess other containers would have a similar mechanism)

@robtfm
Copy link
Collaborator

robtfm commented Apr 29, 2024

Given the way resolution works there’s no way to avoid searching all files, but I guess you could generate a static map upfront and track changes if that’s too slow for building dynamically.

@stefnotch
Copy link
Contributor Author

@robtfm Could we change how resolution will work in the future, to avoid searching all files?
I think the semantics of #define_import_path are suboptimal in that regard.

@robtfm
Copy link
Collaborator

robtfm commented Apr 29, 2024

The main reason for this setup was that bevy needed to reference statically included shaders that didn’t have any associated path, so needed to be “self-identifying”. That’s changed now so that they can (optionally) have a path, and will probably change more as asset-sources improve.

But disallowing the #define_import_path directive completely would probably still need some fairly substantial changes in bevy for which I’m not sure the groundwork has been completed, and for which the ux would need to be evaluated - it is convenient in practice (for users if not for lsp-writers…)

@stefnotch
Copy link
Contributor Author

I understand. Would a reasonable path forward be:

The specification won't have anything like #define_import_path, but naga_oil will still support it for as long as reasonable.
And we come up with a new mechanism that language servers can use, and Bevy can migrate to?

@robtfm
Copy link
Collaborator

robtfm commented Apr 29, 2024

If you’re sure it’s too onerous for servers to support then ok. Someone did actually post a working lsp to the bevy discord yesterday though…

@stefnotch
Copy link
Contributor Author

There is one environment that I'd like to support, where the #define_import_path strategy would be very tough. It's "building WGSL shaders at runtime in a website".

Essentially, there one would specify the shader, and then a bit of Javascript would go over it to find all the other shaders that it can import. However, one can only do fetch(some constructed url) in a website, one cannot do fetch(all files).

The other thing that I have to keep in mind is making it possible for multiple tools to deal with #include. It would certainly make it harder for tooling authors if they had to intelligently read a lot of files and folders to figure out what to do next, and occasionally, tools would screw it up and read through entirely too many folders.

Of course keeping the existing behaviour intact is also important, hence my approach

  1. Spec out nice #import semantics
  2. Implement them, and keep all the existing behaviour intact

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants