[Feature discussion] Plugin API #928

aaaaaa123456789 · 2021-10-19T03:19:59Z


(Fair warning: this is the most radical change I've ever proposed for the toolchain. I'm opening a discussion and not an issue because I know that, if something like this is real, it's probably even further away than user-defined functions were when I first asked for them.)

Looking at the recent features added to the toolchain, and the ones that have been requested, it should surprise nobody that people are using RGBDS for metaprogramming in their assembly projects. It's also known that many people use small tools in their builds that generate code or, what's harder, alter it prior to assembling.
This last task is what prompts me to open this debate. Projects sometimes write their files in some sort of "pre-assembly", i.e., assembly-like code that won't assemble as written and needs some tool's preprocessing steps to be turned into valid RGBDS input. Writing a tool that processes this "pre-assembly" correctly requires reimplementing at least a significant portion of RGBASM's lexing and parsing functionality, and even some parts of the code generator (e.g., to process constant expressions); what projects tend to do is give up and simply require their code to be written in some form that the tools will like, breaking on unexpected input. This is just as true of projects that use RGBASM's macro abilities to preprocess the code within the toolchain itself: even though recent improvements have made it possible to write complex macros more easily, macro-based build failures are nonetheless common.

Writing a preprocessing tool thus leads to an antipattern akin to Greenspun's tenth rule: any sufficiently complicated assembly preprocessor contains an ad hoc, non-spec-conformant, bug-ridden and slow implementation of half of RGBASM's lexer and parser. Instead of this madness, it seems more sensible to simply let RGBASM process the source. But this would require RGBASM to have a way to call into user code that does whatever job those tools are doing. Hence this proposal: expose a C API that libraries can implement, and let users call into it from their code. (I'm explicitly calling it a C API because that's the lingua franca of most platforms. A rewrite of RGBDS in a language like C++ or Rust should not block this, as most modern compiled languages can expose and interact with a C API; the same is true of libraries that users may write, regardless of the language they actually use.)

I'll be happy to elaborate on what an API could look like if anyone is curious. But the basic principle is this: a new keyword, something like loadmod, would load a library (.so/.dll/whatever is native for the platform) and call its initialization routine, perhaps passing it some parameters given to the loadmod statement, and passing it an opaque state object that the library can use to call into RGBDS functions. (The object doesn't have to do anything; it just has to be a pointer to a struct of function pointers so that the library can do rgbds->foo(rgbds, ...);, perhaps with some opaque internal state wrapped in a void * member that RGBDS can consume.) The library can use RGBDS's functions to define macros: instead of executing RGBASM code, those macros would call into the library when encountered. (Ideally, it should also be possible to define new block declarations, but some special grammar would be required to allow this without requiring a lexer hack to parse these new block declarations dynamically. While this is possible, I'll stick to macros for now.)
Whenever one of these macros is encountered in the code, at the point where the toolchain would normally replace the macro with its body, it instead calls the callback defined by the library, passing the macro name, its arguments, and an opaque state object (which may or may not be the same as the one before; it's opaque, after all). The library can use RGBDS's functions to inspect the arguments (for instance, requesting RGBDS to evaluate one of them as an expression and return its result) and emit code; any code emitted would be reprocessed, exactly like code emitted by a macro is currently handled.
Once the toolchain is certain it can unload the library (might be explicit, the end of the translation unit, etc.), it calls a deinitialization function so the library can release its resources. (It might also call such a function if a macro is redefined or purged by the user.)
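
To make the lifecycle above concrete, here is a toy model of it in Python. It is only a sketch of the control flow (init, macro callback, re-processing of emitted code, deinit), not the proposed C API: the names Host, define_macro, eval_expr, emit, plugin_init, plugin_deinit and sum_dw are all made up for illustration, and nothing like this exists in RGBDS.

```python
# Toy model of the proposed plugin lifecycle. Every name here is hypothetical;
# the actual proposal is a C API, and RGBDS offers nothing like this today.
from typing import Callable, Dict, List

MacroCallback = Callable[["Host", str, List[str]], None]


class Host:
    """Stands in for the opaque state object RGBASM would hand to a plugin."""

    def __init__(self) -> None:
        self.macros: Dict[str, MacroCallback] = {}
        self.pending: List[str] = []  # emitted lines waiting to be reprocessed
        self.output: List[str] = []   # lines that reached the "assembler"

    def define_macro(self, name: str, callback: MacroCallback) -> None:
        self.macros[name] = callback

    def eval_expr(self, text: str) -> int:
        # Placeholder for "ask RGBDS to evaluate an argument as an expression";
        # this toy only understands Python-style integer literals.
        return int(text, 0)

    def emit(self, line: str) -> None:
        self.pending.append(line)

    def process(self, line: str) -> None:
        name, _, rest = line.strip().partition(" ")
        if name in self.macros:
            args = [a.strip() for a in rest.split(",")] if rest else []
            self.macros[name](self, name, args)
            # Emitted code goes back through the same path, like macro output.
            queued, self.pending = self.pending, []
            for emitted in queued:
                self.process(emitted)
        else:
            self.output.append(line.strip())


def plugin_init(host: Host) -> None:
    """What a library's initialization routine might do after loadmod."""
    def sum_dw(h: Host, name: str, args: List[str]) -> None:
        total = sum(h.eval_expr(a) for a in args)
        h.emit(f"dw {total}")

    host.define_macro("sum_dw", sum_dw)


def plugin_deinit(host: Host) -> None:
    """Release whatever the plugin registered."""
    host.macros.clear()


def assemble(lines: List[str]) -> List[str]:
    host = Host()
    plugin_init(host)
    for line in lines:
        host.process(line)
    plugin_deinit(host)
    return host.output
```

Calling assemble(["sum_dw 1, 0x10", "ret"]) routes the first line through the plugin callback and the second straight to the output. In the real proposal the Host role would be played by the struct of function pointers.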

An API like this would easily enable users with complex build processes to write tools that interact with the source code of their projects without having to parse it manually, in possibly incompatible ways and without being able to take full advantage of RGBASM's source processing features — which are features that they're going to have available anyway, since the code will ultimately be processed by RGBASM after those tools are done running. Running a tool from within RGBASM allows all sorts of interesting build support processes, ranging from emitting metadata (which could then be easily inserted into the code itself) to generating data (and even code!) on the fly where needed.

I'll be looking forward to people's comments. Once again, I don't expect anything like this to happen for several years at least. This is just a way to open the debate.

Rangi42 · 2021-10-19T03:44:50Z

Maintainer

My first thought is that exposing a direct ABI to certain functions within the implementation of rgbasm sounds way too low-level; it encourages people to write C/C++/Rust/whatever programs, compile them to some .dll, get it to interact properly with rgbasm, and then we have an ABI to support...

What about a way to somehow run command-line programs on the contents of rgbasm files and/or the printed output so far, and have those programs edit or append to the assembly?

I'd have to see more examples of other projects' preprocessing workflows to know what sort of tasks they do and whether something like that would be sufficient. The only one that comes to mind is Prism's textcomp to encode ctxt "Hello" as Huffman-compressed db TX_COMPRESSED, $a, $b, $c. If rgbasm could run commands like C's system(), it might look like:

MACRO ctxt
    ; pass all macro args to textcomp.exe; SYSOUT returns its stdout as a string, which we 'db'
    db SYSOUT("textcomp.exe", \#)
ENDM

	ctxt "This toxic gas" \
	line "cloud is blocking" \
	cont "the way." \
	para "Trying to go" \
	line "through it could" \
	cont "be dangerous." \
	done

A CLI-style system would make it easier to write tools in shell scripting, Python, or whatever people are comfortable with.
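
For a rough idea of what a SYSOUT-like function would have to do under the hood, here is a minimal Python sketch. SYSOUT itself is hypothetical, and so are the helper names sysout, db_line and FAKE_TOOL below; the stand-in tool just upper-cases its first argument, since textcomp's real interface isn't shown here.

```python
import subprocess
import sys
from typing import Sequence


def sysout(program: Sequence[str], *macro_args: str) -> str:
    """Run `program` with the macro arguments appended and return its stdout."""
    result = subprocess.run(
        [*program, *macro_args],
        capture_output=True,
        text=True,
        check=True,  # a failing tool should fail the build, not emit garbage
    )
    return result.stdout


def db_line(payload: str) -> str:
    """Turn the tool's output into a db directive, one byte per character."""
    return "db " + ", ".join(f"${ord(ch):02x}" for ch in payload)


# Stand-in for textcomp.exe (an assumption for this sketch): a one-liner that
# writes its first argument back in upper case.
FAKE_TOOL = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.argv[1].upper())",
]
```

With this, db_line(sysout(FAKE_TOOL, "hi")) produces a db line for the bytes of "HI". Passing the arguments as a list rather than building a shell string avoids quoting problems with macro arguments that contain spaces or quotes.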

Also I'd like to think that rgbasm's own macros are powerful enough to get a lot done; we've added structs and lists/arrays without needing them built-in. Even for those, a function to read a file as a string could help, so macros could run on its content. (That would still depend on >255-char strings.)

1 reply

ISSOtm Oct 19, 2021
Maintainer

Careful that allowing input files to arbitrarily execute code on the user's machine is a security concern. LaTeX (whose \write18 this reminds me of all too much) requires passing a flag specifically to allow that; but even then, I fear the user may not be fully aware of the potential consequences.

I find this "not as bad" as the ABI issues from dynamic lib loading (not to mention the distribution problems associated with those 3rd-party binaries, but I digress); however, it looks enough like LaTeX that I'm wary of this feature set, given LaTeX's flakiness.
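
As a sketch of the opt-in approach mentioned above (similar in spirit to LaTeX's -shell-escape), the gate could look something like this. The flag name --allow-exec, the program name toy-asm and the ExecDisabled exception are assumptions for illustration; rgbasm has no such option.

```python
import argparse
from typing import List, Optional


class ExecDisabled(RuntimeError):
    """Raised when a source file asks to run a program without opt-in."""


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="toy-asm")
    # Hypothetical flag name; rgbasm has no such option.
    parser.add_argument(
        "--allow-exec",
        action="store_true",
        help="let source files run external programs",
    )
    return parser.parse_args(argv)


def check_exec_allowed(options: argparse.Namespace, command: str) -> None:
    """Refuse to run anything unless the user opted in on the command line."""
    if not options.allow_exec:
        raise ExecDisabled(
            f"refusing to run {command!r}: pass --allow-exec to permit it"
        )
```

This only makes the behaviour explicit; it doesn't address the concern that a user who passes the flag may not realise what the source files will then be able to do.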

ISSOtm · 2021-10-19T16:34:49Z

Maintainer

Quickly chiming in: I think that something desirable would be for RGBASM to somehow be able to export an AST (since we technically end up with one after all expansions are said and done; it just can't be generated AoT in the general case). This is useful for e.g. code analysis and highlighting.

It doesn't cover all cases, so it's somewhat orthogonal to what is being discussed, but I think it's worth mentioning. (It's another reason why I've been wanting to rewrite RGBASM from scratch, so that this could be integrated into the architecture, since the current one really doesn't play nice with that idea.)

6 replies

ISSOtm Oct 19, 2021
Maintainer

I was rather thinking about an in-memory representation and librgbasm; failing that, the export format would definitely be binary, anyway.

Rangi42 Oct 19, 2021
Maintainer

Why binary?

ISSOtm Oct 19, 2021
Maintainer

For processing efficiency, especially for the consumers.

Rangi42 Oct 19, 2021
Maintainer

That sounds like premature optimization. We don't know how large the ASTs would get, or what people would be doing with them, or where the bottlenecks would be. I would expect an AST to be on the order of the same size as the raw ASM, which is small enough that I've had Python tools that run on a whole repo of files and give good results. And I think text (JSON or custom) would have advantages for the same reasons Unix tools use it: it's human-readable with zero processing, JSON is a standard and S-expressions are trivially parseable, and the consumer doesn't need to know what particular byte values mean (let alone whether some field is 1 or 2 or 4 bytes wide) like they do with the .o file format.

ISSOtm Oct 20, 2021
Maintainer

We already know the ASTs would be large from the ASTs that we do already have. (Run a parser trace to get an idea of the thing's size.) The AST itself would be the same size in a binary format, but a text format would make the metadata overhead significantly larger.

Human readability is of no use for an internals export like an AST dump, following the same reasoning as object files. We can provide a text-formatting utility like rgbobj for manual inspection, and give that a more friendly format than JSON.
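
To put the two positions side by side, here is a small Python sketch that writes the same toy node list as JSON and as a fixed-width binary record. The node shape (kind, line, operand) and the KINDS table are invented for the example; real RGBASM ASTs would look different, so the byte counts only illustrate the per-node overhead and what a consumer has to know to read each format.

```python
import json
import struct
from typing import List, Tuple

# Invented node shape for illustration: (kind, line, operand).
Node = Tuple[str, int, int]
KINDS = ["label", "db", "dw", "instr"]


def to_json(nodes: List[Node]) -> bytes:
    """Self-describing text: field names are repeated in every record."""
    records = [{"kind": k, "line": ln, "operand": op} for k, ln, op in nodes]
    return json.dumps(records, separators=(",", ":")).encode("utf-8")


def to_binary(nodes: List[Node]) -> bytes:
    """1 byte for the kind, 4 for the line, 4 for the operand: 9 per node."""
    return b"".join(
        struct.pack("<BII", KINDS.index(k), ln, op) for k, ln, op in nodes
    )


def from_binary(blob: bytes) -> List[Node]:
    """The reader must already know the field widths and the KINDS table."""
    return [
        (KINDS[k], ln, op) for k, ln, op in struct.iter_unpack("<BII", blob)
    ]


sample: List[Node] = [
    ("label", 1, 0),
    ("db", 2, 72),
    ("dw", 3, 42),
    ("instr", 4, 9),
]
```

Comparing len(to_json(sample)) with len(to_binary(sample)) shows the size difference for this toy layout, while from_binary shows the flip side: the consumer needs the format description to make sense of the bytes.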

Rangi42 · 2021-10-19T22:29:13Z

Maintainer

A few examples from talking with ax6 about current ways people need extra build tools with rgbasm.

Compressed text

foo.asm:

FooData:
    db "Hello"
    dw 42
FooFunction:
    add hl, bc
    ret
FooCompressed:
    text "Hello,"
    line "world!"
    para "Goodbye"
    line "now<ellipsis>"
    done

Before needing compression, that would just be text EQUS "db ", line EQUS """db "\n",""", para EQUS """db "\n\n",""", done EQUS "db 0", etc. To conveniently make text compressed, this requires a textcomp preprocessor to turn foo.asm into foo.ctx, using basic recognition of those text macros (which has to either hardcode a charmap or parse the expected charmap file itself, and has no support for advanced rgbasm usage like line STRUPR("first \1 then \2") or setcharmap unusual) to transform the text into bytes (duplicating what rgbasm could already do), compress it, and replace those lines with db START_COMPRESSED, $1, $2, $3, 0.
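
A minimal Python sketch of the kind of "basic recognition" such a preprocessor ends up doing is below. It is not Prism's textcomp: the regexes, the one-entry hardcoded charmap and the function names are assumptions, and the compression step is left out. It mostly shows why anything beyond a plain quoted string trips it up.

```python
import re
from typing import Dict, List

# Hardcoded charmap entry (made-up value): this is exactly the duplication of
# rgbasm's own charmap handling that the paragraph above complains about.
CHARMAP: Dict[str, int] = {"<ellipsis>": 0x75}
# What each text macro prepends, following the EQUS definitions above.
PREFIX = {"text": "", "line": "\n", "para": "\n\n"}
TEXT_LINE = re.compile(r'^\s*(text|line|para)\s+"([^"]*)"\s*$')
TOKEN = re.compile(r"<[a-z]+>|.", re.DOTALL)


def encode(text: str) -> List[int]:
    """Map charmap tokens to their byte; everything else to its UTF-8 bytes."""
    out: List[int] = []
    for tok in TOKEN.findall(text):
        if tok in CHARMAP:
            out.append(CHARMAP[tok])
        else:
            out.extend(tok.encode("utf-8"))
    return out


def collect_text(lines: List[str]) -> List[int]:
    """Gather the bytes of one text/line/para ... done run."""
    data: List[int] = []
    for line in lines:
        if line.strip() == "done":
            data.append(0)
            break
        match = TEXT_LINE.match(line)
        if match is None:
            # Anything that isn't a bare quoted string ends up here.
            raise ValueError(f"cannot parse text line: {line!r}")
        data.extend(encode(PREFIX[match.group(1)] + match.group(2)))
    return data
```

Feeding it the FooCompressed lines works, but a line such as line STRUPR("first \1 then \2") raises ValueError, because evaluating it would need rgbasm's own expression and macro-argument handling.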

Compressed graphics

foo.asm:

MyTileset:: INCBIN "tiles.2bpp"

tiles.png: A rectangular graphic with five trailing blank tiles.

Makefile:

%.2bpp: %.png
    rgbgfx -o $@ $<
tiles.2bpp: tiles.png
    rgbgfx -x 5 -o $@ $<

It would be more convenient to specify the unique arguments at the INCBIN site.

Compressed data

foo.asm:

MyCollisionData: INCBIN "tiles.bin.pb16"

tiles.asm:

    tilecoll WALL, WALL, FLOOR, FLOOR
if VERSION == 2
    tilecoll WALL, FLOOR, FLOOR, LAVA
else
    tilecoll WALL, FLOOR, FLOOR, ICE
endc
rept 4 * 3
    tilecoll WATER, WATER, WATER, WATER
endr
    ...

If this were uncompressed, the INCBIN could be an INCLUDE, and tilecoll could just be a macro for db COLL_\1, COLL_\2, COLL_\3, COLL_\4. But to compress it, since the data depends on features of the rgbasm language, we have to assemble tiles.asm standalone (making sure to include the right macros and constants beforehand), build it with -x to get just the raw data, and compress that.

Metadata

Imagine you want to log which parts of your file are code and which are data, in a custom format like the map and sym files. You could define macros to print the appropriate lines depending on the current filename, line number, section, label scope, etc; but you would still have to redirect the printed output to a file yourself when building, and what if multiple unrelated things are being printed to stdout? The alternative would be to pre/postprocess the .asm file separately from building it, but that could potentially require parsing arbitrary rgbasm syntax.

External data

Data from a network connection, randomly generated numbers, the size of a file accessible as a numeric symbol; currently all of these need to be done via your build system. That's usually make on Linux, but Windows doesn't come with that, and even so you then have to invent intermediate file formats and extensions like myimage.pb16.size.
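
For the file-size case, the build-system workaround today tends to look something like the following Python helper, which writes a one-line include that the .asm file can then INCLUDE. The function names, the symbol name and the output filename are placeholders; it's a sketch of the current workaround (intermediate file and all), not a proposed feature.

```python
from pathlib import Path


def size_include(data_path: Path, symbol: str) -> str:
    """Return one line of RGBASM source defining `symbol` as the file's size."""
    return f"DEF {symbol} EQU {data_path.stat().st_size}\n"


def write_size_include(data_path: Path, symbol: str, out_path: Path) -> None:
    """Write that line to an intermediate file for the assembler to INCLUDE."""
    out_path.write_text(size_include(data_path, symbol), encoding="ascii")
```

The build system still has to know to run this before assembling and to rerun it when the data file changes, which is the extra bookkeeping being described.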

(All of these examples are basically file I/O, which I think could be more quickly covered by a more focused feature request. I'm sure plugins could potentially do a lot more, but that would involve identifying good API hooks throughout rgbasm, and given how we're not coming up with a bunch of neat uses for general-purpose plugins, I think it would take even longer for people to make use of it creatively.)

3 replies

ISSOtm Oct 19, 2021
Maintainer

For the compression Makefile snippet, you can remove the tiles.png explicit dependency, it's already there from the generic rule.

ISSOtm Dec 16, 2023
Maintainer

Update on the "Compressed graphics": at-files solved the issue of having to specify special flags in the Makefile since RGBDS 0.6.0.

Rangi42 Dec 16, 2023
Maintainer

Macros have also gotten more capable (and charmaps with CHARLEN/CHARSUB/INCHARMAP), so the text compression example is now doable within rgbasm.

Rangi42 · 2023-12-16T15:38:37Z

Maintainer

This is like #930 but even more powerful.

0 replies
