Add emilua to the list #95

Open: wants to merge 1 commit into base: master

@vinipsmaker vinipsmaker commented Apr 4, 2023

So, I've read the contribution guidelines, and I don't think my PR is ready for acceptance yet. However, I think opening the PR gives us a space for discussion, so it can be improved until it is ready.

For a start, the description, at least in the section “Concurrency and Multithreading”, is too long and should be cut down a little. However, I felt it was necessary to have a comprehensive description there when opening this PR so the maintainers of LewisJEllis/awesome-lua have a good grasp of what is being added to the list before we cut it down.

A different taxonomy for concurrency would split the list into “shared-memory concurrency” and “shared-nothing concurrency” instead of “coroutine-based multitasking” and “multithreading”. Coroutines and threads are different things, but a single framework can still fit in both. Emilua attacks concurrency, not threads or coroutines, so it can be included in the two lists. Leaving taxonomy nitpicking aside, I think it'd be more appropriate in this PR to only include Emilua in the section multithreading, and cut the explanation for “shared-memory concurrency” (fiber concurrency) out entirely. However, I felt it was necessary to explain it when opening this PR so the maintainers of LewisJEllis/awesome-lua have a good grasp of Emilua.

In the contribution guidelines, you guys state things like “best parts of the Lua ecosystem”, “best tools and packages to work with”, and the like. That's quite intimidating honestly, so I really felt pressured to explain Emilua in more detail so the PR is at least considered. We can cut these descriptions in half before merging the PR. I felt the need to explain Emilua more carefully especially because Emilua advances the terrains already attacked by other Lua frameworks. Emilua also attacks new terrains. As such, it'll be really useful to have such descriptions for the maintainers of LewisJEllis/awesome-lua.

For instance, LewisJEllis/awesome-lua lists both lanes and luaproc. It also hints that “[lanes uses] completely separate Lua states, one per OS thread” in contrast to luaproc, which uses a shared thread pool that schedules ready VMs. Well, Emilua can do both, so, in a way, it obsoletes both. Therefore I firmly believe that there's enough ground already to include Emilua in the list. Emilua achieves this feat by splitting the responsibility to spawn new threads and to spawn new VMs into two different functions.

spawn_vm() creates a new Lua VM. By default, this VM shares the execution context (i.e. the thread pool) of the calling VM. Therefore, it works like luaproc by default. Initially only one thread exists for each execution context, and this number can be increased by calling spawn_context_threads(). So the use case for luaproc is covered.

One may call spawn_vm() passing { inherit_context=false } to have a new execution context (a new thread pool) created to run the new VM. So the use case for lanes is also covered. It's just two functions (spawn_vm() and spawn_context_threads()), but beyond this simplicity lies a much more flexible system. One can use this same mechanism to cover use cases that neither lanes nor luaproc handles. For instance, one can reserve one thread to run the program UI (so it stays responsive) while CPU-intensive tasks compete in a separate thread pool.
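To make the layout above concrete, here is a toy model in plain Lua. It does not call Emilua at all: `new_context`, `model_spawn_vm` and `model_spawn_context_threads` are hypothetical stand-ins I made up to mirror the behaviour described for spawn_vm() and spawn_context_threads(), and execution contexts are just tables that record a thread count and the VMs assigned to them.

```lua
-- Toy model only: plain tables standing in for execution contexts.
-- None of these helpers are part of Emilua's API.
local function new_context(threads)
    return { threads = threads or 1, vms = {} }
end

-- Hypothetical stand-in for spawn_vm(): the new VM joins the caller's
-- context unless opts.inherit_context is false.
local function model_spawn_vm(caller_ctx, name, opts)
    local ctx = caller_ctx
    if opts and opts.inherit_context == false then
        ctx = new_context(1) -- a fresh context starts with one thread
    end
    ctx.vms[#ctx.vms + 1] = name
    return ctx
end

-- Hypothetical stand-in for spawn_context_threads().
local function model_spawn_context_threads(ctx, n)
    ctx.threads = ctx.threads + n
end

-- The main VM's context keeps its single thread for the UI.
local main_ctx = new_context(1)
main_ctx.vms[1] = "ui"

-- The first worker gets its own context (the lanes-like case)...
local pool = model_spawn_vm(main_ctx, "worker1", { inherit_context = false })
-- ...and VMs spawned from inside that context share it (the luaproc-like case).
model_spawn_vm(pool, "worker2")
model_spawn_vm(pool, "worker3")
model_spawn_context_threads(pool, 3) -- 1 + 3 = 4 threads for CPU-bound work
```

In this model the UI context still has exactly one thread and one VM, while three workers compete for four threads in a separate pool.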

The system is very transparent in what it does. After the program layout is chosen and the VMs and threads are spawned, there's no distinction (API-wise) between VMs from any execution context. spawn_vm() returns a communication channel with a method send(), and that's what you use to send messages between VMs.

This same system also implements the actor model, as you can include the address of a VM (the channel/handle) in the contents of a message. That's not taken care of by lanes or luaproc either.

local vm2 = spawn_vm(module)
vm2:send() -- <2>

If/when vm2 dies, the call in <2> will fail, and this kind of deadlock is avoided. That's done by neither lanes nor luaproc. I'm only giving this example to illustrate why it's desirable to have an actor system implemented in the concurrency runtime itself.
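The idea can be sketched in plain Lua with coroutines standing in for VMs. This is only a single-threaded toy: `spawn_actor`, `step`, `receive` and the handle fields are names I made up for illustration and are not Emilua's API. It shows two things from the text above: an address travelling inside a message, and a send that fails once the receiver is dead instead of queueing forever.

```lua
-- Toy single-threaded model: each "VM" is a coroutine with a mailbox.
-- spawn_actor and the handle fields are illustrative, not Emilua's API.
local function spawn_actor(body)
    local handle = { mailbox = {}, alive = true }
    handle.co = coroutine.create(function()
        body(handle)
    end)
    -- Sending fails loudly once the receiver is dead.
    function handle:send(msg)
        if not self.alive then
            error("channel closed: receiver is dead", 2)
        end
        self.mailbox[#self.mailbox + 1] = msg
    end
    return handle
end

-- Run an actor until it yields (waiting for mail) or finishes.
local function step(handle)
    if not handle.alive then return end
    local ok = coroutine.resume(handle.co)
    if not ok or coroutine.status(handle.co) == "dead" then
        handle.alive = false
    end
end

-- Must be called from inside the actor's own coroutine.
local function receive(handle)
    while #handle.mailbox == 0 do coroutine.yield() end
    return table.remove(handle.mailbox, 1)
end

local log = {}

-- The echo actor replies to whatever address arrives inside the message.
local echo = spawn_actor(function(self)
    local msg = receive(self)
    msg.reply_to:send("echo: " .. msg.body)
end)

local client = spawn_actor(function(self)
    echo:send({ reply_to = self, body = "hi" }) -- address travels in the message
    log[#log + 1] = receive(self)
end)

step(client) -- client sends, then waits for the reply
step(echo)   -- echo replies and finishes
step(client) -- client receives the reply and finishes

-- echo is dead now, so a further send raises instead of waiting forever.
local ok, err = pcall(function() echo:send({ body = "anyone there?" }) end)
```

Here the liveness check is a flag on a table. In a real runtime it has to be tracked across VMs and threads, which is why it belongs in the concurrency runtime itself.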

Emilua also uses the same API to allow one to spawn VMs into isolated Linux namespaces for sandboxing purposes. For instance, you can run media parsing libs such as ffmpeg in sandboxes to contain attackers who exploit the attack surface of buggy parsing libraries. That's a possibility that I don't see in other Lua frameworks. And it's the same API (except for setting up the new Lua VM), with channel:send() and all that. In the next releases, I plan to add support for Landlock and FreeBSD's Capsicum as well.

That's it for the shared-nothing concurrency model as attacked by Emilua. All that (except for Linux namespaces, which was only planned at the time) was ready years ago by the first Emilua release (0.1). Emilua is now at version 0.4.

Now, for the shared-memory concurrency model, Emilua has full support for fibers. So it's not only basic coroutine support that worries only about increasing the concurrency level (i.e. spawning new tasks). There's a full, established vocabulary (fibers) to tame the problems that appear in concurrent environments. There's a great article on this topic by the maintainer of a SQL driver for the Python community: https://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/. Suffice it to say that few concurrent environments take this problem seriously. Emilua does.

And then there's async IO. Async IO means IO happens asynchronously to the program execution. For some problems it does matter, but in reality very few applications care about that. What most applications really care about is concurrent IO, as in: don't block the progress of the program while some IO activity can happen in the background. For concurrent IO, operating systems offer either reactors (e.g. select(), epoll(), kqueue) or proactors (e.g. io_uring, IOCP). True asynchronous IO can happen when one uses proactors (actually the subject is not that simple, as paging and memory reservation are also involved, but I won't go into details here). In any case, true asynchronous IO gives the program the same responsibilities that one would have when dealing with threads (e.g. the NIC, which can be imagined as an “invisible thread” that runs in parallel to program execution, will be filling and reading application buffers in parallel to program execution). So, for sufficiently advanced IO, the same framework will need to deal with both: concurrency/multitasking, and IO. The two problems are too close to each other. Emilua attacks both (as that's how it should be done IMO).
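The difference between the two styles can be sketched with a fake device in plain Lua. Nothing here touches a real OS interface: `new_device`, `reactor_wait`, `reactor_read`, `proactor_read` and `arrive` are hypothetical names, and `arrive` plays the role of the kernel or NIC delivering data.

```lua
-- A fake device: data "arrives" when arrive() is called. Purely
-- illustrative; no real OS interface is involved.
local function new_device()
    return { pending = {}, ready_cb = nil, op = nil }
end

-- Reactor style (select/epoll/kqueue): the kernel only reports readiness;
-- the application performs the read itself afterwards.
local function reactor_wait(dev, on_ready)
    dev.ready_cb = on_ready
end

local function reactor_read(dev)
    return table.remove(dev.pending, 1)
end

-- Proactor style (io_uring/IOCP): the application hands over a buffer up
-- front; the "kernel" fills it and then reports completion.
local function proactor_read(dev, buffer, on_complete)
    dev.op = { buffer = buffer, on_complete = on_complete }
end

local function arrive(dev, data)
    if dev.op then
        -- The buffer is written while the application is doing other
        -- things, hence the thread-like responsibilities.
        local op = dev.op
        dev.op = nil
        op.buffer[#op.buffer + 1] = data
        op.on_complete(#data)
    else
        dev.pending[#dev.pending + 1] = data
        if dev.ready_cb then
            local cb = dev.ready_cb
            dev.ready_cb = nil
            cb()
        end
    end
end

local results = {}

local r = new_device()
reactor_wait(r, function()
    results.reactor = reactor_read(r) -- the read happens in application code
end)
arrive(r, "ping")

local p = new_device()
local buf = {}
proactor_read(p, buf, function(nbytes)
    results.proactor_bytes = nbytes -- the data is already in buf
end)
arrive(p, "pong")
```

In the proactor half, `buf` is owned by the "kernel" between `proactor_read` and the completion callback. That ownership window is what makes true asynchronous IO resemble sharing memory with another thread.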

For (networking) IO support, all the basics are covered (TCP, UDP, name resolution, etc). However, that happened only recently, as of release 0.4. That's why I think now is a good time to publicly tell the world about the project. I use Boost.Asio to back the IO implementation; it was around way before libuv and has a long history. It's important to use old cross-platform libraries to abstract differences among OSes (libuv would also be acceptable) because each system usually has many undocumented quirks that one only gets to know through exposure. For the first releases I wanted to tackle the hardest task first (a robust execution engine). After I got that out of the way, I started to focus on comprehensive IO support, which is starting to show results as of the recent 0.4 release.

So, for network IO, some support for concurrency is basically indispensable. If you do not offer concurrency support, the user will be limited to implementing very serial half-duplex protocols... which is not really much. Given that the whole focus of the first releases was literally on concurrency, you can be assured of good support in this area. For the alternatives listed in LewisJEllis/awesome-lua, we have things such as luasocket, which are popular among Lua users, but have (quite honestly!) very poor multitasking support (there's select(), and not a single scheduler to ease collaboration among library creators). Any solution mirroring the Node.js API will also be an ad-hoc solution to the problem of concurrent programming. I can discuss this topic further if there's interest, but I really think this is not the appropriate place. For anyone curious, I can't recommend the blog post I already linked earlier enough. Go on and take your time: https://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/

Another important tool to deal with IO is support for cancelling operations. IO is special because many IO operations offer no guarantee of ever finishing, so you must cancel and roll back operations that are no longer needed. The lazy way to solve this problem is to offer a timeout (as done in luasocket). This type of timeout is the right solution... but only in a very narrow space: non-scalable, low-latency-oriented solutions (e.g. game event loops that handle user input). What I see is developers offering this type of timeout not because it's the right solution (after all, they're not developing solutions for the non-scalable low-latency world), but because the implementation is just easier for them to code. Tarantool is one of the exceptions. Tarantool is a Lua framework that has been around for a long time, and it has been offering better single-VM concurrency than what I usually see in the Lua ecosystem. This lack of vision is not exclusive to Lua (this problem also permeates communities in Python and JavaScript, for instance).

As for Emilua, I just have a tight integration between IO and the fiber runtime. Interrupting a fiber will transparently cancel the IO operation that is keeping the fiber busy, roll back the fiber appropriately, and trigger every cleanup handler in the proper order for those who care about robust resource management. I haven't invented a single thing. I only used the well-documented semantics of POSIX thread cancellation, which are the right solution for non-functional languages IMO (if you have state, there's just no way around it: you do need tooling around the concurrency runtime to roll back operations and preserve the state invariants that the program relies on).
Alternatively, one might just as well defer all complexity around concurrency problems to a DB engine (as it's usually done in web frameworks), or use the actor model to restart whole actors on the first error (so you don't need to worry about broken states). These approaches are also applicable in Emilua.
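To illustrate the cancellation semantics, here is a toy fiber in plain Lua, loosely modelled on POSIX cleanup handlers (pthread_cleanup_push()), which run in reverse order of registration. It is not Emilua's fiber API: `new_fiber`, `push_cleanup`, `await_io`, `resume` and `interrupt` are hypothetical names, and the "IO" is just a coroutine yield that never completes on its own.

```lua
-- Toy fiber: a coroutine plus a stack of cleanup handlers. Names are
-- illustrative only and not taken from Emilua.
local INTERRUPTED = {} -- unique sentinel used as the error value

local function new_fiber(body)
    local fiber = { cleanup = {}, interrupted = false }
    function fiber:push_cleanup(fn)
        self.cleanup[#self.cleanup + 1] = fn
    end
    -- Suspension point: stands in for a blocking IO call.
    function fiber:await_io()
        coroutine.yield()
        if self.interrupted then error(INTERRUPTED, 0) end
    end
    fiber.co = coroutine.create(function() body(fiber) end)
    return fiber
end

-- Resume the fiber; if it died from an error, unwind its cleanup stack
-- in reverse registration order.
local function resume(fiber)
    local ok, err = coroutine.resume(fiber.co)
    if not ok then
        for i = #fiber.cleanup, 1, -1 do fiber.cleanup[i]() end
        if err ~= INTERRUPTED then error(err, 0) end
    end
end

local function interrupt(fiber)
    fiber.interrupted = true
    resume(fiber) -- wake it so the pending "IO" is abandoned
end

local events = {}
local f = new_fiber(function(self)
    self:push_cleanup(function() events[#events + 1] = "close socket" end)
    self:push_cleanup(function() events[#events + 1] = "rollback transaction" end)
    self:await_io()                -- may never complete on its own
    events[#events + 1] = "commit" -- skipped when interrupted
end)

resume(f)    -- runs until the fiber blocks on IO
interrupt(f) -- cancels the IO and unwinds the fiber
```

Unlike a timeout on a single call, the interruption here targets the whole fiber, so the transaction is rolled back before the socket is closed and the "commit" step never runs.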

I could talk endlessly about this topic, but I think I'll stop right here and see what you guys have to say. I've been studying event-oriented programming for over a decade now, and a few years back I decided to develop a solution for Lua programmers, and it's finally here.

Other requirements from the contribution guideline for LewisJEllis/awesome-lua:

  • Documentation: https://docs.emilua.org/api/0.4/ (also available as PDF/ePUB, and installed on your system as manpages when you build the project).
  • Examples: Not many, right now, but you can find them in the repo, and scattered throughout the documentation.
  • Tests: ✔️
  • Contributes heavily to the overall picture of what's available and possible with Lua: a definitive yes
  • Hosted on GitHub: I hate GitHub. It's hosted on GitLab: https://gitlab.com/emilua/emilua. If that's a problem, you can just reject the inclusion request.
  • Can be installed with LuaRocks: Nope. Every operating system exposes its own API to access system resources (e.g. IO and concurrency). Going one step further: IO and concurrency are intrinsically entangled in the design of the OS itself. You need to write in C (or a similarly native language) if you have any ambitions to offer serious support for IO. LuaRocks may work for Lua, but it's definitely an anemic system for the needs of native programming. You must install Emilua by building it yourself (or downloading pre-built binaries), and then you can run your Lua programs through the binary emilua.
  • ...or is otherwise easy to set up: There are packages for Arch Linux and Guix. I'm in contact with a friend who creates packages for Debian, and he's slowly helping me to get Emilua into the Debian repos. As time allows, I hope to do the same for other Linux distros (Fedora, Nix, ...). Windows packages are built directly from the CI, but you also need to download some DLLs from Microsoft Visual Studio to run these binaries. I have a FreeBSD system on one of my machines, and I'll be working on FreeBSD packages as time allows (Boost.Asio itself already offers support for FreeBSD, and I'm constantly consulting BSD manpages when I develop my systems, so there shouldn't be too much work to have Emilua on FreeBSD).
  • Relatively production-ready: it has been in the making for a few years already. Now it's wait-and-see. We can only call it production-ready if there's wide adoption.

@manipuladordedados

Emilua is already available for Ubuntu, FreeBSD, NixOS, and many other distributions are on the way.
