Quoth wikipedia:
In any Turing complete language, it is possible to write any computer program
Bollocks!
Okay, this is true in an abstract sense. Any computation, if it can be performed at all, can be performed in any Turing complete language. But a program is not, in general, just a computation. A program exists in an environment, and some programs need to interact with that environment.
Pick a Turing complete language like brainfuck or befunge or unlambda. How many of the standard unix utilities can you write in it?
Well, if you ignore the --help and --version options, you can write true.
You could write usable replacements for uniq and wc and a few others by splitting their command-line options into separate programs. (So you'd have uniq-c, uniq-d, uniq-cd, uniq-i, uniq-ci....) You could write a usable replacement for yes which reads a string from stdin instead of accepting it on the command line.
But echo - ls - test - even false - are all impossible. (Yes okay, you could do the same thing to echo that you did to yes. But then you haven't replaced echo, you've replaced cat.)
This Will Not Do.
And with sysfuck, It No Longer Does.
Sysfuck is a utility for bridging the gap between "Turing complete" and "actually useful". Pick a Turing complete language, and with sysfuck, it will be able to interact with the environment. Specifically, it will be able to make syscalls. Since syscalls are the basic tools for interacting with the environment, this means you'll be able to do anything that you could do in (for example) C.
You don't need to invent a new language, very like your favourite Turing complete language but actually useful. You don't need to reimplement or even recompile your favourite Turing complete language.
(Technically, Turing completeness is not sufficient. It's conceivable that your favourite Turing complete language is only capable of saying "yes" and "no". If that is the case, turn back now! For you will find no solace here. The way is closed to you; you cannot pass.)
But assuming you have a language which can read from standard input, and write any computable string to standard output, read on...
Most of the time, you won't call the sysfuck program directly. When you write a program that needs to take advantage of sysfuck, you call it using the program sfwrap, like so:
$ sfwrap program arg1 arg2 ...
In your program, printing works mostly like it would normally. If you print a byte that isn't 0x00 (ascii NUL), it will show up on the terminal, or wherever stdout was redirected to. If you want to send a NUL byte, print two of them in a row.
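The escaping rule above can be sketched in a few lines of Python. The function name escape_output is made up for this illustration; it is not part of sysfuck.

```python
def escape_output(data: bytes) -> bytes:
    """Double every NUL byte so it is printed rather than read as the start of a command.

    Hypothetical helper for illustration only.
    """
    return data.replace(b"\x00", b"\x00\x00")


# Ordinary bytes pass through untouched; each NUL becomes two NULs.
escaped = escape_output(b"a\x00b")
```

A program would pass anything it wants printed through a helper like this before writing it to standard output.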
To make a syscall, print
-
a NUL byte;
-
followed by the name of the syscall;
-
followed by another NUL byte;
-
followed by a single byte, giving the size of the data that you want to pass to the syscall (eg. 0x0c for 12 bytes);
-
followed by the data itself.
The data will contain up to four bytes for each argument that the syscall expects. The first four bytes give the first argument; the next four give the second argument; and so on. sysfuck is compiled in 32-bit mode, so all arguments to syscalls have a size of four bytes.
For example, to make the C syscall
write(2, 0x12345678, 260);
you would print, assuming a little-endian architecture, the following hexadecimal bytes:
00 'write' 00 0c 02 00 00 00 78 56 34 12 04 01 00 00
('write' means to print the five bytes 77 (w) 72 (r) etc.) Here 0c is twelve, the number of bytes that follow; 02 00 00 00 is the first argument, 2; 78 56 34 12 is the second argument; and 04 01 00 00 is the final argument.
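As a rough sketch, the message layout described above could be built in Python like this. The helper name encode_call is invented for the example, and little-endian byte order is assumed, as in the text.

```python
import struct


def encode_call(name: str, *args: int) -> bytes:
    """Build NUL, name, NUL, one size byte, then each argument as four little-endian bytes.

    Hypothetical helper; the name and signature are not part of sysfuck.
    """
    data = b"".join(struct.pack("<I", a & 0xFFFFFFFF) for a in args)
    if len(data) > 255:
        raise ValueError("the size field is a single byte, so at most 255 bytes of data")
    return b"\x00" + name.encode("ascii") + b"\x00" + bytes([len(data)]) + data


msg = encode_call("write", 2, 0x12345678, 260)
print(msg.hex(" "))
# prints: 00 77 72 69 74 65 00 0c 02 00 00 00 78 56 34 12 04 01 00 00
```

The printed bytes match the hand-written example, with 'write' spelled out as 77 72 69 74 65.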
The syscall's return value is written to your program's standard input as four bytes. If the above call succeeded, it would return 260, so you would read the bytes 04 01 00 00. If it only wrote 20 bytes, you would read 14 00 00 00.
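Reading the result back might look like the following sketch. The name decode_return is made up, and treating the four bytes as a signed little-endian integer is an assumption (it makes negative error codes come out negative).

```python
import struct


def decode_return(raw: bytes) -> int:
    """Interpret four bytes read from standard input as a signed 32-bit little-endian value.

    Hypothetical helper; signedness is an assumption of this sketch.
    """
    if len(raw) != 4:
        raise ValueError("expected exactly four bytes")
    return struct.unpack("<i", raw)[0]


full_write = decode_return(bytes.fromhex("04 01 00 00"))   # 260
short_write = decode_return(bytes.fromhex("14 00 00 00"))  # 20
```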
If the data is smaller than the syscall expects, it will be padded with NUL bytes at the end. So the above call could also have been
00 'write' 00 0a 02 00 00 00 78 56 34 12 04 01
Note that 0a matches the number of bytes sent, not the number of bytes expected.
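One way to take advantage of the padding rule is to strip trailing NUL bytes from the data before sending it. The sketch below assumes this is safe for ordinary syscalls; encode_trimmed is an invented name.

```python
def encode_trimmed(name: str, payload: bytes) -> bytes:
    """Drop trailing NUL bytes from the payload and send the shortened size.

    Hypothetical helper. It relies on the receiver padding short data with NULs,
    so it only suits calls where that padding is performed.
    """
    trimmed = payload.rstrip(b"\x00")
    return b"\x00" + name.encode("ascii") + b"\x00" + bytes([len(trimmed)]) + trimmed


full = bytes.fromhex("02 00 00 00 78 56 34 12 04 01 00 00")
msg = encode_trimmed("write", full)  # size byte becomes 0a, payload ends at 04 01
```

Only the NULs at the very end are removed; the zero bytes inside the first argument stay, because removing them would shift the later arguments.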
As well as syscalls, there are some other commands available. You call them in exactly the same way.
-
memread(char *ptr, int n) - this reads n bytes from memory location ptr and sends them to your program's standard input.
-
memwrite(char *ptr, ...) - this does the opposite of memread. After ptr, you pass a number of bytes, which get written to memory location ptr. For example, sending 00 'memwrite' 00 06 78 56 34 12 ca fe will cause the two bytes ca fe to be written to memory location 0x12345678. Because this function is variadic, the number of bytes sent is significant; null padding is not performed. Nothing is "returned", ie. written to standard input.
-
strlen(char *ptr) - this is just an interface to the C standard library function.
-
argc(void) - this returns argc, the number of command-line arguments sent to the program. The first argument is the program name, so argc is always at least 1.
-
argv(int n) - this returns the n'th command-line argument, where the zeroth argument is the program name.
-
getenv(...) - like memwrite(), you pass a string directly instead of a pointer to one. This returns a pointer to the value of the environment variable named by that string, so 00 'getenv' 00 04 'HOME' will return a pointer to a null-terminated string containing your home directory. Due to implementation details, you can only pass a string of up to 254 characters in length; if you try to pass a string of length 255, the final byte will be truncated.
-
stdout(int fd) - set the file descriptor that output from the program gets sent to. The default is 1. Like memwrite, nothing is returned.
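The two commands that take raw bytes rather than fixed-size arguments could be encoded as in this sketch. All three function names are invented for illustration, and little-endian pointers are assumed.

```python
import struct


def encode_command(name: str, data: bytes) -> bytes:
    """NUL, command name, NUL, one size byte, then the data. Hypothetical helper."""
    if len(data) > 255:
        raise ValueError("the size field is a single byte")
    return b"\x00" + name.encode("ascii") + b"\x00" + bytes([len(data)]) + data


def encode_memwrite(ptr: int, payload: bytes) -> bytes:
    """Four-byte pointer followed by the bytes to store; no padding, so nothing is trimmed."""
    return encode_command("memwrite", struct.pack("<I", ptr) + payload)


def encode_getenv(var: str) -> bytes:
    """The variable name is sent inline; the text above limits it to 254 characters."""
    raw = var.encode("ascii")
    if len(raw) > 254:
        raise ValueError("getenv names are limited to 254 bytes")
    return encode_command("getenv", raw)


memwrite_msg = encode_memwrite(0x12345678, b"\xca\xfe")
getenv_msg = encode_getenv("HOME")
```

Both results match the byte sequences given in the list above.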
Three examples are provided in the examples/ directory, to be run under sfwrap.
echo.bf is a replacement for echo in brainfuck, but without any fancy features. You should either compile it or edit the shebang (#!) line to point to whatever brainfuck interpreter you use; if you run it as sfwrap my-brainfuck-interpreter echo.bf then "echo.bf" will be treated as one of the arguments to echo. (See "Caveats" below.)
It's been tested with the bff interpreter, but hopefully works with others. Unfortunately bff doesn't handle command line arguments nicely, so the current shebang line uses my cmd utility to avoid that.
asciisf is a perl script that lets you interact with sysfuck in a human-readable manner. It takes input in the format name(data). Here name is the name of the syscall, and data is a whitespace-separated sequence of hex numbers and quoted strings. So memwrite(12345678 'hello' 0) sends 00 'memwrite' 00 0a 78 56 34 12 'hello' 00 to sysfuck.
asciisf doesn't use Readline, but you can do rlwrap sfwrap asciisf. (But not sfwrap rlwrap asciisf.)
test-sfwrap.pl is just a simple nonexhaustive test program. It should print "this should get printed" to stdout; then "with a null byte(\0) on fd 5" to file descriptor 5 (call sfwrap examples/test-sfwrap.pl 5>&1 to see this); followed by the first three bytes of its argv[0].
Also in examples is bindump.pl, even though it's not really an example. It's useful for testing sfwrap on a lower level than asciisf can. It's basically a reverse hexdump, and takes input in the form of hex numbers and quoted strings, like asciisf's data. (But with bindump.pl, hex numbers greater than ff are printed big-endianly, which is uncool.)
If you have SCons installed, you can build sysfuck by simply running scons. If not, you can do it yourself with these shell commands:
./gen_str_to_syscall.sh
gcc -o callbacks.o -c -m32 -Wall callbacks.c
gcc -o sysfuck.o -c -m32 -Wall sysfuck.c
gcc -m32 -o sysfuck sysfuck.o callbacks.o
You should probably copy sfwrap and sysfuck into your PATH. If sysfuck is not in your PATH, you can tell sfwrap where to find it with sfwrap -c /path/to/sysfuck program ....
sysfuck is essentially a UNIX filter: it takes input (as described above), and produces output (as described above). But in between it makes syscalls. Additionally, it reads and writes on file descriptors 3 and 4 instead of stdin and stdout respectively. This is so that stdin and stdout can still be used to talk to the terminal when sysfuck is being controlled by another process.
sfwrap program args... executes program with the specified arguments, but under the following conditions:
-
stdin and stdout are redirected to talk to a sysfuck process. This process has file descriptors 3 and 4 opened to talk to program, and its argv is replaced with program args.... (In particular, "sysfuck" does not appear in the argv; argv[0] is "program".)
-
File descriptors 3 and 4 are opened to wherever stdin and stdout went originally. If you need to use sysfuck, you probably can't use these, but they're there just in case. asciisf uses them.
-
The utility stdbuf is used to stop program from performing output buffering, to prevent deadlock. This affects programs which use the functions in stdio.h, which is most of them.
(Buffering causes deadlock because if program attempts to make a syscall and read the result, but the output is buffered so that no data is actually sent, then both program and sysfuck will hang waiting for the other to say something.)
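A program that does its own buffering rather than going through stdio.h may still need to flush explicitly before waiting for a reply. Here is a minimal sketch of that request-then-read pattern in Python; the function name call and its optional stream parameters are assumptions made so the example can be exercised without sysfuck attached.

```python
import struct
import sys


def call(request: bytes, out=None, inp=None) -> int:
    """Send one encoded request, flush it, then block for the four-byte reply.

    Hypothetical helper. Without the flush, the request could sit in a buffer
    while this side waits for a reply that will never come.
    """
    out = sys.stdout.buffer if out is None else out
    inp = sys.stdin.buffer if inp is None else inp
    out.write(request)
    out.flush()
    reply = inp.read(4)
    if len(reply) != 4:
        raise EOFError("the reply stream closed before four bytes arrived")
    return struct.unpack("<i", reply)[0]
```

This only fits commands that return something; for the ones described as returning nothing, waiting for a reply would hang.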
If program exits, then sysfuck will detect an EOF on its read handle and will exit with status 0. If sysfuck exits (possibly from calling exit() or from a segmentation fault), then SIGHUP is sent to program unless that process has already quit; sfwrap then exits with the same status as sysfuck.
-
Be aware that many syscalls are wrapped by glibc, and sysfuck does not provide the wrapped version. One difference is that syscalls signal errors by returning negative error codes, where glibc sets errno. For example, a C call to write() might return -1 and set errno to EBADF; the syscall itself will simply return -EBADF. Sometimes this has other consequences: the direct syscall getpriority() returns a value n between 1 and 40 on success; the glibc wrapper returns 20 - n. Consulting man pages is advised.
-
There's no way to get the value of macros like EBADF. You just have to know (or look up) what they are.
-
I don't know whether it's possible to usefully use fork(), since you would immediately get two processes reading from and writing to the same pipes.
-
sfwrap has no way to distinguish between sfwrap interpreter program and sfwrap program. So specifying an interpreter on the command line will produce a different argv from specifying an interpreter with a shebang (#!) line or using a compiled program. You can also specify sfwrap in the shebang line, like #! /usr/bin/sfwrap interpreter. In this case, interpreter will be included in the argv.
-
When using sfwrap, a keyboard interrupt will by default simply cause the program to exit with status 0. I'm not sure whether this can be overridden.
-
When you pass a byte to sysfuck to be printed (ie. a non-NUL byte), sysfuck does not check for errors after attempting to print it, so there's no guarantee that it worked.
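The first caveat, about raw return values, can be sketched in Python as follows. The function names are invented, the -4095 to -1 error range is the usual Linux kernel convention rather than anything sysfuck documents, and the errno values come from the machine running the script, which is assumed to match the one running sysfuck.

```python
import errno
import os


def check_result(ret: int) -> int:
    """Turn a raw negative error code into an exception, roughly as glibc turns it into errno.

    Hypothetical helper; the error range is an assumption based on Linux convention.
    """
    if -4095 <= ret < 0:
        code = -ret
        name = errno.errorcode.get(code, "unknown")
        raise OSError(code, f"{name}: {os.strerror(code)}")
    return ret


def nice_from_raw_getpriority(n: int) -> int:
    """Convert the raw getpriority() result (1 to 40) to the value glibc would report."""
    return 20 - n
```

So a raw write() result of -EBADF would raise an OSError whose errno is EBADF, while 260 would simply be returned.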