A virtual architecture, with its own instruction set, and a dedicated assembler with enhanced features.
Obi-Wan/vARCH
Index

1. COMPILE / INSTALL
2. WHAT IS IT
3. ASSEMBLY (two examples)
   3.1 BRIEF DESCRIPTION
   3.2 SUBROUTINES
   3.3 CONSTANTS
   3.4 PREPROCESSOR
4. ADDITIONAL FEATURES
   4.1 TEMPORARIES / Registers Auto-allocation
   4.2 CALLING CONVENTIONS
   4.3 ELF Object FILES
   4.4 SAMPLE FILES

--------------------------------------------------------------------------------
1. COMPILE / INSTALL

Before trying to compile it, remember to run autoreconf. After that it is
enough to type:

  ./configure
  make

I'm not providing a predefined way to install right now. It's more of a toy
than a tool.

--------------------------------------------------------------------------------
2. WHAT IS IT

vARCH is a virtual machine / bytecode interpreter for a virtual architecture
that I invented for learning purposes. It is not designed for performance or
flexibility. It's just a simple and easy to learn architecture.

For ease of development I also created a simple assembler. The explanation of
how to write the asm code is aided by some practical examples.

If you want to contribute to vARCH, you should ask me directly for
documentation or answers, because it is a spare time project and I don't have
much time for documentation.

What I report here are samples of the Asm language: they need to be assembled,
and the output then has to be either renamed to "bios.bin" or soft linked
under that name.

--------------------------------------------------------------------------------
3. ASSEMBLY

The first example is a simple program for calculating the first n factorials.
The "main" function is special in that it is always placed at the beginning of
the generated executable.
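For orientation, the control flow of the biosFactorial.s listing can be
mirrored in a few lines of Python. This is only a sketch, and it rests on
assumptions the README does not spell out: operands are read as (source,
destination), MO is taken to be a "more than" comparison, IFJ / IFNJ are taken
to jump when the last comparison was true / false, and the Python list stands
in for the machine stack. The function name run_factorial_bios is made up for
this illustration.

```python
MAXNUM = 12  # value of the .maxnum global in the listing


def run_factorial_bios(maxnum=MAXNUM):
    """Hypothetical Python model of biosFactorial.s (not part of vARCH)."""
    stack = []              # stands in for the stack written by PUSH
    r8 = 1                  # .init:  MOV, $1, %R8
    while True:
        r1 = r8             # .start: MOV, %R8+, %R1  (copy, then increment R8)
        r8 += 1
        r2 = 1              #         MOV, $1, %R2
        while True:
            r2 *= r1        # .iter:  MULT, %R1-, %R2 (multiply, then decrement R1)
            r1 -= 1
            if not r1 > 1:  #         MO, %R1, $1 / IFNJ, @save (assumed semantics)
                break
        stack.append(r2)    # .save:  PUSH, %R2
        if r8 > maxnum:     #         MO, %R8, .maxnum / IFJ, @end
            break
    return stack


factorials = run_factorial_bios()
print(factorials[:5])   # [1, 2, 6, 24, 120]
print(len(factorials))  # 12
```

Under these assumptions the program pushes 1!, 2!, .. , 12! onto the stack and
then halts.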
---------------- biosFactorial.s ----------------
; calculate the factorial by definition of the first 5 numbers
.function "main"
.init:
  MOV, $1 , %R8
.start:
  ; the counter is post incremented after the copy
  MOV, %R8+ , %R1
  MOV, $1 , %R2
.iter:
  MULT, %R1- , %R2
  MO, %R1 , $1
  IFNJ, @save
  JMP, @iter
.save:
  PUSH, %R2
  MO, %R8 , .maxnum
  IFJ, @end
  JMP, @start
.end:
  HALT
.end

.global
  .maxnum:
    .i32_t $12
.end
--------------------------------

Another simple program that shows how to give shape to subroutines. It also
exploits a missing feature to ease the work of outputting text: it should
launch the signal to the peripheral and then wait for the ready state before
sending another character to display, but it just pretends that a terminal
can work synchronously with the cpu.

---------------- biosTestAssembler.s ----------------
#include "std_conversion.s"

.function "main"
.init:
  MOV, @string1 , %T001
  MOV, 4 , %R6
  ; call the printing subroutine
  JSR, @print , %T001
  ; call recursive subroutine
  MOV, 4 , %T002
  MOV, %T002 , %T012
  MOV, %T012 , %T013
  MOV, %T013 , %T014
  JSR, @recursive , %T014
  ; call conversion subroutine
  MOV, @bufferEnd , %T003
  SUB, @buffer , %T003
  MOV, 15234 , %T004
  MOV, @buffer , %T005
  JSR, @integerToString , %T004 , %T005 , %T003
  ; Verify result
  MOV, %R0 , %T008
  EQ, %T008 , 0
  IFJ, @error
  MOV, @buffer , %T007
  JSR, @print , %T007
  HALT
.error:
  MOV, @stringError , %T001
  JSR, @print , %T001
  HALT
.end

.global
  ; string to write
  .string1:
    .string "test: yeah"
    .i8_t 10 ; '\n'
    .i8_t 0
  .stringError:
    .string "Error"
    .i8_t 0
.end

.function "recursive"
  .param %R1 %T001
  .local
    .decrement: .const
      .i32_t 2
    .numberOfCalls: .static
      .i32_t 0
    .lowerBound:
      .i32_t 0
    .stateRegister:
      .i32_t 0
    .localString:
      .string "local_string"
      .i8_t 0
  .end
  SUB, .decrement , %T001
  INCR, .numberOfCalls
  LO, %T001 , .lowerBound
  IFJ, @exit
  JSR, @recursive , %T001
.exit:
  MOV, %SR , .stateRegister
  RET
.end

.global
  .uselessGlobal:
    .i32_t $4
.end

---------------- std_io.s ----------------
; subroutine to call
.function "print"
  .param %R1 %T001 ; address of the string to print
  .local
    .printCmd: .const
      .i32_t 131072
    .printCmd1: .const
      .i32_t 131073
    .endChar: .const
      .i16_t 0
  .end
  MOV, 0 , %T002
.test:
  EQ, (%T001) : .i8_t, .endChar : .i16_t
  IFJ, @exit
  PUT, (%T001)+ : .i8_t, .printCmd1
  INCR, %T002
  JMP, @test
.exit:
  RET, %T002
.end
-----------------------------------

BRIEF DESCRIPTION OF EXAMPLES:

There are 8 data registers, R1, R2, .. , R8, and 8 general purpose address
registers, A1, A2, .. , A8. There is also a dedicated stack pointer register,
SP, plus USP, which lets privileged code access the unprivileged stack
pointer. (Issuing SP when privileged and issuing SP when unprivileged give two
different stack pointers.)

There is another kind of register, Tn, where n is any number: these are the
temporaries, which are later assigned and optimized by the register allocation
logic.

As noted above, there are two kinds of execution: privileged and unprivileged.
The bit identifying this is in the status register (SR). This was inspired by
the Motorola 68k family.

Data and registers can be accessed in several ways, each with a different
meaning. Addressing registers and memory is again very similar to the way it's
done in M68k asm, so the arguments of the instructions can be:

- IMMEDIATE: prefixing constants with $, which uses them as numerical
  constants, or using @ (at) prefixed labels, which accesses the pointer to
  the label.
- REGISTER: prefixing registers with % (e.g. %R1, %A3, %SP ..), which accesses
  the content of the registers.
- DIRECT: the explicit address of a memory location is used, either through
  . (dot) prefixed labels, which access the content of the address in memory
  marked by the label, or by prefixing constants with >, which uses them as
  addresses (and accesses the memory location pointed to by the constant).
- REGISTER INDIRECT: surrounding registers with round parentheses ( and )
  (e.g. (A1), (SP) ..), which accesses the address in memory pointed to by the
  content of the register.
- REGISTER MEMORY INDIRECT: accesses memory using the address contained in a
  memory location that is itself pointed to by an address in a register. This
  is achieved with the double parenthesis operator: ((A1)), ((SP)), etc.
- DISPLACED: applies a displacement to the address contained in a register and
  accesses the result, e.g. -20(SP), 3(A1), etc. The displacement is a 24bit
  signed integer.
- INDEXED: uses one register as the base address of a given location, and
  another register as the index for accessing elements in an array,
  e.g. (A1)[R1]
- DISPLACED+INDEXED: the two combined, but the displacement in this case is a
  16bit signed integer.

There are also other useful methods for reducing code size and speeding up
execution with compact code:

- prefixing (or postfixing) registers with + or -, which in turn pre (post)
  increments or decrements the accessed data:
  + if prefixing with %Reg# it will modify the content of the register
  + if prefixing with (Reg#) it will still modify the register content (and
    not the pointed data)
  + if doing this on (address), the data pointed to by the address will be
    modified

------------------
Some words about SUBROUTINES:

Subroutines have a simple but strict syntax. You have to specify the
subroutine name after the .function marker, and mark the end of the subroutine
with the .end marker. The name of the subroutine needs to be enclosed in
double quotes, like a normal string. It is not possible to write free code
outside of subroutines, since the main subroutine serves as the entry point
for the binary.

When assembled, subroutines export their name as a global label, and they can
be accessed through the . and @ semantics. To call a subroutine just issue:

  JSR, @name_of_subroutine [, arguments...]

JSR stands for "Jump to SubRoutine".
The subroutine calling conventions are managed by the assembler.
This gives the programmer the freedom to specify the arguments (expressed as
temporaries) on the very same line as the function call.

Functions and subroutines can also be recursive and can have local variables,
which are allocated on the stack each time the function is called. The stack
allocation is managed automatically by the assembler.

------------------
Some words about VARIABLES, CONSTANTS and their attributes:

The .global marker opens the area where global/public constants should be
located. This area is terminated with an .end marker. Variables and constants
declared there are visible from everywhere.

Local/private constants for a function can be specified only after the .local
marker; that area is terminated by the .end marker, just like functions and
globals. These memory locations can also be marked with the .static specifier,
which behaves exactly like in C. As an optimization, memory locations declared
.const are allocated like static variables, which avoids a stack allocation at
every function call. Local variables on the stack are supported only together
with the register auto-allocation logic, because they require some code
transformation.

Labels make data visible and easily accessible from outside. Finally, labels
can also receive attributes such as .size and .num. The .size attribute gives
the size, in bytes, of the data reached through the label. The .num attribute
tells how many times .size is repeated; it is not fully used yet, but will be
used extensively for arrays.

------------------
Some words about PREPROCESSOR:

As can be seen from the example, it is possible to include code from other
files and compile it as if it were part of the current source. If an error
occurs in an included file, the report states in which file and at which line
it took place.

--------------------------------------------------------------------------------
4. ADDITIONAL FEATURES

4.1 - TEMPORARIES

As noted before, there is a new kind of register, T[0-9]*, which serves as a
temporary to be processed later by the allocation logic. Avoiding explicit
register use for temporaries frees the programmer from one of the most tedious
tasks, letting the assembler decide which register allocation to use. For
maximum performance, the programmer can still use direct register assignment
when needed, while letting the assembler manage all the rest.

4.2 - CALLING CONVENTIONS

The arguments of a function are explicitly defined using the ".param" keyword.
The returned element (if any) is always in register R1. Caller-save registers
are R[1-5] and A[1-5], while callee-save registers are R[6-8] and A[6-8].

4.3 - WRITING ELF OBJECT FILES

Compiling with the "-c" flag now outputs an object file that respects the ELF
file format. This object file can then be linked against other object files to
generate proper executables, using the very same syntax as many other
well-known compilers.

4.4 - SAMPLE FILES

Some sample .s files are in the main folder:

  testNewAssembler.s
  std_io.s
  std_conversion.s

The user can have a look at these examples.

------------------
If you find this description incomplete, let me know.
I hope you will enjoy this playground for learning.
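
As a small illustration of the calling conventions in 4.2, the split between
caller-save and callee-save registers can be written down in Python. This is a
sketch only: the two register sets come from the text above, while the helper
names caller_must_save and callee_must_save are hypothetical and are not part
of the vARCH assembler, whose real bookkeeping is not described here.

```python
# Register classes as listed in section 4.2.
CALLER_SAVE = {f"{bank}{n}" for bank in "RA" for n in range(1, 6)}  # R1-R5, A1-A5
CALLEE_SAVE = {f"{bank}{n}" for bank in "RA" for n in range(6, 9)}  # R6-R8, A6-A8


def caller_must_save(live_after_call):
    """Registers still needed after a JSR that the callee may clobber
    (hypothetical helper)."""
    return sorted(set(live_after_call) & CALLER_SAVE)


def callee_must_save(written_by_callee):
    """Registers a subroutine writes and has to restore before RET
    (hypothetical helper)."""
    return sorted(set(written_by_callee) & CALLEE_SAVE)


saved_by_caller = caller_must_save({"R2", "R6", "A1"})
saved_by_callee = callee_must_save({"R2", "R6", "A8"})
print(saved_by_caller)  # ['A1', 'R2']
print(saved_by_callee)  # ['A8', 'R6']
```

With temporaries, this bookkeeping is exactly what the programmer no longer has
to do by hand, since the assembler manages the calling conventions.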