diff --git a/147mat.html b/147mat.html new file mode 100644 index 0000000..295fb8e --- /dev/null +++ b/147mat.html @@ -0,0 +1,37 @@ + + + + +1 4 7 + + + + 1 4 7 M R A I T X + M R + A I + T X + 98 76 43 SU Sδ +  I find myself often needing a matrix, or transpositioning +  to properly solve a math problem or fit things across the monitor, etc. + +  I call it a "1 4 7 Matrix". Consider an example where +  everything runs vertically. I have to write it as so, +  to understand it: + +  -- intended position -- matrix position + +  1 2 3 1 4 7 +  +  4 5 6 2 5 8 +  +  7 8 9 3 6 9 + + +  Part1, Part2 and Part3, in vertical order- map to the first 3 elements +  in the row, a horizontal order. And so on... +  We want vertically-running Parts, to appear in intended order, +  as the vertical ordering wraps back to the top. +  And in this example, they wrap back every 3 elements. + + + diff --git a/404.html b/404.html new file mode 100644 index 0000000..830330c --- /dev/null +++ b/404.html @@ -0,0 +1,15 @@ + + + + + +404 redirect + + + + +  "You've been redirected" + +  This is a test. + + diff --git a/COPYING b/COPYING new file mode 100644 index 0000000..3d3ac2d --- /dev/null +++ b/COPYING @@ -0,0 +1,18 @@ +Corresponding SPDX licence +BSD 3-Clause "New" or "Revised" License +Licence ID +BSD-3-Clause +Licence text + +Copyright (c) . All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. + +3. 
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + diff --git a/Fixedsys Excelsior 3.01.woff b/Fixedsys Excelsior 3.01.woff new file mode 100644 index 0000000..9a59b20 Binary files /dev/null and b/Fixedsys Excelsior 3.01.woff differ diff --git a/NOR.xml b/NOR.xml new file mode 100644 index 0000000..da93191 --- /dev/null +++ b/NOR.xml @@ -0,0 +1,57 @@ + + + + + + A + !A + + + + + A + B + A OR B + + + + + A + B + A AND B + + + + + A + B + !(A OR B) + + + + + A + B + !(A AND B) + + + + + A + B + A XOR B + + + + + temporary description + /l/NOR.html + + + diff --git a/README.md b/README.md new file mode 100644 index 0000000..59dc66f --- /dev/null +++ b/README.md @@ -0,0 +1,7 @@ +# a + +- 👀 examples and assorted snippets for anyone to peruse or learn from + this will be presented as a website for science, thank you + + + for links on c standards go here diff --git a/arr.html b/arr.html new file mode 100644 index 0000000..bc49705 --- /dev/null +++ b/arr.html @@ -0,0 +1,280 @@ + + + + + +fly^p4rray + + + + +
--- fly^p4rray ---
+ + + int arr[3]; + +If `int arr[3] = ...` were given an initializer, we could access any element by its +index and receive a normal integer value. + +Otherwise, when a local array is left uninitialized, its elements hold indeterminate +values. The contents, and even the size, can also be determined by the initializer, e.g. + + int arr[] = {3, 4, 5}; + +The size of our array is just the number of said elements, the 0th element being +the first number or character. + +Speaking of which, we can also create individual characters and access the third +element (or whichever you want) like so... + + char arr[3] = {'a', 'b', 'c'}; + +In C, a single character in single quotes is a character constant: an integer +whose value is the character's code (its ASCII code on ASCII systems). + +However, the behavior changes when you use a character inside an +array initializer without quotes, like the following... + + char arr[3] = {a}; + +In this specific case it's not interpreted as a character code at all. An unquoted `a` is an +identifier, so this only compiles if something named `a` has already been declared. + +The following declares a single character and assigns it. It's not an +array, and it directly stores the number 5, that is, the character whose +code is 5 (decimal), not the digit character '5'. + + char exp = 5; + +Another way to store a character's code is an escape sequence. This stores the character associated with +(\a) and not literally the character 'a'... + + char exp = '\a'; + +`\a` in ASCII represents the alert (bell) character, code 7. And the following refers +to the character constant (character literal 'a') + + char exp = 'a'; + +Single quotes (') are used for character constants. They are meant to hold a single character; +if you put multiple characters within them, the resulting value is implementation-defined. + +A character constant represents a single, unchanging value. +For example, the character literal 'a' always corresponds to the ASCII value 97, +and as such this value is constant and will never change. 
+ +Double quotes (") are used for string literals. They represent a sequence of characters, +including spaces and special characters. + +In C, character variables are a type of integer. Whether plain `char` is signed +(able to hold both positive and negative values) or unsigned is implementation-defined, which matters when you assign, e.g. ` char exp = 0xFF ` +... (compilers for x86 typically treat char as signed by default, while many ARM targets treat it as unsigned) + +The actual value stored depends on whether char is signed or unsigned. + +If char is signed, the upper bit (most significant bit) of 0xFF is interpreted as a +sign bit (1 for negative). On the usual two's complement systems with an 8-bit char this gives -1, +though strictly the result is implementation-defined. If char is unsigned, all 8 bits of 0xFF are used to +represent the value, resulting in 255 (decimal). + +Confusing right?!... so you can see there's quite the amount of variation when it +comes to characters and arrays. And we still haven't shown what string literals look like yet... + + char str[] = "abc"; + +Character arrays initialized from a string literal are null-terminated, which means that +an additional null character (\0) is implicitly added at the end of the +initialized elements, even though you don't explicitly include it in the initializer. Here `str` has four elements. + + char element = str[2]; + +The statement `char element[2] = str[2]` wouldn't be a valid way to copy a +single character from a string to a character array, because an array can't be initialized from a bare char value. + +`\0` (Null character or NUL) is a single character literal in C represented by \0. +It is the character with the value 0 (zero). In strings, +(\0) is the null-terminating character, which signifies the end of a string. + +NULL on the other hand is a macro defined in several standard headers such as stddef.h and stdio.h +(typically as (void *)0). It is used to represent a null pointer, one that +does not point to any object or function. It's commonly used in pointer contexts +to signify that the pointer does not currently refer to any valid memory location. 
+ +returning to array declaration, + + int arr[3]; + +The name "arr" itself represents the address of the first element of the array. +In other words its equivalent to a pointer to the first element of the array. +Therefore, you can simply use the array name `arr` to point to the array... + + int *ptr = arr; + +`int arr[3]` is considered an "integer array", `int[]` with the name "arr" of +size (3)... As such its created a pointer to the integer named "arr" +(a contiguous block of memory with (3) integers)... however if we +had declared `int *` (a pointer variable w/ the address of an integer) +this would change what we could do with it. + +We could use the integer array to point to an array, or be used for dynamically +allocating memory w/ a variable-size array (which we explain below) + +while we have an array we can also declare a pointer to an array of an +unspecified size, cause why not, we're showing all possible entrees you can serve, + + int (*ptr)[]; + +you can decompose this to cast it, `NULL` being in the place of the variable +you would cast it to (come back to this if it dont make sense yet) + + ptr_cast = (int (*)[])NULL; + +or you can allocate memory for n arrays of e.g. 3 integers each: + + int (*ptr)[3] = malloc(n * sizeof(*ptr)); + +if that is too confusing, just write it down and come back to it later. +also, we go over malloc later. And we have a page dedicated to explaining it. + +now going back to our valuable array, to review what we've learned so far.. + + int arr[3]; + +we can conclude that theres a difference between indexing, accessing and +initializing. in the expression arr[3], the "3" is sometimes called the "subscript". + +The term "subscript" is used to describe the index or indices that are used +to access a specific element of an array. 
Something else interesting, C allows +for arithmetic expressions within the brackets of an array's indexing, like so: + + buffer[bytes_read - 1]; + +This is the case as well for expressions assigned to a given variable. +I'm not gonna go into how it might appear in an entire program, I just +thought I'd touch upon that. Continuing on... + + int *arr = (int[]){1, 2, 3, 4, 5}; + +This is a compound literal. They can be of any object type (so not void, and not a variable-length array) +as the purpose is to create an anonymous instance `(type){...}` +of a specific type and initialize its members inline. + +Compound literals let you create an array or structure on the fly +and use it immediately, which can be particularly useful for +initializing pointers or passing temporary arrays to functions. + +When calling a function you can pass a compound literal as an argument. +A returned value can itself be a compound literal (for structs, which are returned by value). Or you can have a compound +literal that initializes nested arrays or struct members. + +We touch upon designated initializers in struct.html +but I thought I would mention it here first, to establish when you'd +want to use them, the main reason being when you want to initialize specific +elements or members of an array or structure, without explicitly +initializing every element. For an array the designator is an index in brackets: + + int arr[5] = {[0] = 1, [3] = 3}; + +For a struct the designator is a member name (here `struct outer` is assumed to have a nested struct member `n` and an int member `c`): + + struct outer s = { + .n = {.a = 1, .b = 2}, + .c = 3 + }; + +For the sake of demonstration I made it nested, `.n = {.a = 1, .b = 2}` + +*note: designated initializers apply to structs, unions and arrays, but +you cannot apply the `.member` form to a simple array like `char outer[5]`; arrays use the `[index] = value` form. + +When you use designated initializers you have the added bonus that +everything you didn't name is initialized to zero (the remaining elements are set to zero) + +This rule is rooted in the language's rules for object initialization and applies +to arrays of any storage duration. 
In static or global arrays all elements are +initialized to zero if not explicitly initialized. + +only explicitly initialized elements are set, and uninitialized elements are +default-initialized (that is they are set to zero) + +according to the C99 standard (ISO/IEC 9899:1999), section 6.7.8: + + "if there are fewer initializers in a brace-enclosed list than there are + elements or members of an aggregate, the remainder of the aggregate shall be + initialized implicitly the same as objects that have static storage duration." + +also arrays declared at global or static scope are automatically initialized to zero too +however that is not the case if they are declared within a function and without `static`. + +both compound literals and designated initializers were introduced in C99, so you wont +see them in ANSI C. i can see how it might be confusing to have all of these different +versions of C, but here's all you need to know; the core syntax and semantics of C +have remained remarkably stable since C89, so i think you cant go wrong using any +version -from ANSI C and upwards. + +ANSI C is of course colloquially used to refer to C89 as ANSI played a big role in creating +the first C standard, but it in turn became an international effort under ISO. Therefore +some consider the term "ANSI C" to be incorrectly used, although its caught on w/ enough ppl + +returning where we left off w/ arrays and the number of ways/ directions they can go, all of +them up until now are considered fixed, wherein their size is determined at compile time... + + int arr[var] = {a, b, c}; + +thats not to say you cant allocate a new array w/ a larger size and copy the existing +elements from the old array to the new one (then, continue adding elements) + + int MaxSize = 5; + int *FixedArray = (int*)malloc(MaxSize * sizeof(int)); + +`malloc(MaxSize * sizeof(int))` dynamically allocates memory for an array of MaxSize +integers. 
`sizeof(int)` is used to determine the size of each integer element in bytes, +ensuring that enough memory is allocated for the entire array... + +While we're here, I can show how to make a variable-length array (VLA, added in C99), whose size comes from a variable at run time + + int arr[MaxSize]; + +You would iterate over the array in a for loop, filling it in + + FixedArray[i] = i + 1; + +and when it's full, allocate a larger array, copy the existing elements to the new array, and add the +further elements to the new array. + +`(int*)` before malloc converts the `void *` that malloc returns into an `int *` +(casting the result of malloc). You'll see this a lot in C. In C itself the cast is optional, since `void *` +converts implicitly to any object pointer type; it is required in C++. For arithmetic types the compiler also converts implicitly and may issue warnings for implicit +conversions that could result in data loss... there's a hierarchy (the conversion rank) that governs those conversions. + +And `malloc` is just a function w/ a `size_t` (an unsigned type at least 16 bits wide) parameter. + + void *malloc(size_t size) + +But just to push things off into a more regular area of mind, I want to show an example +of a program, as I think the inclusive nature of it is what makes it more distinct. + + #include <stdio.h> + + int getIndex(int param) { + return param % 5; + } + + int main(void) { + int array[] = {1, 2, 3, 4, 5}; + int parameter = 7; // Example parameter for a subsequent argument... + + // Reading an array element using a function's return value as index + int result = array[getIndex(parameter)]; + + printf("Value at index %d in array: %d\n", getIndex(parameter), result); + + return 0; + } + +7 mod 5 equals 2 and, following along w/ the rest of the logic, why does it say the array variable's value is "3"? + +It was one of the very first things we said about arrays and their unique properties, and it should be obvious: +you're not counting over twice, but looking at the array's actual index #2. Indexing starts at 0, so the element at that position is the third one. 
+ +see malloc on page8, or see struct on page5 + + diff --git a/asm.html b/asm.html new file mode 100644 index 0000000..01941c8 --- /dev/null +++ b/asm.html @@ -0,0 +1,533 @@ + + + + + +assembly + + + + +I have made a new assembly page, but its still under construction, les you view it in the meantime. If youve come here by accident, +you can continue on to the next page to learn about bit manipulation and attributes. + +I'm still debating how i want to present the information here. We'll likely go over assembly/assembler, disassembly, instructions, +registers, immediate values, subroutines, control flow, atomic operations, call stack, directives, non-section directives, headers, +segments, sections, relocation, general architectural information, etc. + +You interpret this instruction [reg/val]->[destination] reading from left to right. While in Intel +syntax, the destination goes in place of the [reg/value], so it appears as: instruction [destination]<-[reg/value] + + .global _start # This is the start (equivalent of `int main`) + hello_str: + .ascii "12345\n" # A string of characters (in this case numbers) + .text # This is a "section", where the code will reside + _start: # Start execution here + movq %rsp, %rbp # To save the stack pointer into the base pointer (%rbp) + + # Write string to stdout + movq $1, %rax # This corresponds to the write system call (sets %rax to 1) + movq $1, %rdi # This corresponds to file descriptor 1 (stdout) + leaq hello_str(%rip), %rsi # Our string goes to %rsi (a pointer to hello_str) + movq $6, %rdx # Our string length, as we move (n) amount of characters (our string has six total) + syscall + + movq $60, %rax # This corresponds to the exit system call + movq $0, %rdi # Exit Code (success) + syscall + +In assembly, when you use a dollar sign w/ a symbol, e.g. `$counter`, it represents the immediate value of the address of that symbol. 
+This means it's using the address of the memory location `counter` rather than the content that's stored at that memory location. +When used without a dollar sign, it represents the content stored at the memory location. You'll have to know what each keyword +is, what its for and how its used. For example you have directives, symbols, flags, string literals, operands (i.e. +labels, registers, values, memory addresses, etc), instructions, conditional codes, macros, a type attribute +(which specifies the nature of symbols), etcetera. + +If you want to see more examples that elaborate on whats going on, see this page on assembly + +Makes sense so far, right?.. I think ive mentioned before, that you compile code with -o and it also specifies the name of the program. +And, you can also use -c which lets you compile it to an object file, but without linking. When you create an assembly program, +you have to do this explicit object/linking step yourself, which can be accomplished without using -c. In GCC, -o not only specifies the +name of the output executable, but, when used with "AS" in the following example, it specifies the name of the output object file too. + + AS = as # Assembler + LD = ld # Linker + CFLAGS = -g # Flags + + hive: hive.o + $(LD) -o hive hive.o + + hive.o: hive.s + $(AS) -o hive.o hive.s + + clean: + rm -f hive hive.o + +Now just to give a brief explanation of some things from our example before we unload pandora's box... + +`.global _start` is a directive that makes `_start` visible to the linker, indicating that `_start` is the entry point of the program. + +`_start` is the label defining the entry point of the program. When the program is executed, the operating system begins execution here. + +The beginning sections define/organize various parts of the program into distinct areas that in turn say how the program is stored in memory. +Segments refer to parts of a program that are used for organizing and managing different types of code and data. 
They are a broader concept +that describe the executable image itself and how memory is managed or mapped into the address space for a process in memory, how code and +data are laid out in it. + +For example, there is a text segment that appears in the final executable that holds the machine code instructions. When the program is +loaded into memory, the text segment is where the code is placed. When you assemble and link your program, the `.text `section is +translated into machine code and placed into the text segment of the executable file. An example of a section is `.strtab` (String Table) +which contains string data (functions and variables) used by symbols in the symbol table, which is a data structure used in compilers and +linkers to manage and track symbols (such as variable names, function names, and other identifiers) + +When making a syscall, the syscall number is placed into the %rax register. This number tells the kernel which system call to execute. +You can interact with syscalls inside of assembly on linux. There is a table from syscall_64.tbl. This file defines the system call +table for 64-bit systems. Each entry in the table corresponds to a specific system call, including its ID (number), name, and other attributes, +connecting the system call number to the actual function implemented in the kernel. + +For example, the "sys_read" entry point refers to the specific function that handles the read system call. It allows user-space programs to +read data from a file descriptor into a buffer (`read(fd, buf, count)` is the user-space version of the function) Other syscalls are: +"1" associated w/ "write", "2" associated w/ "open", "3" associated w/ close, etc. At the time of writing this, there are approximately +456 system calls that are to be placed in the %rax register. + +The syscalls.h header file typically declares the prototypes for system calls and sometimes includes necessary macros and definitions. 
+The SYSCALL_DEFINEx macros are used in the kernel source code to define the entry points for system calls. The "x" in SYSCALL_DEFINEx stands +for the number of arguments the syscall takes (e.g., SYSCALL_DEFINE0, SYSCALL_DEFINE1, etc.). The macros expand to define the actual function +that implements the system call; the syscall table then associates that function with its syscall number. + + /* simplified; the real kernel macros expand to more than this */ + #define SYSCALL_DEFINE0(name) \ + asmlinkage long sys_##name(void) + + SYSCALL_DEFINE0(getpid) + +This defines a system call getpid that takes no arguments and returns the process ID. When making a syscall that works on a file, the file descriptor is +passed in the %rdi register. Here's how some common syscall arguments map to registers: for example, when making a read syscall, %rax +contains the syscall number, %rdi (first argument) contains the file descriptor, %rsi (second argument) contains the buffer's address +(buffer pointer), and %rdx (third argument) contains the count (the number of bytes to read) + +Speaking of, in the context of x86 assembly and architecture, there's a flags register, the EFLAGS (Extended Flags) register, which is used to +hold the status and control flags for the processor. The associated flags are specific to x86 (e.g. `clc`, clear carry flag, which is useful +for preparing the status flags for subsequent arithmetic or logical operations, ensuring that the carry condition is explicitly handled for the +needs of the program). These flags are set and tested by assembly instructions. GAS (the GNU assembler), or whatever assembler you choose, +will provide the instructions to manipulate these additional features of the architecture. 
+ +Mastering low-level system architecture and code analysis requires more than just knowing how to write instructions; it involves a deep +understanding of the underlying CPU architecture, interactions, CPU front end, how data is structured and managed, and how it interprets +and how it executes code. Each of these architectural features—that is, registers, selectors, descriptors, stack frames, opcode encoding, +data types/structures, paging structures, condition codes, exceptions and interrupts, CPUID, global, local and interrupt descriptor tables +and task state segments, etcetera—is what you'd need to know to understand x86 and assembly in general. + +I'll just very briefly discuss a couple of what those that i said. The Interrupt Descriptor Table (IDT) is a data structure used by the CPU to +map interrupts and exceptions to their corresponding handler routines. When an exception or interrupt occurs, the CPU looks up the IDT to find +the address of the appropriate handler that should be executed... + +Global descriptor tables define global memory segments and their attributes, while local descriptor tables define local memory segments, +specific to individual tasks. The task state segment is a special segment that contains information about a task’s state. This includes CPU +register values, stack pointers, and other information needed to resume a task after a context switch. We wont be going over context switches. +

Summary

+These segments are of course different to the sections or segments (parts of an executable or object file) that we are going to be +referencing in the context of ELF format. Instead, these segments are part of the CPU’s memory management and protection mechanisms. +Theres alot more to x86 architecture that we wont be going over here. Its best that we dont turn this into a bottomless pit of information, +and focus on how to read and write assembly code, as well as how to debug code on your own. + +Disassembly is the process of converting machine code back into assembly language. A disassembler reads the binary machine code and +translates it into human-readable assembly instructions. The assembler ("as" in GNU toolchain, masm, nasm, etc.) converts this assembly +into machine code, producing an object file (`.o` extension). An object file contains information such as the sections we talked about, +including headers with metadata, a symbol table for linking and debugging, relocation information for address adjustments, debugging +information for source code mapping, and a string table for names used in the object file. + +Even though an object file contains machine code with the aforementioned context, it isn't fully ready to run because the addresses of +variables, functions, and other resources aren't yet fully determined. That is, it does not define an entry point, and does not have +a program header yet. It does contain unresolved symbols and relocation entries, which arent resolved until the linking phase. + +This is where relocation comes into play. We'll talk more about this later. For now, lets go into the contents of an +assembly file, then afterwards i will touch on the relationships between assembly, disassembly and the ELF file structure. + +Disassembling your assembly or compiled code shows you both the instructions and the machine code that implements them. 
+Not only are there hundreds of different x86 instructions, there can be dozens of different machine code encodings for +a given instruction (more on this later) + + ASM MACHINE CODE DESCRIPTION + add 0x03 ModR/M Add one 32-bit register to another + mov 0x8B ModR/M Move one 32-bit register to another + mov 0xB8 DWORD Move a 32-bit constant into register eax + ret 0xC3 Returns from current function + xor 0x33 ModR/M XOR one 32-bit register with another + xor 0x34 BYTE XOR register al with this 8-bit constant + +

Registers

+General-purpose registers (64-bit Registers) may look like: %rax (accumulator register), %rbx (base register), %rcx (counter register), +and %rdx (data register). Additional 64-bit Registers are: %rsi (source index register), %rdi (destination index register), +%rbp (base pointer register) and %rsp (stack pointer register) + +16-bit and 8-bit Versions (lower part of the corresponding 32/64-bit register): +- %ax, %ah, %al: Accumulator (full, high, low) +- %bx, %bh, %bl: Base (full, high, low) +- %cx, %ch, %cl: Counter (full, high, low) +- %dx, %dh, %dl: Data (full, high, low) +- ... and so on. + +Special-purpose registers: +%rip: Instruction pointer (contains the address of the next instruction to be executed) +%rsp: Stack pointer (points to the top of the stack) +%rbp: Base pointer (used to point to the base of the current stack frame) +%flags: Flags register (contains various condition code flags) +%rflags: Full register including flags +%r8 - r15: Additional general-purpose registers + +Segment registers: +%cs: Code segment +%ds: Data segment +%ss: Stack segment +%es, %fs, %gs: Extra segments (often used for additional purposes like thread-local storage) + +Suffixes like b, w, l, and q denote the size of the data being operated on: "b" (byte) is 8 bits, "w" (word) is 16 bits, "l" (long) is 32 bits, +"q" (quad) is 64 bits. Load/Store instructions, for example: "movb" moves a byte of data, "movw" moves a word of data, "movl" moves a double +word (or long) of data, and "movq" moves a quad word of data. + +Other data movement variants are the "movs" instruction for moving and optionally sign-extending or zero-extending data from one location to another. +In simpler terms its used to move data between strings (we'll explain the terms above later) + +Control Transfer Instructions: jmp (unconditional jump), je, jne, jg, etc. 
(conditional jumps), call (to call a procedure/subroutine), +ret (to return from a procedure) + +Conditional Move Instructions: cmov (conditional move based on flags) + +Calling a procedure, "subroutine", or what I'll just be referring to as a "function", transfers control to a specified address and saves the return +address, allowing the program to return to the original point after the subroutine completes its execution. This function is called `print_hello`: + + print_hello: + # Write string to stdout + mov $1, %rax # syscall number 1 (write) + mov $1, %rdi # file descriptor 1 (stdout) + lea msg(%rip), %rsi # address of the string (a `msg` label is assumed to exist) + mov $13, %rdx # number of bytes to write (msg is assumed to be 13 bytes long) + syscall + ret + +The call instruction handles pushing the return address, e.g. (call print_hello). It invokes a function from within _start or another function, +and the ret instruction is used at the end of the function. ret pops the return address from the stack and jumps to that address, effectively +returning control to the point right after where the call was made. So, you do not need to manually push or pop the return address onto the +stack when using call and ret instructions. + +`push` and `pop` are used when you need to manually manage data on the stack. These instructions are useful for saving and restoring the +values of registers, passing parameters to functions, or managing local variables. + +Unlike higher-level languages, assembly doesn't have a built-in structure or union. Instead, you manually manage memory and access +fields using "offsets". Control flow in assembly often involves manipulating flags and using conditional jumps to change the execution path. + +Arithmetic Instructions are: add, sub, div (unsigned division), imul (signed multiplication), mul (unsigned multiplication) + +Packed decimal operations are essential in applications where exact decimal representation is important. Unlike binary floating-point arithmetic, which can +introduce rounding errors in decimal calculations, packed decimal arithmetic ensures precision by maintaining the decimal format within operations. 
+ +Packed decimal operands, also known as packed Binary-Coded Decimal (BCD) operands, handle decimal arithmetic operations in a way that's directly aligned w/ +decimal digits. Each decimal digit is stored in a 4-bit nibble (half of a byte). This allows two decimal digits to be stored in a single byte. +For example, the decimal number "93" would be stored as 0x93 in packed decimal format, where 9 is represented by 1001 and 3 by 0011 in binary. + +BCD (Binary-Coded Decimal) is a binary-encoded representation of integer values where each digit of a decimal number is represented by its own +binary sequence. Packed BCD as mentioned has two decimal digits per byte, whereas unpacked BCD has each decimal digit stored in a separate byte. +Operations on decimal formats often involve specific instructions designed to handle the peculiarities of decimal arithmetic: + +AAD (ASCII Adjust AX Before Division), i.e. the `aad` instruction, adjusts the AX register to prepare for a division of BCD numbers. It converts +two unpacked BCD digits (one in AH, one in AL) to a single binary value in AL before a division is performed. If you have unpacked BCD digits in AX and you need to divide them, AAD converts +them to binary form so that the division can be performed correctly. Note that `aad` is not available in 64-bit mode. + +Floating-Point Instructions: fld, fstp (load and store, for floating-point values), fadd, fsub, fmul, fdiv (floating-point arithmetic) + +Floating Point Registers are used by the x87 floating-point unit (FPU) to perform floating-point arithmetic. In x86 architecture, the FPU is exposed as +the x87 FPU stack, which consists of 8 registers (st(0) through st(7)), each 80 bits wide. The x87 registers work as a +stack, where operations typically push and pop values to and from the stack. Instructions like fld (load), fadd (add), fsqrt (square root), and +others manipulate these registers. 
+ +For historic value, x87 refers to the specific co-processor model number, the 8087, which was the first FPU (released in 1980) designed to +work alongside the 8086/8088 CPUs. The 8087 handled floating-point arithmetic that the base 8086/8088 CPU did not directly support. P.S. +advanced features like out-of-order execution, superscalar architecture, and dynamic branch prediction didn't come out until much later. + +Logical Instructions: bitwise AND is `and`, bitwise OR is `or`, bitwise XOR is `xor`, bitwise NOT is `not` + +Bit manipulation and common idioms: shl, shr (shift left/right), rol, ror (rotate left/right) + +XOR (`xor`) can also be used to zero out a register (e.g. xor %eax, %eax), and is a common idiom in assembly. XOR-ing a register with itself +always yields 0, so the result doesn’t depend on the previous value of the register. This means that using `xor` to zero a register can break data +dependencies, allowing for better pipelining in modern CPUs. + +Setting a register to -1 (all bits set to 1) is often done with (`or`) or (`not`). The `test` instruction is similar to `and` but doesn’t +store the result, just sets the flags. It’s often used to check if a register is zero: + + test %rax, %rax + jz zero_label # Then jump if zero + +Multiplication by a power of 2 can be done more efficiently with a shift left operation. + + shl $3, %rax # Multiply %rax by 8 (2^3) + +These instructions are used to zero-extend or sign-extend values from smaller to larger registers. + + movzbl %al, %eax # Zero-extend 8-bit value to 32 bits + cbw # Sign-extend %al to %ax (convert byte to word) + +For an 8-bit "signed" integer, the range is from -128 to 127. The number -12 (decimal) is represented in binary as 11110100; when extended +to 16 bits using sign extension, the result would be 11111111 11110100 (the most significant bit is "1", indicating a negative number). +The original 8 bits are preserved, and the additional 8 bits are filled with 1s to maintain the negative value. 
+ +Sign extension is used when you need to preserve the sign (positive or negative) of a value when converting it from a smaller size to a larger size. +We mentioned sign extension earlier. There are other instructions it's performed with, like cwd (convert word to doubleword), cdq (convert doubleword +to quadword), and cqo (convert quadword to octoword). You can also move with a sign extension, i.e. movsx (move with sign extension) and movsxd (move with sign +extension doubleword). Sign extension is always used with signed data. + +Truncation, or reducing the size of a value by discarding higher-order bits, can lead to loss of data. Zero extension is the counterpart of sign +extension: in the same way we filled the additional 8 bits with ones, zero extension fills them with "zeroes" instead. Zero extension (e.g. movzx) converts a smaller +unsigned value to a larger size, and thus it must always be used with unsigned data. + +`nop` (no operation) does nothing other than advance to the next instruction. It is used for padding instructions, often in aligning code +or creating delay loops. + +Comparison Instructions: cmp to compare two operands + +String Instructions are: cmps (compare strings), scas (scan string) + +Stack Instructions are: push (push data onto the stack), pop (pop data from the stack) +

Call Stack

+The call stack is divided up into contiguous pieces called stack frames ("frames" for short), wherein each frame is the data associated with +one call to one function. The frame contains: (1) the arguments given to the function, (2) the function's local variables, and (3) the address +at which the function is executing. + +When your program is started, the stack has only one frame (that of the function "main"). This is called the initial frame or the outermost frame. +Each time a function is called, a new frame is made. Each time a function returns, the frame for that function invocation is eliminated. +If a function is recursive, there can be many frames for the same function. The frame for the function in which execution is actually occurring +is called the innermost frame. This is the most recently created of all the stack frames that still exist + +There's a conceptual idea about how the stack grows from the *"bottom->up"*... Higher memory addresses are at the "bottom" of the stack, and +lower memory addresses are at the "top." *push* meaning it goes onto the stack, *pop* means it gets removed, hence "popped-off the stack" +—which means it gets popped off from the "TOP" of the stack—where the lower memory addresses are at... Thus the stack pointer then gets incremented +(increased in value) to point to the new top element, where memory is descending, as we've so aptly illustrated. + +Each architecture has a convention for choosing one of those bytes, whose address serves as the address of the frame. Usually this address is kept +in a register called the frame pointer register, while execution is going on in that frame. + +The memory allocator (malloc, free, etc. in C) can check if there is enough space to expand the heap. The OS can enforce limits on stack size (e.g., +via ulimit settings in Unix-like systems) to prevent the stack from growing indefinitely. If a collision is imminent, the OS can terminate the process +or raise an error to prevent corruption. 
+ +The compiler generates code that manages stack allocation and deallocation for function calls. It inserts instructions to adjust the SP register and +manage the stack frames. The memory allocator handles requests for dynamic memory allocation. The internal mechanism of our memory allocator keeps +track of free and allocated memory blocks within the heap using data structures such as free lists (data structures used by memory allocators to manage +and organize available memory blocks of specific sizes within the heap) or binary trees. + +When a program requests memory, the allocator finds a suitable free block, marks it as allocated, and returns a pointer to the program. When memory is +freed, the allocator marks the block as free and may merge adjacent free blocks to reduce fragmentation. + +The OS manages the overall memory space for each process. It provides system calls like brk and sbrk to increase the size of the heap. Modern systems +may use more advanced mechanisms like mmap for large allocations. The OS ensures that the heap and stack do not collide by imposing limits on their +growth and monitoring their usage. + +There are two main kinds of interrupts, "software" and "hardware" interrupts. Software interrupts are raised by the "int" assembly instruction, i.e. they are +triggered by instructions within the program (e.g. system calls/requests to the OS). Hardware interrupts are generated by external devices or internal processor +events (such as keyboard input, timer events, disk operations). Interrupt Service Routines (ISRs) are special routines that handle interrupts. +

More Instructions

+Atomic Operations: +These are some of the instructions used for parallel and atomic operations. They provide mechanisms for ensuring atomicity, synchronization, and +ordering of memory operations in multi-threaded or multi-processor environments. + +lock: This prefix is used to ensure atomicity when performing operations on memory locations shared between multiple processors. + +Atomic Compare-and-Swap Instructions (atomic across processors when used with the lock prefix): +cmpxchg: Performs a compare-and-swap operation on a memory location. +cmpxchg8b: Performs an 8-byte compare-and-swap operation on a memory location. +cmpxchg16b: Performs a 16-byte compare-and-swap operation on a memory location (available on 64-bit CPUs). + +Atomic Increment and Decrement Instructions: +lock inc: Atomically increments the value of a memory location. +lock dec: Atomically decrements the value of a memory location. + +Atomic Exchange Instructions: +xchg: Exchanges the contents of a register with a memory location atomically (with a memory operand, the lock is implicit). +xadd: Exchange and add operation (atomic with the lock prefix). The register receives the original value of the memory location, and the +memory location receives the sum of the two. + +Fence Instructions: +mfence: Memory fence instruction ensures that all memory operations before the fence are globally visible before any memory operations after the fence. +lfence: Load fence instruction ensures that all load operations before the fence have completed before any load operations after the fence. +sfence: Store fence instruction ensures that all store operations before the fence are globally visible before any store operations after the fence. + +Unique Instructions, e.g., lea: Load Effective Address computes the address of a memory location and loads it into a register, but it does not access the +memory at that address. It performs address calculation and is often used for arithmetic operations that involve memory addresses. +

More Registers

+SIMD Vector Registers, for example: 64-bit MMX (mm0-mm7) i.e. MultiMedia eXtensions perform operations on multiple integer values simultaneously and +in parallel, such as w/ packed bytes, words, and doublewords. + +SIMD Floating-Point Vector Registers are used for SIMD (Single Instruction, Multiple Data) operations, which can perform the same operation on multiple +data points simultaneously as well. Examples include 128-bit XMM registers (used with SSE and SSE2 instructions), 256-bit YMM registers (used with AVX and +AVX2 instructions), and 512-bit ZMM registers (used with AVX-512 instructions) (e.g., %xmm0–%xmm15, %ymm0–%ymm15, %zmm0–%zmm15) + +Instructions like addps (Add Packed Single-Precision Floating-Point Values), mulps (Multiply Packed Single-Precision Floating-Point Values), vaddps +(Vector Add Packed Single-Precision Floating-Point Values) operate on these registers. + +Control Registers are used to control various aspects of the CPU's operation, such as enabling protected mode or paging. Examples include CR0, CR2, CR3, and CR4. + +Debug Registers are used primarily for debugging purposes. They allow setting hardware breakpoints and control debugging features. Examples include DR0, DR1, +DR2, DR3, DR6, and DR7. + +Model-Specific Registers are used to control and report on various CPU-specific features, such as performance monitoring, power management, and system +configuration. They are often accessed using the rdmsr and wrmsr instructions. + +Table Registers include the Global Descriptor Table (GDT), Local Descriptor Table (LDT), Interrupt Descriptor Table (IDT), and Task Register (TR) + +

Directives

+Directives provide additional information to the assembler, helping with data allocation, defining sections, etc... +The following are specific to `.section ...`, e.g. + + .section .text + +.data defines a section for initialized data +.bss defines a section for uninitialized data (BSS is Block Started by Symbol) +.text defines a section for the program's code + +There are quite a few more than this of course... Let's describe each section: The Data Section is reserved for initialized data, constants, and possibly +space for uninitialized data (BSS). For example, in + + .section .data + mylabel: + .long 1, 2, 3, 4 + +mylabel is the label for an array of 32-bit integers (.long directive), and in this context it initializes the data w/ 1, 2, 3 & 4 + +The Text Section (Code Section) contains the executable code. A Function Prologue prepares the function for execution. It saves register values that need +to be preserved (push). It also allocates space for local variables and function parameters. The Function Epilogue cleans up after the function execution. +It deallocates space for local variables and parameters (add esp), restores saved register values (pop) and returns from the function (ret). +

Non-Section Directives

+.if, .elif, .else, .endif, are conditional directives that allow you to include or exclude parts of the assembly code based on certain conditions. +.set defines a symbol with a value, similar to .equ, but can be redefined within the same assembly source. +.align, aligns the data or code to a specified boundary. This is useful for optimizing memory access or satisfying some required width. +.offset is used to calculate the offset of a symbol relative to a base address, often in conjunction with linker scripts. +.global declares symbols as global, making them accessible from other files or modules. For example, .global _start makes the _start symbol available +for linking; .local marks a symbol as local to the file, meaning it is not visible outside of it (restricted to the current file) +.extern declares symbols defined in other files (external and defined in another file) +.comm declares a common symbol, which is a global symbol that is allocated space in memory. The linker will resolve this symbol. +.equ defines a symbolic constant. For example, .equ BUFFER_SIZE, 1024 creates a symbolic constant named BUFFER_SIZE with the value 1024. +.type sets the type of a symbol. Commonly used with ELF file formats to define symbol types (e.g., STT_FUNC for functions). +.size specifies the size of a symbol. This is useful for debugging information and certain linker operations. +.file sets the current file name for debugging information, helpful for tools that process debug information. + +.macro and .endm, define and end macros... and .rodata is associated w/ readonly data, much like the `const` keyword +.weak declares a symbol as weak, meaning it can be overridden by a symbol of the same name with higher precedence. +.previous reverts to the previous section settings; Useful when you have multiple sections in the same file and want to switch to an earlier section. 
+`.hidden _internal_symbol` is an example of marking a symbol as hidden from the dynamic linker, preventing it from being exposed in shared libraries. + +You can use directives like .byte, .word, .long, and .quad to define sequences of memory with specific initial values. Each directive is used to reserve +and initialize a block of memory with data of different sizes. + +.space is for reserving a `n` amount of space in the section it's used (typically .bss for uninitialized data) and is a more general directive to operand + + .section .bss + fd: .space 8 + +This reserves 8 bytes (quadword) for the file descriptor... There's many reasons you might want to reserve something. In this case, you store the +file descriptor for later use. This can be useful if you need to keep the file descriptor around for multiple operations, wherein it needs to be accessed +or modified in different parts of the program, or maybe you just want to clearly delineate the logic this way. +

More Assembly/Instruction

+In many situations it may be necessary to use the Scale, Index, and Base (SIB) byte when you need to access memory locations based on a combination of +registers and constants. An example is for arrays and multi-dimensional data, where the SIB byte provides a direct and efficient way to calculate the +address of elements based on their index and the size of each element. This is important when dealing with large datasets or when performance is critical. + + mov eax, [ebx*4 + esi] + +Here (in Intel syntax; the AT&T form is mov (%esi,%ebx,4), %eax), ebx is the index register, 4 is the scale factor (since each integer is 4 bytes), +and esi is the base register that points to the start of the array. + +Many instructions use a single byte for the opcode. For example, the mov instruction can use the 0x8A opcode for moving data between a register and memory. +Some instructions require a two-byte opcode. For example, movzx is encoded with the 0x0F escape byte followed by a second opcode byte. +The MOD R/M byte follows the opcode (whether one or two bytes) to specify the details of the operands. For example, for the instruction `mov %ebx, %eax`, +the MOD R/M byte specifies that both operands are registers. The MOD R/M byte is just a part of the instruction encoding that specifies how operands are +addressed in an instruction. + +More specifically ModR/M specifies what the source and destination are. Separating it into its constituent parts (MOD, REG, R/M): MOD (2 bits), which acts as +a selector (a field within a byte(s) that specifies a particular option or operand), indicates whether R/M is treated as a plain register or a memory +address. It also determines if there is additional data, such as displacement bytes for memory addressing. REG (3 bits) specifies a register involved in the +operation and determines the column in the ModR/M table. The final field, R/M (3 bits), specifies the other operand (a register or a memory +address, depending on MOD) and selects the row of the ModR/M table. Whether REG is the source or the destination depends on the opcode. 
+ +This is just one part of the machine code instruction format, which refers to the binary encoding of instructions, including prefix bytes, opcodes, etc., +I'm not gonna go into every part as it would take significantly longer to explain. Look online to learn about it. +

Executable and Object File Information

+When you run the objdump command on a compiled binary, you're inspecting the low-level details of the binary, including its disassembled code. +It will show disassembly information, such as for a given function e.g. + + 0000000000400b00 <do_something>: + ... (disassembly of the function) + 400be0: 48 89 e5 mov %rsp,%rbp + 400be3: 55 push %rbp + +The line with `0000000000400b00 <do_something>` indicates the starting memory address of the function do_something (within the binary), +w/ the hex address 0x400b00 showing where the function begins in the program's memory space. The following lines are machine instructions translated +into assembly language. The left side (400be0, 400be3) shows the address at which each instruction occurs; subtracting the function's start +address gives its offset within the function. The right side shows the machine code as hexadecimal bytes (e.g., 48 89 e5) and the corresponding assembly instruction (e.g., +mov %rsp,%rbp). The objdump command can give you a lot of information about a program, including a comprehensive view of the binary's structure +and contents. +

ELF File Structure

+The next question we need to ask is, "How is this data structured, and how is it executed?" An ELF (Executable and Linkable Format) file serves as +a container for compiled code and data, allowing programs to be executed by the operating system. It contains sections for code, initialized and +uninitialized data, and metadata needed for execution. For executable files, it includes details such as the entry point, where the program +starts, and instructions for loading and linking dynamically. Essentially, an ELF file provides the structure necessary for a program to run, +whether as a standalone executable or as part of a larger application. + +The ELF header is placed at the very beginning of the file, before everything else, so you won't see it in a disassembly listing. Instructions like mov, call, jmp, etc., +directly manipulate program state and control flow, and are tied to specific segments loaded into memory. Sections are named blocks like .text, +.data, .bss, .rodata, .symtab, .strtab, etc. and the ELF header helps make sure that these sections are correctly placed and managed during program +execution. + +A Program Header describes a segment within the ELF file, i.e. how portions of the file should be loaded into memory, and a Section Header defines +attributes for each section within the ELF file. + +Instructions like mov -0x20(%rbp), %rax or mov 0x98(%rax), %rax are accessing specific memory locations relative to the base pointer (rbp) or other +registers (rax, rdx, etc.). The offsets and addresses used in these instructions (a negative offset in the first case) align with the segment and section +definitions in the ELF structure (Program Header and Section Header), for the proper memory map/access during execution. + +The ELF file contains a symbol table that lists all the symbols referenced or defined within the file. Each symbol entry has a name, type +(function, variable, etc.), and potentially a section index indicating where it's defined. 
It may also contain one or more relocation sections. + +Many symbol table entries are associated with a section. This association tells the linker (during program creation) where to find the symbol's +definition within the object file. The section index within the symbol table entry points to the corresponding section header in the object file, +allowing the linker to resolve references between symbols across those files. These sections hold entries that specify how to adjust symbol +references based on their relocation type. You wont typically see them directly in a segment/section view when looking at functions, etcetera + +

Relocation

+Relocations are entries within the ELF file that instruct the linker/loader on how to adjust symbol addresses at runtime. While these entries might +reference the symbol itself, they are distinct from the symbol information that you see. For example, say we have an ELF formatted file generated +from a program called hive that references a function do_something defined in a shared library mysharedlib.so. The symbol table in hive +would have an entry for do_something, and there might be a relocation section indicating that references to this symbol needs to be adjusted by a +certain value when loaded into memory. + +Each entry in a relocation table contains information such as: location, type and symbol. Location refers to the address within the section where +the relocation should be applied, specified by the offset field. An offset is the address within the executable where the symbol reference needs to +be adjusted. + +The relocation type is part of the info field and tells the linker how to process the relocation, a key part of how the info field is used. +Some relocation types might involve adding the symbol's value to the offset. `info` encodes the relocation type (like, adding a base address, +absolute address or relative address) and an index into the symbol table (symbol index). Some relocation types might involve adding the symbol's +value to the offset. The symbol index within the info field points to the symbol table entry from which the value will be used for relocation. + +Linking is the process that's responsible for resolving placeholders and offsets using the information in the relocation tables (to produce the +executable w/ all of its addresses set). Function calls and variable references contain placeholders or offsets. Offsets are relative distances +from a certain reference point, typically used within the same module. 
When generating machine code, the compiler inserts this placeholder (often +a zero or an address that can be easily identified as needing replacement) wherever the actual address of a function or variable is needed. + +These placeholders indicate unresolved references, which must be replaced with the addresses during the linking stage. Sometimes, especially for +internal references within the same object file, the compiler uses offsets (relative addresses) as opposed to the absolute addresses. Offsets can +indicate the distance from a certain base address (like the start of a function) to where the actual code or data resides. + +Relocation is thus the process of adjusting these addresses so that the code can run properly at the point where everything is put together. +So to reiterate, first the compiler translates source code into object files. Each object file contains machine code, symbol definitions (functions and +variables it defines), and symbol references (functions and variables it needs). These symbols aren’t fully resolved; their addresses are placeholders. + +The linker takes multiple object files and combines them into a single executable or shared library. During this process, the linker resolves all symbol +references by updating the placeholders with actual memory addresses. The linker updates the machine code with these addresses so that function calls and +variable accesses point to the correct locations. Therefore, the relocation tables in the object files guide the linker on where adjustments are needed. + +After relocation, only the addresses are present in the final executable. Here's some extra points to consider: We know sections in an ELF file are +defined in the section header table. Each section has an address field called sh_addr in its section header. The purpose of these addresses is to +facilitate linking and relocation processes. 
sh_addr, is where the section should reside in the virtual address space of the process when loaded into +memory, sh_offset (file offset) is the position within the ELF file where the section's data starts. The section header has several fields like this. + + diff --git a/asm2.html b/asm2.html new file mode 100644 index 0000000..7623924 --- /dev/null +++ b/asm2.html @@ -0,0 +1,45 @@ + + + + + +assembly2 + + + +Note, you can convert any C file to assembly w/ gcc -S hive.c -o hive.s + +Lets begin to experiment. We can try to generate an assembly out of a simple C source file. +At that point, we can try mirroring what we began doing by adding the _start to it somewhere... + +When you run the compiled binary, it starts executing from the function _start provided by the +C runtime, which performs various initializations, like setting up the stack and environment, etc. + +After these initializations, the CRT's _start calls main(), the user-defined entry point. + +`Scrt1.o` is the default entry point provided by the C runtime, which gcc links against by default. +If we were to provide our own _start symbol, it would conflict with this predefined entry point. + +Our custom program must be able to bypass this process, and by using the -e _start flag, you tell +the linker to treat your custom _start function as the entry point, instead of the default entry +point provided by the C runtime. We must also construct the Makefile differently by compiling w/ +`gcc` specified, telling gcc not to use its default startup files, instead, to use our custom +entry point. 
+ +all: hive + +hive: hive.o + gcc -o hive hive.o -nostartfiles -e _start -lc + +hive.o: hive.s + as -o hive.o hive.s + +clean: + rm -f hive hive.o + +-nostartfiles tells gcc not to use the standard startup files (like Scrt1.o) +-e _start sets the entry point to _start (custom entry point) + Without this flag, the linker expects to start at main +-lc links against the C standard library (libc) + which is necessary for printf + diff --git a/byte.html b/byte.html new file mode 100644 index 0000000..d2fad0c --- /dev/null +++ b/byte.html @@ -0,0 +1,31 @@ + + + + + +byte + + + +You can represent 256 things w/ a byte (8 bits = 1 byte) +The bits in a byte have numbers. +The rightmost bit is bit 0, and the left hand one is bit 7 +Those two bits also have names. The rightmost is the least +significant bit.   The leftmost within that set of bits +would be the most significant.   The largest number you can +represent with 8 bits is 11111111, or 255 in decimal notation. +00000000 is the smallest in that set. Logical operators compare + +A 32-bit signed integer is an integer whose value +is represented in 32 bits (i.e. 4 bytes). +Bits are binary, meaning they may only be a zero or a one. +Thus, the 32-bit signed integer is a string of 32 zeros and ones. +The signed part of the integer refers to its ability to represent +both positive and negative values. A positive integer will have its +most significant bit (the leading bit) be a zero, while a +negative integer will have its most significant bit be 1 +Because of this,   the most significant bit of a signed integer +is typically called the 'sign bit', since its purpose is to denote the +sign of the integer + + diff --git a/compiler.html b/compiler.html new file mode 100644 index 0000000..2c88b12 --- /dev/null +++ b/compiler.html @@ -0,0 +1,220 @@ + + + + + +compilation steps + + + + +Introduction +If you're interesting in the compiler specifics, we should have +a high level overview of it... 
keep mind there may also be optimi- +zation steps interspersed throughout the process, as well as specific +implementations/intermediate representations specific to C/GCC... + +Preprocessor +This involves processing the source code before actual compilation begins; +Expands macros (textual substitutions), includes header files and handles +conditional compilation directives. For macro expansion, it'll replace macros +w/ their corresponding code per `#define` statement. Then it processes +`#include` directives to include the content of headers into the source code. +Conditions are handled such as; `#ifdef`, `#ifndef`, `#else`, `#elif` and `#endif` +directives to include (or exclude) portions of code based on preprocessor- +defined conditions. Comments are removed as well during this time. +And lastly, it generates line information for the compiler to use +in error messages and debugging. + +Lexer +In order to intuit the lexical analysis and any subsuquent stages, we have +to understand what it means to do such a thing. In summary, we arent making +a regular program, but rather we're defining a language construct or a comp- +onent of how that language construct comes to be. We begin first w/ the input +source code itself, which is recognized as lexemes. This means that the lexer +identifies fundamental constructs such as literals, keywords, literals, operators, +identifiers, whitespace. Then each construct is associated w/ a token TYPE and a +VALUE, representing its category and specific content. For example, the character(s) +`5` might be recognized as a numeric, literal token called `NUMBER` with a +value `5`, while the characters `int` might be recognized as a keyword token +`TYPE_SPECIFIER`, w/ a value of `INT`. The output of the lexical analysis stage +looks like a sequence of tokens, each representing the "recognized construct" +in the input source code. 
While the lexer identifies tokens and their types +values, it also processes individual characters to recognize token boundaries +and patterns. For example, when the lexer encounters the characters `int`... +it'll recognize that as being a keyword token (as in the bilateral method we +described), but it also processes each character individually (i, n, t) and +determines if they match the pattern for a said keyword token. So in other +words, before a word becomes tokenized, each character has to be analyzed. + +Parser +After lexical analysis it does infix to postfix conversion w/ a special algorithm. Then, +takes the stream of tokens produced by the lexer and constructs an Abstract Syntax Tree (AST) +using the Context Free Grammar (which describes the syntactic structure of the programming +language) Production rules specify how it'll be composed of other constructs. One aspect +to this would involve the precedence and associativity of operators. Parentheses ( ) are +used to indicate grouping or precedence in expressions. They will help clarify the order +of operations and ensure that expressions are evaluated correctly. Terminal symbols are +those TOKEN/VALUES that the lexer produced. Non-Terminal symbols are a broader concept of +token that says something about the relationship between both Terminal and non Terminal +(For example an identifier and a declaration) So they can be expanded into sequences of +terminal OR non-terminal symbols. Non-terminal symbols represent abstract syntactic cat- +egories or constructs in the language, such as expressions, statements, declarations, etc. + +A parse tree is a hierarchical representation of the syntactic structure of a program, +where each node is associated with a non-terminal symbol in the grammar, and each leaf +corresponds to a terminal symbol in the input. 
Trees (such as a leftmost derivation) +describe these structures where each right-hand production rule is replaced with the +leftmost non-terminal symbol in the current sentential form, therefore starting from +the left-most symbol and iterating through the rest of the grammar rules we make a +representation of the language that can be further analyzed or parsed. Parentheses and +other grouping symbols in the input are reflected in the structure of the parse tree, +along w/ their associated nodes (nodes that represent the grouping of expressions) + +Shift-reduce parsing comes into play during this phase. It is a bottom-up parsing +technique used to construct the parse tree. The shift operation reads the next input +symbol and pushes it onto a stack. The reduce operation looks at the top of the stack +to find a sequence matching the right-hand side of a production rule and replaces it +with the corresponding non-terminal from the left-hand side of the rule. +This continues until the entire input is consumed and the stack contain +the start symbol of the grammar, indicating successful parsing. + +A parse tree also has parse leaves, and these parse leaves are considered the terminal +symbols at the bottom level of the parse tree. The bottom level is the "Last" remaining +things evaluated in a tree such as a digit (e.g. 3), or a TOKEN that was once an identifier +Once the parser reaches a parse leaf, it has successfully recognized a complete unit of the +input language, and no further parsing is required for that subtree. + +In recursive descent parsing, each non terminal symbol in the grammar is associated with a +parsing function. These parsing functions are responsible for recognizing and processing +a prior language construct represented by the non-terminals, for example if there's a +non-terminal symbol `Expr` representing an expression in the grammar, there would be a +parsing function named `parseExpr` to handle expressions. 
The parsing process typically +starts with a designated non-terminal symbol representing the entire statement. + +Each parsing function contributes to the construction of the parse tree, building the +tree from the root (start symbol) down to the leaves (terminal symbols). The starting symbol +serves as the entry point for parsing the input sentences. + +When the parser encounters a non-terminal symbol during parsing, it calls the associated +parsing function to handle that symbol. These parsing functions are considered recursive, +as they refer to a nested structure, for e.g... a parsing function for `Expr` may recursively +call itself to handle sub-expressions. Each parsing function has a base case that handles +terminal symbols in the input. When a parsing function encounters a terminal symbol, it +matches the token against the expected input and consumes the token if it indeed matches. +If the token does NOT match the expected input, the parsing function may report an error, +where it'll backtrack and try alternative parsing paths to recover from errors. + +Semantic Analysis +Checks the meaning and consistency of the program beyond its syntactic structure. +It goes beyond the grammar rules and examines the program's semantics to catch potential +errors and ensure that the program behaves as intended. + +Intermediate Code Gen +The AST is translated into an intermediate code representation. This code is typically +closer to the target machine code but remains independent of the specific hardware architecture. +This code generation handles complex expressions, assignments, control flow, structures, +and other language constructs, translating them into a form suitable for optimization. +The compiler will manage a symbol table which keeps track of variable names, types, +and other relevant information. This information is crucial for later stages. 
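To make the intermediate code step a bit more concrete, here is a minimal sketch that walks a tiny AST and emits three-address code, which is one common shape for intermediate code (the text above doesn't say which form is meant, so treat that choice as an assumption). The `Node` struct, the `gen_tac` function and the `t0`, `t1` temporary names are all hypothetical and only exist for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node: a leaf (number or variable name) or a
   binary operation with two children. */
typedef struct Node {
    char op;            /* '+', '-', '*', '/' or 0 for a leaf */
    const char *leaf;   /* text of the leaf, e.g. "a" or "3" */
    struct Node *lhs, *rhs;
} Node;

/* Appends one three-address instruction per operator to `out` and
   writes the name that holds the result into `result`.
   `next_temp` counts the temporaries handed out so far. */
static void gen_tac(const Node *n, char *out, size_t out_size,
                    char *result, size_t result_size, int *next_temp)
{
    char left[16], right[16], line[64];

    if (n->op == 0) {   /* leaf: nothing to emit, the leaf is the result */
        snprintf(result, result_size, "%s", n->leaf);
        return;
    }
    /* Children first, so their results exist before this operator uses them. */
    gen_tac(n->lhs, out, out_size, left, sizeof left, next_temp);
    gen_tac(n->rhs, out, out_size, right, sizeof right, next_temp);

    snprintf(result, result_size, "t%d", (*next_temp)++);
    snprintf(line, sizeof line, "%s = %s %c %s\n", result, left, n->op, right);
    strncat(out, line, out_size - strlen(out) - 1);
}
```

For the expression `a + b * 3` (with `*` nested under `+` in the tree) the multiplication is emitted first into a temporary and the addition then uses that temporary, which is the kind of flat form the optimizer can work on.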
+ +Optimization +This stage applies simplifications such as constant folding, reducing algebraic +expressions, and common subexpression elimination. It analyzes and modifies the code structure +to improve control flow; this can include loop unrolling, loop fusion, and other techniques that +help branch prediction. It'll also examine how data is used and propagated through the program, +renaming variables and eliminating any dead code. Then it replaces function calls with the actual +code of the function (inlining), reducing the overhead of said call instructions. This optimization can span +multiple functions, and it'll make sure to manage memory, processor registers and any concurrent loops. + +Code Generation +Maps the abstract operations in the intermediate code to specific machine instructions or +assembly language instructions; assigns variables and values to processor registers, +as well as optimizing execution time and minimizing memory access. This means it'll +determine how memory addresses are calculated and accessed, as well as the order of +instructions that makes the most efficient use of the processor's resources. +It also inserts code to handle exceptions and interrupts. Next, it allocates +and manages space on the call stack for function parameters, local variables, +and return addresses. Finally it'll generate the machine code or assembly +based on the decisions made during instruction selection, register +allocation, and other considerations. + +Assembly (go to asm) +The compiler translates high-level code (through intermediate representations) into assembly +language, which the assembler then processes, or to put it plainly, it generates assembly +code from the AST. The assembly code itself consists of human-readable mnemonics and +operands that correspond to the machine instructions of the target architecture. + +Symbol resolution is required to maintain a symbol table that tracks labels (symbolic +names) defined in the assembly code. 
It assigns memory addresses to labels, either explicitly +or during later stages. Instruction encoding translates assembly instructions into +machine code opcodes (operation codes) and encoded operands. It'll map each assembly +instruction to its corresponding machine code instruction, potentially involving +multiple machine code instructions for complex operations. It inserts opcodes and +encoded operands based on instruction type, operand types, and addressing modes +specified in the assembly code. + +Relocation processing is optional. In scenarios where absolute memory addresses +cannot be determined completely during assembly (e.g., linking with external libraries), +relocation entries might be generated. These entries mark locations within the object +file that require adjustment during the linking stage when final memory addresses become available. + +The final output of the assembly stage is an object file. This file contains the machine code +instructions translated from the assembly code, along with additional information such as: +symbol tables (if not stripped), relocation entries (if applicable), and header information +describing the object file format. + +Assembler +Translates assembly language code into machine code. It takes the human-readable assembly +code and converts it into the binary code that the computer's CPU can execute directly. +The assembler performs tasks such as resolving symbolic addresses (like labels) to actual +memory addresses, generating machine code instructions, and producing an object file +containing the binary translation of the instructions and additional information. + +Linking +If your program consists of multiple source files or modules, the linker combines the +object files and resolves references between them. It ensures that functions and variables +used in one module are correctly linked to their definitions in other modules. +The linker may also incorporate external libraries into the executable. 
+The output of the linking process is an executable file that can be +run independently. + +For static linking, the library code is copied directly into the executable file +at link time, resulting in a larger executable that includes all necessary code. +This ensures that the executable can run without needing external library files at runtime. + +For dynamic linking, the executable contains references to shared libraries, where libraries +are not copied in, but are instead "linked" at runtime. This keeps the executable smaller +and allows multiple programs to share the same library code in memory. + +Loader +The loader has an important responsibility, so I thought I would give a comprehensive look +at what a real loader is doing, and what its duties would entail. It is a separate step +handled by the operating system's loader, and doesn't come into play until you run the executable. + +First you have to "validate and identify"; that is when the loader first validates the +file to ensure it's a valid ELF file. This involves checking the ELF header's magic number, +architecture compatibility, and other essential information. + +Based on the program header table, the loader allocates memory space for the various segments +of the ELF file. (Segments again define how the executable should be laid out in memory, +e.g. code, data, read-only sections) + +The loader reads each section from the file based on the section header table. Some sections, +like .text (code) and .rodata (read-only data), are loaded into memory according to their +permissions. Other sections, like .bss (uninitialized data), are allocated memory and +zero-filled rather than read from the file. Then the loader processes relocation information (typically stored +in sections like .rel.text or .rela.text) to adjust symbol references within the +loaded code. This ensures functions and variables are addressed correctly based +on their actual memory locations. 
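The "validate and identify" step from earlier in this section can be sketched in a few lines. This is only a rough illustration, not what any particular operating system's loader actually runs: the function names `looks_like_elf` and `elf_class_bits` are made up here. The facts it relies on are that an ELF file starts with the four bytes 0x7f 'E' 'L' 'F', and that the next byte (EI_CLASS) is 1 for a 32-bit file and 2 for a 64-bit file.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first bytes of a file look like an ELF header, else 0. */
static int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < sizeof magic)
        return 0;
    return memcmp(buf, magic, sizeof magic) == 0;
}

/* Reads the EI_CLASS byte (index 4): returns 32 or 64,
   or 0 if the buffer is not ELF or the class is unknown. */
static int elf_class_bits(const unsigned char *buf, size_t len)
{
    if (len < 5 || !looks_like_elf(buf, len))
        return 0;
    if (buf[4] == 1)
        return 32;
    if (buf[4] == 2)
        return 64;
    return 0;
}
```

A real loader goes on to check the machine type, endianness and header sizes before it trusts anything else in the file; this sketch stops at the first two checks.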
+ +Then, it sets up the program's execution environment, including the stack pointer +and program arguments. It then transfers control to the program's entry point +(usually the _start function) to begin execution. Additional considerations +might include the ELF file references symbols from shared libraries, +wherein the loader will locate and load these libraries dynamically +at runtime. + +It'll also set appropriate memory permissions (read, write, execute) for different +program segments. (Security checks may be performed at this point as well) + diff --git a/cpu.html b/cpu.html new file mode 100644 index 0000000..0732bf7 --- /dev/null +++ b/cpu.html @@ -0,0 +1,137 @@ + + + + + +cpu + + + + +The following is a very rough, compact description of how i view the internals +of the cpu, starting with dram and then everything else from the cpu's perspective. +This is what i have come to understand so far and should in no way be +taken as a total understanding. This is my attempt to summarize cpu components +in a broad sense as well as the underlying memory cell/logic gate structuring. +The complexity and elegance of the broader design shouldnt be taken for granted. +It should be noted however that we are focusing on the more elaborate and hard -to- +parse components, such as: registers, control unit, ALU, clock, mmu, and cache, +albeit in elementary terms. Therefore this is only a basic description of a +processor, even still, everything can be understood as a circuit of tiny wires, +capacitors and transistors that carry an electrical charge. Knowing what your +basic logic gates are is an important first step: NOT OR AND NOR NAND XOR + +         "cpu.register>>dram" + +("a summary on dram"): Part 1 How dram? + +Part 2 How computer = cpu.register>>dram? 
+[fetch][decode][execute] + +Storage in dram is volatile (meaning it relies on power or electrical charge to +maintain said data's integrity) and dram is intimately connected w/ the cpu and its +ability to calculate as well as in determining which instruction to perform. +And this cpu.register>>dram relationship has a close connection with the +rest of the computer/hardware, or 'device drivers'; in terms of this loop; + + cpu -> to a storage device -> and back to cpu.register>>dram. + +So coupled w/ the real-time clock, used to keep counting, in conjunction with the +low standby power CMOS Static RAM, ssd/hdd retain a small level of voltage/or charge +to keep memory persistent. In conclusion it's regarded as safe, but generally not +recommended, to unplug storage drives and keep them un-housed from their host netgate. + +register (technical facts) = in cell terms, it is a group of flip-flop circuits +that store 1 or 0. There are many different kinds of registers, but this one is +for storing a binary word, and one flip-flop is needed for each bit in the word. +What are the internals of a flip-flop?!__! + +A register in actuality is itself a storage element, here for addresses. +During the fetch phase, it is used to retrieve an instruction from dram. +The Address register is wired into dram. dram interprets its contents as an address and then +sends back the value stored there (the instruction) to the Instruction register. Next, the opcode, +or the first 4 bits of the instruction, corresponds to a Load-A instruction. +The other bits of the instruction correspond to a dram address. +Here are some registers.... General-purpose register, stack pointer, +status flag register, vector register, control register... 
accumulator = stores data for ALU +program counter = points to next instruction +instruction register = stores instruction +data register = stores data +address register = stores address +temporary register = stores temporary data +input/output registers = stores input and output data + +control unit () = responsible for decoding and executing instructions, it determines +how to translate each high-level instruction into the appropriate micro-operations. +There's typically only one high-level instruction like `ADD` available to developers +in assembly language. However, under the hood, the CPU's microarchitecture breaks down +this single high-level instruction into multiple micro-operations to execute it efficiently. +The choice of micro-operations for a particular high-level instruction depends, first off, +on the microarchitecture of the CPU, which determines how that instruction is executed internally. +Different CPUs have different microarchitectures, which dictate how instructions are decoded, +executed, and retired. Each microarchitecture may have its own strategies for breaking down +instructions into micro-operations based on factors like pipeline depth, instruction scheduling, +and available execution units. The control unit also checks for data dependencies to ensure that instructions +are executed in the correct order and waits (if necessary) until the required data is available. + +The control unit also ensures that the necessary resources, such as registers, execution units, +and functional units, are available to execute the micro-operations. It coordinates the allocation +and scheduling of these resources to optimize execution efficiency. **Branch prediction** is crucial +as well, and it'll determine the "flow" of execution... Therefore, the decoder may predict the outcomes +of "conditional branching", thereby fetching and decoding the subsequent instruction(s) ahead of time. 
+ +The control unit of a CPU includes an optimization technique known as pipelining, which allows multiple +instructions to be executed concurrently by dividing the execution process into sequential stages. +Each stage of the pipeline is responsible for a specific task, such as instruction fetching, decoding, +execution, and memory access. As one stage is fetching an instruction, another related instruction is +simultaneously being decoded. This handoff between stages allows for efficient overlap of instruction +execution, maximizing throughput and performance. + +Control signals are generated based on the decoded instruction and specify the actions to be taken by +each stage of the pipeline. These control signals determine which functional units are activated, +which data paths are selected, and how the instruction progresses through the pipeline stages. +Control signals are encoded to facilitate efficient instruction execution and minimize delays in the pipeline. + +Pipeline encoding includes mechanisms for detecting and handling pipeline hazards, such as data hazards, +control hazards, and structural hazards. Techniques such as forwarding, stalling, and branch prediction +are used to mitigate the impact of hazards on pipeline performance. Control signals are encoded to trigger +these hazard detection and resolution mechanisms as needed. + +Data forwarding and bypassing mechanisms are used to transfer data directly from one pipeline stage to another +without waiting for it to be written to memory or registers. + +In superscalar and out-of-order execution pipelines, instructions are scheduled for execution based on availability +of resources and dependencies between instructions. Instruction scheduling algorithms determine the order in which +instructions are issued to execution units, taking into account pipeline constraints and dependencies. + +control unit (technical facts) = It recognizes Load-A instruction circuit by matching the opcodes 4 bit address. 
+The physical circuit layout IS the match. In other words, each transistor will output the opcode correctly or not. +Which leads to the execute phase. The output of that Load-A checking instruction turns on dram's read/enable line and sends +the remaining bits of the dram address. dram retrieves the value at that address. Because it stems from a Load-A instruction, +the value has to be stored in register A, (none of the other registers). When dram's data wires are wired to 4 data registers, +the Load-A match circuit turns on the write/enable of only register A. For the next instruction, everything is turned off. +The Address register is then incremented by 1 bit, and we do the entire process again. + +Again the control unit is responsible for selecting the right registers to pass in as inputs and to configure the ALU to +perform the right operation. So for example, the Control Unit enables register B and feeds its value into the first input of +the ALU. It also enables register A and feeds that into the second input of the ALU ALU (lookahead carry circuit !__! +composed of AND/OR gates) The 'add' instruction determined by the opcode, is a 2 bit address. The add opcode is passed +into the ALU. The output will be saved to register A. For this, the Control Unit uses an internal register of its own to +temporarily save the output, turn off the ALU and then write the value into the proper register. +Then we begin again, incrementing the Address register by 1 bit. It goes through the usual fetch and decode. +The address is then passed to dram but this time its a 'STORE' instruction. So instead of read-enabling, it write-enables. +At the same time it read-enables register A. This opens the data line to pass in the value stored in from register A. + +clock = () Triggers an electrical signal within specific intervals. It advances the operation of the cpu, corresponding with specific registers. +What are the internals of TODO!__! 
+[Describe a pulse transition detector here] + +mmu = () is the memory management unit, responsible for mapping addresses to ram. It keeps track of the translation between virtual and physical addresses. + +cache = () a storage element. It is sram, which means it does not have a refresh step. It finds application in the correspondence of main memory blocks +and those in the cache. This is specified by the specific mapping instruction. +Cache internally is also a group of flip-flop gates. + +... to be continued... + + diff --git a/dram.html b/dram.html new file mode 100644 index 0000000..2719c4b --- /dev/null +++ b/dram.html @@ -0,0 +1,134 @@ + + + + + +dram + + + + +         Dynamic Random Access Memory + +   This is a brief summary describing dram, or a +   1T1C memory cell, and its role during the acquisition +   of 32 bits. +     We'll discuss how it reads, writes +   and refreshes said address. + +    NOTE: the angular position of this diagram +    should be 90 degrees rotated for the proper +    orientation and illustration + +    wl +    -|-----.------- +    | | +    bl | _|_ +    |___| T |____| +    | | +    | __|__ +    | c _____ +    | +    _|_ +    - + +  1a bitline[row], transistor = 1 or 0, wline[column], capacitor + +   [read][write][refresh] + +   .dram= is a dual inline memory module. It is a physical hardware component with a vicious amount of cycles to maintain. +   Its purpose is to prefetch, to move data before it is needed. It is connected to the cpu via channels along the motherboard +   i.e. memory controller to physical channel. + +   Channel A Channel B (two memory channels) +   They can accept 32 bits, divided into 4 integrated circuits, so each circuit only reads and writes 8 of those bits at a time. +   Power goes through the motherboard to its power controller. + +   During the Address Input process, the cpu sends a 31 bit address. +   3 of those bits go to the bank group. +   2 go to the bank, and 16 to the row decoder. + +   The remaining 10 are for the column multiplexer. 
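The 31 bit address split above (3 + 2 + 16 + 10) can be written out as a few shifts and masks. This is only a sketch: the text gives the field widths but not where each field sits inside the address, so the ordering used here (bank group in the top bits, column in the low bits) is an assumption, and the names `DramAddress` and `split_address` are made up for the example.

```c
#include <stdint.h>

/* The four fields of a 31 bit dram address, as described above. */
typedef struct {
    unsigned bank_group;  /* 3 bits  */
    unsigned bank;        /* 2 bits  */
    unsigned row;         /* 16 bits */
    unsigned column;      /* 10 bits */
} DramAddress;

/* Assumed layout, from high to low bits: bank group | bank | row | column. */
static DramAddress split_address(uint32_t addr)
{
    DramAddress a;

    a.column     =  addr        & 0x3FFu;   /* low 10 bits  */
    a.row        = (addr >> 10) & 0xFFFFu;  /* next 16 bits */
    a.bank       = (addr >> 26) & 0x3u;     /* next 2 bits  */
    a.bank_group = (addr >> 28) & 0x7u;     /* top 3 bits   */
    return a;
}
```

Whatever the real ordering is, the idea stays the same: the bank bits pick a bank, the 16 row bits pick one wordline through the row decoder, and the 10 column bits drive the column multiplexer.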
+ +   PAUSE!: Remember, there's no such thing as an address, however what is real is a bit, if you consider the +   ramifications of voltage in this context as a bit + +   Therefore we might say this is all in attempt to describe a bit. + +   The wordline comes first. It lies on the bottom layer, directly connected to the transistor and capacitor. +   Applying a voltage to the wordline turns on the transistor and channels to the bitline, but not always as +   in the case of (see below: When a wordline is active). + +   This voltage is so that a capacitor can retain a charge of 1 or remain a 0, uncharged. When the wordline +   is off the transistor is off, thus the capacitor is uncharged. When a wordline is active, capacitors of +   that row are active as well. However the bitline remains inactive or else everything in every row and +   column would be active, which would defeat the purpose. + +   It is this quilted-pattern cell made of metal-oxide (MOS), a metal-oxide semiconductor that makes this +   process so. The full term for this kind of transistor used is MOSFET (plus the help of the capacitor) + +   During the read process, a 31 bit address is sent from cpu to dram. +   5 bits select a specific bank. Next, all the wordlines must be turned off in that bank, to isolate +   capacitors and precharge ALL the bitlines to 0.5v. Next, the 16 bit row address turns on a row and then +   all of the capacitors in that row are connected to their bitline. + +   If an individual capacitor holds a 1 ~ charged to 1.0v, then some charge flows from the capacitor to these +   0.5v bitlines, and the voltage on the bitline increases. The sense amplifier then detects the slight change +   on the bitline and amplifies it by pushing the voltage on the bitline up to 1.0v. + +   However if a 0 is stored in the capacitor, charge flows from the bitline into the capacitor, and the 0.5v +   bitline decreases. 
The sense amplifier then detects this change, amplifies it, and drives the bitline voltage +   down to 0 volts (or ground). + +   Now the bitlines are 1 or 0 volts corresponding to the stored charge of the capacitors in the active row! +   The state of this row is considered to be "open". + +   During the write process, +   {write command} {address} {8 bits to be written} are sent from the cpu to dram. +   And like before the bank is selected, the capacitors are isolated, and the bitlines are precharged to 0.5v. +   Using a 16 bit address, a single row is activated. The capacitors perturb the bitline, and the sense amplifier +   detects this, driving the bitlines to a 1 or 0 (thus opening the row). + +   Next, the column address goes to the column multiplexer. Because the write command was sent, the multiplexer +   connects the specific 8 bitlines to the write driver, which contains the 8 bits the cpu had sent along the +   data wires and requested to write. These drivers will override whatever was previously happening on said +   bitlines ~ driving each of the 8 bitlines to 1.0v for 1, or 0 volts for 0 + +   This new bitline voltage overrides the previously stored charges in each of the 8 capacitors in the open row, +   thereby writing 8 bits of data to the memory cells corresponding to the 31 bit address. + +   Note that writing and reading happen concurrently. + +   During the refresh process, all the rows are sequentially closed, the bitlines are precharged to 0.5v, +   and a row is "opened". + +   For this, again, the capacitors perturb the bitlines and the sense amplifiers drive the bitlines and capacitors +   to an "open" row, 1.0v (or down to 0 depending on the stored value of the capacitor). + +   This process of row close, pre-charging, opening and sense amplifying happens row after row until ALL of the rows are refreshed. + +   When the cpu sends a read or write command to a row that is already open it's called a "page hit". 
+   This can happen over and over. A page hit skips all of the steps required to open a row, and just uses the 10 bit +   column address to multiplex a different set of 8 columns aka bitlines. This connects them back to the read or +   write driver thereby saving a lot of time!... A "row miss" is when the next address is for a different row +   which results in the dram closing and isolating the currently open row, and opening a new row. + +   Lastly, there's a couple other optimizations native to dram. By having multiple bank groups the cpu can refresh +   one bank in each bank group at a time, while utilizing the other three. This reduces the overall impact of refreshing. + +   For a 'burst buffer', 128 wires connect to 128 bit buffer locations. +   The 10 bit column address becomes two different parts. +   6 bits used for the multiplexer. +   4 bits for the burst buffer. + +   [for a read command], 128 memory cells, bitlines, are connected to the burst buffer using the 6 column bits, +   thereby temporarily loading or caching 128 values into the burst buffer. Using the 4 bits for the buffer, +   8 data locations in the burst buffer are connected to the read drivers, and the data is sent to the cpu. + +   By cycling through these 4 bits, all 16 sets of 8 bits are read out, and thus the burst length is 16. +   A new set of 128 bitlines are connected and loaded into the burst buffer. + +   For the sense amplifier's design optimization +   (see; cross coupled inverter) + + diff --git a/err.html b/err.html new file mode 100644 index 0000000..26735a4 --- /dev/null +++ b/err.html @@ -0,0 +1,67 @@ + + + + + +common errors + + + + +In the context of programming, a "memory error" typically refers to an issue related to how +your program manages computer memory. This can lead to unexpected behavior, crashes, or even +security vulnerabilities. 
+ +Segmentation Fault (or segfault) is a common type of error that occurs when a program tries to access +a memory location that it's not allowed to. For example, consider the following; + + int *ptr = NULL; + int x = 10; + //*ptr = 20; Wouldn't be safe yet (undefined behavior) + ptr = &x; + *ptr = 20; // Now it's safe + +Here the commented-out `*ptr = 20` would have written through a null pointer, that is, a pointer +that doesn't point to any valid object yet. An uninitialized pointer is no better: it may contain a garbage +value, leading to an attempt to write to, or read from an +invalid memory location (resulting in a segmentation fault) + +You also have to consider the variable's lifetime and initialization. +Once you set a pointer to a valid memory location, it is safe to dereference it +for as long as the object it points to is alive, which means it's safe to assign a value through it. Dereferencing +a pointer means accessing the value stored at the memory location the pointer is pointing to. + +Segmentation faults can occur in other cases too, such as attempting to access an element +beyond the bounds of an array or accessing invalid memory addresses. + +Buffer Overflow, yet another common error, happens when a program writes more data to a buffer than it can +hold, potentially overwriting adjacent memory. + + char buffer[10]; + char long_string[] = "This string is too long"; + strcpy(buffer, long_string); // Buffer overflow occurs here + printf("%s\n", buffer); // Undefined behavior + +This overflows the buffer because `buffer[10]` can only hold a maximum of 10 characters (which includes the null +terminator character (\0) that marks the end of a string) and the string "This string is too long" is significantly +longer than 10 characters; therefore, when `strcpy` tries to copy this string into the buffer, it will write beyond +the allocated space, overwriting memory that belongs to other variables or data structures. + +You should ensure that the destination buffer is always large enough to hold the source string. 
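One way to act on that advice is to check the length before copying instead of trusting `strcpy`. The helper below is only a sketch under that idea; `copy_checked` is a made-up name, not a standard library function.

```c
#include <string.h>

/* Copies src into dst only if it fits, terminating '\0' included.
   Returns 0 on success, or -1 if src is too long (dst is then left
   as an empty string so it is still safe to print). */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus the '\0' */

    if (dst_size == 0)
        return -1;
    if (needed > dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, needed);
    return 0;
}
```

With the `buffer[10]` from above, `copy_checked(buffer, sizeof buffer, long_string)` refuses the copy and returns -1 instead of writing past the end. If cutting the string short is acceptable, `snprintf(buffer, sizeof buffer, "%s", long_string)` is a standard alternative that truncates and always terminates.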
+"Use After Free" errors are another kind of error you may encounter. This occurs when a program tries to use +memory after it has already been freed, leading to undefined behavior. There are plenty of issues that are +bound to happen to you at some point, but from experience you can learn to avoid them. + +Memory Leaks are another common error. They can happen when a program successfully allocates memory, +but fails to free it when it's no longer needed, leading to gradual memory depletion. + + int *ptr = (int*)malloc(sizeof(int)); + +Later on, if we don't free it, or don't provide an if statement where it's freed under a certain condition, +then much like the other errors that lead to malformations in memory, it'll cause issues at some point. + +Mind you, you can go without noticing an error/bug for a while... And it's not until the program starts +having issues w/ performance, or it crashes at some stage, that you then discover there was a bug somewhere. + +Therefore you always want to catch those bugs sooner rather than later. + + diff --git a/f64.html b/f64.html new file mode 100644 index 0000000..0e35de9 --- /dev/null +++ b/f64.html @@ -0,0 +1,52 @@ + + + + + +f64 + + + + 1.024e3 scientific notation example +including the sign(+,-), these are the parts that +comprise and describe a floating point number. +A significant difference between decimal point and +floating point is that floating points are binary, +so these have what's known as a binary point. + +standard representation of this sometimes will show +a number in normalized scientific notation, +split into three parts where the most significant bit +represents the sign, then 11 bits for the exponent (8 in a 32-bit float) and +the remaining 52 bits for the trailing significand +or rather, the mantissa with its leading 1 omitted. +The number is normalized first after conversion to +the form of a × 2^n (the binary analogue of a × 10^n). 0 takes the form of all zero's +to clearly illustrate the use of zero. 
+Now that's just the finite numbers but there's also infinity +and a special value called 'not a number'. +For now this is enough to prime someone before using this +type unconsciously + + int main() { + int64_t x = 1024; + double fp = 4096.1234; + + printf("int64 value : %lld\n", (long long)x); // %d is wrong for a 64bit int + printf("64bit fp value : %f\n", fp); + printf("x and fp : %f\n", (x+fp)); // x is converted to double first + return 0; + } + +use case of format specifiers and 64bit floating-point +w/ multiple varieties in a print statement + +double a = 1234.56789; +double b = 299792458; +double c = 6.62607e-34; + +printf("Using %%f (fixed point): %f %f %f . \n", a, b, c); +printf("Using %%e (force exponent): %e %e %e . \n", a, b, c); +printf("Using %%g (best fit): %g %g %g . \n", a, b, c); + + diff --git a/fdelete b/fdelete new file mode 100644 index 0000000..d7b1f43 --- /dev/null +++ b/fdelete @@ -0,0 +1,12 @@ +note on things to remove: + +
+ + + + + test + + test1 + +
diff --git a/func.html b/func.html new file mode 100644 index 0000000..c51fec5 --- /dev/null +++ b/func.html @@ -0,0 +1,476 @@ + + + + + +func + + + + +
statements are the instructions executed by the program
+Let's go through simple examples; a declaration of an integer variable x... + + int x; + +The following is a declaration/initialization of "y" (or we might say it's +explicitly declared, and assigned the value "10"). Initializing something +refers to assigning an initial value to a variable when it is declared. + + int y = 10; + +This value can be set explicitly at the time of declaration, ensuring that +the variable starts with a known state. + + x = x + 5; // An expression statement (an assignment) + char S = 'D'; // Declaration and initialization of a character + char str[] = "some string"; // Initialization of a string + const char *p = "some string"; // Pointer to a string literal + +to learn more, see arrays +Anyway, let's not get stuck in the semantic meanings and trying to define things in more ways than one. + +--- data types --- + +In C, a variable or object can be of any data type, including primitive types +(int, float, char) or even user-defined structures. + +When a function is defined, its signature (return_type function_name(parameter_list)) +includes the return type, which tells the compiler what kind of data the function +will return (if any) + +char type is considered a small integer type and is typically used to represent +characters, however those characters have to be stored as integer values +(thus, small integer type). The range of numbers that an integer type can +represent in C depends on the specific integer type and whether it is +signed or unsigned (e.g., signed char is −128 to 127) + +The "size" of a data type refers to its memory allocation in bytes, and it determines +the range and precision of the values it can store. Wide data types +have larger sizes and greater capacity for storing extended characters (characters that are +outside the ASCII range) while regular data types are more limited in what they can express. 
+ +a floating point type represents decimal number values of some precision-that is, +the number of digits a floating-point number can accurately represent. + +
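To put a number on "precision", `<float.h>` gives the count of decimal digits each type is guaranteed to hold, and a small experiment shows where a `float` runs out. This is a sketch; the helper names are made up, and `float_loses_one` assumes the usual 32-bit IEEE 754 `float` with a 24-bit significand.

```c
#include <float.h>

/* Decimal digits that survive a round trip through each type. */
static int float_digits(void)  { return FLT_DIG; }
static int double_digits(void) { return DBL_DIG; }

/* Returns 1 if adding 1 to x is lost to rounding in a float.
   `volatile` forces the sum to be stored as a real float, so extra
   precision in intermediate registers cannot hide the effect. */
static int float_loses_one(float x)
{
    volatile float y = x + 1.0f;
    return y == x;
}
```

On an IEEE 754 machine `float_loses_one(16777216.0f)` returns 1, because 2^24 + 1 has no exact `float` representation, while `float_loses_one(1.0f)` returns 0. A `double` only hits the same wall at 2^53.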
void pointer
+Pointers in C are very important. They can point to objects of any data type... +"void" as a return type simply means "no return value".. it is still possible to include a bare `return;` +statement within a void function. "int" or (non-void) functions MUST return a value of their return type. +It is not legal to have a variable or named parameter of type void; although a pointer to void is legal +because it's representative of a function that's passing a pointer of any data type; +the function can treat it as a generic pointer without knowing its specific type. + +You are going to be using them anytime you know a variable will be used to allocate +memory at runtime (such as a `void *` to a block whose size is unknown beforehand), as well as +anytime you want to access data indirectly (which is more efficient than copying +around or rather, passing around by value). You can therefore reach the original +data of a function or variable as long as you've properly pointed to it; +if not, the compiler can help catch some of these errors for you. +
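Here is a small sketch of the "generic pointer" idea: one function that swaps two objects of any type, because it only sees `void *` and a size in bytes. The name `swap_any` and the 64 byte limit are choices made just for this example.

```c
#include <stddef.h>
#include <string.h>

/* Swaps two objects of the same type without knowing what that type is.
   `size` is the size of one object in bytes. Returns 0 on success,
   or -1 if the objects are larger than this sketch's scratch buffer. */
static int swap_any(void *a, void *b, size_t size)
{
    unsigned char tmp[64];   /* scratch space, 64 bytes is arbitrary */

    if (size > sizeof tmp)
        return -1;
    memcpy(tmp, a, size);    /* tmp = *a */
    memcpy(a, b, size);      /* *a  = *b */
    memcpy(b, tmp, size);    /* *b  = tmp */
    return 0;
}
```

The same function works for `int`, `double` or a struct: `swap_any(&x, &y, sizeof x)`. The price of `void *` is that the compiler can no longer check that `a` and `b` really point to the same kind of thing, so passing the right size is up to the caller.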
+Here's the easiest approach to pointers:
+
+    int value = 0;
+    int *B = &value;   // B has to point somewhere before it is copied
+    int *A = B;
+    int *C = A;
+    int *D = C;        // A, B, C and D all hold the address of value
+
+Experiment w/ this in the context of type-casting back and forth
+and using different types. Also, when you've declared a variable and then
+declared a pointer (e.g., `int *ptr`), to make it point somewhere (ptr = ...)
+you just use the name of the pointer, and use & to take the variable's address:
+
+    int A = 4;
+    int *ptr;
+    ptr = &A;
+
+In printf functions, use the %p specifier and (&A), for example, to print out
+an address (%p expects a `void *`, so you would cast it in the following way)
+
+    printf("%p\n", (void *)&A);
+
+or write `*ptr` to dereference the pointer and get the value it points to.
+And while the & (address-of) operator is used to print a variable's address,
+its primary role is to facilitate pointer operations and enable functions to
+modify variables indirectly through their addresses...
+
+There are situations where you cannot take the address of something w/ &.
+Specifically, this restriction applies to values that have no storage location of
+their own (a literal, or the result of an expression), and to certain kinds of
+variables, such as bit-fields and variables declared `register`.
+
+You can use pointers when passing the address of a variable to a function.
+This function can then modify the original value through the pointer
+(note: you can also pass value, address, array and struct w/ parameters)
+
+    void accessExample(int *ptr) {
+        *ptr = 1;
+    }
+
+`*ptr = 1;` dereferences the pointer called `ptr`, meaning it accesses the
+integer variable located at the memory address stored in ptr.
+
+We can read the value of said variable
+from another function like so
+
+    void anotherFunction(int *ptr) {
+        printf("access in another function %d\n", *ptr);
+    }
+
+Function parameters in C are local to the function in which they are defined.
+This means they exist only within the scope of that function and do not
+affect other functions. As such you can reuse parameter names across different
+functions without conflict.
This local scope is very useful for modular and
+clear code design.
+
+Here's an example of type-casting,
+
+    int A_Value = 1;
+    int B_Value = 2;
+    float SRC_REG = (float)A_Value / B_Value;   // 0.5, not 0
+
+At first sight only `A_Value` is typecast. However, in C, when performing operations
+between different types, the compiler implicitly converts the integer `B_Value` to float to
+match the type of the cast `A_Value`. Therefore, the division is performed as float divided by float.
+
+Let's assign a pointer to a type-casted value. Directly assigning a memory address isn't safe.
+
+    int *SRC_REG = (int *)0x1000;
+
+You'll have to learn for yourself how to preserve the safety in the context of your own program
+
+    volatile int *SRC_REG = (volatile int *)0x1000;
+
+volatile represents a kind of transparent gate that always checks the actual state of the
+data each time it is accessed. The compiler may not assume the value is unchanged since the
+last time the program read or wrote it, so every access really goes to memory.
+
+This is crucial when dealing with data that might be modified by external influences
+(like hardware registers) outside the direct control of the program. If you also want
+to preserve the fact that the program itself cannot change it, but that the external
+factors can, combine it with const (`const volatile int *`).
+
+Explicit type casting w/ dereferencing would look like...
+
+    DST_REG = *(int *)(SRC_REG + offset);
+
+When you pass the address of buf to the following function, you're actually passing a copy of
+the pointer itself. This copy points to the same memory location as the original
+buf in the calling function, which affects the value of `buf` in the end:
+
+    #include <stdio.h>
+
+    void seti(int *buffer, int value, size_t len) {
+        while (len--) {
+            *buffer++ = value - 1;   // write, then advance to the next element
+        }
+    }
+
+    int main(void) {
+        int buf;
+        seti(&buf, 42, 1);
+        printf("%d\n", buf);   // prints 41
+
+        return 0;
+    }
+
+Had we passed `buf` by value instead of through a pointer, `seti` would have changed
+only its own copy and `buf` would have stayed uninitialized.
In short, pointers
+allow direct access to a memory address, providing us a way to manipulate and interact
+with data at a low level, which is essential for tasks like memory allocation (`malloc`) and
+working with complex data structures. The previous example also demonstrates the
+significance of `int main`, as every program has a main. That's the starting point.
+The program needs to know how to begin this cascade of execution, and calling a function
+from main is one way to initiate a sequence of events.
+
+Passing a pointer to a function is often more efficient than passing large data
+structures by value because only the memory address is passed, not the whole data.
+
+Double pointers add one more level of indirection to the value they're pointing to.
+A double pointer (`int **pp`) holds the address of ANOTHER pointer, which in
+turn holds the address of the value.
+
+Dereferencing a double pointer once (`*pp`) gives you the single pointer;
+dereferencing it twice (`**pp`) allows access to the value of the variable
+that the single pointer points to.
+
+Call it common convention, a general rule of thumb or routines that I find
+myself reusing, one such being to initialize variables (objects) when they are declared.
+`int *ptr = NULL;` for example, signifies that the pointer doesn't
+currently point to a valid memory location, and I often use this
+in the context of pointer initialization.
+
+To ensure the value is set to zero for primitive data types,
+you'd typically do, e.g. `int num = 0;` or `float value = 0.0f;`
+
+For larger structures of data, or an array, it's convenient to use `memset()`,
+which'll set all bytes of an array to zero. It's good practice to initialize
+objects to their appropriate default values when they are created. There are
+of course some intricacies to be mindful of so you don't accidentally set
+something to an invalid memory location. For more info see common errors
+
+So you already know about `main`, and how it's the entry point of the program.
+It's essentially its own function and should be treated as such.
You might sometimes
+see `argc` (argument count) and `argv` (argument vector) used as parameters in main.
+`int argc` is an integer that represents the number of command-line arguments
+passed to the program. The value of argc includes the name of the program itself
+as the first argument, so it is normally at least 1.
+
+`char *argv[]`, or sometimes `char **argv`, interchangeably represent the argument
+vector, which is an array of strings (character pointers) holding the actual
+command-line arguments. Considering that argc is normally at least 1, you can
+look at the first entry of argv, e.g.
+
+    printf("%s", argv[0]);
+
+This'll print out the name of the program on the command line, since the first
+entry in argv is the name of the program you've executed.
+
+A variable is a named object in C. It's an identifier that you use to access a
+particular object (memory region). For example, when you declare (e.g. int x)
+`x` is a variable that refers to an object capable of storing an integer value.
+
+Two features of C touch on using an identifier before it has been fully defined. Neither
+of them means objects automatically spring to life, though they may mistakenly be
+described that way. One is called forward declaration, and the other is the relaxed
+declaration placement available from C99 onward.
+
+A forward declaration is used to declare the existence and type of a function or
+variable before its full definition. It informs the compiler about the identifier
+so it can understand its usage even if the definition comes later (often in separate files).
+
+In short, it lets you have prototypes in headers or at the beginning of source files.
+
+Relaxed declaration placement, on the other hand, allows a declaration to appear anywhere
+within a block, mixed in with statements, rather than only at the top of the block.
+The declaration must still come before the first use of the variable within that block.
+The compiler does not look ahead: name lookup only finds declarations that appear earlier
+in the file, so using `value` above the line that declares it is a compile error.
+
+    void an_example() {
+        // Declaration of 'value' (anywhere in the block, but before its use)
+        int value = 10;
+
+        // Use the variable 'value' after its declaration
+        int result = value * 2;
+    }
+
+Implicit declaration (the compiler assuming a declaration for an identifier that is used
+without one; historically this applied to functions, never to variables) is discouraged
+because it can lead to unexpected behavior and compilation errors.
+
+Variables in C are thus normally declared before their use, and this is typically referred
+to simply as a declaration rather than a forward declaration.
+Example: `extern int thumb;` declares thumb without defining it.
+
+This is a declaration rather than a definition: it announces the existence and type of
+thumb, while the storage for it is defined elsewhere, possibly in a separate file.
+
+
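A compact sketch of declarations coming before definitions, all in one file so it can be compiled as-is. The function names `twice` and `use_before_definition` are hypothetical; `thumb` is borrowed from the example above.

```c
/* Forward declaration (prototype): name and type only, no body yet. */
int twice(int n);

/* Declaration without definition: the storage for thumb is defined further down.
   In a real project this line would usually live in a header. */
extern int thumb;

/* Both identifiers are already declared here, so the compiler accepts their use
   even though neither has been defined yet. */
int use_before_definition(void) {
    return twice(thumb);
}

/* The definitions. These could just as well sit in a separate source file. */
int thumb = 21;

int twice(int n) {
    return n * 2;
}
```

Calling `use_before_definition()` returns 42. Removing either declaration at the top makes the compiler reject the function body, because it has not yet seen those names.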
--- variadic function ---
+since we use `printf` to see the results of everything we should understand what kind +of function it is—its a special function that takes a variable amount of arguments, +that is, they are presented with an ellipsis "..." within a given function parameters + + int printf(const char *format, ...) + +This is the definition provided by the standard lib header, so linking w/ +`#include <stdio.h>` lets you use `printf`. Keep in mind, you include C library +headers w/ <file.h> and user-defined headers w/ "file.h" + + printf("Example text %d, %d, %d\n", var1, var2, var3); + +So here we are demonstrating what both a function declaration is, and how `printf` works. +Question: What are function parameters even for?... They allow a function to encapsulate +its behavior, in the sense that they are inherited by the function body, and accessed +accordingly. + +This is true of variadic functions as well, and, because its variadic, it may conceivably, +take an infinite amount of variables. More specifically, the first part (the part "in +quotation marks") will accept your personalized text as well as format specifiers. + +These format specifiers align with each proceeding variable (found after the comma) +`printf` in particular is used for printing formatted output to the screen, hence print -f(formatted) + +Now lets back up for a second and verify what a function is for. A function brings a +specific set of instructions, depending on how you define them. + +Lets say you create a function prototype that is, you want to tell the program that +theres a function defined somewhere and you want to call it + + void func(int x); + +it would have to find this function defined within the program somewhere... + + void func(int x) { + } + +r-values represent parts of a given expression (typically on the right side) that +are attributed to the value of an expression. An l-value is attributed to a location +in memory for a said value. 
A variable that is assigned an r-value (a literal or the result of a
+function call) is itself an l-value, designating the memory location where that value is stored.
+
+It should also be said that a function call can be the r-value assigned to a given variable.
+One notable consequence is the ability to directly capture the result computed by
+the function:
+
+    int func(int a, int b) {
+        int c = a + b;
+        return c;
+    }
+
+    int main() {
+        int a2 = 2;
+        int b2 = 3;
+        int intern = func(a2, b2);
+        printf("intern is %d\n", intern);
+        return 0;
+    }
+
+This then demonstrates how some function with a given return statement should work
+and how new variables (passed in as arguments) should take effect.
+
+You should recall that any function w/ a return type other than void should
+have a return statement. Here we are returning a variable. When the type of the value
+returned by a return statement does not match the declared return type of the function,
+the value is implicitly converted where a conversion exists (possibly losing data), and
+otherwise the compiler issues an error; ergo, declaring the returned variable with the
+same type as the function it is within will help ensure you are using a compatible
+data type at the end.
+
+Or just create that function with the same data type you know you'll be returning,
+to prevent casting w/ possible data loss scenarios if you can.
+
+In conclusion, functions can use return statements not only to pass back computed
+values but also to indicate success or failure of their operation.
+You can have arithmetic expressions directly in a given return
+statement, as well as comparisons (`return a >= b`)
+
+It's a common convention to use `return 0;` to indicate success and non-zero values
+(typically 1 or -1) to indicate errors or some other kind of failure (that is, for
+functions or branches that are indicative of some kind of error too). For functions
+that return pointers, `NULL` is often used to indicate an error or failure to
+allocate memory.
This is particularly common in functions that are expected to +return a pointer to a dynamically allocated resource. When NULL is returned, +it signifies that the requested resource could not be created or allocated. + +it's also useful to define more specific error codes sometimes, thereby providing +more detailed information about the nature of the error that occurred. + +
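To tie this section together, here is a sketch of a small variadic function that also follows the return-code convention described above. `sum_ints` is a hypothetical name; the `va_*` macros are the standard ones from `stdarg.h`.

```c
#include <stdarg.h>
#include <stddef.h>

/* Add up `count` int arguments and store the total in *out.
   Returns 0 on success and -1 on bad input.
   The caller must pass exactly `count` ints after `count`; a variadic
   function has no way of checking that on its own. */
int sum_ints(int *out, int count, ...) {
    if (out == NULL || count < 0)
        return -1;                      /* report failure through the return value */

    va_list args;
    va_start(args, count);              /* start after the last named parameter */
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);     /* fetch the next argument as an int */
    va_end(args);

    *out = total;
    return 0;
}
```

With `int r;`, the call `sum_ints(&r, 3, 1, 2, 3)` returns 0 and leaves 6 in `r`, while `sum_ints(NULL, 0)` returns -1. `printf` finds its arguments the same way, except that it works out their number and types from the format string rather than from a count.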
--- function pointers ---
+regarding function pointers... they can seem a little odd at first, especially when +combined with unnamed parameters or return types, however these unnamed functions or +function parameters are legal in C, nevertheless function pointers are a powerful +feature of the language, allowing for dynamic dispatch, callbacks, etc. + +first create the function + + void Function(int param) { + } + +declare a function pointer + + void (*pointerToFunction)(int); + +assign the function pointer +to point to the function's address +(capture its internal instructions) + + pointerToFunction = &Function; + +now its equal to some original function we had in this example you dont even need to +include braces, but you should note that you dont include void anyway when you call +a void function from within another function + +however function origins that do not have parameters should be filled out with void +i.e. `func(void) {}` + +last but not least, you can call the function through the function pointer, +and assign a value to its parameter + + pointerToFunction(42); + +here's the next example + + void ThisFunction(void (*NewParam)(int)); + +you might also call this the function pointer's signature it would have to find this +function defined within the program + + void ThisFunction(void (*NewParam)(int)) { + } + +continuing on, heres a function that matches its signature + + void SomeNewFunction(int Param) { + } + +you can call it now since it shares the same signature + + ThisFunction(&SomeNewFunction); + +i'll leave it up to you to experiment with it... for now, here's an even simpler +illustration of a function pointer... 
+ + void printNum(int num) { + printf("Number: %d\n", num); + } + + int function(void (*ptr)(int)) { + (*ptr)(2); + return 0; + } + + int main() { + function(printNum); + return 0; + } + +you can use `typedef` in function pointers (not for regular functions) this is VERY useful +for creating parameters (example from C11 threads) + + typedef struct thrd_t_struct thrd_t; + typedef int (*thrd_start_t)(void*); + int thrd_create(thrd_t *thr, thrd_start_t func, void *arg); + +since we made `thrd_start_t` a type, you can use it as a `type ofsomething` bearing in mind, +im demonstrating whats possible with functions. once you understand function pointers and +matching signatures, rest assured everything else will be a cakewalk. + +here are functions of `struct` type, the first one being a function pointer + + struct fourth (*proc_ptr)( + const struct fourth* insert, + float mode, + const struct ftres* amount, + float color, + float width + ); + +we're just showing whats possible in the land of hypothetical... +when i say whats possible i mean what is feasible, conceivable and functional, +and not like, mis-match-o'nomics and going to the edge of whats possible within the +builtin constructs, and without regard to language conventions. these are then +things that ive either seen in source code or have picked up from somewhere else. + + struct unboundedint { + }; + + struct unboundedint constructor(int num_blocks) { + struct unboundedint result; + } + +this ones just a regular function that happens to be of type `struct`. In this context, +a "constructor" refers to a function that initializes a particular data structure or object. + +when you use the __attribute__((constructor)) and __attribute__((destructor)) attributes +(which are GCC-specific) GCC places references to these functions in special sections of +the object file, specifically `.ctors` for constructors and `.dtors` for destructors in +the ELF (Executable and Linkable Format). 
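As a rough sketch of those attributes in use (GCC and Clang only, since they are compiler extensions): the names `setup`, `teardown`, `resources_ready` and `is_ready` are made up for this illustration.

```c
#include <stdio.h>

/* Flag set by the constructor so the rest of the program can check it. */
static int resources_ready = 0;

/* Runs automatically before main() is entered. */
__attribute__((constructor))
static void setup(void) {
    resources_ready = 1;
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void teardown(void) {
    puts("teardown: program is shutting down");
}

/* Returns 1 once the constructor has run. */
int is_ready(void) {
    return resources_ready;
}
```

By the time `main` starts, `is_ready()` already returns 1, even though nothing in `main` called `setup`.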
+ +for more information on attributes, i made this page + +constructors are automatically executed before the main() function is called. +when an ELF executable or shared object (dynamic library) is loaded, the dynamic linker +(ld.so on Linux) looks for the .ctors section. + +if this section is found, the dynamic linker calls the functions referenced in it, +and goes upon initializing resources and setting up the environment. + +learn more about structures or continue w/ functions like memcpy or malloc + +else go to next page... + + diff --git a/fw.html b/fw.html new file mode 100644 index 0000000..ba3ce31 --- /dev/null +++ b/fw.html @@ -0,0 +1,65 @@ + + + + + +frameworks + + + +

Application from the Framework

+ Choosing the right framework for your application is everything, as it means managing your project in a specific + way that may or may not have the right balance, workflow and maintainability you desire. It goes without saying + that clean, hygienic, intuitive code is what you're aiming for. We should also consider the foundation and try + to then imagine what the end result looks like. In order to do this we need to take a speculative look at the + internals and rely on our experience and intuition to determine what would best serve our project. + + Of course, unless you have had the experience yourself and probed at each framework, you may have absolutely no + idea what im talking about. Rest assured its something you should learn for yourself in order to comprehend unique + and novel programming. + + Frameworks consist of header files and implementation files. Header files (often referred to as framework headers) + contain declarations for the functionalities provided by the framework. +

Framework Interdependencies

+ Regardless of the approach, frameworks (both header and implementation file alike) inherently link to and depend on + other header files or libraries. This linkage ensures that all necessary declarations and definitions are available + during compilation and linking phases. Even if a file is linked by another file already, frameworks will reference + these dependencies to pull in or reference the required header files, ensuring all components are correctly integrated. + + In a perfect world, we could have a single translation unit that could bring in each file in a cascading manner that + wasnt dependent on redefining a header over and over in its particular section of the code, but that could go downstream + to say whether or not it had already been *picked up*. While this concept aligns with some behaviors of the linker and + the builder, it is not fully achievable due to several inherent limitations. + + First, the C/C++ preprocessor lacks the intelligence to track header inclusions across multiple translation units + effectively. It relies on include guards (#ifndef, #define, #endif) or #pragma once to prevent multiple inclusions within + a single translation unit, but it does not manage inclusions across the entire project scope. + + Second, while linkers do handle symbol resolution and can merge identical symbols, they do not operate at the level of + header file inclusion during compilation. They work with object files generated after the preprocessing and compilation + stages, meaning they cannot influence how headers are included. + + Finally, the dependency management and build optimization performed by modern build systems are designed to handle these + complexities to some extent. They track dependencies and only recompile files when necessary, but the cascading inclusion + approach remains impractical due to the inherent limitations of the C/C++ compilation model. 
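  The include-guard behaviour described above can be sketched in a single file. This assumes a hypothetical header `widget.h` whose text is pasted twice, standing in for the same header arriving through two different include paths within one translation unit.

```c
/* ---- widget.h (hypothetical), first inclusion ---- */
#ifndef WIDGET_H
#define WIDGET_H
typedef struct { int id; } widget;
int widget_id(const widget *w);
#endif /* WIDGET_H */

/* ---- the same header text, pulled in again via another header ---- */
#ifndef WIDGET_H              /* WIDGET_H is already defined, so this copy is skipped */
#define WIDGET_H
typedef struct { int id; } widget;
int widget_id(const widget *w);
#endif /* WIDGET_H */

/* ---- widget.c: the single definition that the linker resolves calls to ---- */
int widget_id(const widget *w) {
    return w->id;
}
```

  Without the guard, the second `typedef` would declare a different anonymous struct under the same name and the compiler would reject it. The guard only protects this one translation unit; every other source file that includes the header preprocesses it again from scratch.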
+ + (1) Extern Declarations and Precompiled Headers: Frameworks may utilize `extern` declarations to reference functions or + variables defined elsewhere. This is often done to leverage precompiled header files, which can improve build times by + storing commonly used declarations in a single file. + + (2) Direct Definitions and Efficiency: Some frameworks define everything within their *own* header files, avoiding external + dependencies. This approach can streamline the build process and potentially improve efficiency by reducing the need for + the compiler to search for references across multiple files. + + (3) The Hybrid Approach adopts a scheme that can balance both performance and clarity. It is best carried out through what + are called *Unity builds*; that utilize precompiled headers to store frequently used declarations for faster build times. + As such, it is divided into modules, wherein each module has its own header file. This promotes code organization and + reusability. It may also selectively utilize `extern` declarations for specific functions or variables defined in the + core engine to avoid redundant definitions, and thus streamlining the build process. + + However, unity builds are generally unavailable for systems that dont wish to use CMake, or some other equivalent, as its + really a feature that exists only in the realm of *extended* build automation and is less critical for applications that + leverage their build systems in more specific ways. In conclusion, its best to choose the right approach for the type of + application you're making and how you think it should be made. + + diff --git a/headers.html b/headers.html new file mode 100644 index 0000000..e2d2ef6 --- /dev/null +++ b/headers.html @@ -0,0 +1,44 @@ + + + + + +C standard library + + +

C standard library is defined by the following

+ 'assert.h' condition that compares arguments to zero + + 'errno.h' +'pthreads.h' standards based thread api +'fenv.h' floating-point status flags and control modes +'iso646.h' alternative operators as well as digraphs and + trigraphs +'limits.h' ranges of integer types +'locale.h' localization +'stdarg.h' variadic arguments '...' +'stddef.h' macro definitions as well as 'stdbool.h' + +'stdint.h' fixed-width integer types + +'inttypes.h' + 'ctype.h' + + 'stdio.h' standard io utilities e.g. printf + + 'stdlib.h' includes dynamic memory allocation e.g. malloc + +'signal.h' 'setjmp.h' + +'string.h' includes handles to strings and character manipula- + e.g. memcpy + +'tgmath.h' includes 'math.h' and 'complex.h' + + 'float.h' + +'time.h' time/date utilities +'wchar.h' multibyte and wide character utilities +'wctype.h' functions to determine the type contained in wide + character data + diff --git a/html.html b/html.html new file mode 100644 index 0000000..a106fc3 --- /dev/null +++ b/html.html @@ -0,0 +1,115 @@ + + + + + +html + + + + +What a URL is composed of: + + protocol://host:port/path#anchor "anchor" is also known as fragment + + You can add a Query String, additional parameters for the request, often in the form of key-value pairs. + It starts with (?), for example, in `https://example.com/page?name=Bob&age=30` + + A "slug" is a URL-friendly version of a resource identifier, typically derived from a title or name. + It’s used to create readable and SEO-friendly URLs (search engine optimization). Slugs often replace + spaces with hyphens, and they may convert uppercase letters to lowercase. P.s. Everything that comes + after "port" in the URL is case sensitive. 
+ +The following command is deployed for auditioning said html page: + + python -m http.server it'll say, "..serving you the + http on server 8000" or something + desktop url localhost:8000/page.html + private ip (needed on mobile) 192.168.0.1:8000/page.html + port forwardng requires your public ip but thats a different topic + +First off, what do we call these things... + +<!-- This is a comment - that will not be displayed in the published html page --> + +Tags mark up existing content to define its presentation; they're also used in the creation of elements (as we demonstrate towards the end) +They are always surrounded by angled brackets. As such, you enclose said elements w/ an opening and closing tag. + + <p> and </p> + +Attributes provide additional information about HTML elements. They are included in the opening tag. +Attributes come in several flavors: Quoted or unquoted attributes (w/ a string or numeric value), as well as boolean attributes. + + <a href="https://example.com"></a> + +Elements consist of a start tag, content, and an end tag. If the tag doesn't have an end tag, it can be a void element. +This is a regular element: + + <p>Yadayada yada</p> + +The following is a void element. Void elements do not have any content or end tag, as they are considered self-closing: + + <img src="image.jpg" alt="Description of image"> + +Anchors are used in html to jump you around in the document. +Here's an example of a link that is also an anchor: + + <a href="func.html#void-anch">Hyperlinked words that are anchored now</a> + <div id="eg-anch">Example words for other end of the link</div> + +I will go over what those tags and attributes mean. + + <HTML> ... </HTML> Encloses the entire document and + overrides other filetyping mechanisms + <TITLE> ... </TITLE> The title of the document + <BODY> ... </BODY> Encloses the body of the document. + <Hn> ... </Hn> Section heading, n=1 (biggest) to 6 + <PRE> ... </PRE> Encloses block of text to be shown verbatim. 
+ +<A NAME="..." HREF="URL"> ... </A> + Creates a link (HREF) or (NAME) or both. They are attributes, NAME being an example + of something deprecated in HTML5 (which may or may not be true, test it for yourself) + <a> being for hyperlinks, and <href> to specify the URL of the link. + +<DIV> ... </DIV> is a block-level container element used to group other + elements and apply/manipulate sections. So <div> may be called a tag + (when referring to the specific parts) and an element (when referring to the whole structure). + <div id="main-content"></div> + +"main-content" is the value assigned to the id attribute. This value serves as the unique identifier for that particular element. + +Inlined image + <IMG ALIGN="..." SRC="URL" ALT="..."> + Inserts an image from SRC, or text + from ALT if the image can't be used. + ALIGN is one of top, middle, bottom (default) +Text flow + <br> Force a line break + <p> Add a paragraph break +
Horizontal rule (pseuedo page break)(i sometimes prefer a dotted line) + +Hints +You can go to 'more settings' or use <Key>Ctrl+U or type view-source:https://anywebsite.you/desire + to view, copy, analyze the source code of any document. + +Entity name examples +Overline ¯ ¯ ¯ +Pilcrow (paragraph) ¶ ¶ ¶ +Georgian comma · · · +Cedilla ¸ ¸ ¸ +UPPERCASE RHO Ρ Ρ Ρ +lowercase rho ρ ρ ρ +UPPERCASE SIGMA Σ Σ Σ +lowercase sigma σ σ σ +UPPERCASE TAU Τ Τ Τ +lowercase tau τ τ τ +UPPERCASE UPSILON Υ Υ Υ +lowercase upsilon υ υ υ + +Issue: This line spacing was changed, doesnt work as intended. +
  • This is &emsp; +
  • This is &ensp; +
  • This is a regular space. +
  • This is &nbsp; + + diff --git a/httpd.conf b/httpd.conf new file mode 100644 index 0000000..7425854 --- /dev/null +++ b/httpd.conf @@ -0,0 +1,10 @@ + + + Options Indexes FollowSymLinks + AllowOverride All + Require all granted + + ErrorDocument 404 /404.html + + + diff --git a/index.html b/index.html new file mode 100644 index 0000000..883be24 --- /dev/null +++ b/index.html @@ -0,0 +1,34 @@ + + + + + +select + + + jump to level + + 242l ▸control flow ▂▄__ start here + 242l ▸format specifier + 242l ▸storage class specifier + 242l ▸type qualifier + 476l ▸data type ▂▄▆_ + 476l ▸void function + 476l ▸pointer + 476l ▸variadic function + 476l ▸function pointer u graduate from the triathlon upon reaching this point + 280l ▸array ▂___ + 303l ▸struct + 308l ▸macro +
    → + .-=========-. + \' -=======- '/ + _| .=. |_ + ((| {{1}} |)) + \| /|\ |/ + \__ '`' __/ + _`) (`_ + _/_______\_ + /___________\
    + + diff --git a/install.html b/install.html new file mode 100644 index 0000000..619d55b --- /dev/null +++ b/install.html @@ -0,0 +1,363 @@ + + + + + +install instructions + + + +   There are many many distributions of linux. +   Once you know which you'd like to install you should +   go about finding that distro's installation page. +   You can also take these instructions as a solid +   foundation on what you need to do. + +   Artix Install w/ runit (however +   OpenRC, s6, or dinit are potentially easier) + +   : Instructions: +   : [```]=optionals +   : Helpful Shortcuts: +   : Shift+ZZ=save file +   : Switch Esc w/ Caps_lock key, just a suggestion + +   See ArtixLinux.org for more information, e.g. +   https://wiki.artixlinux.org/Main/runit +   https://wiki.artixlinux.org/Main/Installation +   **Legacy Tree Example** +   sda disk solid state drive e.g +   --sda1 /boot +   --sda2 /part2 +   --sda3 / +   sdb disk hard drive e.g. +   --sdb1 /example +   --sdb2 /part2 +   sdc disk flash drive e.g +   --sdc1 /open +   --sdc2 /encrypted + +   burn iso w/ `dd` selection/command +   file system: fat32 is the most compatible file system but +   we'll be exploring the use of other file systems as well +   note, make sure everythings plugged in during this time + +   **BIOS** +   Once computers on, +   Press the “Del” +   or “F1” , “Esc”, “Fn 2” +   or “F10”, “F2” or “F12” +   to open the BIOS... +   “Alt” is sometimes hidden settings. +   select the "usb" boot option in BIOS + +   **Important** +   The term "BIOS" is often colloquially used, and can mean either UEFI or PC BIOS... +   BIOS or UEFI is sometimes considered a 1st-stage bootloader, while GRUB is the 2nd... +   (U)EFI System Class 1 and Class 2, has a BIOS compatible mode called "Legacy BIOS" or +   "CSM" (short for Compatiblity Support Module), which makes the UEFI behave like a PC BIOS. +   UEFI System Class 3, the standard since around 2020, no longer has a CSM... 
+   If you plan to make an MBR (dos) partition table, than you should be in +   "Compatible Support Module" or "Legacy Mode", but if you're going w/ a +   GPT partition table you should be in "UEFI" boot mode. + +   || NOTE: Make sure you have internet/wifi +   || Your keyboards last config WILL persist +   || Save bluetooth, sound, etc devices until afterwards +   || Advanced: If you need to re-discover and chroot into +   || an existing filesystem, this page + +   root:root,pass:artix +↓ + +
    +   **Commands**: + +   ls list + +   dd if=artix-base-runit-10110010-x86_64.iso of=/dev/sdc status='progress' && sync +   reads/writes artix-base.iso to device +   ```(optional if already installed)``` + + +   lsblk list your partitions +   the following example will acount for making (3) partitions on 1 drive +   ```you may need to run `swapon --show` to see if a swap partition is being used +   wherein you can run `swapoff /dev/sda5`, or whichever one it listed for you``` + +   fdisk /dev/sda + +   p list partitions throughout the process + +   d ```delete 3 OPTIONAL``` + +   d ```delete 2 OPTIONAL``` + +   d ```delete 1 OPTIONAL``` +   When it comes to sectors, fdisk will automatically convert human-readable formats e.g. (30GB) into sectors +   and you can then accept the default value ([Enter]) for the last sector. +   However, in case you do have to calculate both sectors, if you see something like "2048-1465149134", that represents the "start-end" sectors +   available... What i do is specify "2048" as my First sector, and then i calculate the amount i want to manually specify (starting w/ 1GB) +   and using this conversion principle:... `1GB * 1024MB = 1024MB/GB, 1024 * 1024 = 1048576KB/MB, 104857 * 1024 = 107373568 bytes/KB, +   107373568 / 512 bytes = 209714` ... and thats what youd specify as the Last sector for the first partition... for the second partition, +   you take the TOTAL amount you want to specify (31GB), calculate the conversion factor in the same way we did... And that would be the +   Last sector for the second partition, where you specify the First sector as "one step, or sector" greater than the Last sector of the +   first partition. + + +   g ```to create a new empty GPT partition table``` + +   o ```to create a new empty MBR partition table``` +   Creating a separate boot partition depends on the system's boot method (BIOS/UEFI) and partitioning scheme (MBR/GPT). 
For BIOS-based systems +   using MBR partitioning GRUB embeds its core image in the MBR gap, so a separate boot partition isn't necessary. On UEFI-based systems using +   GPT partitioning, there's an EFI System Partition (ESP) for storing boot loaders. GRUB for UEFI systems is installed to the ESP, which typically +   requires only a few hundred megabytes of space. It's essential to allocate sufficient space for the ESP and other partitions based on your system +   requirements. GRUB will automatically install to the ESP on UEFI systems. For BIOS-based systems, while a separate boot partition isn't typically +   required, ensuring adequate space for the core image is necessary, so creating a small boot partition for it is recommended. + + +   n new partition [Enter] [Enter] [+1GB] [y] + +   t ```whichever partition is going to be your boot partition, youll need to establish w/...``` + +   1 ```press 1 for EFI system, optional...``` + +   n new partition [Enter] [Enter] [Enter] [+30GB] [y] + +   n new partition [Enter] [Enter] [Enter] [Enter] [y] + + +   w writes, finalizes the partitions/&leaves fdisk + +   q if for some reason you did something wrong then you can quit and redo it +   and if for some reason you want to start completely over, you can reboot -h now + + +   lsblk list your partitions + +   mkfs.ext4 /dev/sda3 +   You can name these however you want and order them in any way however, it must remain consistent here after + +   mkfs.ext4 /dev/sda2 + +   mkfs.vfat /dev/sda1 +   Assuming this is for your boot partition, mkfs.vfat will create a FAT32 if said partition is large enough. +   +   ```EFI system partitions hold EFI boot loader files and related data, so it doesn't typically contain a traditional filesystem like Ext4, and.. +   instead it usually has a FAT32 filesystem. Therefore you dont run `mkfs.ext4` on that partition if you've created an "EFI system"... 
+   and you run the following `mkfs.fat -F32 /dev/sda1` which is a more explicit version of mkfs.vfat``` + +   ```These are optionals if you had a swap partition``` + +   mkswap /dev/sda5 + +   swapon /dev/sda5 +   Keep in mind, we'll be creating things in the `/mnt` directory because that is where we are recreating the directory structure of the system. +   Since /mnt is the mount point for e.g. /dev/sda3, it treats this as the base for the root filesystem, where the rest of the root directory structure will belong, +   so you HAVE to do `/mnt` first BEFORE you mkdir and mount the others.``` + +   mount /dev/sda3 /mnt + +   mkdir /mnt/part2 + +   mkdir /mnt/boot + +   mount /dev/sda2 /mnt/part2 + +   mount /dev/sda1 /mnt/boot + +   ls /mnt List those mounted directories, while making sure they reflect what you made and that no corruptions occurred in the process +   Use `umount` to unmount if something went wrong. + +   NOTE: You do not always receive a confirmation after using a command. You can of course test whether a command runs successfully or not w/ e.g. + +   (man runit && echo "command ran successfully") || echo "Error: command was not executed" + +   ls -lap | more +   its worth learning how to use `more` and `less` commands for viewing by pages, e.g. man dir | less +   Try `command --help` for available options, as --debug or -v --verbose results + +   *Network*: +   For ethernet you simply do this: + +   sv start connmand +   And for Wifi, you do the following... + +   rfkill unblock wifi + +   ip addr show +   Here you'll see the interface name e.g. "wlan0", an IPv4 address, "inet 192.168.1.10/24" and a broadcast address "brd 192.168.1.255", etc + +   ip link set interface up +   replace interface w/ wireless network e.g. wlan0 + +   "Connman" and "NetworkManager" seem to interfere w/ each other.. 
So consider using "connmanctl" first to connect, +   and ignore the NetworkManager until the end, where you can presumably use connmand or nmcli (NetworkManager's controller). + +   connmanctl agent on + +   connmanctl scan wifi + +   connmanctl services + +   connmanctl connect wifi_1234567890 + +   connmanctl passphrase EXAMPLE + +   connmanctl exit + + +   dmesg | grep firmware +   checks for firmware being loaded + +   dmesg | grep iwlwifi +   ```to identify any issues optional, +   see; Installing driver/firmware https://wiki.archlinux.org/title/Network_configuration/Wireless +   or Dynamic_Kernel_Module_Support``` + +   ping 185.199.108.133 -c 4 + +   *Basestrap+configuration*: + +   basestrap /mnt base base-devel runit elogind-runit linux linux-firmware vim + + +   fstabgen -U /mnt >> /mnt/etc/fstab +   ```mkdir /mnt/etc``` if one does not exist + +   blkid ext4 /dev/sdb1 >> /etc/fstab +   ```an example to append the UUID of a drive to your fstab``` + + +   artix-chroot /mnt + +   bash You can use this shell or (`sh`) which can be exited at any time + +   export EDITOR=vim +   ```After artix-chroot, you may have to make sure all mountpoints are listed and correspond w/ +   the /etc/fstab If the mountpoints/something else does not appear..``` + +   cat /etc/fstab +   ```and if thats the case blkid /dev/sda1 manually construct and append``` + +   echo "UUID= /boot vfat rw,relatime 0 2" | tee -a /etc/fstab +   e.g. +   UUID=ABCDE-123-1234 /boot vfat rw,relatime 0 2 + +   pacman -S grub efibootmgr +   grub install and EFI system + +   grub-install --target=i386-pc /dev/sda +   alternatively, `grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=grub /dev/sda` +   if for some reason its not writing to a non-FAT filesystem, i have a list of steps here you can try + +   grub-mkconfig -o /boot/grub/grub.cfg + +   *Language and region*: + +   vim /etc/pacman.d/mirrorlist +   prioritize top,to bottom... ... e.g. 
Server = https://us-mirror.artixlinux.org/$repo/os/$arch + +   ln -sf /usr/share/zoneinfo/America/New_York /etc/localtime + +   ls -l /etc/localtime +   list view, localtime + +   hwclock --systohc +   system to computers time + + +   vim /etc/locale.gen +   list localizations language +   en_US.UTF-8 UTF-8 en_US ISO-8859-1 (save file) + +   locale-gen for generating locals + +   vim /etc/locale.conf new file +   LANG=en_US.UTF-8 (save file) + + +   vim /etc/hostname +   exComp your computer's name here +   (save file) + + +   vim /etc/hosts +   ~ +   ~ +   127.0.0.1 localhost +   ::1 localhost +   127.0.0.1 exComp.localdomain exComp +   exComp replace w/ your own ComputerName +   (save file) + +   *Warning*: +   There's a slight issue, that is connman (albeit present during root installation) is not present on the canonical +   system, which can be resolved if you download it prior to reboot. This is an issue because if you dont have connman, +   you have no way to get on the internet, and that includes accessing through ethernet. So download/or configure every- +   thing you need thats internet related before hand or youll have to start from the beginning again. + +   pacman -S connman connman-runit networkmanager networkmanager-runit +   Download both Connman and NetworkManager to ensure you have a way to access internet. These are just the front ends +   for `iw` & `wpa_supplicant`, which you dont have initially until you download those packages that require them. +   some people find networkmanager gives them trouble, but for me its always connman and the others that dont work + +   *Reboot or shutdown*: +   Dont forget to set a password for root before shutting down, or you will not be able to log in. + +   passwd New password: EXAMPLE + +   You should consider rebooting to make sure that you did everything up to this point right. +   It will eliminate some upcoming variables on the off chance you're troubleshooting a problem. 
+ +   exit You can do this by exiting back into the initial root system, and running... + +   umount -R /mnt and then reboot or reboot -h now (its recommended to un-mount for proper shutdown) + +   *Lastly*: Booting into the canonical system, for Wifi and Ethernet + +   ls /etc/runit/sv +   Lists the services available for start-up + +   ln -s /etc/runit/sv/NetworkManager/ /run/runit/service +   normal start-up option + +   ln -s /etc/runit/sv/NetworkManager/ /etc/runit/runsvdir/current +   auto-matic start-up + +   sv start NetworkManager +   which is what you use to start the ethernet connection. + +   nmcli device wifi connect YourSSID password YourPassword + +   nmcli connection show + +   visudo uncomment `% sudo` + +   export EDITOR=vim + +   useradd -m newuser +   its imperative to make a regular user account and then run, passwd newuser. + +   X Windowing system as well as some inclination of a gpu driver... +   Beyond that, i recommend dwm as its a decent window manager... + +   If you run into any compromising situations... + +   jobs lists all the jobs still running, fg to resume a job + +   ps aux display information about running processes +   <Key>Alt+Shift+Q default key to close all or (Alt+Shift+C individual window) +   <Key>Ctrl+C, ^C control sequence which sends a SIGINT to the foreground process. +   <Key>Ctrl+D, Ctrl+Z, or Ctrl+Alt+Del... +   <Key>Ctrl+Alt+Fn+2 opens a new TTY[Fn, wherein you can kill a session from outside those running instances. + +   killall -u user's_name +   to kill a user's process or session + +   Welcome to Linux!YAY! + diff --git a/ln.md b/ln.md new file mode 100644 index 0000000..20dce60 --- /dev/null +++ b/ln.md @@ -0,0 +1,4 @@ +### HTTP 303 See Other + +The requested resource can be found at [**Redirect to Web Page**](https://snarlferb.github.io/a/std.html). This status code is to redirect the client to a different resource, typically after a POST request, to ensure that the client retrieves the updated resource using a GET request. 
+ diff --git a/macro.html b/macro.html new file mode 100644 index 0000000..0442577 --- /dev/null +++ b/macro.html @@ -0,0 +1,308 @@ + + + + + +macro preprocessor + + + + +
    --- macro ---
    + +Something i like to use macros for is defining a macro that expands to a header's path, +named after the header itself, wherein the named macro can be used w/ `#include` +(not excluding its macro use case) e.g., + + #define HI_H "/path/to/it.h" + +and then you can use it like `#include HI_H`, where by the preprocessor replaces it with the +actual header file path during the preprocessing stage. Next here's an example of ifdef.. + + + #include <stdio.h> + + #ifdef PLATFORM + #include <platform.h> + #define MESSAGE "Hello from whatever platform!" + + #else + // Assume some other platform + #define MESSAGE "Hello from non-native platform!" + + #endif + + int main() { + printf("%s\n", MESSAGE); + return 0; + } + +`ifdef` checks if a macro named "PLATFORM" is defined. if so, the line `#include <platform.h>` +gets included because the condition is true. And presumably, you can deduce the rest from there. + +be careful not to get `ifdef` mixed up with `ifndef`, which checks if a given macro is NOT defined. +its exactly the same as `#if !defined(MACRO)`, just a shorter spelling; They're both used interchangeably + +Before we go forward, theres something you have to understand. You might be familiar w/ the order +of operations; We sometimes refer to them as (PEMDAS) where "P" means parentheses. + +C has specific rules for operator precedence (which operators are evaluated first) and associativity +(the order in which operators of the same precedence level are evaluated). I often find that the +confusion arises in situations due to the way it looks w/ extra parentheses, the added protection +around a given token(s). Macros depend on having this terse separation (parenthesizing) so you will +often see it exaggerated. + +Next time you see something that fits this description try stripping away the parentheses; +and then look at it again, and it should be more clear and in a way that requires following +the normal hierarchy of precedence within the expression. 
Macros, being that they are like +their own compile-time language, are just another perspective on how to separate and +interpret those entities in C. + +Lets talk about the utility of a macro. Macros can simplify an expression. +Heres an example related to rvalues (This is prior to simplification) + + int max_val = type_max(typeof(var)); + int min_val = type_min(typeof(var)); + +Then when we change it to become a macro, and assign it after. + + #define TYPE_MAX(val) type_max(typeof(val)) + #define TYPE_MIN(val) type_min(typeof(val)) + + int max_val = TYPE_MAX(var); + int min_val = TYPE_MIN(var); + +which you can see is simpler now to call. + +This behavior applies to all other forms of C and its decree... +Heres a macro to access some element of a tuple-like structure (array) + + #define MACRO(x) ((x)[2]) + int tuple[] = {10, 20, 30, 40}; + +MACRO(x) defines a macro named MACRO that takes a parameter x. +inside the macro, (x)[2] accesses the third element of the array x. + +tuple[] is an integer array with four elements, ` {10, 20, 30, 40} ` +now, if you use the macro MACRO with the array tuple: ` int third_element = MACRO(tuple);` +expands to the following: + + int third_element = ((tuple)[2]); + +which effectively accesses the third element of the array tuple. In this case, +tuple[2] refers to 30 (because arrays are zero-indexed) + + tuple[0] is 10 + tuple[1] is 20 + tuple[2] is 30 + tuple[3] is 40 + +so, MACRO(tuple) will evaluate to tuple[2], giving you 30. +and the same is the case if `x` were a string, then MACRO(x) would access the third character of the string. +and that re-elucidates what you've learned so far about arrays and such. + +The macro `#define C(x) ((x)-'@') ` , i.e. (x-64) is another example that can be used in various contexts +to denote <Key>Ctrl, characters (e.g. 
65 - 64 = 1, and Ctrl-A is the ASCII control character SOH +(Start of Heading) with an ASCII code of "1") making it easy to recognize and handle "Ctrl-X" mappings, + + if (key == C('S')) { + save_file(); + } + +the next macro shifts 2 to the left (n)bits, and subtracts 1. + + #define b(n) (2 << (n)) + #define a(n) ((b(n))-1) + +if you call b(3), it would be equivalent to 2 << 3, resulting in 16 +if you call a(3), it would be equivalent to (2 << 3) - 1 +resulting in 15, which is a binary number with four bits set to 1 (1111 in binary) + + printf("b is %d\n", b(3)); + printf("a is %d\n", a(3)); + + 8 in binary = 1000 + 4 in binary = 0100 + 2 in binary = 0010 + 1 in binary = 0001 + +3 in binary is a combination of 1 and 2, yadayada... Instead if we start with 2 (0010) +and shift it over three places to the left, we have 10000, and the 1 is in the 16s spot +making it 16 + +give these examples time to marinate, the Taj Mahal wasnt built in 45 seconds. + + #define MY_MACRO(ptr) ((*(ptr)) * 2) + +and you use it as follows: + + int array[5] = {1, 2, 3, 4, 5}; + int *ptr = array; + int result = MY_MACRO(ptr); + +during preprocessing, `MY_MACRO(ptr)` will be replaced with `((*(ptr)) * 2)` +the `(*(ptr))` is just a textual replacement that occurs before actual compilation. + +these are variations of the backslash (\) character +in which it cancels-out the following character (we talk about the behavior of backslash characters in other sections) +but in any other case an escape sequence is determined by the following character e.g. 
"\n" newline + + char s1[] = "Ca\\ncel"; // ASCII + char s2[] = "Ca\134ncel"; // octal + char s3[] = "Ca\x5Cncel"; // hex + +technically speaking, a backslash that is immediately followed by a line break is removed together w/ that line break +before the code is tokenized, which has the effect of continuing a line (or removing the line break +whichever way u prefer to see it) so the next line falls into the running logic of the current one + +this'll make sense as your brain unlocks the harder-to-grok details. +when you compile and run the following program, it will output: "sequence"... + + #include <stdio.h> + + #define QUOTE(seq) "\""#seq"\"" + + int main() { + printf("%s\n", QUOTE(sequence)); + return 0; + } + +... by preserving the " " air-quotes around sequence. + +note, the preprocessor has specific ways in which it expands a macro argument, such as in +our example `#seq` which acts as the stringification operator, converting the +macro argument seq into a string literal. + +if you read everything on the page about functions, you would have learned that you can use +them to make a `type` of some variable... You can do the same with macro's here: + + #define DEFINE_SERVICE( name ) \ + typedef struct Service_ ## name ## Rec_ \ + Service_ ## name ## Rec ; \ + typedef struct Service_ ## name ## Rec_ \ + const * Service_ ## name ; \ + struct Service_ ## name ## Rec_ + +`name` is a placeholder that will be replaced with the actual service name when the macro is used. +the first typedef, ending in ` Service_ ## name ## Rec ;` names the structure type "Service_", followed by the +provided name argument, and ending with "Rec" (the struct tag itself ends with "Rec_"). + +the following part defines a pointer type for the service structure. It uses the same structure +tag with "Rec_" appended, followed by const *. This creates a pointer that can point to a +constant Service_ structure (read-only). 
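To see the pieces fit together, here is a rough sketch of the macro used in a full translation unit. The service name `Logger`, its `id` member and the two helper functions are made up for illustration; none of them come from the original header. Note the macro ends on an open `struct Service_LoggerRec_`, so the structure body is written right after the macro call.

```c
#include <stdio.h>

#define DEFINE_SERVICE( name )                 \
    typedef struct Service_ ## name ## Rec_    \
        Service_ ## name ## Rec ;              \
    typedef struct Service_ ## name ## Rec_    \
        const * Service_ ## name ;             \
    struct Service_ ## name ## Rec_

/* Expands to two typedefs plus "struct Service_LoggerRec_", then the body. */
DEFINE_SERVICE(Logger)
{
    int id;   /* hypothetical member, for illustration only */
};

/* Takes the pointer-to-const typedef: it can read the record, not modify it. */
int service_id(Service_Logger svc) {
    return svc->id;
}

/* Builds a record, reads it back through the pointer type, returns the id. */
int demo_service(void) {
    Service_LoggerRec rec = { 7 };
    Service_Logger svc = &rec;
    printf("service id: %d\n", service_id(svc));
    return service_id(svc);
}
```

Calling `demo_service()` from your own `main` prints the id; trying to write `svc->id = 1;` inside `service_id` should be rejected by the compiler because of the `const` in the pointer typedef.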
+ +these all come together via the `##` operator that does concatenation, and once you've defined the +service structure and pointer type using the macro (the structure body goes right after the macro call), +you can declare variables of those types: + + DEFINE_SERVICE(AppendThis) { int example_field; }; + Service_AppendThis Hello; + +heres another macro you can try... + + #define Generic(return_type) return_type + #define A_Func(this) \ + Generic(umbrella) \ + That_A_Func(this) + +now when you declare it as a function declaration, it'll look like... + + Generic(umbrella) That_A_Func(AnothType *this); + +the purpose of the first line is to enable flexible declaration of function return types using macros, +and without it the macro Generic(umbrella) would not be expanded correctly in the definition of `A_Func`. +macros are expanded where they are used, not where they are defined, so as long as `Generic` is defined +before `A_Func` is called, it will be available within `A_Func`. So you can kinda see how the preprocessor works. + +`do-while(0)` within a macro is a common C programming idiom. It allows the macro to be used as a +single statement in all contexts, particularly in if-else statements, without causing issues, e.g., + + #define GOOD_MACRO(x) \ + do \ + { \ + if (x) \ + foo(); \ + bar(); \ + } \ + while (0) + + // Call the macro somewhere + GOOD_MACRO(y); + +The backslashes are line continuations that let the macro span several lines for readability, and you can alternatively have it defined like... + + #define GOOD_MACRO(x) do { if(x) foo(); bar(); } while(0) + +The next example demonstrates an offset (youll remember we explained how +structs use offsets to access members) note: on my machine `size_t` is a 64 bit unsigned int + + #include <stdio.h> + #include <stddef.h> + + #define offset(s,m) ((size_t) & (((s*)0)->m)) + + struct s { + char *a; + int b; + size_t c; + }; + + int main() { + printf("a -- %zu\n", offset(struct s, a)); + printf("b -- %zu\n", offset(struct s, b)); + printf("c -- %zu\n", offset(struct s, c)); + return 0; + } + +`(s*)0` casts 0 to a pointer to the structure type. 
+`((s*)0)->m` accesses the member `m` through that pointer. +`&(((s*)0)->m)` gets the address of the member within the structure; since the structure "starts" at address 0, that address is the offset. +`(size_t) &(((s*)0)->m)` casts this address to `size_t` +and then, from __main__ our macro, *offset(struct..* takes 2 arguments `struct s, x` (the struct type and a member name) + +if for some reason you just cant stomach the macro, here's an alternative method +that uses the standard offsetof, to set a field's value indirectly. + + #include <stddef.h> + + struct example { + int a; + double b; + int c; + }; + + int main() { + struct example ex = {0, 0.0, 0}; + + // First calculate offset + size_t offset_c = offsetof(struct example, c); + + int new_value = 42; + *(int *)((char *)&ex + offset_c) = new_value; + + return 0; + } + +In practice, you can simplify your expression by adjusting the cast scope. For instance, +take the example of a preprocessor condition, whereby we're doing something that may not +be valid C (explicit casting is not meaningful w/ preprocessor directives like `#if` +that do not support runtime constructs, though it is in the case of normal macros).. +That doesnt mean that you're allowed to have a directive w/ a macro called in the +condition, that itself handles the casting. It will cause a compilation error. + +Therefore it probably will not work to try this in `#if`, but it helps for demonstrating w/. +Lets say we've cast to the `int_example` type at the first level, tangent to each variable... +If we move the cast to one of the outermost levels, we'll have, in turn, cast each variable +encapsulated within the same level of the shared parentheses... does that make sense?... + + #if !((int_example((E + 1) / A) << (F - 2)) & (int_example)G) + +In conclusion, we're trying to use our type `int_example` to cast within a preprocessor +condition (but as we said the preprocessor will not understand or perform type casting) + +Here ive chosen to cast at the second parentheses level, which means ive excluded `G` +from the `int_example` association, and so ive added the cast tangent to it. 
+You could very well just add the cast to a parentheses that encapsulates ALL, +and have it applied to all of them, and thats usually what is the easiest to do. + +Preprocessor directives like #if, #ifdef, and #ifndef are used in conditional compilation +to include or exclude parts of code based on certain conditions. We can of course make +our example into a macro instead, for a relative notion of what the difference is now. + + #define TRYEXAMPLE(E, A, F, G) \ + ((((((E) + 1) / (A)) << ((F) - 2)) & ((int)(G)))) + + Congratulations... +You've reached the end. + + diff --git a/malloc.html b/malloc.html new file mode 100644 index 0000000..d906bc4 --- /dev/null +++ b/malloc.html @@ -0,0 +1,66 @@ + + + + + +malloc + + + + +malloc + +malloc is a function that provides a way to allocate memory dynamically at runtime. +it takes the size of the memory block you want to allocate in bytes as input (size_t size). +it returns a void* pointer, which is a generic pointer type. In C it converts implicitly to +a specific pointer type (e.g., int*) when you assign it, so no cast is required. The allocated memory +resides on the heap, a region of memory separate from the stack. The heap can grow and +shrink as needed during program execution. + +as such, `malloc` and `free` are the go-to functions for most memory allocation needs in C programs: +"managing a pool of data", is a perfect example. Many data structures are dynamic in size, i.e. +the number of nodes in a linked list can grow or shrink as elements are added or removed, +trees can expand or contract as data is inserted or deleted; And thus arrays can be dynamically +resized to accommodate a changing number of elements. + +Here is how i like to think of malloc and memory allocation; think of metrics that track total +memory on a device, which comes in three parts: RAM-based memory consumed, local-memory storage, +CPU% percentage of a given process. You then might ask the question, "why is this application only +5KB... 
when clearly its doing a whole lot more!" It stores 5 Kilobytes, but then its dynamically +changing size and interacting with the system. this is why we need memory allocation. + +One thing we need to acknowledge is how memory is being allocated at runtime. The program doesn't +know the exact size it needs beforehand, and malloc provides a way to request memory from the +heap based on the program's needs during execution. + +The stack is a limited memory region used for function calls, local variables, and arguments. +the layout of each function's locals is worked out at compile time. The heap is a more flexible memory pool that can +grow and shrink dynamically at runtime; Alas malloc allocates memory from the heap, and +thus satisfies this notion of runtime allocation. + +The main distinguishing factor being the time at which allocation or execution decisions are made. +Compile time decisions are made before the program runs and are fixed; runtime decisions are made +as the program executes and can change dynamically based on the program’s needs and behavior. + +`malloc` btw is just a function that returns a void pointer: + + void *malloc(size_t size); + +and so it returns a pointer to the allocated memory block (or NULL if the allocation fails). Thats it. + +`sizeof` gives the correct size to allocate when using memory allocation (*malloc, calloc, etc..*) for a structure, +and offsets (specifically `offsetof()`) tell you where a member sits inside that block: + + struct Point { + int x; int y; + }; + + struct Point *myPoint = malloc(sizeof(struct Point)); + +you then might use offsetof (from `<stddef.h>`) like: + + size_t structSize = sizeof(struct Point); + size_t yOffset = offsetof(struct Point, y); + +youll be able to use myPoint and yOffset to access y member + diff --git a/memcpy.html b/memcpy.html new file mode 100644 index 0000000..1ecba00 --- /dev/null +++ b/memcpy.html @@ -0,0 +1,88 @@ + + + + + +memcpy + + + + +memcpy + +memcpy is a standard library function in C that copies a specified number of bytes from a +source memory location to a destination memory location. 
It is commonly used for copying +blocks of memory, such as arrays or structures. The function operates quickly but has a +limitation: it assumes that the source and destination memory regions do not overlap. + +If there is overlap, the behavior is undefined, which can lead to corrupted data. +In contrast, memmove also copies a specified number of bytes from a source to a destination, +but it is designed to handle overlapping memory regions safely. It ensures that the copy +operation completes correctly even if the source and destination areas overlap. +This makes memmove more versatile, though potentially slightly slower than +memcpy due to the additional checks it performs. Assume we have boxes... + +[Box 1] [Box 2] [Box 3] [Box 4] [Box 5] [Box 6] [Box 7] [Box 8] [Box 9] [Box 10] + 1 2 3 4 5 6 7 8 9 10 + +when trying to move boxes or a region of them to some other region w/ memcpy, it can lead +to this overlap situation we see here... + +[Box 1] [Box 2] [Box 3] [Box 4] [Box 5] [Box 6] [Box 7] [Box 8] [Box 9] [Box 10] + 1 2 1 2 1 2 1 2 1 10 + +In other words, we were trying to copy the original `Box 1-7` to `Box 3-9` (which should have +given us 1 2 1 2 3 4 5 6 7 10) however we did not handle the overlap, and so the corruption occurred: +`Box 1` went to `Box 3` and `Box 2` went to `Box 4`, +overwriting parts of the source data before they had been read. +By the time the copy reached `Box 3` it held a 1 instead of a 3, so that 1 went to `Box 5`, and so on. +(Since the behavior is undefined, this front-to-back result is only one possible outcome.) + +Obviously you dont want this kind of staggered behavior - which'll happen if the source +and destination regions are overlapping. That is when you should use `memmove` or, +`safe_copy` which is a user-implemented version of memcpy that takes advantage of +a secure data type `rsize_t` (which provides context about the size/boundaries of +memory regions) and which is safer to use. When correctly implemented it can also +be used for this overlapping memory situation that we discussed. + +Next, i thought id show an example of how one uses memcpy. +memcpy btw is still used very often. 
its used any time you have non-overlapping regions +of memory. blitting is a classic example of a situation where memcpy is used. + +for simplicity, let's just copy a subset of the source buffer. + + #include <stdio.h> + #include <string.h> + + #define BUFFER_SIZE 10 + + void resizeVideo(int *source, int srcWidth, int *dest, int destWidth) { + memcpy(dest, source, sizeof(int) * destWidth); + } + + int main() { + int sourceBuffer[BUFFER_SIZE] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; + int destinationBuffer[BUFFER_SIZE]; + + int sourceWidth = BUFFER_SIZE; + int destWidth = BUFFER_SIZE; + + int scalingFactor = 1; + + if (scalingFactor == 1) { + // Directly copy the content from sourceBuffer to destinationBuffer + memcpy(destinationBuffer, sourceBuffer, sizeof(int) * destWidth); + } else { + // Resize the video (in this example, copy a subset of the source buffer) + resizeVideo(sourceBuffer, sourceWidth, destinationBuffer, destWidth); + } + + // Print the content of destinationBuffer + for (int i = 0; i < BUFFER_SIZE; i++) { + printf("%d, ", destinationBuffer[i]); + } + + printf("\n"); + return 0; + } + +In this example, `memcpy` is used when the scaling factor is 1 to copy the +entire content of `sourceBuffer` to `destinationBuffer`. If the scaling factor +is different, a hypothetical `resizeVideo` function is called to perform some +resizing operation (in this case, just copying a subset of the source buffer) + diff --git a/page10.html b/page10.html new file mode 100644 index 0000000..e2fe9ec --- /dev/null +++ b/page10.html @@ -0,0 +1,172 @@ + + + + + +manip nd attrib + + + + +

    ___ Bit Manip ___

    +In the following we are going to look at bit fields and bitwise operations. +You cannot declare a bit field on its own; +as such you must define it as a member of a struct for example, then use it. +Bit fields allow you to specify how many bits that member should hold, +effectively giving it a fixed number of bit positions-or rather a field of bits, +each of which can be either 0 or 1 + +The following declares a 4-bit-wide bit field in a struct, so (1000) i.e. the 4th bit is the highest one it can hold. +This implies you can group related flags together within a structure. +The field starts out as (0000), and the 4th bit is set when we OR in (1 << 3) which is equivalent to (1 * (2^(3))), +or you could say we shifted a 1 bit (0001) three positions to the left +Whereby we OR (0000) and (1000) together, which equals (1000); doing it again w/ (1000) and (1000) still equals (1000), as thats what +OR does w/ equivalent bits (as it does e.g., 1010 | 0101 = 1111 w/ indifferent bits) + +When you think of OR, think of amalgamation or always active when indifferent or +better yet refer to AXONN for memorizing every logic gate easily. + + #include <stdio.h> + + struct { + unsigned int is_hidden : 4; + } FilePermit; + + int main() { + FilePermit.is_hidden |= (1 << 3); + + // Check if the 4th bit is set + int isFourthBitSet = (FilePermit.is_hidden & (1 << 3)) != 0; + printf("4th bit set to: %d\n", isFourthBitSet); + + return 0; + } + +Here's an arbitrary example that does the same thing w/ bitwise shift. + + #include <stdio.h> + + #define KERMIT ((1 << 0) | (1 << 1) | (1 << 2) | (1 << 3)) + // Individualized flags, for clarity + #define THEFROG (1 << 0) + #define THETOAD (1 << 1) + #define THEPIG (1 << 2) + #define THEGOOSE (1 << 3) + + struct FilePermit { + unsigned char is_hidden; + }; + + int main() { + struct FilePermit permit = {0}; + permit.is_hidden |= THEGOOSE; + int isFourthBitSet = (permit.is_hidden & THEGOOSE) != 0; + printf("4th bit set to: %d\n", isFourthBitSet); + return 0; + } + +Experiment with it. + +
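If you want something to experiment with, here's a small sketch of the four everyday flag operations (set, clear, toggle, test) built on the same shifted masks. The helper names `set_flag`, `clear_flag`, `toggle_flag`, `has_flag` and `demo_flags` are my own placeholders, not anything standard.

```c
#include <stdio.h>

/* Same flag names as the example above, one bit each (unsigned to keep shifts tidy). */
#define THEFROG  (1u << 0)
#define THETOAD  (1u << 1)
#define THEPIG   (1u << 2)
#define THEGOOSE (1u << 3)

/* OR turns the masked bits on and leaves the rest alone. */
unsigned set_flag(unsigned flags, unsigned mask)    { return flags | mask; }

/* AND with the inverted mask turns the masked bits off. */
unsigned clear_flag(unsigned flags, unsigned mask)  { return flags & ~mask; }

/* XOR flips the masked bits. */
unsigned toggle_flag(unsigned flags, unsigned mask) { return flags ^ mask; }

/* AND isolates the masked bits; non-zero means at least one is set. */
int has_flag(unsigned flags, unsigned mask)         { return (flags & mask) != 0; }

/* Walks through the operations and returns the final flag value. */
unsigned demo_flags(void) {
    unsigned f = 0;
    f = set_flag(f, THEGOOSE);     /* 1000 */
    f = set_flag(f, THEFROG);      /* 1001 */
    f = toggle_flag(f, THETOAD);   /* 1011 */
    f = clear_flag(f, THEGOOSE);   /* 0011 */
    printf("flags = %u, goose set: %d\n", f, has_flag(f, THEGOOSE));
    return f;
}
```

Plain `unsigned` values are used here instead of a bit field so the helpers can take and return the flags by value; the same operators work on a bit-field member too.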

    ___ Wat dem der attibutes uh fer? ___

    + +e.g., __attribute__((__noreturn__)) + +The "attribute" keyword is considered the beginning of the attribute.. Where as e.g., "noreturn" specifies a characteristic of a given entity. + +[[deprecated]] +[[__deprecated__]] +[[deprecated("reason")]] ... which indicates that the use of the name or entity declared with this attribute is allowed, but discouraged for i.e. reason +[[__deprecated__("reason")]] +[[fallthrough]] ... indicates that the fall through from the previous case label is intentional and should not be diagnosed by a compiler that warns on fall-through +[[__fallthrough__]] +[[nodiscard]] +[[__nodiscard__]] +[[nodiscard("reason")]] ... encourages the compiler to issue a warning if the return value is discarded +[[__nodiscard__("reason")]] +[[maybe_unused]] ... suppresses compiler warnings on unused entities, if any +[[__maybe_unused__]] +[[noreturn]] +[[__noreturn__]] +[[unsequenced]] ... indicates that a function is stateless, effectless, idempotent and independent +[[__unsequenced__]] +[[reproducible]] ... indicates that a function is effectless and idempotent +[[__reproducible__]] + +every standard attribute whose name is of form attr can be also spelled as __attr__ and its meaning is not changed. +this means that, for example, [[noreturn]] and [[__noreturn__]] are the same thing ... And likewise w/ the GCC syntax, +__attribute__((packed)) can be written as __attribute__((__packed__)) ... or to better illustrate this, take the e.g., + + void do_something(int x) __attribute__((noreturn)); + +which is declaring a function that uses this attribute. It could instead be written as... + + void do_something(int x) __attribute__((__noreturn__)); + +a bare `__noreturn__` or `__packed__` on its own is not something the compiler understands; +when you see it written that way in a codebase, its a macro the project defined to wrap the full attribute. +note, that you can declare these functions before their definition (separating declaration from implementation) + +but here's a few additional things or outliers i can allude to... 
+
+    __attribute__ ((aligned (16))) char stack0[4096 * NCPU];
+
+this array is intended to allocate a separate stack for each CPU core in a
+multi-core system, each stack being 4096 bytes in size (4096 bytes is a
+common size for a stack on many systems).
+
+generally speaking, this line is an attribute that specifies alignment,
+followed by a declaration of an array of characters. It combines the alignment specification
+(aligned (16)) with the array declaration char stack0[4096 * NCPU];
+to allocate a contiguous block of memory (stack0) that is both aligned
+on a 16-byte boundary and sized to accommodate multiple stacks for a
+specified number of CPU cores (NCPU).
+
+`4096 * NCPU` calculates the total size of the array. If NCPU is, for example,
+    4, then stack0 would be 4096 * 4 = 16384 bytes (16 KB)
+
+it ensures that the memory allocated for stack0 starts at an address that is
+divisible by 16. This is relevant for scenarios where hardware/ software
+requires data to be aligned to certain boundaries for efficient memory access,
+or to optimize the performance for multi-core systems.
+
+an example might be w/ SIMD instructions in processors which often
+require data to be aligned on boundaries such as 16 bytes to perform efficiently.
+
+here's another example of an attribute...
+
+    void my_function() __attribute__((noreturn));
+
+this `(noreturn)` indicates that my_function does not return to its caller.
+
+    typedef int my_int_type __attribute__((aligned(4)));
+
+this example aligns instances of my_int_type on a 4-byte boundary, and
+attributes can be applied to all sorts of things like this.
+
+you can apply multiple attributes to a single declaration...
+
+    struct __attribute__((aligned(16), packed)) my_struct { ... };
+
+here, aligned(16) aligns the struct on a 16-byte boundary, and packed ensures
+that the struct's members are tightly packed without any padding. For a struct type the
+attribute goes right after the `struct` keyword (or after the closing brace); placed
+in front of `struct` it would not apply to the type.
+
+these are of course specific to gcc, nevertheless it gives you some fairly
+useful utilities like this.
it also provides hooks for function entry and exit, + + void __attribute__((no_instrument_function)) __cyg_profile_func_enter(void *this_func, void *call_site) { + printf("Entering function %p from %p\n", this_func, call_site); + } + + void __attribute__((no_instrument_function)) __cyg_profile_func_exit(void *this_func, void *call_site) { + printf("Exiting function %p to %p\n", this_func, call_site); + } + +when you compile your program with the -finstrument-functions flag, it'll +automatically insert calls to __cyg_profile_func_enter and +__cyg_profile_func_exit at the entry and exit points of +every function in your program. + +other common attributes you might see, + +__attribute__((__bounded__)) +__attribute__((__format__)) +__attribute__((__unused__)) +__attribute__((__used__)) + +you can learn more about gcc attributes here, https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html + diff --git a/py.html b/py.html new file mode 100644 index 0000000..08eeb66 --- /dev/null +++ b/py.html @@ -0,0 +1,403 @@ + + + + + +Would you eat them in a box? + + +

    Warning! Learn C before you learn python

    This is the green eggs & ham that make up pythons indigestible layer. +illustrate the loopiness caused by the equality of identifiers + + · A value passed, always starts from the last (or sum) + · for key:value (what for loops amount to) + · Functions as parameters and type (much like contiguous blocks of memory), + arrays of arrlist[] associates w/ any array type, + and parameters as objects that can inherit from everything + (therefore everything can be accessed through everything) + · Consolidating expressions & comprehension used to assign + whole expressions (generator objects), and variable-value + assignment (as in multi-variable sequence unpacking) + + ..if that doesnt make sense, humour me + + C and Python (although similar in ways) are fundamentally different + + def foo(p): + print(p) + def spam(reassign): + foo(reassign) + variable = 8 + spam(variable) + +We are reassigning and calling a function, because you dont declare things in python +instead you would do something similar, where a function is created and variables are assigned + +The calling convention-system for calling functions in python is not like C, +where you are thinking in terms of framework headers and graphics library headers. +Instead, you have to call a function on an "as needed" basis. Python is like a calculator. +Everything from passing parameters to assigning functions is simplified (which somehow makes it more confusing) +Luckily it doesnt matter if you use a `TAB` character, as you can choose the indentation level; +w/ that said its a common convention to use (4x) spaces. Indentation is essential for defining the statement block. 
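To make the indentation point a bit more concrete, here's a minimal sketch (the function name `describe` is made up for illustration). The two `append` lines indented under the `if` belong to it, and the line that steps back out runs either way.

```python
def describe(n):
    lines = []
    if n > 5:
        # both lines below are indented under the `if`, so both belong to its block
        lines.append("big")
        lines.append("still inside the if")
    # back at the function's own indentation level: runs whether or not n > 5
    lines.append("always")
    return lines

print(describe(8))
print(describe(1))
```

Whichever indentation you pick, stay consistent within a block; Python 3 refuses to guess when tabs and spaces are mixed ambiguously and raises a `TabError`.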
+
+You can think of python like you're trying to create functions inside `main.c`,
+where you call them at the end (`int main`), thats reminiscent of a python script
+
+    def sum(iterable, start=0):   # same idea as the builtin sum()
+        total = start
+        for item in iterable:
+            total += item
+        return total
+
+For loops can have a `key:value` type format
+Once you understand the format of For loops, you'll have 50% of the battle cleared
+
+    for key, value in function(param):
+
+this `key:value` concept is essential to python, or pythons methodology. There is also this inherent consolidation behavior of functions,
+where different functions hold different values, and calling `print(result)` would output the consolidation of those values
+
+Python's "invoking" behavior is such that if you pass a value into a function that was returned by another function,
+it picks up from whatever the earlier call left behind (the sum, or last value)
+
+    def func(x):
+        def inner(y):
+            return x + y
+        return inner
+    newFunc = func(5)
+    result = newFunc(3) #result is 8
+
+In the example above, `inner` is a closure that captures the variable `x` from the enclosing scope of `func`
+"currying" translates a function that took multiple arguments into a chain of functions that each take one; the closure above is the hand-written version of that
+
+    from functools import partial
+
+    def func(x, y):
+        return x + y
+    addThree = partial(func, 3)
+    result = addThree(5) #result is 8
+
+partials capture the function and pre-fill one or more of its parameters...
+
+    def add(x, y):
+        return x + y
+    def execFunc(func, x, y):
+        return func(x, y)
+    result = execFunc(add, 5, 3) #result is 8
+
+here, you can also produce the same functionality without doing anything special.
+Think of function parameters like contiguous blocks of memory that you access.
+You can embed these concepts within each other (as well as return an anonymous function (lambda))
+as this is the main idea behind python, it makes functions have "composability", such as passing
+a function as a parameter or assigning it to another name... It then further consolidates when
+you assign it and pass in a value.
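Here's a small sketch that puts the closure, the partial and the "composability" idea side by side. The names `make_adder` and `compose` are made up for this illustration, they arent builtins.

```python
from functools import partial

def add(x, y):
    return x + y

def make_adder(x):
    # closure: the returned lambda remembers x
    return lambda y: add(x, y)

def compose(f, g):
    # returns a new function that applies g first, then f
    return lambda value: f(g(value))

add_five = make_adder(5)       # closure version
add_three = partial(add, 3)    # functools.partial version
add_eight = compose(add_five, add_three)

print(add_five(3))    # 8
print(add_three(5))   # 8
print(add_eight(0))   # 8
```

All three are ordinary values you can pass around, which is really all that "functions as parameters" amounts to.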
if you are someone who's against these implicit, non-obvious things,
+python allows some forms of explicitness.
+
+    · `:` after a variable to annotate type
+    · `:` also represents the end of a `func(var:type):`
+    · You use `func(var)->type:` to specify return type
+    · arrays start at 0, per usual
+    · and you have 'slices', and 'range' specification e.g. `1:4:2`, or start at 1, stop before 4, and step through every 2 elements
+
+more implicit
+
+    · the assignment expression operator `:=` allows you to assign values to variables as yet another consolidated form of expression.
+      You usually have to wrap the expression within `()` parentheses as well
+    · by simply using (e.g. comparison operators) python produces a boolean output
+    · For loops let you specify names you havent even declared, i.e. `NameKey:NamePair` format i explained
+
+For loops in python are just statements, and these statements can iterate over any sequence or generator
+
+    def gen_num():
+        for i in range(5):
+            yield i
+    for number in gen_num():
+        print(number)
+
+If you dont include `yield`, the function is just a regular function rather than a generator, so the For loop has nothing to iterate over
+
+    def countdown(start):
+        while start > 0:
+            yield start
+            start -= 1
+
+In this e.g. `countdown` is a generator function that takes a `start` parameter. Inside the function,
+there's a while loop that continues as long as `start` is greater than 0. Within each iteration of the loop,
+it yields the current value of `start` and then decrements it by 1
+You can use this generator function w/ `next()`
+
+    gen = countdown(5)
+    print(next(gen)) #Outputs 5
+    print(next(gen)) #Outputs 4
+
+And it saves you from having to call `countdown(...)` again, since `gen` remembers where it left off
+
+Theres a few other things that involve arrays and iteration in python, however theres another thing called
+Comprehensions (and their lazy cousin, Generator Expressions), that is to say you can have set comprehension for e.g.
+
+    function = {num**2 for num in range(5)}
+
+Which is equivalent to squaring the index (num) inside a loop on every iteration and collecting the results
+Notice whole expressions that use comprehension go inside the delimiter tokens (e.g. curly-braces)
+
+A technical fact about the double asterisk: `**` is the exponentiation operator, so `num**2` is num squared. But you might also see
+`*args` or `**kwargs` (variadic parameters) in a parameter list to indicate that there are "other" parameters;
+the single asterisk collects extra positional arguments into a tuple, and the double asterisk collects extra keyword arguments
+into a dictionary... A bare `*` can also be used IN PLACE OF a parameter,
+which represents that every parameter after it must be passed explicitly by name (keyword-only);
+Like, a technical indicator. (same is the case for `/` which specifies that the parameters before it are positional-only)
+Pointers in python arent a thing.
+
+`range(5)` is equivalent to, n < 5, and it starts at 0 by default, or you could have also specified the
+start e.g. 1 in the loop `range(1,5)`
+
+Theres a few methods/functions for string manipulation in python, but the underlying mechanism remains the same;
+strings are sequences of characters, each character has an index (position) within the string, and each character
+is represented by an underlying, numerical Unicode code point (which matches ASCII for plain English text)
+
+You dont have structs in python, but you do have classes which could be considered the equivalent...
+You designate a private attribute `__name` w/ two leading underscores (python mangles the name rather than truly hiding it),
+and you can gain access to whatever it is associated w/, through a public (no underscores) method, that returns the same value.
+A getter method is the usual way to pair private/public.
+This is also where the `@property` decorator might come into play
+
+You might also see `_` underscore used as a placeholder/disposable.
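A quick sketch of those last few ideas together: the `/` and `*` markers in a parameter list, a double-underscore attribute behind a `@property`, and `_` as a throwaway name. The names `clamp` and `Account` are invented for the example.

```python
def clamp(value, /, *, low, high):
    # `value` is positional-only (it sits before `/`);
    # `low` and `high` are keyword-only (they sit after `*`)
    return max(low, min(high, value))

class Account:
    def __init__(self, balance):
        # two leading underscores: python stores this as _Account__balance
        self.__balance = balance

    @property
    def balance(self):
        # read-only getter; no setter is defined, so assigning to it fails
        return self.__balance

acct = Account(120)
print(clamp(acct.balance, low=0, high=100))
for _ in range(2):   # `_` marks a loop variable we never read
    print("tick")
```

Calling `clamp(120, 0, 100)` would raise a `TypeError`, since `low` and `high` have to be named.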

    Python Arrays (which are also a `type` of said array)

    · brackets `[]` refer to a `list`
+    · curly-braces `{}` refer to either a `dictionary` or a `set` (a `frozenset` is built by calling `frozenset(...)`)
+    · parentheses `()` refer to a `tuple`
+
+Mutability & Order
+
+    · mutable arrays can be changed, and immutable arrays cannot be.
+    · order refers to the sequence of elements within data structures. an unordered array makes no promise about what order its elements come back in
+
+Ordered arrays (as of python 3.7 for dictionaries, which keep insertion order)
+
+    my_list = ["mutable", "elements"]
+    my_dict = {'mutable_key': 'mutable_val'}
+    my_tuple = ("immutable", "elements")
+
+Unordered arrays
+
+    my_set = {"mutable", "elements"}
+    my_frozenset = frozenset({"immutable", "elements"})
+
+In summary
+- I portrayed arrays in this way only to demonstrate what the syntax might look like, there isnt a prejudice on what element/type you fill it w/
+
+Remembrance
+- `list` look the most like a regular array to me, so its probably better to think of lists first, "arrlist"
+- `dictionary` {'a collection/interpolation of key values'} i call it a "KeyPair Set"
+- `tuple` ("immutable") you could think of a tuple as a regular function, in array form (3rd evolution),
+  more over, a tuple just means a finite sequence/ordered list of elements.
+
+Purpose
+- Use a `list` when you want an ordered collection of elements. Use a `set` if the order doesnt matter and
+each element is unique. Sets support operations like the `union()` method
+
+More behavior
+- And you can have multiple items within an element e.g. `[3.14, {item1, item2}]` ... is a set within a list and that set has two items
+
+Comments
+- Python will accept single or double quotes for strings, including the keys and values of key-value pairs; the two are interchangeable
+Hash `#` denotes a comment, and you can use `"""` triple quotes around internal comments, which is the convention for `docstrings`. see `f'strings`
+
+Other Types
+- `bytearray` is a mutable sequence of bytes (like a list restricted to values 0-255). `bytes` represents an immutable sequence of bytes.
`bool`, `int`, `complex` and `str` are also all immutable types
+
+If you `import numpy as np` (numpy is a separate install) you have access to vectors
+
+    arr = np.array([1, 2, 3])
+
+Note, that it is distinct from a tuple because its mutable.
+Moving on, lets look at how a linear function can be implemented
+
+    def linear_function(x):
+        m = 1 # Slope
+        b = 0 # Y-intercept
+        return m * x + b
+    x1 = 0
+    x2 = 10
+
+Find points between x1 and x2...
+
+    x_values = range(x1, x2 + 1)
+    points = [(x, linear_function(x)) for x in x_values]
+
+Theres a couple different things about python functions, and that is if you only specify one/two arguments,
+the remaining parameters fall back to their defaults, for example `range` takes up to three parameters (start, stop, step), but you can just give it the stop value (others optional)
+`x` and `y` coordinates are being specified within the tuple `(x, linear_function(x))`, and we're iterating from `x1` up to and including `x2`.
+`x` starts at 0 and increases by 1 every iteration, and since m = 1 and the Y-intercept is 0, `y` equals `x` at every point. This represents a simple line segment for y=mx+b,
+and conceivably, you can change those values.
+
+    m = ∆y/∆x
+
+So, for a line where (m = 1), this means that for every 1 unit increase in `x` (horizontal movement), `y` also increases by 1 unit (vertical movement).
+This creates a situation where the line rises at an angle where the vertical and horizontal movements are equal in length, forming a 45-degree angle with the x-axis...
+
+    tan(θ) = opposite/adjacent
+    tan(θ) = ∆y/∆x = 1     (so θ = 45 degrees)
+
+nevermind the math, i just wanted to throw that in
+
+Example2: Consider a function `doclip`, which has this loop; `for i in range(3, len):` ...
and we're iterating over the arrays
+`L` and `R` starting from index 3, up until (len - 1) (A technical fact about `range`, is that it goes up to (but not including) its stop value, so the last index here is len - 1)
+
+But lets also look at the linear interpolation part
+
+    out[i] = L[i] + (f * ((R[i] - L[i]) >> HH_P))
+
+It calculates a linear interpolation between the corresponding elements of `L` and `R`. The result is stored in the `out` array
+at the same index `i`. This operation is part of the clipping process for coordinates other than `x`, `y`, and `z`
+
+After the loop, there is a separate operation
+
+    out[2] = L[2] + (fhp * (R[2] - L[2]) >> 15)
+
+This specifically handles the z-coordinate with extra precision. It performs a similar linear interpolation between the
+z-coordinates of `L` and `R` and stores the result in the `out` array at index 2
+
+There was something way back i was interested in, in math, which had to do with odd/even. That could be interesting if
+you think about it since it spans the entirety of every 2 numbers, you can conceivably do any kind of operation at any distance.
+You can use the modulo operator to do operations on odds, perhaps we'll use it in the context of python: % = mod operator
+
+    sequence = [1,2,3,4,5,6,7,8]
+    oddElements = [x for x in sequence if x % 2 != 0]
+    print(oddElements)
+
+This was another example of how python lets you put a condition inside a loop expression (a comprehension), as well as during assignment.
+And if you dont understand the modulo operator, here's a better demonstration [ex. 14 mod 5]
+
+    5 goes into 14, (2x)
+    2 * 5 = 10
+    14 - 10 = 4
+
+Here's another example [ex. 3 mod 6]
+
+    6 goes into 3, (0x)
+    0 * 6 = 0
+    3 - 0 = 3
+
+Theres documentation online and on the command line called `pydoc`, for example you can look up individual things
+e.g. `pydoc enumerate`, enumerate is a function that takes an iterable (plus an optional `start`), and hands back pairs: one part for the `index`, and one for its `value`.
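If you want to poke at `range`, the modulo operator and `enumerate` yourself, here's a small sketch. The list `colors` is just made-up sample data.

```python
colors = ["red", "green", "blue"]

# enumerate takes an iterable plus an optional start, and yields (index, value) pairs
pairs = list(enumerate(colors))
pairs_from_one = list(enumerate(colors, start=1))
print(pairs)
print(pairs_from_one)

# range stops before its second argument, so the last index produced is len - 1
indices = list(range(1, len(colors)))
print(indices)

# divmod gives "how many times it goes in" and the remainder in one call
print(divmod(14, 5))   # same two steps as the 14 mod 5 walkthrough
print(3 % 6)           # 6 goes into 3 zero times, so the remainder is 3
```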
+By that same token, `len(a)` is logically similar to accessing the size of an array w/ `sizeof(array)` +in C, as it is used to get the length of objects and arrays + +We've probably looked at For loops a billion*times, nevertheless lets look at one again to fully understand +both a For loop, and how its more efficient w/ `yield`; First, using a non-yielding example: + + def fibonacci(n): + sequence = [] + a, b = 0, 1 + for _ in range(n): + sequence.append(a) + a, b = b, a + b + return sequence + result = fibonacci(8) + +(`_`) Underscore informs the loop that we dont want to use a loop variable in the loop body. `range`, means +we are iterating `n` amount of times (`n` elements). `append` appends the 'current' value `a` to the sequence. +`a, b = b, a + b` ("sequence unpacking", technical term) its really just doing regular assignment... +it only looks strange cause python lets you do arithmetic during assignment - in short, `a` takes the value of +the previous `b`, and `b` takes the sum of the previous values of `a` and `b` (it sounds way more complicated +explained like that, but its just regular assignment that goes through iteration) +Then we assign/create the list all at once. + +Its preferred to use a generator w/ `yield` in the case of iterating over a large amount of data, +or data you dont need "all at once" + + def fibonacci(n): + a, b = 0, 1 + for _ in range(n): + yield a + a, b = b, a + b + result = list(fibonacci(8)) + +We get the current fibonacci number and pause anytime we encounter `yield`, until the next iteration: `a, b = b, a + b` +When we call & convert `fibonacci` to a `list` array, it consumes and collects all the values generated +by the `yield` statement. The program has encountered the value we want for `range`, +so we can return to `yield` and properly iterate through. 
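To see the "pause at `yield`" behavior on its own, here's a sketch of a generator with no upper bound at all. `fibonacci_forever` is a made-up name, and `islice` from the standard `itertools` module is used to take only as many values as we ask for.

```python
from itertools import islice

def fibonacci_forever():
    # no upper bound: each value is produced only when somebody asks for it
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

gen = fibonacci_forever()
first = next(gen)                   # runs up to the first yield, then pauses
next_seven = list(islice(gen, 7))   # resumes where it paused and takes seven more

print(first)
print(next_seven)
```

Building the same thing as a list would never finish, which is the whole argument for not needing the data "all at once".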
+ + my_list = ['element1', 'element2', 'element3'] + for index, item in enumerate(my_list): + print(index, item) + +I just want to recap; functions in python encapsulate arrays, and we explained how `enumerate` works, but +lets look at another example that involves iterating over a class. +youll notice, python lets you inherit `.objects` from EVERYTHING, literally + + class NameClass: + def __init__(self, data): + self.data = data + self.index = 0 + + def __iter__(self): + return self + + def __next__(self): + if self.index >= len(self.data): + raise StopIteration + value = self.data[self.index] + self.index += 1 + return value + + function = NameClass(['element1', 'element2', 'element3']) + for item in function: + print(item) + +- A function that takes two parameters, `self` (which refers to the instance of the class) and `data` + (which is the list we want to iterate over) +- Assign `self.data`, so that it can be accessed throughout the class. We initialize `index` to keep + track of the current position in the list `data` +- `__iter__` is used in a loop. It returns an iterator object, in this case, it returns `self`. +- Every time the next element is needed in the iteration, i.e. `__next__`. +- If the index is greater than or equal to the length of the data list. If it is, it means we've + reached the end of the list, so we raise a `StopIteration` exception to signal the end of iteration. +- `value = self.data[self.index]`, retrieve's the value at the current index from the `data` list. +- Then, increment the index so that the next time `__next__` is called, it will retrieve the next + element in the list. +- Finally, we return the value retrieved from the list. Then after we assign `function`, we iterate + over the list and print each item. This demonstrates both iteration within a class, assigning a function + to a class and the encapsulation of functions/classes. + +Lets look at one more example just to demonstrate the versatility of types and classes... 
+We briefly mentioned how `func(var:type)` lets you annotate a type of some variable,
+and we might have seen how we assign functions, but look at how we assign a dictionary:
+
+    class Counter:
+        def __init__(self, iterable=None):
+            self.data = {}
+            if iterable:
+                self.update(iterable)
+
+        def update(self, iterable):
+            for item in iterable:
+                self.data[item] = self.data.get(item, 0) + 1
+
+        def __getitem__(self, item):
+            return self.data.get(item, 0)
+
+- In short, we are initializing a variable to an empty dictionary, which allows you
+  to store elements inside said variable thats now associated w/ the dictionary.
+- Then in the update method, the `iterable` argument represents a collection of elements
+  that you want to count. The For loop iterates over each element (item) in the iterable.
+- For each item, the method updates the count in the dictionary `self.data`.
+  `self.data[item]` accesses the value (count) associated with the current item.
+  `self.data.get(item, 0)` returns the current count of item, or 0 if item is
+  not already in `self.data`, notice we are storing an element (`item`)
+  inside said variable which is now associated w/ said dictionary, and we
+  index it w/ an "arrlist" `[]`. `self.data[item] = self.data.get(item, 0) + 1`
+  increments the count of item by 1 and updates it in the dictionary, and
+  `.get` is just a regular method call where we specify each argument.
+
+Now using this class, we would do something like the following
+
+    def display_rate(counter: Counter[str]) -> None:
+        ...
+
+Note that the `Counter[str]` subscript only works on a class that supports it, such as `collections.Counter`
+from the standard library; our hand-written class would need a `__class_getitem__` method for that.
+This demonstrates the versatility of classes, as well as how `[]` can be attached to a type
+to say what it holds (here, a Counter of strings).
+
+Congratulations, you've now learned the green eggs and ham, as well as all 500 array and iteration methods in python.
+Just practice those examples over and over and youll have memorized how python works.
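Circling back to the `Counter` example for a second: here's a hedged sketch of actually counting things and passing the result to `display_rate`. It leans on the standard library's `collections.Counter` (not the hand-rolled class), since that one accepts the `Counter[str]` annotation on python 3.9 and up. The body of `display_rate` is only my guess at what such a function might do.

```python
from collections import Counter

def display_rate(counter: Counter[str]) -> None:
    # hypothetical body: print each item with its share of the total
    total = sum(counter.values())
    for item, count in counter.most_common():
        print(f"{item}: {count / total:.0%}")

votes = Counter(["ham", "eggs", "ham", "ham"])
display_rate(votes)
```

Like the hand-written version, looking up something that was never counted (e.g. `votes["spam"]`) gives 0 instead of an error.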
You might also consider trying +C-Extended Python or (Cython) which has a better runtime performance by adding an extra step of compilation. +It translates python into C, which can then be compiled into native machine code, and this could be +particularly beneficial while working w/ large projects remotely in my opinion. +The syntax fuses together C & Python... which might even make it easier to +understand in some cases as it forces the, once pythonic code, to utilize an +explicitly- "static" type of syntax (explicit type annotations are optional) + diff --git a/qfmtsp.html b/qfmtsp.html new file mode 100644 index 0000000..cb292f9 --- /dev/null +++ b/qfmtsp.html @@ -0,0 +1,242 @@ + + + + + +type + + + +
    --- control flow ---
+
+The `if` statement allows you to execute a block of code if a specified condition is
+true, and likewise `else` & `else if` provide an alternative
+branch or block of code to be executed when that condition is false.
+In that sense, switch statements are a way for multi-way branching sequences to be
+evaluated based on the value of some expression.
+
+`else if` allows you to specify another condition to test when the previous `if` or
+`else if` conditions were false. When you have a single statement (as opposed to one with explicit
+opening and closing braces), only the very first statement is associated with that `if`,
+but the second one is not, i.e.,
+
+    if (condition)
+        do_something();
+    do_something_else(); // not part of it
+
+Also, an `else` and/or `else if` following an `if` is associated with that block.
+
+When you have a function call and/or variables within the condition of an `if` statement,
+the function is executed at that point in the program flow, and the body of the `if` runs only if the
+resulting value evaluates to true according to C's truthiness rules; That is, any non-zero
+value is considered true, where the value zero is treated as false.
+
+When the value is zero, the associated `else` branch runs instead,
+if an `else` statement is present. Conditions can also depend on a given expression
+and whatever the operator is evaluated to, leading to that same sequence of rules
+with `else` and `else if` that follow.
+
+Then you of course have loops. Loops in C are constructs that allow you to execute a
+block of code repeatedly until the respective conditions are no longer true.
+this eliminates the need to write the same code multiple times.
+
+I'll try to focus on some of the intricacies of them, specifically during for loops.
+In a for loop, the first part of the loop (before the first semicolon) is specifically
+for initializing variables BEFORE the loop starts.
These initializations are executed +only once, right before the loop begins its first iteration. `for (i = j = ...`, would +be an example of having TWO variables initialized before the loop begins, where as in +this example `for (i = j; j = ..`, "j" is being set after the loop begins, albeit at +the start of each loop iteration. + +I am of course demonstrating this in the ANSI C89 style of creating for loops, where you +have presumably initialized them earlier in the function. Most of the time, with some +simple adjustments you can interpret one interchangeably with the other style; Its just +a matter of making sure that when you move the declaration of e.g. "int i" into the +condition of the for loop, that nothing else depends on "int i", otherwise you should +keep the "int i" declaration, and then change the for loop to the style you prefer. + +Before every iteration, the loop condition (e.g., `i < 5` in a for loop) is checked. +If the condition is true, the loop body is executed; if it's false, the loop terminates. +As the loop iterates, each statement in the loop body is executed sequentially. + +You might see the `continue` keyword used in the loop body. When the program encounters +`continue`, it skips the remaining lines of code in the current iteration (of the body) +and then immediately proceeds to the next iteration of the loop. The loop condition is +still checked before the next iteration begins. Here's how it can be used meaningfully +w/ an if statement inside of a for loop, + + for (int i = 0; i < 5; i++) { + if (i == 2) { + continue; // Skip the rest of the loop body when i is 2 + // .. where upon this will be skipped + } + // .. This is also skipped when i = 2 + printf("i = %d\n", i); + } + +If (some condition), then continue past the rest of the iteration. In our example, the output +will only leave out "i = 2" since that is what we skipped. + +The `break` keyword is used to exit a switch statement/loop prematurely. 
When it sees a break
+the loop immediately terminates, and control is passed to whatever comes after it. So if `break`
+is responsible for breaking out at the point that its encountered, `return` (which is optional in a void function)
+exits the function itself: it transfers control from the current function back to its caller, and optionally returns a
+value to the caller. For example `if (j == 2) { break; ... `, will exit the inner loop when
+j is 2; Or in a switch statement, if it matches, e.g. `case 1: ... break;`, it will exit the
+switch statement once it sees the `break`, bypassing remaining case labels and the default case.
+
+In the context of control statements (w/ return type void), without having specified the value
+associated w/ the return statement, return will still function to exit, if said condition is met.
+If the condition is false, the function will continue executing the rest of the code. A void
+function is explicitly defined not to return any value, so attempting to return a value (e.g.,
+return 0;) would violate the rules of C, resulting in a diagnostic (a warning or an error, depending on the compiler). see func.html to
+learn more about return statements.
+
+The start and end of a braced block also marks a scope boundary: variables declared
+inside it only live until the closing brace. You might then call the start or redirection of a
+proceeding sequence "control flow transfer" or "control flow manipulation" -which refers
+to the mechanism by which the program's execution flow is redirected based on the condition
+or loop. Its something you have to be conscious of in situations where specific sequences of
+memory matter (or more importantly) situations where the exact format-layout of the code matters.
+the same is the case where you have ordered vs non-ordered arrays. In C, these are ordered by default.
+
+A `do...while` loop is similar to a while loop, with one key difference:
+the `do...while` loop guarantees that the code inside the loop will be executed at least once,
+regardless of whether the condition is true initially.
This is because the condition is evaluated +after the code block has been executed. + +A goto statement provides a way to transfer control to a labeled statement within a function. +Once the program has encountered it, it jumps to the specified label. Execution continues sequentially +from that point until it reaches the end of the function or encounters a return statement. + +The key thing to remember is that execution will continue sequentially after the jump, unless +explicitly redirected or interrupted. This means execution can be redirected by control flow statements +in their typical way, including a `goto` jumping the program elsewhere in the function. Execution under +a label continues to occur even if another label falls within the sequence of that execution. + +The structured use of goto and labels helps avoid collisions or conflicts in control flow because +each labeled section serves a specific purpose within the function. There are no ambiguities about +which code path or branch to follow after jumping to a label. It creates an explicit divide, +so you'll often see functions that design themselves in such a way where they can recall +the same code multiple times, which may have otherwise been problematic without. + +Lets dig a little deeper. Control flow statements primarily work by altering the program +counter (the CPU register that keeps track of the next instruction as explained) +they don't directly navigate memory in the sense of manipulating memory addresses. +For example, when an `if` statement is encountered, the condition is evaluated. +If the condition is true, the program counter is updated to point to the first +instruction within the `if` block. + +These instructions are then fetched and executed from memory. 
If the condition is false
+and an `else` block exists, the program counter is updated to point to the first instruction
+within the `else` block; Otherwise, the program continues execution from the next
+instruction after the `if` statement.
+
+Understanding control flow is crucial not just for writing functional code but also for
+maintaining secure and error-free applications. `if` statements can help mitigate some of
+the more deviant errors that can arise during program execution. By properly managing
+conditions and branches, you can reduce the risk of errors like integer overflow (a value
+going past its max bound and wrapping around). The same habit of checking before acting
+is how you guard against buffer overflows and null pointer dereferences. Heres an example:
+
+    #include <limits.h>
+    #include <stdio.h>
+
+    #define MAX_VAL UINT_MAX
+
+    int main() {
+        unsigned int value = MAX_VAL;
+
+        if (value == MAX_VAL) {
+            printf("Overflow would occur!\n");
+        } else {
+            value++;
+            /* very dumb example just for illustration purposes...
+             * post increment means the expression yields the old value and the variable is bumped afterward,
+             * thats not important to know for this demonstration but now you know
+             */
+            printf("Incremented value: %u\n", value);
+        }
+        return 0;
+    }
+
+Thus the increment will only happen if the program doesnt encounter the value at UINT_MAX,
+which'll consequently prevent the overflow... its quite common to use an if statement to
+check a condition and perform an action at once. For more information on errors you should avoid see this
+
+
    ---format specifier---
+this example demonstrates the more common data types
+available, as well as which format specifier to use
+they tell printf/scanf what type of data should be printed
+or scanned in input and output operations... ...
+(we'll go over what all of that means as we progress)
+
+    #include <stdio.h>
+
+    int main() {
+        /* %% prints a literal percent sign, so each specifier shows up as text */
+        printf("query format specifiers\n");
+        printf("%%d | %%i (signed integer)\n");
+        printf("%%u (unsigned integer)\n");
+        printf("%%o (unsigned octal)\n");
+        printf("%%x (unsigned hexadecimal)\n");
+        printf("%%f (decimal floating point)\n");
+        printf("%%e (scientific notation)\n");
+        printf("%%a (hexadecimal floating point)\n");
+        printf("%%c (character)\n");
+        printf("%%s (string of characters)\n");
+        printf("%%p (pointer address)\n");
+
+        return 0;
+    }
+
+
+---operators---
+
+    && || !                              logical
+    + - * / %                            arithmetic
+    ++ --                                unary arith
+    == != > < >= <=                      relational
+    & | ^ ~ << >>                        bitwise
+    = += -= *= /= %= <<= >>= &= ^= |=    assignment
+
+When you use the unary increment (++) or decrement (--) operators on a variable
+they directly modify the variable's value within its lifetime.
+
+This means that the change to the variable's value persists regardless of whether the
+variable is being assigned to another variable or used as an r-value (the right-hand side of an assignment or in an expression)
+
+Here's an interesting demonstration of the unary pre-increment, consider `if (++example == newvalue)`.
+Here `++example` increments the value of `example` by 1 before performing the comparison,
+whereby the incremented value of `example` is then compared with `newvalue`.
+When `example` is a pointer, "by 1" means one element, so the address moves by the size of the "type" that it points to.
+
+During post-increment (`if (example++ == newvalue)`), the evaluation of the condition happens with the
+original value of the variable, before the increment takes place.
+After the comparison, `example` holds the incremented value...
+And because a unary increment/decrement modifies the variable itself within +the given statement, the change still remains in effect afterwards, albeit at a +different point in the evaluation. We'll be explaining this more as we go. + + +
    ---storage class specifier---
    +`static` is used to specify the storage duration (lifetime) and linkage (visibility) of a variable. +When used on a local variable within a function, it makes the variable retain its value +between function calls; such a variable has no linkage and stays visible only inside that function. +When used w/ a global variable or function, it gives said variable or function internal linkage, which means +it is visible only within the same translation unit (the same source file). +`extern` is used to declare a variable or function that is defined in another file or translation unit; +it specifies that the variable or function has external linkage (visible across multiple translation units) + +
    ---type qualifier---
    +`const` is used to specify that a variable's value cannot be modified after initialization. +Consider it also a form of documentation and a contract with the caller that a function shouldn't +attempt to modify the provided variable. + +The value of a const-qualified object however is not a constant expression in the full sense of the term, +and cannot be used for array dimensions, case labels, and the like. (C is unlike C++ in this regard.) +When you need a true compile-time constant, use a preprocessor #define (or enum)! + +i personally always get confused by these terms like, const-qualified, logically quantified, etc... + +look up the following terms if you want to resolve any confusion that may be caused by like-terms, e.g. +modifiers (modify other identifiers), quantifiers (in mathematics they specify quantities of a set) +identifiers (the names of the entities in the programming language), delimiters (specify boundaries) + +go to the next page + + diff --git a/sh.html b/sh.html new file mode 100644 index 0000000..e0cd735 --- /dev/null +++ b/sh.html @@ -0,0 +1,934 @@ + + + + + +sh + + + +what is the shell﹖ + +~ home (user) directory / root (filesystem) directory + +The shell is a command interpreter intended for both interactive (from command line) and shell-script use. + +Comments are made w/ the hash sign #, a special character. +When used as a sha-bang #! on the first line, it tells your system which interpreter to use to parse the rest of the file; + + #!/bin/sh + +Or you can override it by explicitly specifying the shell and writing the filename of your script to run it. + +"Change directory", cd moves you forward or backward to a relative or absolute +file path location, from which you could point to and run a file relatively, as we'll explain. + +A single period . refers to your current working directory. +When specifying the path to a file, `./my/path/` points to a location relative to it. 
+Whereas running the following specifies the path and file directly (an absolute path). + + /path/example_script.sh + +This applies to those files you want to execute as well. + +./ is a reference to the current directory, and ./* is a shell globbing pattern that matches files and +directories in the current directory. We'll talk more about globbing later. The glob has a catch in that it +doesn't match hidden files or directories (names starting with a dot), even when the command is given e.g. the `-a` flag. +Instead you should just use a plain dot. This example uses the `cp` (copy) command. + + cp -a . /dir + +Or you can use the `-r` flag; the -a (archive) flag is essentially a combination of -r (recursive) plus other +options that preserve symbolic links, file permissions, ownerships, and timestamps, giving a more faithful copy of the directory. + +The mv command by itself doesn't have an option to automatically include hidden files. The inclusion of hidden files +has to be handled through globbing or other workarounds. + +i find it's best to think of an option as an "extension" to a command, as opposed to some parameter of related +letters, as every command defines its own. Shell options are not assigned with = ; instead you use the `set` +command, which sets or unsets shell options and positional parameters. + +So when trying to understand a script better, for example, common flags are the -x (debug) and -v (verbose) options. +-v echoes each line as it is read, while the -x option echoes each command as it's executed (with + preceding each line) +This helps you see exactly what commands are being run and what their final arguments are after any variable substitutions or expansions. + +-n will read a script and parse commands, but it does not execute them. -e immediately exits the script if a command fails. +-t executes one command and then exits. -a (allexport) marks all variables that are created or modified for export. 
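
To see those tracing and exit options in action, here's a minimal sketch, assuming a POSIX `sh` is on your PATH. The variable names `trace_out` and `status` are made up for this illustration, and each option is switched on inside a child `sh` so it doesn't stay enabled in the shell you're typing in.

```shell
# -x prints each command to stderr (prefixed with "+ ") as it is executed.
# Inside $( ), "2>&1 >/dev/null" captures the trace and discards normal output.
trace_out=$(sh -c 'set -x; greeting=hello; echo "$greeting"' 2>&1 >/dev/null)

# -e makes the child shell exit at the first failing command,
# so the echo after `false` never runs and the exit status is non-zero.
status=0
sh -c 'set -e; false; echo "not reached"' || status=$?

echo "trace captured from -x:"
echo "$trace_out"
echo "exit status with -e: $status"
```

Running the options in a child process is just one way to keep them contained; `set +x` and `set +e` switch them back off if you'd rather enable them in place.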
+ +Passing flags or options to a command is quite simple in general. A single dash is commonly used to denote short-form +options/flags. A double dash (--) usually introduces a long-form option. However, on its own it is also used to signify the end of +command-line options and the beginning of positional arguments, telling the command-line parser that everything +following it should be treated as positional arguments rather than options for the command that precedes the (--) + +Spaces and dashes alike are then utilized to clearly demarcate boundaries between different commands and their arguments, +especially in the context of complex command pipelines; they separate those arguments, +ensuring that each part is properly understood by the command-line parser. + +Dollar sign $ is a meta-character/sigil and tells the shell the next word is a variable. +Of course within single quotation marks it's considered a regular character, as single-quoted characters do not +have a special meaning. Quotes are unique in that they behave like a toggle. + +A variable is a named string of characters in shell that stores some value. +That value could be an integer, filename, string or some shell command itself. + +A command is just a program run by the system. NOTE: I'll only be going over a +handful of commands. It's up to you to explore and find out more about which commands there are. +Also, if your interests are in understanding what a shell is and its inner workings, you could +look up: lexer, parser, core, executor (input/output, command execution, etc.), command +handling (characters, strings and command file), and shell subsystem/input line (history, +command management, input handling, etc.). + + variable=value + +Putting a space before or after = gives different results. This will be demonstrated as we go on. 
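
As a rough sketch of how spacing around = changes things (all variable names here, like `fruit` and `shared_note`, are invented for the example, and a POSIX `sh` is assumed for the child process):

```shell
# No spaces: a plain assignment in the current shell.
fruit=apple

# A space AFTER "=" means something else: fruit is set to an empty string,
# but only in the environment of the one command that follows.
child_fruit=$(fruit= sh -c 'echo "[$fruit]"')

# (A space BEFORE "=" would make the shell look for a command named "fruit".)

# Only exported variables are inherited by child processes.
plain_note="local only"
export shared_note="passed down"
child_view=$(sh -c 'echo "plain=[$plain_note] shared=[$shared_note]"')

echo "current shell still has fruit=$fruit"
echo "child saw fruit as $child_fruit"
echo "$child_view"
```

The child prints an empty `plain=[]` because `plain_note` was never exported, which is the one-way flow of information from parent to child.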
+ +Setting a variable to an empty string looks something like: + + variable="" + +Environment variables are a mechanism that passes information to all processes, created by a parent process. +By default there are typically some pre-assigned variables. Every program will inherit these variables. +The information flow is one-way, meaning shell script cannot change the current directory(parent). + +There are two different kinds of variables. Environment variables that are exported to all processes +spawned by the shell. Their settings can be seen with the env command. A subset of environment variables, +such as PATH, affects the behavior of the shell itself, ergo it specifies a list of directories where +the shell looks for executable files after typing a command. + + example_command + +This would be equivalently to run a command who's path is listed under $PATH... +You can also associate a custom environment variable with a path using `export` +to create said relationship. Running the commands from the path would then look like... + + $MY_ENV_VAR/example_command + +Local variables affect only the current shell instance. They are defined within a script and +they are not available outside of the script or function where they are defined... +To access the value stored in a variable, prefix its name with the dollar sign $ Now, +the following script will access the value of defined variable `NAME` and print it to stdout; + + NAME="Marco_Polo" + echo $NAME + +... or how about passing an argument to a script?... Typically we think of this format w/ a +command and file: a-w sets writable "off" for All->(user,group,other), +regardless of set bit occupation, henceforth this file would be considered a "write-protected regular file" + + chmod a-w ./example + +But there's also positional parameters that pass in commands +like this (will explain further on). + + ./example.sh 1 2 3 + +You can find out what an individual variable is set to e.g. 
+ + echo $WHATEVER + +$PS1 specifies the prompt printed before each command. +Usually this is $ + $PS2 defines the secondary prompt, the prompt you see after +multi-line commands such as for or if. + + echo $0 + +Displays that shell that had run the command. + + echo $HOME + +is the equivalent of echo /home/user (or whatever your home directory is) + +$PATH variable lists directories that contain commands. +If we have several commands in there, the directories are searched in the order specified. +An empty string corresponds to your current directory. + +$CDPATH sets a path that tells the cd command where to search. For example if you set +CDPATH=$HOME, you can cd to any subdirectory of $HOME from the current directory you are in. + + ls /your/directory +ls: cannot access '/your/directory': No such file or directory + + mkdir /your/directory + +Anytime i use the rm command i use the -ir flags as ive personally deleted things +accidently. Or, having a "write-protected regular file" permission set from the start is preferable. +The same thing using mv, as i recommend using cp over mv when applicable. With that said, +i find the following behavior insightful; + + rm -i file1 file2 + +The shell breaks this line up into four words. +The first word is the command/program to execute. +The next three are passed to the program as three +arguments. So the program rm looks at the first argument, +realizes it is an option, because of the hyphen, and treats the next two arguments as filenames: + + echo "The directory your in is $PWD/filename.jpg" + The directory your in is /currentdir/filename.jpg + +The following is more of a fact about linux and the filesystem, but i still think its one, +if not the most quintessential and important things to know. If you have a user and a root +account, you can make a symlink, e.g. 
+ + ln -s /dir/here /my/location/there + +If `/my/location/there` is an existing directory, this creates a symlink `/my/location/there/here` and allows you to reference +those file(s) through it as if they were in the directory being pointed to. Symbolic links are sometimes +called soft links. + +Note, i don't do this for those hidden files in (~) or (/home/user), and considering inter-activity +between users it's more common to have user-specific configurations that the root then inherits. +That's just my personal recommendation. + +... +┌── ln(1) link, ln -- make links +│ ┌── Create a symbolic link. +│ │ ┌── the optional path to the intended symlink +│ │ │ if omitted, symlink is in . named as destination +│ │ │ can use . or ~ or other relative paths +│ │ ┌─────┴────────┐ +ln -s /path/to/original /path/to/symlink + └───────┬───────┘ + └── the path to the original file + can use . or ~ or other relative paths + +The filesystem allocates a new inode specifically for a created symlink, that is separate from the inode +of the target file/directory and which doesn't contain the actual data of the target file; it just stores +the path (a string) that points to the target. An inode is something that stores information about a file +or directory. + +Hard links are a little different. They don't create a separate inode. They're essentially another name for +an existing file; that is, both the original file and the hard link share the same inode number, meaning +they point to the same data on the disk. + +It's obviously not the same as copying a file directly, as any changes made to the content of one file that +shares a hard link will be reflected by the other. However deleting a hard link does not mean that it +deletes the other hard link. The data remains so long as there's at least one other hard link. + +In Unix/Linux systems, file permissions determine who can read, write, or execute files and directories. +One example is when you need to run `chmod 0755 dir/file`, in order to execute a given file. 
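
The difference between the two kinds of link can be poked at with a small experiment. This is only a sketch: it assumes `mktemp -d` is available (it is on most Linux and BSD systems, though not strictly POSIX), and the file names are made up.

```shell
# Work inside a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)

echo "first line" > "$workdir/original.txt"
ln "$workdir/original.txt" "$workdir/hard.txt"     # hard link: another name, same inode
ln -s original.txt "$workdir/soft.txt"             # symlink: stores the path "original.txt"

# Appending through the hard link changes the data both names share.
echo "second line" >> "$workdir/hard.txt"
lines_via_original=$(( $(wc -l < "$workdir/original.txt") ))

# Remove the original name: the data survives through the hard link,
# but the symlink now points at a name that no longer exists.
rm "$workdir/original.txt"
lines_via_hard=$(( $(wc -l < "$workdir/hard.txt") ))
if [ -e "$workdir/soft.txt" ]; then soft_state="resolves"; else soft_state="dangling"; fi

echo "lines via original: $lines_via_original, via hard link afterwards: $lines_via_hard"
echo "symlink is now: $soft_state"

rm -r "$workdir"
```

`[ -e ... ]` follows the symlink, which is why it reports the link as dangling once the target name is gone; `[ -h ... ]` would still report that the link itself exists.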
+ +There's more we could say about reading, writing, permissions and so forth, however let's try to keep +things relatively focused in terms of how it applies to the command shell. + +For command substitution $(command) is considered the proper method, here's an example of that (assuming A already holds an integer); + + A=$(expr $A + 2) + echo "$A inefficient yet simple" + +Single quotes would treat everything as plain characters. + +We show an example w/ expr used again later on. Another one is `eval`, which is simple also. +It takes its arguments (e.g. the contents of a variable), evaluates them again, and runs the result as a command, as opposed to echo'ing it out. +You can remember it like this, eval is eval;then;run command. + +The entr command is a utility that runs arbitrary commands when a file(s) has changed. If you've ever +used a record command or `watch` in gdb then you've probably done something similar to this before, +except that entr additionally executes (autonomously) when encountering changes, continuing to +monitor and do whatever it's been instructed to. This behavior persists until +you stop it, e.g. with Ctrl+C or kill. + +As we had briefly mentioned, env and export without anything specified after them will display the current +environment variables that are inherited by any command executed within the same shell session. +(export displays those marked for export to child processes) + +set without an argument will also list environment variables, as well as shell-specific variables and functions. +It can turn shell options on with - and off with +. It can be used with positional parameters for example: + + set apple banana carrot + +This will set $1 to apple, $2 to banana, and $3 to carrot + (..will explain further on) + +You can set and export in one line. + + export APPLE="my apple" + +unset can be used to undefine any variable. + +The export command is necessary to update the environment variable. 
Run without arguments, it lists all the exported variables; + +For environment variables to persist they must be set in a file. By default these are hidden files in your home directory. +However you can also set environment variables system-wide under /etc, which is designated for system configurations, +in /etc/environment, which should already exist on many systems. + + ls -a /home/user + +should also show familiar configuration files. + +$IFS is the "Internal Field Separator" +IFS is a special variable which lists the characters used to terminate a word. +By default, whitespace is what separates words: this variable contains a space, a tab, and a newline. + +If you are unsure about overriding your main IFS, you can save it in a different variable beforehand like this; + + OLDIFS=$IFS + +env allows you to run another program in a custom environment without modifying the current one. + +You can imagine that a program generates a child process. And this process has the same environment as its parent. +The process ID number is different, and this is typically referred to as 'forking'. Forking provides a way for an +existing process to start a new one. However, there may be situations where a child process is not meant to run the +same program as the parent process. In this case exec is used. It will execute a program; however the +command that follows replaces the current shell, which means no subshell is created during this, and the +'current process' is replaced with this new command. + +Ampersand is a funny symbol that functions differently depending on context, just as most symbols do. + + ls /path/to/directory & sleep 10 + +In this example we list the contents of the specified directory and (&) puts the ls command in the background, +allowing the shell to immediately start executing the next command. `sleep 10` pauses for 10 seconds. Since ls is +running in the background, `sleep 10` starts executing right away. + +You can also monitor and control jobs in the shell. 
Jobs are processes or groups of processes created for +commands or pipelines. At a minimum, the shell keeps track of the status of the background (i.e. asynchronous) +jobs that currently exist; this information can be displayed using the jobs commands. + +If job control is fully enabled (using set -m or set -o monitor), as it is for interactive shells, +the processes of a job are placed in their own process group. Foreground jobs can be stopped by typing +the suspend character from the terminal (normally ^Z), jobs can be restarted in either the foreground or +background using the fg and bg commands, and the state of the terminal is saved or restored when a +foreground job is stopped or restarted, respectively. Continuing on to regexp... + +Lets give a brief summary of regular expressions, and how things like python, grep and vim-search have two different +modes of character interpretation: literal or interpreted patterns (e.g., ANSI C) and those are characteristic of such +things as regular expressions. Regular expressions can be either ERE, BRE or PCRE (see more about compatibility and expressions) + +See also about the aforementioned regexp (POSIX) versus PCRE1/PCRE2(original & newish versions), +and how the shell itself uses POSIX/PCRE, versus commands that derive from e.g. coreutils; Shell's builtins and coreutils' +commands can both potentially use PCRE as long as they were compiled with it. It should also be mentioned that many of my +examples use commands from other packages, expecially builtin commands. + +Apostrophes (' ') can often be used to preserve the literal interpretation of characters. A quote begins a sequence, and +will continue a command until it encounters a closing quotation. You can also use a backslash to continue a command. + +Using $'...' (called ANSI C quoting) you explicitly enable interpretation of ANSI-C escape sequences +within the quoted string. 
In POSIX shells, double quotes ("...") do not turn sequences such as \n or \t into a newline or tab by themselves; +inside them a backslash only escapes $, `, ", \ and a newline. It is commands such as printf (and echo in some shells) that interpret them, which we'll explain. + +Some shells provide options to explicitly enable/disable interpretation of escape sequences. + +The (\) escape character acts as a form of delimiter, but in such a way that the following character (newline or +whitespace) loses its special meaning. At the same time, it can be used for control sequences, or even command continuation (in the +same way that starting a command with a single-quoted string, and going to the next line without typing +the closing quote, will tell the shell to look for that closing quote on the following lines until it's found) + + echo "Hello, \$USER! Today is \`date\`." + +This is an example of using a backslash in order to escape special characters, so $USER and `date` are printed literally. +Using printf with backslashes lets you interpret escape sequences (see more on literal versus interpreted patterns) + + name=$'hello\nworld' + printf "%s\n" "$name" + +\n , \e , etc. are examples of escape sequences. It means the backslash is interpreted as +escaping or doing something (such as producing a control character) given whatever the following character is... +Introducing the bracket, \e[ means that a terminal control sequence has begun. + +\n within $'...' or a printf format string indicates a newline character (we'll talk about them some more...). We'll also be +discussing more about the left bracket ` [...`, as it's essentially the same as writing `if test...` in conditional +expressions. + +Single brackets require the use of escaped parentheses \( and \) ... in order to group conditions, which can make the code +harder to read and more error-prone. Within the double bracket syntax for conditionals, i.e. [[...]] you don't need to worry +much about quoting variables. For instance, [[ $var = value ]] won't break if `$var` is empty or contains spaces. 
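
Since `[[` is not part of POSIX sh, here's a small sketch of how the same worries are usually handled with the single bracket, so it runs in any `sh`. The variable names and the temporary file are made up for illustration, and `mktemp` is assumed to be installed.

```shell
# Quoted, a value with spaces (or an empty one) stays a single word inside [ ... ].
var="two words"
if [ "$var" = "two words" ]; then quoted_result="match"; else quoted_result="no match"; fi

empty=""
if [ -z "$empty" ]; then empty_result="empty"; else empty_result="not empty"; fi

# Instead of \( ... \) with -a / -o inside one test, chain separate tests
# with && and ||, which reads closer to the [[ ... && ... ]] form.
file=$(mktemp)
if [ -f "$file" ] && [ -r "$file" ]; then readable="yes"; else readable="no"; fi
rm -f "$file"

echo "$quoted_result, $empty_result, readable: $readable"
```

Leaving the quotes off `$var` in the first test would hand `[` two separate words and usually produce a "too many arguments" style error, which is the breakage `[[` protects you from.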
+ +Double bracket also supports additional operators, such as (=~) for regex matching, and has more intuitive syntax for logical +operators. This support extends to complex expressions like [[ -f $file && -r $file ]] (more on the flags later) + +Moreover, mixing operators like -r (flag that checks if a file exists and is readable by the current user) or -o (for OR operator) +can lead to ambiguous expressions if not handled correctly; So its often preferable to use `[[` in these situations... +Brackets are of course specific to evaluating specific conditions, like whether a file exists, whether a variable equals +a specific value, or whether a string matches a pattern. however you can freely create statements without brackets too. + +When you don't use brackets in an if statement, the shell evaluates the exit status of a command directly. If the command exits +with a status of (0) ,which indicates success, the if block is executed. If it exits with a non-zero status (indicating failure), +the else block (if present) or the proceeding statement, is then executed; More on if statements later. + +The backtick symbol (`) is a legacy form of command substitution, and it functions similarly to $(...). When you surround a command +w/backticks, the shell executes that command and then replaces the command with its output, e.g. echo `uname -s` + +This will execute uname -s and replace the command with its output. if you use backticks around a command substitution, such as +surrounding it like $(command), then the shell will treat it as a nested command substitution, where the inner command substitution +is executed first, and then the result of that is treated as a new command, which is then executed. Also if you try to nest backticks +within backticks, you must escape the inner ones with a backslash. 
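
A short sketch of the two substitution forms side by side, plus an `if` that tests a command's exit status without brackets. The variable names (`nested`, `legacy`, `found`) are invented for the example.

```shell
# $(...) nests without any extra escaping.
nested=$(echo "outer $(echo "inner")")

# The legacy backtick form needs the inner backticks escaped with a backslash.
legacy=`echo "outer \`echo inner\`"`

# No brackets: the if statement looks only at grep's exit status
# (0 means a match was found, non-zero means it was not).
if echo "$nested" | grep -q "inner"; then found="yes"; else found="no"; fi

echo "nested: $nested"
echo "legacy: $legacy"
echo "found:  $found"
```

Both forms end up with the same string here; the `$(...)` spelling is simply easier to read once substitutions are nested.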
+ +A colon (:) serves as a delimiter that separates multiple directory paths, such as in the case of the $PATH variable, +which has a colon-separated list of directories that the shell searches through when looking for executable files in +response to a command. + +When you type a command in the shell, the system checks each directory listed in $PATH in order, until it finds an +executable file that matches the command name. If the command is found, it is executed; if not, the shell continues +to the next directory in the list. If none of the directories contain the executable, the shell will return an error +indicating that the command was not found. + +The semicolon (;) serves as a command separator. It allows you to write multiple commands on a line, as the shell will +encounter a semicolon and interpret it as the end of the current command-preparing to execute the next command that follows. + +You might also see (%) symbol used inside a control sequences as a format specifier, which denotes some operation, +that might include variables and arithmetic operations, making sequences dynamic. It makes those strings it +appears in parameterized (parameters or variables can be changed). + + %p1, %p2, etc. Refer to the first, second, etc., parameters passed to the capability string. + %d: Print the parameter as a decimal number. + %c: Print the parameter as a character. + %{5}: Push a constant number 5 onto the stack. + %+, %*, %m, etc. Performs arithmetic operations using the top elements of the stack. + %=: Compare the top two stack elements for equality. + %>, %<: Compare the top two stack elements for greater-than or less-than. + %!, %~: Perform logical negation or bitwise NOT. + %?...%t...%;: Conditional operations (if-then-else structure). + %P{variable}: Pop the top value from the stack and store it in a variable. + %g{variable}: Push the value of a variable onto the stack. + %{number}: Push a constant number onto the stack. 
+ %i: Increment the parameters (typically used for converting 0-based indices to 1-based indices.... + that simply means 0-based starts at 0, 1-based starts at 1, and %i would be used to convert one to the other) + +In the context of pattern matching you have: character classes, anchors, escape sequences and assertions (which we'll go over) + +Assertions are zero-width conditions, meaning that they do not consume characters in the input, but rather +assert specific conditions around a match; The most common being (?=...) which checks if the pattern inside the +lookahead assertion can match at the current position in the string (as opposed to looking before the position) + +ls, find and grep are good examples of commands to get started w/ + + ls is used to list files and directories. + +help command to view the help page for a command help help for help -options +and man command to view a man page, man man traverse page; e, f, z, d, PgDn +y, b, w, u, PgUp info command to view a command in stand-alone info pages. + +In the following examples I'm going to show how wildcards are used in different places. Wildcards and pattern substitution +(patsubst specific to makefiles) can be used w/ a string and the symbol itself is replaced by a space-separated list of names +of existing files that match one of the given file name patterns (try saying that five times faster). + +If no existing file name matches a pattern, then the pattern is omitted from the output of the 'wildcard' function. +Note that is different from how unmatched wildcards behave in rules where they are used verbatim rather than ignored. +More simply, using an asterisk matches any number of characters. + +So the shell expands these wildcards such as *, ?, and [] before passing arguments to commands. 
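
To watch the shell do that expansion itself, here's a minimal sketch. It assumes `mktemp -d` exists and a shell with default globbing settings (no `nullglob` or similar); the file names are invented. `set --` is used only as a convenient way to count what a pattern expanded to.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.c" "$workdir/b.c" "$workdir/ab.c" "$workdir/notes.txt" "$workdir/.hidden.c"

# * matches any number of characters, but not a leading dot (hidden files).
set -- "$workdir"/*.c
star_count=$#

# ? matches exactly one character, so ab.c is left out.
set -- "$workdir"/?.c
question_count=$#

# A pattern that matches nothing is handed to the command unchanged.
set -- "$workdir"/*.zip
unmatched=$1

echo "*.c expanded to $star_count names, ?.c to $question_count"
echo "unmatched pattern stayed as: $unmatched"

rm -r "$workdir"
```

The command never sees the `*` in the first two cases; by the time it runs, the shell has already replaced the pattern with the matching names.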
+ +One use of the wildcard function is to get a list of all the C source files in a directory: + + $(wildcard *.c) + +We can change the list of C source files into a list of object files by replacing the `.c` suffix with `.o` in the result: + + $(patsubst %.c,%.o,$(wildcard *.c)) + +You can of course emulate this in shell, however we're just going over the basic idea. +In Unix systems you'll be running commands a lot of the time, so one way i like to remember which order the arguments follow after e.g. +find is by remembering these keywords (mnemonic): FIND PATH TYPE NAME + +Keep in mind -iname "example" will not match joined names such as "anexample", however you can solve this by using +* in the pattern. + + find / -type f -iname "*thisword*" + +Basically we wrote: look from the / directory for entries of type file with the case-insensitive name "*thisword*", +where the * enables globbing before and after the substring. In regular expressions a . dot is the pattern +which matches any single character; combined with the asterisk operator as .* it will match any number of any characters. + +find does a recursive search of any path it is given, printing each entry for which the expression is successfully matched. +There are other case-insensitive options such as -ilname -iregex -iwholename.. One more example with find... + + find . -path "./dir?/file*.txt" + +This command will find files with names like "file1.txt", "file2.txt", etc., but only within directories named "dir1", "dir2", etc., +in the current directory. So an asterisk in a globbing pattern will match zero or more characters, while the question mark matches exactly one character. + +The -path option is used to match the entire path of the file or directory against a specified pattern, and doesn't restrict the match to either files or directories. + +Pattern matching for words within files is accomplished with grep; + + grep -i "this" script.sh + +Case-insensitive search for 'this' inside script.sh + + grep -nr 'yourstring*' . 
+ +..Recursively search through current directory for string w/ -n (line numbers) + +In BRE, exact repetition like three consecutive `a` characters can be matched by directly specifying the characters, +such as `aaa`, or with escaped braces as in `a\{3\}`. In ERE the `{}` quantifiers are written unescaped, so `a{3}` matches exactly three consecutive `a` characters. + + grep -E '^[0-9]-[0-9]{3}-[0-9]{3}-[0-9]{4}$' file + +This command uses `-E` to enable Extended Regular Expressions (ERE). In ERE: +. (period) Matches any single character except newline. +^ (caret) Asserts the pattern must match at the beginning of a line. +$ (dollar sign) Matches the end of a line. +[] (brackets) Match any single character within the brackets. Example: [abc] matches "a", "b", or "c". + Or w/ a caret i.e. [^a-z] matches any character that is not a lowercase letter. +() (parentheses) Group expressions and capture matching text. Example: (abc)+ matches "abc", "abcabc", etc. +{} (curly braces) Specify exact repetition counts of characters, character classes, or groups. ++ (plus) Quantifier that indicates "one or more occurrences" of the preceding element, such as a + character, character class, or group. +| (pipe) Represents alternation, allowing matching of either of two patterns. +* (asterisk) Matches 0 or more of the preceding element. Example: a* matches "", "a", "aa", "aaa", etc. +.+ (period, plus) Matches any line with at least one character. Example: echo -e "Hello\nworld\n\nfoo\nbar" | grep -E ".+" + will produce Hello world foo bar on separate lines, not matching the empty line in between world and foo +? (question mark) Quantifier that matches zero or one occurrence of the preceding + element. +\b (backslash+b) Matches the boundary between a word and a non-word character (a GNU extension). + Example: \bword\b matches "word" if searching for "a word of warning". 
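
A few of the entries above can be tried out on made-up sample text. This is just a sketch; the sample lines and variable names are invented, and only options that both GNU and BSD grep accept (`-E`, `-c`) are used.

```shell
# Four sample lines; the second one is empty on purpose.
sample=$(printf '%s\n' '1-555-123-4567' '' '555-1234' 'call 1-555-123-4567 now')

# ^ and $ anchor the pattern to the whole line; {3} and {4} are exact counts.
phone_matches=$(printf '%s\n' "$sample" | grep -E '^[0-9]-[0-9]{3}-[0-9]{3}-[0-9]{4}$')

# -c counts matching lines; .+ needs at least one character,
# so the empty line is not counted.
nonempty_count=$(printf '%s\n' "$sample" | grep -c -E '.+')

# The BRE spelling of the same kind of repetition escapes the braces.
bre_matches=$(printf '%s\n' 'aa' 'aaa' 'aaaa' | grep '^a\{3\}$')

echo "phone-shaped lines: $phone_matches"
echo "non-empty lines: $nonempty_count"
echo "exactly three a's: $bre_matches"
```

The line "call 1-555-123-4567 now" is skipped by the first pattern because of the anchors; dropping `^` and `$` would let it match too.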
+ +Most Unix text facilities are line-oriented, so searching for patterns that span several lines is not easy. +The end-of-line character is not included in the block of text that is searched. It is a separator, and regular +expressions examine the text between the separators. If you want to search for a pattern that is at one end or the other, you use anchors. + +Caret ^ is the starting anchor. The regular expression ^A will match all lines that start with an uppercase A. +The expression A$ will match all lines that end with uppercase A. If the anchor characters are not used at the proper end +of the pattern, they no longer act as anchors; that is, the ^ is an anchor only if it is the first character in a regular expression. + +Dollar sign $ is an anchor only if it is the last character. If you need to match a literal ^ at the beginning of the line or +a literal $ at the end of a line, you must escape the special character by typing a backslash \ before it. + +For example, if you want to ensure that a pattern matches exactly, you can anchor it to the start and end of the string. Since logs often have multiple +fields, using exact boundaries with `^` and `$` might not be practical, so word boundaries, contextual matching or special sequences may be necessary. + +Try exploring what every symbol is for, and what its significance is within the context of the shell/regexp and pattern matching respectively. + +Every now and then you'll end up in a less than desirable situation, where you either have a crashed/frozen terminal session, or you may have +accidentally pressed a sequence of keys such as Alt+op+Backspace which can cause the cursor to start writing/erasing into the prompt; +although you should be able to press Ctrl+C to interrupt, Ctrl+D to send end-of-file, or Ctrl+Z to suspend the currently running process. +Failing that, you can use Shift+Alt+C , Ctrl+Alt+Fn2 or Ctrl+Alt+Del to reach another tty session, and run kill or killall from there. 
+ +Looking back at `$`, we know that it serves multiple purposes, specifically in the case of variables and how they're interpreted. + +$1,$2..$9 are known as Positional Parameters, special variables that store the arguments passed to a script or function; +with emphasis on parameter, as they take on the value of the corresponding parameter. The $ sign is part of the syntax. +The number that follows indicates the position on the command line. $0 represents the actual name of the script. +$1 indicates the first parameter. $2 indicates the second parameter and so on. +Here's another example in the context of a script; + + echo "param = $1" + echo "param = $2" + +with the values passed in as arguments; + + ./testfile 4 5 + +For positional parameters beyond $9, you need to use braces, such as ${10} for the tenth parameter. +$* Asterisk is similar to the filename meta-character, in that it matches all arguments. When quoted as "$*", all positional parameters ($1, $2, $3) are +concatenated into a single string separated by spaces. + +$@ is similar to $*, except that when quoted as "$@" it retains the spaces found in each argument: it expands each positional parameter as a separate quoted string. + +$# is equal to the number of arguments passed to the script. + +$$ variable corresponds to the process ID of the current shell running the script. Every running process has a different identification number. +This is useful when picking a unique temporary filename, such as a (hypothetical) /tmp/example.$$ + +$! indicates the process ID of the most recent process executed with an ampersand, an asynchronous or background process. +You can do something else and then wait for the background process. + +$- expands to the option flags currently set in the shell. + +$? is equal to the exit status returned from the previous program. The shell keeps track of the exit status of the last command executed +in a special variable (referred to as $?) 
+This variable is updated automatically by the shell every time a command or script finishes executing.
+So when you execute a command or script, the shell runs it and waits for it to finish, and once it completes, the shell captures the exit status
+(a numeric code returned by the command) and stores it in the $? variable.
+
+When you want to do input or output to a file, you have a choice of two basic mechanisms for representing the connection between your
+program and the file: file descriptors and streams. File descriptors are represented as objects of type int, while streams are represented
+as FILE * objects. Both file descriptors and streams can represent a connection to a device (such as a terminal), or a pipe or socket for
+communicating with another process, as well as a normal file. Each Unix process has three standard POSIX file descriptors, corresponding
+to the three standard streams: standard input (stdin, 0), standard output (stdout, 1), and standard error (stderr, 2). They can be used for a
+file or other I/O resources such as a pipe.
+
+The (|&) operator is commonly referred to as the pipe-and-error operator. It's a shorthand way to pipe both the
+stdout and stderr of the command on the left into the command on the right (the same as writing 2>&1 | in bash or zsh).
+
+UNIX comes with two programs called true and false, which do nothing except exit with status 0 and a non-zero status respectively. The number a command
+exits with is known as its exit status, an integer from 0 to 255.
+The shell can either examine the integer value of an exit status, or treat the value as a boolean. Zero is true (successful), all other values
+are false. If a script does not provide an exit status, it returns with the status of the last command executed.
+
+Operators take standard input or standard output, and also return an exit status.
+
+    cat << testhello
+    > Hi!vehello
+    > Hollow World
+    testhello
+
+testhello on the last line acts as the delimiter of this here-document (the > shown at the start of each line is the shell's continuation prompt).
+
+The `>` symbol is used for stdout redirection, whereas `<` is for stdin redirection...
+ + ls -lap > /testfile + +This will redirect and REPLACE the output from ls , however you can append to the end of a file without replace; + + ls -lap >> /testfile + +I think it helps to see the whole process here to better understand it... + + sort < input.txt + +Before executing the command, the shell opens the file input.txt for reading using a system call like open(): +which is something like int fd = open("input.txt", O_RDONLY); + +The shell then needs to make the stdin file descriptor (0) point to the same open file as input.txt. +This is done using the dup2() system call, wherein it also executes `close(fd);` + +The shell forks a new process using fork(), and the child process inherits the modified file descriptors. + +After setting up the redirection, the shell executes the `sort` command. The execve() system call or a similar call +is used to replace the current process image with the new command: + + execve("/usr/bin/sort", ["sort", NULL], envp); + +`sort` then processes the input from input.txt and produces sorted output, which is finally sent to stdout. +The same thing is true of (>) stdout redirection too, for `open()` (but w/ O_WRONLY), file descriptor (1), +forking, command execution, exec system call and processing of the final output. You can look more into file +descriptors to learn about how they work in converse situations. + +Of course using the `strace` command, you might see something different (this is just the basic explanation) +The initial execution might look like: + + execve("/usr/bin/sort", ["sort"], ...) = 0 + +And this shows that `sort` was executed, and its at this point where the `sort` binary is loaded and executed. +Several system calls related to loading libraries (openat(), mmap(), read(), etc.) are seen, which is normal +and involves setting up the environment for the sort command to run. 
+
+    fstat(0, {st_mode=S_IFREG|0644, st_size=49, ...}) = 0
+
+Here fstat() reports that file descriptor 0 is a regular file (S_IFREG) of 49 bytes, and `read(0, "...")` indicates that `sort` is reading
+from file descriptor 0, which is stdin. This is where the contents of `input.txt` are being read.
+
+`read(0, "plum\ngooseberry\n...", 4096)` shows the data read from input.txt.
+`write(1, "...")` shows the sorted output being written to file descriptor 1 (stdout).
+The close(0), close(1), and close(2) calls at the end indicate that
+the file descriptors for stdin, stdout, and stderr are closed when sort completes.
+
+note: `open()` as well as `dup2()` and `fork()` are managed by the shell as part of preparing the environment
+for sort before the command runs, and thus the strace output only shows what happens within the context of
+the command and not the file redirection setup being done by the shell.
+
+Evaluation and pipelines, in this example:
+
+    cmd1 ; cmd2 ; cmd3 ; cmd4
+    cmd1 & cmd2 & cmd3 & cmd4
+    cmd1 && cmd2 && cmd3 && cmd4
+    cmd1 || cmd2 || cmd3 || cmd4
+
+Semicolon tells the shell to operate sequentially. First "cmd1" is executed, then "cmd2", etc. Each command starts only after the previous one
+has finished, whatever its exit status was. The & operator launches each process in the background. The order is not sequential,
+and you should not assume that one command finishes before the other. The last two examples, like the first, execute sequentially, as
+long as the status is correct. In the && example, "cmd4" is executed only if all three earlier commands succeed.
+In the || example, "cmd4" is executed only if the first three fail.
+
+    cat wordoc1.txt | cat wordoc2.txt
+
+    cat wordoc1.txt || cat wordoc2.txt
+
+The main difference between the two: the single | is a pipe, so both commands are started and the output of the first is connected to the
+stdin of the second (here the second cat ignores its stdin, because it was given a file name). The double || is a logical OR on the exit status,
+so the second command runs only if the first one fails, for instance because wordoc1.txt does not exist.
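To see the four separators from the example above in action, here is a small sketch that uses `true` and `false` as stand-ins for a command that succeeds or fails. The variable names are only for illustration.

```shell
# ';' runs the next command regardless of the previous exit status
seq_out=$(false ; echo "ran")

# '&&' runs the right-hand side only if the left-hand side succeeded
and_out="skipped"
false && and_out="ran"

# '||' runs the right-hand side only if the left-hand side failed
or_out="skipped"
false || or_out="ran"

echo "after ';'  : $seq_out"
echo "after '&&' : $and_out"
echo "after '||' : $or_out"

# '&' starts a background job; $! holds its process ID,
# and 'wait' blocks until it finishes and returns its exit status
sleep 1 &
bg_pid=$!
wait "$bg_pid"
bg_status=$?
echo "background job $bg_pid exited with status $bg_status"
```

Using `wait` is what makes it safe to rely on a background command having finished; without it, nothing guarantees the order.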
+
+The point of the first example is simply what we described about operators: the order of evaluation,
+or the manner in which commands are processed.
+
+The /proc directory exposes information about currently running processes and the kernel. Let's print a list of registered interrupts on the system.
+
+    cat /proc/interrupts
+
+An interrupt is a signal emitted by a device attached to a computer or from a program within the computer. It requires the OS to stop and
+figure out what to do next. An interrupt temporarily stops or terminates a service or a current process. Most I/O devices have a
+bus control line called an interrupt request (IRQ) line for this purpose; the code the OS runs in response is the Interrupt Service Routine (ISR).
+
+A shell that is used interactively (driven by the user and not running from a file) reads commands from stdin. Without an argument this is the
+shell's behavior; -s forces the shell to read stdin for commands. Normally, the shell checks standard input to see if it's
+a terminal or a file. If it is a terminal, then it ignores the TERMINATE signal (SIGTERM), so that `kill 0` does not kill an interactive shell.
+
+Also, INTERRUPT is ignored. However, if the shell is reading from a file, these signals are not ignored. The -i option forces the
+shell to be interactive, so it treats these signals the way it would on a terminal. -p stops the shell from resetting the effective user and group
+to the real user and group.
+
+Earlier we mentioned how you can use if statements with or without brackets:
+
+    if grep "some pattern of words" file.txt; then
+        echo "pat found"
+    else
+        echo "pat not found"
+    fi
+
+`grep "pattern" file.txt` is a command that searches for "pattern" in file.txt. The if condition checks the exit status of grep.
+If grep finds the pattern, it returns 0, and the "pat found" message is printed. If it doesn't find the pattern, it returns
+a non-zero status, and the "pat not found" message is printed.
+
+Relational operators compare two values. With expr they print a "1" for true or a "0" for false, while test and [ ] report the result through the exit status instead.
+Options can also act as relational operators in comparisons (these are what test and [ ] use for integers):
+
+    ==   Equal to                    -eq
+    !=   Not equal to                -ne
+    >    Greater than                -gt
+    <    Less than                   -lt
+    >=   Greater than or equal to    -ge
+    <=   Less than or equal to       -le
+
+Basic example of an if statement w/ single brackets:
+
+    if [ "$(id -u)" -eq "0" ]; then
+        echo "This script is running as root"
+    elif [ "$(id -u)" -eq "1000" ]; then
+        echo "This script is running as a regular user with UID 1000"
+    else
+        echo "This script is running as a different user" 1>&2
+        exit 1
+    fi
+
+id -u prints the user id of the user who is running the script. If the user id is neither 0 (root) nor 1000,
+the script runs the else branch. The 1>&2 is used to redirect the standard output to the same place as standard error, making the message
+appear as an error message. It can be useful when you want to ensure that certain messages are treated as errors by scripts...
+Likewise 2>&1 is used to redirect stderr to the same place as stdout (file descriptor 1),
+i.e. (stdin=0, stdout=1, stderr=2). exit 1 will exit the script with a status code of 1, a failure.
+If we do not use exit 1, the script exits with the status of the last command it ran, here the echo, which is normally 0 (success).
+You can remember the descriptors by going in 0,1,2 order as "In, Out, Err"....
+
+Similarly, -f checks if a file exists and is a regular file, -d checks if a directory exists, and -s checks if a file exists and is not empty.
+Furthermore, -r, -w and -x are for checking whether a file is readable, writable or executable.
+You can find a comprehensive list of test operators in the manual.
+
+The /dev/ directory contains device files or nodes, and they are created dynamically by udev (a device manager which also removes device
+files, e.g. during a hardware disconnection). It replaces the need for a static MAKEDEV script. /dev/null is a special file that discards all data
+written to it, and is commonly used to suppress output.
+
+Let me demonstrate common examples.
+The first example redirects only stdout to /dev/null, so the normal output is discarded and stderr remains:
+
+    command > /dev/null
+
+For redirecting stderr to /dev/null, which discards it, and stdout remains:
+
+    command 2> /dev/null
+
+For redirecting both stdout and stderr to /dev/null, which discards all output from the command:
+
+    command > /dev/null 2>&1
+
+It is often the case you can use the logical operators we discussed
+    && & || ;
+in place of the if statements shown earlier, as they share similar behavior.
+
+When you use redirection (>, >>, etc.), the output goes to one destination (a file or another file descriptor), but it cannot simultaneously split
+to multiple locations (such as a file and standard output).
+
+The `tee` command writes to multiple locations: it reads from stdin and sends what it reads to both a file and stdout. Redirection alone cannot
+achieve this because it's a one-to-one mapping (each file descriptor points at a single destination), as we demonstrated earlier w/ the
+operators that redirect stdin or stdout. It should also be noted that it's uncommon to use input
+redirection directly within file descriptor manipulation, as that's typically the role of redirection operators.
+
+Our next interest has to do with accessing arrays. You can assign values to specific indices, e.g. array_name[0]="value1"
+To access a specific element of the array, you use the index in square brackets: `echo ${array_name[1]}`
+
+In bash and ksh, arrays are 0-based. Here's how you work with arrays:
+
+    arr=("apple" "plum" "gooseberry")
+    echo "First element: ${arr[0]}"
+    echo "Second element: ${arr[1]}"
+    echo "Third element: ${arr[2]}"
+
+Later we will demonstrate a situation where you have to convert a 1-based index into a 0-based.
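A few more array operations, as a rough sketch. This one assumes bash, since plain POSIX sh has no arrays, and the extra element "quince" and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumes bash: indexed arrays are not part of POSIX sh
fruits=("apple" "plum" "gooseberry")
fruits[3]="quince"                 # assign to a specific index

count=${#fruits[@]}                # number of elements in the array
pos=2                              # a 1-based position, as a person would count
second=${fruits[$((pos-1))]}       # converted to a 0-based index
last=${fruits[$((count-1))]}       # the last element

# "${!fruits[@]}" expands to the list of indices
for i in "${!fruits[@]}"; do
    echo "index $i holds ${fruits[$i]}"
done
echo "count=$count second=$second last=$last"
```

Quoting "${fruits[@]}" and "${!fruits[@]}" keeps each element as one word even if it contains spaces.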
+ +Associative arrays allow you to use strings as indices instead of just numbers, which is useful for storing +key-value pairs in situations where you need to map keys to specific values (this is a dummy example) + + typeset -A fruit_colors + + fruit_colors[apple]="red" + fruit_colors[plum]="purple" + fruit_colors[gooseberry]="green" + + echo "The color of an apple is ${fruit_colors[apple]}" + echo "The color of a plum is ${fruit_colors[plum]}" + echo "The color of a gooseberry is ${fruit_colors[gooseberry]}" + + # Iterate over all keys + for fruit in "${!fruit_colors[@]}"; do + echo "The color of $fruit is ${fruit_colors[$fruit]}" + done + +Typeset allows you to give variables specific attributes, like making them readonly, integer, etc. +The -A option specifically tells typeset that the variable is an associative array. `[@]` is used to reference +all elements of the array... + +Basic structure of a case statement + + FRUIT="plum" + case "$FRUIT" in + "apple") echo "Tasty." + ;; + "plum") echo "Yummy plummy." + ;; + esac + +FRUIT was equal to plum so we got back Yummy plummy. You can also use `*)` as a "default case" (wildcard case), +which acts as a catch-all matching any value not explicitly handled by other patterns in the case statement. +This type of delimeter we use (;;) or double semicolon is specific to case statements, in order to terminate +(signal the end of) each pattern block. + +`let` is an important keyword as well, as it allows you to perform arithmetic operations +directly on variables, e.g. + + let result=a+b + + + + expr, is used to perform arithmetic as well, e.g. + expr 1 + 1 expr 2 \* 2 expr 3 / 3 + +Note: The print $((...)) syntax is another way to perform arithmetic in the shell, and it is more +straightforward and flexible than using expr or let; The difference between expr and let +being that expr requires variables be referenced w/ a dollar sign, i.e. 
+    result=$(expr $a + $b)
+For example, here's expr in a script:
+
+    str="Hi!veHollow"
+    n1=15
+    n2=7
+
+    len=$(expr length "$str")
+    echo "length of the string \"$str\" is: $len"
+
+    differ=$(expr $n1 - $n2)
+    echo "difference between $n1 and $n2 is: $differ"
+
+    # Extract a substring using expr (from position 2, 4 characters long)
+    substr=$(expr substr "$str" 2 4)
+    echo "substring of \"$str\" starting at position 2 with length 4 is: \"$substr\""
+
+The `length` keyword is specific to `expr` (it is a GNU extension, as is `substr`) and its function in manipulating strings:
+it gives the number of characters in a string.
+
+You can use parameter expansion w/ the `#` feature to get the length of a string without needing expr
+
+    str="Hi!veHollow"
+    len=${#str}
+    echo "The length of the string \"$str\" is: $len"
+
+`${#str}` is a built-in expansion that directly gives the length of the string stored in "$str"
+
+When you use a double hash symbol (##) in parameter expansion, it performs the longest match removal
+of a pattern from the beginning of a string. Here's a quick example to illustrate this:
+
+    filename="archive.tar.gz"
+    extension=${filename##*.}
+    echo "The extension is: $extension"
+
+The ## is used to remove the longest matching pattern from the beginning of the string. The pattern
+`*.` will match everything up to and including the last period (.) in the string "archive.tar.gz",
+the output being `gz`...
+
+The `dirname` command is specifically designed to remove the filename from a full file path,
+leaving just the directory path. For a path like the one below, it gives the same result as parameter expansion w/ the
+`%` symbol, which removes the shortest match of a pattern from the end of the string (i.e. `${filepath%/*}`).
+
+    filepath="/home/user/Documents/archive.tar.gz"
+    dirpath=$(dirname "$filepath")
+    echo "The directory path is: $dirpath"
+
+There are other features available in parameter expansion too; please see your shell's manpage.
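Putting #, ##, % and %% side by side, here is a minimal sketch using the same example path. The variable names are just for illustration; all of these expansions are standard POSIX shell.

```shell
filepath="/home/user/Documents/archive.tar.gz"

dir=${filepath%/*}      # shortest match of '/*' removed from the end
file=${filepath##*/}    # longest match of '*/' removed from the start
ext=${file##*.}         # longest match of '*.' removed from the start
short=${file%.*}        # shortest match of '.*' removed from the end
stem=${file%%.*}        # longest match of '.*' removed from the end
len=${#file}            # length of the string

echo "dir=$dir file=$file ext=$ext short=$short stem=$stem len=$len"

# For this path, dirname agrees with the % expansion
[ "$dir" = "$(dirname "$filepath")" ] && echo "dirname agrees"
```

A single # or % removes the shortest match and the doubled form removes the longest, which is the whole difference between `short` and `stem` here.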
+Returning to substrings, we can also go as far as writing our own substr function like this...
+
+    substr() {
+        local str="$1"
+        local pos="$2"
+        local len="$3"
+        echo "${str:$((pos-1)):len}"
+    }
+
+    # Call the custom function
+    substr "Hi!veHollow" 2 4
+
+The local keyword in shell scripting is used to declare variables with a scope limited to the
+function in which they are defined. This means that variables declared with local are only accessible
+within that function and are not visible or modifiable outside of it.
+
+Substring extraction `${str:$((pos-1)):len}` goes by the following recipe:
+
+    ${variable:start:length}
+
+`length` is the number of characters to include in the substring.
+`variable` is the variable containing the string (str in this case).
+`start` is the starting position of the substring; but shell parameter expansion (meaning
+`${variable:start:length}`) is 0-based, therefore when specifying "start", you need to convert a
+1-based index to a 0-based index. Thus our expression $((pos-1)) converts a 1-based index pos
+into a 0-based index suitable for shell parameter expansion, and is responsible for
+calculating the starting position for the substring of our original example.
+
+If you don't convert a 1-based index to a 0-based index when using shell parameter expansion,
+the substring starts one character later than you intended.
+
+Functions may be expressed in this way. The shell is particular about spacing in a few places
+(for example, `[` and `]` must be separated from their arguments by spaces), though indentation is up to you
+
+    apple(){
+        A=$(expr $A + 1)
+    }
+    A=1
+    while [ "$A" -le 10 ]
+    do
+        echo $A
+        echo 'apple!'
+        apple
+    done
+    echo 'we got ALOT of apples'
+
+note that the parentheses after the function name are purely syntactical and do not
+serve any functional purpose other than indicating that *what follows is a function*.
+also notice that all variables are stored as strings, and the shell will
+perform type conversion as needed, based on the context they're used in:
+
+    str_var="42"
+    echo "As a string: $str_var"
+    result=$((str_var + 8))
+    echo "As an integer, after arith: $result"
+
+For Loop example... here's a description/ingredients of a working for loop:
+
+    for NAME [in WORDS ... ] ; do COMMANDS; done
+
+Execute commands for each member in a list. The 'for' loop executes a sequence of commands for each member in a
+list of items. If 'in WORDS ...;' is not present, then 'in "$@"' is assumed. For each element in WORDS,
+NAME is set to that element, and the COMMANDS are executed.
+
+Regarding parentheses and braces: ( ) and { } are analogous in some ways (both group commands, and both show up
+in expansions), however they differ in a variety of ways. Most simply, ( ) runs the grouped commands in a subshell,
+and with a leading $ gives command substitution $( ) or arithmetic $(( )), whereas { } groups commands in the
+current shell and delimits a variable name as in ${variable}. Square brackets [ ] are different again:
+you'll see them used with conditions and test expressions, and for array indices.
+
+A subshell is a child process launched by the current shell so that you can run a series of commands in
+a separate process. In shell scripting you create a subshell by enclosing commands in parentheses ().
+A subshell inherits the variables of the parent shell at the time it is created.
+
+    (subshell command1; subshell command2)
+
+When you use a variable in the shell, it's accessible within the current shell process and can be inherited
+by a subshell. However, changes to variables within the subshell do not affect the parent shell's environment.
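A quick sketch of that difference between ( ) and { }, using a throwaway variable named counter (hypothetical, just for illustration):

```shell
counter=1

( counter=2 )                            # subshell: the assignment is lost on exit
after_subshell=$counter

{ counter=3; }                           # brace group: runs in the current shell
after_braces=$counter

captured=$(counter=4; echo "$counter")   # command substitution is a subshell too
after_capture=$counter

echo "after ( ): $after_subshell"
echo "after { }: $after_braces"
echo "inside \$( ): $captured, outside: $after_capture"
```

Note that the brace group needs a space after { and a ; (or newline) before }, while the parentheses need neither.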
+ + current_date=$(date) + echo "Current date: $current_date" + +In this example, the date command runs in a subshell, and its output is captured and assigned to the current_date variable. +Exporting a variable ensures that the variable is available in the subshell as well as the parent shell... + + export parent_var="I am in the parent shell" + + # Start a subshell + ( + echo "Subshell: $parent_var" + # Modify the variable in the subshell + parent_var="I am modified in the subshell" + echo "Subshell modified: $parent_var" + ) + + # Back in the parent shell + echo "Parent shell: $parent_var" + +The modification still doesn't affect the parent shell, that is, changes made to a variable inside the shell are local to the subshell. +The parent shell remains unaffected by any modifications that occur within the subshell. More generally put, the subshell operates with +its own copy of the environment variables. If you want to learn more about the shell, please read your shell's manpage for further info +on any remaining commands, arguments, syntax, rules and other behaviors I may have missed. + + diff --git a/std.html b/std.html new file mode 100644 index 0000000..359e967 --- /dev/null +++ b/std.html @@ -0,0 +1,15 @@ + + + + + +standards + + + + +- The official website of the ISO/IEC JTC1/SC22/WG14 which is the working group responsible for the standardization of the C programming language. Here, you can find documents, drafts, and updates about the C standard. + +- This site, often referred to as the C Programming Language Home provides resources and links related to the C language. It's a good starting point for finding tutorials, compilers, and other information related to C. + + diff --git a/struct.html b/struct.html new file mode 100644 index 0000000..ccd4fdd --- /dev/null +++ b/struct.html @@ -0,0 +1,303 @@ + + + + + +struct + + + + +
    struct
+In C, structures are user-defined data types that group related variables.
+in technical terms they would be described as contiguous blocks of memory,
+wherein the fields (members) are accessed using offsets.
+
+    +---------------+
+    |   x   |   y   |
+
+x is at offset 0 (the start of the struct) and, assuming a 4-byte int, occupies bytes 0 to 3.
+y is at offset `sizeof(int)`, right after x, and is also an int.
+Padding may be added between or after members so that each one meets its alignment requirement.
+In this simple case both members are ints, so the structure aligns itself naturally to
+the largest member's alignment requirement and needs no padding. If y were a 1-byte char instead, the padding after y
+would typically be 3 bytes to bring the total size to the next multiple of 4.
+
+structs can store any data type and
+you can create them like this
+
+    struct example {
+        int x;
+        int y;
+    };
+
+using a struct within a function might look like
+
+    void function(){
+        struct example newname;
+        struct example *something = &newname;
+    }
+
+Here we declare a variable (newname) that holds its own x and y,
+and a pointer (something), which now equals the address of newname.
+
+When you see `Object.Member`, it refers to a Member that is
+being accessed directly on Object... you use the dot when there's
+direct access (used with a struct variable and not a pointer) and the (->)
+arrow operator when it's indirect (i.e. when it's being accessed through a pointer)
+
+`newname.x` is direct access, and `something->x` is
+indirect access, seeing as it goes through the pointer to reach the same struct
+
+Because the arrow operator is used to access members of a struct through a pointer
+to that struct, it combines a dereference of the pointer with the access of the struct
+member all in one step.
+
+    List *this = &that;
+    this->num = 6;
+    strcpy(this->name, "Activity");
+
+Whereby `this->num` is equivalent to `(*this).num`, and the same is true for `this->name` in
+the following line.
+In other words, it makes the syntax slightly less involved,
+and slightly more specific in regards to indirectly accessing a member through a pointer.
+
+There are more complex situations, such as those that arise when you want to access
+things from multiple structs, although you can always
+return to them once you've gotten more familiar w/ using structs and pointers...
+
+Lets try simply declaring a struct now:
+
+    struct example newA;
+
+Here's how you might declare and access members of structures,
+this time w/ an array using the dot (.) operator.
+
+    struct example BarrF[3];
+    BarrF[0].x = 101;
+
+`BarrF` is the name of the array. It's a variable that holds a collection of
+three `example` structures (from the struct we created in the beginning).
+
+so `BarrF` will hold objects of type `struct example`. hopefully thats not too
+confusing to understand. anything you dont understand you can revisit again.
+
+you can also write a structure without a tag (an anonymous struct), but...
+
+    struct {
+        int x;
+        int y;
+    } example;
+
+here `example` is a variable, not a type name, and since the struct has no tag there is
+no way to refer to its type again later. Therefore, consider the struct we made was named
+`struct example {...`, using typedef we can create an alias for the struct:
+
+    typedef struct example example;
+
+lets you declare w/
+
+    example ex;
+
+as opposed to reusing `struct example ...`, since struct example is a named
+structure, `typedef struct example example;` creates an alias "example",
+which you can use to declare variables.
+
+and since you've created a typedefined name, it can be used as
+the type of some member in a new struct. Note, this is also a common
+way typedefined structs are created:
+
+    typedef struct {
+        example whatever;
+    } demonstrate;
+
+and since youve already included `typedef`, you can omit the `struct` keyword
+in the declaration as we had mentioned.
+ + demonstrate newA; + +or you can access and manipulate its members, like `newA.whatever`, just as +you would with any other struct variable + +typedef also means that the type (in this case `demonstrate`) +is available for use as a "type of function".. + + demonstrate CreateFunction(){ + + } + +`typedef` in a struct definition not only creates a type alias +but also makes the struct name visible in a broader scope, and +designated intializers provide a way of explicitly +intitializing struct members... this is considered as, +"default order initialization" + + struct test { + int a; char b; + } data = {10, 'c'}; + +and this one is "explicit order initialization"... + + struct test data = {.b = 'c', .a = 10}; + +note, that these are not specific to structs. +now we should see what it looks like in context: + + struct Ext { + int x; + char *y; + }; + + int main() { + struct Ext t = {.x = 3, .y = "word"}; + printf("%d, %s\n", t.x, t.y); + return 0; + } + +you can do alot with these. continuing on, lets return to the +beginning of what we learned... + + struct whatever { + int x; + int y; + }; + +if you want you can do... + + struct whatever points[] = { + {10, 20}, + {30, 40}, + {50, 60} + }; + +and you can then declare/access the array elements like normal. +p.s. if you want you can even make it an anonymous structure array, +but lets look at several different examples. first we have... + + struct whatever { + int a; + int b; + }; + + void plop(struct whatever p) { + } + +youve already seen a compound literal before, where the struct literal created +this way is unnamed and temporary. now that example was just for demonsrtation, +lets take a look at another example.. humour me on this.. + + void plop(struct whatever p); + +compound literals can be used within function calls, both in the actual function +call and as part of the function prototype or declaration when explaining their usage. 
+Lets demonstrate by calling `plop`, initializing its members to "1" and "2"...
+
+    plop((struct whatever){1, 2});
+
+In `{...}` is where the member values go, in declaration order. the temporary `struct whatever` object
+has no name, and inside a function it lives until the end of the enclosing block.
+
+but lets see one more example to be sure (by convention, we'll use a
+typedef'd anonymous struct)
+
+    typedef struct {
+        int x;
+        int y;
+    } point;
+
+    int main() {
+        point p = {10, 20};
+    }
+
+now that we've seen three examples of this in a row, surely you got the idea.
+lets go back to basics; lets suppose i have a struct that i want to declare.
+
+    typedef struct STable *S;
+
+because it is a pointer, you should utilize pointer notation (i.e., -> instead of dot)
+when accessing its fields. however if it wasnt a pointer then you would refer to those
+instances of it directly with the (.) dot. Also, because `S` is already a pointer type
+(struct STable *), you don't need to add another pointer (*) in the declaration of newMember.
+
+    struct NewStruct {
+        S newMember;
+    };
+
+Continuing on, if you learned about function pointers...
+
+    typedef struct {
+        const char *group_name;
+        void (*action)();
+    } AutoGroup;
+
+then here is something cool you can do now:
+
+    AutoGroup auto_groups[] = {
+        {"group1", action1},
+        {"group2", action2},
+        {"group1", action3}
+    };
+
+in other words, a function's name by itself already converts to a pointer to that function, so you can
+omit any cast or & in front of it and have a clean-looking table of string/function pairs like this
+(or whatever other way you want)
+
+Theres some common conventions i find myself reusing, such as...
+
+    typedef struct {
+        Example *d;
+        // ...
+    } AppContext;
+
+    AppContext App = {
+        NULL, /* whatever */
+        // ...
+ }; + +Declaring and assigning struct members, i often declare `Example *d;` in a function, +And later assign that variable to an object of the same name, `d = App.whatever;` +or directly with `Example *d = App.whatever; + +Theres many of these common conventions that we could get into. +An enum is for when you want to create, whats called "enumerated constants" + + enum Alphabet { A, B, C }; + +they define a set of named integer constants, a collection of related values like +states or options. + +There's also unions which use less memory, however only one member of its +allocated memory is used as it assigns one common storage space for all its members + +In C, there's no true nested definitions allowed in a function, unquestionably. +but regarding nested structs within unions... + + struct s {double i;} f(void); + union { + struct { + int f1; + struct s f2; + } u1; + struct { + struct s f3; + int f4; + } u2; + } g; + + struct s f(void) { + return g.u1.f2; + } + /* ... */ + + g.u2.f3 = f(); + +The behavior is defined. see here http://www.lysator.liu.se/c/tc1.html + +Regarding anonymous structs (anon unions too), c does support them, with some considerations... +anonymous unions must be declared within a containing struct or union, and +members of the anonymous struct or union are accessed directly as if +they were members of the containing struct or union. + +More links on C Standards are here +next, see macros + diff --git a/style.js b/style.js new file mode 100644 index 0000000..ba18d5a --- /dev/null +++ b/style.js @@ -0,0 +1,61 @@ +/* style.type = 'text/css'; // MIME type */ +/* @syntax: asterisk (*) is for wildcard selectors */ +/* @remove: .no-wrap { white-space: nowrap; } */ +/* @remove: .no-wrap*{ white-space: inherit; } */ +/* @fix: user-select and text-decoration if broken */ +/* reset initial values=not working, in every day css fashion*/ +/* so im using javascript instead */ +// Creating a