03. The 8‐bit and 32‐bit instructions

Last time we looked at how mod/rm encoding works for 16-bit instructions, but x86 has more than just 16-bit instructions. The 8086 has 8-bit and 16-bit instructions. The 80386 adds 32-bit instructions, which can be used even in 16-bit mode through the data size and address size override prefixes. In this article I'm going to describe how these prefixes work, and we'll further extend our table format.

Instruction prefixes

An instruction's opcode can be preceded by a few bytes of instruction prefixes. These affect how the instruction is interpreted and sometimes how it is parsed. The complete list of instruction prefixes is presented below.

Prefix	Value
LOCK	`0xf0`
REPNZ	`0xf2`
REPZ	`0xf3`
CS override	`0x2e`
SS override	`0x36`
DS override	`0x3e`
ES override	`0x26`
FS override	`0x64`
GS override	`0x65`
Branch taken	`0x3e`
Branch not taken	`0x2e`
Data override	`0x66`
Addr override	`0x67`

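The REPZ and REPNZ prefixes only have an effect on instructions that support repetition (the string instructions). SSE and later extensions also use these prefixes as part of the opcode. Note that `0x2e` and `0x3e` appear twice in the table: with conditional jumps they act as branch hints rather than as segment overrides.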

However, we're only interested in the data size override prefix and the address size override prefix.
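A minimal Python sketch of a prefix loop follows. It assumes the bytes of a single instruction are in a `bytes` object. `scan_prefixes` and its return shape are hypothetical names for illustration, not the author's disassembler. It skips every prefix from the table above but records only the two we care about. It also does not enforce the CPU's maximum instruction length.

```python
# All prefix byte values from the table (0x2e/0x3e double as branch hints).
PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}

DATA_OVERRIDE = 0x66
ADDR_OVERRIDE = 0x67


def scan_prefixes(code: bytes, pos: int = 0):
    """Consume prefix bytes starting at pos.

    Returns (has_data_override, has_addr_override, opcode_pos).
    Prefixes other than 0x66 and 0x67 are skipped without interpretation.
    """
    has_data = False
    has_addr = False
    while pos < len(code) and code[pos] in PREFIXES:
        if code[pos] == DATA_OVERRIDE:
            has_data = True
        elif code[pos] == ADDR_OVERRIDE:
            has_addr = True
        pos += 1
    return has_data, has_addr, pos


# 66 67 f3 a5: data override, address override, REPZ, then opcode a5 at index 3
print(scan_prefixes(bytes([0x66, 0x67, 0xF3, 0xA5])))  # (True, True, 3)
```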

Data size

As you may remember from the previous article, the instruction encodings use 3 bits to represent a register number. These 3 bits can be found either within the opcode or within the mod/rm byte. But nothing in those bits encodes the size of the register. Here's how the size is decided:

An 80386 CPU runs in either 16-bit mode or 32-bit mode.
In 16-bit mode the default register size is 16 bits, and similarly, in 32-bit mode the default register size is 32 bits.
If an instruction in either mode has a data size override prefix, the register size for that instruction switches to the other one (i.e. to 32 bits in 16-bit mode and vice versa).

It's helpful to think about this "default register size" as the "data size". Inside my disassembler I use the `ds` variable to represent the current data size. This variable is initialized according to the CPU mode at the beginning of each decode cycle, and if the prefix is found, it switches to the other size.
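The rule above fits in a few lines. Here is a minimal Python sketch. It assumes the CPU mode is passed in as 16 or 32 and that a prefix scan has already reported whether `0x66` was seen. The function name `data_size` is hypothetical.

```python
def data_size(mode_bits: int, has_data_override: bool) -> int:
    """Effective data size (ds) for one instruction.

    mode_bits is the CPU mode (16 or 32); a 0x66 prefix flips it.
    Seeing the prefix more than once still flips only once.
    """
    if mode_bits not in (16, 32):
        raise ValueError("mode_bits must be 16 or 32")
    if has_data_override:
        return 32 if mode_bits == 16 else 16
    return mode_bits


print(data_size(16, False), data_size(16, True))  # 16 32
print(data_size(32, False), data_size(32, True))  # 32 16
```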

The data size affects more than just registers, though. It affects data operands in general, including the size of most immediate values and the size of memory references. If you've seen something like `dword [rax+rcx]`, the "dword" part is dictated by the data size. However, the data size doesn't affect the sizes of registers inside memory references. This leads to the following behavior:

Without data override	With data override
`mov ax, word [bx+si]`	`mov eax, dword [bx+si]`
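The following small Python sketch shows where the data size plugs into formatting. It assumes 16/32-bit data only (8-bit registers are left out) and that the memory expression has already been decoded into a string. `reg_name` and `format_load` are hypothetical names.

```python
# Register names indexed by the 3-bit register number.
REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# Size keyword printed in front of a memory reference.
SIZE_KEYWORD = {16: "word", 32: "dword"}


def reg_name(num: int, ds: int) -> str:
    """Name of register number num (0-7) at data size ds (16 or 32)."""
    return (REGS16 if ds == 16 else REGS32)[num]


def format_load(reg: int, mem: str, ds: int) -> str:
    """Format 'mov reg, size [mem]'.

    Only the register and the size keyword follow ds; the registers
    inside mem come from the mod/rm decoding and are passed through as-is.
    """
    return f"mov {reg_name(reg, ds)}, {SIZE_KEYWORD[ds]} [{mem}]"


print(format_load(0, "bx+si", 16))  # mov ax, word [bx+si]
print(format_load(0, "bx+si", 32))  # mov eax, dword [bx+si]
```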

Here's an example of how we're going to be using the data size override prefix to compress the table:

Encoding	Mnemonic	New encoding
`8b /r`	mov r16, r/m16	`mov 8b /gr`
`8b /r`	mov r32, r/m32	`mov 8b /gr` (same thing)

So we don't need to overspecify the encodings for instructions whose operand size is controlled by the data size. In fact, our previous encoding table already handled them.

Let's revisit our encoding table.

Encoding	Note
`mov 89 /gr`
`mov 8b /gr +d`
`mov b8+ imm16`	(*)
`mov a1 disp +rx=ax +d`	(**)
`mov a3 disp +rx=ax`	(**)
`mov c7 /0`

If we made a simple change to our disassembler so that registers are scaled according to the data size, almost all instructions we've defined here would work. The encoding marked with (*) would still need modification, because right now it hardcodes a 16-bit immediate, which is not what we want. That instruction uses an immediate that scales with the data size: a 16-bit immediate with 16-bit data and a 32-bit immediate with 32-bit data. So in this instruction we'll replace imm16 with imm.
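A minimal Python sketch of `mov b8+ imm` with a data-size-dependent immediate follows. It assumes prefixes have already been consumed and `ds` is known. `read_imm` and `decode_mov_b8` are hypothetical helpers, and the output syntax is only illustrative.

```python
import struct

REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def read_imm(code: bytes, pos: int, size: int):
    """Read an unsigned little-endian immediate of `size` bits.

    Returns (value, position just past the immediate).
    """
    fmt = {8: "<B", 16: "<H", 32: "<I"}[size]
    (value,) = struct.unpack_from(fmt, code, pos)
    return value, pos + size // 8


def decode_mov_b8(code: bytes, pos: int, ds: int):
    """Decode 'mov b8+ imm' whose opcode byte is at code[pos].

    The register number is the low 3 bits of the opcode; both the register
    name and the immediate width follow the data size ds.
    """
    opcode = code[pos]
    if not 0xB8 <= opcode <= 0xBF:
        raise ValueError("not a b8+ opcode")
    reg = (REGS16 if ds == 16 else REGS32)[opcode - 0xB8]
    imm, end = read_imm(code, pos + 1, ds)
    return f"mov {reg}, {imm:#x}", end


# 16-bit mode, no prefix: 16-bit immediate
print(decode_mov_b8(bytes([0xB8, 0x34, 0x12]), 0, 16))  # ('mov ax, 0x1234', 3)
# 16-bit mode with 0x66 at index 0, so ds is 32: 32-bit immediate
print(decode_mov_b8(bytes([0x66, 0xB9, 0x78, 0x56, 0x34, 0x12]), 1, 32))  # ('mov ecx, 0x12345678', 6)
```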

In the instructions marked with (**) we have defined the implicit register argument as ax, which could be misleading, since with 32-bit data that register is actually eax. Let's stick to register numbers instead, so +rx=0.

Encoding
`mov 89 /gr`
`mov 8b /gr +d`
`mov b8+ imm`
`mov a1 disp +rx=0 +d`
`mov a3 disp +rx=0`
`mov c7 /0`

32-bit mod/rm

Above I've described how the data size override prefix cannot be used to override the size of registers inside a memory reference. What allows us to use 32-bit registers to address memory is the address size override prefix. Before we discuss it, however, we need to look at how 32-bit mod/rm works.

Just like before, the 32-bit mod/rm byte consists of the mod, rx and rm fields. If mod=0b11, rm specifies a register and holds a register number. The major differences between 16-bit and 32-bit mod/rm are the size of the displacement and which addressing forms are representable. With 32-bit mod/rm the following addresses can be represented:

[disp32]
[base]
[base + disp8]
[base + disp32]
[index*scale + disp32]
[base + index*scale]
[base + index*scale + disp8]
[base + index*scale + disp32]

If mod != 0b11, the rm field normally holds the register number of the base register. As with the 16-bit [disp16] case, there is a special case: if mod=0b00 and rm is ebp (0b101), there is no base and the effective address is just [disp32].

If rm is esp (0b100) and mod != 0b11, the mod/rm byte is followed by another byte, called the SIB byte, which stores the scale and the index and base registers. The SIB byte has a format similar to the mod/rm byte:

2 bits	3 bits	3 bits
ss	si	sb
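Because both bytes share the same 2-3-3 bit layout, one helper can split either of them. Here is a minimal Python sketch; the name `split_233` is hypothetical.

```python
def split_233(b: int):
    """Split a byte into its top 2 bits and two 3-bit fields.

    For a mod/rm byte this gives (mod, rx, rm); for a SIB byte (ss, si, sb).
    """
    return b >> 6, (b >> 3) & 0b111, b & 0b111


mod, rx, rm = split_233(0x44)  # 0b01_000_100
print(mod, rx, rm)  # 1 0 4  (rm is esp, so a SIB byte follows)

ss, si, sb = split_233(0x88)  # 0b10_001_000
print(ss, si, sb, 1 << ss)  # 2 1 0 4  (index ecx scaled by 4, base eax)
```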

There are three fields in the SIB byte: in order, scale, index and base. Scale is the base-2 logarithm of the multiplier by which the index register is scaled in the address, so 0b00 to 0b11 mean a multiplier of 1, 2, 4 or 8. Index and base hold register numbers. If index is esp (0b100), there is no index. If base is ebp (0b101) and mod is 0b00, there is no base and a 32-bit displacement follows the SIB byte. For other values of mod, the base is ebp as usual, followed by an 8-bit or 32-bit displacement according to the value of mod.
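Putting these rules together, here is a minimal Python sketch of a 32-bit mod/rm decoder for memory operands. It assumes `code[pos]` is the mod/rm byte, with prefixes and opcode already consumed, and formats the address as a string. `decode_modrm32`, `read_disp` and the output format are hypothetical illustrations, not the author's implementation.

```python
import struct

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def read_disp(code: bytes, pos: int, size: int):
    """Read a signed little-endian displacement of 8 or 32 bits."""
    fmt = "<b" if size == 8 else "<i"
    (value,) = struct.unpack_from(fmt, code, pos)
    return value, pos + size // 8


def decode_modrm32(code: bytes, pos: int):
    """Decode a 32-bit mod/rm (plus SIB and displacement) at code[pos].

    Returns (rx, operand, new_pos). operand is a register number when
    mod == 0b11, otherwise a string such as "[ebx+ecx*4+0x10]".
    """
    b = code[pos]
    pos += 1
    mod, rx, rm = b >> 6, (b >> 3) & 7, b & 7
    if mod == 0b11:
        return rx, rm, pos  # register operand; its size follows the data size

    parts = []
    disp_size = {0b00: 0, 0b01: 8, 0b10: 32}[mod]

    if rm == 0b100:  # rm is esp: a SIB byte follows
        sib = code[pos]
        pos += 1
        ss, si, sb = sib >> 6, (sib >> 3) & 7, sib & 7
        if sb == 0b101 and mod == 0b00:
            disp_size = 32  # no base, disp32 instead
        else:
            parts.append(REGS32[sb])
        if si != 0b100:  # index esp means "no index"
            parts.append(f"{REGS32[si]}*{1 << ss}")
    elif rm == 0b101 and mod == 0b00:
        disp_size = 32  # plain [disp32], no base
    else:
        parts.append(REGS32[rm])

    text = "+".join(parts)
    if disp_size:
        disp, pos = read_disp(code, pos, disp_size)
        if not text:
            text = f"{disp & 0xFFFFFFFF:#x}"  # absolute address, shown unsigned
        elif disp < 0:
            text += f"-{-disp:#x}"
        else:
            text += f"+{disp:#x}"
    return rx, f"[{text}]", pos


print(decode_modrm32(bytes([0x44, 0x8B, 0x10]), 0))  # (0, '[ebx+ecx*4+0x10]', 3)
print(decode_modrm32(bytes([0x45, 0xFC]), 0))        # (0, '[ebp-0x4]', 2)
```

Returning a bare register number for mod=0b11 leaves naming to the caller, because that register's size follows the data size rather than the address size.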

Address size

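Just like the data size, the address size defaults to 16 or 32 bits according to the CPU mode and switches to the other size when the address size override prefix is present. The address size controls which mod/rm format is used: if the address size is 16, the 16-bit mod/rm is used, otherwise the 32-bit mod/rm is used. For the instructions we've looked at so far, that is nearly all it controls, so it doesn't really affect our table.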
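To illustrate the switch between formats, here is a deliberately limited Python sketch. It only covers mod=0b00 without displacement. `address_size` and `simple_base` are hypothetical helpers. The same mod/rm byte 0x00 (as in `8b 00`) names `[bx+si]` under the 16-bit format and `[eax]` under the 32-bit one.

```python
# rm meanings for the 16-bit format and register names for the 32-bit one.
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def address_size(mode_bits: int, has_addr_override: bool) -> int:
    """Default address size comes from the CPU mode; 0x67 flips it."""
    if has_addr_override:
        return 32 if mode_bits == 16 else 16
    return mode_bits


def simple_base(rm: int, asz: int) -> str:
    """Memory operand for mod=0b00, choosing the format by address size.

    The special cases (rm=110 in 16-bit mod/rm, rm=100/101 in 32-bit
    mod/rm) are rejected rather than decoded in this sketch.
    """
    if asz == 16:
        if rm == 0b110:
            raise ValueError("mod=00, rm=110 is [disp16] in 16-bit mod/rm")
        return f"[{RM16[rm]}]"
    if rm in (0b100, 0b101):
        raise ValueError("rm=100 (SIB) and rm=101 ([disp32]) are special in 32-bit mod/rm")
    return f"[{REGS32[rm]}]"


# mod/rm byte 0x00 (mod=00, rx=000, rm=000), decoded in 16-bit mode:
print(simple_base(0, address_size(16, False)))  # [bx+si]  (8b 00)
print(simple_base(0, address_size(16, True)))   # [eax]    (67 8b 00)
```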

Let's return to one of the instructions we've been looking at:

Encoding
`mov a1 disp +rx=0 +d`
`mov a3 disp +rx=0`

Both of these instructions can deal with 16-bit and 32-bit data. However, note that the data size does not control how many bytes disp is encoded with. It only controls the size of the value being pointed to. The size of the displacement is controlled by the address size: 16 bits with a 16-bit address size and 32 bits with a 32-bit address size. So the decoder can always tell how many bytes of displacement to read.
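Here is a minimal Python sketch of decoding these two encodings. It assumes only `0x66`/`0x67` prefixes can precede the opcode and that `mode_bits` is the CPU mode. `decode_moffs_mov` and its output syntax are hypothetical. The two prefixes act independently: `0x66` changes the register and the size keyword, while `0x67` changes how many displacement bytes are read.

```python
import struct

REG0 = {16: "ax", 32: "eax"}  # register number 0 at each data size
KEYWORD = {16: "word", 32: "dword"}


def flip(size: int) -> int:
    return 32 if size == 16 else 16


def decode_moffs_mov(code: bytes, mode_bits: int = 16) -> str:
    """Decode 'mov a1 disp +rx=0 +d' or 'mov a3 disp +rx=0'.

    Data size picks ax/eax and word/dword; address size picks disp width.
    """
    pos = 0
    ds = asz = mode_bits
    while code[pos] in (0x66, 0x67):
        if code[pos] == 0x66:
            ds = flip(mode_bits)
        else:
            asz = flip(mode_bits)
        pos += 1
    opcode = code[pos]
    fmt = "<H" if asz == 16 else "<I"
    (disp,) = struct.unpack_from(fmt, code, pos + 1)
    reg = REG0[ds]
    mem = f"{KEYWORD[ds]} [{disp:#x}]"
    if opcode == 0xA1:  # +d: register is the destination
        return f"mov {reg}, {mem}"
    if opcode == 0xA3:
        return f"mov {mem}, {reg}"
    raise ValueError("expected opcode a1 or a3")


print(decode_moffs_mov(bytes([0xA1, 0x34, 0x12])))                    # mov ax, word [0x1234]
print(decode_moffs_mov(bytes([0x66, 0xA1, 0x34, 0x12])))              # mov eax, dword [0x1234]
print(decode_moffs_mov(bytes([0x67, 0xA1, 0x78, 0x56, 0x34, 0x12])))  # mov ax, word [0x12345678]
```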
