Skip to content

Processor Identification

Marco Bortolin edited this page Aug 27, 2018 · 13 revisions

How to distinguish between older 80486 and lower processors by checking for known behaviours. Those behaviours must be correctly emulated in order to create a functional emulator.

Methods and descriptions taken from a document titled "Processor Identification" by Chris Dragan & Chili, except:

Diagnostic programs that use various detection methods:

  • AGSI SysInfo
  • PC CHECK v3.03
  • CheckIt v3/v4
  • Landmark System Speed Test v6.00

SUMMARY

EFLAGS Register

On old pre-286 CPUs, bits 12 through 15 of the FLAGS register are always set, so we can check for this type of processor, in opposition to newer ones, by attempting to clear those bits:

	pushf
	pop		ax
	and		ax, 0fffh	; clear bits 12-15
	push	ax
	popf
	pushf
	pop		ax
	and		ax, 0f000h
	cmp		ax, 0f000h	; check if bits 12-15 are set
	je	_is_an_older_cpu
	jne	_is_a_286_or_higher

Once we know that we are at least on a 286 processor, we can then check to see if we're on a 32-bit processor (386 or higher) or on an actual 286. For this purpose we know that bits 12-15 of the FLAGS register are always clear on a 286 processor in real mode:

	pushf
	pop		ax
	or		ax, 0f000h	; set bits 12-15
	push	ax
	popf
	pushf
	pop		ax
	and		ax, 0f000h	; check if bits 12-15 are clear
	jz	_is_a_286
	jnz	_is_a_386_or_higher

If instead, the processor is running in protected mode these bits are used for the IOPL (bits 12-13) and NT (bit 14) flags. Note that bits 12-14 hold the last value loaded into them on 32-bit processors in real mode. Also remember that there is no virtual-8086 mode on 16-bit processors.

In order to find out if the processor is in real or protected mode we must test if the Protection Enable flag (bit 0 of CR0) is set, if so then we're in protected mode:

	smsw	ax
	and		ax, 0001h	; check if bit 0 (PE) is clear
	jz	_real_mode
	jnz	_protected_mode

To find out if it is a 486 or a newer processor we'll try to set the AC flag (bit 18), since it is always clear on a 386 processor (also NexGen Nx586), unlike newer ones that allow it to be toggled:

	pushfd
	pop		eax
	mov		ebx,eax
	xor		eax,40000h	; toggle bit 18
	push	eax
	popfd
	pushfd
	pop		eax
	xor		eax,ebx 	; check if bit 18 changed
	jz	_is_a_386
	jnz	_is_a_486_or_higher

And finally to check if we're in an old 486 or in a new 486 and other newer processors (i.e. Pentium), we'll try to toggle the ID flag (bit 21) which indicates the presence of a processor that supports the CPUID instruction. This part is explained below in a section about CPUID.

PUSH SP Instruction

Before the 286, processors implemented the "PUSH SP" instruction in a different way, updating the stack pointer before the value of SP is pushed onto the stack, unlike newer processors which push the value of the SP register as it existed before the instruction was executed (both in real and virtual-8086 modes).

Older CPUs        286+
{                 {
    SP = SP - 2       TEMP = SP
    SS:SP = SP        SP = SP - 2
}                     SS:SP = TEMP
                  }

(credit for the PUSH SP algorithm representation goes to Robert Collins)

So all one has to do is see if the values of the SP register are different before and after the PUSH SP:

	push	sp
	pop		ax
	cmp		ax, sp		; check if SP values differ
	je	_is_a_286_or_higher
	jne	_is_an_older_cpu

Note - If you want the same result on all processors, use the following code instead of a PUSH SP instruction:

	push	bp
	mov		bp, sp
	xchg	bp, [bp]

Shift and Rotate Instructions

Starting with the 186/88, all processors mask shift/rotate counts by modulo 32, restricting the maximum count to 31 (in all operating modes, including the virtual-8086 mode). Earlier CPUs do not mask the shift/rotation count, using all 8-bits of CL. So, if we try to perform a 32-bit shift, on newer processors we'll end up with the same result (since the shift count is masked to 0), whereas on an older processor the result will be zero:

	mov		ax, 0ffffh
	mov		cl, 32
	shl		ax, cl		; check if result is zero
	jz	_is_an_older_cpu
	jnz	_is_a_18x_or_higher

MUL Instruction

NEC processors differ from Intel's with respect to the handling of the zero flag (ZF) during a MUL operation. While a NEC V20/V30 does not clear ZF after a non-zero multiplication result, but only according to it, an Intel 8086/88 will always clear it (note that this is only true for the specified processors):

	xor		al, al		; force ZF to set
	mov		al, 40h
	mul		al		; check if ZF is clear
	jz	_is_a_NEC_V20_V30
	jnz	_is_an_Intel_808x

AAD (ASCII Adjust before Division) Instruction

This instruction allows us to distinguish between at least NEC's V-series and Intel processors. AAD, usually in preparation for a division using DIV or IDIV, works like this:

AL = AH * 10 + AL
AH = 0

Converting the unpacked two-digit BCD number in AX into binary. Thus being "0d5h, 0ah" the normal opcode. The difference is that while Intel's chips allow one to replace the multiplicand with any number (and by so building your own AAD instruction for various number systems), NEC always encodes it as 10 by default. So by replacing the second byte with a different number, we can then check if the operand is actually used, and if not, assume it's a NEC.

    mov     ax, 0f0fh
    db      0d5h, 10h       ; opcode for AAD 16
    cmp     al, 0ffh        ; check if multiplicand was 10 or not
    jz      _is_Intel
    jnz     _is_NEC

This should be used as another way (in addition to the one presented in the first article on this subject) to distinguish the NEC V20/V30 series from the Intel 8086/88.

PUSHA Instruction

Here is another good way to differentiate NECs from Intel's 8086/88. Since V20 and V30 execute all the 80186 instructions and knowing that PUSHA executed on the 8086/88 as "JMP $+2", one can for example, after executing it, set the carry flag and then see if it was really set.

    clc                     ; ensure that CF is clear
    pusha                   ; executed on 8086/88 as JMP $+2
    stc
    jc      _is_NEC_or_186plus
    jnc     _is_808x

    whatever code here
    .
    .
    .

_is_NEC_or_186plus:
    popa                    ; clean up

Of course the carry flag must not already be set before performing this test.

POP CS Trick

I'll just show one last way of accomplishing the same. The trick is that, on a 8086/88 (non-CMOS versions, at least), the opcode "0fh" will perform a POP CS, on a 186/88 is an invalid opcode, generating an INT6 exception, while NECs and 286+ use that encoding as a prefix byte, to indicate new instructions. So, to tell NEC's V20/V30 (also V40/V50, I think) and 8086/88 apart, and knowing that with the byte string "0fh, 14h, 0c3h", the CPU will perform the following:

  8086/88                 V20/V30
  -------                 -------
pop     cs              set1   bl, cl
adc     al, 0C3h

It is then easy to write a piece of code that will distinguish between them:

	xor     al, al          ; BTW: clears CF
	push    cs
	db      0fh, 14h, 0c3h  ; intruction(s) -- see above
	cmp     al, 0c3h        ; check if ADC was executed
	je      _is_808x
	jne     _is_NEC_V20plus

	whatever code here
	.
	.
	.

_is_NEC_V20plus:
	pop     ax              ; clean up (no POP CS available)

Note that, again, the carry flag must be cleared before execution of this test. Also, just a reminder that this is to be used when you know that the processor is not a 186 or above but an older one.

POPAD Bug

If either POPA or POPAD is followed by an instruction which uses an effective address calculation consisting of a base register and another register other than (E)AX as an index, the contents of (E)AX is corrupted.

Also, if POPA or POPAD in 16-bit mode is followed by an instruction which uses an effective address using EAX as a base or index, the CPU will hang.

    cli
    mov ax,1234h     ; loaded AX with any nonzero value
    pusha            ; push the value on to the stack, use push all
    popa             ; then pop back again
    mov bl,[bx+si]   ; base-index addressing mode
    sti

If this piece of code is excuted on a 386DX (or some 386SX), the AX register value will be ZERO, but on a 80486 or later CPU, it will be 1234h.

Word Write

On the 8086/88 (+ V20/V30), when a word write is performed at offset 0ffffh in a segment, one byte will be written at that offset and the other at offset 0, while an 80186 family processor will write one byte at offset 0ffffh, and the other, one byte beyond the end of the segment (offset 10000h). So all we have to do is test if it wraps around or not:

    mov     ax, ds:[0ffffh]         ; save original bytes
    mov     word ptr ds:[0ffffh], 0aaaah
    cmp     byte ptr ds:[0], 0aah   ; did 2nd byte wrap around?
    mov     ds:[0ffffh], ax         ; restore original bytes
    je      _is_808x
    jne     _is_8018x

Again, note that this should only be used for the specified processors.

Multi-Prefix Intructions

The standard 8086/88 processors have a bug such that they loose multiple prefixes if an interrupt occurs, while CMOS versions do not, since this bug was fixed in the 80C86/C88 processors (NEC V20/V30 processors also do not have this bug -- allowing the following code to also be applicable to them). If we execute a string operation with a repeat prefix and also a segment override for long enough to be interrupted, then, if we are on a 8086/88 the REP prefix will be lost when the instruction is interrupted, since on return, only the last prefix will be retained. If instead, we are on a low-power consumption CMOS version, the code will successfully complete.

	mov     cx, 0ffffh
	sti
	rep lods    byte ptr es:[si]        ; sure to be interrupted
	cli
	jcxz    _not_standard_808x      ; check if REP was completed

	if here, then it's just a standard 8086/88
	.
	.
	.

Just in case you want to use a piece of code like this without having to worry about that bug, here's how to get it work correctly every time (with interrupts enabled -- this time with MOVS):

do_REP:     rep movs    byte ptr es:[di], es:[si]   ; may be interrupted!
            jcxz    carry_on                        ; if not, carry on,
            loop    do_REP                          ; else, complete REP
carry_on:

SGDT/SIDT instructions

This method is used by Windows 3.0 to check if it's running on a 286 or a 386. The 80386 16-bit forms of the SGDT / SIDT instructions are compatible with the 80286, if the value in the upper eight bits is not referenced. The 80286 stores 1's in these upper bits. These bits were specified as undefined by the SGDT / SIDT instructions in the iAPX 286 Programmer's Reference Manual. The following instructions are taken from Windows 3.0's KERNEL.EXE:

	sgdt	fword ptr [bp-4]
	cmp	byte ptr [bp+1], 0FFh
	jz	_is_a_286
	; _is_a_386_or_higher

Please note that, on any 32-bit CPU, SGDT/SIDT instructions ignore any operand size prefixes and always store full 32 bits of base address. Intel documentation on the other hand states that with 16-bit operand size, SGDT/SIDT stores 24 bits of the base address and the 4th byte is zeroed, and while using 32-bit operand size, all 32 bits of the base address are stored.

ID Register

Beginning with the 80386 processor, Intel included a so-called ID register, which contains information about the processor model and stepping. This register is accessible in an unusual way - it is passed in DX after reset.

To read the ID register one must proceed the following steps:

  1. By storing value 0Ah (resume with jump) at address 0Fh (reset code) in the CMOS data area, inform BIOS not to issue POST after reset, but to return the control to the program.
  2. Update after-reset-far-jump address at 0040h:0067h.
  3. Set shutdown status word (0040h:0072h) to 0, to avoid undesirable side-effects.
  4. Cause a reset.

Causing a reset is typically done by issuing a so-called triple-fault-reset, i.e. causing an error from which the processor cannot recover and enters a reset state. TFR (triple...) can be done only if we have enough control over the processor, i.e. under plain DOS in real mode (no EMS) or under Win'95 (this is risky). The following code shows how to do it in DOS. The code is assumed to be in a COM program.

section .data

GDT		dd 0, 0 		; Selector 0 is empty
		dd 0000FFFFh, 00009A00h ; Selector 8 - code segment
GDTR	dw 000Fh, 0, 0	; Limit 0Fh - two selectors
IDTR	dw 0, 0, 0		; Empty IDT will cause TFR

section .text

	; Ensure that we are in real mode, not in V86
		smsw	ax
		and	al, 1
		jnz	near _skip_tfr_since_in_v86_mode

	; Update code descriptor as we are going to enter pmode
		xor	eax, eax
		mov	ax, cs
		shl	eax, 4
		or	[GDT+10], eax
		add	eax, GDT
		mov	[GDTR+2], eax

	; Update reset code in CMOS data area
		cli					; Disable interrupts
		mov	[SaveSP], sp	; Save stack pointer
		mov	al, 0Fh 		; Address 0Fh in CMOS area
		out	70h, al
times 3 jmp	short $+2		; Short delay
		mov	al, 0Ah 		; Value 0Ah - far jump
		out	71h, al

	; Update resume address
		push	word 0
		pop		es
		mov	[es:0467h], word _tfr	; offset
		mov	[es:0469h], cs			; segment
		mov	[es:0472h], word 0		; Update shutdown status

	; Switch to pmode
		lgdt	[GDTR]			; Load GDT
		lidt	[IDTR]			; Load empty IDT
		smsw	ax
		or		al, 01h 		; Set pmode bit
		lmsw	ax
		jmp	0008h:_reset		; Reload CS
_reset: mov	ax, [cs:0FFFFh] ; Reach beyond segment limit

	; After reset we are here with DX containing the ID register
_tfr:	cli
		mov	ax, cs
		mov	ds, ax
		mov	es, ax
		mov	ss, ax
		mov	sp, [SaveSP]
		sti

Of course there are also other ways of reading the ID register. They are well described in DDJ (www.x86.org).

As said before, the ID register contains information about processor model and stepping. The format of the register is as follows:

bits 15..12	- stepping
bits 11..8	- model
bits 7..0	- revision

Some example ID register values:

0303	i386DX
2303	i386SX
3301	i376

This format of the ID register was used in Intel 386 processors (all except RapidCAD), AMD 386 processors and most of IBM 486 processors.

Another format of the ID register was introduced with Intel 486 processors. This format is similar to the format of CPUID model information (see below), and until the Pentium was kept the same. However newer processors do not keep any useful information in the ID register (it is usually 0). This also concerns Cyrix 486 processors.

bits 15..14	- unused, zero
bits 13..12	- typically indicate overdrive
bits 11..8	- model
bits 7..4	- stepping
bits 3..0	- revision

And some example ID register values with this format for Intel processors:

0401	i486DX-25/33
0421	i486SX
0451	i486SX2

Cyrix DIR

All Cyrix processors have a Device-Identification-Registers, which are used to identify these processors. To read DIRs, one first has to determine that he uses a Cyrix processor. This can be accomplished in two ways:

  1. On modern processors using CPUID instruction.
  2. On first Cyrix processors issuing 5/2 method.

If there is no CPUID instruction, one has to use the other way of determination. If one knows that he is on a 486 processor, he can use the following code:

	mov		ax, 0005h
	mov		cl, 2
	sahf
	div		cl
	lahf
	cmp		ah, 2
	je	_we_are_on_cyrix
	jne	_this_is_not_cyrix

Once we have determined we are on a Cyrix processor, we can read its DIRs to get its model and stepping information. All Cyrix processors have their special registers accessible through ports 22h and 23h. Port 22h keeps register number and port 23h register value.

	; This function reads a Cyrix control register
	; It expects a register address in AL and returns value also in AL
ReadCCR:	out	22h, al 	; select register
times 3		jmp	short $+2	; delay
			in	al, 23h 	; get register contents
			ret

DIRs have offsets 0FEh (DIR1) and 0FFh (DIR0). DIR1 contains revision, while DIR0 contains model/stepping. The following code reads them:

	mov		al, 0FEh
	call	ReadCCR
	mov		[DIR1], al
	mov		al, 0FFh
	call	ReadCCR
	mov		[DIR0], al

Example DIR0 values:

1B	Cx486DX2
31	6x86(L) clock x2
55	6x86MX clock x4

CPUID Instruction

All newer processors have the CPUID instruction, which helps to identify on what processor we are. Before using it, we must first determine if it is supported, by flipping the ID flag (bit 21 of EFLAGS).

	pushfd
	pop		eax
	xor		eax, 00200000h	; flip bit 21
	push	eax
	popfd
	pushfd
	pop		ecx
	xor		eax, ecx	; check if bit 21 was flipped
	jnz	_cpuid_supported
	jz	_no_cpuid

The only problem may be that NexGen processors do not support the ID flag, but they do support the CPUID instruction. To determine that, we must hook Invalid Opcode exception (int6) and execute the instruction. If the exception is triggered, CPUID is not supported.

Also some early Cyrix processors (namely 5x86 and 6x86) have the CPUID instruction disabled. To enable it, we must first enable extended CCRregisters and then enable the instruction, setting bit 7 in CCR4.

	; Enable extended CCRs
		mov	al, 0C3h	; C3 corresponds to CCR3
		call	ReadCCR
		and	ah, 0Fh 	; bits 7..4 of CCR3 <- 0001b
		or	ah, 10h
		call	WriteCCR

	; Enable CPUID
		mov	al, 0E8h	; E8 corresponds to CCR4
		call	ReadCCR
		or	ah, 80h 	; bit 7 enables CPUID
		call	WriteCCR

The following functions are used to read/write CCRs:

ReadCCR:	out	22h, al 	; Select control register
times 3 	jmp	short $+2
			xchg	al, ah
			in	al, 23h 	; Read the register
			xchg	al, ah
			ret

WriteCCR:	out	22h, al 	; Select control register
times 3 	jmp	short $+2
			mov	al, ah
			out	23h, al 	; Write the register
			ret

After enabling CPUID we must test if it is supported by flipping the ID flag, unless of course we have determined that we are not on a 5x86 or 6x86 by reading DIRs.

Once we have determined that CPUID is supported, we can use it to identify the processor. The instruction expects EAX to hold a function number and returns information corresponding to this number in EAX, ECX,EDX and EBX. The two most important levels are listed below.

level 0 (eax=0) returns:

eax		Maximum available level
ebx:edx:ecx	Vendor ID in ASCII characters
		Intel	- "GenuineIntel" (ebx='Genu', bl='G'(47h))
		AMD	- "AuthenticAMD"
		Cyrix	- "CyrixInstead"
		Rise	- "RiseRiseRise"
		Centaur - "CentaurHauls"
		NexGen	- "NexGenDriven"
		UMC	- "UMC UMC UMC "

level 1 (eax=1) returns:

eax		bits 13..12	0 - normal
				1 - overdrive
				2 - secondary in dual system
		bits 11..8	model
		bits 7..4	stepping
		bits 3..0	revision
		If Processor Serial Number is enabled, all 32
		bits are treated as the high bits (95..64) of
		the number.
edx		Processor features (e.g. bit 23 indicates MMX)

There are also other levels, i.e. level 2 returns cache and TLB descriptors, level 3 the rest of Processor Serial Number.

Other processors (AMD, Cyrix) also support extended levels. The first extended level is 80000000h and it returns in EAX the maximum extended level. These extended levels return information specific to that processors, e.g. 3DNow! support or processor name.

This example code determines MMX support:

; First check maximum available level
	xor	eax, eax	; eax = 0 (level 0)
	cpuid
	cmp	eax, 0
	jng	_no_higher_levels

; Now check MMX support
	mov	eax, 1		; level 1
	cpuid
	test	edx, 00800000h	; bit 23 is set if MMX is supported
	jnz	_mmx_supported
	jz	_no_mmx

As this is not the place for listing all the available information about what values are returned by CPUID, ID register or DIRs, you should get the most recent information from the processor vendors.

Also you can find very valuable information about the identification topic on:

www.sandpile.org
www.cs.cmu.edu/~ralf/files.html

Invalid-Opcode Exception Handler (INT6)

From the 80186 and upwards, all processors allow one to implement an invalid-opcode exception handler, which gives us a great way of telling the families of CPUs apart. All one does is, hook the INT6 interrupt vector with our own handler and see if some specific instructions trigger an INT6 or not. With our handler we trap those exceptions and then toggle a little flag, that show us the processor doesn't support that instruction.

In the code below I hooked the INT6 vector by changing the IVT (Interrupt Vector Table) directly, but one can also use DOS services for that, test which processor we're running on and after that restore things back to what they were before (except registers, place some push/pop code yourself according to your needs -- by the way, Robert Collins is a god!). Anyway, the code is pretty much self-explanatory:

        ; Hook INT6 -- set up our own handler
                push    0                       ; point to IVT (0000:0000) - (1
                pop     es                      ;  byte saved thanks to Chris!)
                cli
                lds     ax, es:[6*4]            ; get original handler vector
                mov     es:[6*4], offset INT6_handler   ; then, replace it with
                mov     es:[6*4+2], cs                  ;  our own handler
                sti

        ; Test if processor is at least a 80186 -- Executes "SHL DX, 10"?
                mov     cx, 1           ; set up invalid-opcode flag
                shl     dx, 0ah
                jcxz    unknown_CPU

        ; Test if processor is at least a 80286 -- Executes "SMSW DX"?
                smsw    dx
                jcxz    _is_80186

        ; Test if processor is at least a 80386 -- Executes "MOV EDX, EDX"?
                mov     edx, edx
                jcxz    _is_80286

        ; Test if processor is at least a 80486 -- Executes "XADD DL, DL"?
                xadd    dl, dl
                jcxz    _is_80386

                <if here, then it's a 80486 or higher processor>
                .
                .
                .

        ; Restore original INT6 handler address -- for all processors type!
                cli
                mov     es:[6*4], ax    ; restore original INT6 offset
                mov     es:[6*4+2], ds  ; restore original INT6 segment
                sti

                <whatever code here>
                .
                .
                .

        ; Our own INT6 handler
INT6_handler:
                xor     cx, cx          ; toggle invalid-opcode flag
                push    bp
                mov     bp, sp
                add     word ptr ss:[bp+2], 3   ; adjust  the return address to
                                                ;  after the invalid opcode  (3
                                                ;  bytes for all)
                pop     bp
                iret

Note, that for this code: 1) should only be used if you know the processor is at least a 80186, 2) if you fiddle with the contents of AX, ES and DS and change them before restoring the original INT6 handler don't forget to first save and then restore them!, 3) of course the code in the INT6_handler should only be executed by means of an INT6!

Maybe a very small extra explanation is required regarding the INT6_handler. We need to adjust the return address, since when an invalid opcode exception is issued the saved contents of CS and EIP (which are pushed onto the stack) point to the instruction that generated the exception, instead of the next one (as usually happens for other interrupts).

Instruction Prefetch Queue

16-bit (ie. 8086s, 80186s, V30s) processors have a prefetch queue 6 bytes in size and replenish the instruction queue after having at least two bytes empty in the queue, while their 8-bit bus versions (ie. 8088s, 80188, V20s) only have a 4 byte prefetch queue and initiate the prefetch cycle when there is at least one empty byte in it.

So, knowing this about their Bus Interface Unit design, it isn't difficult to write some code to distinguish between the two categories. We'll make a routine that uses self-modifying code to change the opcode at the fifth byte and then see if it was executed or not.

        xor     cx, cx
		cli                     ; prevent against queue being emptied
		lea     di, patch
		mov     al, 90h         ; load NOP opcode
		stosb                   ; patch fifth byte to a NOP
		nop
		nop
		nop
		nop
patch:  inc     cx              ; did the INC execute?
		sti
		jcxz    _is_8bit

		; if here, then it's an 16-bit processor

I believe there is enough time for the prefetch queue to fill, though I have no chance to confirm it!

Just in case you want to be on the safe side, here's a routine that will most certainly work:

        xor     dx, dx
        cli                     ; prevent against queue being emptied
        lea     di, patch+2
        mov     al, 90h         ; load NOP opcode
        mov     cx, 3
        std
        rep     stosb           ; patch fifth byte to a NOP
        nop
        nop
        nop
        nop
patch:  inc     dx              ; did the INC execute?
        nop
        nop
        sti
        test    dx, dx
        jz      _is_8bit

        ; if here, then it's an 16-bit processor

Again, I must stress that this code should only be used for the specified processors, since it will without a doubt fail on others.

CPU identification algorithm

Here is our size-optimized way of determining the processor type. It's an algorithm that uses Intel's guidelines and tests between pre-80286, 80286, 80386, 80486 without CPUID and 80486+ with CPUID support.

    ; Detection of pre-80286/80286/386+ processors
    mov     ax, 7202h       ; set bits 12-14 and clear bit 15
    push    ax
    popf
    pushf
    pop     ax

    test    ah, 0f0h
    js      _is_pre286      ; bit 15 of FLAGS is set on pre-286
    jz      _is_80286       ; bits 12..15 of FLAGS are clear on 286
                            ;  processor in real mode  (no V86 mode
                            ;  on 286)

    ; if here, then it's a 80386 or higher processor

    ; Detection of 80386/80486(w/out CPUID)/80486+(CPUID compliant)
    pushfd
    pop     eax
    mov     edx, eax
    xor     eax, 00240000h  ; flip bits 18 (AC) and 21 (ID)
    push    eax
    popfd
    pushfd
    pop     eax

    xor     eax, edx        ; check if both bits didn't toggle
    jz      _is_80386
    shr     eax, 19         ; check if only bit 18 toggled
    jz      _is_80486_without_CPUID

    ; if here, then it's a 80486 with CPUID or higher processor

And so, we got the whole code down to a measly 46 bytes!

CR0 Register - Bit 4

The 80386 DX may be differentiated from the other models by trying to clear bit 4 (ET) in the CR0 register. It can be toggled on the 80386 DX, while it is hardwired to 1 on any of the other family models. So this gives us a good way to differentiate them, by trying to clear that bit and then see if it got forced to set or not.

    ; Test CR0 register -- bit 4 (ET)
    mov     eax, cr0
    mov     edx, eax        ; save original CR0
    and     al, 11101111b   ; clear bit 4
    mov     cr0, eax
    mov     eax, cr0
    mov     cr0, edx        ; restore original CR0
    test    al, 00010000b   ; check if bit 4 was forced high
    jz      _is_a_80386DX_model
    jnz     _is_not_a_80386DX_and_therefore_is_some_other_model

Note that I'm not sure if this can safelly/trustfully be done under protected mode!

Identify NexGen 586

The NexGen 586 does not have AC bit of EFLAGS because it had designed the chip to be an 80386 clone. NexGen 586 can be distinguished from i80386 by following code:

        mov    eax, 0x5555
        mov    ecx, 2
        xor    edx, edx
        clc
        div    ecx
        jz     you_have_NexGen_586
        jmp    you_have_i80386

If you set ZF=1 and then execute a divide instruction with a nonzero remainder, if ZF changes the CPU is an Intel CPU, else is the NexGen 5x86.