| Intel originally included LOADALL
        in the CPU mask for testing purposes and In Circuit
        Emulator (ICE) support. As its name implies, LOADALL loads
        all of the CPU registers, including the
        "hidden" software-invisible registers. At the
        completion of a LOADALL instruction, the entire
        CPU state is defined according to the LOADALL data
        table. LOADALL loads all of the software-visible
        registers such as AX, and all of the
        software-invisible registers such as the segment descriptor caches. By
        manipulating the descriptor cache base registers, you can
        access the entire address space without switching to
        protected mode. In other words, by using LOADALL, you
        can access memory above 1 MB from real mode. Since the
        alternative method for the 286 (switching to protected
        mode, accessing the desired memory, then resetting the
        CPU - the only way to get the 286 back to real mode) has
        a significant performance penalty, LOADALL is most
        significant to 286 programmers. LOADALL provides
        them with a new capability that is not available by any
        other means.

        LOADALL Details

        LOADALL is closely coupled with the CPU
        hardware. The 286 and 386 have different internal
        hardware, and Intel implemented LOADALL using
        a different opcode on each. 80286 LOADALL (opcode
        0F05) produces an invalid opcode exception when executed
        on the 386, and 80386 LOADALL (opcode 0F07)
        produces an invalid opcode exception when executed on the
        286. LOADALL loads all CPU registers (including MSW,
        GDTR, CSBASE, ESACCESS) from a memory image. You can
        execute LOADALL in real or protected mode, but
        only at privilege level 0 (CPL=0). If you execute LOADALL
        at any other privilege level, the CPU generates an
        exception.

        By directly loading the descriptor cache registers
        with LOADALL, a program has explicit control over
        the base address, segment limit, and access rights
        associated with each memory segment. Normally, the CPU
        loads these values each time it loads a segment register,
        but LOADALL allows you to load these hidden
        registers independently of their segment register
        counterparts. In real mode, LOADALL makes it possible to
        access a memory segment that is not associated with any
        segment register. Likewise in protected mode, you can
        access memory that has no descriptor table entry.

        LOADALL performs no protection checks against
        any of the loaded register values. When you execute it at
        CPL 0, LOADALL can generate no exceptions. The
        segment access rights and limit portions may be values
        that would otherwise be illegal in the context of real
        mode or protected mode, but LOADALL willingly
        loads these values with no checks. Once loaded, however,
        the CPU performs full access checks when accessing a
        segment. For example, you can load a segment whose access
        is marked "not present." Normally, this
        condition would generate exception 11, "segment not
        present", but LOADALL does not generate
        exception 11. Instead, any attempt to access this segment
        will generate exception 13.

        LOADALL does not check coherency between the
        software-visible segment registers and the
        software-invisible segment descriptor cache registers.
        Any segment descriptor base register may point to any
        area in the CPU address space, while the software-visible
        segment register may contain any other arbitrary value.
        The CPU makes all memory references according to the
        descriptor cache registers, not the software-visible
        segment registers. All subsequent segment register loads
        will reload the descriptor cache register. Beware of
        using values in CS that do not perfectly match a code
        segment descriptor table entry, or a real mode code
        segment: an interrupt return (IRET) may either
        cause an exception or resume execution at an
        unexpected location. Likewise, pushing and subsequently
        popping any segment register will force the descriptor
        cache register to reload according to the CPU's
        conventional protocol, thereby inhibiting any further
        real mode extended memory references.

        80286 LOADALL

        You encode the 80286 LOADALL as a two-byte
        opcode, 0F05h. LOADALL reads its table from a
        fixed memory location at 800h (80:0 in real-mode
        addressing). LOADALL performs 51 bus cycles (WORD
        cycles), and takes 195 clocks with no wait states.

        Table 1 shows the format you must
        prepare at location 800h before executing the 286 LOADALL
        instruction. All CPU register entries in the LOADALL
        table conform to the standard Intel format, where the
        least significant byte is at the lowest memory address.

        Table 2 shows the 286 format of the
        descriptor cache entries.
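
As a cross-check on the layout described above, the 102-byte table can be mirrored as a C structure and its offsets verified at compile time. This is only a sketch under stated assumptions: the names `LoadAll286`, `DescCache286`, and `loadall286_phys` are illustrative and do not come from Intel or from the article's listings; MSW is placed at 806h, the position of the only non-zero entry among the first eleven words in the Data Value column of Table 1; and the 16-bit fields match the in-memory format only on a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOADALL286_ADDR 0x800u   /* fixed physical address read by 286 LOADALL */

/* Mirrors the article's DESC_CACHE286 STRUC (6 bytes). */
typedef struct {
    uint16_t addr_a15_a00;  /* base address bits 15..0  */
    uint8_t  addr_a23_a16;  /* base address bits 23..16 */
    uint8_t  access;        /* access rights byte       */
    uint16_t limit;         /* 16-bit segment limit     */
} DescCache286;

/* Hypothetical C mirror of the table at 800h (names are assumptions). */
typedef struct {
    uint16_t reserved0[3];                /* 800h-804h, written as 0 */
    uint16_t msw;                         /* 806h (assumed position) */
    uint16_t reserved1[7];                /* 808h-814h, written as 0 */
    uint16_t tr, flags, ip, ldt;          /* 816h-81Ch */
    uint16_t ds, ss, cs, es;              /* 81Eh-824h */
    uint16_t di, si, bp, sp;              /* 826h-82Ch */
    uint16_t bx, dx, cx, ax;              /* 82Eh-834h */
    DescCache286 es_desc, cs_desc, ss_desc, ds_desc;      /* 836h-848h */
    DescCache286 gdt_desc, ldt_desc, idt_desc, tss_desc;  /* 84Eh-860h */
} LoadAll286;

/* Compile-time checks: any padding or miscount fails the build. */
static_assert(sizeof(DescCache286) == 6, "descriptor cache entry is 6 bytes");
static_assert(sizeof(LoadAll286) == 102, "51 words, ending at 866h");
static_assert(offsetof(LoadAll286, tr) == 0x16, "TR_REG at 816h");
static_assert(offsetof(LoadAll286, es_desc) == 0x36, "ES_DESC at 836h");

/* Physical address of a field, given its offset within the table. */
static inline uint32_t loadall286_phys(size_t offset)
{
    return LOADALL286_ADDR + (uint32_t)offset;
}
```

For example, `loadall286_phys(offsetof(LoadAll286, ds_desc))` evaluates to 848h, matching the DS_DESC row of Table 1.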
   
            
            Table 1 -- 80286 LOADALL Table

            | Physical Address | Description | Data Size | Data Value |
            | [800] | None | DW | 0 |
            | [802] | None | DW | 0 |
            | [804] | None | DW | 0 |
            | [806] | MSW | DW | ? |
            | [808] | None | DW | 0 |
            | [80A] | None | DW | 0 |
            | [80C] | None | DW | 0 |
            | [80E] | None | DW | 0 |
            | [810] | None | DW | 0 |
            | [812] | None | DW | 0 |
            | [814] | None | DW | 0 |
            | [816] | TR_REG | DW | ? |
            | [818] | FLAGS | DW | ? |
            | [81A] | IP | DW | ? |
            | [81C] | LDT_REG | DW | ? |
            | [81E] | DS_REG | DW | ? |
            | [820] | SS_REG | DW | ? |
            | [822] | CS_REG | DW | ? |
            | [824] | ES_REG | DW | ? |
            | [826] | DI | DW | ? |
            | [828] | SI | DW | ? |
            | [82A] | BP | DW | ? |
            | [82C] | SP | DW | ? |
            | [82E] | BX | DW | ? |
            | [830] | DX | DW | ? |
            | [832] | CX | DW | ? |
            | [834] | AX | DW | ? |
            | [836] | ES_DESC | DESC_CACHE286 | <?,?,?> |
            | [83C] | CS_DESC | DESC_CACHE286 | <?,?,?> |
            | [842] | SS_DESC | DESC_CACHE286 | <?,?,?> |
            | [848] | DS_DESC | DESC_CACHE286 | <?,?,?> |
            | [84E] | GDT_DESC | DESC_CACHE286 | <?,?,?> |
            | [854] | LDT_DESC | DESC_CACHE286 | <?,?,?> |
            | [85A] | IDT_DESC | DESC_CACHE286 | <?,?,?> |
            | [860] | TSS_DESC | DESC_CACHE286 | <?,?,?> |
            | [866] | END OF TABLE |  |  |

DESC_CACHE286 STRUC
    Addr_A15_A00 DW ?
    Addr_A23_A16 DB ?
    Access DB ?
    Limit DW ?
ENDS

        Intel recommends some guidelines for
        proper execution following LOADALL. The stack
        segment should be a read/write data segment; the code
        segment can be execute only (access=95h), read/execute
        (access=9bh), or read/write/execute (access=93h). Proper
        protected mode operation also requires that the DPL of CS
        and DPL of SS be equal. These attributes
        determine the CPL of the processor. Also, the DPL fields
        of ES and DS should be equal to 3 to
        prevent RETF or IRET instructions from
        zeroing these registers.

        The code in Listing 1
        demonstrates how to explore the various operating modes
        with 286 LOADALL and how to access extended memory
        while in real mode. The LOADALL test performs
        various functions that would be impossible to duplicate
        without using LOADALL.

        80386 LOADALL

        The 386 LOADALL is encoded as a two-byte opcode
        (0F07). Unlike the 286 LOADALL, this LOADALL instruction
        reads its data from a table pointed to by ES:EDI. Segment
        overrides are allowed, but apparently ignored. The 386
        LOADALL performs 51 bus cycles (DWORD cycles) and takes
        122 clocks with no wait states.

        Table 3 shows the 386
        LOADALL format. However, Table 3 does
        not show that prior to reading the LOADALL table, LOADALL
        reads 10 DWORDs exactly 100h bytes beyond the beginning
        of the table (ES:EDI+100h). This data is not used to load
        any of the registers that the LOADALL table omits (CR2, CR3,
        DR0-DR3, TR6, TR7), nor the Numeric Processor eXtension
        (NPX). At this time, the purpose of reading this data and
        its destination is a mystery. Figure 1 is an ICE trace
        showing all the bus cycles associated with LOADALL's
        execution.

        As with the 286 LOADALL, all CPU register entries in
        the LOADALL table are in the standard Intel format where
        the least significant byte is at the lowest memory
        address. The 386 descriptor cache entries have the format
        shown in Table 4.

        Listing 2 shows how to test 386 LOADALL. This test is more
        comprehensive than the 286 LOADALL test because of the
        expanded capabilities of the 386 microprocessor. This
        test puts the CPU into various states that are illegal
        and are impossible to duplicate through any other
        software means.

        LOADALL Emulation

        Due to the large number of systems programs that use
        286 LOADALL, all 386 and 486 BIOSes must emulate
        the 286 LOADALL instruction (opcode 0F05). On
        the 386 and 486, the 286 LOADALL instruction
        generates an invalid opcode exception. The BIOS traps
        this exception and does its best to emulate the
        functionality of the LOADALL instruction, but
        perfect emulation is impossible without using LOADALL itself.
        Using 386 LOADALL to emulate 286 LOADALL can
        be done, but has its risks. First of all, the 486 does
        not have a LOADALL instruction. Second, Intel has
        threatened to remove LOADALL from the 386 mask.

        Perfect emulation is possible on the 386 by using 386 LOADALL
        to emulate 286 LOADALL. Listing 3
        shows a TSR program that uses 386 LOADALL to
        emulate 286 LOADALL. The program first tests that
        it is running on a 386 before installing itself. By using this
        emulation program, you can guarantee perfect 286 LOADALL
        emulation.

        Conclusion

        LOADALL is a very powerful instruction, but the
        features that make it so powerful also make it risky. For
        example, LOADALL can put the processor in states
        that are otherwise impossible to duplicate through any
        other software means. Using LOADALL requires a
        thorough understanding of how the CPU processes register
        loads, the ramifications of those register loads, and
        careful planning. The illegally induced processor states
        can easily cause system crashes if not properly planned
        for. The best way to avoid system crashes is to avoid
        using LOADALL unless you are totally confident in
        your understanding of the CPU and in your programming
        skills.

        The 286 LOADALL is described in a 15-page
        Intel-confidential document. The document describes in
        detail how to use the instruction, and also describes
        many of its possible uses. LOADALL can be used to
        access extended memory while in real mode, and to emulate
        real mode while in protected mode. Programs such as
        RAMDRIVE, ABOVEDISC, and OS/2 use LOADALL. DOS 3.3
        has provisions for using LOADALL by leaving a
        102-byte 'hole' at 80:0. If you are a systems programmer
        and have a need to know this information, Intel will
        provide it, along with source code to emulate 286 LOADALL
        on the 386 (without using 386 LOADALL).

        Unlike the 286 LOADALL, the 386 LOADALL is
        still an Intel top secret. I do not know of any document
        that describes its use or format, or acknowledges its
        existence. Very few people at Intel will acknowledge that
        LOADALL even exists in the 80386 mask. The
        official Intel line is that, due to U.S. Military
        pressure, LOADALL was removed from the 80386 mask
        over a year ago. However, running the program in
        Listing 2 demonstrates that LOADALL is alive,
        well, and still available on the latest stepping of the
	80386. 
 View source code for 286 LOADALL:
 http://www.rcollins.org/ftp/source/286load/286load.asm
 http://www.rcollins.org/ftp/source/286load/loadfns.286
 http://www.rcollins.org/ftp/source/286load/macros.286
 http://www.rcollins.org/ftp/source/include/cpu_type.asm

 View source code for 386 LOADALL:
 http://www.rcollins.org/ftp/source/386load/386load.asm
 http://www.rcollins.org/ftp/source/386load/loadfns.386
 http://www.rcollins.org/ftp/source/386load/macros.386
 http://www.rcollins.org/ftp/source/include/cpu_type.asm

 View source code for EMULOAD (286 LOADALL emulation using 386 LOADALL):
 http://www.rcollins.org/ftp/source/emuload/emuload.asm
 http://www.rcollins.org/ftp/source/include/cpu_type.asm

 Download entire source code archive for 286LOAD, 386LOAD, and EMULOAD:
 http://www.rcollins.org/ftp/dloads/loadall.zip
 | 
            
                | DESCRIPTOR CACHE REGISTERS

                Whether in real or protected mode, the CPU
                stores the base address of each segment in hidden
                registers called descriptor cache registers. Each
                time the CPU loads a segment register, the
                segment base address, segment size limit, and
                access attributes (access rights) are loaded, or
                "cached," into these hidden
                registers. To enhance performance, the CPU makes
                all subsequent memory references via the
                descriptor cache registers instead of calculating
                the physical address, or looking up the base
                address in the descriptor table. Understanding
                the role of these hidden registers is paramount
                for exploiting highly advanced programming
                techniques, and for exploiting the undocumented
                LOADALL instruction.

                Figure 2(a)
                shows the descriptor cache layout for the 80286,
                and Figure 2(b) shows the
                layout for the 80386 and 80486.

                    Figure 2 (a) 80286 Descriptor Cache Register

                        | [47..32] | 31 | [30..29] | 28 | [27..25] | 24 | [23..00] |
                        | 16-bit Limit | P | DPL | S | Type | A | 24-bit base address |

                    Figure 2 (b) 80386/80486 Descriptor Cache Register

                        | [31..24] | 23 | [22..21] | 20 | [19..17] | 16 | 15 | 14 | [13..00] |
                        | 0 | P | DPL | S | Type | A | 0 | D | 0 |

                        | [63..32] |
                        | 32-bit Physical Address |

                At power-up, the descriptor
                cache registers are loaded with fixed, default
                values, the CPU is in real mode, and all segments
                are marked as read/write data segments, including
                the code segment (CS). According to Intel, each
                time the CPU loads a segment register in real
                mode, the base address is 16 times the segment
                value, while the access rights and size limit
                attributes are given fixed, "real-mode
                compatible" values. This is not true. In
                fact, only the CS descriptor cache access rights
                get loaded with fixed values each time the
                segment register is loaded - and even then only
                when a far jump is encountered. Loading any other
                segment register in real mode does not change the
                access rights or the segment size limit
                attributes stored in the descriptor cache
                registers. For these segments, the access rights
                and segment size limit attributes are honored
                from any previous setting (see Figure
                3). Thus it is possible to have a four
                gigabyte, read-only data segment in real mode on
                the 80386, but Intel will not acknowledge or
                support this mode of operation.

                Protected mode differs from real mode in this
                respect: each time the CPU loads a segment
                register, it fully loads the descriptor cache
                register; no previous values are honored. The CPU
                loads the descriptor cache directly from the
                descriptor table. The CPU checks the validity of
                the segment by testing the access rights in the
                descriptor table, and illegal values will
                generate exceptions. Any attempt to load CS with
                a read/write data segment will generate a
                protection error. Likewise, any attempt to load a
                data segment register as an executable segment
                will also generate an exception. The CPU enforces
                these protection rules very strictly: only if the
                descriptor table entry passes all the tests does
                the CPU load the descriptor cache register.

                Figure 3 -- Descriptor Cache Contents (Real Mode)
 | 
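
The sidebar's account of real-mode segment loads can be illustrated with a small software model. This is a sketch, not CPU behavior captured from hardware: `SegCache` and the `model_*` functions are hypothetical names, only a data segment is modeled, and the limit check stands in for the access checks the CPU performs after the cache is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one data-segment descriptor cache register. */
typedef struct {
    uint32_t base;    /* physical base address */
    uint32_t limit;   /* highest valid offset  */
    uint8_t  access;  /* access rights byte    */
} SegCache;

/* LOADALL: the cache takes whatever the table holds, unchecked. */
static inline void model_loadall(SegCache *c, uint32_t base,
                                 uint32_t limit, uint8_t access)
{
    c->base = base;
    c->limit = limit;
    c->access = access;
}

/* Real-mode load of a data segment register (MOV DS,AX, POP DS, ...):
   per the sidebar, only the base changes; limit and access persist. */
static inline void model_real_mode_load(SegCache *c, uint16_t segment)
{
    c->base = (uint32_t)segment << 4;   /* 16 times the segment value */
}

/* Translation consults the cache only. Returns false where the CPU
   would raise an exception for an offset beyond the limit. */
static inline bool model_translate(const SegCache *c, uint32_t offset,
                                   uint32_t *phys)
{
    if (offset > c->limit)
        return false;
    *phys = c->base + offset;
    return true;
}

/* Walk-through: LOADALL reaches above 1 MB; a reload undoes the base. */
static inline void model_demo(void)
{
    SegCache ds;
    uint32_t phys;

    model_loadall(&ds, 0x100000u, 0xFFFFu, 0x93u);
    assert(model_translate(&ds, 0x10u, &phys) && phys == 0x100010u);

    model_real_mode_load(&ds, 0x1234u);
    assert(model_translate(&ds, 0x10u, &phys) && phys == 0x12350u);
    assert(ds.limit == 0xFFFFu && ds.access == 0x93u);
}
```

The second half of `model_demo` is the reason the main text warns against pushing and popping a segment register after LOADALL: the reload replaces the extended-memory base with 16 times the segment value.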
  
 
            
            Table 2 (a) -- 80286 Descriptor Cache Entry Formats 
            
                | Offset | Description |  
                | 0-2 | 24-bit physical address of the segment in
                memory. These bytes are stored in standard Intel
                format with the least significant byte at the
                lowest memory address. |  
                | 3 | Access rights. The format of this byte is the
                same as that in the descriptor table. This access
                byte is loaded in the descriptor cache register
                regardless of its validity. Therefore the
                "present" bit in the access rights
                field becomes a "descriptor valid" bit.
                When this bit is cleared, the descriptor is
                considered invalid, and any memory reference
                using this descriptor generates exception 13,
                with error code 0. The Descriptor Privilege Level
                (DPL) of the SS and CS descriptor caches
                determines the Current Privilege Level (CPL). The
                CS descriptor cache may be loaded as a read/write
                data segment. |  
                | 4-5 | Segment limit. The standard 16-bit segment
                limit stored in standard Intel format. |  
 
 
            
            Table 2 (b) -- 80286 GDT and IDT Descriptor Cache
            Entry Formats
            
                | Offset | Description |  
                | 0-2 | 24-bit physical address of the segment in
                memory. |  
                | 3 | Should be 0. |  
                | 4-5 | Segment limit. |   
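
The entry formats in Table 2 can be sketched in C as follows. The helper names `decode_access286` and `pack_desc286` are illustrative assumptions, not part of the article's listings; the bit positions follow the access rights fields shown in Figure 2(a), and the byte order follows Table 2(a).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the access rights byte, as laid out in Figure 2(a). */
typedef struct {
    bool    present;   /* bit 7: acts as "descriptor valid" after LOADALL */
    uint8_t dpl;       /* bits 6..5: descriptor privilege level */
    bool    s;         /* bit 4 */
    uint8_t type;      /* bits 3..1 */
    bool    accessed;  /* bit 0 */
} Access286;

/* Split an access rights byte into its fields. */
static inline Access286 decode_access286(uint8_t a)
{
    Access286 r;
    r.present  = ((a >> 7) & 1u) != 0;
    r.dpl      = (uint8_t)((a >> 5) & 3u);
    r.s        = ((a >> 4) & 1u) != 0;
    r.type     = (uint8_t)((a >> 1) & 7u);
    r.accessed = (a & 1u) != 0;
    return r;
}

/* Write one 6-byte entry in the Table 2(a) layout, LSB first.
   For GDT and IDT entries (Table 2(b)), pass access = 0. */
static inline void pack_desc286(uint8_t out[6], uint32_t base,
                                uint8_t access, uint16_t limit)
{
    assert(base <= 0xFFFFFFu);   /* base is a 24-bit physical address */
    out[0] = (uint8_t)(base & 0xFFu);
    out[1] = (uint8_t)((base >> 8) & 0xFFu);
    out[2] = (uint8_t)((base >> 16) & 0xFFu);
    out[3] = access;
    out[4] = (uint8_t)(limit & 0xFFu);
    out[5] = (uint8_t)(limit >> 8);
}
```

Packing base 100000h, access 93h, and limit FFFFh yields the bytes 00 00 10 93 FF FF, a 64 KB read/write data segment starting at the 1 MB boundary.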
 
            
            Table 3 -- 80386 LOADALL Table

                | Offset | Description | Data Size | Data Value |
                | [00] | CR0 | DD | ? |
                | [04] | EFLAGS | DD | ? |
                | [08] | EIP | DD | ? |
                | [0C] | EDI | DD | ? |
                | [10] | ESI | DD | ? |
                | [14] | EBP | DD | ? |
                | [18] | ESP | DD | ? |
                | [1C] | EBX | DD | ? |
                | [20] | EDX | DD | ? |
                | [24] | ECX | DD | ? |
                | [28] | EAX | DD | ? |
                | [2C] | DR6 | DD | ? |
                | [30] | DR7 | DD | ? |
                | [34] | TR_REG | REG_STRUC | <?> |
                | [38] | LDT_REG | REG_STRUC | <?> |
                | [3C] | GS_REG | REG_STRUC | <?> |
                | [40] | FS_REG | REG_STRUC | <?> |
                | [44] | DS_REG | REG_STRUC | <?> |
                | [48] | SS_REG | REG_STRUC | <?> |
                | [4C] | CS_REG | REG_STRUC | <?> |
                | [50] | ES_REG | REG_STRUC | <?> |
                | [54] | TSS_DESC | DESC_CACHE | <?,?,?> |
                | [60] | IDT_DESC | DESC_CACHE | <?,?,?> |
                | [6C] | GDT_DESC | DESC_CACHE | <?,?,?> |
                | [78] | LDT_DESC | DESC_CACHE | <?,?,?> |
                | [84] | GS_DESC | DESC_CACHE | <?,?,?> |
                | [90] | FS_DESC | DESC_CACHE | <?,?,?> |
                | [9C] | DS_DESC | DESC_CACHE | <?,?,?> |
                | [A8] | SS_DESC | DESC_CACHE | <?,?,?> |
                | [B4] | CS_DESC | DESC_CACHE | <?,?,?> |
                | [C0] | ES_DESC | DESC_CACHE | <?,?,?> |
                | [CC] | LENGTH OF TABLE |  |  |

REG_STRUC STRUC
    REG_VAL    DW     ?
               DW     0
ENDS

DESC_CACHE STRUC
              DB     0
     _Type    DB     ?
              DB     0
              DB     0
     _Addr    DD     ?
    _Limit    DD     ?
ENDS
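
The offsets in Table 3 can be checked by mirroring the table as a C structure. This is a sketch with assumed names (`LoadAll386`, `RegStruc`, `DescCache386`, `loadall386_dwords`); the field order and sizes come from Table 3 and its two STRUC definitions, and the multi-byte fields match the in-memory format only on a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors REG_STRUC: a 16-bit selector padded to a dword. */
typedef struct {
    uint16_t reg_val;
    uint16_t zero;
} RegStruc;

/* Mirrors DESC_CACHE: access byte in byte 1, then base and limit. */
typedef struct {
    uint8_t  zero0;
    uint8_t  type;      /* access rights byte (_Type) */
    uint8_t  zero1[2];
    uint32_t addr;      /* 32-bit base address (_Addr) */
    uint32_t limit;     /* 32-bit limit (_Limit) */
} DescCache386;

/* Hypothetical C mirror of the table addressed by ES:EDI. */
typedef struct {
    uint32_t cr0, eflags, eip;                               /* 00h-08h */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;         /* 0Ch-28h */
    uint32_t dr6, dr7;                                       /* 2Ch-30h */
    RegStruc tr, ldt, gs, fs, ds, ss, cs, es;                /* 34h-50h */
    DescCache386 tss_desc, idt_desc, gdt_desc, ldt_desc;     /* 54h-78h */
    DescCache386 gs_desc, fs_desc, ds_desc, ss_desc;         /* 84h-A8h */
    DescCache386 cs_desc, es_desc;                           /* B4h-C0h */
} LoadAll386;

/* Compile-time checks: any padding or miscount fails the build. */
static_assert(sizeof(RegStruc) == 4, "REG_STRUC is one dword");
static_assert(sizeof(DescCache386) == 12, "DESC_CACHE is three dwords");
static_assert(sizeof(LoadAll386) == 0xCC, "table length is CCh");
static_assert(offsetof(LoadAll386, tr) == 0x34, "TR_REG at 34h");
static_assert(offsetof(LoadAll386, tss_desc) == 0x54, "TSS_DESC at 54h");
static_assert(offsetof(LoadAll386, es_desc) == 0xC0, "ES_DESC at C0h");

/* Number of DWORD reads needed to fetch the whole table. */
static inline size_t loadall386_dwords(void)
{
    return sizeof(LoadAll386) / 4;
}
```

The table length of CCh bytes works out to 51 dwords, which agrees with the 51 DWORD bus cycles quoted for the instruction.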
 
            
            Table 4 (a) -- 80386 Descriptor Cache Entries 
            
                | Offset | Description |  
                | 0-3 | Access rights. The access rights occupy
                11 bits of this 32-bit field. See Figure 2 for a
                complete description of this field. |  
                | 4-7 | 32-bit base address of the segment in
                memory. |  
                | 8-11 | 32-bit limit of the segment. |  
 
 
            Table 4 (b) -- 80386 GDT
            and IDT Descriptor Cache Entry Formats 
            
                | Offset | Description |  
                | 0-3 | Should be 0. |  
                | 4-7 | 32-bit base address of GDTR or IDTR. |  
                | 8-11 | 32-bit limit of GDTR or IDTR. |   
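
A short sketch of how one 386 descriptor cache entry could be assembled as three dwords. The function names are hypothetical; the placement of the access rights byte in byte 1 of the first dword is taken from the DESC_CACHE STRUC in Table 3 and from the values visible in the Figure 1 trace, not from any Intel document.

```c
#include <assert.h>
#include <stdint.h>

/* Build a segment descriptor cache entry (Table 4(a) layout). */
static inline void make_desc386(uint32_t out[3], uint8_t access,
                                uint32_t base, uint32_t limit)
{
    out[0] = (uint32_t)access << 8;  /* access byte sits in bits 15..8 */
    out[1] = base;                   /* offset 4-7: base address */
    out[2] = limit;                  /* offset 8-11: limit */
}

/* Build a GDT or IDT entry (Table 4(b) layout): first dword is 0. */
static inline void make_table_desc386(uint32_t out[3],
                                      uint32_t base, uint32_t limit)
{
    make_desc386(out, 0u, base, limit);
}

/* Reproduce two entries that appear in the Figure 1 trace. */
static inline void desc386_demo(void)
{
    uint32_t d[3];

    make_desc386(d, 0x9Bu, 0x0000DD30u, 0x0000FFFFu);   /* CS entry */
    assert(d[0] == 0x00009B00u && d[1] == 0x0000DD30u && d[2] == 0x0000FFFFu);

    make_table_desc386(d, 0u, 0x3FFu);                  /* IDT entry */
    assert(d[0] == 0u && d[1] == 0u && d[2] == 0x3FFu);
}
```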
 
    Figure 1 -- In-Circuit-Emulator
    Trace of 80386 LOADALL
    Instruction
    
        | Frame | The FRAME
        number is like a clock count for the CPU. At every CPU
        clock, the ICE takes a picture. When a valid cycle
        occurs, the ICE records its occurrence. Therefore, it is
        possible to determine how many CPU clocks a sequence of
        instructions takes to execute by reading this
        information. |  
        | Type | Cycle type.
        Shown here are F=Fetch, R=Read, and X=eXecute. |  
        | Address | The 32-bit
        physical address asserted on the CPU address bus during
        each cycle. |  
        | Data | The data asserted on the
        CPU data bus during each cycle. |  
        | BE3# BE2# BE1# BE0# | Byte enable pins on the
        CPU. These pins determine which bytes of the 32 bits of
        data are valid. These pins are active low, so 8 bits of
        data are valid for each '0'. |  
        | W/R# | Write/Read. | Write = 1 | Read = 0 |  
        | D/C# | Data/Code. | Data = 1 | Code = 0 |  
        | M/IO# | Memory/IO | Memory = 1 | IO = 0 |  
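
To make the byte-enable rule concrete, the following sketch converts a BE3#..BE0# pattern from the trace into a mask of valid data bits. The function names are illustrative; the pattern is passed as a 4-bit value with BE3# in bit 3 and BE0# in bit 0, so the trace's `1100` is written `0xC`.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of valid data bits for an active-low byte-enable pattern.
   A 0 in bit n of be_n means byte lane n (data bits 8n..8n+7) is valid. */
static inline uint32_t be_valid_mask(unsigned be_n)
{
    uint32_t mask = 0;
    unsigned lane;

    assert(be_n <= 0xFu);   /* only four byte-enable pins exist */
    for (lane = 0; lane < 4; lane++) {
        if (((be_n >> lane) & 1u) == 0u)
            mask |= (uint32_t)0xFFu << (8u * lane);
    }
    return mask;
}

/* Number of valid data bits: 8 for each '0' in the pattern. */
static inline unsigned be_valid_bits(unsigned be_n)
{
    unsigned bits = 0;
    unsigned lane;

    assert(be_n <= 0xFu);
    for (lane = 0; lane < 4; lane++) {
        if (((be_n >> lane) & 1u) == 0u)
            bits += 8u;
    }
    return bits;
}
```

For the segment register reads in the trace, `be_valid_mask(0xC)` gives `0x0000FFFF`, which is why the upper half of those data values is printed as `xxxx`.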
    
        | Frame (Dec) | Type | Address (Hex) | Data (Hex) | BE3# BE2# BE1# BE0# | W/R# D/C# M/IO# | Comments |
        | 5 | F | 0000DE40 | B490070F | 0000 | 001 | LOADALL fetched |
        | 8 | X | executed | 2 bytes | at | DE40L | LOADALL begins execution |
        | 011 | R | 0000D8F0 | 01010101 | 0000 | 011 | The 10 "mystery" reads, exactly 100h bytes beyond the beginning of the LOADALL table (frames 011-029) |
        | 013 | R | 0000D8F4 | 02020202 | 0000 | 011 |  |
        | 015 | R | 0000D8F8 | 03030303 | 0000 | 011 |  |
        | 017 | R | 0000D8FC | 04040404 | 0000 | 011 |  |
        | 019 | R | 0000D900 | 05050505 | 0000 | 011 |  |
        | 021 | R | 0000D904 | 06060606 | 0000 | 011 |  |
        | 023 | R | 0000D908 | 07070707 | 0000 | 011 |  |
        | 025 | R | 0000D90C | 08080808 | 0000 | 011 |  |
        | 027 | R | 0000D910 | 09090909 | 0000 | 011 |  |
        | 029 | R | 0000D914 | 0A0A0A0A | 0000 | 011 |  |
        | 031 | R | 0000D7F0 | 7FFFFFE0 | 0000 | 011 | CR0 |
        | 033 | R | 0000D7F4 | 00000002 | 0000 | 011 | EFLAGS |
        | 035 | R | 0000D7F8 | 00000133 | 0000 | 011 | EIP |
        | 037 | R | 0000D7FC | 66666666 | 0000 | 011 | EDI |
        | 039 | R | 0000D800 | 77777777 | 0000 | 011 | ESI |
        | 041 | R | 0000D804 | 55555555 | 0000 | 011 | EBP |
        | 043 | R | 0000D808 | 88888888 | 0000 | 011 | ESP |
        | 045 | R | 0000D80C | 22222222 | 0000 | 011 | EBX |
        | 047 | R | 0000D810 | 44444444 | 0000 | 011 | EDX |
        | 049 | R | 0000D814 | 33333333 | 0000 | 011 | ECX |
        | 051 | R | 0000D818 | 11111111 | 0000 | 011 | EAX |
        | 053 | R | 0000D81C | FFFF0FF0 | 0000 | 011 | DR6 |
        | 055 | R | 0000D820 | 0000D402 | 0000 | 011 | DR7 |
        | 057 | R | 0000D824 | xxxx0000 | 1100 | 011 | TR Register |
        | 059 | R | 0000D828 | xxxx0000 | 1100 | 011 | LDT Register |
        | 061 | R | 0000D82C | xxxx5555 | 1100 | 011 | GS Register |
        | 063 | R | 0000D830 | xxxx4444 | 1100 | 011 | FS Register |
        | 065 | R | 0000D834 | xxxx2222 | 1100 | 011 | DS Register |
        | 067 | R | 0000D838 | xxxx6666 | 1100 | 011 | SS Register |
        | 069 | R | 0000D83C | xxxx1111 | 1100 | 011 | CS Register |
        | 071 | R | 0000D840 | xxxx3333 | 1100 | 011 | ES Register |
        | 073 | R | 0000D844 | 00008900 | 0000 | 011 | TSS Descriptor Cache |
        | 075 | R | 0000D848 | 00070000 | 0000 | 011 |  |
        | 077 | R | 0000D84C | 00000800 | 0000 | 011 |  |
        | 079 | R | 0000D850 | 00000000 | 0000 | 011 | IDT Descriptor Cache |
        | 081 | R | 0000D854 | 00000000 | 0000 | 011 |  |
        | 083 | R | 0000D858 | 000003FF | 0000 | 011 |  |
        | 085 | R | 0000D85C | 00000000 | 0000 | 011 | GDT Descriptor Cache |
        | 087 | R | 0000D860 | 00000000 | 0000 | 011 |  |
        | 089 | R | 0000D864 | 00000000 | 0000 | 011 |  |
        | 091 | R | 0000D868 | 00008200 | 0000 | 011 | LDT Descriptor Cache |
        | 093 | R | 0000D86C | 00090000 | 0000 | 011 |  |
        | 095 | R | 0000D870 | 00000088 | 0000 | 011 |  |
        | 097 | R | 0000D874 | 00008300 | 0000 | 011 | GS Descriptor Cache |
        | 099 | R | 0000D878 | 00050000 | 0000 | 011 |  |
        | 101 | R | 0000D87C | 0000FFFF | 0000 | 011 |  |
        | 103 | R | 0000D880 | 00009300 | 0000 | 011 | FS Descriptor Cache |
        | 105 | R | 0000D884 | 00040000 | 0000 | 011 |  |
        | 107 | R | 0000D888 | 0000FFFF | 0000 | 011 |  |
        | 109 | R | 0000D88C | 00009300 | 0000 | 011 | DS Descriptor Cache |
        | 111 | R | 0000D890 | 00020000 | 0000 | 011 |  |
        | 113 | R | 0000D894 | 0000FFFF | 0000 | 011 |  |
        | 115 | R | 0000D898 | 00009300 | 0000 | 011 | SS Descriptor Cache |
        | 117 | R | 0000D89C | 00060000 | 0000 | 011 |  |
        | 119 | R | 0000D8A0 | 0000FFFF | 0000 | 011 |  |
        | 121 | R | 0000D8A4 | 00009B00 | 0000 | 011 | CS Descriptor Cache |
        | 123 | R | 0000D8A8 | 0000DD30 | 0000 | 011 |  |
        | 125 | R | 0000D8AC | 0000FFFF | 0000 | 011 |  |
        | 127 | R | 0000D8B0 | 00009300 | 0000 | 011 | ES Descriptor Cache |
        | 129 | R | 0000D8B4 | 00030000 | 0000 | 011 |  |
        | 131 | R | 0000D8B8 | 00FFFFFF | 0000 | 011 |  |
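
The read pattern visible in the trace can be summarized in a few lines of C. This sketch only reproduces the order of addresses observed in Figure 1 (10 dwords at table+100h, then the 51 dwords of the table itself); the function and constant names are assumptions, and nothing here explains what the CPU does with the first ten reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MYSTERY_OFFSET 0x100u   /* bytes beyond the start of the table */
#define MYSTERY_DWORDS 10u      /* the unexplained leading reads */
#define TABLE_DWORDS   51u      /* CCh bytes of LOADALL table */
#define TOTAL_READS    (MYSTERY_DWORDS + TABLE_DWORDS)

/* Fill out[] with the addresses read, in order, for a table located at
   physical address `table`. Returns the number of reads. */
static inline size_t loadall386_read_order(uint32_t table,
                                           uint32_t out[TOTAL_READS])
{
    size_t n = 0;
    uint32_t i;

    for (i = 0; i < MYSTERY_DWORDS; i++)
        out[n++] = table + MYSTERY_OFFSET + 4u * i;
    for (i = 0; i < TABLE_DWORDS; i++)
        out[n++] = table + 4u * i;

    assert(n == TOTAL_READS);
    return n;
}
```

With the table at D7F0h, as in the trace, the first address produced is D8F0h and the last is D8B8h, matching frames 011 and 131.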