New P6 OpCodes: RDPMC

RDPMC - 0F 33 - Read Performance Monitor Counter

Flags:                                      RDPMC opcode encoding
+-+-+-+-+-+-+-+-+-+                       +----------+----------+
|O|D|I|T|S|Z|A|P|C|                       | 00001111 | 00110011 |
+-+-+-+-+-+-+-+-+-+                       +----------+----------+
| | | | | | | | | |                       |    0F    |    33    |
+-+-+-+-+-+-+-+-+-+                       +----------+----------+




    IF (CPL != 0) && (CR4.PCE == 0) {
        #GP(0);
    } ELSE IF (ECX == 0) {
        EDX:EAX = Performance_Counter0;
    } ELSE IF (ECX == 1) {
        EDX:EAX = Performance_Counter1;
    } ELSE #GP(0);
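
The operation above can also be written as a small C model, which may make the two fault conditions easier to see. This is only a sketch: the struct p6_model, the function rdpmc_model, and the -1 return value standing in for #GP(0) are hypothetical names invented for illustration, and the first test is read as "fault when CPL != 0 and CR4.PCE == 0".

```c
#include <stdint.h>

#define CR4_PCE (1u << 8)   /* CR4 bit 8: performance-counter enable */

/* Hypothetical model of the machine state that RDPMC looks at. */
struct p6_model {
    unsigned cpl;          /* current privilege level, 0..3 */
    uint32_t cr4;          /* control register 4 */
    uint64_t counter[2];   /* the two 40-bit performance counters */
    uint32_t ecx, edx, eax;
};

/* Returns 0 on success, -1 where the real instruction would raise #GP(0). */
static int rdpmc_model(struct p6_model *m)
{
    if (m->cpl != 0 && (m->cr4 & CR4_PCE) == 0)
        return -1;                        /* user access not enabled */
    if (m->ecx > 1)
        return -1;                        /* only counters 0 and 1 exist */

    uint64_t value = m->counter[m->ecx] & 0xFFFFFFFFFFULL;  /* keep 40 bits */
    m->edx = (uint32_t)(value >> 32);     /* upper 8 bits of the counter */
    m->eax = (uint32_t)value;             /* lower 32 bits of the counter */
    return 0;
}
```

Note that the model checks all 32 bits of ECX, so any value other than 0 or 1 takes the fault path.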


This is a native instruction which reads the P6 performance monitor counters. Like the Pentium, the P6 contains two 40-bit monitor counters which can be programmed to monitor independent events. As on the Pentium, the performance monitor counters reside in the model-specific register space, but they can now also be read via this new instruction.
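
Since the counters are 40 bits wide and RDPMC returns its result in EDX:EAX, a program usually wants to join the two halves into one value. A minimal sketch in C follows; pmc_combine and PMC_MASK_40 are hypothetical names, and masking to 40 bits is a defensive assumption rather than something the instruction requires of the caller.

```c
#include <stdint.h>

#define PMC_MASK_40 0xFFFFFFFFFFULL   /* low 40 bits set */

/* Join the halves returned in EDX:EAX into a single 40-bit counter value. */
static uint64_t pmc_combine(uint32_t edx, uint32_t eax)
{
    return (((uint64_t)edx << 32) | eax) & PMC_MASK_40;
}
```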

The purpose of providing this instruction is to eliminate the fault which occurs when a user application (one running at CPL 3, for example) attempts to read the performance counters. Without RDPMC, a user program would attempt to read the MSR, which would generate a fault to the operating system; the operating system would then execute the instruction itself and pass the results back to the user program. User access to the RDPMC instruction is not guaranteed. Like RDTSC, user access is controlled by a bit in CR4: CR4.PCE (bit 8) controls whether or not a user program can execute the RDPMC instruction without faulting. The benefit of providing user access is finer granularity of event timing. Suppose a user program were monitoring some events and needed a high degree of accuracy. Without user-level access to this instruction, the overhead required to fault to the operating system would ruin the results.

Like many other features, the P6 performance counters are implementation-specific. Therefore, there is no CPUID feature flag bit indicating the presence of this instruction.


The weird thing about this instruction is that Intel doesn't guarantee that it serializes the instruction stream. The P6's dynamic execution doesn't necessarily execute instructions in order. This could imply that if two RDPMC instructions appeared in close proximity to each other, the second could be executed before the first. If this were allowed to occur, the results obtained would be meaningless, as the second RDPMC result would be less than that of the first. Thank goodness, the P6 will not let this happen. But the moral of the story is that RDPMC does not serialize instructions on the P6. Consider the following code sequence:

        XOR     ECX,ECX                 ; Use 1st performance monitor counter
        RDPMC                           ; Read 1st counter
        MOV     [MEM1],EAX              ; save the performance counter
        MOV     [MEM2],EDX              ;   save upper bits of counter
        ADD     EDX,EAX                 ; do something completely useless
        RDPMC                           ; Read 1st counter again

If the P6 retired the second RDPMC instruction before the first, then the results would be invalid. Moreover, due to the out-of-order execution in the P6, there is no guarantee that all of the instructions between the two RDPMC instructions have completed (retired) at the time of the second RDPMC.

The above example is a derivative of one which Intel provided. In their example, they indicated that the performance counter could be programmed to track the number of instructions retired. If it were programmed for such an operation, then the above example should show that 4 instructions were retired between the two RDPMC instructions. (The Intel documents state that 5 instructions would be completed, but I believe this is in error.) Due to out-of-order execution, the actual number between the two RDPMC instructions could be 5, 6, or even 7 or more; or it could even be negative (though the P6 will not allow this to happen). To obtain reliable results from the counter, Intel recommends putting in a serializing instruction, like CPUID.
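
Once two readings have been taken, the event count between them is their difference. Because the counter is only 40 bits wide, one way to compute that difference is modulo 2^40, so that a single wraparound between the readings still gives a sensible answer. The sketch below assumes both readings have already been combined into 64-bit values; pmc_delta and PMC_MASK_40 are hypothetical names. It does nothing about instruction ordering, so the serialization advice above still applies.

```c
#include <stdint.h>

#define PMC_MASK_40 0xFFFFFFFFFFULL   /* low 40 bits set */

/* Events counted between two readings of the same 40-bit counter.
   The subtraction is reduced modulo 2^40, so one wraparound is tolerated. */
static uint64_t pmc_delta(uint64_t first, uint64_t second)
{
    return (second - first) & PMC_MASK_40;
}
```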

16-bit code:

Even though RDPMC returns values in 32-bit registers (EDX:EAX), it may be executed in 16-bit code, including v86 mode, without an operand size override. Even when executed in 16-bit code segments, RDPMC always uses the full 32 bits of ECX to select which performance counter to return (so make sure none of the high-order bits are set).
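
The pitfall for 16-bit code is that loading only CX leaves whatever was in the upper half of ECX untouched. The following C model illustrates this; load_cx and pmc_index_valid are hypothetical helpers that mimic a 16-bit register load and the P6 counter-index check, and are not part of any real API.

```c
#include <stdint.h>

/* Model of a 16-bit load into CX: only the low 16 bits of ECX change. */
static uint32_t load_cx(uint32_t old_ecx, uint16_t cx)
{
    return (old_ecx & 0xFFFF0000u) | cx;
}

/* Model of the counter-index check: only 0 and 1 are valid on the P6. */
static int pmc_index_valid(uint32_t ecx)
{
    return ecx <= 1;
}
```

With a clean ECX, loading CX with 1 yields a valid index; with stale upper bits (say 0xABCD0000), the same load yields 0xABCD0001, which this model rejects just as the instruction would fault.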

Flags affected:

None.

Exceptions:

#GP(0) as listed above.
