Why does the Linux kernel use a trap gate to handle the divide_error exception?

Question

In kernel 2.6.11.5, the divide-by-zero exception handler is set up as:

set_trap_gate(0,&divide_error);

According to "Understanding The Linux Kernel", an Intel trap gate cannot be accessed by a User Mode process. But it is quite possible for a User Mode process to generate a divide_error as well. So why does Linux implement it this way?

[Edit] I think the question is still open, since set_trap_gate() sets the DPL value of the IDT entry to 0, which means only code running at CPL=0 (read: kernel code) can execute it. So it's unclear to me how this handler can be called from user mode:

#include<stdio.h>

int main(void)
{
    int a = 0;
    int b = 1;

    b = b/a;

    return b;
}

This was compiled with gcc div0.c, and the output of ./a.out is:

Floating point exception (core dumped)

So it doesn't look like this was handled by the division by 0 trap code.

Answer 1:

I have the Linux kernel 3.7.1 sources at hand, so I will try to answer your question based on them. In arch/x86/kernel/traps.c there is a function early_trap_init() that contains the following line:

set_intr_gate(X86_TRAP_DE, &divide_error);

As we can see, set_trap_gate() has been replaced by set_intr_gate(). If we expand this call, we get:

_set_gate(X86_TRAP_DE, GATE_INTERRUPT, &divide_error, 0, 0, __KERNEL_CS);

_set_gate is a routine responsible for two things:

Constructing the IDT descriptor.
Installing the constructed descriptor into the target slot of the IDT descriptor array.

The second step is just a memory copy and is not interesting here. But if we look at how the descriptor is constructed from the supplied parameters, we see:
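```
struct desc_struct {
    unsigned int a;
    unsigned int b;
};

struct desc_struct gate;
unsigned long base = (unsigned long)&divide_error;

/* low dword: segment selector in bits 31..16, handler offset bits 15..0 */
gate.a = (__KERNEL_CS << 16) | (base & 0xffff);
/* high dword: handler offset bits 31..16, then P (0x80), type and DPL (0) */
gate.b = (base & 0xffff0000) | (((0x80 | GATE_INTERRUPT | (0 << 5)) & 0xff) << 8);
```

Or, with GATE_INTERRUPT replaced by its value 0xE:

gate.a = (__KERNEL_CS << 16) | (base & 0xffff);
gate.b = (base & 0xffff0000) | (((0x80 | 0xE | (0 << 5)) & 0xff) << 8);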

As we can see, at the end of the descriptor construction we have the following 8-byte data structure in memory:

[0xXXXXYYYY][0xYYYY8E00], where X denotes the digits of the kernel code segment selector and Y denotes the digits of the address of the divide_error routine.

This 8-byte data structure is a processor-defined interrupt descriptor. The processor uses it to decide what to do when it accepts an interrupt with a particular vector. Let's now look at the format of the interrupt descriptor defined by Intel for the x86 processor family:

                              80386 INTERRUPT GATE
31                23                15                7                0
+-----------------+-----------------+---+---+---------+-----+-----------+
|           OFFSET 31..16           | P |DPL|  TYPE   |0 0 0|(NOT USED) |4
|-----------------------------------+---+---+---------+-----+-----------|
|             SELECTOR              |           OFFSET 15..0            |0
+-----------------+-----------------+-----------------+-----------------+

In this format the SELECTOR:OFFSET pair defines the far address of the function that takes control when the interrupt is accepted. In our case this is __KERNEL_CS:divide_error, where divide_error() is the actual handler of the Division By Zero exception. The P flag specifies that the descriptor is valid and was correctly set up by the OS; in our case it is set. DPL specifies the privilege rings from which the divide_error() function may be triggered by soft interrupts. Some background is needed to understand the role of this field.

In general there are three kinds of interrupt sources:

An external device that requests a service from the OS.
The processor itself, when it finds that it has entered an abnormal state and asks the OS to help it get out of that state.
A program executing on the processor under OS control, which requests some special service from the OS.

The last case has special support from the processor in the form of the dedicated instruction int XX. Each time a program wants an OS service, it sets up parameters that describe the request and issues the int instruction with an operand naming the interrupt vector that the OS uses to provide the service. Interrupts generated by the int instruction are called soft interrupts. The processor takes the DPL field into account only when it handles soft interrupts, and completely ignores it for interrupts generated by the processor itself or by external devices. DPL is a very important feature, because it prevents applications from simulating devices and thereby influencing system behavior.

Imagine, for example, that some application did something like this:

for (;;) {
    /* 0xFF stands here for the vector used by the system timer to notify
       the kernel that another timer tick has occurred */
    __asm__ volatile ("int $0xFF");
}

In that case time on your computer would run much faster than in real life, faster than both you and your system expect, and the system would misbehave badly. As you can see, the processor and external devices are considered trusted, but user mode applications are not. In our case of the Division By Zero exception, Linux specifies that this exception can be triggered by a soft interrupt only from ring 0, in other words only from the kernel. As a result, if the int 0 instruction is executed in kernel space, the processor passes control to the divide_error() routine. If the same instruction is executed in user space, the processor treats it as a protection violation and passes control to the General Protection Fault exception handler (this is the default action for all invalid soft interrupts). But if the Division By Zero exception is generated by the processor itself after an attempt to divide some value by zero, control is switched to the divide_error() routine regardless of the space in which the incorrect division occurred.

In general it looks like it would not do much harm to allow applications to trigger the Division By Zero exception with a soft interrupt. But first, it would be an ugly design, and second, there may be logic behind the scenes that relies on the fact that the Division By Zero exception can be generated only by an actual incorrect division.

The TYPE field specifies the auxiliary actions the processor must take when it accepts the interrupt. In practice only two types of exception descriptors are used: the interrupt descriptor and the trap descriptor. They differ in only one aspect: the interrupt descriptor makes the processor clear the IF flag on entry, which disables further maskable interrupts, and the trap descriptor does not. Honestly, I have no idea why the Linux kernel uses the interrupt descriptor for Division By Zero exception handling. The trap descriptor sounds more reasonable to me.

One last note regarding the confusing output of the test program:

Floating point exception (core dumped)

For historical reasons, the Linux kernel responds to the Division By Zero exception by sending the SIGFPE signal (read: SIGnal Floating Point Exception) to the process that attempted to divide by zero. Yes, not SIGDBZ (read: SIGnal Division By Zero). I know this is confusing. The reason is that Linux mimics the original UNIX behavior (I think this behavior was frozen in POSIX), and the original UNIX for some reason treated "Division By Zero" as a "Floating Point Exception". I don't know why.

Answer 2:

The DPL field in the IDT entry is checked only when a software interrupt is invoked with the int instruction. Division by zero is an exception raised by the CPU itself, so the DPL has no effect in this case.

Answer 3:

User-mode code has no business accessing system tables such as the segment and interrupt descriptor tables. They aren't intended to be manipulated outside of the OS kernel, and there's no need to. Linux handlers for exceptions such as division by zero, general protection exception, page fault and others intercept exceptions originating from both user-mode and kernel-mode code. They may handle them differently based on the origin, but the interrupt descriptor table contains the address of just one handler for every kind of exception (e.g. the above). And every handler knows how to handle its exception.

Answer 4:

The kernel does not run in user mode. It has to handle traps generated by user mode programs (e.g. Linux processes in user-land). Kernel code is not expected to divide by zero.

I don't quite understand your question. How would you implement it otherwise?

Answer 5:

The answer to part of your question can be found in Section 6.12.1.1 of the "Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 3A":

The processor checks the DPL of the interrupt or trap gate only if an exception or interrupt is generated with an INT n, INT 3, or INTO instruction. Here, the CPL must be less than or equal to the DPL of the gate. This restriction prevents application programs or procedures running at privilege level 3 from using a software interrupt to access critical exception handlers, such as the page-fault handler, providing that those handlers are placed in more privileged code segments (numerically lower privilege level). For hardware-generated interrupts and processor-detected exceptions, the processor ignores the DPL of interrupt and trap gates.

This is what Alex Kreimer answered.

Regarding the message: I'm not totally sure, but it seems that the OS sends the SIGFPE signal to the process.

Source: https://stackoverflow.com/questions/8530794/why-linux-kernel-use-trap-gate-to-handle-divide-error-exception

Tags

Linux

x86

kernel

interrupted-exception