Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I was reading a blog on 64-bit Firefox edition on hacks.mozilla.org.

The author states:

For asm.js code, the increased address space also lets us use hardware memory protection to safely remove bounds checks from asm.js heap accesses. The gains are pretty dramatic: 8%-17% on the asmjs-apps-*-throughput tests as reported on arewefastyet.com.

I was trying to understand how 64-bit hardware have automatic bounds check (assuming compiler does with hardware support) for C/C++. I could not find any answers in SO. I found one technical paper on this subject, but I am not able to grasp how this is done.

Can someone explain 64-bit hardware aids in bounds check?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
137 views
Welcome To Ask or Share your Answers For Others

1 Answer

Most modern CPUs implement virtual addressing/virtual memory - when a program references a particular address, that address is virtual; the mapping to a physical page, if any, is implemented by the CPU's MMU (memory management unit). The CPU translates every virtual address to a physical address by looking it up in the page table the OS set up for the current process. These lookups are cached by the TLB, so most of the time there's no extra delay. (In some non-x86 CPU designs, TLB misses are handled in software by the OS.)

So my program accesses address 0x8050, which is in virtual page 8 (assuming the standard 4096 byte (0x1000) page size). The CPU sees that virtual page 8 is mapped to physical page 200, and so performs a read at physical address 200 * 4096 + 0x50 == 0xC8050.

What happens when the CPU does not have a TLB mapping for that virtual address? Such a thing occurs frequently because the TLB is of limited size. The answer is that the CPU generates a page fault, which is handled by the OS.

Several outcomes can occur as a result of a page fault:

  • One, the OS can say "oh, well it just wasn't in the TLB because I couldn't fit it". The OS evicts an entry from the TLB and stuffs in the new entry using the process's page table map, and then lets the process keep running. This happens thousands of times per second on moderately loaded machines. (On CPUs with hardware TLB miss handling, like x86, this case is handled in hardware, and is not even a "minor" page fault.)
  • Two, the OS can say "oh, well that virtual page isn't mapped right now because the physical page it was using was swapped to disk because I ran out of memory". The OS suspends the process, finds some memory to use (perhaps by swapping out some other virtual mapping), queues a disk read for the requested physical memory, and when the disk read completes, resumes the process with the freshly filled page table mapping. (This is a "major" page fault.)
  • Three, the process is trying to access memory for which no mapping exists - it's reading memory it shouldn't be. This is commonly called a segmentation fault.

The relevant case is number 3. When a segfault happens, the default behavior of the operating system is to abort the process and do things like write out a core file. However, a process is allowed to trap its own segfaults and attempt to handle them, perhaps even without stopping. This is where things get interesting.

We can use this to our advantage to perform 'hardware accelerated' index checks, but there are a few more stumbling blocks we hit trying to do so.

First, the general idea: for every array, we put it in its own virtual memory region, with all of the pages that contain the array data being mapped as usual. On either side of the real array data, we create virtual page mappings that are unreadable and unwritable. If you attempt to read outside of the array, you'll generate a page fault. The compiler inserts its own page fault handler when it made the program, and it handles the page fault, turning it into an index-out-of-bounds exception.

Stumbling block number one is that we can only mark whole pages as being readable or not. Array sizes may not be an even multiple of a page size, so we have a problem - we can't put fences exactly before and after the end of the array. The best we can do is leave a small gap either before the beginning of the array or after the end of the array between the array and the nearest 'fence' page.

How do they get around this? Well, in Java's case, it's not easy to compile code that performs negative indexing; and if it does, it doesn't matter anyway because the negative index is treated like it's unsigned, which puts the index far ahead of the beginning of the array, which means that it's very likely to hit unmapped memory and will cause a fault anyway.

So what they do is to align the array so that the end of the array butts up right against the end of a page, like so ('-' means unmapped, '+' means mapped):

-----------++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
|  Page 1  |  Page 2  |  Page 3  |  Page 4  |  Page 5  |  Page 6  |  Page 7  | ...
                 |----------------array---------------------------|

Now, if the index is past end of the array, it'll hit page 7, which is unmapped, which will cause a page fault, which will turn into an index out of bounds exception. If the index is before the beginning of the array (that is, it's negative), then because it's treated as an unsigned value, it'll become very large and positive, putting us far past page 7 again causing an unmapped memory read, causing a page fault, which will again turn into an index out of bounds exception.

Stumbling block number 2 is that we really should leave a lot of unmapped virtual memory past the end of the array before we map the next object, otherwise, if an index was out of bounds, but far, far, far out of bounds, it might hit a valid page and not cause an index-out-of-bounds exception, and instead would read or write arbitrary memory.

In order to solve this, we just use huge amounts of virtual memory - we put each array into its own 4 GiB region of memory, of which only the first N few pages are actually mapped. We can do this because we're just using address space here, not actual physical memory. A 64 bit process has ~4 billion chunks of 4 GiB regions of memory, so we have plenty of address space to work with before we run out. On a 32-bit CPU or process, we have very little address space to play around with, so this technique isn't very feasible. As it is, many 32-bit programs today are running out of virtual address space just trying to access real memory, nevermind trying to map empty 'fence' pages in that space to try to use as 'hardware accelerated' index range checks.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...