The Curse of External Fragmentation: Relocate or Bust!

 
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
 
 

External fragmentation arises when free memory is separated into small blocks and is interspersed by allocated memory.

— Wikipedia —

Very often we can see that memory which is obtained by an application is not returned back when demand for the memory decreases. In particular, in [NoBugs11] it was observed that modern browsers suffer from this greatly – and a somewhat-educated guess was made that the culprit is heap fragmentation (more precisely, so-called “external fragmentation”).

Let’s research the phenomenon of external fragmentation in a bit more detail.

Who Is To Blame?

Who is to Blame? (Russian: Кто виноват?)

— novel by Alexander Herzen, 1847 —

Let’s consider a model where we have an allocator built on top of CPU virtual pages, with N allocation items (=”memory fragments returned by malloc()”) per page. Let’s assume that (a) our allocator cannot relocate already-allocated items (implementing relocation would require support from the app level, at least in C/C++), and (b) that it can free (=”return back to the OS”) any whole page as soon as there are no live items left in it.

As for the number N, we can observe that:

  • for modern CPUs, typical CPU page size ranges from 4K to 64K (with 2M “huge” pages also possible).
    • in practice, even if 4K pages are used, we’ll most likely have to allocate multiple virtual pages at once, to avoid creating too many mappings (under Linux, exceeding the mapping limit manifests as an ENOMEM error returned by mmap() and, rather counterintuitively, by munmap()). This, in turn, makes it impossible to free individual pages, with typical sizes of chunks-requested-from-OS being in the range of megabytes (sic!). We’ll ignore this phenomenon for the time being – mostly because even without it the picture is pretty much hopeless 🙁 .
  • for a typical app, the average allocation size can easily be in the range of 32–128 bytes.
  • which means that even for 4K chunks-requested-from-OS, we’re speaking about 32 (=4096 bytes/128 bytes) to 128 (=4096 bytes/32 bytes) items per chunk requested from the OS.

Now, let’s consider the following use pattern:

  • first, we’re allocating lots of allocation items – fully filling M CPU virtual pages
  • then, we’re gradually and randomly deallocating our allocation items.

While this model is not perfect (no model is), we feel that it describes typical use cases (including (i) server load with daily load variations, and (ii) client-side work including closing several browser windows) reasonably well.

Now, we have run a simulation of the pattern above, to see how many items we need to free to get some of the pages returned back to the OS. We have to say that the results are quite disappointing (that is, if we were hoping to get any virtual memory back):

External fragmentation: freeing allocation items doesn't free virtual pages

As we can see, even though we’re ignoring the need to allocate chunks larger than 4K, we have to free 80% of the items to get any pages back at all – and to get 50% of the pages back, we have to free 98% of the items. For real-world apps, this translates into:

We won’t ever get (almost) any virtual memory back, plain and simple

If we think about it, this result is to be expected: without relocation of our allocated items, even one single live item per page will force us to keep that page – and as we have 32–128 items per page, the chances of freeing a page when only a fraction of the items has been freed are minuscule. This is also consistent with other observations, in particular with those made in [NoBugs11].

What Is To Be Done?

What Is To Be Done? (Russian: Что де́лать?)

— novel by Nikolai Chernyshevsky, 1863 —

We know of two approaches which can help to mitigate this external fragmentation problem.

The first one is an app-level mitigation, which is widely used by modern browsers such as Chrome and Edge. The very general idea is to split the monolithic app state into several separate states, so that each separate state, when the time comes, can be dropped as a whole without causing additional fragmentation. In the browser world, this is usually achieved by running multiple processes (each process, of course, having its own heap), but strictly speaking, this idea is not limited to separate processes. In particular, it will work well with multiple (Re)Actors running within the same process (as long as each (Re)Actor has its own heap). One inherent limitation of this approach (whether the heap is per-process or per-(Re)Actor) is that if a state (process/(Re)Actor) is intended to live “forever and ever”, it will still suffer from external fragmentation.

The second approach is to design our allocator in a way which allows relocating items. This approach also requires support from the app level, but it is significantly more generic: in particular, (a) it allows dealing with external fragmentation without dropping the whole heap, which happens to be very important for apps with long-lasting states; and (b) at least in theory, relocation allows not only reducing fragmentation, but eliminating it entirely. How to implement it in the C/C++ world is a different story which we won’t discuss right now; all we need to know at the moment is that it is possible without too much trouble, at least for (Re)Actor-based programs. One such approach was discussed in [NoBugs17], and we’re working on another (significantly simplified) approach right now.

Summary

To summarize the whole thing in a few sentences:

For a more-or-less generic allocator, it usually doesn’t make much sense to spend CPU cycles on maintaining data structures which allow freeing OS pages; most of the time it will hurt performance without any realistic benefit. OTOH, allocators + memory models which allow for relocatable allocated items CAN provide very serious benefits for certain classes of apps.

Not that this is something new (quite a few modern allocators already go along the lines of the first part) – but we feel that it makes sense to articulate this observation more explicitly, so that less time is spent on such unnecessary efforts in the future.

Or we can rephrase the whole thing even more shortly:

Relocate or Bust!

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

 
