Memory Fragmentation, your worst nightmare

We first know when someone has a memory fragmentation problem when we receive an email that goes something like this:

Hi, we’ve been using Memory Validator for a year or two now. It’s really helped a ton. But now we’re stuck. We can see in Task Manager that our program is using more and more memory (between 32MB and 128MB every 10 minutes) but when I exit the program Memory Validator is reporting no leaks. How can that be? Is there a problem with MV?

At this point we ask a few questions and make sure it’s not just a problem with holding onto data until the last minute before freeing it which is causing these reports. Once we’ve ruled that out, we mention the dreaded F word. Fragmentation.

Over the last 11 years we’ve written many emails to people explaining about memory fragmentation and what you can do to mitigate it. I thought we should put this information out there for you all to benefit from. As usual if you have anything to add, any extra techniques or insights, or constructive criticism, please add a comment or email support. Thanks, Stephen.

Test application

Because memory fragmentation takes various forms, ranging from very subtle to downright blatant in-your-face fail, I’ve created a test application you can use to generate memory fragmentation on demand. You can then play with this application and the suggested tools and techniques to understand fragmentation analysis. You can also analyse the code for the test application to see just how easy it is to create memory fragmentation by using careless allocation strategies.

Learn more about mvFragmentation memory fragmentation test application.

What is memory fragmentation?

Memory fragmentation is when the sum of the available space in a memory heap is large enough to satisfy a memory allocation request but the size of any individual fragment (or run of contiguous fragments) is too small to satisfy that memory allocation request. This probably sounds confusing, so I will illustrate how this situation can arise.

To simplify the explanation let us consider a simple computer system that can only allocate 8 chunks of memory, each 1KB in size.

At program start the program hasn’t allocated any memory. The memory landscape looks like this:

Memory fragmentation example no memory

The application allocates 1KB, 4KB, 2KB. The memory landscape looks like this:

Memory fragmentation example 3 allocations

The program then deallocates 4KB, 1KB. The memory landscape now looks like this:

Memory fragmentation example 3 allocations, 2 deallocations

The program needs to do some other work before repeating its task. It allocates 1KB to store some data during the next task. The memory landscape now looks like this:

Memory fragmentation example 3 allocations, 2 deallocations, 1 allocation

Now the first task needs to repeat. The application wants to allocate the 1KB, 4KB workspace that it used last time around (and deallocated at the end of the task). The program allocates 1KB. The memory landscape looks like this:

Memory fragmentation example, memory fragmented

Now the program wants to allocate 4KB. But although there is 4KB free space, it is not contiguous. The memory is fragmented.

This is a simplified example. Because I’ve used power of two memory allocation sizes this exact situation probably doesn’t arise in real life, because many memory allocators use power of two sizing to place different allocations in different memory bins. But in terms of demonstrating what causes memory fragmentation, this example demonstrates it perfectly.
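The sequence above can be replayed in code. The following is a minimal sketch, assuming a toy first-fit allocator (always take the lowest-addressed run of free slots); `ToyHeap` and `replayExample` are names invented for this illustration and are not part of the mvFragmentation test application or of any real allocator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy heap of 8 slots, each representing 1KB. A slot holding 0 is free;
// any other value is the id of the allocation that owns the slot.
struct ToyHeap
{
    std::array<int, 8> slots{};
    int nextId = 1;

    // First-fit (an assumption of this sketch): use the lowest-addressed
    // run of `size` free slots. Returns the allocation id, or 0 if no
    // contiguous run is large enough.
    int allocate(std::size_t size)
    {
        assert(size > 0);
        std::size_t run = 0;
        for (std::size_t i = 0; i < slots.size(); ++i)
        {
            run = (slots[i] == 0) ? run + 1 : 0;
            if (run == size)
            {
                const int id = nextId++;
                for (std::size_t j = i + 1 - size; j <= i; ++j)
                    slots[j] = id;
                return id;
            }
        }
        return 0;
    }

    // Free every slot owned by allocation `id`.
    void release(int id)
    {
        for (int &slot : slots)
            if (slot == id)
                slot = 0;
    }

    std::size_t totalFree() const
    {
        std::size_t count = 0;
        for (int slot : slots)
            if (slot == 0)
                ++count;
        return count;
    }
};

// Replays the allocation sequence from the text. Returns true if the
// final 4KB request fails even though 4KB is free in total.
bool replayExample()
{
    ToyHeap heap;
    const int a = heap.allocate(1);   // 1KB
    const int b = heap.allocate(4);   // 4KB
    heap.allocate(2);                 // 2KB, stays allocated
    heap.release(b);
    heap.release(a);
    heap.allocate(1);                 // 1KB kept for the next task
    heap.allocate(1);                 // the task repeats: 1KB
    const bool failed = (heap.allocate(4) == 0);
    return failed && heap.totalFree() == 4;
}
```

With first-fit placement the two 1KB blocks land in the first two slots, which leaves one free run of 3KB and another of 1KB. A different placement policy would give a different layout, which is part of why fragmentation depends on the allocator as well as on the application.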

What causes fragmentation?

The previous section provided an overview of what memory fragmentation is. Now we investigate some of the causes of memory fragmentation.

Memory alignment

Many memory allocators return memory blocks aligned on specific memory boundaries, for example the width of a pointer on the computer architecture. On a 32 bit chip that alignment would be 4 bytes; on a 64 bit chip it would be 8 bytes.

Some allocators allow the allocation requestor to specify the alignment of the allocation being made.

Whatever the reason for the alignment it stands to reason that to satisfy that alignment sometimes there will be unused (or wasted) space immediately prior to the base address of the memory allocation. This wasted space will be after an earlier memory allocation. The space will often be too small for the memory allocator to use to satisfy a memory request.

Memory alignment example
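The arithmetic behind that wasted space is small enough to show directly. This is an illustrative sketch only: `alignUp` and `alignmentPadding` are hypothetical helper names, and the sketch assumes the alignment is a power of two, which is the usual case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round `address` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
std::uintptr_t alignUp(std::uintptr_t address, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return (address + mask) & ~mask;
}

// Number of bytes left unused between the end of the previous allocation
// and the aligned start of the next one.
std::size_t alignmentPadding(std::uintptr_t previousEnd, std::size_t alignment)
{
    return static_cast<std::size_t>(alignUp(previousEnd, alignment) - previousEnd);
}
```

For example, if the previous allocation ends at address 0x1003 and the next block must be 8 byte aligned, the block starts at 0x1008 and the 5 bytes in between are padding.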

Heap workspace

For the memory heap to manage itself, the heap must also use memory. Some heaps keep this management data in a separate space from the heap itself. An example of this is the release mode Microsoft C Runtime heap. Other heaps use the heap memory itself to hold the management data. An example of this is the debug mode Microsoft C Runtime heap. When the heap management is done in the heap, each allocation carries an overhead: the memory required to satisfy the allocation plus the memory required to manage it. This overhead increases the size of the allocation and may result in the total size required not matching the allocator’s perfect allocation size, resulting in wasted space.

Heap guard space

Related to heap workspace is heap guard space. This is typically found in debug mode heaps. The guard space is a block of a few bytes before the block and after the block. The guard space is filled with a known value that can be checked at any time to see if it has been modified. This is typically used to detect buffer overruns. The guard space increases the size of the allocation required and may result in the total size required not matching the allocator’s perfect allocation size, resulting in wasted space.

Memory allocator strategies

Each memory allocator has its own strategy for deciding how to allocate memory and provide it to the software calling the allocator. Each allocator strategy will be optimal for some types of usage and less useful or even dangerous in other circumstances. A common strategy is to allocate memory in bins, each bin being of a particular size and each size being twice that of the previous bin.

Memory allocator power of 2 bins

Allocations that fit into a particular bin but which cannot be served by a smaller bin are served by that bin. Bins may be created ahead of time or on a just in time basis. If a bin does not exist to satisfy a request a new bin will be created. This new bin will be twice the size of the previous bin. Repeat until you get to the bin size you need. This has been shown to be quite a useful strategy. The problem is that this strategy can allocate from bins that are much larger than is required to satisfy an allocation, thus leaving large chunks of memory unused.
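To put numbers on that waste, here is a small sketch of the bin selection rule. It assumes the smallest bin is 8 bytes, which is an arbitrary choice for this illustration; real allocators choose their own minimum and often use more refined size classes. `binSizeFor` and `wastedBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Size of the smallest power of two bin that can hold `request` bytes.
// The 8 byte minimum bin is an assumption of this sketch. Requests close
// to the maximum value of size_t are not handled.
std::size_t binSizeFor(std::size_t request)
{
    assert(request > 0);
    std::size_t bin = 8;
    while (bin < request)
        bin *= 2;
    return bin;
}

// Bytes left unused inside the bin once the request has been placed.
std::size_t wastedBytes(std::size_t request)
{
    return binSizeFor(request) - request;
}
```

A request for 64 bytes fits its bin exactly, but a request for 65 bytes is served from the 128 byte bin and leaves 63 bytes unused. Just above each power of two, almost half of the bin is wasted.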

Heap allocation usage pattern

Even though you may be using an allocator with a great pedigree and superb performance, at the end of the day the allocator can only do so much when presented with a particular sequence of allocations, reallocations and deallocations by an application. The behaviour of your software and the memory allocation characteristics it exhibits can contribute greatly to the lack of memory fragmentation your application experiences or it can cause the very memory fragmentation problems that are causing you to lose your hair.

As such although most people don’t pay much attention to memory fragmentation because you often do not need to, you need to be aware of what memory fragmentation is, what causes it and how to mitigate memory fragmentation on the (hopefully) few occasions you encounter it.

Memory leaks

Although at the start of this article I said we try to rule out memory leaks as a possible cause you cannot discount them. The cause of the fragmentation could be something as simple as failing to deallocate a small allocation that if deallocated would allow a much larger allocation to be allocated at the same address each time through a computation loop.

An example of this would be a process that each time through its loop (say serving a web page) allocates 1MB to do some work, then 1 byte, then deallocates the 1MB but does not deallocate the 1 byte. Each 1 byte allocation will be locking up a region of memory larger than 1 byte (the bin size from which that 1 byte was allocated). Over time this will eventually turn into a significant memory leak, but before you get to that point there will be a lot of memory fragmentation caused by the wasted memory associated with each 1 byte leak.

Because of this you must always consider memory leaks before you start thinking about changing your memory allocator or using private/custom heaps. Fixing any memory leak is:

  • A smart thing to do.
  • A lot cheaper than changing your code to accommodate alternate memory allocation strategies (see later for details).

Memory allocation lifetime

Memory allocation lifetime also plays a part in memory fragmentation. Objects with short lifetimes only occupy space in the heap for a short period of time. As such their effect on fragmentation is minimal. But objects that live for a long period of time (or forever in the case of leaked memory) prevent the free memory around them from forming a larger contiguous free region that could satisfy a memory request.

One solution to this problem is, where possible, to allocate all long lived objects in their own heap so that they do not affect the fragmentation of other heaps. If you can’t do this, try to allocate the long lived objects before any other objects so that they (hopefully) get allocated at one end of the heap or the other (implementation dependent).

Is fragmentation affected by the amount of memory in my computer?

The amount of memory in your computer will not affect whether you suffer memory fragmentation. Memory fragmentation is caused by a combination of the allocation strategy used by the allocator you are using, the sizes and alignments of the internal structures, and the memory allocation behaviour of your software application.

That said the more memory you have the longer it will be before you feel the effects of memory fragmentation. That isn’t necessarily a good thing. The sooner you know about it the sooner you can fix it.

Conversely if you don’t have a lot of memory you may not experience memory fragmentation because your program doesn’t have enough workspace to get into a situation where memory fragmentation is an issue. Given the memory that most modern PCs have these days I doubt this situation will be facing you.

Does fragmentation affect all computer programs?

Memory fragmentation affects all computer programs that use a dynamic memory allocator that does not use garbage collection (or similar mechanisms) to remove memory fragmentation by compacting the memory heap.

It is important to note that some garbage collected allocators have a Large Object Heap which is used to handle large memory allocations. Examples of this are the Microsoft .Net Runtime and Java. These Large Object Heaps are typically not compacted. As a result even these garbage collected heaps can suffer memory fragmentation, but only for large objects. What constitutes “large” is implementation dependent. For .Net, “large” means 85,000 bytes or more.

Systems that do not use dynamic memory allocators do not suffer from memory fragmentation. Examples of these are many small embedded systems. Although an embedded system in the late 1980s was an 8 bit 6801 with 64KB of RAM programmed in assembler, whereas now it’s 32 bit ARM with 256MB RAM and a C compiler. So today, it’s quite possible your embedded system is at risk from memory fragmentation whereas the devices I worked on 25 years ago were not at risk.

When is fragmentation more likely to be a problem?

Fragmentation is more likely to be a problem when your application makes a series of allocations and deallocations such that each time an allocation is made it cannot re-use space that was left by a previous deallocation of a similar (or larger) size block.

Or put another way if you have a large range of widely differing memory sizes in your program’s memory allocation behaviour you probably stand a higher chance of suffering from memory fragmentation than if all your memory allocations are of similar sizes.

How can I detect if my program is suffering from fragmentation?

There are tell tale signs that your program may be suffering from fragmentation:
  • One sign of memory fragmentation is that your program may start to run a lot slower. This is because the allocator has to spend more time searching for a suitable place to put each memory allocation. You’ll notice this for applications that have a very subtle form of memory fragmentation which only wastes small amounts of memory for each fragment.
  • Another sign of memory fragmentation is that some memory allocations fail but most memory allocations succeed. Yet when you examine the amount of memory used by your program there always seems to be enough memory to satisfy even the memory allocation calls that failed. This is the type of fragmentation that wastes large amounts of memory and prevents large allocations from happening. These programs tend to run at full speed then just fail to allocate memory. Much less subtle, but the problem is easier to identify.
  • If you are using a custom heap and have access to some heap diagnostics then you can perform the following calculation to determine if that heap is fragmented. Find the largest free block size in the heap (not a block that is in use). Find the total free space in the heap. If the largest free block size is small compared to the total free size then you probably have a fragmentation problem. What defines “small”? Well that is for you to decide based on your understanding of the application you are working on. No absolute values. Sorry.
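The calculation in the last point can be sketched as follows. This assumes your heap diagnostics can give you the size of each free block; how you obtain that list is specific to your heap and is not shown. The name `fragmentationRatio` and the choice of expressing the result as one minus largest/total are mine, not a standard metric.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns a value in the range [0, 1): 0.0 when all free space is in one
// block (or there is no free space), approaching 1.0 as the free space is
// scattered over many small blocks.
double fragmentationRatio(const std::vector<std::size_t> &freeBlockSizes)
{
    if (freeBlockSizes.empty())
        return 0.0;

    const std::size_t total = std::accumulate(freeBlockSizes.begin(),
                                              freeBlockSizes.end(),
                                              std::size_t{0});
    if (total == 0)
        return 0.0;

    const std::size_t largest = *std::max_element(freeBlockSizes.begin(),
                                                  freeBlockSizes.end());
    assert(largest <= total);

    return 1.0 - static_cast<double>(largest) / static_cast<double>(total);
}
```

Four free blocks of 1KB each give 0.75: there is 4KB free but the largest request that can succeed is 1KB. What value counts as a problem is still for you to decide for your application.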

There are several methods you can use to detect memory fragmentation. These all involve the use of free tools and/or commercial tools. Firstly we need to establish that the software does not suffer from any memory leaks and also does not suffer from any resource (handle) leaks. You can do this with your favourite memory leak tool, for example C++ Memory Validator.

Once you know there are no leaks occurring when you run the software we can turn our attention to the memory allocation behaviour of the software. We can inspect this using various tools. These are listed in the order they were created.

  • Task Manager

    Task Manager can be used for identifying trends in memory usage. Both the graphical display and the various memory counters can help you.

    There are various memory related counters:

    • Memory Working Set
    • Memory Peak Working Set
    • Memory Working Set Delta
    • Memory Private Working Set
    • Memory Commit size
    • Memory Paged Pool
    • Memory Non-paged Pool

    The counters you are interested in are Private Working Set and Memory Commit Size.

    The other counters may be increasing, decreasing, but they are not relevant. We are concerned about ever increasing application memory use. As such we want to know the private amount of memory in use – memory that is not shared with other applications. The commit size also shows you the amount of memory actually in use as opposed to reserved for possible use. Another counter that also reflects the total memory size of the process is Virtual Memory Size (VM Size).

    If these values continue to increase but your memory leak tool shows that you have no memory leaks then your application is almost certainly suffering from memory fragmentation.

    Task Manager viewing memory values

  • C++ VM Validator virtual view

    C++ VM Validator is a free software tool for visualising virtual memory. We wrote this tool over 12 years ago so that it was easy to visualize memory fragmentation problems that would cause memory allocation failures when allocating large blocks of memory. Using the virtual view you can watch your application’s memory usage. This is particularly useful when you are watching what happens when you load a large image (satellite photo), do some work, unload it, do some work then load another image. If you are suffering fragmentation you can see the image doesn’t reload in the same place each time. VM Validator provides a view on to the paging behaviour of your software (for some versions of Windows) and the following three views which will be useful for investigating memory fragmentation.

    • A graphical view of virtual memory.
    • A breakdown of memory pages by memory region.
    • A breakdown of memory paragraphs by memory region. Memory Paragraphs are the minimum size (64KB) allocated by VirtualAlloc().

    VM Memory Validator graphical view of memory pages
    VM Memory Validator view of memory pages
    VM Memory Validator graphical view of memory paragraphs

  • C++ Memory Validator virtual view

    C++ Memory Validator is our C++ memory leak detection tool. C++ Memory Validator also has a similar view to the C++ VM Validator tool. This view is the virtual view and shows:

    • A graphical view of virtual memory.
    • A breakdown of memory pages by memory region.
    • A breakdown of memory paragraphs by memory region. Memory Paragraphs are the minimum size (64KB) allocated by VirtualAlloc().
    • A visualization of the memory collected by Memory Validator so that you can see the memory gaps (or sandbars) between each object that is currently active – this is the Pages view (not the Pages subtab on the Virtual view).

    Using the virtual view you can watch your application’s memory usage. This is particularly useful when you are watching what happens when you load a large image (satellite photo), do some work, unload it, do some work then load another image. If you are suffering fragmentation you can see the image doesn’t reload in the same place each time.

    C++ Memory Validator graphical view of memory pages
    C++ Memory Validator view of memory pages
    C++ Memory Validator graphical view of memory paragraphs
    C++ Memory Validator pages view

  • Process Explorer

    Process Explorer from SysInternals can also be used to monitor memory. If you go to the View menu then choose Select Columns… then go to the Process Memory tab you can select which values you want to view. Selecting Virtual Size allows you to see the total size of your application’s virtual memory. You can save these values for later use by going to the View menu then choosing Save Column Sets….

    ProcessExplorer displaying memory data

    Double clicking a graph will display the resource monitor so that you can inspect the data more clearly.

    ProcessExplorer displaying detailed memory data

  • VMMap

    VMMap is another SysInternals tool that shows you the virtual memory map of your application. This is similar to C++ VM Validator but very different in appearance. You can use it in a similar way to how we describe above.

    VMMap displaying memory data

    VMMap also has a “fragmentation view” which you can access from the View menu. This is similar to the VM Validator Virtual view.

How do I prevent fragmentation?

It’s almost impossible to prevent memory fragmentation before you see it because the fragmentation is a function of your application’s behaviour. However once you’ve ruled out memory and resource leaks and established that memory fragmentation is the problem then there are various tactics and strategies you can use to mitigate the memory fragmentation.

Premature optimisation

You’re no doubt familiar with the phrase that the worst type of performance optimisation is premature optimisation. This is also true of memory fragmentation. Do not try to guess ahead of time which parts of your program will cause fragmentation and which parts won’t. You almost certainly won’t get it right. This will mean wasted effort on custom heaps for areas that don’t need them, and most likely a more complex implementation than required. Much better to write your software then observe its behaviour and address the behaviour you find, if you need to.

Different approaches

There are a variety of different approaches that can be taken to mitigate memory fragmentation. You can use each approach on its own or in conjunction with other approaches listed here. None of these approaches are mutually exclusive.

Use the Windows Low Fragmentation Heap

The Windows Low Fragmentation Heap (LFH) was introduced with Windows XP. It was also back ported to Windows 2000 SP4 although I doubt many of you reading this will still be working on Windows 2000, although many of you are still working on Windows XP (after all, your customers still are!).

The LFH can be enabled or disabled using HeapSetInformation.

Note that you cannot enable the LFH for heaps that have the HEAP_NO_SERIALIZE flag set.

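#define HEAP_LFH 2

// Function pointer type matching HeapSetInformation(), which is looked up
// at runtime with GetProcAddress().
typedef BOOL (WINAPI *HeapSetInformationProc)(HANDLE, HEAP_INFORMATION_CLASS, PVOID, SIZE_T);

HINSTANCE	hKernel;

hKernel = GetModuleHandle(_T("kernel32.dll"));	// kernel32 is always loaded, so can just lookup
if (hKernel != NULL)
{
	HeapSetInformationProc	hsip;

	hsip = (HeapSetInformationProc)GetProcAddress(hKernel, "HeapSetInformation");
	if (hsip != NULL)
	{
		ULONG	enable = HEAP_LFH;
		BOOL	b;

		// hHeap is the handle of the heap that should use the LFH
		b = (*hsip)(hHeap, HeapCompatibilityInformation,
			    &enable, sizeof(enable));

		// add error checking here

		...
	}
}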

Replacement heap manager

Probably the easiest and simplest approach to take is to try swapping out the memory manager for a different memory manager. There are commercial and open-source heap managers available.

Commercial:

  • Cherrystone’s Extensible Scalable Allocator (ESA). Sorry couldn’t get a useful link for CherryStone.
  • MicroQuill’s SmartHeap.

Free:

I’m not saying that you should try one of these allocators. I have no idea how simple or how complex it is to replace your allocator with another allocator. But if it is simple to replace, then trying another allocator to see whether it handles your application’s memory allocation behaviour well enough to solve your memory fragmentation problems may be a good, effective use of your time.

Custom heap manager

You could try writing your own heap manager to reduce memory fragmentation. But I don’t recommend it. This is a non-trivial task (even if it seems trivial at first glance) if you want to have good CPU performance, good memory performance, good robustness and a good allocation strategy. There are companies whose entire business model is providing high performance heap managers. If a business can be built on this you can bet it’s not a trivial job.

That said if you can find a special edge case (as we have, see Linear Heap below) then writing your own custom heap manager can be very helpful.

Allocate objects in specific heaps

Rather than just use malloc, new etc. to allocate in the C runtime heap you could choose to do all allocations for specific objects in a specific heap created by using HeapCreate(). This is useful because it forces all allocations of a specific size and type into one heap. Thus the allocation behaviour that was causing fragmentation in one heap is now split among many heaps and may not cause fragmentation when split like that.

char *ptr;

// len must include room for the terminating NUL, e.g. strlen(data) + 1
ptr = (char *)HeapAlloc(hStringHeap, 0, len);
if (ptr != NULL)
{
    strcpy(ptr, data);

    ...
}

Then when you are at a suitable point where you can destroy the heap you can do that, and then re-create the heap effectively setting fragmentation for that heap to zero.

Override operator new / operator delete

This is a variation of the previous topic. You override operator new and operator delete to place different object types in different heaps. There are many ways you can set this up. This is a simple example where you set the heap for the class using a static function. Derive all other classes for this heap from this base class.

class myObject
{
public:
	myObject();

	virtual ~myObject();

	void *operator new(size_t	nSize);

	void operator delete(void	*ptr);

	static void setHeap(HANDLE	h);

private:
	static HANDLE hHeap;
};

HANDLE myObject::hHeap = 0;

myObject::myObject()
{
}

myObject::~myObject()
{
}

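void *myObject::operator new (size_t	size)
{
	// setHeap() must have been called before the first allocation
	void	*ptr = HeapAlloc(hHeap, 0, size);

	if (ptr == NULL)
		throw std::bad_alloc();	// needs #include <new>; this form of operator new must not return NULL

	return ptr;
}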

void myObject::operator delete (void	*ptr)
{
	if (ptr != NULL)
	{
		HeapFree(hHeap, 0, ptr);
	}
}

void myObject::setHeap(HANDLE	h)
{
	hHeap = h;
}

Reduce the number of allocations and deallocations

If you can reduce the number of memory allocations and memory deallocations you are reducing the chance for fragmentation to occur. As such anything you can do to reduce how often you allocate or deallocate memory will usually help. From this stems the concept of memory pools and reuse.

Memory reuse

If you have commonly used chunks of memory of the same size that are allocated and deallocated frequently then you may be better off reusing the allocated memory rather than deallocating it then reallocating it. This places less stress on the memory allocator, is faster and reduces fragmentation.

If you are reusing a large number of memory allocations you’ll probably need to have a manager class for each group of allocations so that you can ask for a new object to work with. We do this as part of our communications buffer handling in our software tools.

Object reuse

Another variation on reducing the number of memory allocations and deallocations is to reuse objects. This reduces fragmentation. There are a few ways to reuse objects. You can simply reuse the object you have. To do this you may reinitialise it by copying a different object to it, or you may call a method to reset the object. We’ve seen cases of people calling the object destructor to destroy the object contents – this works because they don’t call delete, thus the memory is not deallocated.

    objectPtr->~dingleBerry();

Probably not the most common practice you’ll see. We prefer to implement a dedicated reset() / flush() method which resets the object. We typically call that from the destructor.
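As an illustration of the reset() approach, here is a minimal sketch. `Message` is a hypothetical class invented for this example, not one of the classes in our tools. It relies on the fact that std::vector::clear() destroys the contents but keeps the allocated capacity, so the next use of the object does not need to go back to the heap unless it needs more space than before.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer-carrying object that is reused instead of being
// deleted and reallocated.
class Message
{
public:
    // The destructor calls the same reset() that callers use for reuse.
    ~Message() { reset(); }

    // Copy `length` bytes into the object. The existing buffer is reused
    // when it is already large enough.
    void set(const char *bytes, std::size_t length)
    {
        assert(bytes != nullptr || length == 0);
        payload.assign(bytes, bytes + length);
    }

    // Return the object to its empty state without releasing its buffer.
    void reset() { payload.clear(); }

    std::size_t size() const { return payload.size(); }
    std::size_t capacity() const { return payload.capacity(); }

private:
    std::vector<char> payload;
};
```

After `reset()` the object reports a size of zero but its capacity is unchanged, so a following `set()` with the same or a smaller amount of data makes no allocation.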

If you are reusing a large number of objects you’ll probably need to have a manager class for each group of objects so that you can ask for a new object to work with. We do this as part of our communications buffer handling in our software tools.

Memory pools

Sometimes it’s better to plan ahead and allocate all the objects ahead of time. These objects then live in a pool. When an object is needed the code asks the pool manager for an object. The object is used. When the object is no longer needed it is given back to the pool manager. The same strategy can be applied to memory chunks of given sizes.

This can be particularly effective if you allocate all the objects or memory blocks in one allocation then divide that allocation into the appropriate number of memory blocks or objects. This allows no scope for fragmentation within the large allocation.

Memory Pool
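A pool of fixed size memory blocks carved from a single allocation might look like the following. This is a sketch under simplifying assumptions: it is not thread safe, it never grows, and `BlockPool` and its member names are invented for this example. Block sizes are rounded up to a multiple of alignof(std::max_align_t) on the assumption that the vector’s storage is itself aligned that way, which holds for the default allocator in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One large allocation divided into `count` blocks of equal size.
class BlockPool
{
public:
    BlockPool(std::size_t size, std::size_t count)
        : blockSize(roundUp(size)),
          storage(blockSize * count)
    {
        assert(size > 0);
        freeBlocks.reserve(count);
        // Push in reverse so the lowest-addressed block is handed out first.
        for (std::size_t i = count; i > 0; --i)
            freeBlocks.push_back(storage.data() + (i - 1) * blockSize);
    }

    // Blocks point into `storage`, so copying the pool would be a bug.
    BlockPool(const BlockPool &) = delete;
    BlockPool &operator=(const BlockPool &) = delete;

    // Hand out a free block, or nullptr when the pool is exhausted.
    void *acquire()
    {
        if (freeBlocks.empty())
            return nullptr;
        unsigned char *block = freeBlocks.back();
        freeBlocks.pop_back();
        return block;
    }

    // Give a block back. It must have come from acquire() on this pool.
    void release(void *block)
    {
        assert(block != nullptr);
        freeBlocks.push_back(static_cast<unsigned char *>(block));
    }

    std::size_t available() const { return freeBlocks.size(); }

private:
    // Round up so that every block starts on a suitably aligned boundary.
    static std::size_t roundUp(std::size_t size)
    {
        const std::size_t a = alignof(std::max_align_t);
        return (size + a - 1) / a * a;
    }

    std::size_t blockSize;                   // initialised before storage
    std::vector<unsigned char> storage;      // the single large allocation
    std::vector<unsigned char *> freeBlocks; // blocks not currently in use
};
```

Because every block is the same size and lives inside the one allocation, releasing and acquiring blocks in any order cannot create gaps that are too small to reuse.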

Destroying custom heaps

If you are using custom heaps to store data of a particular type and you can completely destroy the heap at an opportune moment and then recreate it, you can effectively set the fragmentation for that heap to zero. Good opportunities for this are when you close a document or when data queues become empty.

Linear heaps

You can use what we call a linear heap to provide a zero fragmentation heap. A linear heap can however only be used in a restricted set of circumstances.

A linear heap is a memory heap that allows you to dynamically allocate memory with the proviso that you must deallocate memory in the order it was allocated. Memory cannot be reallocated, expanded or compacted in place (no support for realloc() or _expand()). These restrictions mean that the heap can contain many allocations and each allocation sits immediately after the previous allocation. There is never any gap between the end of one allocation and the start of another allocation (except for alignment purposes). Deallocations simply remove the data from the start of the heap. The heap is split into pages. A page is created when the current page is full and cannot hold any new allocations. As memory is deallocated from a heap page the page holds less data until eventually it holds no data. When a heap page is empty it is discarded, either to the free list for reuse or by being decommitted back to the operating system.

This type of heap is very fast to use as it doesn’t need to think about best fit, find an unused block that’s the right size or any of the other house keeping tasks that most memory allocators have to do. The heap also doesn’t use any of the power of 2 or other strategies to manage memory. Memory is simply allocated in a linear fashion, marching through the memory space the heap is using. When that space is exhausted more is requested and the same procedure is followed. The heap never suffers from memory fragmentation.
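To make the idea concrete, here is a heavily simplified sketch of a heap with these properties. It is not the implementation used in our tools. The names (`LinearHeap`, `deallocateOldest`, `pageCount`) are invented for this example, the pages are std::vector buffers rather than memory obtained from the operating system, empty pages always go to a spare list, and the caller is trusted to release allocations strictly in the order they were made.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

class LinearHeap
{
public:
    explicit LinearHeap(std::size_t bytesPerPage)
        : pageSize(roundUp(bytesPerPage))
    {
        assert(bytesPerPage > 0);
    }

    // Place the allocation directly after the previous one. Returns nullptr
    // for a zero size request or one that cannot fit in a single page.
    void *allocate(std::size_t size)
    {
        size = roundUp(size);
        if (size == 0 || size > pageSize)
            return nullptr;
        if (pages.empty() || pageSize - pages.back().used < size)
            pages.push_back(takePage());   // current page is full
        Page &page = pages.back();
        void *result = page.memory.data() + page.used;
        page.used += size;
        page.live += 1;
        return result;
    }

    // Release the oldest outstanding allocation (strict FIFO order).
    void deallocateOldest()
    {
        assert(!pages.empty());
        Page &page = pages.front();
        assert(page.live > 0);
        page.live -= 1;
        if (page.live == 0)
        {
            // The page holds no data: keep its memory for reuse.
            spare.push_back(std::move(page.memory));
            pages.pop_front();
        }
    }

    std::size_t pageCount() const { return pages.size(); }

private:
    struct Page
    {
        std::vector<unsigned char> memory;
        std::size_t used = 0;   // bytes handed out from this page
        std::size_t live = 0;   // allocations not yet released
    };

    // Reuse a spare page if there is one, otherwise create a new page.
    Page takePage()
    {
        Page page;
        if (spare.empty())
            page.memory.resize(pageSize);
        else
        {
            page.memory = std::move(spare.back());
            spare.pop_back();
        }
        return page;
    }

    // Gaps exist only for alignment: sizes are rounded up to this multiple.
    static std::size_t roundUp(std::size_t size)
    {
        const std::size_t a = alignof(std::max_align_t);
        return (size + a - 1) / a * a;
    }

    std::size_t pageSize;
    std::deque<Page> pages;                         // oldest page at the front
    std::vector<std::vector<unsigned char>> spare;  // empty pages kept for reuse
};
```

Note that allocation is just a bounds check and a pointer increment, and deallocation is a counter decrement. There is no searching for a block of the right size.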

We use linear heaps in all our inter-process communications queues. C++ Memory Validator in particular puts quite a stress on the communications queues due to the fact it can queue up to 1 million items before switching to synchronous communications. One of our customers runs tests that monitor multiple billions of events over several days. Part of what allows that to happen is despite the wide variety of data sizes (many of which are defined by the data in the customer’s application) our monitoring software does not suffer from memory fragmentation in these key high use components.

So far as we are aware the linear heap is our own invention. We haven’t heard of anyone using them before.

Intern all strings

If you can intern various objects such that for each use of the object a single instance can be used this can prevent fragmentation caused by the creation and destruction of many instances of such objects.

Example: A classic case for interning is the use of strings. Consider an application that needs to process a large number of strings. The application does not know the content of the strings ahead of time, but it does know that any duplicates can be reduced to a single copy. A good example would be a debugging symbol handler. You may have 100 classes but the full symbol name for each method is className::methodName so className can be interned. What about the method names? These can also be interned so that each method name is only stored once, however many times it is referenced.

There are some useful side effects of this technique:

  • Reduced memory use.
  • Faster processing due to fewer memory allocations and deallocations.
  • Reduced fragmentation due to less heap usage.
  • You can easily store these interned objects in their own heap allowing you to deallocate all objects just by destroying the heap, reducing any fragmentation in that heap to zero.

We use a variant of this technique to manage the symbols in our software tools.
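A minimal string interning sketch, using only the C++ standard library, is shown below. This is not the variant used in our tools, and `StringInterner` is a name invented for this example. It relies on the guarantee that pointers to elements of a std::unordered_set stay valid when the set grows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

class StringInterner
{
public:
    // Returns a pointer to the single stored copy of `text`. The pointer
    // stays valid for the lifetime of the interner.
    const std::string *intern(const std::string &text)
    {
        return &*strings.insert(text).first;
    }

    // Number of distinct strings stored.
    std::size_t size() const { return strings.size(); }

private:
    std::unordered_set<std::string> strings;
};

// Usage: interning the same text twice yields the same stored string.
inline void internExample()
{
    StringInterner names;
    const std::string *first = names.intern("className");
    const std::string *second = names.intern("className");
    assert(first == second);
    assert(names.size() == 1);
}
```

Callers keep the returned pointer rather than their own copy of the string, so comparing two interned strings becomes a pointer comparison.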

VirtualAlloc

If you are using VirtualAlloc() to allocate large blocks of memory (for loading data into or for implementing a custom heap) it may be worth trying the MEM_TOP_DOWN flag to force VirtualAlloc() to allocate blocks at the top of the address space. This means the addresses of any VirtualAlloc’d allocations will not be near any allocations made by the C runtime or HeapAlloc() etc. This could prove to be quite useful in many situations for preventing memory fragmentation.

Caveat. Depending on the behaviour of your program, using VirtualAlloc() with the MEM_TOP_DOWN flag may not be a good idea because it could make things much slower. Read this informative blog posting before proceeding. Summary: using VirtualAlloc() with MEM_TOP_DOWN occasionally is OK, but using it to make a lot of allocations in a short amount of time could be very slow.

Analyse your application’s memory allocation behaviour

To inform your decision about the strategies and tactics mentioned above you could also examine the number of allocations and objects of different sizes to try to identify any commonality in allocation sizes. You could also try to identify the application hotspots (places where the application performs most of its allocations) and see if you can optimise these to use object/memory pools, or if you can reuse a memory/object allocation rather than deallocating it and then reallocating it later.

We don’t know of any tools that can do this apart from our memory tool C++ Memory Validator.

  • The objects tab will give you the breakdown on the number of objects of each type allocated.
  • The sizes tab will give you the same information for each memory allocation of a particular size (this data includes object sizes).
  • The hotspots tab, if you set it to display All Allocations will show you a hierarchical allocation tree showing you the hotspot locations for allocations, reallocations and deallocations. This allows you to identify which functions are allocating the most objects and the callstack for that allocation.

Once you know this information you can make much more informed decisions about which objects/allocations should have their own private heap space, which ones should be in memory pools and which ones should be left alone.
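
If you just want a rough feel for allocation size commonality, a tally like the one below may be enough to get started. It is only a sketch: AllocationSizeTally is a hypothetical name, and it assumes you already have somewhere to call record() from (an allocation hook or your own allocation wrapper).

```cpp
#include <cstddef>
#include <map>

// Hypothetical tally of allocation requests, keyed by requested size.
class AllocationSizeTally
{
public:
    // Call this once for every allocation request you observe.
    void record(std::size_t size) { ++counts[size]; }

    // How many requests of exactly this size have been recorded.
    std::size_t countFor(std::size_t size) const
    {
        std::map<std::size_t, std::size_t>::const_iterator it = counts.find(size);
        return it == counts.end() ? 0 : it->second;
    }

    // The size requested most often (smallest size wins a tie); 0 if empty.
    std::size_t mostCommonSize() const
    {
        std::size_t bestSize = 0;
        std::size_t bestCount = 0;
        std::map<std::size_t, std::size_t>::const_iterator it;
        for (it = counts.begin(); it != counts.end(); ++it)
        {
            if (it->second > bestCount)
            {
                bestSize = it->first;
                bestCount = it->second;
            }
        }
        return bestSize;
    }

private:
    std::map<std::size_t, std::size_t> counts;
};
```

A size that dominates the tally is a natural candidate for a memory pool or a private heap.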

Is it possible to guarantee zero fragmentation?

The only way to guarantee zero fragmentation is to write your software in a language (or style) that does not use dynamic memory allocation. If you cannot do that, use an appropriate technique to mitigate any fragmentation you may experience. By far the best technique is to destroy each heap when you get an opportunity to do so. This resets fragmentation for the memory controlled by the heap to zero.

What about .Net – can that suffer from fragmentation?

Yes. The .Net Large Object Heap (LOH) can suffer from fragmentation because, by default, the LOH is not compacted after a garbage collection.

Also in the regular .Net heap pinned objects cannot be moved. Objects that cannot be moved prevent the heap from being compacted in the most optimal manner. Depending on how your objects are pinned this could cause quite bad fragmentation of the .Net heap.

If you do need to pin objects you may want to think about moving those objects into the native heap and then using the techniques in this article to ensure they all end up in the same place using an object pool etc. This would move the pinned objects out of the .Net heap and allow heap compaction to proceed as normal.

How can I prevent fragmentation in the Large Object Heap for C#?

With the .Net Large Object Heap (LOH) it really depends on what data you’ve got in the LOH as to what you can do to mitigate the memory fragmentation.

You should definitely consider object reuse and object pooling (as mentioned above).

Arrays of doubles

In 32 bit processes, arrays of double with 1000 elements or more are placed on the Large Object Heap. Try to keep all your double arrays smaller than 1000 elements.

Don’t create large objects

Objects 85,000 bytes or larger are placed on the Large Object Heap. Arrays can easily exceed 85,000 bytes (an array of 8 byte elements reaches the threshold at roughly 10,600 elements), so you should be careful about creating arrays with more than 10,000 items. An alternative arrangement is to split one large array into several smaller arrays that are managed by a parent object which provides an array style interface. Each individual array is then below the threshold for entering the Large Object Heap.
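
The parent object arrangement described above might look something like this. The sketch is written in C++ to match the rest of the code in this article, so it only illustrates the indexing structure; in C# the chunks would be ordinary int[] arrays. ChunkedIntArray and kChunkItems are hypothetical names, and the chunk size of 16,000 items (64,000 bytes of 4 byte integers) is simply an assumed value comfortably below 85,000 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed chunk size: 16,000 items * 4 bytes = 64,000 bytes per chunk.
const std::size_t kChunkItems = 16000;

// Hypothetical parent object: many small arrays behind one array style interface.
class ChunkedIntArray
{
public:
    explicit ChunkedIntArray(std::size_t count)
        : itemCount(count)
    {
        std::size_t remaining = count;
        while (remaining > 0)
        {
            std::size_t n = remaining;
            if (n > kChunkItems)
                n = kChunkItems;
            chunks.push_back(std::vector<std::int32_t>(n, 0));
            remaining -= n;
        }
    }

    // Maps a flat index onto (chunk, offset within chunk).
    std::int32_t& operator[](std::size_t index)
    {
        assert(index < itemCount);
        return chunks[index / kChunkItems][index % kChunkItems];
    }

    std::size_t size() const { return itemCount; }
    std::size_t chunkCount() const { return chunks.size(); }

private:
    std::size_t itemCount;
    std::vector<std::vector<std::int32_t> > chunks;
};
```

The cost is one extra division and indirection per access, which is usually a fair trade for keeping every individual allocation small.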

.Net 4.5.1 onwards

Starting with .Net 4.5.1 Preview there is a special option to force the Large Object Heap to compact itself. This is not automatic, but controlled by the software engineer via an API call.
    GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
    GC.Collect();     // This will cause the LOH to be compacted (once).

Conclusion

I hope you now have a better understanding of the cause of memory fragmentation and what you can do to improve any memory fragmentation issues you may be facing. If you can use a linear heap it’s an excellent, high speed solution. If you can’t then look at drop-in replacement heaps or assigning objects to specific custom heaps, object reuse, memory reuse and pooling.


Unusual memory bit patterns

Software development is an intellectual challenge. Sometimes the process is interrupted by software failures and/or crashes. A lot of the time the reason for the failure is self evident and easily fixable. However the reason for some crashes is less obvious and often the only clue is an unusual bit pattern that is typically present each time the crash happens.

This article will describe some of the common bit patterns that have been used in Windows C/C++ software. After describing the bit patterns we’ll explain various simple rules you can follow in your software development that will make encountering these problems much less likely.

All of these bit patterns apply for the Microsoft C/C++ compiler that ships with Visual Studio (6.0 through 2010) and any compatible compilers (such as the Intel compiler). These bit patterns are found in debug builds. Release builds do not use special bit patterns.

These bit patterns evolve and change as the C runtime internals change from managing memory themselves to handing that management over to the Win32 heap family functions HeapAlloc(), etc.

Some of the allocators appear to know if they are running with a debugger present and change their behaviour so that with a debugger present you will see the bit patterns, and without the debugger present you will not see the bit patterns. HeapAlloc(), LocalAlloc(), GlobalAlloc() and CoTaskMemAlloc() all exhibit this behaviour.

0xcccccccc

The 0xcccccccc bit pattern is used to initialise data that is on the stack.

void uninitFunc_Var()
{
	int	r;

	// here, r has the value 0xcccccccc

	…
}

Also, consider this C++ class.

class testClass
{
public:
	testClass();

	DWORD getValue1();

	DWORD getValue2();

private:
	DWORD	value1;
	DWORD	value2;
};

testClass::testClass()
{
	value1 = 0x12345678;

	// whoops!, forgot to initialise “value2”
}

DWORD testClass::getValue1()
{
	return value1;
}

DWORD testClass::getValue2()
{
	return value2;
}

When an object of type testClass is created on the stack, its data member "value2" is not initialised. However the debug C runtime initialised the stack contents to 0xcccccccc when the stack frame was created. Thus if the object is used, its data member "value2" will have a value 0xcccccccc, and "value1" will have a value of 0x12345678.

void uninitFunc_Object()
{
	testClass	tc;

	// here, tc.value1 has the value 0x12345678
	// here, tc.value2 has the value 0xcccccccc	because it was erroneously not initialised in the constructor

	…
}

If you are seeing the 0xcccccccc bit pattern it means that you are reading memory that is on the current thread stack that has not been initialised.

Why is the pattern 0xcccccccc used? This pattern is four bytes of 0xcc. The Intel i386 opcode 0xcc is the breakpoint instruction (INT 3). Filling the stack with breakpoint instructions for uninitialised data is an interesting choice. Perhaps if a stack corruption occurs and data on the stack starts getting executed there is a slightly greater chance you’ll hit a breakpoint? I don’t know if this was the compiler designer’s intention or just that they chose to use a similar value to the dynamic memory uninitialised value of 0xcd (which we’ll describe later in this article).

0xbaadf00d

The 0xbaadf00d bit pattern is the bit pattern for memory allocated with HeapAlloc(), LocalAlloc(LMEM_FIXED), GlobalAlloc(GMEM_FIXED).

If you are seeing the 0xbaadf00d bit pattern it means that you are reading memory that has been allocated by HeapAlloc() (or reallocated by HeapReAlloc()) and which has not been initialised by the caller of HeapAlloc (or HeapReAlloc, LocalAlloc, GlobalAlloc).

0xdeadbeef

The 0xdeadbeef bit pattern is the bit pattern for memory deallocated using HeapFree(), LocalFree(), GlobalFree().

If you are seeing the 0xdeadbeef bit pattern it means that you are reading memory that has been deallocated by HeapFree(), LocalFree() or GlobalFree().

0xabababab

The 0xabababab bit pattern is the bit pattern for the guard block after memory allocated using HeapAlloc(), LocalAlloc(LMEM_FIXED), GlobalAlloc(GMEM_FIXED) or CoTaskMemAlloc().

If you are seeing the 0xabababab bit pattern it means that you are reading memory after a memory block that has been allocated by HeapAlloc(), LocalAlloc(LMEM_FIXED), GlobalAlloc(GMEM_FIXED) or CoTaskMemAlloc().

0xbdbdbdbd

The 0xbdbdbdbd bit pattern is the guard pattern around memory allocations allocated with the "aligned" allocators.

Memory allocated with malloc(), realloc(), new and new [] is provided with a guard block before and after the memory allocation. When this happens with an aligned memory allocator, the bit pattern used in the guard block is 0xbdbdbdbd.

If you are seeing the 0xbdbdbdbd bit pattern it means that you are reading memory before the start of a memory block created by an aligned allocation.

0xfdfdfdfd

The 0xfdfdfdfd bit pattern is the guard pattern around memory allocations allocated with the "non-aligned" (default) allocators.

Memory allocated with malloc(), realloc(), new and new [] is provided with a guard block before and after the memory allocation. When this happens with a non-aligned (default) memory allocator, the bit pattern used in the guard block is 0xfdfdfdfd.

If you are seeing the 0xfdfdfdfd bit pattern it means that you are reading memory either before the start of a memory block or past the end of a memory block. In either case the memory has been allocated by malloc(), realloc() or new.

0xcdcdcdcd

The 0xcdcdcdcd bit pattern indicates that this memory has been initialised by the memory allocator (malloc() or new) but has not been initialised by your software (object constructor or local code).

If you are seeing the 0xcdcdcdcd bit pattern it means that you are reading memory that has been allocated by malloc(), realloc() or new, but which has not been initialised.

0xdddddddd

The 0xdddddddd bit pattern indicates that this memory is part of a deallocated memory allocation (free() or delete).

If you are seeing the 0xdddddddd bit pattern it means that you are reading memory that has been deallocated by free() or delete.

0xfeeefeee

The 0xfeeefeee bit pattern indicates that this memory is part of a deallocated memory allocation (free() or delete).

If you are seeing the 0xfeeefeee bit pattern it means that you are reading memory that has been deallocated by free() or delete.

What can you do to prevent the cause of these problems?

Depending on the nature of the problem, it may be trivial to identify why you have an uninitialised variable or why your pointer is pointing to deallocated memory. The rest of the time you can rely on a few helpful rules and the judicious use of some software tools.

Crash

When you have a crash, try to identify the cause of the crash.

  • If you are in a debugger, the debugger may be able to show you variable names, thus you will be able to identify which variable is in error.
  • If you are in a debugger you can view the register window to see if a register has one of these special bit patterns.
  • If you have an exception report, you can view the crash address, the exception address and any registers to see if any of these special bit patterns are present.

Registers

At the start of a C++ method, the "this" pointer is stored in the ECX register (RCX on 64 bit/x64). If the ECX register contains one of these bit patterns you have a good indicator that a dangling pointer to a deleted object or an uninitialised pointer is being used. Note that depending on what the compiler does the ECX register may remain valid during the method or may take on other values and thus be unreliable as to whether it still represents the "this" pointer.

When a method or function returns, the return value is in the EAX register (RAX for 64 bit/x64).

Simple data member rules to follow

  • Always initialise all data members in all your object constructors.
  • Always initialise all other data before it is used (local variables, globals and so on), not just the data members of classes.
  • Always use the correct form of delete, even if you are working with intrinsic types. It is not unknown to change the type of a data member as software evolves. If you always use the correct form of delete, then a switch from (for example) "int" to a class type will not result in objects of the class type failing to be deleted.
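
Applied to the testClass example from earlier in this article, the first rule might look like the sketch below. The DWORD typedef is a stand-in so the snippet compiles without the Windows headers, and the value 0 chosen for value2 is an assumption; use whatever default makes sense for your class.

```cpp
#include <cstdint>

// Stand-in for the Windows DWORD type (assumption, so <windows.h> is not needed).
typedef std::uint32_t DWORD;

class testClass
{
public:
    testClass();

    DWORD getValue1() const { return value1; }
    DWORD getValue2() const { return value2; }

private:
    DWORD value1;
    DWORD value2;
};

// Every data member appears in the initialiser list, in declaration order,
// so no member can be left holding 0xcccccccc or 0xcdcdcdcd.
testClass::testClass()
    : value1(0x12345678),
      value2(0)            // assumed default value
{
}
```

Listing the members in declaration order also makes a forgotten member easier to spot when the class grows.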

Simple C/C++ allocator rules to follow

  • If you allocate using new, deallocate with delete.
  • If you allocate using new [], deallocate with delete [].
  • If you allocate using malloc() or realloc(), deallocate with free().

Simple pointer rules to follow

  • Always set pointers to dynamically allocated data to NULL before using them (unless you are assigning them on first use).
  • Always check pointers are non-NULL before you use them.
  • Always set pointers to NULL after deallocating the memory pointed to by the pointer.
    	delete [] accounts;
    	accounts = NULL;
     
    Do this in object destructors and anywhere else you deallocate memory. The reason you do this in object destructors is because in release code, the memory deallocator will not overwrite the contents of the deleted object. Thus if any erroneous code is still pointing to the deleted object, it will find a NULL pointer to accounts rather than a pointer to a deleted accounts object.
  • In object destructors, even if you are not deleting any objects, but you have pointers to objects, set these to NULL as well. The reason is the same reason as above – defensive programming, minimise the likelihood things can fail.
  • If possible, try to make one location in your software responsible for the management of a particular type of object. For example you may create a manager class that is responsible for creating, managing and destroying particular objects. If you then get a failure with a related object type, you can focus your attentions on the manager class, perhaps adding additional tracing code or analysing with a software tool.
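
The allocator and pointer rules above can be combined into one small owner class. This is a sketch under assumptions: Account and AccountList are hypothetical names built around the accounts pointer used in the snippet above, not code from any real product.

```cpp
#include <cstddef>

struct Account { int id; };   // hypothetical payload type

// Hypothetical single owner of the 'accounts' array.
class AccountList
{
public:
    AccountList() : accounts(NULL), count(0) {}   // pointer starts out as NULL

    ~AccountList() { clear(); }                   // destructor also resets to NULL

    void resize(std::size_t n)
    {
        clear();
        accounts = new Account[n]();   // allocated with new [] ...
        count = n;
    }

    void clear()
    {
        delete [] accounts;            // ... so deallocated with delete []
        accounts = NULL;               // no dangling pointer is left behind
        count = 0;
    }

    // Checks the pointer before use; returns NULL rather than touching
    // memory that is not there.
    Account* at(std::size_t index)
    {
        if (accounts == NULL || index >= count)
            return NULL;
        return &accounts[index];
    }

private:
    AccountList(const AccountList&);              // copying disabled: one owner only
    AccountList& operator=(const AccountList&);

    Account*    accounts;
    std::size_t count;
};
```

Because every allocation and deallocation of the array goes through this one class, it is also the one place to add tracing if something goes wrong.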

Software libraries in DLLs – rules to follow

If your software is structured so that some class implementations are in certain DLLs and other DLLs rely on those class implementations (either by usage or by derivation/inheritance) then you need to consider the following issues:

  • Size

    If you change the size of a struct, union or class object then you will need to rebuild all other DLLs that use that struct, union or class. In theory the Developer Studio build system should keep everything up to date. But Developer Studio can only work with the information you give it. If you have forgotten to add appropriate header files to a project the project will not consider changes to those header files. Typically we have found the simplest and easiest solution is a full rebuild of any affected DLLs after a change in size of a class object. Reasons for struct, union or class to change size:

    • Change of a method from normal to virtual or virtual to normal.
    • Addition or removal of a virtual method.
    • Addition or removal of a data member.
    • Change in type of an object that is a data member of the class (for example from int to double or from weeble to wobble).
    • Change in size of an object that is a data member of the class (for example data member of type testObject changes size).
    • Change of #pragma byte packing for some or all of the class definition.
    • Addition or removal of base classes which the class derives from.
    • Conditional compilation resulting in different data members, same data members but different ordering or different definitions for data members.
    • Macro definitions resulting in different types/sizes for data members that are not obvious. This is an obscure form of the conditional compilation problem.

    A good indicator that you have forgotten to do the above, is looking at the class object in the debugger and noticing that the fields all appear OK in one DLL, but when looked at from a function in a different DLL, the object fields appear to have different data. This is a sure sign that one or more DLLs have not been compiled with the same object definition.

  • Dynamic memory

    Passing data back from DLLs can be fraught with problems.

    The classic problem is A.dll calls B.dll to get some data. The data is dynamically allocated, populated with the result and passed back to the caller. The caller uses the data then deallocates the data. The program crashes. Why? The typical reason is that the deallocator used to deallocate the memory is not the correct deallocator to match the allocator of the memory. “But! But I’m using the C runtime all the time” exclaims the hapless and confused developer. “What have I done wrong?”.

    Typical scenarios for this are as follows:

    #  Allocator                          Deallocator                        Result
    1  Dynamically linked CRT (dynB.dll)  Dynamically linked CRT (dynA.dll)  OK
    2  Dynamically linked CRT (dynB.dll)  Statically linked CRT (statA.dll)  Crash
    3  Statically linked CRT (statB.dll)  Dynamically linked CRT (dynA.dll)  Crash
    4  Statically linked CRT (statB.dll)  Statically linked CRT (statA.dll)  Crash
    5  Statically linked CRT (statB.dll)  Statically linked CRT (statB.dll)  OK

    Let’s take each scenario in turn.

    #1. Both dynA.dll and dynB.dll are linked to the dynamic C Runtime. This means they are both using the same allocator and deallocator functions (malloc/free, new/delete) in the same C runtime DLL. Because the same runtime is used by both dynA.dll and dynB.dll, the correct deallocator is used to deallocate the memory allocated by dynB.dll. This scenario works correctly.

    #2. dynB.dll is linked to the dynamic C Runtime. Its caller statA.dll is linked to the static C Runtime. The allocated memory comes from the dynamic CRT. The deallocation function is in the static CRT which uses a different heap to the dynamic CRT. The memory deallocation call fails, most likely with a crash.

    #3. statB.dll is linked to the static C Runtime. Its caller dynA.dll is linked to the dynamic C Runtime. The allocated memory comes from the static CRT. The deallocation function is in the dynamic CRT which uses a different heap to the static CRT. The memory deallocation call fails, most likely with a crash.

    #4. statB.dll is linked to the static C Runtime. Its caller statA.dll is linked to the static C Runtime. The allocated memory comes from the static CRT linked to statB.dll. The deallocation function is in the static CRT linked to statA.dll which uses a different heap to the other static CRT. The memory deallocation call fails, most likely with a crash.

    #5. statB.dll is linked to the static C Runtime. statA.dll can be linked to the static C Runtime or the dynamic C Runtime. Memory is allocated in statB.dll, passed back to the caller, used, and then passed back to a function in statB.dll that calls the deallocator. That is, statB.dll has to provide an explicit cleanup function to deallocate the memory it passed back to the caller. All allocated memory is deallocated in the same DLL that it was allocated in. It doesn’t matter how the calling DLL is linked to the C Runtime. This scenario works correctly.

  • ANSI / Unicode

    You need to ensure that APIs between DLLs use the string width (char/byte for ANSI, two byte (wchar_t) for wide character) that each DLL expects. If you have C++ functions you’re golden as the name mangling will ensure the correct values are passed. But if you are exposing C style APIs and calling them via GetProcAddress you had better make sure your function prototypes match the DLL, otherwise you could end up passing good data to the wrong function (which will then perceive that data as bad data).
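
Scenario #5 from the dynamic memory discussion above (allocate and deallocate in the same DLL) can be sketched as a pair of functions. The names statB_createMessage and statB_destroyMessage are hypothetical, and the sketch keeps everything in one source file so it can be compiled anywhere. In a real project the first two functions would be exported from statB.dll (typically as extern "C" functions marked __declspec(dllexport)) and the third would live in the calling DLL.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// --- Functions that would be exported by the hypothetical statB.dll ---

// Allocates a copy of 'text' using statB.dll's own C runtime.
char* statB_createMessage(const char* text)
{
    std::size_t length = std::strlen(text) + 1;
    char* copy = static_cast<char*>(std::malloc(length));
    if (copy != NULL)
        std::memcpy(copy, text, length);
    return copy;
}

// The explicit cleanup function: free() here belongs to the same C runtime
// (and therefore the same heap) that performed the malloc() above.
void statB_destroyMessage(char* message)
{
    std::free(message);
}

// --- Code that would live in the caller (statA.dll or dynA.dll) ---

std::size_t callerUsesMessage()
{
    char* message = statB_createMessage("hello");
    if (message == NULL)
        return 0;

    std::size_t length = std::strlen(message);

    // Do NOT call free(message) here. Hand the memory back to the DLL
    // that allocated it.
    statB_destroyMessage(message);
    return length;
}
```

With this arrangement the caller never needs to know how statB.dll is linked to the C runtime.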

I’ve tried all the above, what other options are there?

If all else fails, then you will probably need to turn to a software tool to analyse your memory usage and identify the cause of the deleted memory that the pointer is pointing to (has the memory been deleted too early? Or is the pointer being used in error after the memory was deleted?).

In that case you may want to consider a software tool like C++ Memory Validator.


Why Process Injection Fails

If you are reading this web page it is most likely because you have just tried to inject into a running process and the injection failed. You’ve probably just viewed an information dialog similar to the one shown below.

[Image: the "process injection failed" information dialog]

For the purposes of this article we’ll talk about C++ Memory Validator, but the points all apply to any of our software tools that support process injection (C++ Bug Validator, C++ Coverage Validator, C++ Memory Validator, C++ Performance Validator, C++ Thread Validator).

When can injection fail?

There are three different places in our tools where injection can happen.

  1. Injecting into a running process.
  2. Waiting for a process to start and then attaching to it automatically when it starts.
  3. Launching a process when the launch method is set to any value other than CreateProcess. CreateProcess is the recommended method of launching a process. The other methods preceded CreateProcess and all use process injection coupled with varying delays or varying process security settings to achieve their aim. Because launching with CreateProcess is so reliable (close to 100%) you are unlikely to ever use any of these other methods (which are less reliable, less than 95%).

Causes of injection failure

  • A missing DLL in your application.
    This can only be a problem if you are launching an application (item 3 above). The application will fail to launch properly if all DLL dependencies are not met. The application will start, then very rapidly shut down due to a missing DLL dependency. When this happens it is not possible to inject into the application as it doesn’t run for long enough.
  • A missing DLL in the software tool (e.g. C++ Memory Validator) you are using to inspect your application.
    This should never happen. This bug will only happen if a mistake has been made at Software Verification when creating the software installer. We list this here for completeness.
  • Multiple injection attempts.
    If you have already successfully injected into this running process you can’t inject into it again because the injected DLL is already loaded.

    The solution to this problem is to start a new process and inject into that process.

  • The application started and finished before the DLL could be injected.
    If your process only runs for a short amount of time the process may finish executing before process injection can complete. An example would be a command line tool that loads, processes data then closes.
  • The application security settings do not allow process handles to be opened.
    If your application is running at a privilege level that prevents C++ Memory Validator from opening the appropriate process handle, it will be impossible to perform a DLL injection into your application. This is quite common when working with services, but can apply to any process that is running with particular privileges. This is one of the reasons that C++ Memory Validator runs with admin rights and requires User Account Control confirmation upon startup. We do this to get a good baseline set of privileges to work with.
  • The application is a service.
    Services run in a different environment to regular applications. Services are controlled differently. Typically they run on different accounts to regular applications and with different security privileges and different access rights to parts of the system (for example, no disk access except for a particular folder, no shared memory access, etc). As such it is often impossible to gain access to the service to perform a process injection.
  • The service and C++ Memory Validator are running on different user accounts.
    If you are working with a service, then ensuring that both the service and C++ Memory Validator run on the same account can sometimes resolve some problems.
  • Injecting into some processes just does not work.
    Even when all of the above issues are resolved or do not apply some applications just do not want to be injected into. This applies for 32 bit processes injecting into 32 bit processes and 64 bit processes injecting into 64 bit processes.

    That sounds like a bogus claim with no data to back it up.

    When we first started testing our process injection software in 1999 we built a test rig that would test every application on the test computer. A test would start the process, inject into it, send the process a WM_QUIT and wait for it to quit. We’d monitor if the injection was successful for each test. A test run would take hours and would test several thousand applications. We started testing on Windows NT 4. Then when Windows 2000 was released we tested that. We did the same tests for Windows XP, etc.

    What we found was that the failure rate for injecting on Windows NT was 2%. On Windows 2000 it was 3%. On Windows XP it was 5%. Each new version of Windows was more complex, and it typically ran on more complex and more modern hardware. These variables, together with changes in multi-tasking scheduling and in operating system behaviour and timing, seemed to make process injection less reliable with each new version of Windows. But couldn’t the problem be that we’re injecting a really complex DLL (for example, the C++ Memory Validator DLL)? Surely the cause of the failure is the complexity of what the DLL is doing?

    That’s a really good question and that is why we did some tests with an empty DLL. An empty DLL for our experiments was a DLL that consists of nothing more than a DllMain(). See below.

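    #include "stdafx.h"
    
    // An "empty" DLL: the entry point does no work and just reports success.
    // APIENTRY gives DllMain the calling convention the loader expects.
    BOOL APIENTRY DllMain(HANDLE hModule,
                          DWORD  dwReason,
                          LPVOID lpReserved)
    {
        return TRUE;
    }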
    

    When we tested with a DLL consisting of nothing more than the code above, we found that even this trivial DLL could not be injected into the applications that we had previously had failures with.

    Clearly the DLL injection problem we were encountering was real and nothing to do with the code (or DLL dependencies) of the DLL we were trying to inject.

Solutions to being unable to inject into a process.

Our testing (detailed above) shows that if injection doesn’t work for a particular application, it typically never will work for that application. We therefore strongly advise you to stop spending any more time trying to make process injection work for the particular application that is giving you trouble. We’ve spent years looking at this. It really is not worth your time and trouble.

So what can you do?

  • Launch
    Based on feedback we’ve found that some people were using process injection because they had not realised the power of the command line options for running our software tools from the command line. They were launching the process they wanted to instrument, then our software tool and doing the process injection because their application had already started. In some cases they had launched their test application from a unit testing application.

    If this is your scenario it is worth examining the command line arguments for the software tool to see if you can achieve what you need to achieve without process injection.

  • NT Service API
    If you are working with services we recommend using the NT Services API which is described in detail in the Help file for the software tool. The NT Services API also comes with example usage, an example client and an example service so that you can see how easy it is to integrate the NT Service API into your service. Using these APIs makes working with services really easy.
  • Linkable API
    Some of our software tools have an additional API known as the linkable API. This API is different to the NT Services API. You may find using this API helpful if you don’t wish to use the NT Services API.

Help files can be found from the Help menu on the software tool and also in our Help file download area.

Problems with Process Injection.

Even when process injection succeeds and our profiling DLL is loaded into the target application there can be problems.

The principal problem is with multi-threaded applications that already have more than one thread by the time the process injection completes. Because process injection is not supported by the operating system and is achieved through the use of some clever memory mapping and thread execution, you cannot guarantee which synchronization primitives are locked at the time of the injection. For many applications this is not an issue and injection succeeds, but for some applications locks are held that conflict with the locks the profiling DLL needs to acquire during the injection process. When this happens the target process deadlocks. This cannot be foreseen. This behaviour is application specific.

If process injection succeeds but then your process deadlocks this means you cannot reliably use process injection for this application. As such, even though we’ve provided the ability to do process injection we recommend launching your process with our software tool or using the NT Services API to ensure your process is instrumented.

Recommended course of action

In summary, if you ever have the opportunity to launch your process with C++ Memory Validator (using the CreateProcess option) you should use that.

If you can’t launch your process you can try process injection (or wait for process). If those do not work for the reasons stated in this article we recommend that you use the appropriate NT Service API for the tool you are using (see the details in the Help file for that tool).
