The question is simple: what is the real memory consumption of an std::string object? In C++, unlike in C, we are advised to avoid raw char arrays because they are bug-prone. Instead, the convenient std::string class is provided. Besides being convenient, this char array wrapper is also optimized for some string manipulation operations. As a result, operations such as concatenation are often much faster than with the C-style strcat function.

But is it really so good that we can completely forget about using char* to store strings? Are there any drawbacks?

There are.

What can be a disadvantage of using std::string instead of plain char arrays? It may be speed or memory consumption. Without offering any proof, I will claim that speed is not a problem. In this short article, let’s focus on memory requirements.

How can we check the memory overhead? We could just peek into the std::string class definition, add up the additional fields, and look carefully for new allocations on the heap. However, this is just theory, not practice. Why? It is explained below. So, let’s approach the memory consumption from the opposite side and measure it.

The following experiments were performed with GNU ISO C++ (G++ 4.6.2, -O2 optimizations), built as a 32-bit binary, on Fedora 16 Linux x86.

 

Memory requirements for char*

How much memory does a “jovislab” string occupy on the heap? A hint: it is not 8. There is also the null terminator at the end, so 9, right? Yes, but the memory consumption is in most cases still higher than those 9 bytes. Why? Because of memory alignment. On 32-bit systems, it is in most cases 32 bits. So, our string will occupy at least 12 bytes. Still “at least”, because the allocator has to store additional information, such as the size of the allocated block: at least 4 more bytes on a 32-bit platform (and 8 on a 64-bit one).

Let’s run a short experiment: allocate 1 000 000 “jovislab” strings on the heap.

Memory before allocating: 3292 KB
Memory after allocating:  19000 KB

Memory used for strings:  15708 KB
Memory used per string: ~16 bytes.

We can also calculate it a different way: call strdup twice in a row and subtract the address of the first copy from the address of the second.

However, take into account that this only works when the second strdup allocates memory directly after the previous string. This is not guaranteed in theory, so the experiment should be executed in a loop and the lowest size chosen. The probability of a good estimate approaches 1 as the number of loop iterations grows. In practice, however, when enough memory is available, we should get a good value even on the first try. The value estimated using this second method is 16 bytes.

So, what is the overhead for our string? If we count the null terminator as part of the string, then in this case it is 7 bytes: 9 bytes of string + size field (4 bytes on a 32-bit platform) + alignment padding = 16 bytes.

To recap, the overhead in our case is 7 bytes, which is around 78% of additional data.

Memory requirements for std::string

Memory before allocating: 3292 KB
Memory after allocating:  50284 KB

Memory used for strings:  46992 KB
Memory used per string: ~48 bytes.

As you can easily calculate, the overhead in the case of std::string is 39 bytes. That is more than 400% overhead. Horrible, isn’t it?


In general, for tiny strings (a couple of bytes) the overhead is enormous. The memory consumption may be as much as 43 times the size of the actual data! However, it is not so bad for big strings. Starting at a length of around 34 bytes, the overhead is less than an additional 100%. For bigger strings it gets better and better, since the overhead is constant.

One more thing to be aware of: on 64-bit systems, the overhead will be bigger, possibly even twice as big (but I haven’t tested it yet).

So, to recap: is it worth using std::string instead of char arrays? In most cases, yes. If you do not have to worry about memory consumption, or you do not have millions of short strings, then you should certainly use the std::string class. It is also a good choice if you want to perform string manipulation. However, for just storing data, especially short strings *and* when memory is an issue, you should consider using raw char arrays or some other container with less overhead, such as boost::array.

10 Responses to Memory overhead of an std::string

  1. mooingDuck says:
    What was the std::string implementation, compiler and settings? How are you measuring memory? (By your numbers, I'd guess you found an accurate way, but you don't say). Also, comparing std::string to char* based on memory alone isn't completely fair, as std::string is optimized for different things. I would also appreciate if you reworded your opening paragraph. Right now it sounds like std::string is like a char*, but with massive memory overhead. I think you should also mention that std::string is far faster for general usage, unless you add an equal amount of memory overhead to the char* code.
    • asanoki says:
      Hi mooingDuck. Thanks for your comment. The compiler I have used is g++ 4.6.2. It took me some time to figure out how to measure the memory. I just used Linux 'top' command and read Virtual memory column. As for the opening, I think that std::string is not faster than char-array with C functions such as: strcat, strdup, etc. The one difference may be, that the length is stored as a field. But, that's true that std::string is just much more convenient. Working with char-arrays in C++ is a pain. You are right, that the advantages of std::string are worth mentioning. I will fix it.
      • mooingDuck says:
        I'm rather surprised that G++ would have as much overhead as you measured. I'll have to look into their implementation and compare it to Microsoft's. As I understand, Microsoft's string object itself is larger but the heap usage is less, so it's a tradeoff.
      • mooingDuck says:
        Using my code at http://ideone.com/ynJhF and using Windows Task Manager to measure virtual memory usage (not very accurate, but enough for this code and test size), I got these measurements with VC++10 and G++ 4.5.4, both allocating 1000000 strings totaling a length of 7,502,096 chars. VC++ allocated 38,712kb for both char* and unique_ptr, and allocated 27,428kb for string. (That's a 41% savings). G++ allocated 23,052kb for both char* and unique_ptr, and allocated 33,832kb for string (That's 47% extra space). That's why saying which compiler and version is important.
  2. DeadMG says:
    In addition, you talk like std::string is the only alternative to char*. But it's not. You could roll your own if you were desperate.
    • asanoki says:
      Any suggestions? Are there good string implementations that have smaller overhead? I have recently implemented a limited custom string class, which has overhead similar to char* at the expense of being immutable. I haven't found any good and stable class which could be an alternative to char*, though. Thanks for your comment!
  3. mooingDuck says:
    Actually, std::string should be MUCH faster than a char*, _especially_ for strcat, especially if the messages being concatenated are shorter than the message.
    • asanoki says:
      You mean, that they have bigger capacity than size? I mean, allocates more memory in advance in case of concatenating? I will check it.
  4. asanoki says:
    I have updated the article. Take a look at the introduction and summary. Thanks.