レナート   Wunschkonzert, Ponyhof und Abenteuerspielplatz   ﻟﻴﻨﺎﺭﺕ

Fri, 25 Jul 2008

String Pools

In part 2.4.3 of Ulrich Drepper's excellent How To Write Shared Libraries (which unfortunately is a bit out-of-date these days) Ulrich suggests replacing arrays of constant strings by a single concatenated string plus an index lookup table, to avoid unnecessary relocations during startup of ELF programs. Maintaining this string pool is however troublesome, it is hard to read and difficult to edit. In appendix B Ulrich lists an example C excerpt which contains some code for simplifying the maintaining of such strings pools, after an idea from Bruno Haible. In my opinion however that suggestion is not that much simpler, and requires splitting off the actual strings into a seperate source file. Ugly!

Some Free Software uses string pools to speed up relocation, e.g. GTK+. Some development tools like gperf contain support for string pools.

All solutions for string pool maintaining I could find on the Internet were not exactly beautiful. Either they were completely manual, manual plus a validity checking tool, or very very cumbersome. Googling around I was unable to find a satisfactory tool for this purpose[1].

After Diego Petteno complained about my heavy use of arrays of constant strings in libatasmart I sat down to change the situation, and wrote strpool.c, a simple parser for a very, very minimal subset of C, written in plain ANSI C. It looks for two special comment markers /* %STRINGPOOLSTART% */ and /* %STRINGPOOLSTOP% */, moves all immediate strings between those markers into a common string pool and rewrites the input with the strings replaced by indexes. Code accessing those strings must use the special _P() macro. With these minimal changes to a source file, passing it through strpool.c will automatically rewrite it to a string-poolized version. The nice thing about this is that the necessary changes in the source are minimal, and the code stays compilable with and without passing it through the strpool.c preprocessor.

Here's an example. First the original non-string-poolized version:

static const char* const table[] = {
	"waldo",
	"uxknurz",
	"foobar",
	"fubar"
};

static int main(int argc, char* argv[]) {
	printf("%s\n", table[2]);
	return 1;
}

For later use with strpool.c we change this like this:

#ifndef STRPOOL
#define _P(x) x
#endif

/* %STRINGPOOLSTART% */
static const char* const table[] = {
	"waldo",
	"uxknurz",
	"foobar",
	"fubar"
};
/* %STRINGPOOLSTOP% */

static int main(int argc, char* argv[]) {
	printf("%s\n", _P(table[2]));
	return 1;
}

When passed through strpool.c this will be rewritten as:

/* Saved 3 relocations, saved 0 strings (0 b) due to suffix compression. */
static const char _strpool_[] =
	"waldo\0"
	"uxknurz\0"
	"foobar\0"
	"fubar\0";
#ifndef STRPOOL
#define STRPOOL
#endif
#ifndef _P
#define _P(x) (_strpool_ + ((x) - (const char*) 1))
#endif

#ifndef STRPOOL
#define _P(x) x
#endif

/* %STRINGPOOLSTART% */
static const char* const table[] = {
	((const char*) 1),
	((const char*) 7),
	((const char*) 15),
	((const char*) 22)
};
/* %STRINGPOOLSTOP% */

static int main(int argc, char* argv[]) {
	printf("%s\n", _P(table[2]));
	return 1;
}

All three versions can be compiled directly with gcc. However, the version that was passed through strpool.c compresses the number of relocations for the table array from 4 to 1. Which isn't much of a difference, but the larger your tables are the more relevant the difference in the number of necessary relocations gets.

A more realistic example is atasmart.c which after being preprocessed with strpool.c looks like this. In this specific example the number of necessary startup relocations goes down from > 100 to 9.

I am note sure if the parser is 100% correct, but it works fine with all sources I tried. It even does suffix compression like gcc does for normal strings too.

Footnotes

[1] Or maybe I just suck in googling? Anyone has a suggestion for such a tool?

posted at: 23:32 | path: /projects | permanent link to this entry | 9 comments


Posted by Damjan at Sat Jul 26 01:31:32 2008
Isn't this something gcc could do automatically?

Posted by Kristian Høgsberg at Sat Jul 26 02:06:02 2008
We have a similar one-off hack in cairo-type1-subset.c.  It's an embedded perl script, not exactly pretty but in this case it's a string table that never changes, so eh...

Posted by Anonymous at Sat Jul 26 04:36:44 2008
Why subtract 1 from the string offsets?  Why not just pre-subtract the 1?

Posted by desrt at Sat Jul 26 06:55:29 2008
two comments:

1: you should just make the offset table out of size_t.  i don't understand why you bothered with (char *).  in the event that the total size of the string pool is small (i imagine < 65536 bytes is very common) then you can use guint16 or such... your tool could easily detect this.

2: make the string pool like:

const char pool[] = "all\0strings\0here\0";

and then you will have zero relocations.

Posted by Lennart at Sat Jul 26 13:37:05 2008
desrt: 1. Uh? Because I'd then have to rewrite the type of the table. Which is very difficult if the type is a struct and not just a simple (const char*)

2. Uh? This is exactly what happens! Check the second line of the thrid source excerpt!

Posted by Lennart at Sat Jul 26 13:45:31 2008
Anonymous: I added the extra 1 for making sure a string doesn't accidentaly end up being detected as NULL, although it actually should just be string that points to index 0.

Posted by desrt at Sat Jul 26 17:37:07 2008
Lennart: i know this is what happens... and because if it, there should be zero allocations.

but you claim that there is one.  where is it?

Posted by Helge at Tue Jul 29 10:25:04 2008
Uhm, I'm not sure this is really worth it... I just checked, creating a shared library containing 1000 string constants as well as a pointer array results in 1000 run-time relocations (as would be expected) -- HOWEVER... once I "prelink" I end up with exactly 0 run-time relocations

cheers

Posted by lu_zero at Tue Aug 12 12:26:31 2008
I think such stuff could be managed by the compiler...

Leave a Comment:

Your Name:


Your E-mail (optional):


Comment:


As a protection against comment spam, please type the following number into the field on the right:
Secret Number Image

Please note that this is neither a support forum nor a bug tracker! Support questions or bug reports posted here will be ignored and not responded to!


It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I excercise this liberty quite agressively. So yes, if you comment here, I might censor you. If you don't want to be censored your are welcome to comment on your own blog instead.


Lennart Poettering <mzoybt (at) 0pointer (dot) net>
Syndicated on Planet GNOME, Planet Fedora, planet.freedesktop.org, Planet Debian Upstream. feed RSS 0.91, RSS 2.0
Archives: 2005, 2006, 2007, 2008, 2009, 2010, 2011

Valid XHTML 1.0 Strict!   Valid CSS!