Fast unsigned integer to hex string

Once I started down this road with Fast unsigned integer to string, the need for symmetry forced me to write an algorithm to convert unsigned integers to hex strings.

Here we go.

Concept #1

Use bitmasks and bit-shifts to isolate individual nibbles of the 32-bit number. Code can do this in O(lg N) steps.

Concept #2

Wrinkle for conversion to hexadecimal characters is that the code has to map a hexadecimal nibble (4 bits) into 2 different ranges (0x30-0x39, ‘0’-‘9’) and (0x41-0x46, ‘A”-‘F’ or the lower hex digit equivalent, 0x61-0x66, ‘a’-‘f’)

Code can add 6 to each isolated nibble per byte. If the nibble addition overflows to the higher nibble, then that nibble is in the alphanumeric hexadecimal character range. Shift the overflow carry bit into the low bit of the byte and we’ve created a mask for all 8 bytes in a 64-bit register.

uint32_t UlongToHexString(uint64_t num, char *s, bool lowerAlpha)
	uint64_t x = num;

	// use bitwise-ANDs and bit-shifts to isolate
	// each nibble into its own byte
	// also need to position relevant nibble/byte into
	// proper location for little-endian copy
	x = ((x & 0xFFFF) << 32) | ((x & 0xFFFF0000) >> 16);
	x = ((x & 0x0000FF000000FF00) >> 8) | (x & 0x000000FF000000FF) << 16;
	x = ((x & 0x00F000F000F000F0) >> 4) | (x & 0x000F000F000F000F) << 8;

	// Now isolated hex digit in each byte
	// Ex: 0x1234FACE => 0x0E0C0A0F04030201

	// Create bitmask of bytes containing alpha hex digits
	// - add 6 to each digit
	// - if the digit is a high alpha hex digit, then the addition
	//   will overflow to the high nibble of the byte
	// - shift the high nibble down to the low nibble and mask
	//   to create the relevant bitmask
	// Using above example:
	// 0x0E0C0A0F04030201 + 0x0606060606060606 = 0x141210150a090807
	// >> 4 == 0x0141210150a09080 & 0x0101010101010101
	// == 0x0101010100000000
	uint64_t mask = ((x + 0x0606060606060606) >> 4) & 0x0101010101010101;

	// convert to ASCII numeral characters
	x |= 0x3030303030303030;

	// if there are high hexadecimal characters, need to adjust
	// for uppercase alpha hex digits, need to add 0x07
	//   to move 0x3A-0x3F to 0x41-0x46 (A-F)
	// for lowercase alpha hex digits, need to add 0x27
	//   to move 0x3A-0x3F to 0x61-0x66 (a-f)
	// it's actually more expensive to test if mask non-null
	//   and then run the following stmt
	x += ((lowerAlpha) ? 0x27 : 0x07) * mask;

	//copy string to output buffer
	*(uint64_t *)s = x;

	return 0;

I’ll use a modified benchmark harness from Fast unsigned integer to string to test for hexadecimal digit groups.

But we’ll need other alternative solutions.

Easiest one is to lean on the standard library:

uint32_t stdlibHexString(uint64_t num, char *s, bool lowerAlpha)
	sprintf_s(s, 9, (lowerAlpha) ? "%8.8x" : "8.8X", (uint32_t)num);
	return 0;

Next is a simple naive solution:

uint32_t naiveHexString(uint64_t num, char *s, bool lowerAlpha)
	uint32_t x = (uint32_t)num;
	int i = 7;
	while (i >= 0)
		int digit = (x & 0x0F);
		uint8_t ch = digit + '0';
		if (ch > '9')
			ch += (lowerAlpha) ? 0x27 : 7;

		s[i] = ch;
		x >>= 4;
		i -= 1;
	return 0;

Finally, I think a lookup-table-based solution would work.

It turns out that checking each char to see if it’s an alphabetic hexadecimal character is expensive timewise, so using a separate lookup table for lowercase hex digits is faster.

uint32_t lutHexString(uint64_t num, char *s, bool lowerAlpha)
	static const char digits[513] =
	static const char digitsLowerAlpha[513] =
	uint32_t x = (uint32_t)num;
	int i = 3;
	char *lut = (char *)((lowerAlpha) ? digitsLowerAlpha : digits);
	while (i >= 0)
		int pos = (x & 0xFF) * 2;
		char ch = lut[pos];
		s[i * 2] = ch;

		ch = lut[pos + 1];
		s[i * 2 + 1] = ch;

		x >>= 8;
		i -= 1;

	return 0;

Get ready, set, GO!

Here are the results:

Solution Performance

The standard library solution was the slowest, but it’s a lot more flexible.

One can tell that the naive solution is O(N) from the nice slope of the execution time.

My solution has a nice mostly-flat O(lg N) execution slope. It’s about 8% slower than the fastest solution, but it doesn’t require the 1KB of lookup tables.

The lookup table solution turned out to be the fastest - space vs speed tradeoff in play.

Written on December 9, 2016