Bitshifting by example

Mike Solomon

If you are a programmer with a history similar to my own then you have seen enough to know vaguely that the << and >> operators in many languages shift bits (usually in integers) around and that there are some techniques to isolate certain blocks of bits out for reading, but you haven’t actually tried these techniques.

Let’s fix that by example.

Bitshifting

Bitshifting is actually pretty simple. Take the 32 bit unsigned integer 232 - 1, which is represented by all ones. I have separated it into octets (8-bit bytes).

                 0xffffffff
<-- most significant ... least significant -->
     11111111 11111111 11111111 11111111
      byte 1   byte 2   byte 3   byte 4

Let’s represent this in some C code:

uint32_t field = 0xffffffff;

It is convenient to represent this value using hex notation, since there is a 1:1 mapping between hex and the resultant bits. This is the same value from above, just written in hexadecimal notation instead of binary.

Here is a quick table to convert hex to binary, in case you are unfamiliar or out of practice:

0x0 0x1 0x2 0x3 0x4 0x5 0x6 0x7
0000 0001 0010 0011 0100 0101 0110 0111

0x8 0x9 0xa 0xb 0xc 0xd 0xe 0xf
1000 1001 1010 1011 1100 1101 1110 1111

Now let’s bitshift this left 4 bits:

field = field << 4; // equivalently, field <<= 4

which results in:

             0xfffffff0
11111111 11111111 11111111 11110000

If each 1 were a buffalo and each 0 a plot of empty land, then we just forced 4 buffalo off of the left hand cliff in order to make more empty land on the right hand side.

The rule is simple: when you bitshift left, shift all the bits to the left and pad the right with zeros. For some reason this makes me think of hunting by buffalo jump.

Bitshift right

Bitshifting right is much like bitshifting left–in fact, it is identical unless you are using signed integers, in which case things get implementation-specific in some languages (basically, sometimes you get padded with the sign-bit instead of always 0).

Today, let’s just worry about unsigned integers.

Continuing with our result from before, a small bitshift right

field = field >> 1;

results in

             0x7ffffff8
01111111 11111111 11111111 11111000

Notice that since field is unsigned we pad with a 0 on the left, and that everything else has been shifted right by one location.

UUIDs

Let’s look at a time when all this bitshifting might come in handy.

Universally Unique IDentifiers are 128-bit identifiers that are often displayed in 36-character strings rendered thusly:

63afa317-ef9e-4127-93a8-a8996d99d5ee

But what do UUIDs have to do with bitshifting?

As it happens, there is an RFC to standardize UUIDs and set certain bits in order to identify a given 128 bit object as a UUID and to specify which variant it is.

The UUID specification breaks apart integers into fixed-width fields within each integer and assigns different meanings to each field.

The RFC gives us this diagram showing each field and is split into 4 vertically-stacked 32-bit ints:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          time_low                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       time_mid                |         time_hi_and_version   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         node (2-5)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It’s time to use our new skills. Let’s try reading time_hi_and_version since it’s one of the more complicated fields.

Simple field-reading by double bitshifting

A simple way to read such a field is to first “shift off” all of the bits we don’t care about that are to the left of the field of interest, and then to shift that same field as far right as we can go without shifting that field “off the edge.” This technique leaves the field we care about in the least significant bits, where it can be easily interpreted as an integer.

Let’s take the second 32-bit integer, the one that includes time_mid and time_hi_and_version.

              0x11223344
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0001000100100010 0011001101000100   |
|     time_mid    |time_hi_and_version|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

We can extract the version field out of this. The RFC describes this field:

The version number is in the most significant 4 bits of the time stamp (bits 4 through 7 of the time_hi_and_version field).

Counting from most to least significant (left to right) we can see that those are bits 20 through 23. We should first shift out the 19 most significant bits we don’t care about:

field <<= 19;

leaving us with

              0x9a200000
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1001101000100000 0000000000000000   |
|     time_mid    |time_hi_and_version|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Now we want to remove everything but the 4 bits we care about:

field >>= 28;

yielding

              0x00000009
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0000000000000000 0000000000001001   |
|     time_mid    |time_hi_and_version|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Looking this up in the table given in the RFC, we see that this is not a valid version for a UUID! The first bit must be zero, but we can clearly see that it is not.

Summary

It turns out that bitshifting isn’t as hard as it seems. This simple technique won’t cover every case (in particular, bitmasking may be useful in related situations) but it is a tool that is occasionally useful to have.