If you are a programmer with a history similar to my own
then you have seen enough to know vaguely that the
>> operators in many languages shift bits
(usually in integers)
around and that there are some techniques to isolate certain blocks of bits out for reading,
but you haven’t actually tried these techniques.
Let’s fix that by example.
Bitshifting is actually pretty simple. Take the 32 bit unsigned integer 232 - 1, which is represented by all ones. I have separated it into octets (8-bit bytes).
Let’s represent this in some C code:
It is convenient to represent this value using hex notation, since there is a 1:1 mapping between hex and the resultant bits. This is the same value from above, just written in hexadecimal notation instead of binary.
Here is a quick table to convert hex to binary, in case you are unfamiliar or out of practice:
Now let’s bitshift this left 4 bits:
which results in:
1 were a buffalo and each
0 a plot of empty land,
then we just forced 4 buffalo off of the left hand cliff
in order to make more empty land on the right hand side.
The rule is simple: when you bitshift left, shift all the bits to the left and pad the right with zeros. For some reason this makes me think of hunting by buffalo jump.
Bitshifting right is much like bitshifting left–in fact, it is identical unless you are using signed integers, in which case things get implementation-specific in some languages (basically, sometimes you get padded with the sign-bit instead of always 0).
Today, let’s just worry about unsigned integers.
Continuing with our result from before, a small bitshift right
Notice that since
field is unsigned we pad with a
0 on the left,
and that everything else has been shifted right by one location.
Let’s look at a time when all this bitshifting might come in handy.
Universally Unique IDentifiers are 128-bit identifiers that are often displayed in 36-character strings rendered thusly:
But what do UUIDs have to do with bitshifting?
As it happens, there is an RFC to standardize UUIDs and set certain bits in order to identify a given 128 bit object as a UUID and to specify which variant it is.
The UUID specification breaks apart integers into fixed-width fields within each integer and assigns different meanings to each field.
The RFC gives us this diagram showing each field and is split into 4 vertically-stacked 32-bit ints:
It’s time to use our new skills. Let’s try reading time_hi_and_version since it’s one of the more complicated fields.
Simple field-reading by double bitshifting
A simple way to read such a field is to first “shift off” all of the bits we don’t care about that are to the left of the field of interest, and then to shift that same field as far right as we can go without shifting that field “off the edge.” This technique leaves the field we care about in the least significant bits, where it can be easily interpreted as an integer.
Let’s take the second 32-bit integer, the one that includes time_mid and time_hi_and_version.
We can extract the version field out of this. The RFC describes this field:
The version number is in the most significant 4 bits of the time stamp (bits 4 through 7 of the time_hi_and_version field).
Counting from most to least significant (left to right) we can see that those are bits 20 through 23. We should first shift out the 19 most significant bits we don’t care about:
leaving us with
Now we want to remove everything but the 4 bits we care about:
Looking this up in the table given in the RFC, we see that this is not a valid version for a UUID! The first bit must be zero, but we can clearly see that it is not.
It turns out that bitshifting isn’t as hard as it seems. This simple technique won’t cover every case (in particular, bitmasking may be useful in related situations) but it is a tool that is occasionally useful to have.