


Question

Inclusion-Exclusion problem: How many bit strings of length 15 have bits 1, 2, and 3 equal to 101, or have bits 12, 13, 14, and 15 equal to 1001 or have bits 3, 4, 5, and 6 equal to 1010?
Hint: The fact that the third bit appears in two of the required patterns means some special care will be needed to get the count correct.

Explanation / Answer
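For the count the question asks about, let A, B, and C be the sets of strings that match the first, second, and third pattern. A pattern (or a compatible combination of patterns) that fixes k bits leaves 2^(15-k) strings. The hint's concern is bit 3: patterns 101 and 1010 both fix it, and both require it to be 1, so A ∩ C is non-empty and fixes bits 1 through 6. Inclusion-exclusion then gives |A ∪ B ∪ C| = 2^12 + 2^11 + 2^11 - 2^8 - 2^9 - 2^7 + 2^5 = 4096 + 2048 + 2048 - 256 - 512 - 128 + 32 = 7328. The short Python sketch below cross-checks that figure with a brute-force scan of all 2^15 strings; the helper names are illustrative only.

```python
from itertools import product

# Illustrative helpers; bits[0] is "bit 1" in the question's numbering
# and bits[14] is "bit 15".
def matches(bits: str) -> bool:
    return (bits[0:3] == "101"         # bits 1, 2, 3
            or bits[11:15] == "1001"   # bits 12, 13, 14, 15
            or bits[2:6] == "1010")    # bits 3, 4, 5, 6

def brute_force_count() -> int:
    # Try all 2**15 = 32768 strings and count the ones matching any pattern.
    return sum(matches("".join(p)) for p in product("01", repeat=15))

def inclusion_exclusion_count() -> int:
    # A set of constraints fixing k bits leaves 2**(15 - k) strings.
    a, b, c = 2**12, 2**11, 2**11
    ab = 2**8    # bits 1-3 and 12-15 are disjoint: 7 fixed bits
    ac = 2**9    # both need bit 3 = 1, so they agree: bits 1-6 fixed
    bc = 2**7    # bits 3-6 and 12-15 are disjoint: 8 fixed bits
    abc = 2**5   # bits 1-6 and 12-15 fixed: 10 fixed bits
    return a + b + c - ab - ac - bc + abc

print(brute_force_count(), inclusion_exclusion_count())  # prints: 7328 7328
```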

A common bit operation is inserting a bit string into an operand or extracting a bit string from an operand. Previous chapters in this text have provided simple examples of packing and unpacking such data; now it is time to formally describe how to do this. For the purposes of the current discussion, we will assume that we're dealing with bit strings, that is, contiguous sequences of bits. A little later in this chapter we'll take a look at how to extract and insert bit sets in an operand. Another simplification we'll make is that the bit string fits completely within a byte, word, or double word operand. Large bit strings that cross object boundaries require additional processing; a discussion of bit strings that cross double word boundaries appears later in this section.

A bit string has two attributes that we must consider when packing and unpacking it: a starting bit position and a length. The starting bit position is the bit number of the L.O. bit of the string within the larger operand. The length, of course, is the number of bits in the bit string.

To insert (pack) data into a destination operand, we will assume that we start with a bit string of the appropriate length that is right-justified (i.e., starts in bit position zero) in an operand and is zero extended to eight, sixteen, or thirty-two bits. The task is to insert this data at the appropriate starting position in some other operand that is eight, sixteen, or thirty-two bits wide. There is no guarantee that the destination bit positions contain any particular value. The first two steps (which can occur in any order) are to clear out the corresponding bits in the destination operand and to shift (a copy of) the bit string so that its L.O. bit begins at the appropriate bit position. After completing these two steps, the third step is to OR the shifted result with the destination operand. This inserts the bit string into the destination operand.
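The three steps just described (clear, shift, OR) can be modelled in a few lines of Python. This is a sketch rather than the book's code: `insert_bits` is a hypothetical helper, and because Python integers are unbounded, the `width` parameter stands in for the size of the destination operand.

```python
def insert_bits(dest: int, src: int, start: int, length: int, width: int = 32) -> int:
    """Insert the right-justified, zero-extended `length`-bit value `src`
    into `dest` so that its L.O. bit lands at bit position `start`."""
    word_mask = (1 << width) - 1              # models an 8-, 16-, or 32-bit operand
    field = ((1 << length) - 1) << start      # ones where the bit string goes
    cleared = dest & ~field & word_mask       # step 1: clear the destination bits
    shifted = (src << start) & field          # step 2: shift the source into place
                                              # (the & field guards against stray bits)
    return cleared | shifted                  # step 3: OR the two together

print(hex(insert_bits(0xFFFF, 0b1010, 5, 4, 16)))  # prints: 0xff5f
```

For a four-bit field at bit five of a 16-bit operand, the cleared-field mask that step 1 uses is `~field & 0xFFFF = 0xFE1F`, i.e. %1111_1110_0001_1111.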
Figure 5.3 Inserting a Bit String Into a Destination Operand

It only takes three instructions to insert a bit string of known length into a destination operand. The following three instructions demonstrate how to handle the insertion operation in Figure 5.3 (here, a four-bit field starting at bit position five of a 16-bit operand). These instructions assume that the source operand is in BX and the destination operand is in AX:

        shl( 5, bx );                       // Move the bit string into position.
        and( %1111_1110_0001_1111, ax );    // Clear bits 5..8 of the destination.
        or( bx, ax );                       // Merge the bit string into AX.

If the length and the starting position aren't known when you're writing the program (that is, you have to calculate them at run time), then bit string insertion is a little more difficult. However, with the use of a lookup table it's still an easy operation to accomplish. Let's assume that we have two eight-bit values: a starting bit position for the field we're inserting and a non-zero eight-bit length value. Also assume that the source operand is in EBX and the destination operand is in EAX. The code to insert one operand into another could take the following form:

    readonly

        // The index into the following table specifies the length of the
        // bit string; each entry holds that many right-justified one bits.

        MaskByLen: dword[ 33 ] :=
            [
                0, $1, $3, $7, $f, $1f, $3f, $7f,
                $ff, $1ff, $3ff, $7ff, $fff, $1fff, $3fff, $7fff, $ffff,
                $1_ffff, $3_ffff, $7_ffff, $f_ffff,
                $1f_ffff, $3f_ffff, $7f_ffff, $ff_ffff,
                $1ff_ffff, $3ff_ffff, $7ff_ffff, $fff_ffff,
                $1fff_ffff, $3fff_ffff, $7fff_ffff, $ffff_ffff
            ];
            .
            .
            .
        movzx( Length, edx );
        mov( MaskByLen[ edx*4 ], edx );     // Fetch Length one bits.
        mov( StartingPosition, cl );
        shl( cl, edx );                     // Align the mask with the field.
        not( edx );                         // Zeros in the field, ones elsewhere.
        shl( cl, ebx );                     // Align the source with the field.
        and( edx, eax );                    // Clear the field in the destination.
        or( ebx, eax );                     // Insert the new bits.

Each entry in the MaskByLen table contains the number of one bits specified by the index into the table, so the table needs 33 entries (lengths 0 through 32). Using the Length value as an index into this table fetches a value that has as many one bits as the Length value. The code above fetches an appropriate mask, shifts it to the left so that the L.O. bit of this run of ones matches the starting position of the field into which we want to insert the data, then inverts the mask and uses the inverted value to clear the appropriate bits in the destination operand.

Extracting a bit string from a larger operand is just as easy as inserting a bit string into some larger operand. All you've got to do is mask out the unwanted bits and then shift the result until the L.O. bit of the bit string is in bit zero of the destination operand. For example, to extract the four-bit field starting at bit position five in EBX and leave the result in EAX, you could use the following code:

        mov( ebx, eax );            // Copy data to destination.
        and( %1_1110_0000, eax );   // Strip unwanted bits.
        shr( 5, eax );              // Right justify to bit position zero.

If you do not know the bit string's length and starting position when you're writing the program, you can still extract the desired bit string. The code is very similar to insertion (though a tiny bit simpler). Assuming you have the Length and StartingPosition values we used when inserting a bit string, you can extract the corresponding bit string using the following code (assuming source=EBX and dest=EAX):

        movzx( Length, edx );
        mov( MaskByLen[ edx*4 ], edx );     // Mask with Length one bits.
        mov( StartingPosition, cl );
        mov( ebx, eax );
        shr( cl, eax );                     // Right justify the field...
        and( edx, eax );                    // ...and strip the bits above it.

The examples up to this point all assume that the bit string appears completely within a double word (or smaller) object. This will always be the case if the bit string is less than or equal to 24 bits in length. However, if the length of the bit string plus its starting position (mod eight) within an object is greater than 32, then the bit string will cross a double word boundary within the object.
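The table-driven insertion and extraction sequences can be mirrored in Python to check them. `MASK_BY_LEN`, `insert_field`, and `extract_field` are illustrative names, not part of any library, and the sketch assumes what the text assumes: `start + length` does not exceed 32, and the source value is already right-justified and zero extended.

```python
# Stand-in for the MaskByLen table: entry n holds n one bits, so the table
# needs 33 entries (lengths 0 through 32).
MASK_BY_LEN = [(1 << n) - 1 for n in range(33)]
DWORD = 0xFFFF_FFFF

def insert_field(dest: int, src: int, start: int, length: int) -> int:
    mask = (MASK_BY_LEN[length] << start) & DWORD   # mov( MaskByLen[...] ); shl( cl, edx )
    mask = ~mask & DWORD                            # not( edx )
    src = (src << start) & DWORD                    # shl( cl, ebx )
    return (dest & mask) | src                      # and( edx, eax ); or( ebx, eax )

def extract_field(src: int, start: int, length: int) -> int:
    return (src >> start) & MASK_BY_LEN[length]     # shr( cl, eax ); and( edx, eax )

value = insert_field(0xFFFF_FFFF, 0b0110, 10, 4)
print(hex(value), extract_field(value, 10, 4))      # prints: 0xffffdbff 6
```

Extraction is the simpler of the two because the mask never needs inverting: shifting first and masking second means the same right-justified table entry works for every starting position.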
Extracting a bit string that crosses a double word boundary requires up to three operations: one operation to extract the start of the bit string (up to the first double word boundary), an operation that copies whole double words (assuming the bit string is so long that it consumes several double words), and a final operation that copies the leftover bits in the last double word at the end of the bit string. The actual implementation of this operation is left as an exercise at the end of this volume.

5.6 Coalescing Bit Sets and Distributing Bit Strings

Inserting and extracting bit sets is little different from inserting and extracting bit strings if the "shape" of the bit set you're inserting (or the resulting bit set you're extracting) is the same as the bit set in the main object. The "shape" of a bit set is the distribution of the bits in the set, ignoring the starting bit position of the set. So a bit set that includes bits zero, four, five, six, and seven has the same shape as a bit set that includes bits 12, 16, 17, 18, and 19, since the distribution of the bits is the same. The code to insert or extract this bit set is nearly identical to that of the previous section; the only difference is the mask value you use. For example, to insert this bit set starting at bit number zero in EAX into the corresponding bit set starting at position 12 in EBX (assuming EAX contains no bits outside the bit set), you could use the following code:

        and( !%1111_0001_0000_0000_0000, ebx );  // Clear the destination bits.
        shl( 12, eax );                          // Move source bits into position.
        or( eax, ebx );                          // Merge the bit set into EBX.

However, suppose you have five bits in bit positions zero through four in EAX and you want to merge them into bits 12, 16, 17, 18, and 19 in EBX. Somehow you've got to distribute the bits in EAX prior to logically ORing the values into EBX.
Given the fact that this particular bit set has only two runs of one bits, the process is somewhat simplified. The following code achieves this in a somewhat sneaky fashion:

        and( !%1111_0001_0000_0000_0000, ebx );  // Clear the destination bits.
        shl( 2, eax );      // Spread out the bits: 1..4 go to 3..6 and 0 to 2.
        btr( 2, eax );      // Bit 2 -> carry, then clear bit 2.
        rcl( 13, eax );     // Carry -> bit 12, bits 3..6 -> bits 16..19.
        or( eax, ebx );     // Merge the bit set into EBX.

(RCL rotates the carry into bit zero on its first step, so after a 13-bit rotate the former carry ends up in bit 12.) This trick with the BTR (bit test and reset) instruction worked well because we only had one bit out of place in the original source operand. Alas, had the bits all been in the wrong location relative to one another, this scheme might not have worked quite as well. We'll see a more general solution in just a moment.

Extracting this bit set and collecting ("coalescing") the bits into a bit string is not quite as easy. However, there are still some sneaky tricks we can pull. Consider the following code that extracts the bit set from EBX and places the result into bits 0..4 of EAX:

        mov( ebx, eax );
        and( %1111_0001_0000_0000_0000, eax );  // Strip unwanted bits.
        shr( 5, eax );                          // Put bit 12 into bit 7, etc.
        shr( 3, ah );                           // Move bits 11..14 to 8..11.
        shr( 7, eax );                          // Move down to bit zero.

This code moves (original) bit 12 into bit position seven, the H.O. bit of AL. At the same time, it moves bits 16..19 down to bits 11..14 (bits 3..6 of AH). Then the code shifts bits 3..6 of AH down to bits 0..3 of AH. This positions the H.O. bits of the bit set so that they are adjacent to the bit left in AL. Finally, the code shifts all the bits down to bit zero. Again, this is not a general solution, but it shows a clever way to attack this problem if you think about it carefully.

The problem with the coalescing and distribution algorithms above is that they are not general. They apply only to their specific bit sets. In general, specific solutions are going to provide the most efficient solution.
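To check hand-tuned sequences like the two above, it helps to have a reference model of what "distribute" and "coalesce" mean for an arbitrary mask. The Python sketch below is such a model (every name in it is hypothetical); it also emulates a shl( 2 ) / btr( 2 ) / rcl( 13 ) distribution sequence and the shr / shr( 3, ah ) / shr coalescing sequence so their results can be compared against the model. Note that an n-bit RCL leaves the original carry in bit n - 1, which is why reaching bit 12 takes a 13-bit rotate. For reference, x86 processors that implement the BMI2 extension perform these two general operations in hardware with the PDEP and PEXT instructions.

```python
SHAPE = 0b1111_0001_0000_0000_0000   # bits 12, 16, 17, 18, 19
DWORD = 0xFFFF_FFFF

def distribute(value: int, mask: int) -> int:
    """Deposit the L.O. bits of value into the one bits of mask, lowest first."""
    result, pos = 0, 0
    while mask:
        lowest = mask & -mask           # isolate the lowest remaining mask bit
        if (value >> pos) & 1:
            result |= lowest
        pos += 1
        mask &= mask - 1                # remove that bit from the mask
    return result

def coalesce(value: int, mask: int) -> int:
    """Gather the bits of value selected by mask into the L.O. bits."""
    result, pos = 0, 0
    while mask:
        lowest = mask & -mask
        if value & lowest:
            result |= 1 << pos
        pos += 1
        mask &= mask - 1
    return result

def rcl32(value: int, carry: int, count: int):
    """Rotate the 33-bit quantity carry:value left through the carry flag."""
    for _ in range(count):
        carry, value = (value >> 31) & 1, ((value << 1) | carry) & DWORD
    return value, carry

def distribute_trick(eax: int) -> int:
    eax = (eax << 2) & DWORD             # shl( 2, eax )
    carry = (eax >> 2) & 1               # btr( 2, eax ): bit 2 -> carry...
    eax &= ~(1 << 2)                     # ...and clear bit 2
    eax, _ = rcl32(eax, carry, 13)       # rcl( 13, eax )
    return eax

def coalesce_trick(ebx: int) -> int:
    eax = ebx & SHAPE                    # mov( ebx, eax ); and( SHAPE, eax )
    eax >>= 5                            # shr( 5, eax )
    ah = (eax >> 8) & 0xFF
    eax = (eax & ~0xFF00) | ((ah >> 3) << 8)   # shr( 3, ah )
    return eax >> 7                      # shr( 7, eax )
```

Running `distribute_trick(v)` and `distribute(v, SHAPE)` over all 32 values of a five-bit `v` is a cheap way to confirm that a tuned sequence really produces the intended shape.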
A generalized solution (perhaps one that lets you specify a mask and then distributes or coalesces the bits accordingly) is going to be a bit more difficult. The following code demonstrates how to distribute the bits in a bit string according to the values in a bit mask:

    // EAX- Originally contains some value into which we insert bits from EBX.
    // EBX- L.O. bits contain the values to insert into EAX.
    // EDX- Bitmap with ones indicating the bit positions in EAX to insert.
    // CL-  Scratchpad register.

        mov( 32, cl );      // Count # of bits we rotate.
        jmp DistLoop;

    CopyToEAX:
        rcr( 1, ebx );      // Don't use SHR here, must preserve Z-flag.
        rcr( 1, eax );
        jz Done;

    DistLoop:
        dec( cl );
        shr( 1, edx );
        jc CopyToEAX;
        ror( 1, eax );      // Keep current bit in EAX.
        jnz DistLoop;

    Done:
        ror( cl, eax );     // Reposition remaining bits.

In the code above, if we load EDX with %1100_1001, then this code will copy bits 0..3 to bits 0, 3, 6, and 7 in EAX. Notice the short-circuit test that checks to see if we've exhausted the values in EDX (by checking for a zero in EDX). Note that the rotate instructions do not affect the zero flag while the shift instructions do. Hence the SHR instruction above will set the zero flag when there are no more bits to distribute (i.e., when EDX becomes zero).

The general algorithm for coalescing bits is a tad more efficient than distribution. Here's the code that will extract bits from EBX via the bit mask in EDX and leave the result in EAX:

    // EAX- Destination register.
    // EBX- Source register.
    // EDX- Bitmap with ones representing bits to copy to EAX.
    // EBX and EDX are not preserved.

        sub( eax, eax );    // Clear destination register.
        jmp ShiftLoop;

    ShiftInEAX:
        rcl( 1, ebx );      // Up here we need to copy a bit from
        rcl( 1, eax );      // EBX to EAX.

    ShiftLoop:
        shl( 1, edx );      // Check mask to see if we need to copy a bit.
        jc ShiftInEAX;      // If carry set, go copy the bit.
        rcl( 1, ebx );      // Current bit is uninteresting, skip it.
        jnz ShiftLoop;      // Repeat as long as there are bits in EDX.

This sequence takes advantage of one sneaky trait of the shift and rotate instructions: the shift instructions affect the zero flag while the rotate instructions do not. Therefore, the "shl( 1, edx );" instruction sets the zero flag when EDX becomes zero (after the shift). If the carry flag was also set, the code will make one additional pass through the loop in order to shift a bit into EAX, but the next time the code shifts EDX one bit to the left, EDX is still zero and so the carry will be clear. On this iteration, the code falls out of the loop.

Another way to coalesce bits is via table lookup. By grabbing a byte of data at a time (so your tables don't get too large), you can use that byte's value as an index into a lookup table that coalesces all the bits down to bit zero. Finally, you can merge the bits at the low end of each byte together. This might produce a more efficient coalescing algorithm in certain cases. The implementation is left to the reader.

5.7 Packed Arrays of Bit Strings

Although it is far more efficient to create arrays whose elements have an integral number of bytes, it is quite possible to create arrays of elements whose size is not a multiple of eight bits. The drawback is that calculating the "address" of an array element and manipulating that array element involves a lot of extra work. In this section we'll take a look at a few examples of packing and unpacking array elements in an array whose elements are an arbitrary number of bits long.

Before proceeding, it's probably worthwhile to discuss why you would want to bother with arrays of bit objects. The answer is simple: space. If an object only consumes three bits, you can get 2.67 times (8/3) as many elements into the same space if you pack the data rather than allocating a whole byte for each object. For very large arrays, this can be a substantial savings.
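An array of elements that are an arbitrary number of bits long can be sketched in Python on top of a bytearray. The class below is a hypothetical helper, not code from the text: it locates element i at byte (i * bits) // 8 and bit offset (i * bits) % 8, then reads or writes a 16-bit little-endian window, which limits it to elements of at most nine bits.

```python
class PackedArray:
    """Hypothetical helper: n elements of `bits` bits each (1..9),
    packed L.O. bit first into a bytearray."""

    def __init__(self, n: int, bits: int):
        if not 1 <= bits <= 9:
            raise ValueError("a 16-bit window only covers elements of 1..9 bits")
        self.bits = bits
        self.mask = (1 << bits) - 1
        # Two spare bytes so the 16-bit window never runs past the end,
        # even when n * bits is not a multiple of eight.
        self.data = bytearray((n * bits) // 8 + 2)

    def _locate(self, i: int):
        bit = i * self.bits               # element address in bits (base = 0)
        return bit // 8, bit % 8          # byte of 1st bit, offset to 1st bit

    def _window(self, byte: int) -> int:
        return self.data[byte] | (self.data[byte + 1] << 8)

    def get(self, i: int) -> int:
        byte, offset = self._locate(i)
        return (self._window(byte) >> offset) & self.mask

    def set(self, i: int, value: int) -> None:
        byte, offset = self._locate(i)
        clear = ~(self.mask << offset) & 0xFFFF   # zeros where the element goes
        word = (self._window(byte) & clear) | ((value & self.mask) << offset)
        self.data[byte] = word & 0xFF
        self.data[byte + 1] = word >> 8
```

For 200 three-bit elements this allocates 77 bytes instead of 200, and `set` rewrites only the element's own bits, leaving its neighbours intact.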
Of course, the cost of this space savings is speed: you've got to execute extra instructions to pack and unpack the data, thus slowing down access to the data.

The calculation for locating the bit offset of an array element in a large block of bits is almost identical to the standard array access; it is

    Element_Address_in_bits = Base_address_in_bits + index * element_size_in_bits

Once you calculate the element's address in bits, you need to convert it to a byte address (since we have to use byte addresses when accessing memory) and extract the specified element. Because the base address of an array (almost) always starts on a byte boundary, we can use the following equations to simplify this task:

    Byte_of_1st_bit   = Base_Address + (index * element_size_in_bits) / 8
    Offset_to_1st_bit = (index * element_size_in_bits) % 8      (note "%" = MOD)

For example, suppose we have an array of 200 three-bit objects that we declare as follows:

    static
        AO3Bobjects: byte[ (200*3)/8 + 1 ];     // "+1" handles truncation.

The constant expression in the dimension above reserves space for enough bytes to hold 600 bits (200 elements, each three bits long). As the comment notes, the expression adds an extra byte at the end to ensure we don't lose any odd bits (that won't happen in this example since 600 is evenly divisible by 8, but in general you can't count on this; one extra byte usually won't hurt things). Now suppose you want to access the ith three-bit element of this array. You can extract these bits by using the following code:

    // Extract the ith group of three bits in AO3Bobjects and leave this value
    // in EAX.

        sub( ecx, ecx );        // Put the bit offset mod 8 here.
        mov( i, eax );          // Get the index into the array.
        intmul( 3, eax );       // Convert the index into a bit offset (i*3).
        shrd( 3, eax, ecx );    // Bit offset mod 8 -> H.O. three bits of ECX.
        shr( 3, eax );          // Bit offset / 8 -> EAX (SHRD doesn't modify EAX).
        rol( 3, ecx );          // Put remainder into L.O. three bits of ECX.

    // Okay, fetch the word containing the three bits we want to extract.
    // We have to fetch a word because the last bit or two could wind up
    // crossing the byte boundary (i.e., bit offset six and seven in the
    // byte).

        movzx( (type word AO3Bobjects[eax]), eax );
        shr( cl, eax );         // Move bits down to bit zero.
        and( %111, eax );       // Remove the other bits.

Inserting an element into the array is a bit more difficult. In addition to computing the base address and bit offset of the array element, you've also got to create a mask to clear out the bits in the destination where you're going to insert the new data. The following code inserts the L.O. three bits of AX into the ith element of the AO3Bobjects array:

    // Insert the L.O. three bits of AX into the ith element of AO3Bobjects:

    readonly
        Masks: word[8] :=
            [
                !%0000_0111,    !%0000_1110,    !%0001_1100,    !%0011_1000,
                !%0111_0000,    !%1110_0000,    !%1_1100_0000,  !%11_1000_0000
            ];
            .
            .
            .
        sub( ecx, ecx );                // Put remainder here.
        mov( i, ebx );                  // Get the index into the array.
        intmul( 3, ebx );               // Convert the index into a bit offset (i*3).
        shrd( 3, ebx, ecx );            // Bit offset mod 8 -> H.O. bits of ECX.
        shr( 3, ebx );                  // Bit offset / 8 -> EBX.
        rol( 3, ecx );                  // Remainder into L.O. three bits of ECX.

        and( %111, ax );                // Clear unneeded bits from AX.
        mov( Masks[ ecx*2 ], dx );      // Mask to clear out our array element
                                        // (Masks holds words, so scale by 2).
        and( (type word AO3Bobjects[ ebx ]), dx );  // Grab the bits and clear
                                                    // those we're inserting.
        shl( cl, ax );                  // Put our three bits in their proper location.
        or( ax, dx );                   // Merge bits into destination.
        mov( dx, (type word AO3Bobjects[ ebx ]) );  // Store back into memory.

Notice the use of a lookup table to generate the masks needed to clear out the appropriate position in the array. Each element of this array contains all ones except for three zeros in the position we need to clear for a given bit offset (note the use of the "!" operator to invert the constants in the table).

5.8 Searching for a Bit

A very common bit operation is to locate the end of some run of bits. A common special case of this operation is to locate the first (or last) set or clear bit in a 16- or 32-bit value. In this section we'll explore ways to accomplish this.
Before describing how to search for the first or last bit of a given value, perhaps it's wise to discuss exactly what the terms "first" and "last" mean in this context. The term "first set bit" means the first bit in a value, scanning from bit zero towards the high-order bit, that contains a one. A similar definition exists for the "first clear bit." The "last set bit" is the first bit in a value, scanning from the high-order bit towards bit zero, that contains a one. A similar definition exists for the "last clear bit."

One obvious way to scan for the first or last bit is to use a shift instruction in a loop and count the number of iterations before you shift a one (or zero) out into the carry flag. The number of iterations specifies the position. Here's some sample code that checks for the first set bit in EAX and returns that bit position in ECX:

        mov( -32, ecx );    // Count off the bit positions in ECX.
    TstLp:
        shr( 1, eax );      // Check to see if current bit position contains
        jc Done;            // a one; exit loop if it does.
        inc( ecx );         // Bump up our bit counter by one.
        jnz TstLp;          // Exit if we execute this loop 32 times.
    Done:
        add( 32, ecx );     // Adjust all of ECX (not just CL) so it holds the bit posn.

    // At this point, ECX contains the bit position of the first set bit.
    // ECX contains 32 if EAX originally contained zero (no set bits).

The only tricky thing about this code is the fact that it runs the loop counter from -32 up to zero rather than from 32 down to zero. This makes it slightly easier to calculate the bit position once the loop terminates. The drawback to this particular loop is that it's expensive: it repeats as many as 32 times, depending on the original value in EAX. If the values you're checking often have lots of zeros in the L.O. bits of EAX, this code runs rather slowly.

Searching for the first (or last) set bit is such a common operation that Intel added a couple of instructions on the 80386 specifically to accelerate this process.
These instructions are BSF (bit scan forward) and BSR (bit scan reverse). Their syntax is as follows:

        bsr( source, destReg );
        bsf( source, destReg );

The source and destination operands must be the same size, and they must both be 16- or 32-bit objects. The destination operand has to be a register; the source operand can be a register or a memory location.

The BSF instruction scans for the first set bit (starting from bit position zero) in the source operand. The BSR instruction scans for the last set bit in the source operand by scanning from the H.O. bit towards the L.O. bit. If these instructions find a bit that is set in the source operand, then they clear the zero flag and put the bit position into the destination register. If the source operand contains zero (i.e., there are no set bits), then these instructions set the zero flag and leave an indeterminate value in the destination register. Note that you should test the zero flag immediately after the execution of these instructions to validate the destination register's value. Examples:

        mov( SomeValue, ebx );  // Value whose bits we want to check.
        bsf( ebx, eax );        // Put position of first set bit in EAX.
        jz NoBitsSet;           // Branch if SomeValue contains zero.
        mov( eax, FirstBit );   // Save location of first set bit.
            .
            .
            .

You use the BSR instruction in an identical fashion except that it computes the bit position of the last set bit in an operand (that is, the first set bit it finds when scanning from the H.O. bit towards the L.O. bit).

The 80x86 CPUs do not provide instructions to locate the first bit containing a zero. However, you can easily scan for a zero bit by first inverting the source operand (or a copy of the source operand if you must preserve the source operand's value). If you invert the source operand, then the first "1" bit you find corresponds to the first zero bit in the original operand value.
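The BSF/BSR semantics just described can be modelled in Python, with `None` standing in for the case where the zero flag is set and the destination register is undefined. `bsf`, `bsr`, and `first_clear_bit` are illustrative names for 32-bit operands: `bsf` uses the same shift-and-test idea as the counting loop earlier in this section, and `first_clear_bit` uses the inversion trick.

```python
DWORD = 0xFFFF_FFFF

def bsf(value: int):
    """Position of the lowest set bit of a 32-bit operand, or None if it is zero."""
    value &= DWORD
    if value == 0:
        return None                       # ZF set; destination undefined
    position = 0
    while ((value >> position) & 1) == 0:  # shift-and-test, one bit at a time
        position += 1
    return position

def bsr(value: int):
    """Position of the highest set bit of a 32-bit operand, or None if it is zero."""
    value &= DWORD
    return value.bit_length() - 1 if value else None

def first_clear_bit(value: int):
    """No instruction scans for zeros, so invert a copy and scan for a one."""
    return bsf(~value & DWORD)
```

Calling code should treat a `None` result exactly as the assembly code treats ZF=1: check for it before using the position.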
The BSF and BSR instructions are complex instructions (i.e., they are not a part of the 80x86 "RISC core" instruction set). Therefore, these instructions are not necessarily as fast as other instructions. Indeed, in some circumstances it may be faster to locate the first set bit using discrete instructions. However, since the execution time of these instructions varies widely from CPU to CPU, you should test the performance of these instructions before using them in time-critical code. Note that the BSF and BSR instructions do not affect the source operand.

A common operation is to extract the first (or last) set bit you find in some operand. That is, you might want to clear the bit once you find it. If the source operand is a register (or you can easily move it into a register), then you can use the BTR (or BTC) instruction to clear the bit once you've found it. Here's some code that achieves this result:

        bsf( eax, ecx );        // Locate first set bit in EAX.
        if( @nz ) then          // If we found a bit, clear it.

            btr( ecx, eax );    // Clear the bit we just found.

        endif;

At the end of this sequence, the zero flag indicates whether we found a bit (note that BTR does not affect the zero flag). Alternatively, you could add an ELSE section to the IF statement above that handles the case when the source operand (EAX) contains zero at the beginning of this instruction sequence.

Since the BSF and BSR instructions only support 16- and 32-bit operands, you will have to compute the first bit position of an eight-bit operand a little differently. There are a couple of reasonable approaches. First, of course, you can usually zero extend an eight-bit operand to 16 or 32 bits and then use the BSF or BSR instructions on this operand.
Another alternative is to create a lookup table where each entry contains the position of the first set bit in the value you use as an index into the table; then you can use the XLAT instruction to "compute" the first bit position in the value (note that you will have to handle the value zero as a special case). Another solution is to use the shift algorithm appearing at the beginning of this section; for an eight-bit operand, this is not an entirely inefficient solution.

One interesting use of the BSF and BSR instructions is to "fill in" a character set with all the values from the lowest-valued character in the set through the highest-valued character. For example, suppose a character set contains the values {'A', 'M', 'a'..'n', 'z'}; if we filled in the gaps in this character set, we would have the values {'A'..'z'}. To compute this new set, we can use BSF to determine the ASCII code of the first character in the set and BSR to determine the ASCII code of the last character in the set. After doing this, we can feed those two ASCII codes to the cs.rangeChar function to compute the new set.

You can also use the BSF and BSR instructions to determine the size of a run of bits, assuming that you have a single run of bits in your operand. Simply locate the first and last bits in the run (as above) and then compute the difference (plus one) of the two values. Of course, this scheme is only valid if there are no intervening zeros between the first and last set bits in the v