Fog Creek Software
Discussion Board




Common subset algorithm design

There was a thread a while ago about challenging algorithm design, so I thought I'd throw this one out and see if anyone has any insights.

I have a collection of N bitmasks (fixed width) and I want to identify subsets of bits such that if one bit is set within a bitmask all other bits in that subset are set.

For example:

A: 1 0 1 0 0 1 0
B: 1 0 1 1 1 0 1
C: 0 1 0 1 1 0 1

(0 , 2) forms a subset, as does (3, 4,  6)

My first cut at this is highly iterative.  Basically try each pairwise combination that presents itself from each bitmask in turn.

When you find a pair that works remember the match and clear the second bit everywhere.

Keep iterating until you have considered all the bit masks without making a change.

Can anyone think of a way of speeding this up?  I'd like a way to identify the subsets proactively without trying every combination.

Rob Walker
Thursday, February 12, 2004

Python code:

def slice(position):
  result = 0
  for n in numbers:
    result = (result<<1) | ((n>>position)&1)
  return result

d = {}
for i in range(0,n_bits):
  x = slice(i)
  try: d[x].append(i)
  except KeyError: d[x] = [i]

subsets = d.values()

Explanation: We're looking for sets of identical columns. So we build columns using the "slice" function and put them into a hash table.

Gareth McCaughan
Thursday, February 12, 2004

Something like this (Perl) could work as a starting point:

my $numbits = 7;

my @elements = (82,93,45);

foreach my $mask (0..(2**$numbits-1)) {
  my $good = 1;
  foreach (map {$_ & $mask} @elements) {
    $good &= ($_ == $mask || $_ == 0);
  };
  print "$mask\n" if $good;
}

Note that the input elements and output are decimal representations of the bit vectors using the usual low-order-bit-is-on-the-right convention, which differs from the notation in the original post.

Also note that the original post left a couple of things unspecified. For example, are single-bit subsets like (1) valid? What about the zero-bit subset ()? Also, if none of the elements has any ones in the subset, is it still valid?

Anyway, in pseudocode:

Iterate through bitmasks representing every possible subset (1 in a bit position means it's part of the subset, 0 means it's not)

For each mask, check to make sure that the desired condition is true for all elements in the list. Specifically, the cases we want to track are where all bits in the subset are 1 (ie the value is equivalent to the mask value) or where all are 0.

John C.
Thursday, February 12, 2004

I should note that in contravention of my usual test-first principles I just hacked this together for a quick post, so I make no representations as to its correctness. Please don't use it for anything important like targeting missiles or running a nuclear power plant :-)

John C.
Thursday, February 12, 2004

Aren't you just looking for matching columns?

Brian
Thursday, February 12, 2004

Thank you both for your replies and brain cycles.  Unfortunately the implementation isn't in either Python or Perl so I have to translate :)

The idea of using the column values as keys into a hash table is great.  I was too busy trying to find a solution that manipulated the rows mathematically to slice the problem the other way.

John: to fill in a few more details. I'm only interested in subsets made up of 1's and with a cardinality of at least 2.

In this application each bitmask is 4096 bits long (and there are up to 4096 of them).  So iterating over 2**4096 patterns would take a little too long -- I know ... I didn't specify that in the original question :)

And don't worry this isn't being used in anything life threatening - but I'll make sure there are some unit tests to cover it.

Thanks again!

Rob Walker
Thursday, February 12, 2004

> Aren't you just looking for matching columns?

Yes -- I was just missing the obvious.  The original problem domain doesn't present the problem in quite those terms though.

Rob Walker
Thursday, February 12, 2004

Welcome to this week's "do my homework" thread.

Mr. Fancypants
Thursday, February 12, 2004

"I have a collection of N bitmasks (fixed width) and I want to identify subsets of bits such that if one bit is set within a bitmask all other bits in that subset are set."

What in the world does this statement mean?

NoName
Friday, February 13, 2004

"I have a collection of N bitmasks (fixed width) and I want to identify subsets of bits such that if one bit is set within a bitmask all other bits in that subset are set."

Sounds like compression.

Jason McCullough
Tuesday, February 17, 2004

*  Recent Topics

*  Fog Creek Home