Fog Creek Software
Discussion Board




Binary search in 5 lines of code? How?

In "Guerilla Interviewing" Joel has a list of C functions to ask during the interview, including "binary search", and says each function should fit in about 5 lines of code.

Is this a textbook example? Does anyone have a short example of code that does binary search in about 5 lines? Whatever I can think of would take much more.

2. On a related matter, how do you shift a 64-bit value in C? In assembly you could go

; edx:eax holds the value
shl eax, 1
rcl edx, 1

But in C you can only do

int hi, lo;
if (lo & 0x80000000) {
  lo <<= 1;
  hi <<= 1;
  hi++;
} else {
  lo <<= 1;
  hi <<= 1;
}

Which would compile to much messier code because it has to do an AND with an immediate value, then an increment, as opposed to just two shifts. Or is there a better way?

Alex
Monday, December 08, 2003

here's your one free hint: recursion

Tony Chang
Monday, December 08, 2003

For your second one, if your compiler supports double long ints or long long ints or whatever they are called now of 64 bits, you can use that. If not, you may be able to find a math library that wraps some assembly shifts in a function call. If not, you should do the assembly yourself as an inline function perhaps if the extra cycles are that important to you that the C method doesn't work.

And absolutely, never ever use a plain int for anything that has bit fields. Use char short or long or long long, and specifiy signed or unsigned explicitly.

Tony Chang
Monday, December 08, 2003

>> here's your one free hint: recursion

could you elaborate?

>> never ever use a plain int for anything that has bit fields.

why not?

Alex
Monday, December 08, 2003

>> never ever use a plain int for anything that has bit fields.

> why not?

I'm a perl programmer (who did C way back when in college), but lets see if I can't get this right.

You should use an unsigned int.  Because what does a negative bitmask mean?  Also, you reduce the risk of the compiler doing something funny with your nice bitmask when it tries to convert it to two's complement or something.

So, do I get the prize? :)

Andrew Hurst
Monday, December 08, 2003

I can write that code in >2< lines of code. :-P

Beware the power of my stinky feet!
Monday, December 08, 2003

Sign extension is one bad reason, a worse one is that an int is the smallest word on the machine (confabulations of typing and macros aside) and so you really can't be sure of its size.

An unsigned char, is a byte of 8 bits (lets not talk about ICL and sextets), and regardless of word size will always be 8 bits.

Simon Lucy
Monday, December 08, 2003

I believe

  hi =  (hi << shift) | (lo >> (32 - shift));
  lo <<= shift;

works for 0 <= shift <= 31.

Alyosha`
Monday, December 08, 2003

>> hi =  (hi << shift) | (lo >> (32 - shift));
>>  lo <<= shift;

that looks nice in C, Alyosha, but the actual machine code will still be messy - it needs to do a subtract, shift lo twice, etc.

i'm not picking, just commenting :)

Alex
Monday, December 08, 2003

>> I can write that code in >2< lines of code. :-P

PLEASE please share!

Alex
Monday, December 08, 2003

2) C doesn't say how big ints are.  Ints could be 64-bits and the "obvious" code would work fine, and you wouldn't have to do that weirdness.

Roose
Monday, December 08, 2003

Sorry Andrew, no prize. Free (h)int for you: you should not use a signed int nor should you use an unsigned int when doing anything dealing with bit masks.

You can do so of course, but it's better not to.

Tony Chang
Monday, December 08, 2003

Alex, please do your own homework my friend. We have given you so many hints. I'll can do it in 2 lines as well. How much do you bid for my service assisting you with homework?

Tony Chang
Monday, December 08, 2003

Maybe I don't understand the problem. Is it "search for a 32-bit pattern" or "search for a bit pattern of arbitrary length?"

Alex
Monday, December 08, 2003

Ah ha! Now we're getting somewhere! In the context of an interview, with a inadequately specified requirements, the correct thing to do is to nail down the requirements. You pass.

But lets say that instead you have a written programming test with this question and you are not allowed to ask for clarification. Thus your best bet is to implement a binary search. Best to pick a data structure and type that will allow for the algorithm to be shown plainly without confusion. If you are into templates though, you could show a generic solution that reveals only the algorithm. For ease of understanding, I would pick the solution of searching an array of integers. The assumption there with a binary search is that the input is of course sorted. If this was a trick question, the interviewer would tell you at the end 'I never said the data was sorted at the beginning.', which is why you state your assumptions in an interview for maximum brownie points.

Now, given an array of integers and the length of that array and teh element to find, you should be able to show us a one or two-liner that will do it.

Tony Chang
Monday, December 08, 2003

// Assume T::operator< exists
template<class T>
vector<T>::iterator> bsearch(vector<T> data, const T searchVal)
{
    size_t low = 0, high = data.size();
    while( low<= high ) {
        int mid = (high+low)/2;
        if(data[mid] < searchVal) low = mid;
        else if(searchVal < data[mid]) high = mid;
        else  return & data[mid];
    }
    return data.end();
}

This is probably about the best you can do without squishing lines together unaturally. (Also, note recursion is not used here, and it shouldn't be in this algorithm, to write a recursive algorithm here is extremely wasteful.)

Not five lines for sure, but pretty close.

Note I didn't test this (as one could not do in an interview situation) however, it is in the right ballpark. BTW, you will note that I used two < comparisons to determine equality (with the theoretical minimum cost) since !(a<b) and !(b<a) generally implies that a==b. THis reduces the constraints on the type T, requiring that it only have T::operator< rather than also requiring T::operator==

Jessica Boxer
Monday, December 08, 2003

Is this it? This just doesn't look like a 2-liner to me, and it only searches for a 32-bit key.

bool Find (unsigned int* p, int length, unsigned int key)
{
  while (p++, --length)
      for (int i = 0; i < 32; i++)
        if (key == (*p << i) | (*(p+1) >> (32 - i))) return true;
  return false;
}

I guess Joel wouldn't hire *me*.

And what if the key is, say, 47 bits long? That's why I thought it would take more than 5 lines.

Alex
Monday, December 08, 2003

Okay, so I got this all wrong? Binary search actually means "find a value in a sorted list"???

Yeah, I can do *that* in two lines :):):)

I though it meant "find sequence of bits, no matter where it starts".

Alex
Monday, December 08, 2003

Familiarity with standard CS terms might be part of an interview also :)

sgf
Monday, December 08, 2003

Wow, I was wondering why these comments about bitwise operations kept arising on this question. Thought the night's beers were obscuring something painfully obvious here.

FWIW, I'm not a C programmer, but my thought process in response to the question actually went like this:
1. OK, binary search. What data type are we searching, anyway? A (hopefully sorted!) list is probably a reasonable default assumption, but it could be a binary tree or something else. Can you clarify the requirements?
2. What kind of values are we working with? Your comparison operation might be different if you are looking up numbers versus, say, case-insensitive alpha strings.
3. Do you simply want to know whether the data exists in the target data structure, or do you want its index/position in the data structure? (Probably the latter, but never hurts to confirm.)
4. Is this something I should really be implementing in the first place? Isn't there a library or something we can use that's already tested and optimized?
5. OK, since you want me to write it, does the project have any guidelines as to the relative importance of code readability/maintainability, performance, stack usage, object code size (blech), or other parameters that I should be aware of?
6. Do you mind if I grab Knuth off my bookshelf? Even though this is a pretty easy algorithm, it's probably susceptible to things like off-by-one errors if I have to write it under pressure, and I'm pretty sure that TAOCP3 has a provably correct version that I could adapt.
7. Ah, I see, you want to see whether I can write code or not. Well then, first let me spit out a few lines of tests so I can be sure my code does what I think it does. (Yes, I am one of those weirdos who believes in writing tests before the code. Or at least concurrently with. Look into my eyes and I will Reveal the Truth ;-)
8. OK, *now* here's my real code and comments.

By the way, as someone who spent a number of years as a hiring manager (though mostly for non-programming positions), I have to say this. If I asked a candidate a question and (s)he didn't know what I was talking about, didn't ask for clarification, and went completely down the wrong path, that would make him or her an immediate and probably unrecoverable no-hire. Buh-bye (though I'd be courteous about it). Not knowing something is fine; nobody knows everything. Not knowing something, but not being aware that you don't know, and not asking appropriate follow-up questions to clarify, is totally unacceptable. If I told one of my developers to implement a binary search and (s)he nodded and said, yes, OK, whatever you say sir, and then went off and started mucking with bitwise operators because (s)he didn't understand the terminology I was using, I would be apoplectic. I'd do my best not to take it out on the person individually (though I might have a pointed and rather candid conversation the technical person who approved their hire), but the fact is that someone who is unwilling to raise questions (or at least do basic research) when they don't understand something is a serious liability to the team and to the company.

John C.
Monday, December 08, 2003

make that: conversation >with< the technical person...

John C.
Monday, December 08, 2003

Hey Alex,

OK, I can see you are getting the wrong sense of the word binary. That's actually kind of interesting from a linguistic standpoint that you though of binary numbers and thought it had to de wiht searching a subfield. The term 'binary search' is kind of idiomatic. It does appear you can write C code which is very good, but I would guess that you did not have any formal training? Implementing a binary search is certainly something that would be covered as a matter of course at any freshman programming class, so I'd say that a employer interviewing you would mainly be concerned that your background was a bit weak. HOWEVER, that said, it's also quite possible that you are entirely self-taught and you have amazing skills from implementing cutting-edge games or who knows what. So I think it would be premature to dismiss you from an interview just because of a simple (and rather funny) misunderstanding, but I would suggest that you bring a portfolio of your self-taught work to an interview.

(to others)

I think it would be a mistake to rule Alex out just because of terminology impedence mismatches. His assumption that it was searching binary numbers is not unreasonable if you think about it, but it does indicate he hasn't been around an intensive programming environment and is probably self taught. He could be a diamond in the rough. Or not. You'd have to look at more than that in the interview. If the whole interview was one misunderstanding after another though, It'd be problematic.

Tony Chang
Monday, December 08, 2003

Thanks Tony!

Actually, to dispel this great mystery, I'm Romanian. I did computer science in high school, I have my engineering degree, but everything was taught in Romanian.

John -- to be honest, I would absolutely grill the interviewer to the ground with questions about everything before I ever started to write anything.

Wouldn't it be lame to *pretend* to understand what they ask, then take this huge chance of coming off like an idiot and just dive into code?

I completely agree with you, a candidate like that would be a joke.

Your diamond in the rough,

Alex
Tuesday, December 09, 2003

Alex,

My apologies. I didn't realize from your postings that English wasn't your first language. Shame on me. I would expect a native English speaker to be familiar with the term "binary search", or at least to admit ignorance and ask for clarification. Given the linguistic differences, I'd be much more inclined to give you the benefit of the doubt.

John

John C.
Tuesday, December 09, 2003

Do it in perl.  Nothing should take more than one line ;-)

just couldn't resist
Tuesday, December 09, 2003

I always imagined a Perl one-liner as something like "Hey baby, wanna come back to my place and check out my signed first-edition Camel book?"

Rob VH
Tuesday, December 09, 2003

return &data[mid];

makes an unwarranted assumption about the implementation of vector iterators.  Some STL implementations wrap their vector iterators to provide better error checking.

Best use

return data.begin() + mid;

which will work because vector iterators are random-access iterators.

David Jones
Tuesday, December 09, 2003

Can you use the stl with C, then?


Tuesday, December 09, 2003

Who really writes binary searches anymore?  Years ago, I would have thought of it as a standard tool for any programmer to know.  Now just use the STL.

Happy to be working
Tuesday, December 09, 2003

I thought the stl spec gauranteed contigous memory layout of vectors precisely so you could use the [] operator on them..

not mr. johnson
Tuesday, December 09, 2003

SortedList so = new SortedList( );
/*
stuff it with data in here:
...
*/

so.BinarySearch( 42 );

C# has a number of wonderful objects, some being linked lists, sorted lists, etc. that can store objects.  If you have to write your own search algorithm you're being wasteful. :)

MR
Tuesday, December 09, 2003

John,

I was certainly not looking for an apology :)

Others:

Yes, it's already implemented, but it's still an interview question.

Alex
Tuesday, December 09, 2003

I think not mr. johnson is correct about the STL iterator.  &data[mid] should be fine.

Roose
Tuesday, December 09, 2003

There is a 5-line solution on http://www.cs.williams.edu/~kim/cs334.97/Lec3.html#RTFToC6  (see "fun find").

It's working on a binary tree, but I'm sure the solution would be similar for an ordered array of items.


Tuesday, December 09, 2003

// return pointer into array of searched for value,
// NULL if not found
int* bfind( int *pdata, int ndata, int value )
{
    if( ndata/2 == 0 && value != pdata[ndata/2] ) return NULL;
    else if( value < pdata[ndata/2] ) return bfind( pdata, ndata/2, value );
    else if( value > pdata[ndata/2] ) return bfind( &pdata[ndata/2], ndata - ndata/2, value );
    else return &pdata[ndata/2];
}

That's my lunch hour's worth of effort.  Very ugly, but it works  I liked Jessica's answer better.

This type of question is supposed to help decide who gets hired...  ridiculous.

hoser
Tuesday, December 09, 2003

Alex : I may be wrong, but I think that in Latin languages we're more used to call this "dichotomic search".

Pakter
Tuesday, December 09, 2003

> Can you use STL with C then?

Oops! My mistake, didn't read the question carefully enough. Of course, in the interview I would quickly recover my explaining why there is no good reason to write any windows application in C today, since C++ is fundamentally better in every dimension. The only possible reason might be the absence of a C++ compiler, but that is rarely true, and certainly not of a windows environment.

Of course the real cheesy answer to this would be:

template<class T>
T::const_iterator bfind(const vector<T> data,
                                  const T search)
{
    return search(data,begin(), data.end(),
                          bind2nd(less<T>(), search));
}

Which, with wider columns, could be four lines.

Jessica Boxer
Tuesday, December 09, 2003

I'd call that one a two-liner.

--

Here's a good interview question:

Name three good reasons to write an application in C today rather than C++.

la la
Tuesday, December 09, 2003

Here's a response,

Why?  Don't you have a system you want me to design/implement/test/get out the door/support?

Simon Lucy
Tuesday, December 09, 2003

Here's a non-recursive 5-liner in Java, if you don't count the method signature and curly braces as lines:

  public static int binarySearch(Object[] arr, Object key)
  {
    int mid = 0, lo = 0, hi=arr.length-1;
    for (mid=(lo+hi)/2; lo <= hi && !(arr[mid].equals(key)); mid=(lo+hi)/2) {
      lo = ((Comparable)arr[mid]).compareTo(key) < 0 ? mid+1 : lo;
      hi = ((Comparable)arr[mid]).compareTo(key) > 0 ? mid-1 : hi;
    }
    return arr[mid].equals(key) ? mid : -1;
  }

T. Norman
Tuesday, December 09, 2003

"Name three good reasons to write an application in C today rather than C++. "

Good reasons?

1) If portability to platforms without good C++ compilers is required.

2) So you can compile in legacy source code that is in C.

3) to be a pain in the class.

I can't think of any other reason when using the C-like subset of C++ wouldn't be good enough.

Keith Wright
Tuesday, December 09, 2003

In C.... It's just not natural any more, looking at those global functions which you "know" are only written to handle a specific data type, yet everyone sees them.

Also, constructor/destructor is such a natural thing by now, when you go back to C, you have these orphan-looking "init" functions for every data type, and you have to remember to call them...

Alex
Tuesday, December 09, 2003

Of course, this begs the question, what is a "line of code"?

T. Norman
Tuesday, December 09, 2003

Maybe 3) should be "You need a standard library that cannot throw exceptions"  Is it feasable to write C++ without the possibility of exceptions being thrown?

Keith Wright
Tuesday, December 09, 2003

Simon,

You would fail my interview. Sorry.

la la
Tuesday, December 09, 2003

Keith,

Thanks.

#1 is true enough.
#2 is not true for linkers I am aware of -- C++ linkers can bind to C compiled object code without trouble on most systems I've seen.
#3 isn't a good reason.

la la
Tuesday, December 09, 2003

Oh your second 3 is at least an interesting response. i'd have to think about it though. It's true that there can be reasons not to use libraries with exceptions. You can certainly write full blown c++ apps without exceptions if you choose tohugh so that's a bit different from C vs C++.

la la
Tuesday, December 09, 2003

Oh and regarding your question, exceptions can be disabled in your compiler flags, so yes, it can be done with assurance.

la la
Tuesday, December 09, 2003

1. C++ libraries are not supported when writing device drivers.
2. C++ constructor and assignment are sufficiently misused/misunderstood that bloat in runtime and size are a problem.
3. When writing OO code Java or C# is a better language.

Which brings me to my opion:
C is really a machine independent assembler languarge.
C++ is an attempt to use C syntax for OO.  On the face of it, since I like the syntax, I'm inclined to like the language.  But I don't - it merely frustrates the hell out of me.

My favorite example is I've got a large class, with lots of data members, which I "new" on the heap.  How do I add this thing into an STL container?

Yeah, I know, use an counted ptr class to wrap the base class which allows for a self referencing copy constructor and assigment operator and performs reference counting.

What a pain in the @ss.  And the point of C++ was, ... what? It seems like a pretty half baked idea to me.

hoser
Wednesday, December 10, 2003

Good stuff hoser, I especially like #3, though please add Objective C to the mix. Maybe Visual Basic as well...

My personal favorite is when you want to allow for a third party plugin architecture, going with C++ will limit your customers to the exact compiler and version you are using, which is really a bad thing that will limit the # of plugins that get written and can dampen a product's wildfire catching on phase.

la la
Wednesday, December 10, 2003

"Oops! My mistake"

A reasonable one to make if one knows that the company produces desktop app software, IMO.

It can still be necessary in memory/processor limited systems of course (vtables can be a real performance killer if you allow them, for example).

C code. C code run. Run code, run! Pleaaase!
Wednesday, December 10, 2003

la la, I'd never work for you.

Simon Lucy
Wednesday, December 10, 2003

A couple of comments. First three reasons to use C on a project:

1. Because there is no suitable good quality toolset for the platform you are working on that uses C++. (This is extremely rare these days, though five years ago it was more of a problem.)

2. Your programmers only have expertise in programming in C. (Short term projects only, for projects over 12 months, they should retrain.)

3. Some external requirement forces you to use C, (such as the one suggested earlier -- an API requires external linkage, which, as the poster pointed out, can be quite inconsistent in C++.) Of course most likely you would implement a C interface on top of a C++ core, or use something like CORBA or COM as a shared common interface. And of course you can specify global functions as having straight C linkage in C++ too with extern "C", which is supported in every compiler I have ever used (though it is technically implementation dependent.)

Now, to answer the poster who put forward other ideas critical of C++:

1. C++ libraries can't be used in a device driver...

Perhaps in the archaic system you are using, but they most certainly can in most systems. (And even if they couldn't, C++ language features offer massive benefit over straight C, and can use all C libraries too.) Of course there are all sorts of special restrictions when writting device drivers, but they don't go away when you write in C.

2. Code bloat due to poor use of ctors and dtors...

Not clear why ctors and dtors in particular cause bloat, however code bloat is rarely a problem in todays' computer systems. See Joel's article on such at http://www.joelonsoftware.com/articles/fog0000000020.html

Of course there are special cases where this is more significant, such as embedded software, however, such software has unique constraints that require many different careful choices, including considering code size. There is nothing intrinsic in C++ that requires bloated code, it is a design choice.

3. When writing OO code Java and C# is a better choice...

Better by what measure? You might argue that on a fundamental language level Java or C# are better languages, (a debate that I doubt you would win with someone who knew what they were talking about), however there are many, many other things to consider. What platform, what tools what skillset. Do you want it fast, pretty, small, large, standalone, compilable onto an 8051 embedded processor?

All these factors bear directly on what software development system you use. Generally speaking I'd agree that C# is a better tool for building Windows GUI focused applications (doesn't do so good on Solaris though!!!) but on a fundamental level it is a great IDE rather than a great language. It is, for example, missing the whole metaprogramming paragdim that is so powerful in C++. And, though generics are due in the next version, they are much, much weaker than C++ templates.

As to Java, well nobody has ever produced a pretty program in Java that works within the normal bounds of windows applications. (Not to mention all the flaws in the language.)

The most commonly cited flaw in C++ is the lack of GC. However, the reality is that, regardless of all the fancy templatized shared pointers and so forth (and I'd like to see you implement a language extension like shared pointers in Java or C#, even with their new generics mechanism! Ha you say, I don't need to, they already have GC! OK, then lets see you implement something like the lambda library in Java or C#!)

Regardless, the reality is that with appropriate application of the STL it is largely possible to eliminate all news and deletes in a program. There are some esoteric cases where this is still necessary, but the vast majority of new and delete in most C++ programs are there out of habit or legacy rather than necessity. vector is C++ garbage collection.

But that is just my opinion.

Jessica Boxer
Wednesday, December 10, 2003

> la la, I'd never work for you.

You wouldn't be happy here. Different strokes as they say.

la la
Thursday, December 11, 2003

PLEASE KINDLY MAKE PROGRAM FOR ME IN C++
OF
BINARY SEARCH TREE
FUNCTIONS:CREATE TREE
                    CREATE N INSERT IN TO NODES
                    TRAVERSE TREE
                    DELETE NODE

SALMAN
Wednesday, April 21, 2004

*  Recent Topics

*  Fog Creek Home