Fog Creek Software
Discussion Board




checked atoi() or strtol() for C/C++

Does anybody know of an integer and hex parser that checks for invalid characters and returns an error if the string contains an invalid character?

The standard atoi() and strtol() functions ignore leading white space and consider the first non-valid character to be the end of the number. 

If the string contains either, I consider it an error.

I could wrap strtol(), but that requires two passes over the string.  One to see if it contains invalid characters, and one to convert it to a long.  Seems a bit wasteful when it could be handled in one pass.

christopher (baus.net)
Thursday, August 05, 2004

I think I implemented one once in about a half-dozen lines of C. Why don't you do the same?

Christopher Wells
Thursday, August 05, 2004

I don't know if it handles that case, but you could try boost::lexical_cast.

_
Thursday, August 05, 2004

> I think I implemented one once in about a half-dozen lines of C. Why don't you do the same?

Lazy.  I'd rather use a proven implementation.

christopher (baus.net)
Thursday, August 05, 2004

If the stop character returned by strtol is not the null terminator then you know something is wrong with the string.  If you really don't want leading whitespace then check the first character with isspace or comparable.
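
A minimal sketch of that check, assuming a NUL-terminated string. The name parse_long_strict is made up for illustration; the only library calls are strtol() and isspace(). It also looks at errno, since strtol() reports overflow that way.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: returns true and stores the value only if the whole
// string is a number in the given base, with no leading whitespace.
bool parse_long_strict(const char* s, int base, long* out)
{
    assert(s != nullptr && out != nullptr);

    // strtol() skips leading whitespace silently, so reject it up front.
    if (*s == '\0' || std::isspace(static_cast<unsigned char>(*s)))
        return false;

    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(s, &end, base);

    if (end == s)        return false;  // no digits were consumed
    if (*end != '\0')    return false;  // trailing invalid characters
    if (errno == ERANGE) return false;  // value does not fit in a long

    *out = value;
    return true;
}
```

Note that strtol() itself still accepts a leading sign and, in base 16, an optional 0x prefix, so those would need their own check if they count as invalid.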

Doug
Thursday, August 05, 2004


Good one.  I wish I could also give the string length to strtol rather than having it null terminated, but I could probably make this work...
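
For the length-bounded case, one possible shape is a hand-written single pass over the buffer. This is only a sketch: parse_decimal_n is a hypothetical name, it handles unsigned decimal only, and it treats anything that is not a digit (including whitespace and signs) as an error.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: parses exactly `len` bytes of `buf` as an unsigned
// decimal number in one pass. The buffer need not be NUL-terminated.
bool parse_decimal_n(const char* buf, std::size_t len, unsigned long* out)
{
    assert(buf != nullptr && out != nullptr);
    if (len == 0)
        return false;                       // empty input is not a number

    unsigned long value = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const char c = buf[i];
        if (c < '0' || c > '9')
            return false;                   // whitespace, sign, or junk
        const unsigned long digit = static_cast<unsigned long>(c - '0');
        if (value > (ULONG_MAX - digit) / 10)
            return false;                   // next step would overflow
        value = value * 10 + digit;
    }
    *out = value;
    return true;
}
```

The overflow test runs before the multiply, so the accumulator never wraps around.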

christopher (baus.net)
Thursday, August 05, 2004

Do you want C or C++? They're different languages, you know.

If you want C, you can do it with one call to strtol():

char *endp;
long result = strtol(s, &endp, 10);

if (endp != s && *endp == '\0') {
  /* Entire string was valid */
}


That'll work in C++ too, but boost::lexical_cast is much nicer.

comp.lang.c refugee
Thursday, August 05, 2004

char *endptr ;

long result = strtol ( numberstr, &endptr, 10 ) ;

if ( (numberstr[0] != ' ') && (endptr != numberstr) && !*endptr )
{
  // -- number is valid and has no leading spaces
}

Code Monkey
Thursday, August 05, 2004

Should've supplied an example...

int foo = boost::lexical_cast<int>(str);

boost::lexical_cast throws an exception if the conversion is invalid. See http://www.boost.org/ for details.

comp.lang.c refugee
Thursday, August 05, 2004

I personally don't think that C and C++ are totally different languages.  C++ is pretty much a superset of C. 

I looked at boost::lexical_cast, and I read it was slow, and throwing an exception is bad IMHO.

This will be used to validate input.  It is possible to devise a DOS scenario where malicious code constantly sends bad data, causing exception after exception to be thrown.

Personally I don't think exception handling should be used to validate public network traffic, but that's an ideological discussion.

christopher (baus.net)
Thursday, August 05, 2004

"I personally don't think that C and C++ are totally different languages.  C++ is pretty much a superset of C. "

C++ is logically a superset, but the mindset is totally different.

"I looked at boost::lexical_cast, and I read it was slow, and throwing an exception is bad IMHO. "

It is slower, but that's the price you pay for it being checked.

"This will be used to validate input.  It is possible to devise a DOS scenario where malicious code constantly sends bad data, causing exception after exception to be thrown.
Personally I don't think exception handling should be used to validate public network traffic, but that's an ideological discussion."

<sarcasm>
Yeah, that's right! Exceptions and exception handling should only be used in non mission critical applications.
</sarcasm>

Craig
Thursday, August 05, 2004

Good god have we sunk this low! Mr Baus, surely you could have written such a simple function in less time than it took to post.

Or could you?

Wondering
Thursday, August 05, 2004

The reason lexical_cast is slow is because it uses streams behind the scenes, and most stream implementations are incredibly slow. I wrote a string library including conversions to/from all native types. It's very easy to do, so that's what I'd recommend.
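
To illustrate the stream-based idea with a return value instead of an exception, here is a rough sketch. It is not Boost's implementation; stream_to_int is a hypothetical name and the code uses only the standard library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not Boost's implementation: converts via a stream
// and reports failure with a return value instead of an exception.
bool stream_to_int(const std::string& text, int* out)
{
    assert(out != nullptr);
    std::istringstream in(text);
    int value = 0;

    // noskipws makes leading whitespace a conversion failure.
    if (!(in >> std::noskipws >> value))
        return false;

    // Anything left in the stream means trailing invalid characters.
    if (in.peek() != std::istringstream::traits_type::eof())
        return false;

    *out = value;
    return true;
}
```

Constructing a stream for every conversion is presumably where most of the cost comes from, which is the trade-off being described here.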

sid6581
Thursday, August 05, 2004

The old C way to do this might be...

e=sscanf(str, "%*[ \t\n]%x%n", &i, &p);

(e<1 || *(str+p)!=0) --> failure

Afterward, i is the number if you succeeded.

A more general solution that avoids using streams (which lexical_cast relies on)
would be to use the old regular expression library as a precheck and
then run a naive converter once the string has been accepted by the RE.

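The soln above uses a leading suppressed conversion of spaces, tabs,
and newlines. Note that a scanset has to match at least one character, so
as written the call fails on input with no leading whitespace. A single
space in the format matches any amount of whitespace, including none, so
that is the safer spelling, but maybe there are other leading characters
you'd want to throw away.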

If your implementation doesn't support %n, then you can always
convert one character after %x (using %c).  It's an exercise for
the reader to determine if that character will receive a NUL or whether
sscanf will report only one successful conversion.
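
A self-contained variant of the %n idea, written as a sketch that rejects leading whitespace instead of skipping it. parse_hex_sscanf is a hypothetical name. Two caveats: %x also accepts an optional sign and 0x prefix, and the behaviour of sscanf on a value too large for the target type is undefined, so this is not a full validator.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>

// Hypothetical helper: hex parse with sscanf, using %n to learn how many
// characters were consumed so trailing junk can be detected.
bool parse_hex_sscanf(const char* str, unsigned int* out)
{
    assert(str != nullptr && out != nullptr);

    // %x skips leading whitespace by itself, so reject it before scanning.
    if (std::isspace(static_cast<unsigned char>(str[0])))
        return false;

    unsigned int value = 0;
    int consumed = 0;
    const int converted = std::sscanf(str, "%x%n", &value, &consumed);

    if (converted != 1)          return false;  // no hex digits (or EOF)
    if (str[consumed] != '\0')   return false;  // trailing characters
    *out = value;
    return true;
}
```

%n does not count toward the return value of sscanf, which is why the test is for exactly one conversion.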

Thomas E. Kammeyer
Friday, August 06, 2004

Regular expressions? sscanf? Come on now, that's just insane! Write your own conversion function, it's only a couple of lines of code and you can be as strict as you want to. Converting ints to strings and back is trivial.

sid6581
Friday, August 06, 2004

It may seem trivial, but how many buffer overflows have occurred from using atoi() incorrectly?  Plenty.

Also, how many programs have blown up from allocating the number of bytes returned from atoi()?  Plenty.  I am well aware of the problems with parsing input data, thank you very much.
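
As a sketch of the allocation problem: the length field gets validated and bounded before anyone allocates from it. Both parse_body_length and the kMaxBodyBytes limit are hypothetical, picked only for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical limit; a real server would take this from configuration.
const long kMaxBodyBytes = 1024 * 1024;

// Hypothetical helper: accepts a length field only if it is a complete
// decimal number in [0, kMaxBodyBytes]. Nothing is allocated here.
bool parse_body_length(const char* field, std::size_t* out)
{
    assert(field != nullptr && out != nullptr);

    // Require a leading digit: this rejects "", whitespace, '+' and '-'.
    if (field[0] < '0' || field[0] > '9')
        return false;

    char* end = nullptr;
    errno = 0;
    const long n = std::strtol(field, &end, 10);

    if (*end != '\0' || errno == ERANGE)  return false;  // junk or overflow
    if (n > kMaxBodyBytes)                return false;  // refuse huge sizes

    *out = static_cast<std::size_t>(n);
    return true;
}
```

With atoi() there is no way to tell "0" from garbage, and a negative result cast to size_t becomes an enormous allocation request; the checks above rule out both.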

It is MUCH better to use a proven off the shelf implementation in this case, and that's what I plan to do.  There is a reason this stuff is done in libraries.  It is common, and commonly done wrong.

I stick to my guns on the exception handling.  An incorrectly formed input stream is not an exceptional case when dealing with public internet traffic.  Treat it as such and you are certain to find yourself with vulnerabilities.  Incorrectly formed requests should NOT be considered an exceptional afterthought.

In fact guru Andrei Alexandrescu agrees with me and explicitly stated in a talk I attended that errors in internet processing should not be considered exceptional cases.

christopher (baus.net)
Friday, August 06, 2004

Could you please expand on that point of view? 'Andrei says so' is not a reason.

To me exceptions are a design question. I am assuming that you perceive them as a performance problem?

Craig
Friday, August 06, 2004

> C++ is logically a superset, but the mindset is totally different.

Maybe.  C++ doesn't really enforce a mindset.  I've used C++ commercially for about 10 years, and I used to think like this years ago, but I don't anymore.  I've lost my C++ religion.

Every couple of years somebody steps up and says they have the "right" way to do something in C++.  10 years ago it was all about inheritance, now it is all about generic programming and templates.  Nobody seems to care that data hiding is completely lost.

I pretty much use a small subset of C++.  I rarely use exceptions.  I use abstract base classes and the strategy pattern.  I rarely use inheritance other than from abstract base classes.  I use RAII and multiple return values extensively.  I would like to use templates more often, but the thought of putting all my implementation in the header file doesn't sit well with me.

I do what works.  If that happens to be a library written in ANSI C, I use it.

christopher (baus.net)
Friday, August 06, 2004

When I said trivial I meant writing your own, safe alternative that does exactly what you want is trivial. Something that is safer than atoi(). My conversion library works with std::strings, so nothing will blow up. (Except if there's a bug in the library, but then it would only have to be corrected in one place.)

The fact that you came here asking what you did is proof that the libraries aren't good enough. They don't give you what you need to do it safely, so you either need to find an existing solution outside the standard libraries or make something yourself.

As for exception handling, I agree. Errors parsing data should be expected and explicitly handled. My library does not throw exceptions.

sid6581
Friday, August 06, 2004

By the way, my usage of C++ is almost exactly like yours. I've been at this too long to buy into all the hype. I've found a sane subset that works, and I'm using that until something better comes along. I especially don't buy into the overuse (IMHO) of templates that seems to be so popular these days.

sid6581
Friday, August 06, 2004

> expand your thoughts on exceptions.

Certainly.  The question is what is an exceptional case?  In my opinion processing malformed public internet traffic is not exceptional.  It is is the norm..

I work on two C++ projects.  One is a a HTTP validator.  The whole point of the validator is to check input and accept or reject it based on how it is formed.  So processing malformed data is not an exceptional afterthought.  It is in fact the intent of the function, and program.

The try{}catch{} doesn't simplify the logic in any way.

And performance is an issue.  The point is to efficiently reject malformed requests when faced with a large scale DOS attack.  If a request takes an order of magnitude longer to process because of the exception handler, that could be the difference between the server being brought to its knees and continuing to operate.

I'm sorry to invoke Andrei here.  That is not an argument.

christopher (baus.net)
Friday, August 06, 2004

I must have had to to to much coffee today.  I seem to be stuttering. 

christopher (baus.net)
Friday, August 06, 2004

On the template thing.  I actually like the concept of templates.  I think it could be really useful and powerful.  It is just the implementation in C++ that I don't like.  The error messages alone are reason enough to run away.

I decided it wasn't good enough to say I don't like templates, and that I needed a concise argument, so I wrote an article to explain my position.  It got queued at CUJ and I haven't heard back in forever.

I'm currently reformatting it for Chuck Allison's new web site -- C++ source.  If it doesn't get accepted, I'll self publish it to my blog.

christopher (baus.net)
Friday, August 06, 2004

You use the phrase "exceptional afterthought" over and over, why are exceptions an afterthought?

Craig
Friday, August 06, 2004

I do use templates, I use them in my string conversion library for instance. But I don't use template metaprogramming and things like that much. IMHO code like that isn't terribly readable.

sid6581
Friday, August 06, 2004

If you look at C++ and not languages like Python or Java, I am one of the rare birds that tends to agree with Joel's policy on exception handling. 

I personally don't like that exceptions change the locality of the logic, and interrupt the flow of control.  That disruption may not bother some people, but it bothers me.

With exceptions the error handling logic becomes secondary.  If validating data is your intention, invalid data shouldn't be a secondary concern.  It is a primary concern.  Handling the result of the primary concern of a function should not be relegated to an exception handler.  Maybe with lexical_cast<> the primary concern isn't validation.  Fine, then it is the wrong function for me.

The best argument for exceptions is to provide the ability to perform clean up in one place without using a goto.  In Java and Python this is true.  In C++ we fortunately have deterministic destruction.  That's what I use.

Others say exceptions are great because of stack unwinding.  Well I've been unwinding stacks for a long time with return statements.

I also don't like that the exceptions thrown from a C++ function are not specified in the interface as they are in Java.  At least return values are explicitly specified in the function interface.  To know what exceptions must be caught you must either hope the external documentation is correct and up to date, or look at the implementation.  Frankly I think this sucks. 

Some people claim that return values are bad because you can ignore them.  Certainly, but you are a bad programmer if you do that.  Oddly you can ignore a C++ exception as well.  If you do, I'll see you in terminate.  Usually not where you want to be if the user inputs an invalid filename.

christopher (baus.net)
Friday, August 06, 2004

I agree, I don't use exceptions much either, except to catch exceptions thrown from code I don't control. (ActiveX controls, RPC calls, etc.)

I find that RAII and multiple returns give the cleanest code. That's just my opinion.
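
A small sketch of that style, with made-up names (InFlight, handle_request, Status): the guard's destructor does the clean-up, so the function can return from anywhere without repeating it.

```cpp
#include <cassert>

// Hypothetical RAII guard: counts a request as in flight for as long as
// the guard object is alive.
class InFlight {
public:
    explicit InFlight(int& counter) : counter_(counter) { ++counter_; }
    ~InFlight() { --counter_; }
    InFlight(const InFlight&) = delete;
    InFlight& operator=(const InFlight&) = delete;
private:
    int& counter_;
};

enum class Status { Ok, Empty, BadCharacter };

// Validates one request line. There is no clean-up code on any of the
// return paths; ~InFlight() runs on all of them.
Status handle_request(const char* line, int& in_flight)
{
    assert(line != nullptr);
    InFlight guard(in_flight);

    if (line[0] == '\0')
        return Status::Empty;
    for (const char* p = line; *p != '\0'; ++p)
        if (*p < '0' || *p > '9')
            return Status::BadCharacter;
    return Status::Ok;
}
```

The caller gets the failure reason as an ordinary return value, and the counter is back where it started no matter which return was taken.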

sid6581
Friday, August 06, 2004

Upon thinking about this some more, I think one problem is that the function I want really performs two operations.

1) validate buffer
2) determine integer value of buffer.

Generally it would be better to make these separate concerns.  Then I would argue that it is maybe better to just assert (hey you broke the precondition, sorry).

The reason the two operations are combined is for performance reasons, since both operations can be performed in one pass of the buffer.  Honestly, if the result can fit in a long, the buffer isn't large enough for this to matter much.
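
Split into the two operations, it might look like the sketch below. The names is_decimal and decimal_to_long are hypothetical, and the 9-digit cap is an assumption chosen so the result fits even in a 32-bit long.

```cpp
#include <cassert>
#include <cstddef>

// At most 9 digits, so the result fits even in a 32-bit long.
const std::size_t kMaxDigits = 9;

// Step 1: validation only. True if buf[0..len) is 1 to 9 decimal digits.
bool is_decimal(const char* buf, std::size_t len)
{
    if (buf == nullptr || len == 0 || len > kMaxDigits)
        return false;
    for (std::size_t i = 0; i < len; ++i)
        if (buf[i] < '0' || buf[i] > '9')
            return false;
    return true;
}

// Step 2: conversion only. Calling it on unvalidated input is a bug in the
// caller, so the precondition is asserted rather than reported.
long decimal_to_long(const char* buf, std::size_t len)
{
    assert(is_decimal(buf, len));
    long value = 0;
    for (std::size_t i = 0; i < len; ++i)
        value = value * 10 + (buf[i] - '0');
    return value;
}
```

This does walk the buffer twice, but with at most nine characters the second pass is hard to measure.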

So...

Maybe the best thing to do is validate the buffer, then call lexical_cast<>...  But, unfortunately I still get the virtual function overhead...

christopher (baus.net)
Friday, August 06, 2004

One thing I have noticed with C++ programmers is that they tend to stake out some well known (to them) subset of C++ and then dismiss everything else as "hype" or "unreadable".

To me this is totally missing the point. Templates are not "hype", they are just a new tool. If a tool is applicable to your problem, use it! If not, then don't. Inheritance, STL, templates, exceptions, boost are just tools, not ideological positions. C++ gives you choices, most people choose to play favourites instead.

I see this "one method/solution looking for problem" approach over and over and it leads to bad code. I interviewed a guy for a position on my team. I asked him to write strrev and he couldn't, but apparently he could write a meta program to calculate factorials! Obviously this is the other side of the coin to what is being discussed here.

If one digs, most of the time people dismiss things they don't understand properly. C guys dismiss C++, old C++ guys dismiss templates and exceptions, medium C++ guys dismiss meta programming etc etc....

I think exception handling and return values are both part of good program design, just using one all the time is naive.

BTW this is not aimed at anyone, it's just a rant in general.

Craig
Friday, August 06, 2004

I've put A LOT of thought into exception handling, templates, etc.  Probably too much. 

I feel like I have a good, although not David Abrahams', level of understanding of these features, and I've drawn conclusions based on my use and understanding of them over a period of years.

Policy programming, meta programming,  etc., are new paradigms to be sure.  Some of it is very cool.  Some of it is horribly bad.  Like waste a day figuring out compiler errors bad.

My biggest gripe about policies is that they are often used with less expressiveness than their abstract base class based counterparts.  Maybe in some limited cases you can justify them for performance reasons, but that I believe is the exception (har har) and not the rule.

Sans concept specification and checking, I feel policies are mostly bad.

christopher (baus.net)
Friday, August 06, 2004

Policies are tools for certain types of problems, that is all.
They are neither good nor bad, just misused.

Craig
Friday, August 06, 2004

Craig,

What you wrote seems like a direct attack on what I wrote, since you're quoting the exact words I used. Just to be clear, I'm not saying templates are hype. I use templates all the time, I find them incredibly useful. However, you are wrong when you say that these things are not also ideological positions. They are, to many. Some people just go nuts when they learn a new technique, and try to apply it all over the place. That's what I don't agree with. Templates are fine, exceptions are fine, just use them reasonably.

Personally I don't find that exceptions buy me anything over the way I do things now, in fact on the contrary I find them to complicate the code and cause confusion about who should handle what, what exceptions could/should be thrown, and which should be handled. There's also confusion about what errors are critical enough to merit throwing an exception, and what errors should be returned as error codes. Even the well known gurus that have spent a lot of time thinking about this have a hard time with it. If you like working with them, all power to you.

I use the techniques I find give me the cleanest code, and I use different techniques depending on the situation. I learn new things that come along, evaluate them, and apply them where they make sense. I'm not blind to the benefits of new approaches where they make my life easier, and I'm not one to be scared of new techniques because I've gotten into the habit of doing things a certain way.

sid6581
Friday, August 06, 2004

I wasn't attacking you "sid".

I find that in programs where many nested calls may be made in the act of processing some input, instead of plumbing back return values to some higher point (unless the code is just one big function:), which I'm assuming it's not), it's easier to just throw an exception, and handle all errors at the same point. This does not mean the code is littered with try/catch blocks (if done properly).

I write a lot of server software, and it's often useful to adopt a transactional approach, where you do the work off to the side then commit the data as the last act. This allows you to throw at any point and have auto pointers clean up the work if an error occurs. Obviously checking of return values is all part of this.
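
A rough sketch of that pattern, using a local std::vector in place of an auto pointer. replace_table is a hypothetical function; the point is only that nothing visible changes until the final swap.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical example: replaces `table` with the parsed `fields`, or
// leaves it untouched if any field is invalid.
void replace_table(std::vector<int>& table,
                   const std::vector<std::string>& fields)
{
    std::vector<int> staged;                 // work happens off to the side
    staged.reserve(fields.size());

    for (const std::string& f : fields) {
        if (f.empty() || f.size() > 9 ||
            f.find_first_not_of("0123456789") != std::string::npos)
            throw std::invalid_argument("bad field: " + f);
        staged.push_back(std::stoi(f));      // safe: 1 to 9 digits only
    }

    table.swap(staged);                      // commit: swap does not throw
    assert(table.size() == fields.size());
}
```

If the throw happens halfway through, the staged vector is destroyed during unwinding and the caller's table is exactly as it was.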

I think discussions on what is or is not an exception miss the point. Exceptions are a tool, which can be misused or put to good use.
Exceptions are a complex issue in library code, but more straightforward
in application code.

The team that I lead writes 24x7 call center software. It cannot just "assert" and die; all errors have to be handled, even operating system exceptions (access violations). Anything can happen in a production environment.

This is how we do it, people may have other opinions, that's cool.

Craig
Friday, August 06, 2004

Ah, Craig.  You are hitting my hot button here.

The problem is, do you trust a system that has raised an access violation?  In C++ where the stack or other memory can be totally hosed, it is best to just get out and restart.  You can pretend like you can do something sane after an access violation, but realistically you can't.

An access violation isn't an operating system exception, it is a problem with the logic in your code -- a problem which must be fixed.

You are probably going to discuss how you throw exceptions from _set_se_translator.  This is evil, evil, evil.  The problem is if you use something like ScopeGuard which does catch(...) to prevent throwing from a destructor, then you will mask access violations.  This is terrible as the system could be in a completely unstable state at this point.

Asserts are very useful for checking invariants.  When invariants are broken, you are almost always screwed, and you want to know about them right away.

The only real HA solution for something like this is a voting type system where components are written to the same specification by multiple teams.  That's the approach the Space Shuttle uses, but it is probably pretty rare in the world of business apps.

christopher (baus.net)
Friday, August 06, 2004

Craig,

Ok I'm game.  How do you handle access violations?

christopher (baus.net)
Friday, August 06, 2004

Check out http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_CRT__set_se_translator.asp
and
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/exception_record_str.asp
the same thing is also possible on Unix.

You can arrange for an exception to be thrown for operating system errors, such as access violations.
This can allow the application to continue, even if a fatal flaw may exist on a certain code path.

This information can be invaluable for onsite diagnosis.
Often it allows us to patch the application without the user having a system meltdown.

Craig
Friday, August 06, 2004

> You are probably going to discuss how you throw exceptions from _set_se_translator.

Boy saw that one coming a mile away.

christopher (baus.net)
Friday, August 06, 2004

Why ask then?

Craig
Friday, August 06, 2004

I stick to my previous comment...  Certainly...  Register a handler and log the exceptions.  Don't expect to recover from them.

christopher (baus.net)
Friday, August 06, 2004

I have no idea how throwing a big fat exception "masks" errors......

Touche!, can't argue with logic like that.

Craig
Friday, August 06, 2004

Craig,

The reason I'm calling you on it is that I've had this discussion a ton of times, and I think there is a general misconception about exception handling out there.  Languages like Java and Python which can make some guarantees about the state of the stack and memory when faced with exceptions might be able to do something useful when faced with flaws in the application logic.  For instance a servlet container can catch unchecked exceptions from a servlet it is running, and print a stack trace, and restart the servlet.  Unfortunately we don't have that luxury in C++ within one process.

It isn't that bad.  Logic flaws need to be fixed anyway.  There just isn't a way around it. My policy is to log, exit, restart when faced with a logic error.  This seems like a less than ideal policy, but there really isn't anything else to do.  Logging is dangerous enough, and I wouldn't want to risk user's data by attempting to recover.

IMHO exceptions are used in cases, especially in C++, where you'd be better off with an assert type mechanism. 

christopher (baus.net)
Friday, August 06, 2004

Look at ScopeGuard's implementation...


catch(...){
}

Consider the exception masked.
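
To make the masking concrete, here is a simplified guard in the spirit of ScopeGuard. It is not the real implementation; Guard and cleanup_failure_is_masked are hypothetical names, and an ordinary C++ exception stands in for a translated SE.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Simplified, hypothetical guard in the spirit of ScopeGuard; this is not
// the real implementation. The destructor must not throw, so it swallows
// whatever the clean-up function throws.
class Guard {
public:
    explicit Guard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup))
    {
        // An empty std::function would be a caller bug.
        assert(static_cast<bool>(cleanup_));
    }
    ~Guard()
    {
        try {
            cleanup_();
        } catch (...) {
            // Swallowed. If structured exceptions were translated into C++
            // exceptions, an access violation raised in cleanup_ would be
            // silently discarded here as well.
        }
    }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
private:
    std::function<void()> cleanup_;
};

// Returns true if the failing clean-up ran and nothing reached the caller.
bool cleanup_failure_is_masked()
{
    bool ran = false;
    try {
        Guard g([&ran] { ran = true; throw std::runtime_error("cleanup failed"); });
    } catch (...) {
        return false;       // never reached: the guard ate the exception
    }
    return ran;
}
```

The caller sees nothing at all, which is fine for a failed clean-up and not fine for an access violation.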

christopher (baus.net)
Friday, August 06, 2004

I feel it is you my friend who is missing my point.
I'm not saying catching a translated exception is going to leave the application in a known state, I never said that.

So I'll spell it out again.....

Exceptions are useful for some things, that is all I'm trying to say. You can split hairs as much as you want, but in live software you need all the backstops you can get.

In a call center I can't just drop everybody's calls and restart the PBX.

Craig
Friday, August 06, 2004

"Look at ScopeGuard's implementation...


catch(...){
}

Consider the exception masked."

What is ScopeGuard? Why do you keep assuming I'm using it?

Craig
Friday, August 06, 2004

Craig,

If you are dependent on _set_se_translator, I'm curious.  Grep your code for catch(...).

Sorry if this seems heated, but I'm trying to point out the problem here.

The other problem is stack overflow.  It is one of my pet peeves.  Using _set_se_translator to convert an SE to a C++ exception totally breaks C++'s exception mechanism when faced with a stack overflow.  There becomes no such thing as a nothrow function.  It is just really weird.

christopher (baus.net)
Friday, August 06, 2004

ScopeGuard is an automatic RAII mechanism.  For good reason it uses catch(...) in its destructor.  I just use it as an example of how SE as C++ exceptions can be a problem.  Really the problem is catch(...)...  Or at least one problem. 

christopher (baus.net)
Friday, August 06, 2004

> In a call center I can't just drop everybody's calls and restart the PBX.

And when faced with an access violation, how do you know that isn't going to happen?  Maybe when the hosed stack is popped, it will go directly to the restart PBX function.  That's the problem.  In C++ you just don't know what is going to happen.  You are in a completely undetermined state.

christopher (baus.net)
Friday, August 06, 2004

Gotta pickup Girlfriend from supermarket:), back in half hour.

Craig
Friday, August 06, 2004

In my opinion dropping all the calls and restarting to a known state might be better than having 1000 phones simultaneously make a long distance call to South Africa.

How do you know that isn't going to happen?  Your system is in an unknown state.

Ok I'm tired now...  But it is going to be pretty hard to convince me you can do something reasonable other than log and restart when faced with an SE.

Hell maybe you'll reconsider your strategy, and I'll save you a bunch of long distance fees. ; )

christopher (baus.net)
Friday, August 06, 2004

Right back.....

I think what we are talking about now is the correct usage of exceptions, not if they are good or bad.
catch(...) is no good, except in the main to log errors, anybody can see that.
Catching translated exceptions is a LAST RESORT to keep the system taking calls while reporting an error.

Hard logic dictates that we don't have a known state, in reality the system keeps working without calling South Africa.
It's saved our asses a couple of times, we discover we have a problem with a certain condition on site we fix it, the system keeps working in the meantime......

Access violations and such are not always fatal, sometimes it can just be a bad iterator, which is thrown away anyway in the stack unwind. No point in dropping 100s of calls for this.

Craig
Friday, August 06, 2004

> Access violations and such are not always fatal

The problem is you don't know if the error is fatal or not at the point the exception is caught.

The truth of the matter is that this is always a case of looking at the worst case scenario.

I've worked on multiple systems with multiple worst case scenarios.  In our desktop app the worst case scenario is that the user might lose their workbook data, or it might be corrupted.

We attempt to save to another file and exit.  But there is always the possibility that the save routine could lose its mind and start writing to the disk in an endless loop.  Is this the right thing to do?  I don't know.  I can't say I'd want a pacemaker to work this way.

In our web app, I generally shutdown and restart.  There are multiple levels of back up, so generally at most I lose one request.

We did have a problem where an error would slowly take down our whole system.  The user would just keep making the same request, before the other servers came back online.  There really isn't a lot to do in that case other than fix the error. You need a diverse implementation so two servers don't fail in the same way.  Again this is space shuttle type of availability, and it is VERY expensive and difficult to maintain.

Here's an article I wrote that addresses that problem. 

http://www.baus.net/archives/000051.html

If you want cheap availability, in the article I recommend vendor diversity if possible.  This is what the root DNS servers do.  Run multiple DNS implementations on multiple OSes with routers from multiple vendors, etc.  It is expensive, but uptime is everything for them.

Unfortunately, for custom software, this obviously isn't an option.

christopher (baus.net)
Friday, August 06, 2004

All your points are valid, I think each project requires a different approach. A pacemaker... that's a good case for a re-start.
But I'm still not convinced it's the best approach in all cases.

Our product is dependent on Cisco voice gateways, those things are constantly re-starting because of memory errors (dropping their calls in the process). This drives customers insane.

Perhaps a better approach would be upon catching an operating system exception, to schedule a restart for when there are no calls.
Hmmmm.... this has me thinking now...

Craig
Saturday, August 07, 2004

I read the shitstorm you started on comp.lang.c++.mod a couple of years ago on this subject....boy oh boy.
It seems though at the end of it all no strong conclusion is reached.

Craig
Saturday, August 07, 2004

> comp.lang.c++ discussion

Yep that was fun. 

David Abrahams is now doing some consulting work for us.  He now has a solution to the problem, and I agree with it.  Basically he says don't use _set_se_translator to convert SE to C++ exceptions.  Otherwise C++'s exception mechanism becomes hopelessly broken.  I see Dave's point now, and I agree 100% with it.

Honestly nobody had put enough thought into this, so I feel I did my part and brought the issue to light.  A lot of the C++ "gurus" totally ignore the real world environment in which the code must run.  Stack overflow is a real possibility when using thread/connection architectures.

christopher (baus.net)
Monday, August 09, 2004

> Perhaps a better approach would be upon catching an operating system exception, to schedule a restart for when there are no calls.

Personally I think the real solution is to figure out why the OS is throwing an exception, and prevent that from happening.  Once the OS throws an SE, you're pretty much screwed. 

I bet that the Cisco device is failing from resource exhaustion.  This should never happen in a closed box system.  The developers know exactly how much RAM, etc. is available.  They should use it at startup.

In my validation server I take a really obscure approach.  I call it "pessimistic resource management."  Basically I allow the user to specify the maximum number of requests that they want to handle, then I allocate all the memory needed to handle the worst case scenario at startup.  I feel this is far better than just restarting when I'm out of RAM.  You wouldn't believe how much this reduces the potential ways the server can fail.
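
As a rough illustration of the idea (not the actual server code), a fixed-capacity pool could look like the sketch below. RequestSlot, RequestPool and the 4096-byte buffer are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-request state; the size is an arbitrary placeholder.
struct RequestSlot {
    char buffer[4096];
    bool in_use;
};

// All memory is allocated once, in the constructor. After start-up,
// acquire() never allocates: it hands out a free slot or returns nullptr.
class RequestPool {
public:
    explicit RequestPool(std::size_t max_requests) : slots_(max_requests)
    {
        assert(max_requests > 0);
        free_.reserve(max_requests);
        for (RequestSlot& s : slots_) {
            s.in_use = false;
            free_.push_back(&s);
        }
    }
    RequestPool(const RequestPool&) = delete;
    RequestPool& operator=(const RequestPool&) = delete;

    RequestSlot* acquire()
    {
        if (free_.empty())
            return nullptr;              // at capacity: reject, do not grow
        RequestSlot* s = free_.back();
        free_.pop_back();
        s->in_use = true;
        return s;
    }

    void release(RequestSlot* s)
    {
        assert(s != nullptr && s->in_use);
        s->in_use = false;
        free_.push_back(s);              // capacity was reserved up front
    }

private:
    std::vector<RequestSlot> slots_;     // sized once, never resized
    std::vector<RequestSlot*> free_;
};
```

If start-up succeeds, running out of memory for request state is no longer a failure mode; being over capacity becomes an ordinary, explicit rejection.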

It would be better if I could force the OS to preallocate file handles and buffers as well, but that isn't really easy on most multi-purpose OSes. 

To try to tie this in with the original discussion, the reason that I don't just use boost because it is good C++ is that a lot of the time the STL and boost dynamically allocate memory, and leave me open to memory exceptions.  I am considering working around this using boost::pool_alloc.  Just allocate the pool at startup and limit the number of nodes in, say, std::map based on pool size.

christopher (baus.net)
Monday, August 09, 2004
