Fog Creek Software
Discussion Board




The Cost of Validation

I'm not too sure if Joel has ever covered this one.  Often I find that doing input and logical validation takes a lot of time and effort.
Everyone wants to write bulletproof software, but to what degree do you try to anticipate every idiotic move a user could make?  Especially if a single obscure scenario can set you back a few days: is it worth fixing?
Joel wrote a while back about "perfection" and how ultimately things should boil down to a cost/benefit analysis.  To pose that question in terms of validation, not bugs, is the cost of the extra work worth it?
(Of course this is relative, but I'm interested in any benchmarks/tendencies you all have.)

David Seruyange
Wednesday, June 30, 2004

I think there are degrees of severity:

1) Display a cryptic error message that only a programmer can understand and fix (for example, exception text and a stack-trace), instead of a more user-friendly message saying "This isn't supported: please do X instead".

2) Crash the application.

3) Corrupt the user's data beyond repair.

Users (including "beta testers") might forgive 1) or 2) (they submit a bug report and let you fix it), but less so 3).

Christopher Wells
Wednesday, June 30, 2004

The more mission critical the app is, the more important it is to be validated.  If you are in a heavily regulated industry, you can either validate your app or use labor intensive processes to manually check outputs.  Short term it may seem worthwhile to manually check the outputs, but there is no sense in hiring a whole slew of workers whose sole function is to manually check over data before sending it out to the client.  It makes much more sense to just bite the bullet and validate your app.

Aaron F Stanton
Wednesday, June 30, 2004

Common sense needs to be applied, but I don't think the developer of the program can effectively decide what's a valid test scenario and what's not. Ideally you will have someone who's not the programmer, with a business interest in having a long-term successful product, decide the level of testing required.

MilesArcher
Wednesday, June 30, 2004

I agree that input validation takes time, usually isn't any fun and generally is a pain in the ... but, it is worthwhile and necessary.

Validation is the last chance you have to make sure the data being entered is correct. Forget about the documentation; the user didn't read it! Forget about the training session; the user was busy thinking about either what they did last weekend, or what they were doing the next!

And this same user will fill your database with crap that they thought was 'correct' because they're from the planet Moron and like the Vulcans, they have a Machine-Mind Meld that allows them to intuitively know what they’re supposed to enter, except it doesn’t work. And when you confront them with that factoid, they look at you like YOU’RE the Moron and retort “Well, the program let me type it in.”

And that, you can’t argue with.

Rob Leighton
Wednesday, June 30, 2004

From a shrink-wrap point of view, your highest cost is support.

It costs time to answer the support calls.  If you have a dedicated support staff, this costs you money.

If you don't have a dedicated support staff, then the cost is time that could have been spent on development.

If you don't validate the user input adequately and it results in unpredictable behavior, you're going to either lose a customer or get a support call.

Either way, writing validation code is going to pay for itself.

Richard P
Wednesday, June 30, 2004

Don't forget one of the most important reasons for validation: security.

If you run an exposed web application, you can't ever trust any data that any user gives you until you've validated that it's safe/cleansed. Failure to do so will give hackers the ability to break your site.

Brad Wilson (dotnetguy.techieswithcats.com)
Wednesday, June 30, 2004

Often the simple solution for validation is improving the UI.

For example, if I need to prompt the user for a report start and end date, then it is a simple matter to throw up two calendars, and then it is IMPOSSIBLE for the users to enter a start date later than the end date (my “end date” calendar automatically defaults to the start date IF start date is > end date).

In addition, users can’t get mixed up over whether it is mm/dd/yyyy or dd/mm/yyyy, etc.

What about a city prompt? If you can use a combo box, then things like comma, or extra quotes in the string can’t mess up your sql.

Here are some screenshots in ms-access of what I mean by the above.

http://www.attcanada.net/~kallal.msn/ridesrpt/ridesrpt.html

<quote from above>

In all of the above examples, the user NEVER has to type in anything to select options for a report. This approach makes the application easy to use, in fact fun to use. In addition, it also means that VERY LITTLE code needs to be written to check for user input errors. In fact, the user can't make a input error, since the user is only presented with available options.

</end quote>

So, often a great UI solution to testing input is to simply limit the options, or provide some means to EASILY select the option. You actually kill two birds with one stone when doing this. The application is easier to use, and less code is required for validation of input, since the input options themselves are limited.

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Wednesday, June 30, 2004

"What about a city prompt? If you can use a combo box, then things like comma, or extra quotes in the string can’t mess up your sql."

This is almost true - the small problem is that any vaguely competent hacker-wannabe (forget the skilled hackers, even the wannabes can be a problem) can easily submit any data they want, and do all sorts of unfortunate things to your query - if you're not careful, someone's town name will cause the query to be completed and a second arbitrary query to be executed. This is, um, er, not entirely a new and creative attack and any website that isn't careful with processing input is just asking for trouble.

You weren't serious, were you? Would you actually just take a user's input and insert it into your SQL queries without any checking for commas, quote marks or other control characters?

It also fails to account for some annoying person who dares to live in Nowheresville (population 3 and a goat) who gets upset because his town isn't in your combo box - or you could irritate everyone who lives in the 10 largest cities in the country, who don't want to search through a list of every tiny village in existence just to find their own city.

How about having a simple text box, and just assume that most people actually know where they live? Then you just do some simple checking to ensure that the entire entry is treated as simple text and can't be used to abuse your SQL, and then all your problems go away. (Validation for security is always essential.)
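One way to get the "entire entry is treated as simple text" behaviour is a parameterized query. The following is only a minimal sketch using Python's standard sqlite3 module; the customers table, its columns and the find_customers_by_city helper are made up for illustration and do not come from anyone's application in this thread.

```python
import sqlite3

def find_customers_by_city(conn, city):
    """Return customer names for a city typed into a free-form text box."""
    # The "?" placeholder binds the input as a value, so quotes, commas
    # and semicolons in it are never parsed as SQL.
    cur = conn.execute("SELECT name FROM customers WHERE city = ?", (city,))
    return [row[0] for row in cur.fetchall()]

# Hypothetical in-memory table, just to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "Edmonton"), ("Bob", "St. John's")],
)

print(find_customers_by_city(conn, "St. John's"))
print(find_customers_by_city(conn, "x'; DROP TABLE customers; --"))
```

A city name containing an apostrophe works as ordinary data, and the attempted second query simply matches no rows.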

Besides, if someone can't spell their own city's name, how do you plan to stop them selecting the wrong city from a list?

(Note: an ideal compromise would be an editable text box, with an optional drop-down box for the most common entries. This probably means web browsers need to provide more and better controls for forms, though. Given a useful auto-complete system for filling in standard fields most people should be able to type their address once and then not need it again regardless of what websites they visit.)

Another thought: if anyone actually can come up with a high quality address validation system, why not just apply it to the address typed in? That would let you indicate that the address wasn't recognised, and allow the user to correct it on their own, or note that they've got some weird-ass requirements that the postal service can deal with even if your website doesn't know about the postal system everywhere on the planet.


Wednesday, June 30, 2004

I've always chosen the "one-right-way" over the "many-wrong-ways" approach to validation. There are only a fixed number of forms my app will permit an address field to take. If the input doesn't fit one of them, a help page is launched (or offered) to show the right way to enter it.

So if permitted entries are 0-9, a-z, A-Z, both the code and the documentation explicitly state only that. It will not check for the _absence_ of /()-' or ".  That way even if someone managed to enter data like Ý which is not one of the valid entries, I don't need to add that to the exclusion list.
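A minimal sketch of that allow-list idea, assuming the permitted set really is just 0-9, a-z and A-Z (the is_valid_entry name is made up for illustration):

```python
import re

# State only what IS allowed; there is no exclusion list to maintain.
ALLOWED = re.compile(r"[0-9a-zA-Z]+")

def is_valid_entry(text):
    """True only if every character is in the permitted set (and text is non-empty)."""
    return ALLOWED.fullmatch(text) is not None

print(is_valid_entry("Unit42"))   # letters and digits only
print(is_valid_entry("O'Brien"))  # contains a quote mark
print(is_valid_entry("Ý"))        # not in a-z or A-Z
```

Anything outside the stated set is rejected without ever being named in the code.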

.
Thursday, July 01, 2004

>the small problem is that any vaguely competent hacker-wannabe (forget the skilled hackers, even the wannabes can be a problem) can easily submit any data they want, and do all sorts of unfortunate things to your query

Huh? What the hell are you talking about? Who says we are even using a web-based application here? Where did you get such a stupid idea?

> any website that isn't careful with processing input is just asking for trouble.

Well, now, all of a sudden we're talking about a web application!

Yes, and exactly what is your point? My point is that if you restrict the user's choices, you save a ton of input validation code.

If you are telling me, or giving me an example, where you can’t restrict the input, then duh? If you can’t restrict a user's input, then how can my advice be of any use to anyone with a functional brain? (man, did you read my post…or do you just hate me?).

>You weren't serious, were you? Would you actually just take a user's input and insert it into your SQL queries without any checking for commas, quote marks or other control characters?

Yes, I am 100% serious. Only someone like you would put words in my mouth, or assume that I was talking about a web-based application. And my advice applies 100% to web-based systems anyway. Why do you assume that a web-based system taking input can’t restrict choices? Have you ever run terminal services (remote desktop) inside of a browser? You can most certainly restrict user input, and present those restricted inputs on a web-based system if you have the right tools, and I'm sorry to say, no hacker is going to get around this.

How the hell can a user hack a check box value I present on a VB or ms-access form? Did you not bother to read my post one bit? Or, perhaps you are just looking to unzip your pants, and do a big crap dump on me? What the hell did I do to deserve this crap from you?

Please go take a look at the sample screenshots and the link I provided. At least show me that you made SOME effort here before you unzipped and started shooting at me.

Now, please turn your brain on, and explain to me how my users are going to insert, or change, or enter illegal sql data in those sample forms? (all of those forms do result in sql queries by the way).

So, I am VERY ABLE to take the raw input from those forms and use them for the sql. Just because YOU can’t do that in YOUR development environment is too bad.

>It also fails to account for some annoying person who dares to live in Nowheresville (population 3 and a goat) who gets upset because his town isn't in your combo box –

Why are you assuming that the application will actually allow one to report on that non existent city? Why even run the sql at this point? (that is dumb to run the sql when you don't know you have a legal city???)  Why bother?

Why even allow the user to enter a non-legal city? How stupid can you get? Why should we even ALLOW the user to enter, or present the user with, a city that is not legal? Have we not heard of field level validation here?

I suppose perhaps YOU write YOUR software that allows combo choices that don’t get their source from the actual data. However, just because you admit to working in such a limited environment, that is just too bad.

The fact is, I would say close to 100% of my combo boxes are derived from the actual data (and limit-to-list is set = yes, which means ONLY choices from the combo box are allowed).  Further, a very high percentage of my combo boxes search/display by text, but return the primary key id of the record (not text, but pk values). Of course, if your combo box control does not allow searching by text, returning a pk, and also limiting to list, then you are using a crappy combo control!

Further, the combo boxes I use have auto-complete, and thus they match the characters as you type.

>How about having a simple text box, and just assume that most people actually know where they live?

Why, you mean give up a nice drop-down list that is pre-populated, and auto-searches as the user types? (Hmm, gee, I will have to ask my users which they prefer: some goof-off text box, or a nice auto-complete list? Hmm, I am straining my brain here to figure out if my users will prefer your suggestions over mine... NOT!)

My approach is both easier from the user's point of view, AND ALSO REDUCES the amount of coding! It is win-win.

> Then you just do some simple checking to ensure that the entire entry is treated as simple text and can't be used to abuse your SQL

Why even have to bother to write the above code? My WHOLE point of my post was that you can eliminate all the problems of data input if you can limit the users choices. (you must not have read my post).

So, in place of asking the users to type in “yes” or “no” to include overdue accounts in a report, I would simply offer the user a check box. How can a user hack a check box, and why do I need any input validation for the sql when I feed it a true or false based on the check box? In your approach, you put up a text box, hope the user enters some allowable text, and then process the text for bad characters etc. Golly, what a stupid idea in place of a check box, or a nice auto-complete combo box!


>  (Validation for security is always essential.)

It seems that you did not read the original post, or my post at all.

Here is that concept again (since you don’t seem to get my point at all), in plain English:

If you can restrict the user's options, and use controls to present options in place of an open text box, then you save a ton of code, and very little, if ANY, validation of input is needed.

>Besides, if someone can't spell their own city's name, how do you plan to stop them selecting the wrong city from a list?

Now, there is a stupid remark! That is like saying you can’t use spell checking on a Word document. How can users spell check a document? You mean for each incorrect word they see, they don’t know which one to choose? (boy…you got me going here!).

Fact is, it is a ZILLION TIMES better to present a list than to have the user guess the spelling. Are you actually serious on this point?

And, in my example, they might choose the wrong city..but THEY WILL be choosing a city that exists in the database.

With your example, the users are free to type in an animal's name, or who knows what. At least my users will see a list of cities. Just seeing a list of cities is in and of itself a visual cue. (same goes for a list of colors etc).


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Thursday, July 01, 2004

However, there is no 'right' way to enter an address.  Different countries have different protocols.

If you can validate the address then all well and good; if not, you have to allow the user to enter their data as they know it.  If it's their own address that's likely to be fine; if it isn't, it will probably be wrong.

For countries with post codes or zip codes make those mandatory and validate them.

You can't necessarily strip all non alphabetic characters, the '-' is used frequently in place names.

Simon Lucy
Thursday, July 01, 2004

I have to say: Design By Contract.

Having spent some time with Eiffel I have fallen in love with DbC.  It solves these problems so elegantly.

For example, for your function that puts together the SQL you make it a precondition that the string only contains acceptable characters.

It also fixes the problems with checked or unchecked exceptions.  Rather than saying "What am I going to do when everything goes wrong," you say "I'm not going to start unless all this is right." 

On top of this it gives superior documentation.  Eiffel's short form says for input you must provide this, this and this.  At the end you will get that.
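Python has no built-in contracts the way Eiffel does, so the following is only a rough emulation of the precondition idea with explicit checks. The build_city_filter function, the column name and the accepted character set are assumptions made for illustration.

```python
import re

# Assumed contract: letters, digits, spaces and hyphens only.
_ACCEPTABLE = re.compile(r"[A-Za-z0-9 \-]+")

def build_city_filter(city):
    """Build a WHERE fragment for a city name.

    Precondition: city contains only acceptable characters.
    Postcondition: the result is a single quoted comparison.
    """
    # Precondition: refuse to start unless the input is right.
    if not _ACCEPTABLE.fullmatch(city):
        raise ValueError("precondition violated: unacceptable characters in city")
    clause = "city = '" + city + "'"
    # Postcondition: sanity-check what we promise to return.
    assert clause.startswith("city = '") and clause.endswith("'")
    return clause

print(build_city_filter("Stratford-upon-Avon"))
```

The docstring doubles as the "short form": it says what the caller must provide and what comes back.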

Eyes glaze over, voice becomes dreamy: Design by Contract, it is the true way. 

Ged Byrne
Thursday, July 01, 2004

Tomato, Tamaato

.
Thursday, July 01, 2004

I do what is needed for the users. If the app is a helper tool for only myself, it tends to have little to no validation. If it is used at the office by smart folks, it might have some. If it is used by harried folks who might have 2 seconds to think in between interruptions, I tend to make it about as bulletproof as if it were a mission critical app that runs in the middle of the night (I like to sleep, so I do NOT want to get pages at 2am).

Peter
Thursday, July 01, 2004

> If you can restrict the users options, and use controls to present options in place of open text box, then you save a ton of code, very little, if ANY validation of input is needed..

You don't really understand the problem of web security then.  Just because user input can be limited with a text box, doesn't mean some l33t h0x3r isn't going to write a script to break the site outside of a web browser.  All HTTP input data must be validated. 
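As a rough sketch of what that means for a drop-down: the server still checks that the submitted value is one of the options it offered. The ALLOWED_CITIES table and the parse_city_choice helper are hypothetical, not part of any framework.

```python
# Hypothetical options that the form's drop-down was populated from.
ALLOWED_CITIES = {1: "Edmonton", 2: "Calgary"}

def parse_city_choice(raw_value):
    """Return the chosen city id, or raise ValueError if it was never offered."""
    try:
        city_id = int(raw_value)
    except (TypeError, ValueError):
        raise ValueError("city id is not an integer")
    if city_id not in ALLOWED_CITIES:
        raise ValueError("city id is not one of the offered options")
    return city_id

print(parse_city_choice("2"))
```

A script that posts "1 OR 1=1" or an id that was never in the list gets a ValueError instead of reaching the query.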

christopher (baus.net)
Thursday, July 01, 2004

>You don't really understand the problem of web security then.


And who in the hell made the claim that this is, or is not a problem?

Why are you attacking me on this issue? Where did I claim this is, or is NOT, a problem?

Exactly what problem am I not understanding here?

The only problem I see here is a bunch of miserable people aiming guns at me, and for what reason?

The fact is, who said anything about this being a web-based application? Where was this assumption made? Further, where did I make such an assumption?

What exactly are you eating for breakfast?

>All HTTP input data must be validated.

Fine, anyone with a brain knows that, but why the hell quote my post, and make such a ridiculous assumption on my part? Where did I imply the above? In fact, I specifically mentioned that even in a web-based environment, if you use a non-HTTP means of delivering web-based stuff, then my points on validation still work. (I mentioned TS inside of a web browser, and thus we are now not using HTTP..and in fact the screen is only a graphic image, and is invulnerable to hacking.) Where in the hell did I talk about HTTP and a web form?

What the hell has a web form got to do with my post?

When on planet earth did I start talking about HTTP web-based forms here?

Did you not read my post? Please, make some half based human effort here before you un-zip your pants and start dumping on me.

And, further, where did the original poster even mention web based forms?

Are you really this ignorant and stupid?

Did you not bother to read my post, or are you just being a 100% first class jerk?


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Thursday, July 01, 2004

The cost of not validating is too high in my opinion. Any tainted data must be validated.

I read about DBC in The Pragmatic Programmer; I was doing it then and still do.

One thing I do is run all tainted data through a routine. Since tainted data comes from *outside*, it must come because of *requirements*. Rarely, as in never, does anyone think about questions like: should a state be free-form or a select? If free-form, must we match it against a list of states? Should a phone number include the area code, and fail if it doesn't? Does a zip code include zip+4, and what kind of matching do we do on it? Do we have Canadian users, or just U.S., or worldwide? How do we handle a failure? Nope, like never.  I've been waiting over 8 months for IT to get back to me on my question about how we standardize on country names. If you do think these rules through, you can write up a config file that can be used by your input validator engine. This is what I do. So it's not a lot of time ... let me see:

clean_data = cleanData(dirtydata,'config_name')
(I use hashes to group data according to origin/purpose)

where the config_name matches where you got that data from so it knows the keys and what sort of cleaning you want to do.

The hard part is just trying to get anyone to think of the rules for  your config file. The engine rarely changes.
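To make the idea concrete, here is one way such an engine could look. It is only a sketch: the rule format, the "signup_form" config and the choice to return errors alongside the cleaned data are all assumptions, not the poster's actual code.

```python
import re

# Hypothetical rule sets, keyed by where the data came from.
CONFIGS = {
    "signup_form": {
        "zip": {"pattern": r"\d{5}(-\d{4})?", "required": True},
        "state": {"pattern": r"[A-Z]{2}", "required": True, "upper": True},
        "phone": {"pattern": r"\d{10}", "required": False, "digits_only": True},
    },
}

def clean_data(dirty, config_name):
    """Return (clean, errors); keys not named in the config are dropped."""
    clean, errors = {}, {}
    for key, rule in CONFIGS[config_name].items():
        value = str(dirty.get(key, "")).strip()
        if rule.get("upper"):
            value = value.upper()
        if rule.get("digits_only"):
            value = re.sub(r"\D", "", value)  # strip punctuation from phones
        if not value:
            if rule["required"]:
                errors[key] = "missing"
            continue
        if re.fullmatch(rule["pattern"], value):
            clean[key] = value
        else:
            errors[key] = "invalid"
    return clean, errors

clean, errors = clean_data(
    {"zip": "90210", "state": " ca ", "phone": "(555) 123-4567", "extra": "x"},
    "signup_form",
)
print(clean, errors)
```

The engine itself stays small; every decision about what counts as valid lives in the config.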

me
Thursday, July 01, 2004

"Improving" the UI can help in some scenarios.  But often times I find I have a variety of users.  Some who are familiar with the application's rules, use it a lot, and prefer to keep their hands on the keyboard.  Versus others who are new to the app, not as computer savvy, or whatever, and prefer point and click.

As a recent example, I required a date-time picker that would restrict the user to selecting dates based on an arbitrary recurrence cycle and reference date attached to the database record they happened to be working with.  The .NET wrapper around the same old buggy drop-down picker control wasn't cutting it, so to make everyone happy, I created a combo-like picker control that bolds the allowed dates, but you can also type in a date if you like.  Either way, validation takes over and ensures your chosen date matches a valid date (and chooses the nearest match for you if it doesn't).
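The "nearest match" step could be sketched like this, assuming the recurrence is simply every N days from the reference date (the real control's rules may well be more involved, and the function name is made up):

```python
from datetime import date, timedelta

def nearest_valid_date(chosen, reference, cycle_days):
    """Snap `chosen` to the closest date that is a whole number of
    cycles away from `reference` (ties go to the later date)."""
    offset = (chosen - reference).days
    cycles, remainder = divmod(offset, cycle_days)
    if remainder * 2 >= cycle_days:
        cycles += 1  # closer to the next occurrence than the previous one
    return reference + timedelta(days=cycles * cycle_days)

reference = date(2004, 7, 1)
print(nearest_valid_date(date(2004, 7, 10), reference, 14))
```

Whether the user picked the date from the calendar or typed it in, the same function decides what is finally stored.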

Overall I think validation is one of the most important things your application can do.  Otherwise, you'd just set up a big flat table and let everyone bang up against it with Access.  But it's important to find ways to make the validation code generic and reusable so that you don't spend 80% of your time on it.  For example, I have a single static class in my app that I register appropriate data bindings with, and it handles formatting and parsing dates, times, SSN's, phone numbers, zip codes, currencies, null values, etc.  ~ 350 LOC, all in one place...piece of cake.
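A much smaller, hypothetical version of that "all in one place" idea is a registry that maps a field kind to its parser. The names, the date format and the ZIP pattern below are assumptions for illustration, not the poster's class.

```python
import re
from datetime import datetime
from decimal import Decimal

PARSERS = {}

def register(kind):
    """Decorator that files a parsing function under a field kind."""
    def wrap(fn):
        PARSERS[kind] = fn
        return fn
    return wrap

@register("date")
def parse_date(text):
    # Assumes U.S.-style mm/dd/yyyy input.
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()

@register("zip")
def parse_zip(text):
    text = text.strip()
    if not re.fullmatch(r"\d{5}(-\d{4})?", text):
        raise ValueError("not a ZIP code")
    return text

@register("currency")
def parse_currency(text):
    return Decimal(text.replace("$", "").replace(",", "").strip())

def parse(kind, text):
    """Single entry point that forms call for every bound field."""
    return PARSERS[kind](text)

print(parse("currency", "$1,234.50"))
```

Adding a new field type means writing one more registered function rather than touching each form.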

Joe
Thursday, July 01, 2004
