Fog Creek Software
Discussion Board




Data validation: best practices

When writing data-centric applications, what are the best practices for implementing data validation?

Obviously, anything that you can catch at the user interface should be done at that level.  Also, I assume the appropriate constraints, triggers, etc. should be implemented at the database to ensure the integrity of the data.

But what about the middle-tier? Do you re-validate the same data that you validated at the UI?  Do you pre-validate the data prior to any database transactions?

On one hand, redundant validation would ensure that each layer is robust in its own right (e.g., in case of component reuse). On the other hand, it leads to added code, maintenance, and processing overhead.

What do most of you do?

Nick Hebb
Wednesday, November 06, 2002

<snip>
Obviously, anything that you can catch at the user interface should be done at that level. 
</snip>

Ummm ... no. When writing a tiered application, validating data in the UI is a no-no. I usually write a thin tier that sits on top of the business layer to do this.

I have had projects that started out as fat-client GUIs and moved to web-based interfaces. By having this additional tier, there was no need to write awkward handlers in scripting languages. The same code could be used for both platforms.

If your application is written with a modular approach in mind, this tier allows you to break out its components easily, if needed. On larger projects this is essential, especially for commercial software.

Also, you will find that your debugging process goes much more smoothly.

HTH

beach bum
Wednesday, November 06, 2002


As a general rule I try to make the middle tier the repository of all validation rules. I may also perform the same validation in the client and/or in the database but every rule is enforced in the app-server.

This may cost some wasted effort, but it means that any app that uses the application server is guaranteed to use the validation, even if it is written by a third party.

I usually only use the database for highly data-centric validation (e.g. FK constraints, field sizes, etc.). I do not believe complex business validation rules (e.g. accounts can only be in state X if conditions A, B and C are true) belong in the database. As a general rule I do not use triggers, as their operation can be hidden from the general logic flow (especially if they are dropped or disabled). I prefer the validation rules to be out in the open.
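A minimal sketch of that split, using Python's standard sqlite3 module. The tables, the 40-character limit and the "close only at zero balance" rule are hypothetical stand-ins: the schema enforces only foreign keys and field sizes, while the business rule lives in plain middle-tier code where it can be read. Note that SQLite only enforces foreign keys once PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

# Hypothetical schema: the database enforces only data-centric rules
# (foreign keys, NOT NULL, field sizes).
SCHEMA = """
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL CHECK (length(name) <= 40)
);
CREATE TABLE account (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    state       TEXT NOT NULL,
    balance     INTEGER NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def set_state(conn: sqlite3.Connection, account_id: int, new_state: str) -> None:
    """Middle-tier business rule, kept out in the open rather than in a trigger.

    Hypothetical rule: an account may only move to 'closed' with a zero balance.
    """
    row = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no account {account_id}")
    if new_state == "closed" and row[0] != 0:
        raise ValueError("an account can only be closed with a zero balance")
    conn.execute("UPDATE account SET state = ? WHERE id = ?", (new_state, account_id))
```

An account pointing at a nonexistent customer is rejected by the database itself with sqlite3.IntegrityError; trying to close an account that still has a balance is rejected by set_state with a ValueError before the UPDATE is issued.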

I tend to do as much validation in the GUI as can be performed without a round trip back to the server. Generally this means only the current object or record in question.

I generally try to use the GUI itself to control the type of data that is entered, so that it is always valid: for example, text fields limited to the appropriate length, or combo boxes that only offer the valid values. A really useful one is graying out the OK button until the data is entered correctly.

One technique I use with the client is to define the validation rules in a textual format (e.g. XML, or sometimes even JavaScript) and download them from the application server to the client. Each rule contains the basic validation check and a base error message to display. This means that you can change the validation rules in one place and they automatically propagate. It also allows you to change the language of the error messages dynamically.
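A rough sketch of the rules-as-data idea, assuming JSON rather than the XML JB mentions. The field names, patterns and messages are invented for illustration. The server can run validate directly; a browser client would need an equivalent interpreter written in JavaScript, which is assumed rather than shown.

```python
import json
import re

# Hypothetical rule document, as the application server might publish it.
# Each rule carries its check plus a base error message; translating the
# messages means editing this document, not the validation code.
RULES_JSON = """
{
  "name":     {"required": true,  "max_length": 40,
               "message": "Name is required (40 characters at most)"},
  "quantity": {"required": true,  "pattern": "[1-9][0-9]{0,3}",
               "message": "Quantity must be a whole number from 1 to 9999"},
  "notes":    {"required": false, "max_length": 200,
               "message": "Notes are limited to 200 characters"}
}
"""


def validate(record: dict, rules: dict) -> dict:
    """Return {field: message} for every field that breaks its rule."""
    errors = {}
    for field, rule in rules.items():
        value = record.get(field, "")
        if not value:
            if rule.get("required"):
                errors[field] = rule["message"]
            continue  # optional and empty: nothing else to check
        if "max_length" in rule and len(value) > rule["max_length"]:
            errors[field] = rule["message"]
        elif "pattern" in rule and re.fullmatch(rule["pattern"], value) is None:
            errors[field] = rule["message"]
    return errors


rules = json.loads(RULES_JSON)
```

For example, validate({"name": "Ann", "quantity": "0"}, rules) reports only the quantity message; the optional, empty notes field passes.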

Anyway, that's my take; I am sure many would beg to differ.

JB
Wednesday, November 06, 2002

Let me vote for NOT greying out the OK button until the data is OK. This is incredibly frustrating for people who don't know what they've done wrong. It's almost always better to let them hit OK and then tell them what's wrong.

Similarly, I tend to think you should never grey out menu items, even if they are obviously not relevant. Better to leave the item enabled and, when someone clicks it, explain why it doesn't work right now.

This is strictly from a usability perspective, not a data validation perspective. I agree with the philosophy of making the UI represent the data model so closely that almost no validation is necessary (for example, in FogBUGZ, pardon my obsession, when you enter or edit a bug, there is literally nothing you can do that is not valid. When you submit a new bug or a bug edit it ALWAYS goes through, we don't even have a mechanism for the UI to fail.)

Joel Spolsky
Wednesday, November 06, 2002

> Let me vote for NOT greying out the OK button until the data is OK. This is incredibly frustrating for people who don't know what they've done wrong. It's almost always better to let them hit OK and then tell them what's wrong.

Fair enough I guess.

I kind of find it useful myself, but I guess one of the first rules of usability is that not everyone is like me :-).

Mind you, in either case (greyed-out OK button or not), you should generally tell people whether a field is optional.

JB
Wednesday, November 06, 2002

It amazes me how many people do the data validation in Javascript on the client when building web applications instead of validating on the server.

This is one of the biggest security risks around as it means a malicious user can disable Javascript and post any data they like to your application.

The golden rule is never to trust the client. Just because you are writing a GUI with verification doesn't mean a user won't connect with their own client (say, telnet) and enter malicious data.
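To make the golden rule concrete, here is a minimal Python sketch of a server handler that re-checks a raw form submission without assuming it came from the real page. The field names and limits are hypothetical; parse_qs is the standard-library parser for application/x-www-form-urlencoded bodies.

```python
from urllib.parse import parse_qs

# Hypothetical limits; they mirror the form's maxlength attributes, but the
# server enforces them itself because the form is only a suggestion.
MAX_LEN = {"username": 20, "comment": 500}


def handle_post(body: str) -> tuple[int, str]:
    """Re-validate a raw application/x-www-form-urlencoded body on the server.

    Nothing here assumes the request came from our own page: it may have
    been typed by hand over telnet, with JavaScript and maxlength bypassed.
    Returns (status_code, message).
    """
    fields = parse_qs(body, keep_blank_values=True)
    unexpected = sorted(set(fields) - set(MAX_LEN))
    if unexpected:
        return 400, f"unexpected fields: {unexpected}"
    for name, limit in MAX_LEN.items():
        values = fields.get(name, [])
        if len(values) != 1:
            return 400, f"{name} must be sent exactly once"
        value = values[0]
        if not value.strip():
            return 400, f"{name} is required"
        if len(value) > limit:
            return 400, f"{name} is longer than {limit} characters"
    return 200, "ok"
```

A request such as username=ann&comment=hi&is_admin=1, typed by hand, is refused with a 400 even though no browser showing the real form would ever send it.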

Matthew Lock
Wednesday, November 06, 2002

The general rule is to be sparing with data validation rules. The chances of fouling up and preventing valid data input are great, and the frustration it causes users is immense.

Things are getting better now, but I've lost count of the number of fictitious addresses and ZIP codes I've had to use on web forms because I live outside the States.

The question of greying out command buttons and menu items seems to split people. Everett McKay in "Developing User Interfaces for Microsoft Windows" argues that you should always grey out options that can't be used. However, as Joel says, it can be confusing; I'm often confused by the formatting buttons being greyed out when I open a new Outlook mail message, or by not being able to alter Word 2000 options when no document is open (the reason for the latter presumably has something to do with not having a template to save in, but it's still lazy programming in my opinion).

I would, however, vote for having the inaccessible options greyed out to start with. It's quite possible you will never have time to get round to writing all the error messages, and the worst scenario is an option that is not greyed out but does nothing when you click on it.

Stephen Jones
Wednesday, November 06, 2002

I hate greyed out options.  Many people do not understand that the option is disabled on purpose, to them it's simply not working!

disabled options are stupid
Wednesday, November 06, 2002

just a few tidbits -

1) If you're working with a web app, then 'maxlength' is your friend. Use it. It works. Of course there's a lot more to validation than this, but it's so shit-simple I can't see why it would not always be used (on the controls it applies to, certainly). It surprises me how many folks don't even include this attribute, leaving themselves open to all manner of over-long inputs making their way into the back end. Having said that, you still need to do validation on the server, because it's just too damned easy to get around a web browser and send HTTP directly to your server.

2)  I also prefer to avoid pushing very much validation to the client because that just means more javascript in the client. The more javascript you jam into the pages themselves, the more you're asking for a page compatibility problem, at least for a public website. I've tested a bunch of them, and it's generally proven to be better to keep the javascript in the web page as lean and simple as circumstances permit. We typically do the bulk of validation server-side.

3) Based on my experiences, I suggest to you that the single thing you're likely to have the most difficulty with getting done regarding validation is not the implementation, or the decision of where to put it, it's going to be gathering up the validation rules you need and making sure everybody on your team has that info by the time they need it.  When I have to start working on the tests for a system, I always have to chase around for a data dictionary, for example, and when I get it it's normally not very complete. That's just one example.

4) You might want to talk with your QA/testing folks. They normally spend a large part of their time digging into the validation issues relevant to your shop and projects, so they may have some insights that would help.

5) Again, for web systems, don't forget to check for valid commands embedded in a data field's input. At the current OpenHack (see www.eweek.com), the first successful attack, though minor and not causing any damage, was a scripting attack made by submitting valid HTML in a web form field.
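Two common defences against the attack anonQAguy mentions, sketched in Python with hypothetical field rules: whitelist the characters a structured field may contain, and escape free text when writing it back into a page. Output escaping is a complementary technique swapped in here, not something the post itself describes.

```python
import html
import re

# Hypothetical whitelist for a field that should never contain markup.
USERNAME = re.compile(r"[A-Za-z0-9_]{1,20}")


def valid_username(value: str) -> bool:
    """Input side: accept only the characters the field actually needs."""
    return USERNAME.fullmatch(value) is not None


def render_comment(author: str, text: str) -> str:
    """Output side: escape free text so <script> and friends show up as text."""
    return f"<p><b>{html.escape(author)}</b>: {html.escape(text)}</p>"
```

The whitelist rejects a username like "<b>nick</b>" outright, while a comment containing a script tag is still stored but rendered harmlessly as literal text.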

cheers,

anonQAguy
Wednesday, November 06, 2002

Thanks everyone for the good advice.

I guess I've always been anal about wanting to catch a problem at the earliest point possible, so validating at the UI just made sense to me. Before posting this topic, I realized that most of the books on my bookshelf address what you _can_ do, but few address what you _should_ do.

> Let me vote for NOT greying out the OK button until the data is OK.

Joel, can you talk to the folks at Microsoft about this? I get tired of having to accept those damn EULAs before being able to click Next>>.

Nick Hebb
Thursday, November 07, 2002

Here is a great run down on validating data and security issues. The examples use Perl but the principles will apply to any language:

http://www.oreilly.com/catalog/cgi2/chapter/ch08.html

Matthew Lock
Thursday, November 07, 2002

As Joel says somewhere, so far as the user is concerned the application is the UI.  So, no matter where the actual validation is done the user must feel as if the application in front of them has done it and not some strange faraway animal.

There are few things more irritating to a user than to fill in a form, hit a button, wait some amount of time and then get some dialog or page saying 'You didn't enter x' or 'X is invalid'.

This is not to say that you should validate everything in the UI. 

You should give the user the greatest help in providing valid entries in the first place: fill list boxes correctly and update them if the context changes; provide zip code/postcode lookups if you validate addresses, rather than have the user guess with their fingers; validate date formats in the UI, as you should all masked data entry; checksum credit card numbers before sending them; and so on.
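Of Simon's examples, the credit card checksum is the most mechanical. Below is a minimal Python sketch of the Luhn (mod 10) check that card numbers carry. It only catches typing mistakes, so the payment side still has to validate the card for real.

```python
def luhn_ok(number: str) -> bool:
    """Luhn (mod 10) check used by credit card numbers.

    Catches single-digit typos and most adjacent transpositions; it does not
    check length, issuer prefix, or whether the card actually exists.
    """
    digits = number.replace(" ", "").replace("-", "")
    if not digits or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost (check) digit and double every second digit.
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0
```

luhn_ok("4111 1111 1111 1111"), a widely used test number, returns True; changing the last digit to 2 makes it return False.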

The user shouldn't be left with the feeling that the application is looking to catch them out, or that to use the application successfully they need to mirror the thought processes of the developer.

Simon Lucy
Thursday, November 07, 2002

I always have a small information pane on my data entry forms that tells the user what (if anything) is wrong with the data they have entered. For example, in VB I have a simple status bar at the bottom of the form with an icon that pops up if the user has entered something bad, telling them what the error is. I use an error DLL with all the error codes, descriptions, etc. in it. I also grey out the OK button while an error is displayed.

Hell, the end losers can see what's wrong can't they?

Alberto
Thursday, November 07, 2002

I know several of the Delphi-based n-tier solutions (Datasnap, Asta) allow constraints to be stored on the server and transferred to the clients along with the data. So the logic for checking that constraints are satisfied lives on each client, but the constraints themselves are stored on the server.

Taking advantage of this setup gives you advantages of each:  (1) constraints are stored on the server and can be changed with no need to modify clients, and (2) constraints can be checked immediately on the client without a round-trip to the server.

Herbert Sitz
Thursday, November 07, 2002

I completely disagree with Joel and others opposed to greying out action buttons.

It gives the user instantaneous feedback when their input data is not valid. It's better to find out as soon as possible when data is not valid.

For subtle/inobvious data format problems, an explanation of what is wrong can be drawn to an info bar at the bottom of the window as soon as the format goes bad.

It seems likewise absurd to me not to grey out menu options that do not apply at the moment. Greying out is much better than removing the menu items, because users prefer menus where the items stay in a relatively fixed position as much as possible (temporarily disabled items should not be removed).

As much as possible, the interface should not allow invalid data to be entered, but when invalid data is entered, there should be feedback given.

Waiting until the user presses 'OK' to tell them there is a problem that could have been easily diagnosed at a previous time seems to me to just be lazy. When I am using software and press 'ok' and get some lame dialog that says 'One or more of your items contains invalid data. Please correct and resubmit.' I curse the programmer and the loins that bore him.

X. J. Scott
Thursday, November 07, 2002

An implementation question:

A lot of the validation which should be done at the UI level can be described in a straightforward manner, perhaps with something like XSD.

Are there tools which do XSD to UI mapping or the equivalent? I'm thinking of things like mapping choices to dropdowns (this item has to be one of 50 states) in addition to basic typing and required fields. I suppose you could use XSL + original data + XSD + second XSD to generate HTML, but that XSL would be pretty complicated on its own.

I'm basically trying to generalize this and not have to write code on a per-control basis, not even the code required by the .NET validation controls. This also allows the validation to be driven easily by some data handling in the middle tier: we know you're responsible for the western US, so the UI layer gets a list of western states as valid data.
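One way to avoid per-control code, sketched in Python under loose assumptions: the field descriptions below stand in for an XSD (an enumeration becomes a dropdown, a length limit becomes maxlength), and the region table stands in for the middle-tier data mb describes. All names are hypothetical and no actual XSD parsing is shown.

```python
from html import escape
from typing import Optional

# Hypothetical field descriptions standing in for what an XSD would say:
# an enumeration facet becomes a dropdown, a maxLength facet becomes maxlength.
FIELDS = [
    {"name": "state", "enum": ["CA", "NY", "OR", "TX", "WA"]},
    {"name": "city", "max_length": 30},
]

# Hypothetical middle-tier data: the states each sales region may use.
REGION_STATES = {"west": {"CA", "OR", "WA"}}


def allowed_values(field: dict, region: Optional[str]) -> list:
    """Narrow an enumeration using what the middle tier knows about the user."""
    values = list(field["enum"])
    if field["name"] == "state" and region in REGION_STATES:
        values = [v for v in values if v in REGION_STATES[region]]
    return values


def render_field(field: dict, region: Optional[str] = None) -> str:
    """Generate the control from its description; no per-control code."""
    name = escape(field["name"])
    if "enum" in field:
        options = "".join(
            f"<option>{escape(v)}</option>" for v in allowed_values(field, region)
        )
        return f'<select name="{name}">{options}</select>'
    return f'<input name="{name}" maxlength="{field["max_length"]}">'
```

The server should check a submitted state against the same allowed_values list rather than trusting that the dropdown was actually used.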

mb
Thursday, November 07, 2002

X. J. Scott,
I think the reason people are saying graying out items is not a good idea is because it gives users no feedback as to *why* the option is not available.  If it is not grayed out, on the other hand, and the user selects it when it is not a valid choice, the UI can then say something like "Option disabled because no document is open" or something.

I'm not sure I really like that idea better as a user, though... Maybe you can get the best of both worlds by giving the reason in the status bar or in a tooltip.

Mike McNertney
Thursday, November 07, 2002

As a general rule, I agree with Joel on greying out the OK button. However, arbitrary statements like "don't grey out menu items" make me a little queasy. There are too many axes to make blanket statements about UI: casual user vs. professional user, keyboard user vs. mouse user, trusted user vs. untrusted user, inquisitive user vs. tell-me-how user, aesthetic user vs. utilitarian user. The list is endless. For example, we do accounting systems where some users want a simple, self-explanatory interface, while others just want to 10-key as fast as they can. As a result, some parts of our app look awful but perform well, while others are the opposite.

We have extensive data validation concerns in our app.  Fortunately, we can require a thick, windows client.  Even as such, a roundtrip over the internet is still slow enough that validation when tabbing out of a grid field is not practical (scalability is not a problem for us - low number of users).

The amount of data that we need to validate against is moderately large (up to 200,000 records).  We've taken a couple approaches:

- A nightly or even hourly process that preaggregates the data into a file, applies custom compression and places the file on the web server.  When "logging on" to the system, the client receives this file.  All validation and lists happen locally.

- A variation on the above where a cache is kept on the client.  When logging on, only new/changed/deleted records are sent over.  This is pretty nice.  Performance close to a desktop app, even over 56K.
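A sketch of the incremental cache Bill describes, assuming the server keeps a change log of (version, id, record) entries where a record of None marks a deletion. The payload shape and all names are hypothetical; compression and transport are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Delta:
    """Hypothetical logon payload: only what changed since the client's version."""
    version: int              # server version after these changes
    upserts: dict[int, dict]  # new or changed records, keyed by id
    deletes: list[int]        # ids removed since the client's version


def changes_since(log: list[tuple[int, int, Optional[dict]]], since: int) -> Delta:
    """Server side: fold a change log of (version, id, record or None) entries,
    in version order, into one Delta for a client that is at `since`."""
    upserts: dict[int, dict] = {}
    deleted: set[int] = set()
    latest = since
    for version, rid, record in log:
        if version <= since:
            continue
        latest = version
        if record is None:       # a deletion cancels any earlier upsert
            upserts.pop(rid, None)
            deleted.add(rid)
        else:                    # a re-insert cancels any earlier deletion
            deleted.discard(rid)
            upserts[rid] = record
    return Delta(latest, upserts, sorted(deleted))


@dataclass
class LocalCache:
    version: int = 0
    records: dict[int, dict] = field(default_factory=dict)

    def apply(self, delta: Delta) -> None:
        self.records.update(delta.upserts)
        for rid in delta.deletes:
            self.records.pop(rid, None)
        self.version = delta.version

    def is_known(self, rid: int) -> bool:
        # Validation and lookups run against the local copy: no round trip.
        return rid in self.records
```

Because several changes to the same record are folded into one entry, a client that has been away for a week receives each changed record once, not its whole history.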

With a lot of data, it helps to be performance-conscious. We found that the standard .NET DataSet classes were woefully inadequate in this regard (they send XML, even in "binary" mode). Some very basic manual serialization improved end-to-end throughput by almost 60x (really!).

Depending on your audience, a thick client app with a great UI is not a bad way to go.  Deployment issues don't have to be the "DLL hell" nightmare that everyone expects with a desktop app.  Self-registering your COM objects each time the app starts and providing "no options" installation goes a long way.  Also, .NET nicely solves some thick client deployment problems with its Windows Forms stuff.

Hopefully, IT departments will wake up and realize that, while HTML UIs are good for them from a management standpoint, they're often not the best thing for the users.  In time, perhaps thick client apps will regain some of their luster.

Bill Carlson
Friday, November 08, 2002

I agree with not greying out the OK button, and I also agree with X. J. Scott's loathing of 'one or more fields is invalid'. It should tell you exactly what's wrong and why (unless, of course, it's a security validation form). However, I believe menu items should be greyed out when unavailable, since that immediately tells you what you can do.

Hmm, perhaps if they were greyed out, but could tell the user why if hovered over or clicked on that would combine the best of both worlds?

Mr Jack
Friday, November 08, 2002

I'm not sure I like the fact that you have to go to the server for data validation. Sometimes I hate it when I'm filling out a long form, click submit, and wait 5-10 seconds while things churn, only to be taken back to the same page (to see that I missed a required field).

Whereas, if there was some javascript on the front-end...I click the submit button...a message box pops up "This field is required"...I click ok...and it focuses me right to that field.

Yes, they might have javascript turned off...well, I put the same validation on the middle-tier too.  Yes, it's extra work.  Yes, I'm duplicating the same code twice (unless someone knows something I can use on the gui side and the middle-tier that can be shared).  But, I think the programmer should put in all of the effort to save the user effort.

I'm not 100% set in stone with this thinking.  Just putting it out there.

Why go to the server?
Friday, November 08, 2002

Although there are some differences of opinion, I would summarize the best practices for data validation as follows:
1) Make the UI match the data model as closely as possible, minimizing the validation requirements.
2) Perform all validation checks in the middle tier, even if the same validation is performed at the UI.
3) Additionally, perform any validation that you can at the UI in order to provide immediate feedback to the user without a round trip to the server.

A couple of comments:
a) Only one person commented on validation at the database.  I would tend to be conservative here, myself.
b) Security: I've only worked on intranet apps, so the concern that someone is going to telnet in corrupt data hasn't really been an issue (though it feasibly could be). Regarding JavaScript, I've always used standard buttons, not submit buttons. The standard button calls a function, which in turn calls a validation function. If the validation function returns true, the form is submitted. Otherwise, the user is alerted to the omission or error and the focus is changed. So, if JavaScript is disabled, the form cannot be submitted.
c) "Why go to the server" writes, "I'm duplicating the same code twice (unless someone knows something I can use on the gui side and the middle-tier that can be shared)."

You want to take a look at Microsoft's remote scripting: http://msdn.microsoft.com/downloads/default.asp?url=/downloads/sample.asp?url=/msdn-files/027/001/734/msdncompositedoc.xml

An excerpt reads, "With remote scripting … the Web application can now validate data while the user is still filling out the rest of the form, without having to reload the page. Specifically, the script must first identify a server page to which to connect, then transmit the call to the server. Any return value from the call is transmitted back to the originating script."

I've used it before with good results.  However, I don't know if there are security issues with it, and it only runs on IIS. I'd be curious if anyone knows of any issues with using it.

Nick Hebb
Friday, November 08, 2002

The implementation question I asked above is exactly about removing the duplication of validation between the server and the browser: I could write the rules once and have them execute in JScript (or whatever) in the browser, and again in the middle tier, plus additional work.

You can do 'remote scripting' type stuff with some client side hacks: create an invisible iframe/layer, then have your jscript post the form to the invisible iframe and gather the results back. You can also use xmlhttp in a similar fashion. I haven't looked into the details of the official 'remote scripting' code, anyone want to give a one-sentence overview of how it works?

mb
Friday, November 08, 2002

It uses an applet (not visible) to communicate between the client page and a page located on the server.  When using it, you specify the name of the function that you want to call on the server page.  So, the server-side function could be written to create an instance of a COM object and run your middle tier validation code.  All this is done without navigating to a new page or requiring a page reload.
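Whatever the transport (the applet Nick describes, a hidden iframe or xmlhttp), the server page ends up running a function chosen by name in the request. A minimal Python sketch of that dispatch step, with hypothetical function names; it is not Microsoft's remote scripting API. The explicit registry bears on Nick's security question: the client can send any name it likes, so only whitelisted functions should ever run.

```python
from typing import Callable

# Hypothetical registry: only functions registered here can be invoked by name
# from the browser, so a crafted request cannot reach anything else.
VALIDATORS: dict[str, Callable[..., bool]] = {}


def remote(fn: Callable[..., bool]) -> Callable[..., bool]:
    VALIDATORS[fn.__name__] = fn
    return fn


@remote
def check_username_free(name: str) -> bool:
    taken = {"admin", "joel"}  # stand-in for a lookup in the middle tier
    return name.lower() not in taken


@remote
def check_quantity(qty: str) -> bool:
    return qty.isascii() and qty.isdigit() and 1 <= int(qty) <= 9999


def dispatch(function_name: str, args: list) -> dict:
    """Handle one call: look the name up, run it, and wrap the result."""
    fn = VALIDATORS.get(function_name)
    if fn is None:
        return {"ok": False, "error": f"unknown function {function_name!r}"}
    try:
        return {"ok": True, "result": fn(*args)}
    except (TypeError, ValueError) as exc:  # wrong arity or bad argument
        return {"ok": False, "error": str(exc)}
```

A call naming check_quantity with the argument "12" comes back as {"ok": True, "result": True}, while a request for any unregistered name comes back with ok set to False and nothing is executed.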

Nick Hebb
Friday, November 08, 2002
