Fog Creek Software
Discussion Board

The problem of the obvious

I do some programming, and I'm trying to decide how much checking of function parameters is appropriate.

There is a program that gets the content of a URL (a web page in this example). The user enters the URL into a textbox, and on clicking a button the program retrieves the page and displays its raw content in another textbox.

The question is how many checks I need to add to my program and where to put them. I also want to avoid duplicating too much validation logic.

Some options:
1. I can check whether the URL is empty before starting to retrieve the page
2. I can check whether the URL is valid (correctly formatted) before starting to retrieve the page
3. I can check the URL, and if it's not empty try to format it into a correct URL (eg. into
4. I do not need any checks for the URL because the underlying internet function will return an error in any case

Now, probably any of the above options would work, but for each option I can put this logic in several places:
1. In the event handler of the button
2. Only in the function which retrieves the page
3. Both places, one is a UI check, the other is an internal class check

How do you design/program these types of problems?

Tuesday, April 13, 2004

Make a URL parser/builder class that takes care of all of that.  You can do much of the error checking that you mention just by defining what a URL is by a simple grammar:

url ::= (protocol "://" | $) machine ("/" subdirectory* ("?" ... | $) | $);
protocol ::= ...

Then you can extract the data that is necessary, using defaults for, e.g., the protocol.  An invalid URL string (including an empty string) won't match the grammar.

All of this could easily go behind the programming interface to your library.
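
To make that concrete, here is a rough Python sketch of a parser that follows the shape of the simplified grammar above. It is only an illustration: the names ParsedUrl and parse_url, the "http" default, and the rule that a machine name is letters, digits, dots and hyphens are all assumptions, and it is slightly more lenient than the grammar in that it accepts a "?" directly after the machine name. Ports, user/password parts and the rest of the real URL syntax are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedUrl:
    protocol: str
    machine: str
    path: str
    query: str


def parse_url(text: str, default_protocol: str = "http") -> ParsedUrl:
    """Match text against the simplified grammar or raise ValueError."""
    rest = text.strip()
    protocol = default_protocol

    # (protocol "://" | $): accept the prefix only if it looks like a protocol name.
    head, sep, tail = rest.partition("://")
    if sep and head.isalpha():
        protocol, rest = head.lower(), tail

    # ("?" ... | $): everything after the first "?" is the query.
    rest, _, query = rest.partition("?")

    # machine ("/" subdirectory* | $): the machine name must be non-empty.
    machine, _, path = rest.partition("/")
    if not machine or not all(ch.isalnum() or ch in ".-" for ch in machine):
        raise ValueError(f"not a URL under the simplified grammar: {text!r}")

    return ParsedUrl(protocol, machine, "/" + path, query)
```

Callers only see parse_url: an empty string or anything else that does not fit raises ValueError, and a missing protocol is filled in with the default.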

Tuesday, April 13, 2004

Agree with the second poster. Assuming you're working in C++ or a language with similar OO and exception facilities, make a URL class whose constructor takes the input URL as a string. Have the constructor check that the URL is syntactically valid, and throw an exception if the string is invalid.

Then, any URL object is a valid URL. That makes using it way easier.
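
A minimal sketch of the idea, written in Python rather than C++ to keep it short. The names Url, InvalidUrlError and describe_fetch are made up for illustration, and "syntactically valid" is assumed here to mean only "has a protocol and a host" according to the standard library's urllib.parse.urlsplit.

```python
from urllib.parse import urlsplit


class InvalidUrlError(ValueError):
    """Raised when a string cannot be turned into a Url."""


class Url:
    def __init__(self, text: str) -> None:
        try:
            parts = urlsplit(text.strip())
        except ValueError as exc:  # urlsplit rejects some malformed inputs itself
            raise InvalidUrlError(str(exc)) from exc
        if not parts.scheme or not parts.hostname:
            raise InvalidUrlError(f"missing protocol or host: {text!r}")
        self._text = parts.geturl()

    def __str__(self) -> str:
        return self._text


def describe_fetch(url: Url) -> str:
    # No re-validation here: holding a Url means the check already passed.
    return f"GET {url}"
```

The check lives in exactly one place, and every function that accepts a Url instead of a string can skip it.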

Exception guy
Tuesday, April 13, 2004

better yet, use someone else's URL class--there are all kinds of wacky rules beyond the one mentioned above.

(note that urlmon and/or wininet can help you with this too on windows)

the real question is what do you want to allow--just http(s)? anything the underlying OS can handle? something in between? the danger is that the user types in something funky which makes the OS do something you didn't quite expect (c:\windows\system32\calc.exe)
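
a rough Python sketch of the "just http(s)" policy, leaning on the standard library's urllib.parse instead of a hand-written parser. the name is_fetchable and the allow list are assumptions for illustration; what you actually allow is your call.

```python
from urllib.parse import urlsplit

# Assumed policy: only these protocols may be fetched.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def is_fetchable(text: str, allowed=ALLOWED_SCHEMES) -> bool:
    """Return True only for URLs with an allowed protocol and a host."""
    try:
        parts = urlsplit(text.strip())
    except ValueError:  # urlsplit raises on some malformed inputs
        return False
    return parts.scheme in allowed and bool(parts.hostname)
```

a local path or a file: URL never reaches the OS this way, because neither has a protocol on the allow list.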

Tuesday, April 13, 2004

"URL parsing class", my ass! Learn how to use regular expressions.

Wednesday, April 14, 2004

Regardless of what you choose, avoid putting any kind of logic in an event handler.

As a rule, event handlers should forward to your app components, which is where the logic should be.
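
Something like the following Python sketch, with no GUI toolkit involved. PageFetcher, on_fetch_clicked and the injected download, show_result and show_error callables are hypothetical names used only for illustration; the empty-URL check stands in for whatever validation you settle on.

```python
class PageFetcher:
    """Application component: owns the validation and the retrieval."""

    def __init__(self, download):
        # download is any callable taking a URL string and returning page text.
        self._download = download

    def fetch(self, url_text: str) -> str:
        url = url_text.strip()
        if not url:
            raise ValueError("URL is empty")
        return self._download(url)


def on_fetch_clicked(fetcher, url_box_text, show_result, show_error):
    """Event handler: reads the UI value, forwards it, displays the outcome."""
    try:
        show_result(fetcher.fetch(url_box_text))
    except ValueError as exc:
        show_error(str(exc))
```

The handler contains no rules of its own, so the same PageFetcher can be reused from a test, a command line tool, or another window.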

Paulo Caetano
Wednesday, April 14, 2004

That's brilliant Egor, you just go ahead and stick your URL parsing regular expression everywhere throughout your software.  Whenever you need to parse a URL, just copy/paste that regular expression into your code.

Of course, if there's a problem with the expression or the ordinality of a capturing group has to change in one instance, you've got a lot of problems.  Maybe your language has dynamic typing and native support for iterators on a subsequence of capturing groups and is aware of url encoded strings, so you could look at all query string fields and do something with the data very easily, but that's probably not typical.

In any case, you'd think that since somebody *already brought up* defining a URL with a context free grammar, bringing up regular expressions would be redundant.  But that person still thought that putting the grammar behind the interface of a URL parser class was a good idea, so maybe you ought to consider why.
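
For illustration, a Python sketch of a regular expression kept behind a single function. The pattern is a deliberately incomplete assumption (http and https only, no ports or user info), and query_fields is a made-up name. Named groups mean callers never depend on group numbers, and the standard library's parse_qsl handles the URL-encoded query fields.

```python
import re
from urllib.parse import parse_qsl

# The only copy of the pattern; callers never see it or its group order.
_URL_RE = re.compile(
    r"(?P<scheme>https?)://"
    r"(?P<host>[^/?#\s]+)"
    r"(?P<path>/[^?#\s]*)?"
    r"(?:\?(?P<query>[^#\s]*))?",
    re.IGNORECASE,
)


def query_fields(url_text: str) -> dict:
    """Return the decoded query string fields of a URL, or raise ValueError."""
    match = _URL_RE.fullmatch(url_text.strip())
    if match is None:
        raise ValueError(f"not a recognized URL: {url_text!r}")
    return dict(parse_qsl(match.group("query") or ""))
```

If the expression turns out to be wrong, there is one place to fix it.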

Wednesday, April 14, 2004

Use an already written library to parse URLs, unless you want to spend a lot of time reading RFCs.  The sample grammar shown here by the second poster is thoroughly bogus, as even a few minutes of reading RFC 2396 will reveal.  :)

Phillip J. Eby
Wednesday, April 14, 2004

I thought it was clear that I wasn't presenting a complete or correct grammar for all URLs.  Obviously I left out alternative ports, user/password arguments, etc.  The point is that it's complex enough to require a formal grammar, and it ought to be put behind a reader/writer class.

Wednesday, April 14, 2004
