Fog Creek Software
g
Discussion Board




Monolithic Function

I have no idea what to do about this:

I have a class, call it "customer", for blandness sake.  This is a decent sized customer class, and it has about 70 properties, things ranging from "First Name" to "Address (line 1, 2, 3, subdivision, etc.)" to "Mother's Maiden Name."  It is used to hold data coming back from a legacy system.

Now that's not so bad.  But part of the required functionality is to create a new customer.  The New Customer function that I have to emulate has around 80 parameters, some (but not all) that overlap with the properties on the class.

Generally I like making my functions with the parameters explicit.  But do I really just go ahead and make a massive function requiring 80 parameters?  That seems incredibly unwieldy.

Another option would be to make the New Customer call take zero (or just a few) parameters, and then before you call it you have to populate the object with data that will get used in the call.  Then the first line of the method will be checking to make sure you have put in all the required data.  I hate this approach, but I'm open to reasons why it's best.

Am I missing some other elegant and concise way out?  It seems like it should be easy, but I have nothing.

Dignified
Friday, April 16, 2004

A simple, general answer would be "if the data is required then make it a mandatory constructor parameter, otherwise make it an optionally-settable property."

> Am I missing some other elegant and concise way out?

A single constructor parameter, consisting of a struct with 80 fields?

Christopher Wells
Friday, April 16, 2004

You could have a few smaller functions that break up the functionality.  For example on Amazon, you've got customer information, preference information, billing information, and shipping information.  So you could break the function up into a few smaller chunks.  For that matter, do all 80 properties really belong in one class?  Would it be more appropriate to split out sperate classes as mentioned above?  Just some ideas. . .

Elephant
Friday, April 16, 2004


If some of the properties are only used on some of the methods, then it does seem kinda screwy to force the caller to populate all of the properties. I wouldn't want to set all 80+ properties, just to call MethodX that only uses 5 of them.

If all of the properties do have to be set before any methods can be called, then I agree passing in a struct to the constructor is a good way of doing it.

Otherwise, I'd just let the user set the properties that they want and then within each method check to see if the appropriate properties have been set.  Sure, this requires more validation code but it's probably a cleaner design that forcing the user to set a bunch of un-needed properties.

Mark Hoffman
Friday, April 16, 2004

I think I explained it poorly.  Sorry for that.

When I say there is a "New Customer" function, I'm not talking about the constructor.  That's my mistake for putting "new" in the function name. :)  This object basically sits between a type of database (which is miles away, and which we have no control over), and a UI (which is currently non-existant, and several different ones will be developed).  So, the "New Customer" is used to generate a new customer record in the database.

For clarity, there is also a "Load Customer" function, which gets passed one ID, and populates the object from the database.

So the constructor approach isn't really what I'm after, since it's very feasible (and almost certain), that the majority of use will be pulling back existing customers and displaying their data.

Does that make more sense?

Dignified
Friday, April 16, 2004

Is this the real Elephant?

RP
Friday, April 16, 2004

This time it is.  I know that Joel wants a low barrier of entry for posting on the site, but I wish there was some way that the frequent posters could lock in their aliases.  I guess that doesn't really present any value add, so it goes by the wayside.

Elephant
Friday, April 16, 2004

"You could have a few smaller functions that break up the functionality."

How would this be done?  The "New Customer" call is basically a one to one mapping of the stored procedure in the database that creates the Customer record.

If you want to create a new customer, the 80 parameters have to be specified some how before that call.  I don't know how I could split it up to be more effective, unless you mean having 8 different function calls which populate 10 parameters a piece.  And then call the 9th to actually create the customer, which doesn't seem that grand.  Unless I'm not understanding you exactly...

Dignified
Friday, April 16, 2004


Couldn't the customer class just have a method:

sub InsertIntoDb()
{

}

?

Matt H.
Friday, April 16, 2004

In other use:

Use the class to encapsulate and abstract DB serialization, updates, loads and deletes.

Matt H.
Friday, April 16, 2004

Do you really have 80 columns in your table?  If this is the case, then I guess you're looking at one monolithic function, or passing in a struct.  I really think that some of this information couldn't be better segmented out.  There hasn't been enough information presented though for any conclusions to be drawn.  I am just merely offering up an idea.

Elephant
Friday, April 16, 2004

Draw a state chart for your customer object. If it has just one state "valid", then you could have the one initializer doing the whole enchilada (you can help it out by chunking it a bit). If you are seeing more possible states, e.g. an isBillable or hasEmail states ( with matching predicates)that will allow certain operations even though the object might only be partially specified, then go the partial way with only the minimal universally required fields needed at first.
Whatever you start out with, over time your application will migrate to this second form anyway ;-).

Just me (Sir to you)
Friday, April 16, 2004

"Do you really have 80 columns in your table?"

Maybe.  Maybe not.  I have no idea.  The company who owns the database is hundreds of miles away, and they basically exposed an RPC for "Create Customer" which requires 80 parameters.  So if I want to create a customer, they need 80 parameters from me, one way or another.

Dignified
Friday, April 16, 2004

"If you are seeing more possible states..."

There are more possible states.  Tons of them.  Load Customer requires only one field.  Create Customer requires 80.  Change Address requires about 5.

Thinking of this, I'm guessing the "populate parameters before the call, combined with a validation routine for the required parameters" is going to be the way to go.

I don't know why it makes me feel so dirty. :D

Dignified
Friday, April 16, 2004

I'm not sure why everyone is making this out to be so complicated!

So you have a Customer object and it has 80-odd public fields.  I see the process for creating the customer as this:

1. Create a new customer object.
2. Fill the public fields of the customer object with data.
3. Call NewCustomer() on the object.

Now NewCustomer() takes all those fields and does that huge RPC call to create the customer.

If some fields MUST be filled in, the NewCustomer() does the data validation and raises an exception if any of the fields are not filled in.

This is certainly MUCH smarter than filling in a big struct or having a function with 80 parameters!

Almost Anonymous
Friday, April 16, 2004

Just a nuance, but instead of NewCustomer, make that SaveCustomer - if the id property is unassigned, it implicitly implies the creation of a customer, while a value indicates the modification of an existing customer. Put a getter on the id (or make it read only with access functions) that auto-loads the related customer when the id is set.

Dennis Forbes
Friday, April 16, 2004

AA,

I agree, that seems to be what I have to do.  But for some reason, it just seems poor to have 81 lines of code (all the property sets, and then the call) to call an RPC.

Plus, there's not a good way to figure out at design time what the function takes without looking at the documentation, or checking the code for the ValidateData routine.

Oh well.

Dignified
Friday, April 16, 2004

If you have too many variables, and especially if you need them in the constructor, it's a bad smell indicating that you should start grouping your properties into structs/beans/whatever.

For instance, you can have a customer have a street, city, and state, or you can have him have an Address. You can have him have Address, email, and website, or you can have ContactInfo. etc.

genius
Friday, April 16, 2004

Dennis,

That's an interesting point.  I don't think it's a good idea in this case for a couple reasons:

1.)  There is no "Save Customer" RPC.  The "New Customer" RPC takes 80 parameters, but saving the customer is done through *several* different calls; one for address, one for name, one for status, etc. etc.
2.)  Each RPC call has a cost.  I don't mean like network time, overhead, and the like, I mean dollars.  Or really fractions of a penny, but that's neither here nor there.  So then a "Save Customer" generic function would need to have state values tracked to find out if information is fresh, or "dirty", and then call the appropriate updating RPC's.  I think this would add a lot of complexity, and would be easy to introduce errors, or at the very least, cost-wise inefficiency.  For this same reason, I'm not a fan of having "auto-load on ID change" functionality, because somebody could change the ID, but not necessarily want to load the new info.  If it costs, I like the programmer consuming the object to explicitly say "Call LoadCustomer()... OK, that's gonna cost me $0.0001," or what have you.

The third reason is one that could make for an interesting discussion.  I like having explicit function names to describe what's happening.  New Customer creates a new customer, and save (or update) updates the existing record.  Is it easy to read code, if you call Save Customer without an ID, it actually calls code to create a new customer?  With an ID it's totally different code.  Plus, extra parameters (not part of the object) are required for New and not Save.  Do you like this style of design?

Dignified
Friday, April 16, 2004

"For instance, you can have a customer have a street, city, and state, or you can have him have an Address."

The object that I'm creating does have those distinctions, and object properties.  The RPC needs it all one after another, and has no notion of any objects (or beans/structs).

In fact, the database gives me everything in Text, and takes everything in text.  So there's a considerable effort required in boxing the data into meaningful types in the first place.  But that's a headache for another thread.

Dignified
Friday, April 16, 2004

"I agree, that seems to be what I have to do.  But for some reason, it just seems poor to have 81 lines of code (all the property sets, and then the call) to call an RPC."

Maybe I'm missing something here, but I really don't see how you can avoid it.

You have an interface to a function that takes 80 parameters. Unless you can change that interface, you're stuck with having to pass 80 parameters. Something like

MyRemoteCustomerSvc.CreateCustomer( Param1, Param2, Param3, ..., Param80);

Does is suck? Verily so :)

The least they could've done is use a struct. It would be a lot cleaner when change hits the fan.

Study case (sort of): One of the apps I "manage" (whatever that means) has an interface with SAP. The interface accepts/returns struct-like objects. We're not the only ones to use that interface, and a few months back another parameter had to be added, because of another app. Since we don't use that parameter, we didn't have to change anything in our app, we didn't even have to rebuild - the SAP people just added another member to the struct-thingy, et voila.

I'm not tying to take the credit for this - it was done by our SAP team :)

Paulo Caetano
Friday, April 16, 2004

"1.)  There is no "Save Customer" RPC.  The "New Customer" RPC takes 80 parameters, but saving the customer is done through *several* different calls; one for address, one for name, one for status, etc. etc."

This was obviously an unknown requirement. Out of curiousity, in this object how do you deal with clients changing the individual element? If one changes the name, do they then call "SaveName"? Do you save each time a property is changed?

Dennis Forbes
Friday, April 16, 2004

"Out of curiousity, in this object how do you deal with clients changing the individual element?"

Basically how you said we'd have to deal with it.  Since the object I'm designing is going to be used internally only, the people who design the UI will have to determine how best to present the data.  So if they have a "Personal Information" screen, they will need to keep track of which data is dirty and clean, and call all the appropriate methods, all of them, if necessary.  At the same time, we'll have the ability to expose function level interfaces, for things like web services.  They could have a "Change Address" screen, a "Change Name" screen, etc. etc.

I don't think that would be a good UI, but the flexibility is there, in case that's what the programmer wants.  And who can predict the whims of marketing?  Certainly not I. :)

Dignified
Friday, April 16, 2004

"1.)  There is no "Save Customer" RPC.  The "New Customer" RPC takes 80 parameters, but saving the customer is done through *several* different calls; one for address, one for name, one for status, etc. etc."

I recommend that YOU unify this under a single interface.  Rather than propagate this poor interface to the next level.  I recommend going with a single Save Customer which does the multiple RPC calls necessary.

"So then a "Save Customer" generic function would need to have state values tracked to find out if information is fresh, or "dirty", and then call the appropriate updating RPC's."

I don't think this adds alot of complexity.  Essentially you are encapsulating all this weird RPC stuff under a single interface.  The users of this interface don't need to know that there are all these RPC calls being made.  They simply set the values and call save.  Nothing could be simpler.

Rather than setting "dirty" flags, I'd have a complete copy of the data.  I'd then compare each field with the backup copy.  And if the field has changed, I'd make the appropriate RPC call.

Furthermore, I don't think there would be any cost-wise inefficiency in this.  The Save() call is going to cost no matter what.  Your internal work to determine what has changed and what hasn't is the optimization to reduce the number of RPC calls.  This keeps all this work in one place.

"For this same reason, I'm not a fan of having "auto-load on ID change" functionality, because somebody could change the ID, but not necessarily want to load the new info."

I don't like this either.  But you're never going to want to change the ID of existing record.  I leave this field as read-only and the only way to change it is to call something like LoadCustomer(ID).

"Is it easy to read code, if you call Save Customer without an ID, it actually calls code to create a new customer?  With an ID it's totally different code."

In most cases, the code is nearly identical.  For example, if this were backed onto SQL.  The only difference would be that it's an INSERT or an UPDATE query.

"Plus, extra parameters (not part of the object) are required for New and not Save."

I think the point is that there isn't any extra parameters between new and save!

"Do you like this style of design?"

Yes.  Basically how this is most beneficial is when you are using the same screen for editing as in creating.  When editing an existing record you pass a loaded object to the screen and it fills in the screen.  When you click [Save] the save method is called on the object (updating the record).  When creating a new record, you pass an empty object to the screen.  When you click [Save] the save method is again called but this time a new record is created.  Code reuse at it's finest.

Later,

Almost Anonymous
Friday, April 16, 2004

"I recommend that YOU unify this under a single interface."

In concept I think this sounds like a good idea.  But would I still implement the atomic level updates as public functions though?  It could be that someone could code a batch app that only changes names, or some other contrived example. :)  Would it still make sense to only have the "Save Customer?"  If not, I could have an ID, and the new name, and I could just call the method "UpdateCustomerName".  With the all-encompassing "SaveCustomer" scheme, I would have to call "LoadCustomer(ID as long)", and then update the property and call save.  2 transactions versus 1.  And they all have a cost.

"Rather than setting "dirty" flags, I'd have a complete copy of the data.  I'd then compare each field with the backup copy."

I don't agree with this.  Much of the data is held in strings, and it's not like there's only one app utilizing this on the server at one time.  I'd be effectively doubling the size of the object, plus introducing O(n) algorithms on every string type to compare.  At least with a BitArray, I could get by with much less data storage (I'd still have to execute the O(n) strlen on the property sets if I'd want to save transaction cost.)

"Furthermore, I don't think there would be any cost-wise inefficiency in this."

I agree, if done properly, there's no more cost-wise inefficiency.  My worry, like most things, is that I'll make an error somewhere in the implementation, or worse, somebody extending the class will implement the logic incorrectly, causing such an inefficiency.  Perhaps I have a trust issue. :)

"In most cases, the code is nearly identical.  For example, if this were backed onto SQL."

Right, but this code is not identical.  Unless you are saying that I should implement a "private" save customer method as you suggested above, and then the public save customer that everyone uses, automatically determines (based on ID) whether or not it is CreateCustomer or SaveCustomer.  In which case, it would be easy to implement, but for some reason I still prefer the explicit New versus Save.  I just think it's easier for a newcomer to the code to grasp.

I hope you don't think I'm being argumentative.  The different approaches interest me greatly.

Dignified
Friday, April 16, 2004

As I alluded to in my original message, you should be abstracting and hiding petty implementation details that the client has no need for.

Honestly with the limited information that I've seen regarding this, my solution would be an object for each distinct set of properties (Address, PhoneNumber, etc), each with a dirty flag, containment into one master object (Person), again with a dirty flag for its own properties, and a save automatically cascades to contained objects.

Dennis Forbes
Friday, April 16, 2004

"It could be that someone could code a batch app that only changes names, or some other contrived example. :)"

For contrived examples, I recommend implementing static functions as necessary (but not ahead of time).  You example of UpdateCustomerName(ID, Name) is perfect.  Just drop that static member in and document as an optimization.  Remember, don't optimize prematurely.

"At least with a BitArray, I could get by with much less data storage (I'd still have to execute the O(n) strlen on the property sets if I'd want to save transaction cost.)"

My thought is that if you have a screen your code probably looks like this:

// Save customer details
Customer.Name = Screen.Textbox.Value;
Customer.Address = Screen.Textbox.Address;
...
Customer.Save()

Now this will set all your bitflags to true even if the user didn't actually change the name or address on the screen.  Now you could be more careful in tracking changes:

If (Screen.Textbox.IsChanged()) Customer.Name = Screen.Textbox.Value;

If that IsChanged() is something that's easily implemented in your environment then go for the bitflag method.  If not, the screen compare is going to save you RPC calls (see below) at the expense of memory (cheap) and a few string compares (also cheap).

"My worry, like most things, is that I'll make an error somewhere in the implementation, or worse, somebody extending the class will implement the logic incorrectly, causing such an inefficiency."

The problem is that you don't trust your own code but you trust anyone elses.  If I use your code in a screen then I'm going to do something like this:

// Save Everything
Customer.Name = Screen.Textbox.Value;
Customer.Address = Screen.Textbox.Address;
...
Customer.SaveName();
Customer.SaveAddress();
...
Customer.SaveSomethingElse();

Now you're expecting me, the implementer, to do the right thing.  *I* have to all the work and I have to do it *everywhere* I use the Customer object.  Further I have to know *WHY* I have to be careful.  You're throwing encapsulation and code reuse out the window.

"Right, but this code is not identical.  Unless you are saying that I should implement a "private" save customer method as you suggested above, and then the public save customer that everyone uses"

Basically, See this Pseudocode:

function Save() {
  if (ID is NULL) CreateCustomer()
  else UpdateCustomer(ID)
}

Although passing the ID to UpdateCustomer is not necessary in this example (since it's a class member) but this gives the additional option of making both of these functions public as well.

"I hope you don't think I'm being argumentative.  The different approaches interest me greatly."

No worries.  I'm pretty argumentative myself.  I think you're worried about making things too complex in your class.  But this complexity exists -- it's just up to you to decide WHERE you want it to be.  My opinion is that it should all be contained in the class.

Almost Anonymous
Friday, April 16, 2004

BTW, Dennis's idea is a really good one...  split it up in separate classes (one for each corresponding RPC call w/ dirty flag for each).  Allow the customer object to be treated as a single item; with Load/Save or Load/Create/Save functions.  But then also allow each sub-object to treated individually as well, with corresponding Load/Create/Save functions.

This gives you maximum flexibility.

Almost Anonymous
Friday, April 16, 2004

I think everyone is trying to make the same point here, so I'll try again :)

You have some very bad smells in your code. Huge number of parameters is one. Huge number of getters and setters is another (more on that later). Customer.Save* is probably the worst.

The cause of these is because your classes are not very Object-Oriented. The reason for that is because fundamentally it's hard to implement a truely OO O-R mapping. It requires a lot of effort to do right, and it's much easier to write the APIs to follow the table structures rather than business logic.

However, if you just do the simplest thing possible, the result is that the library is hard to use, and the code that uses it will be error-prone (because you leave it easy for the client to end up in an inconsistent state, and make it hard to keep it consistent).

Think about it from a Object Oriented point of view. What does it mean for the Customer to Save its Address?

Anyway, normally I wouldn't recommend Dr. Evil's articles, as they are too exteme, but I think in this case they might be just what the Doctor ordered ;)

http://www.javaworld.com/javaworld/jw-09-2003/jw-0905-toolbox.html

And just in case that wasn't enough:
http://c2.com/cgi/wiki?AccessorsAreEvil

genius
Friday, April 16, 2004

The getter/setter link was provided in a prior discussion (I believe it had to do with accessors, but I can't find it in the search), and at that time it was GUI specific, and the disclaimer by the author that getters and setters reveal too much implementation detail was absurd (as the only class that would fit the author's ideal model would have no methods, properties, fields, or interfaces, as all of the above reveal implementation details -- the class is implementing something).

Of course I feel that most arbitrary "X is evil!" discussions are just that: Arbitrary. Someone diluting the language to promote their own personal mission, however absurd. "Case statements deemed evil - story at 11"

Dennis Forbes
Friday, April 16, 2004

Thanks Dennis...  I was going to point out the absurdity of that as well. 

Perhaps, we shouldn't use the term "object" or "class" to describe this situation.  I'm thinking we should just call them "smart structs" and then everybody will agree that it needs getters and setters (or public fields).

Almost Anonymous
Friday, April 16, 2004

*  Recent Topics

*  Fog Creek Home