Fog Creek Software
Discussion Board




Embarassing OOD / performance question

Ok, apparently I've been living in a cave, and I don't grok how to handle this trivial OOD issue that's been bugging me for a while. Frankly, this is embarassing.  RTFM is an acceptable response, BTW, so long as you provide a link ;>

Let's say, oversimplifying incredibly, I have two data tables name Employee and Company, respectively. Obviously, the employee table has a foreign key back to the company table.  I want to create two classes, also called Employee and Company, as well as a collection or array class EmployeeList.

For the sake of this exercise, the language is c#.

So,

Company.ID
Company.Name
Company.StreetAddress
Company.City
...


Employee.ID
Employee.FirstName
Employee.LastName
Employee.Phone
...


If I understand correctly, I should create a method "GetEmployee" or property "Employee" that would return a reference to the company the employee works for.  The Company class will take care of querying the database to populate the instance with the appropriate data.  Ok, no sweat.

But, if I want to instantiate EmployeeList, say for a paginated list display, I'm going to instantiate maybe 50 of these Employee instances by means of passing in a data row from a DataTable.  One DB read, many instantiations, still cool.

However, let’s say I also want to show the company name and ID (and nothing else about the company) each employee works for as part of the display. Do I hit the database 50 additional times simply to get the company object? Alternatively, do I just add the company name and ID as a read only property to the Employee class? If the latter, am I doning a "Bad Thing" in OOD?  If the former, ouch.  If some other way, I’m open to suggestions.

Thanks. 

Too Embarassed to Say
Friday, January 30, 2004

Aaaargh!

"GetCompany" - Not "GetEmployee"

Too Embarassed to Say
Friday, January 30, 2004

If the optimal design for the orchestration between databases and objects was so clearly defined then it'd be a data-object generation wizard, so your pseudo humility seems oddly misplaced. As it is there are a lot of different methods, and a lot of different justifications for each, so you're not going to find the one true way.

Get yourself an OO database and save yourself the mental anguish of not walking in lockstep with the BigMac chefs.

Dennis Forbes
Friday, January 30, 2004

http://www.intersystems.com/cache/index.html

Dennis Forbes
Friday, January 30, 2004

RTFPOEAA

(Read the fabulous Patterns Of Enterprise Application Architecture by Martin Fowler)

You will realize that there is not one true way, and that it is not very difficult to know that you shouldn't do stupid things like accesing the database when you could use a cache.

I suppose that there is one truth: the best way is not the easiest.

.NET Developer
Friday, January 30, 2004

Why go through all this trouble to be an OO fashion victim, when almost any DBMS will do all the work for you?

select company.id, company.name, employee.firstname, employee.lastname
fromp company,employee
where company.id = employee.companyId
order by company.id, employee.lastname;

If your programming environment is a little brain-dead, you could also implement this query as a view in the DBMS, thus treating it as a single table.

HeWhoMustBeConfused
Friday, January 30, 2004

You've run across what I consider the single most important truism for people who use objects to access database records:

Object models suck for reporting.

Brad Wilson (dotnetguy.techieswithcats.com)
Saturday, January 31, 2004

HeWhoMustBeConfused,

You are retarded.
He's asking how to handle this in the Object Model.  We all know how to do joins.

TooEmbarrased, this is an *excellent* question, imo.    This is an issue thats run into quite often.  Sorry, there are no short answer that I've seen.  My two suggestions (which you basically already have talked about, nothing too insightful here)  If very few other parts of your application would access the company entity object, I would have a "mass loader" utility class or something that would load all the companies and their respective employees using the oh so brilliant "join" query that HeWhoMustBeConfused posted, but get all the data from each of the company rows and populate their seperate objects.  Unless i'm mistaken, theres almost no performance difference when accessing more columns in a table. 

  On the other hand, if your "company entity" is being accessed throughout many places in the application, (such as when users are logged in, it may load their company data into an object), I would say don't worry about the mass loading, and just have each of the entities load themselves when need be.  The reason being, if they're already cached, the performance hit you'd take by running multiple queries should and could be negated by the fact that a good ammount will already be loaded and chaced.  Hope this helps.

vince
Saturday, January 31, 2004

The dichotomy is this.

Those familiar with databases (and the particular database they're working with) see nothing wrong with keeping data the same shape it is in the tables and result queries.

Those developing  where there is no underlying knowledge of the database need some kind of abstraction and as most people are stuck with a C++ OO model of abstraction they naturally  want it in those kinds of classes.

But do you really need an Employee object?  Do you need to create a class that has all of the methods and properties that you'd ever want to do with an Employee object?

Would you want the same kinds of access to a amend the address property as to access their salary level, or who they report to, or their disciplinary record?

No, naturally not. 

So perhaps the object you need to manipulate is not the Employee abstraction, but the Form (or  Report), that you want to manipulate that data with.  The Employee Form.

Now the members of that form are the fields and methods on that form and they are bound to the data that relates to the Employee.  Whether that data is directly a field on a cursor or its a member of a collection is irrelevant.  Getting and putting the data are abstractive methods that happen as the form opens, is navigated and then as the data is saved by the user or the form is closed.

Now you might say, but this abstraction layer between the actual Employee data and the UI is the Employee object.  But it isn't, its how all the Forms (and Reports), are constructed. 

In an abstract sense the Form doesn't care about the structure of the data the way in which the data is retrieved and updated is going to be the same for all data sources. 

In large scale systems I'd probably favour a further layer of abstraction where the data was instantiated as a cursor by calling a stored procedure, and similarly when it was updated.

In native Visual Foxpro, I use Codemine to give me the Data Management objects that do all that for me.  I populate the Data Environment at the time of design (and I also build it at runtime for specific purposes), and then layout the form the way I need, to do whatever it is I need to do.

Everything else is pretty much taken care of apart from some wrinkles as in navigating the same dataset you have buffered without committing that data.

In straight line performance there isn't going to be a lot of difference the same latencies are going to exist.  But in terms of robustness and stability the less direct reliance is placed upon the specifics of the database the better.

Simon Lucy
Saturday, January 31, 2004

Oh, and additionally.  If it isn't a form based process but some kind of batch or helper process its exactly the same.

Simon Lucy
Saturday, January 31, 2004

We did it by defining specialist methods on our data access objects that basically bypass the objects themselves - and simply return a recordset for display purposes.  If the user needs to do anything with the contents of that recordset the objects in question are instantiated from the ids within the recordset

zealot
Sunday, February 01, 2004

I ran into this and it puzzled me for quite some time. I asked lots of people but noone had any good methods of doing this.

However, I then did an OO design course and 3 types of class were described:
Entity (e.g. company, employee)
Control (classes that 'do' things)
Boundary (classes that interface with the enviroment, e.g. a mouse or a keypad)

I wanted to find a solution that did not compromise OO. Using adhoc recordsets or creating psuedo objects to represent certain queries is actually breaking encapsulation, I think unacceptably.

The solution I came to was to have the entity classes as you described, e.g. company, employee and then to have control classes that are used to persist those entity classes to a permanent data store, like an RDBMS.

For example one could have a single control class called "DBManager" which contains, in VB terms, 'shared' methods and is non-instatiatable (is this called 'virtual' in C++/Java?). The actual DB connection details would be held in a separate entity class (e.g. 'DBInfo') which is passed to each method.

The DBManager is the interface to the database. It has methods such as GetCompanies, GetCompany, GetEmployee, GetEmployeesForCompany etc.

This, I think, is the most OO friendly method.

You would expect that the Employee class would have a reference to a Company object but rather than the Employee containing a property of type Company I would have the Employee contain a property of CompanyID which is simply the key of the Company (otherwise when you get an Employee object, you also have to get a Company object to put in the Employee object)

If you then want to do certain queries, like get employees (matching criteria) and the company details of the companies that those employees work for, then you can set up a method on the DBManager class, e.g. GetEmployeesAndCompanies(byval EmployeeFilter, byref Companies, byref Employees)

The DBManager then issues the appropriate JOIN query and returns a collection of employee objects and a collection of company objects. All from one DB read.

The client can then run through the employee collection and get the company for each employee from the company collection.

The benefit of this approach also means that if all 50 employees worked for the same company you would have 50 employees and 1 company returned, which is the least possible data you could have.

Of course if Company has 100 properties and you only want two of them then you may want to avoid populating a full company class. But you can do this by creating a brief company class which the full company class inherits from.

I think this approach is the most encapsulation-friendly. Company and Employee are well encapsulated and the database is encapsulated by the DBManager class.

You could argue that the DBManager has some information about Employee and Company, and it may well do if it is to understand how to query for them, but I think this is a more minor compromise. Where you have cross-object queries it would be very hard to encapsulate this perfectly.

But at least the entities are perfectly encapsulated themselves.

Happy to be challenged but I've written a large app using this approach and it's worked extremely well.

Gwyn
Sunday, February 01, 2004

Incidentally Dennis, I looked into Cache. Whilst it looked good the biggest issue for me was the integration with non-Cache technology (works fine in its own Cache server pages), for example .NET.

I also found a new bug whilst playing aound with it on the introduction course so I wasn't convinced in the maturity of the product!

Gwyn
Sunday, February 01, 2004

*  Recent Topics

*  Fog Creek Home