Fog Creek Software
Discussion Board





Joel on Software

Offline Mode Data Synchronization Best Approach

Hi to all gurus out there ^_^
I need some expert advice on my scenario.
I'm developing an email application (actually a module of our main application suite). It functions like Outlook 2003.
We have 2 databases: the main database (SQL Server) for the application, and a local database (MSDE) residing on the local machine (or LAN).
The cycle goes like this :
1. A user of the application has an associated email account (POP3/SMTP).
2. In the email module, once he clicks SEND/RECEIVE, I query the mail server for incoming mails for this user (based on the email accounts he has set up).
3. Once I receive the emails (the email itself as plain text and attachments as bytes), I save them to both the local database (the cache) and the main database (I'll explain the reason for sending them to the main database later).
4. I delete the mails from the mail server, then display the user's mails from the cache database.

Outlook does the same thing with emails. Once it has fetched the emails from the mail server, it saves them in its own database file.
But the problem with Outlook comes when you need to format your hard drive or change computers. Because your email is saved on your hard drive, you cannot view your emails on other PCs. With our application, the user can view his email as long as he has access to our application. That's why we save the emails to the main database. If he ever formats his hard drive or logs in on another PC, ALL the emails are still present in the main database. That way we maintain synchronization.

Say the user performs the following steps :
- User 1 logs in on Computer A. Once logged in, the emails are retrieved from the cache (the local database, MSDE) and displayed in the Outlook 2003-like interface.
- He issues a SEND/RECEIVE command. New emails (let's say Email1, Email2, Email3) are fetched from the mail server, then saved to both databases.
- Because the emails are saved to both databases, we are sure that the main database is always up to date (we only perform reads on the emails).

Now my dilemma is this :
- User 1 for some reason doesn't or can't use his own PC or laptop and instead uses another PC : Computer B.
- The main database is always up to date, so Email1, Email2 and Email3 are guaranteed to be present there, but not in the local database on the other PC, Computer B (they are only present on the machine where he issued the SEND/RECEIVE command : Computer A).
- When User 1 logs in on Computer B, he won't be able to see Email1, Email2 and Email3, because at startup the program only reads emails from the cache (the local database, MSDE).

Now the proposed approach was :
1. At startup, read User 1's emails from both the local database and the main database, then merge the data (we're using DataSets). That way, User 1 always sees his emails regardless of which computer he uses.
However, there are some drawbacks to this approach :
          * Reading from both databases is time- and resource-consuming, especially at startup.
          * Merging the data doesn't save it to the local database; I have to do that manually. Say the local database has ( Email01, Email02, Email03 ) but the main database has ( Email01, Email02, Email03, Email1, Email2, Email3 ) -> the last 3 were retrieved on Computer A, so they're not present in Computer B's cache/local database. So I would have something like :
        for every mail not in ( Email01, Email02, Email03 )
          // adding the row sets its RowState to Added, so the adapter inserts it automatically
          datasetForCache.Tables("MailHeader").Rows.Add(mail)
        // after the loop, push all Added rows to the local database
        cacheAdapter.Update(datasetForCache)
( NOTE : the above code is only pseudocode )
2. Add an additional field to every table, like MachineName, which I would get using Environment.MachineName. Once I receive an email I stamp it with the MachineName, then save it. That way I know which mails were retrieved on other computers.
Say I have a table in the database like MailHeader :
________________________________________________
Subject    |  From ........    |  MachineName
________________________________________________
Email01    |  m@icx.com.ph     |  MARKDURAN
Email02    |  x@ibm.com        |  MARKDURAN
Email03    |  x@ibm.com        |  MARKDURAN
Email1     |  b@yahoo.com      |  REMYCRUZ
Email2     |  b@yahoo.com      |  REMYCRUZ
Email3     |  b@yahoo.com      |  REMYCRUZ

Computer B's machine name is MARKDURAN and Computer A's machine name is REMYCRUZ. At startup, I would get the data from the cache, then query the main database for records that don't match this computer's ( Computer B ) machine name ( MARKDURAN ), then perform the pseudocode from the first approach. However, I have no guarantee that the computer name won't change. Is there some other ID I can get that is specific to this PC, one that wouldn't change even if he reformats it?

3. -- For you guys to Add. Need you guys for this ^_^. Thanks in Advance!!!
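
A minimal sketch of the cache backfill from approach 1, written in Python with SQLite standing in for both MSDE and SQL Server. The `mail_id` and `owner` columns and the helper names `make_db` and `backfill_cache` are assumptions for illustration; the post does not say how mails are keyed. Rather than merging two full DataSets, it compares ids and copies only the rows missing from the cache:

```python
import sqlite3

# Hypothetical schema: the post only shows Subject / From / MachineName.
SCHEMA = """CREATE TABLE MailHeader (
    mail_id TEXT PRIMARY KEY,
    owner   TEXT NOT NULL,
    subject TEXT,
    sender  TEXT)"""


def make_db(rows=()):
    """Create an in-memory database with the assumed MailHeader table."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO MailHeader VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db


def backfill_cache(main, cache, user):
    """Copy the user's mails that are in the main database but not in the cache.

    Returns the number of rows copied.
    """
    cached_ids = {
        row[0]
        for row in cache.execute(
            "SELECT mail_id FROM MailHeader WHERE owner = ?", (user,)
        )
    }
    missing = [
        row
        for row in main.execute(
            "SELECT mail_id, owner, subject, sender FROM MailHeader WHERE owner = ?",
            (user,),
        )
        if row[0] not in cached_ids
    ]
    cache.executemany("INSERT INTO MailHeader VALUES (?, ?, ?, ?)", missing)
    cache.commit()
    return len(missing)
```

With a stable key like this, the MachineName column from approach 2 may not be needed at all, since "which mails are missing here" falls out of the id comparison regardless of where each mail was first retrieved.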


       

mark duran
Monday, May 09, 2005

http://www.microsoft.com/exchange/default.mspx

rjc
Monday, May 09, 2005

http://www.microsoft.com/exchange/default.mspx

^_^ nice joke. I said that I'm developing my own custom email client application, right? C'mon.

mark duran
Monday, May 09, 2005

anyone here?

mark duran
Tuesday, May 10, 2005

This is really just a question of ensuring that the 'cache' contains up-to-date data before presenting the data (in this case emails) from the cache to the user.

You could do that by having a date/time stamp when the central database was 'last updated' and another for when the cache was 'last refreshed'. You'd probably want those date/time stamps to be per-user rather than per-machine. The 'last updated' is in the central database and the 'last refreshed' is in the local cache.

If the date/time stamp doesn't match then you'd need to refresh the cache from the central database before displaying the cached data to the user. If the 'last refreshed' has never been set then you'd also need to refresh the cache from the central database. This deals with a user roaming to a different machine or re-building his own machine. Obviously, when the cache has been refreshed, you would synchronise the 'last refreshed' with the 'last updated'. If the date/time stamps do match (or if the central database is unavailable) then you'd simply display the cached data to the user.

Alternatives to date/time stamps would be unique values such as GUIDs.
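
The check described above can be sketched roughly as follows. This is only an illustration in Python; the function name `cache_needs_refresh` and its parameters are made up here, and how the two stamps are stored is left open:

```python
from datetime import datetime
from typing import Optional


def cache_needs_refresh(
    last_refreshed: Optional[datetime],
    last_updated: Optional[datetime],
    central_available: bool = True,
) -> bool:
    """Decide whether to refresh the local cache before showing mail.

    last_refreshed: per-user stamp stored in the local cache (None if never set).
    last_updated:   per-user stamp stored in the central database.
    """
    if not central_available:
        # Central database unreachable: just show what the cache has.
        return False
    if last_refreshed is None:
        # New machine or rebuilt machine: the cache has never been filled.
        return True
    # Any mismatch means the central database changed since the last refresh.
    return last_refreshed != last_updated
```

After a successful refresh, the caller would store the central 'last updated' value as the cache's 'last refreshed' value so the two match again. The same comparison works unchanged if the stamps are GUIDs or version counters instead of date/times.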

Mike Green
Tuesday, May 10, 2005

I think that you need to rethink the logistics of this whole approach. Do people really need every single email at every single computer they use or will they want to be more selective? Wouldn't it just be easier to keep all email on the server and let the user manually pull down any old emails that they would want to see that weren't originally viewed on their current computer?

Maybe I'm not understanding your design...

matt
Tuesday, May 10, 2005

Some more food for thought to help understand where I'm coming from:

1) When is an email considered too old to require synchronization? I have 2GB of old emails on the server. I get a new laptop. I connect to the email server and it starts a 5 hour process of resyncing every email I've ever read when I actually only need the last month locally. Anything beyond that and I can always perform a search on the archive data on the server.

2) I use someone else's machine. I don't want to leave a copy of my emails on their machine when I'm done. Is there an easy way to purge emails from their machine? Do I really need synchronization if I just want to see if I have new mail?

3) I am using a public computer. Will there be a simple web interface into the server so that I can check emails without the need to download anything?

4) Keeping track of every machine's state will be problematic since there is really no way to uniquely identify a given machine.

5) People rarely use more than one machine for email. This is probably due more to limitations in today's email systems than anything else. However, is the need to have email on multiple machines really worth all of this hassle if very few people will actually end up using it? You appear to be designing for something that isn't really the norm. Having a web interface for the times when you are not at your own machine could be much easier and more flexible.
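
To make point 1 concrete, here is a rough Python sketch of limiting the initial sync to a recent window. The 30-day default, the `received` field, and the function name `mails_to_sync` are all assumptions for illustration, not anything from the application being discussed:

```python
from datetime import datetime, timedelta


def mails_to_sync(headers, now, max_age_days=30):
    """Return only the mail headers received within the last max_age_days.

    headers: iterable of dicts with a 'received' datetime (assumed shape).
    Older mail stays on the server and can be fetched or searched on demand.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [h for h in headers if h["received"] >= cutoff]
```

For example, with `now` set to May 10, 2005, a mail received on May 1 would be synced to a new laptop while one from January would stay on the server only.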

Just my two cents.

matt
Tuesday, May 10, 2005

>> Alternatives to date/time stamps would be unique values such as GUIDs.
- I've settled on using the serial no. of the hard drive (the one where the application is installed); I really need to set it up on a per ( user and machine ) basis.

>> 1) When is an email considered too old to require synchronization? I have 2GB of old emails on the server. I get a new laptop. I connect to the email server and it starts a 5 hour process of resyncing every email I've ever read when I actually only need the last month locally. Anything beyond that and I can always perform a search on the archive data on the server.
- I would ask the user whether he/she wants to fetch emails that were received on other PCs.
  The deal was that the user won't have to delete a single email.

>> 2) I use someone else's machine. I don't want to leave a copy of my emails on their machine when I'm done. Is there an easy way to purge emails from their machine? Do I really need synchronization if I just want to see if I have new mail?
  - already answered this in no. 1

>> 3) I am using a public computer. Will there be a simple web interface into the server so that I can check emails without the need to download anything?
  - The program is designed to be used by a company and its employees. Surely in their setup every application/PC would be inside the company premises and could be controlled. If they have employees who roam around, say the marketing team, those employees would have their own laptops. Basically, none of them would want to use a public PC, since they would have to install the application there.

>> 4) Keeping track of every machine's state will be problematic since there is really no way to uniquely identify a given machine.
    - using the serial no. of the hard drive where the application resides would be effective.

>> 5) People rarely use more than one machine for email. This is probably due more to limitations in today's email systems than anything else. However, is the need to have email on multiple machines really worth all of this hassle if very few people will actually end up using it? You appear to be designing for something that isn't really the norm. Having a web interface for the times when you are not at your own machine could be much easier and more flexible.
  - it's a WinForms application; we don't have a web interface yet, though we have plans.
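
As a possible alternative to a hardware serial number for the per ( user and machine ) key, here is a hedged Python sketch that generates a GUID once per installation and stores it next to the local cache. This is a swapped-in technique, not the hard drive serial approach chosen above, and the names `installation_id` and `sync_key` are hypothetical:

```python
import uuid
from pathlib import Path


def installation_id(path: Path) -> str:
    """Return the id stored at path, creating a new GUID on first use."""
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    new_id = str(uuid.uuid4())
    path.write_text(new_id, encoding="utf-8")
    return new_id


def sync_key(user: str, path: Path) -> tuple:
    """Key for tracking sync state on a per (user, installation) basis."""
    return (user, installation_id(path))
```

The id does not survive a reformat, but neither does the local cache it describes, so a fresh id after a reformat simply means "this cache is empty, fetch from the main database".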

mark duran
Tuesday, May 10, 2005

How is this thing you're building better than, different from, or more strategic than just using Outlook with Exchange?

I'm obviously missing the context, but I don't see what you're trying to accomplish by creating 'Yet Another Email Client'.

rjc
Wednesday, May 11, 2005
