Monday, October 18, 2004

Making distributed systems easier

A major source of complexity in implementing distributed systems today is the persistence of data. The core of any system is the ability to process information, and hence the need to persist that information is key.

Unfortunately, the majority of mechanisms for storing and processing data assume that they are the center of the universe, that moving data in and out is unimportant, and hence that it should be done in bulk.

Even modern solutions such as XML databases (at least the ones that I have worked with) require reading all data into an in-memory object and then storing it. Many years ago I worked on a data analysis system that was based on the idea of infinite object streams. This made everyone think about what the minimum object was, and design distributed processing objects naturally.
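To make the contrast concrete, here is a rough sketch of the stream style using Python's standard `xml.etree.ElementTree.iterparse`, which hands over one element at a time instead of building the whole document in memory. This is only an illustration of the idea, not the system I worked on and not an XML store: the function name `stream_records` and the `readings`/`reading` sample data are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag):
    """Yield each element named `tag` as soon as it has been fully parsed.

    `source` is a filename or a binary file-like object. Processed
    elements are discarded so memory use stays roughly constant.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()  # drop children that have already been handled


# Hypothetical sample data; a real source could be a socket or a very large file.
sample = io.BytesIO(
    b"<readings>"
    b"<reading id='1'>20.5</reading>"
    b"<reading id='2'>21.0</reading>"
    b"<reading id='3'>19.5</reading>"
    b"</readings>"
)

total = 0.0
count = 0
for record in stream_records(sample, "reading"):
    total += float(record.text)
    count += 1

print(count, total)
```

The consumer only ever sees one "minimum object" (here a single `reading`) at a time, which is the property that made distributing the processing feel natural.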

Short of writing a new stream-based XML store myself, anyone got any ideas?

1 comment:

Steve Ketchpel said...

You might check out work going on at Stanford on STREAM. It's a general-purpose stream manager and query language. I suspect they've done some work using XML as the stored objects, which should be a straightforward extension.

Looking forward to hearing you at the RDVP Seminar in 10 days' time. -SPK (RDVP '05)