The long explanation of where I am coming from (more elementary in nature, for the benefit of the non-scientists out there). The organization I work for is responsible for some of the operation of the Hubble Space Telescope, and I am currently involved in a project to implement a new "command center system". Part of this system is the collection, processing and reporting of spacecraft telemetry. The telemetry is a stream of measurements taken on board the telescope: temperatures of various parts, voltages and currents of various circuits, positions of relays, modes of each piece of equipment, and so on. Over the life of the telescope there have been something like 8000 different measurements reported in the telemetry (the set changes every time the space shuttle astronauts work on the telescope).

The telemetry data is absolutely vital to the operation of the telescope because it lets the spacecraft engineers know what is happening on board and, even more importantly, when things go wrong (which happens fairly often), it lets them answer questions such as:

1. What went wrong?
2. Why did it go wrong?
3. What can be done to prevent this from happening again?
4. What subtle clues can we watch for to find out when something is about to go wrong?
5. Can we safely operate the telescope while this problem is occurring?
6. What can be done to work around the problem for now?

Often, answering these questions requires access to telemetry going all the way back to the launch of the telescope (1990). Unfortunately, the amount of data is very large: processed, it comes to about 6 GB a day. To make it possible to access the telemetry quickly, NASA decided to use three techniques to compress the data. The compressed data (about 120 GB per year when loaded into the database, 2+ TB over the life of the telescope) is loaded into a high performance data warehouse (made by Red Brick). The data is loaded in time order by measurement, which makes it very easy to find out quickly what the value of a given measurement was at a given time. Problems arise when trying to create complex reports that involve several measurements over long periods of time, and this is partly because of how the data is compressed.

Measurements fall into two basic categories: analog (voltage, current, temperature, etc.) and discrete (relay positions, equipment modes, etc.). All discrete measurements, and the slowly changing analog measurements, are stored in the database as "change only" records giving a start time, a stop time and the value of the measurement during that interval. Fast changing analog measurements are averaged over short intervals (typically one second to one minute). The time periods of the records stored in the warehouse are irregular and unpredictable (even for averaged measurements, because of communications disruptions).

I'm including some RealActualHubbleTelemetryData (based on the old adage that a picture is worth a thousand words). Looking at the data, you will notice that it has three characteristics:

1. Each measurement/mnemonic runs on its own time base (although all mnemonics share a common time base with a granularity of 25 milliseconds).
2. Averaged mnemonics mostly use fixed-length intervals, but interval lengths for change only data can vary widely.
3. The data contains gaps. Some gaps apply to ''every'' mnemonic; some apply only to a particular mnemonic.
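To make the storage model concrete, here is a minimal sketch (in Python) of one way such an interval record could be represented and looked up at a point in time. The TelemetryRecord name and field layout are my own invention for illustration, not the actual warehouse schema; gaps show up simply as times that no record covers.

 from dataclasses import dataclass
 from typing import Optional

 @dataclass
 class TelemetryRecord:
     mnemonic: str    # e.g. "CBAT1CUR"
     start: float     # interval start, in seconds (the underlying time base has 25 ms granularity)
     stop: float      # interval stop
     value: float     # reported value (change only) or interval average (fast analog)

 def value_at(records, t) -> Optional[float]:
     """Value of one mnemonic at time t, or None if t falls in a gap.
     Assumes `records` holds a single mnemonic's records, sorted by start time."""
     for r in records:
         if r.start <= t < r.stop:
             return r.value
         if r.start > t:   # sorted by start, so no later record can cover t
             break
     return None

Change-only and averaged records both fit this one shape; the only difference is how the value field was produced.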
To create a sophisticated report involving several measurements over a long period, it is essential to manage the time dimension properly. A couple of simple report examples:

1. Report the values of CBAT1CUR, CBAT1PRS and CSS3A every 30 seconds.
2. Report the values of CBAT1PRS and CSS3A when CBAT1CUR is greater than 0 amps.

The data for any given measurement, over any given time period, can be retrieved quickly and easily in time order. The difficulty lies in taking records of differing lengths with different time offsets (and don't forget those gaps) and synchronizing the values of several measurements in time to produce the required output (a rough sketch of this kind of synchronization appears at the end of this page).

KathieClark sent me the following very astute remark, which you might find elucidating:

----

I have experience with data warehousing, so I understand your description of the StarSchema and the various alternate storage methods. I suspect that when you say 'synchronize', what you are trying to get at is data that is valid on the same time basis. It sounds to me (just guessing) as if you have data that is grouped into disparate time ranges and discrete points in time, and you are trying to somehow bring them into alignment. Change data is also difficult to work with. Those artificial ranges are one of the problems of the StarSchema-only approach to data warehousing. When you chunk data into ranges and play the various games needed to get the data into an additive format, you damage the data for any view other than the one the designers of the warehouse thought was most relevant at the time. And if you have thrown out your raw data (or it is inaccessible), it can be impossible to get there from here; it sounds like you are experiencing that to some degree.

----

''Sounds like a job for a GeometricDatabase''

----

Sounds like you are working with time series data. For many purposes, time series data can be extrapolated or summarized from any periodicity to any other. Also, measuring and storing some statistical characteristics of a time series as it is consolidated might allow some probability reporting even when the detail at the smallest granularity is no longer available. Consider a self-defining time series object that supports variable formats, granularities, and conversions. ("Consider" is all I have done, but the approach seemed very attractive when I was working with demand time series data in an inventory/forecasting system.)

----

See also: TimeSynchronousProcessing StarSchema
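----

Returning to the two report examples above: here is a rough sketch of what the synchronization amounts to once the per-mnemonic records are in hand. It continues the hypothetical TelemetryRecord/value_at sketch from earlier on this page; the function names and signatures are invented for illustration, and a production version would make a single merge-walk over the time-ordered streams coming out of the warehouse rather than rescanning for every output row.

 def sample_report(records_by_mnemonic, start, stop, step):
     """Report example 1: the value of every mnemonic on a fixed grid
     (e.g. every 30 seconds).  Yields (time, {mnemonic: value or None});
     None marks a gap."""
     t = start
     while t <= stop:
         yield t, {m: value_at(recs, t) for m, recs in records_by_mnemonic.items()}
         t += step

 def conditional_report(records_by_mnemonic, driver, threshold):
     """Report example 2: the other mnemonics whenever the driver mnemonic
     exceeds a threshold (e.g. CBAT1CUR > 0 amps).  The driver's own record
     boundaries define the reporting rows."""
     for r in records_by_mnemonic[driver]:
         if r.value > threshold:
             others = {m: value_at(recs, r.start)
                       for m, recs in records_by_mnemonic.items() if m != driver}
             yield r.start, r.stop, r.value, others

 # e.g. sample_report({"CBAT1CUR": cur, "CBAT1PRS": prs, "CSS3A": css}, t0, t1, 30.0)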