Information streams are strongly connected flows of data through a data architecture. In a standard data warehouse architecture, whose vertical layers extract, transform and load data, such an information stream flows horizontally across the layers. Typically, a stream incorporates the components of a major business process such as sales, order handling, inventory or accounting. The figure below displays this:
By creating strongly coherent and loosely coupled building blocks, we provide a flexible and scalable architectural solution. By coherent we mean that the components within a building block are strongly related to each other. By loosely coupled we mean that the building blocks are as independent of each other as possible. We can take this a step further and put the building blocks that belong to a specific business process together in an information stream. The motivation for doing this is easy to find:
- By decoupling unrelated building blocks and joining the ones that are related, we can run one specific information stream at any time, independently of the others
- It is easier to divide operational responsibility to make sure that data comes through on a daily basis
- Responsibility can be linked across divisions to the owners of the business process in question
- Further development and maintenance can be directly linked to that information stream. It is hence easy to predict consequences, schedule downtime and inform the owner(s) of that information stream.
- It makes the data architecture easier to introduce, because the bundles in each layer of the information stream are tagged with the same business process name
How do we construct an information stream? The objects themselves need to be strongly related to the business process in question. Once these are identified and created, we tie all the objects, or building blocks, together for each layer, making sure to use the same name throughout. Imagine having three layers for the ETL process: raw data (RA), staging (ST) and datamart (DM). For the sales process, this would give us the following bundles: RA_Sales, ST_Sales and DM_Sales. A master sales workflow should run these in sequence. In a data warehouse environment, you would also need to include conformed dimensions in such a run. The conformed dimensions should, however, be constructed in the same way: RA_Conformed, ST_Conformed and DM_Conformed.
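The naming convention and the master workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`bundle_names`, `run_stream`, `run_master`) and the `run_bundle` callback are hypothetical and stand in for whatever ETL tool actually executes a bundle. The ordering, where the conformed dimensions run to completion before the process-specific stream, is also an assumption; the text only states that they must be included in the run.

```python
from typing import Callable, List

# Layer prefixes from the text, listed in execution order.
LAYERS = ["RA", "ST", "DM"]


def bundle_names(process: str) -> List[str]:
    """Build the bundle names for one business process, one per layer."""
    return [f"{layer}_{process}" for layer in LAYERS]


def run_stream(process: str, run_bundle: Callable[[str], None]) -> List[str]:
    """Run the bundles of a single information stream in layer order."""
    executed = []
    for name in bundle_names(process):
        run_bundle(name)  # hypothetical hook into the real ETL tool
        executed.append(name)
    return executed


def run_master(process: str, run_bundle: Callable[[str], None]) -> List[str]:
    """Hypothetical master workflow for one business process.

    Assumption: the conformed dimensions run to completion before the
    process-specific stream; the text only says they must be included.
    """
    return run_stream("Conformed", run_bundle) + run_stream(process, run_bundle)


if __name__ == "__main__":
    run_master("Sales", lambda name: print(f"running {name}"))
```

Because every stream is derived from the same layer list and a single process name, adding a stream for another business process (for example inventory) requires no new orchestration code, and each stream can still be run on its own through `run_stream`.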
Information streams will typically make up eighty percent of a data warehouse architecture, and a bit less of broader data architectures that span many sources and uses of data. They will, however, make the main flows visible and easily identifiable for the people working closely with them. They will also make the data architecture easier for business users to relate to, and responsibility can be assigned at several levels in the organization. This makes it easier both to communicate necessary changes and to set new improvement targets.