Wednesday, November 28, 2007

Six Sigma

If you talk to Japanese manufacturing companies (say Toyota or Toshiba) and you talk data quality, then soon you'll talk 'Six Sigma'. The term Six Sigma is derived from the statistical symbol for the standard deviation (sigma) and describes the likelihood of making an error. If you reach Six Sigma, you reach a level of quality where you hardly make any mistakes. This is particularly important in manufacturing.
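To make that likelihood concrete, here is a rough sketch of the arithmetic. It assumes the conventional 1.5-sigma long-term process shift used in Six Sigma practice and uses scipy only to compute the normal tail probability:

```python
from scipy.stats import norm

def defects_per_million(sigma_level, shift=1.5):
    """Long-term defect rate for a given sigma level, using the
    conventional 1.5-sigma process shift from Six Sigma practice."""
    return norm.sf(sigma_level - shift) * 1_000_000

for level in (3, 4, 5, 6):
    print(f"{level} sigma: ~{defects_per_million(level):,.1f} defects per million")
# at 6 sigma this works out to roughly 3.4 defects per million opportunities
```

That jump from tens of thousands of defects at 3 sigma to a handful at 6 sigma is why the term has become shorthand for near-perfect quality.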

What strikes me most when talking Six Sigma is that the quality process is everywhere; it is ingrained in the organisation, from shop-floor workers up to senior managers. And that is, I think, the main message of Six Sigma: you can only reach high levels of quality (and therefore also information quality) if it is ingrained in everything we do.

Some vendors of data quality tools are starting to use the term Six Sigma for their toolkits, and that makes me sceptical, since it is not the tool that does the job but the complete culture change. So if you see managers buying a Six Sigma tool and not doing anything else, then don't trust them!


Sunday, November 25, 2007

Making Controls Work

Data architects constantly need to balance easy-to-use functionality against building in some level of control. Users do not want to be bothered with storing data in the right place, or adding all sorts of meta data, but data managers know that without this it is not easy to re-use the data elsewhere, or later in the life cycle of the data.

So one of the holy grails for data architects is to find the right balance between easy functionality and controls. My view is that controls should focus on the 'must-haves', and beyond that we should see how much we can already populate through some smart algorithms. At the end of the day we all know that if things are automated they will happen (like a search index), whereas when you rely on the user it is more likely that meta data stays incomplete and inconsistent, unless there is a lot of control. The right balance is even easier to find if some of the user interaction can be supported by work done by automated processes: if the meta data can be pre-populated and users only need to click OK (and take out the glaring mistakes), then this process may work.
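As a rough sketch of that pre-population idea (the field names and sources below are purely illustrative), a small routine could derive default meta data from the file itself and present it for confirmation:

```python
import os
from datetime import datetime, timezone

def suggest_metadata(path):
    """Derive default meta data from the file itself, so the user only
    has to confirm (or correct) the suggestions. Fields are illustrative."""
    stat = os.stat(path)
    return {
        "title": os.path.splitext(os.path.basename(path))[0].replace("_", " "),
        "file_type": os.path.splitext(path)[1].lstrip(".").lower(),
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "owner": os.environ.get("USER") or os.environ.get("USERNAME", "unknown"),
    }

# The entry form would show these values pre-filled; the user clicks OK
# or fixes the glaring mistakes before the document is stored.
```

The point is not the specific fields but the workflow: the system proposes, the user only confirms or corrects.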

Another option is to make the data quality more visible; transparent retrieval mechanisms (e.g. Spotfire or BusinessObjects) are great enablers for this. Suddenly users see that data is missing and may even grasp the effect of this.
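Even without those tools, a simple completeness report makes the gaps visible; the table and columns below are made up for illustration:

```python
import pandas as pd

# A made-up extract of a master data table with gaps in it
df = pd.DataFrame({
    "well_name": ["A-01", "A-02", "B-01", "B-02"],
    "latitude":  [52.1, None, 51.9, None],
    "longitude": [4.3, 4.4, None, None],
    "operator":  ["X", "X", None, "Y"],
})

# Percentage of missing values per column: a crude but very visible
# data quality report that any reporting tool can render as a chart
missing_pct = df.isna().mean().mul(100).round(1)
print(missing_pct)
```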


Thursday, November 22, 2007

Integration is the Key

Things like data management and architecture are quite often still elusive subjects for senior managers. We produce pretty pictures, we make comments that are hard to disagree with, but when people ask us the 'so what' question we usually have a complex story about roles and responsibilities, compliance, implementation of difficult 4-dimensional data models and other intangible things. What we usually miss is the elevator speech that sells our story.


I have been reading through previous posts and I see a lot of things that fall in the first category I mentioned: the difficult and intangible overhead stuff. Therefore it is time now to focus on the one-liners ... I even think we can reduce it to a single word: integration. Probably the main reason why we do data management is because we want to integrate something:
  • Integrate business process flows
  • Integrate information from various processes into a single management view
  • Integrate information over time - so we can compare today with yesterday
  • Integrate the inside with the outside
  • Integrate data with documents, maps, ...

OK, there are a few other things: risk management, compliance, etc. (pretty important stuff), but the key thing that people understand is the word integration. So probably in the future I am going to drop the term data management in discussions. When people ask why I propose a certain measure, I will say: because you need to integrate ...


Monday, November 19, 2007

Location, location, location

Another aspect of master reference data is its spatial representation. Spatial information is becoming more and more important (think Google Earth), and therefore the proper management of location, shape, orientation, etc. becomes very important. Using a spatial location is a very intuitive way of finding information, and after all the freebies from Google Earth and the upsurge of TomTom-type devices, spatial information is now firmly on our 'map' of data management.
A starting point is to have a standard coordinate reference system (CRS) agreed in the company. Each country (or in some cases even region) may have a different CRS, and therefore it is important to choose a common standard; the most common choices are WGS84 or UTM. Using different coordinate systems for the same point can lead to mismatches of hundreds of metres.
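A small sketch of what such a conversion looks like, assuming the pyproj library and using the Dutch national grid (RD New, EPSG:28992) purely as an example of a local CRS:

```python
from pyproj import Transformer

# Illustrative only: the same point expressed in the Dutch national grid
# (RD New, EPSG:28992) and in WGS84 (EPSG:4326). Mixing the two systems
# without converting would put the point somewhere else entirely.
transformer = Transformer.from_crs("EPSG:28992", "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(155000, 463000)
print(f"WGS84: lon={lon:.5f}, lat={lat:.5f}")  # roughly 5.39 E, 52.16 N (Amersfoort)
```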
The second thing is related to the 'spatialisation' process. Usually the spatial representation of an object is stored in a spatially enabled Oracle database, but quite often this is a different database than the one holding the original source of the data. In a way you can compare the spatial information with an index: it helps you to find the object in a defined space. But when the spatialisation is not standardised, or not owned by anybody, then there will be quality issues with the data very soon.
Having a spatial index in place centrally is useful, but if you have a large environment this may not be practical. Too much centralisation can lead to security complexities, synchronisation issues and performance problems. The best thing is to have a strategy to store the spatial information together with the master reference data, whilst using a standardised spatialisation service. Then you can re-use the security of the master data source and render the information any way you like: in a table, on a graph and also on a map!


Sunday, November 11, 2007

Master reference data management

Implementing a service oriented architecture (SOA) also drives the move to set up master reference data management services.

I am sure that there are no real best practices around at the moment, so this is a bit like exploring undiscovered territory. The big step is that you start to de-couple data from functionality. In other words, the management of data becomes a business process in itself.

So how do we get this done? Some people start straight away with developing the best possible data model, but I think that this is not a good idea. Don't develop a data model that would meet everybody's requirements from the start, but follow the steps as laid out earlier in http://infomgr.blogspot.com/2007/06/data-management-fundamentals.html, and it is best to start small. So in short do the following:
  • Pin down the scope & keep this as small as possible to start
  • Define the ownership, including the operational responsibilities
  • Define the process / data flow, including the operational roles
  • Set up the data model & keep this as flexible as possible
  • Finally - set up the 'service'

The service itself should contain the following:

  • The canonical data model + optionally the container for the data (and keep the scope limited to start!)
  • A messaging service; to transport the ins/upd/del transactions
  • A routing service; to move the data to the right place
  • A key mapping service; to know how to translate keys, if required (a minimal sketch follows below)
  • And optionally a user interface for power users
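For the key mapping service mentioned above, here is a toy in-memory sketch; the class and system names are made up for illustration and this is not a reference to any particular product:

```python
class KeyMappingService:
    """Toy in-memory key mapping service; names and design are made up
    for illustration, not taken from any specific product."""

    def __init__(self):
        # (source system, source key) -> canonical master key
        self._to_master = {}

    def register(self, system, source_key, master_key):
        self._to_master[(system, source_key)] = master_key

    def to_master(self, system, source_key):
        return self._to_master.get((system, source_key))

    def to_source(self, system, master_key):
        # reverse lookup: which key does this system use for the master record?
        for (sys_name, src_key), m_key in self._to_master.items():
            if sys_name == system and m_key == master_key:
                return src_key
        return None

# Example: two systems each use their own key for the same customer
svc = KeyMappingService()
svc.register("SAP", "0001234", "CUST-42")
svc.register("CRM", "A-98765", "CUST-42")
print(svc.to_master("CRM", "A-98765"))   # -> CUST-42
print(svc.to_source("SAP", "CUST-42"))   # -> 0001234
```

In a real implementation this mapping would of course live in the master data store itself, not in memory, but the responsibility stays the same: one agreed place to translate keys.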

This all looks a bit daunting, but by starting small it becomes a feasible feat. Let's see if I can convince my colleagues of this wisdom as well!


Friday, November 09, 2007

Canonical Data Model

The main challenge continues to be the data model; in SOA speak: the canonical data model. For some reason it is very hard for most IT people to deal with data models. Obviously they are not as sexy as the functionality in the application. Unfortunately it still requires quite a bit of skill to develop one ... and especially an intermediate data model needs to fit a lot of requirements. It needs to...
  • Support the superset of requirements of the related systems - and therefore it is important to model cardinalities, identifiers, attributes, etc. in the most flexible way. Cardinalities need to be flexible (by default many to many). Identifiers need to be unique (by default meaningless numbers). And with every attribute it needs to be clear that this is an attribute and not potentially an entity ...
  • Support the reference data over time (4D), i.e. everything needs to be time stamped and no reference data gets deleted (a small sketch of this follows below) ...
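A minimal sketch of what such a time-aware canonical entity could look like; the class and field names are purely illustrative and not taken from any specific model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ReferenceRecord:
    """Illustrative only: a canonical reference entity that carries its own
    history via a meaningless surrogate key and validity timestamps."""
    surrogate_id: int                    # meaningless, unique identifier
    business_code: str                   # the code the business recognises
    name: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # open-ended until superseded

def supersede(old, new, when):
    """End-date the old version instead of deleting it; the new version
    starts where the old one stops."""
    old.valid_to = when
    new.valid_from = when
    return old, new
```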

If developers want to meet all these requirements, then they need to dust off some skills we dropped during the nineties.


Sunday, November 04, 2007

Enterprise Service Bus

The flavour of the day in integration architecture is the Enterprise Service Bus (ESB), which is in a way nothing new. The idea of integration via an intermediate model has been around for some time and has led to many disappointments. But maybe this time there is a chance to make it work, since technology has moved on.

So what's new? First of all, XML has matured into some sort of lingua franca; gradually all players in the market are starting to support the paradigms of SOAP, XML schemas, message-based services, etc., and that means momentum is building up. Second, the main difference is that everything is loosely coupled and therefore the responsibility for integration is now firmly put where it belongs: with the applications. In other words, it is not the middle layer that needs to take care of everything, and this may be a factor determining its success.
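To give a feel for what such a message-based exchange looks like, here is a made-up, minimal reference-data update message; the element names are illustrative and not taken from any particular ESB product or schema:

```python
import xml.etree.ElementTree as ET

# A made-up, minimal reference-data update message; element names are
# illustrative, not taken from any particular ESB product or schema.
msg = ET.Element("ReferenceDataUpdate", attrib={"action": "upsert"})
entity = ET.SubElement(msg, "Entity", attrib={"type": "Customer"})
ET.SubElement(entity, "MasterKey").text = "CUST-42"
ET.SubElement(entity, "Name").text = "Acme Exploration BV"

print(ET.tostring(msg, encoding="unicode"))
```

The bus only routes and transforms such messages; it is up to each application to publish and consume them correctly, which is exactly where the loose coupling puts the responsibility.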


Friday, November 02, 2007

Incremental Architecture

Implementing Enterprise Architecture is not an insignificant task, and when carried out as a separate activity it usually has a high likelihood of failure. This is due to its size and its lack of clear owners. Owners are usually more interested in tangible pieces of functionality and less in the plumbing.

Luckily there is also another option and that's called incremental architecture. It works as follows:
  • Develop a high-level understanding of the direction for the portfolio / enterprise (so you cannot work without an overarching blueprint, but the ambition level can vary)

  • Map the existing program to this high level picture - and you will see that projects overlap with elements that need to be developed to make the blueprint or vision a reality

  • Expand the scope of the existing planned projects and include the common elements needed in the future architecture as deliverables to ...
This gives the opportunity to develop the integration gradually and has a higher likelihood of success, since it is part of the existing ongoing program. The only downside is that it is very hard to achieve a real step change ...
