Thursday, August 30, 2007

Why document high-level information architecture

Why are we documenting higher-level architecture? Isn't the high level too vague to do anything with? Are these high-level exercises ever going to be successful?

Well, I disagree: it is not a waste of time, and this is why:
- it is a method of communicating (mainly with users), so it should be simple and use business language. If you can easily pinpoint the weaknesses and bottlenecks in your architecture, then it is easier to get approval for improvements.
- it is a way of documenting the high-level context, which helps position the system, so it should not contain detailed content. A high-level position makes it easier to find a system's relations to other systems and helps avoid overlaps or duplication.
- it is a way of helping later stages of development, or later projects, to understand the system and its context quickly.
- it is a way of supporting portfolio management: business users are usually very functionality oriented, so there is insufficient appetite for more fundamental integration issues. High-level pictures help make the case for these.

These benefits also point to the main requirements for architecture documentation: it should be short, without detail, and easy and quick to access.

So the end result may look easy, but actually developing it is quite hard. It is important to focus on the key messages ... and that's still an art (one rarely covered by CASE tools).

Friday, August 24, 2007

Documenting information architecture ...

I am struggling to find a good way of documenting information architecture. At the lowest level of technical meta data there are different methods available (physical data models, entity relationship diagrams, CRUD matrices ...), but as you go up in abstraction level you enter the world of 'architects', where PowerPoint and Visio are still king and queen. I have not seen a common format or common tool for documenting the high level. The only common symbol is the one for databases (it looks like a flat drum); for the rest it is arrows and boxes that can mean anything.


So my question is whether we can agree on symbols for some of the key elements:
- types of data stores (e.g. a corporate store should look different from a project / application store)
- interfaces (point to point) or middleware (hub and spoke)
- applications & utilities - with differences for data retrieval and data capture applications


The second thing is that I have not seen a definitive list of what to document in terms of higher-level data architecture. I can think of the following:
- the high level overview of the main data stores and the key data transported via interfaces (this can be done e.g. as a context diagram per business area or key business process) - this overview should reflect the different lifecycle phases of the information.
- the main components of the application / integration / data stack (in a different view) per system (or set of systems)
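To make the idea of agreed symbols a bit more concrete, here is a minimal sketch, assuming Graphviz DOT as the drawing format. The element kinds, the style chosen for each kind and the names `NODE_STYLES`, `Element`, `Flow` and `context_diagram` are all hypothetical; the point is only that each element type gets exactly one symbol (corporate and project stores are both drums, but drawn differently) and that each interface arrow is labelled with the key data it carries, giving one context diagram per business area.

```python
from dataclasses import dataclass

# Hypothetical symbol vocabulary: one Graphviz node style per element type.
# Corporate and project stores are both drums (cylinders) but drawn differently.
NODE_STYLES = {
    "corporate_store": "shape=cylinder, style=bold",
    "project_store": "shape=cylinder, style=dashed",
    "capture_app": "shape=box",
    "retrieval_app": "shape=box, style=rounded",
    "middleware": "shape=hexagon",
}


@dataclass(frozen=True)
class Element:
    name: str
    kind: str  # must be a key of NODE_STYLES


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    data: str  # the key data transported over this interface


def context_diagram(title: str, elements: list[Element], flows: list[Flow]) -> str:
    """Render one context diagram (e.g. per business area) as Graphviz DOT text.

    Names are not escaped, so they must not contain double quotes.
    """
    lines = [f'digraph "{title}" {{', "  rankdir=LR;"]
    for e in elements:
        # An unknown kind raises KeyError: only the agreed symbols are allowed.
        lines.append(f'  "{e.name}" [{NODE_STYLES[e.kind]}];')
    for f in flows:
        lines.append(f'  "{f.source}" -> "{f.target}" [label="{f.data}"];')
    lines.append("}")
    return "\n".join(lines)


dot = context_diagram(
    "Customer",
    [
        Element("CRM capture", "capture_app"),
        Element("Integration hub", "middleware"),
        Element("Customer master", "corporate_store"),
        Element("Sales reporting", "retrieval_app"),
    ],
    [
        Flow("CRM capture", "Integration hub", "customer"),
        Flow("Integration hub", "Customer master", "customer"),
        Flow("Integration hub", "Sales reporting", "customer"),
    ],
)
print(dot)
```

If Graphviz is installed, saving the output as customer.dot and running `dot -Tpng customer.dot -o customer.png` draws the picture. Requires Python 3.9+.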

Friday, August 17, 2007

Data Architecture principles

One of the Data Management principles is about applying proper architectural principles. It is a bit of a meta-principle, and it actually covers a whole category of things. So what is it? Quite a lot ... Let me try to list a few ideas (again a list, this time of 11!):
  • Data should be created in one place (a defined master source)
  • Data should be distributed via a hub and spoke architecture (so avoid spaghetti)
  • When using middleware adopt a single integration architecture (don't stack middleware on top of middleware ...)
  • When communicating with external parties, adopt XML as a standard, and as far as possible only one XML dialect
  • Split data capture from data retrieval
  • Only store data together when it needs to be managed together (i.e. when you need to see it together, you can also do that in the application)
  • Store as much meta data as reasonably possible with the data
  • Ensure audit trails are implemented on data
  • Split project information from corporate versions of information
  • Don't use transaction systems to store your history (archived information). Store this in some sort of data warehouse
  • Only retain data when there is a value to retain it - but check this across the data lifecycle!
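The hub and spoke principle is essentially an arithmetic argument. A rough sketch, assuming the worst case where every system exchanges data with every other system and each pair needs one interface: point-to-point then needs n(n-1)/2 interfaces, while hub and spoke needs one per system. The function names are illustrative only.

```python
def point_to_point_interfaces(systems: int) -> int:
    """Worst case: every system has its own interface to every other system."""
    return systems * (systems - 1) // 2


def hub_and_spoke_interfaces(systems: int) -> int:
    """Every system has exactly one interface: its connection to the hub."""
    return systems


for n in (5, 10, 20, 50):
    print(f"{n:>3} systems: {point_to_point_interfaces(n):>5} point-to-point "
          f"vs {hub_and_spoke_interfaces(n):>3} hub and spoke interfaces")
```

With 10 systems that is 45 interfaces against 10, and with 50 systems 1,225 against 50. The hub only wins from four systems onwards, which is one reason the principle matters for enterprise architectures rather than small ones.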

Thursday, August 16, 2007

Data Management Principles

Usually IT departments are very infrastructure or application focused. Information or data is seldom the first priority. To establish data management as one of the core disciplines in the organisation, you need to publish some key principles; it is a bit like a constitution for a country.


Establishing these principles and then repeating them ad nauseam is the only way to make people aware of their importance. The principles should be simple and should be part of the overall IT architectural principles (as mandatory).


Here are (what I think) the main points:

1. Data is an Asset (so it should be managed like an asset)

2. Data should have an Owner

3. Data should have known Quality rules

4. Data should have a guaranteed integrity across the Lifecycle

5. Adopt international standards where possible

6. Classify each element in the right security class (is there any confidential data?)

7. Ensure data is accessible to whoever needs it (open up by default!)

8. Ensure meta data is in place

9. Adopt principles for data architecture (more on this next time!)

10. Ensure internal & external information is treated with the same diligence

I may have missed one or two (but 10 was such a nice number), but if you can get your projects to adhere to these, then you're pretty mature in terms of data management!

Sunday, August 12, 2007

More on Meta data management

Meta data continues to fascinate me, and especially the way we continuously fail to get it to work. Intuitively I would say that it is easy to convince people about the value of meta data. Would you buy a jar or tin in the supermarket without a label? Would you take a medicine without a prescription?

That makes sense, and of course we have meta data all over the place in the realm of IT. The only issue is that we do not manage it. So why is this? I have a few possible explanations, and hopefully some directions for solutions as well.

Explanations for failure:

1) when we try to set an overall meta data management strategy, we try to analyse the full meta data problem and usually end up creating very complex models. These complex models lead to complex systems that nobody wants to maintain

2) when we deliver applications, we focus on delivering the applications and not on their contribution to the greater good

3) there are usually no people with an overview beyond single projects

4) there are many competing standards, classification structures, etc. Who decides what's leading?

5) Meta data has no visible pay-back. So why invest?


So how to make it work?

1) Meta data only works when it is used in a business process where people have an interest in maintaining it. If meta data is collected and adds no value after collection, then it is useless. So collect just enough, just in time: only collect data definitions if there is a guarantee that they will be used. That guarantee only exists when the usage is part of a common business process.

2) Distinguish between technical meta data and business meta data. Quite often people get confused about meta data because they mix the two types. Technical meta data relates to things within systems (table definitions, CRUD matrices, data mappings in an interface, etc.), while business meta data is understood by 'simple' business people: things like data ownership, quality rules, and the positioning of the data in a business process. The two types of meta data require a completely different way of managing. Technical meta data is maintained within the application delivery process and portfolio management, while business meta data is usually a higher-level issue (e.g. business process redesign). Business meta data is usually collected earlier in the process of bringing in new solutions than technical meta data.

3) Publish business meta data as part of something that is used by everybody, e.g. an online help or a glossary. Earlier I advocated the use of wikis for this, and I still think this is an excellent option. Don't publish technical meta data, but ensure it is part of the technical documentation (build in an architectural check in the project for assurance). It is best when technical meta data is part of the solution (e.g. part of the XML file, part of the ETL definitions, etc.)

4) Keep it simple
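The split between business and technical meta data can be sketched with two deliberately separate record types. This is a minimal illustration under my own assumptions: the class and function names, the fields, and the example term and columns are hypothetical. Only the business side is rendered as a glossary (for a wiki page or online help), while the technical side stays with the solution and is only used for the architectural check: does every technical item point at a known business term?

```python
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    """Understood by business people; published, e.g. as a wiki glossary."""
    term: str
    definition: str
    owner: str
    quality_rules: list[str] = field(default_factory=list)


@dataclass
class TechnicalMetadata:
    """Lives with the solution (table, ETL, interface); not published."""
    system: str
    table: str
    column: str
    business_term: str  # link back to a glossary term


def glossary(terms: list[BusinessMetadata]) -> str:
    """Render only the business meta data, sorted by term."""
    lines = []
    for t in sorted(terms, key=lambda t: t.term.lower()):
        lines.append(f"{t.term}: {t.definition} (owner: {t.owner})")
        lines.extend(f"  - quality rule: {rule}" for rule in t.quality_rules)
    return "\n".join(lines)


def unmapped(columns: list[TechnicalMetadata],
             terms: list[BusinessMetadata]) -> list[TechnicalMetadata]:
    """Architectural check: technical items that point at no known business term."""
    known = {t.term for t in terms}
    return [c for c in columns if c.business_term not in known]


terms = [
    BusinessMetadata("Customer", "A party that has bought at least one product.",
                     owner="Sales", quality_rules=["must have a country of residence"]),
]
columns = [
    TechnicalMetadata("CRM", "CUST", "CUST_ID", "Customer"),
    TechnicalMetadata("CRM", "CUST", "SEG_CD", "Segment"),
]
print(glossary(terms))
print([c.column for c in unmapped(columns, terms)])
```

Here the project check would flag SEG_CD, because nobody has defined "Segment" in business terms yet. Keeping the model this small is in the spirit of point 4. Requires Python 3.9+.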

Friday, August 03, 2007

Unique identifiers

Holy wars and worse have been fought about unique identifiers for data items, and the last word on it has not been written either. The purist solution is the meaningless numeric sequence, and this is usually the recommendation. So why are other solutions adopted more often than this simple recommendation?

Well, people get confused in the discussion on unique identification of things, because if you uniquely identify something, wouldn't it be nice to also recognise it? Why have a meaningless number if you can also call me Evert? And this is where people make the mistake. There is a difference between human recognition of a thing or person and what systems need to do with its related records. Every system has different ways of dealing with the names of things, and therefore every recognisable identification will have to go through multiple conversions if the item is used in many systems. A neutral number does not have this problem.

The other issue is that things you can recognise can change. Take country codes as an example. You would think that a country is a pretty stable object, but Upper Volta became Burkina Faso and Burma became Myanmar, and I haven't even started on Serbia ... Every time a country changes, you have to change the identification of the object in all systems. This can lead to lots of (technical) issues.

My view is that we should allow for both - a unique (meaningless) number and a unique meaningful alias (or set of aliases). The number is the actual primary key, but the alias is what you use in practice on your screen. You can use the alias as much as you like, but when you need to convert, you don't run into technical problems! The more meaningless the identifier, the less discussion when situations change.

The unique identifier needs to be assigned when the object is created and should never change. The best way to do this is via a 'service', an independent component in your system architecture, but this only makes sense in large, complex, enterprise-wide architectures.
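A minimal sketch of the "meaningless number plus meaningful aliases" approach, assuming a simple in-memory service. `IdentifierService`, `Entity` and their methods are hypothetical names, and a real enterprise service would need persistence and concurrency control. The key is taken from a sequence at creation and never changes; renaming only changes the alias shown on screen. As a design choice, old aliases stay resolvable so that records still using the old name can be converted.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Entity:
    key: int                     # meaningless, assigned once, never changes
    display_alias: str           # what users see on screen
    aliases: set[str] = field(default_factory=set)  # every name ever used


class IdentifierService:
    """Hypothetical central service that hands out meaningless identifiers."""

    def __init__(self) -> None:
        self._sequence = itertools.count(1)
        self._entities: dict[int, Entity] = {}
        self._by_alias: dict[str, int] = {}

    def create(self, alias: str) -> int:
        if alias in self._by_alias:
            raise ValueError(f"alias already in use: {alias!r}")
        key = next(self._sequence)
        self._entities[key] = Entity(key, alias, {alias})
        self._by_alias[alias] = key
        return key

    def rename(self, key: int, new_alias: str) -> None:
        """Change what people see; the primary key stays the same."""
        holder = self._by_alias.get(new_alias)
        if holder is not None and holder != key:
            raise ValueError(f"alias already in use: {new_alias!r}")
        entity = self._entities[key]
        entity.aliases.add(new_alias)
        entity.display_alias = new_alias
        self._by_alias[new_alias] = key

    def resolve(self, alias: str) -> int:
        return self._by_alias[alias]

    def display(self, key: int) -> str:
        return self._entities[key].display_alias


countries = IdentifierService()
bf = countries.create("Upper Volta")
countries.rename(bf, "Burkina Faso")
print(bf, countries.display(bf), countries.resolve("Upper Volta") == bf)
```

When Upper Volta becomes Burkina Faso, no key changes in any system; only the display alias does. Requires Python 3.9+.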

Wednesday, August 01, 2007

Data capture vs data retrieval

One old concept of data management which is still valid today (and at the same time insufficiently adopted) is the architectural split between data capture and data retrieval. It seems a bit artificial to split datastores in this way, but in larger architectures it makes perfect sense. This is not only because of the traditional reason of tuning the retrieval database for performance (the old data warehouse concept), but mainly for a number of practical reasons. Here are some:

  • Data capture is usually a complex process with steps for QC and validation, while retrieval is read-only. This leads to different data models, security models, etc.
  • Data capture is usually for just a limited number of users with various access rights, while data retrieval requires a focus on sharing
  • Data retrieval environments are focused on information retention over time
  • Data capture environments should only exist once for a data type while data retrieval environments can exist in multiple ways. Once data is created it can be shared or replicated instantly via messaging services

Having this concept in the back of the mind whilst architecting a data environment for an enterprise (so not for small systems!) is very useful and can simplify enterprise wide solutions.
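The split can be sketched in a few lines, assuming an in-process callback as a stand-in for a real messaging service. `CaptureStore`, `RetrievalStore` and the validation rule are hypothetical. The capture side is the single place where a data type is created and where QC happens; it then publishes copies to any number of retrieval stores, which are read-only to their users and keep every version over time.

```python
from types import MappingProxyType
from typing import Callable, Mapping

Validator = tuple[Callable[[dict], bool], str]  # (check, error message)
Subscriber = Callable[[str, dict], None]        # receives (key, record)


class CaptureStore:
    """Write side: the single place where this data type is created and checked."""

    def __init__(self, validators: list[Validator], subscribers: list[Subscriber]):
        self._validators = validators
        self._subscribers = subscribers
        self._current: dict[str, dict] = {}

    def submit(self, key: str, record: dict) -> None:
        # Validation / QC happens here, before anything is shared.
        errors = [message for check, message in self._validators if not check(record)]
        if errors:
            raise ValueError("; ".join(errors))
        self._current[key] = dict(record)
        # Publish a copy to every retrieval store (stand-in for messaging).
        for deliver in self._subscribers:
            deliver(key, dict(record))


class RetrievalStore:
    """Read side: a shareable, read-only copy that keeps every version over time."""

    def __init__(self) -> None:
        self._history: dict[str, list[dict]] = {}

    def receive(self, key: str, record: dict) -> None:
        self._history.setdefault(key, []).append(record)

    def __contains__(self, key: str) -> bool:
        return key in self._history

    def latest(self, key: str) -> Mapping:
        return MappingProxyType(self._history[key][-1])

    def history(self, key: str) -> tuple[Mapping, ...]:
        return tuple(MappingProxyType(r) for r in self._history[key])


reporting = RetrievalStore()
archive = RetrievalStore()
capture = CaptureStore(
    validators=[(lambda r: bool(r.get("name")), "name is required")],
    subscribers=[reporting.receive, archive.receive],
)
capture.submit("C-1", {"name": "Acme"})
capture.submit("C-1", {"name": "Acme Ltd"})
print(reporting.latest("C-1")["name"], len(archive.history("C-1")))
```

Rejected records never reach the retrieval side, and the returned mappings are read-only views. Requires Python 3.9+.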
