Sunday, June 24, 2007

Unlocking value

Data Management is a bit of a boring subject to many, but when done well it can lead to massive benefits that you cannot even imagine at the start. An example is the success of the Internet. At the end of the day it is actually a data management success. Standards bodies such as the IETF and W3C have been very successful in setting and enforcing a few standards: how to address machines in a network (IP addresses), how to exchange information (the IP protocol), how to name domains and replicate those names across the network (DNS), and how to publish and link information within these domains (HTML). Everything else after that is now history. Nobody could have imagined the growth of the Internet over the last decade and a half.

What it proves is that we need to agree on certain basics and at the same time allow freedom in everything else. Within data management we see people spending enormous amounts of time trying to come up with the perfect data model, or having endless discussions about how to define a unique identifier, but to me these discussions are usually a bit academic. What really counts is that a decision is taken and a standard has been agreed. It may not be a perfect standard (like the QWERTY keyboard), but if it works it will have benefits beyond what it was designed for.

Think about what would be possible if XML were adopted as widely as HTML. Unfortunately, agreement on formatting is much easier than agreement on content. Still, I can see enormous steps being taken in the Web 2.0 world, where information from lots of different sources is being mashed together (e.g. Google News). Imagine what you would be able to do if information within a company could be mashed together automatically, because the information is easily recognised. So it is still worth the effort to spend some time on metadata, because it may lead to some unplanned side-effects beyond imagination!
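
To make the mash-up idea a bit more concrete, here is a minimal sketch, assuming two hypothetical XML sources that happen to share an identifier (all element names and data are invented for illustration):

    import xml.etree.ElementTree as ET

    # Two hypothetical sources that both describe wells and share an identifier.
    source_a = """<wells>
      <well id="W-001"><name>Alpha</name><status>producing</status></well>
      <well id="W-002"><name>Bravo</name><status>shut-in</status></well>
    </wells>"""

    source_b = """<inspections>
      <inspection well="W-001" date="2007-05-12" result="ok"/>
      <inspection well="W-002" date="2007-06-01" result="follow-up"/>
    </inspections>"""

    # Index the first source by its identifier.
    wells = {w.get("id"): w.findtext("name") for w in ET.fromstring(source_a)}

    # 'Mash up' the second source with the first on the shared identifier.
    for inspection in ET.fromstring(source_b):
        well_id = inspection.get("well")
        print(wells.get(well_id, "unknown"), inspection.get("date"), inspection.get("result"))

The point is not the code but the precondition: both sources agreed on how to identify a well, and the rest follows almost for free.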

Labels: , ,

Saturday, June 23, 2007

Data Management Fundamentals

As said before, Data Management is about control: control of quality, control of the replication of information. Data Management is very much part of the plumbing of an Information Architecture and therefore people tend to give it little attention. Remember - at the end of the day users want to use data in applications, and it is only the applications that they see. So the easier it is for them to get the right data, the better. They don't care how we get it to them, as long as it is easy to get and easy to understand. If it is good data then that's a bonus, but if bad data is easier to get than good data, then it is more likely that users will choose the former. If you use Google, would you browse to page 25 to get the right result out of your search? No.

So what does this tell us? What it tells me is that Data Managers have to provide a platform that is like electricity: we flip a switch and we have light. So users should either get their data served to them automatically (i.e. the data is already replicated to the application and there is nothing to worry about), or should be able to get it very easily: in one place, in the right format and with the right quality - without ambiguity.

In many administrative environments this is already a reality - integrated back office systems have been rolled out over the last decade and people cannot even remember anymore that they once had to go through different systems to get something ordered and then an invoice paid. Note that some companies still struggle with their master data, but that is more a flaw of SAP than a lack of integration and standardisation.

If we talk about the more complex industries like manufacturing, construction or oil & gas, then we see that we are clearly not there yet. And this is where Data Management really becomes interesting. Quite often data is held in many systems and replication is ad hoc, manual, and clearly not perfect. This is where a lot of progress can be made by applying ten simple architectural measures:

1) Agree the scope: the master reference data, i.e. the data shared across systems

2) Agree who decides about the data in scope

3) Agree the standard for the master reference data

4) Agree the master source for this reference data

5) Agree basic quality rules

6) Agree how the information is replicated (format, frequency)

7) If more than one version of the same data should exist, agree how to deal with versioning

8) If users have to shop for the data themselves, then ensure its retrieval is in one place

9) Automate as much as possible, and keep it simple

10) Ensure there is a data helpdesk in place


None of this is rocket science - but in most large organisations it is a struggle!
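
To give an idea of how small the start can be, here is a minimal sketch of measure 5, basic quality rules on a master reference data set (the fields, identifiers and rules are made up for illustration):

    # Hypothetical master reference data: material records shared across systems.
    materials = [
        {"id": "MAT-001", "description": "Gate valve 6in", "unit": "EA"},
        {"id": "MAT-002", "description": "", "unit": "EA"},                   # incomplete
        {"id": "MAT-001", "description": "Gate valve 6 inch", "unit": "EA"},  # duplicate id
    ]

    def check_quality(records):
        """Return simple completeness and consistency findings."""
        findings = []
        seen = set()
        for rec in records:
            if not rec["description"]:
                findings.append((rec["id"], "missing description"))
            if rec["id"] in seen:
                findings.append((rec["id"], "duplicate identifier"))
            seen.add(rec["id"])
        return findings

    for rec_id, issue in check_quality(materials):
        print(rec_id, issue)

Even checks as trivial as these, run automatically and reported back to the data owner (measure 2), tend to drive the quality up over time.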

Labels: ,

Friday, June 22, 2007

The limits of Freedom: Manage your Data

The points made about supporting maximum freedom for the users are mostly true for Document Management, but when we talk about data we should be careful about allowing too much freedom. Data is by definition a more structured type of information than a document, and structure implies a requirement for control.

In a well-functioning data management environment we have data sets that are used in more than one environment, and therefore it is of paramount importance that this master reference data is of the best possible quality and adheres to clear and common definitions. Even better - it would be great to have a number of tags with every data item, telling you what it is and what has happened to it.

This is actually in line with some of the Web 2.0 thinking. For document management Web 2.0 looks like reducing the controls, but for data management this is not the case - here it is about adding more context. And this is where the ideas of the semantic web come into play. The semantic web is about putting information into context so that the meaning of information (the semantics) can be understood by machines (search engines, integration engines, etc.). Integration should be a problem that you solve once, and by establishing open standards this should be achievable.

Please note - I am not advocating adding a lot of overhead to managing data, merely a number of measures that should be taken into consideration for every data item that is shared across systems. The basics are as follows (a small sketch follows the list):

  • First of all: know what should be shared - don't put measures in place on items that are unique to one system (unless that data is critical)
  • Ensure integration of metadata with data - using standard XML schemas is a very good approach
  • Put in place some quality indicators - automated measures checking completeness and consistency of the data - this can be part of the metadata
  • Also add other tags like time stamps and the userids of people who have done something with the data (an audit trail)
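
As promised, a minimal sketch of what such contextual metadata around a shared data item could look like. The field names, the quality checks and the use of a plain Python structure (rather than a proper XML schema) are all just illustrative assumptions to keep the example short:

    import hashlib
    import json
    from datetime import datetime, timezone

    def wrap_with_metadata(item, userid):
        """Wrap a shared data item with simple quality indicators and an audit trail entry."""
        quality = {
            # Hypothetical completeness check: no empty fields.
            "complete": all(value not in (None, "") for value in item.values()),
            "field_count": len(item),
        }
        return {
            "data": item,
            "metadata": {
                "quality": quality,
                "checksum": hashlib.sha256(json.dumps(item, sort_keys=True).encode()).hexdigest(),
                "audit_trail": [
                    {"userid": userid, "action": "created",
                     "timestamp": datetime.now(timezone.utc).isoformat()},
                ],
            },
        }

    record = wrap_with_metadata({"customer_id": "C-1001", "name": "Acme Ltd"}, userid="jdoe")
    print(json.dumps(record, indent=2))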

A controlled data management environment is a must for making business processes work across the company and across systems. The more you ensure the data adheres to transportable standards and is available with contextual metadata, the more you can integrate. At the end of the day data is only part of the information infrastructure; the real value is in what you do with it.

It can also help reduce the controls on document management. If documents can be automatically tagged by extracting words listed in the master reference data, then that reduces the overhead of document management.
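
A minimal sketch of that kind of automatic tagging, assuming the master reference data is available as simple lists of controlled terms (the categories and terms below are invented):

    # Hypothetical master reference data: controlled terms per category.
    reference_terms = {
        "project": ["Project Pearl", "Project Falcon"],
        "facility": ["Platform Alpha", "Refinery East"],
    }

    def auto_tag(document_text):
        """Tag a document with every master reference term it mentions."""
        tags = {}
        lowered = document_text.lower()
        for category, terms in reference_terms.items():
            hits = [term for term in terms if term.lower() in lowered]
            if hits:
                tags[category] = hits
        return tags

    text = "Minutes of the Project Pearl review held at Platform Alpha."
    print(auto_tag(text))  # {'project': ['Project Pearl'], 'facility': ['Platform Alpha']}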

Labels: , , ,

Monday, June 18, 2007

Freedom vs Control

Information Managers (and also IT people) have a tendency to try to get things under control. That's why we put a password on everything and that's why many systems are cumbersome to use - would you really want to add those 20 mandatory attributes to the document? Do you really need to go through those 6 steps of approval?


What we sometimes forget is that we manage information on behalf of the users. We are there for them, and not the other way around. So to really be successful as an information manager we sometimes need to challenge our own natural inclination to build in more security than needed, or more metadata than is feasible. Some of the measures I described in the item about Hoarding are therefore more than a user can accept.


One of the measures I mentioned was getting rid of shared drives. But maybe this is a typical dictatorial action, with the information management goals made more important than the user. Do we really want to do this? Recently I noticed a colleague taking this brave measure in his organisation. I fear that he will fail, since the technology offered as an alternative is not as simple as a file system ... and therefore users will look for shortcuts that will defeat the purpose of the measure. Dictatorship leads to terrorism!


I have already written a lot about Web 2.0 - how we can add a lot of value by increasing the freedom of users in terms of how they classify information, and how opening up our security model will be beneficial as well. Still, we like to have control - that's just natural behaviour - but my view is that we should not get this control by confining the users too much. We can also get this control in an easier way, via automated means:

  • We should not spend too much time setting up complex taxonomies, but should be investing in automatic tagging mechanisms based on master reference data.

  • We should not spend too much time asking people to clean up their old information, but have automatic archiving procedures based on where people store it (see the sketch after this list). This gives people the freedom to collect whatever they like.

  • We should not have complex security models, but have 'buckets' that are open and areas that are closed.
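
As a minimal sketch of such an automatic archiving procedure - the paths, the age threshold and the archive location are all hypothetical, and a real version would of course need to respect retention rules:

    import shutil
    import time
    from pathlib import Path

    TEAM_SPACE = Path("/shared/team_space")  # hypothetical collaboration area
    ARCHIVE = Path("/shared/archive")        # hypothetical archive location
    MAX_AGE_DAYS = 365                       # hypothetical age threshold

    def archive_old_files():
        """Move files untouched for a year from the team space to the archive."""
        cutoff = time.time() - MAX_AGE_DAYS * 24 * 3600
        for path in TEAM_SPACE.rglob("*"):
            if path.is_file() and path.stat().st_mtime < cutoff:
                target = ARCHIVE / path.relative_to(TEAM_SPACE)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(path), str(target))

    if __name__ == "__main__":
        archive_old_files()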

This is all technically very easy and is more about a changed mindset (behaviour) than anything else. At the same time it will help the users, since it will not put too many restrictions on them.

Labels: , , ,

Thursday, June 14, 2007

Hoarding and ILM

There is another approach to hoarding - just let it be! It is less of a problem if a few conditions are met, as I described before in some other posts. The conditions are as follows:
  • If all information is kept in an open digital format

  • If there is an open security model (i.e. only really confidential information is stored under lock and key)

  • If there is plenty of disk space (which is cheap today)

  • And if there is a powerful search engine

Then you can make it work. But this is easier said than done, since managing large volumes of information needs some sophisticated technology: supporting optimal storage, supporting fast backup and effective restore, and being able to recognise versions of the same information.
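
Recognising copies of the same information is one piece that can be automated quite cheaply. A minimal sketch, assuming documents on disk and using content hashes to spot exact duplicates (near-identical versions would need something smarter, and the path is hypothetical):

    import hashlib
    from collections import defaultdict
    from pathlib import Path

    def find_duplicates(root):
        """Group files under 'root' that have identical content, using a SHA-256 hash."""
        by_hash = defaultdict(list)
        for path in Path(root).rglob("*"):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_hash[digest].append(path)
        return [paths for paths in by_hash.values() if len(paths) > 1]

    for group in find_duplicates("/shared/team_space"):  # hypothetical location
        print("Identical copies:", ", ".join(str(p) for p in group))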

Some hardware vendors like HP and EMC are now exploring this space and are coming up with so-called ILM solutions (information lifecycle management). These are not just powerful storage servers, but 'intelligent' solutions, since they come with software that enables optimal storage. The hardware vendors see it as a way to move higher up the value chain, and companies can benefit as well, since their hardware becomes more valuable when it is managed in a more intelligent way - win-win.

It all sounds promising, but at the same time there are no silver bullets, so I expect that ILM solutions will only work if they are accompanied by a lot of consultancy and a lot of drive within the company to make the implementations succeed.

Labels: , ,

Tuesday, June 12, 2007

Hoarding

I am in the process of moving, and that is when you realise how much stuff people hoard. Every job move I take one or two boxes with me, full of old courses, interesting reference documents, etc., and usually I never see them again until the next job move. It is hard to break this habit and it is clear that I am not the only one. All around the company I see cupboards full of papers, and I guess they hold a lot of junk that could be thrown away, plus a few valuable gems that we may have lost ...


So is this a problem? Probably ... because paper takes up a lot of space, people lose a lot of time searching through their papers, and valuable information may get lost because it is not accessible anymore (since it is stored in some sort of personal collection).


So there is some merit in addressing the 'hoarding' problem. And actually there are ways to address it.

One - drastic - way is the idea of open plan offices. You may love or hate them, but for hoarding paper they are a good remedy. Just allocate little space for storage, apply a clean desk policy, and soon people will only retain the most valuable information. Obviously this only works if there is also a good library type of function, or else people start to throw things away that should have been retained.

This problem also extends to digital data, and therefore a measure similar to clean desks can be applied to shared drives: offer people a controlled environment for key documents and clear out the team space (say every week).

Another measure is the 'big brother' type of measure: an information delivery compliance monitor for the kinds of documents that need to be stored at the start and end of each business process. This works for repetitive processes (e.g. every project should have a signed-off project plan, stored in the EDMS), but it does not work for more creative or iterative environments. In that case it is hard to measure the value of what has been posted as key information, but then a manager could sign off on a general statement that key information was managed. All other information should be thrown away, or at least stored outside the prime information traffic areas. Note that this only works if it is on the manager's scorecard!
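
For the repetitive case, the monitor itself can be very simple; a minimal sketch, assuming a list of required document types and a hypothetical extract of what the EDMS holds per project:

    # Hypothetical list of documents every project must have in the EDMS.
    REQUIRED_DOCUMENTS = ["project plan (signed off)", "close-out report"]

    # Hypothetical extract of what the EDMS currently holds per project.
    edms_index = {
        "Project Pearl": ["project plan (signed off)"],
        "Project Falcon": ["project plan (signed off)", "close-out report"],
    }

    def compliance_report(index):
        """Report, per project, which required documents are missing."""
        return {project: [doc for doc in REQUIRED_DOCUMENTS if doc not in docs]
                for project, docs in index.items()}

    for project, missing in compliance_report(edms_index).items():
        status = "compliant" if not missing else "missing: " + ", ".join(missing)
        print(project, "-", status)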

Getting people to rotate within the organisation is also a good way of cleaning house once in a while. People tend to clean up a lot of information at the end of a job, and the new person coming in can do another sweep.

Labels: ,

Monday, June 11, 2007

Vital records protection

Due to the cyclone of last week I have had quite some thoughts on DRP and BCP, and concluded that the scope of these should be pretty limited. Risk management is a good way of scoping (only record unavailability that causes a high risk needs to be considered in scope), but this can be a slightly too narrow approach and may lead to short term gains and longer term problems. Therefore it is important for a company to define its vital records. A simple definition: without these records you can close the shop. So what are the main parameters in defining vital records?
  • Legal: By law you need to have these records - usually they are related to agreements, contracts, financial transactions and people. Quite often these records have a legal retention period (think Sarbanes-Oxley Act). Obviously they need to be properly protected. The most important (and therefore vital) documents are the ones related to major agreements; the bulk types (invoices) need less protection.
  • Asset Integrity: Records on the structure and state of maintenance (integrity) of facilities are vital for any operation. Think drawings, designs, inspections. Quite often these records need to be retained indefinitely and they require a high level of protection in terms of protection against damage or loss. Quite often they're not confidential.
  • Confidentiality: Some records in the company are company secrets that can be the differentiator for doing business (think intellectual property, major strategic documents). These records need protection both from a confidentiality and a physical protection perspective.

So what to do? I think first of all it is important to have records in an open digital format, i.e. stored on a computer and easy to open with normal desktop tools (think PDF). Paper is great to work with, but digital is the only proper backup mechanism. So if you have paper, then it is important to have at least good quality scans of all vital records. The scans need to be indexed properly and need to be made available online (with sufficient security, of course). Obviously some records (like facility drawings) may have a more specific format (e.g. AutoCAD).

Further, it is important to have the digital records physically protected against any disaster (flooding, storm, fire, ...) and the easiest way to establish this is to store them in more than one place (preferably more than two). Note that these places need to be geographically apart. In some countries companies have decided to keep their information backup abroad (especially recommended for more volatile places). So if you have your records in Amsterdam (or New Orleans), then it is wise to have a backup on higher ground (so not next door, but say in the Alps or the Rocky Mountains).

The backups have to be made on a regular basis (daily) and, more importantly, you have to test whether the backup can be restored! Note that the same security measures (on accessibility) need to be in place for the backup (so confidential data is also protected at the backup site).
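
A minimal sketch of the kind of check that could support this, assuming a simple inventory of which sites hold a verified copy of each vital record (the record names, sites and threshold are all hypothetical):

    # Hypothetical inventory: vital record -> sites holding a verified copy.
    copies = {
        "Master facility drawings": ["Amsterdam", "Geneva"],
        "Signed joint venture agreement": ["Amsterdam"],
        "IP portfolio": ["Amsterdam", "Geneva", "Denver"],
    }

    MIN_SITES = 2  # at least two geographically separate locations

    def unprotected_records(inventory):
        """Return the vital records that are not replicated to enough separate sites."""
        return [record for record, sites in inventory.items() if len(set(sites)) < MIN_SITES]

    for record in unprotected_records(copies):
        print("Not sufficiently protected:", record)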

So, bottom line: it is important to know which records are vital for your business and to have good protection in place for them - not just in terms of security, but also in terms of physical protection against disasters, and the best protection for the latter is replication.

Labels: , , , ,

Saturday, June 09, 2007

Managing Risks and the world of Web 2.0

Quite some time ago I wrote that Information Management is actually all about managing risks, and our little cyclone has reminded me of this fact. We suddenly realised that most information in the company is actually not important (or let me rephrase: not important enough for business continuity).


So a lot of what we do in Information Management is more at the 'nice to have' end of the scale. That said, there are some nice things about IM / Web 2.0 that are now emerging as elements that can add value to the continuity of a business, or be of help during recovery from a disaster.


Web 2.0 has brought us wikis, blogs, ... and this is how they could help:
  • Wikis can lead to more up-to-date procedural information, but can also help with collecting the learnings from a disaster event. Everybody can contribute! Many people together know more than a few isolated auditors.

  • Blogs can help during the disaster to share news and bring people up to date on the status of services and other things. I noticed for instance that AP, the news agency, has started to use pictures from bloggers in its news coverage. Mobile blogging is also possible, so why not use your phone for things like this?

  • The free online storage spaces for content (pictures, movies, but also documents, etc.) can act as a temporary business continuity site when your own services are down.

But having these things play a more important role means that these Internet-based services themselves become vital, and therefore become part of DRP/BCP (disaster recovery planning and business continuity planning). That's one way to become important!

So the infrastructure side of things may need a rethink as well. Usually data centers are set up as a single point of failure, and having the Internet as a vital piece of communication technology does not allow for single points of failure. So setting up the company's infrastructure as a set of networked nodes is a model worth considering as well.

Labels: , , ,

Friday, June 08, 2007

Some notes on Disasters

A few days ago we were hit by a cyclone, and this brought up the subject of disaster recovery and business continuity again. We have spent many hours discussing vital services in the past and how to ensure continuity for them (or how to ensure that we could recover from a disaster) and thanks to the cyclone we could test our assumptions.


On disaster recovery (DR) I noticed that we were lucky. We had an orderly shutdown before the storm hit us and no real damage was done to the data center. So recovery was easy - it was just a matter of getting the computer floor dry, getting power up and running the start-up sequence.
Things would have been worse if the storm had wiped out our facility, since we had only limited backup equipment - and most data was kept practically at the same site (just 500 m down the road). But arranging a full mirror with sufficient distance is expensive ...


On business continuity (BC) I noticed that most services are really not vital. The main office was completely deserted, but production continued. The next morning when I arrived in the data center I saw nobody around, but all services were humming happily. Is it really true that most of what we do is not really necessary? It was good to see that most production operations just need basic IT functions and therefore they were not hit. This also gives me some thoughts about our plans for IP telephony (IPT), since this is a more vulnerable service than the good old-fashioned system that we have today, so it is clear that we still need to keep the latter as a back-up.


So what were the things I missed during the storm? I actually missed mostly my access to the Internet! Being cut off from the rest of the world with just rumours - no real information - can be dangerous. People were making assumptions about the wind, the waves, etc. and through this ran the risk of making the wrong decisions. So having a good news service up and running was vital. If the company also has IPT in place, then Internet + IPT become the most essential services to keep up. For the rest it is mainly about keeping power & water up and running, so all the basic IT needed for this is vital. The rest can just run on a laptop (e.g. information on emergency and recovery procedures), or even on paper.


The bottom line is that for BC only the bare bones of the IT services are important. These services obviously need electricity, so any back-up system in place should be able to run for some time in a limited mode, maybe on a small generator or batteries, keeping the basic utilities and the communication to the outside world up and running. Anything else is luxury. And for DR I think it helps to make some conscious decisions on what information (+ apps + hardware) should be available at a recovery site at some distance, because you may need it one day ...

Labels: ,

Saturday, June 02, 2007

Middleware

Since the day we started moving from two-tier to multi-tier IT architectures we have seen growing complexity around how we integrate applications, data, etc. Through this we see a growing number of 'moving parts' emerging in every implementation stack. It increases flexibility; once up and running it makes the deployment of changes a lot easier; it even helps with scalability, cost control and more - but it becomes pretty complex.


The so-called middleware market (covering hubs, connectors, real-time services, etc.) has been growing a lot, and quite often I see that we need integrators to integrate the integrators (just to add to the complexity). That is why it is important to really work on your architecture.


The rules are actually quite simple (and very old-fashioned):
  • Keep it simple (and this is rule nr 2 and 3 as well)
  • Standardise - try to have one technology for the messaging and data transformation
  • Do portfolio management (limit versions, vendors, etc.)
  • And keep an overview of how data flows from one place to the other. This will help with challenging whether another point-to-point interface is really needed (why not channel it via a hub?) - a small sketch follows below.
  • Always challenge the next middleware technology, because quite often there are simpler solutions possible (e.g. if you can solve it with data integration in the database, why have a messaging hub?)
  • And if you have to choose a technology, then choose an open standard using XML, because that will keep your options open. Especially for integration with the outside world there is a lot of merit in using XML, since it is self-describing.
If you don't do these things you will see spiralling costs, the need for consultants with exotic skills and at the end of the day an unmanageable architecture ...
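
As a minimal sketch of what 'keeping the overview' could look like in practice (the systems and interfaces listed are invented): even a flat interface inventory makes it easy to spot the point-to-point links worth challenging.

    # Hypothetical interface inventory: (source system, target system, mechanism).
    interfaces = [
        ("ERP", "Data warehouse", "hub"),
        ("ERP", "Maintenance system", "point-to-point"),
        ("Maintenance system", "Data warehouse", "point-to-point"),
        ("HR system", "Data warehouse", "hub"),
    ]

    def point_to_point_candidates(inventory):
        """List the point-to-point interfaces, i.e. candidates to reroute via a hub."""
        return [(src, tgt) for src, tgt, mechanism in inventory if mechanism == "point-to-point"]

    for src, tgt in point_to_point_candidates(interfaces):
        print(f"Challenge: {src} -> {tgt} (why not via the hub?)")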

Labels: , ,

Friday, June 01, 2007

Information Architecture

In my previous post I mentioned the word 'architecture' and this is a bit of an elusive concept in Information Management. What does it mean?

One of the most common references in this field is the Zachman framework - a very complete framework that covers data, applications, infrastructure, people, processes and even motivation, in all their aspects (from high level to detailed implementation). I think it is a great concept for understanding all aspects of IM&T and it covers a lot of what I have been writing about.

If we just focus on the data architecture, then the framework is a bit large. It is very easy to lose track of all the things that need to be analysed and documented (before you know it you spend more time on the framework than on improving data management - paralysis by analysis), so I would like to focus on the four main elements of information architecture that are important to me:

- Master reference data: large organisations need to establish as much as possible the master sources for their key objects. So one place to manage people information, product information, customer information, etc. From these master sources this information can be shared. Master data requires common definitions and clear ownership of information.

- Middleware and data integration: large organisations need to define clearly how information flows from system to system. This is to avoid spaghetti integration. Different integration concepts are possible (via a central data store, via a middleware layer, etc.)

- A supporting Data Management organisation: An architecture needs to be owned and maintained, just like a garden needs a gardener. The Data Managers establish the blueprint and the standards, ensure quality is measured and improved, and obviously take care of the day-to-day operation of the data stores and interfaces.

- A high level story: To establish all these things requires sustained data management investments, and this can only be achieved with sufficient senior management commitment. So the data architect also needs a high level story on what architecture can achieve and what improvements (successes) have been made.

Labels: , ,