Strategic Insights, Inc.
                            Database Design Deficiencies Cause Information Silos
                            01/16/2012
                             
                            The reason databases currently are not data integration friendly is simply that they are not designed to be data integration friendly!  Currently, data integration is not a database design consideration.  Most data models are designed independently, which results in heterogeneous, disparate data models.  Each independently designed heterogeneous data model, when instantiated as a database, forms an island of disparate data or, if you wish, an independent information silo.  For example, ten heterogeneous data models, when instantiated, result in ten isolated islands of disparate data.  These islands of disparate data and information silos are dominant in our current data architectures.  However, the root cause of this data isolation has not been previously defined.
                             
                            Without understanding the root cause of this data isolation, several data integration methods have been developed over the past 25 years.  Now that we understand the root cause of this data isolation, we know that none of the prior art data integration methodologies correct the actual problem.  Since it is our database design methods of defining disparate heterogeneous data models that cause data isolation in our data architectures, the correction of the root cause needs to be a correction of the data modeling methods.

                            The Data Reintegration Methodology is a long overdue modification to the current data modeling methodology.  Amazingly, eliminating the data isolation caused by our heterogeneous data models results in the integration of these previously isolated data models.  That is, all heterogeneous data models that are created or enhanced using our patented Data Reintegration Methodology become a part of a single network of integrated data models.  Likewise, the databases instantiated from the integrated data models are also integrated, provided that these databases are properly populated with data as defined within the Data Reintegration Methodology.  Under these conditions, every Data Reintegration database becomes a part of a single network of integrated databases.  Any database that is enhanced using the Data Reintegration Methodology is now integrated with any other database so enhanced.  With the Data Reintegration Methodology, we are simply removing the data isolation artifact of our prior art data modeling and permanently re-integrating the data that never should have been isolated!

                            You are invited to leave comments on this blog.  For more detailed information, contact us and request the Data Reintegration Methodology whitepaper.

                            Data Reintegration is a trademark of Strategic Insights Inc. 
                            The Data Reintegration Methodology is patented by U.S. patent no. 7,979,475 and other pending patents. 
                            ©Copyright Strategic Insights, Inc. 2012.  All rights reserved.
                             


                            Comments

                            Kubilay Tsil Kara
                            02/04/2012 04:41

                            Agree, they are not integration friendly because of the types of shortcuts taken in past database designs to satisfy other shortcuts taken in application designs. De-duping and data quality are just words spun out to describe bad design and architecture; there is nothing wrong with the data, it is the logic that is flawed.

                            Jeff Voivoda
                            02/06/2012 20:20

                            I think this most often happens when a corporation or agency creates a new sub-corporation or sub-agency that requires much of the same data as the parent, but slightly different in format or content. Instead of sharing the established data as an enterprise asset, the 'new' group replicates the established data, makes changes / enhancements then struggles to stay in sync with the parent schema. Tah-Da - information silo created!

                            jeff stokes du bose
                            02/07/2012 11:35

                            Most enterprise data products are organic, growing from an unexamined requirement through small data sets to larger, monolithic silos. By the time these untamed animals become somebody's pet (empire building, @Malcolm), they are usually rigid and are fraught with esoteric definitions and implied meta-data.

                            Because of the nuance of almost every data element in such unplanned data stores, what are the chances that these data can ever be reconciled with another, similar-looking data store? In data mining or business intelligence efforts, the work can be done but the overt meta-data and nearly-field-specific data qualifiers must proliferate before one data set can be compared to another with any reasonable confidence.

                            Utilizing a combined data store in this manner would be impossible in a production environment, so the best solution is to let things remain the same and deal with it at the interfaces.

                            Not a good alternative, but the only one that has ever worked in my experience.

                            ... oh, besides the "Great and Universal Master Data Management Initiative," which always dies from overexertion, daily floggings, lack of support, inadequate technical and statistical expertise, and very low expectations for any return. (But it sure keeps some middle manager in grandeur for a few long and terrible months.)

                            Bob Mack
                            02/07/2012 11:41

                            Thank you for your well written comments. To tell you the truth, 7 years ago, I would have said something similar.

                            So your point is that there is nothing that can be done to resolve data isolation, we should just live with it. Perhaps that was fine under the prior art paradigm. What I am saying is that under this prior art paradigm, we isolated all this data that never should have been isolated.

                            Organizations are collectively paying billions of dollars annually attempting to live with that problem.

                            Under the prior art, we didn’t know how to design and enforce referential integrity between databases. We didn’t know that our universe of data had a finite boundary composed of very specific master data. We didn’t understand that entire databases could be transformed with the addition of four boundary data entities. We didn’t know that there was a permanent solution for data integration: that once a database was transformed, it would become a part of a network of integrated databases. We just could not conceive of integrating a database in a week or less. We just took the wrong path!

                            But, the natural state of data is to be integrated. We unintentionally added isolation into our data architectures by ignoring even the most fundamental interactions between our data models. You will be amazed at the difference of working with truly integrated data.
                            The future is very bright for those that think outside the box!

                            Thank you again for your articulate comments.

                            Martyn Jones
                            02/12/2012 12:26

                            ---- What causes islands of disparate data?

                            What caused it? The way IT evolved.

                            What causes it? Bad IT.

                            Ian Bennett
                            02/13/2012 23:10

                            A large US based computer game concern has acquired companies in Europe and Asia. These companies all have their own independently developed databases with data models and data in a number of languages including English, French, German, Mandarin and Cantonese. As many of the non-native English speaking companies cater for the English speaking market they typically have a combination of languages in the one database. Many of the companies sell on-line and whilst many customers have bought from more than one of those companies, each company has identified them independently and much of the data is out of date as the customer data often only gets updated as the customer buys another game. Some of the companies make no attempt to match a customer from one purchase to the next and simply keep a new copy of customer data for each purchase. Some of the companies list game titles in one table with a many-to-many relationship to the gaming platform. Others consider the game title to be the concatenation of the name and platform so there is no hard link between "GameX Xbox" and "GameX Playstation". Add to this product variants such as "GameX Deluxe Edition" and some games may even be completely renamed for different platforms. (Ever used one of those movie services where they recommend to you the DVD version even though you have just watched the Blu-ray version?)
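
                            One small piece of this scenario, the mismatch between companies that store title and platform separately and companies that concatenate them, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the platform list, the function names, and the sample sales figures are all hypothetical and do not come from any real system.

```python
from collections import defaultdict

# Assumption: a hand-maintained list of platform names (hypothetical).
KNOWN_PLATFORMS = ("Xbox", "Playstation")


def split_title_platform(product_name):
    """Split a concatenated name such as 'GameX Xbox' into (title, platform).

    Returns (product_name, None) when no known platform suffix is found.
    """
    for platform in KNOWN_PLATFORMS:
        suffix = " " + platform
        if product_name.endswith(suffix):
            return product_name[: -len(suffix)], platform
    return product_name, None


def units_by_title_and_platform(normalized_sales, concatenated_sales):
    """Sum units per (title, platform) across both storage styles."""
    totals = defaultdict(int)
    for title, platform, units in normalized_sales:
        totals[(title, platform)] += units
    for product_name, units in concatenated_sales:
        totals[split_title_platform(product_name)] += units
    return dict(totals)


# Company A keeps title and platform separately; company B concatenates them.
# All figures are made up for illustration.
company_a = [("GameX", "Xbox", 120), ("GameX", "Playstation", 80)]
company_b = [("GameX Xbox", 30), ("GameX Deluxe Edition", 5)]

totals = units_by_title_and_platform(company_a, company_b)
```

                            Note that "GameX Deluxe Edition" falls through with no platform at all. Variants and renamed titles cannot be resolved by a suffix rule like this one, and would still need a mapping maintained by someone who understands the data.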

                            The English only speaking US execs wish to get regular updates on game sales over all companies by platform by demographic in order to make timely decisions on pricing, promotion, etc and when to cut products. They are also interested in trends on platform usage. For example, if many customers have more than 1 major platform then it could be more profitable to only release games for one platform as the customer can use it either way.

                            If someone has an integration solution to above problem that only takes days, I would certainly like to hear about it.

                            Peter Nolan
                            02/13/2012 23:29

                            @Robert,
                            you are on the right track. I was selling "application integration" in the late 80s and early 90s when I worked for IBM. We were proposing to people that they have a single integrated data model on DB2 that would serve as the base for the organisation's operations.

                            This was an E-xpensive proposition and most customers did not go for it. They went for packages. By the mid 90s it was clear that companies would buy packaged software and NOT build their own applications. With the rise of PeopleSoft, Oracle Apps, and SAP this became clearer by the year.

                            My failure (along with many others) to convince companies to build integrated applications led to my second career of re-integrating the data from these packages back into a single integrated view of the business because such a view is NECESSARY.

                            Since 1995, when I first built a set of templates for such integration programs in COBOL, the projects I have worked on using these tools have been MUCH less expensive than competitive offers. In 1997 I did a project for a very large life insurance company in Hong Kong and the services fees, including travel and accommodation, were about USD350K.

                            We were so successful the parent company in Canada asked us to present to them and make them an offer. The IBM offer was USD25M in services. We offered USD10M. Alas, my company's sales conditions meant my team from Asia Pacific could not do the deal.

                            There is NO WAY that we are going to have application integration. It is too expensive. Also, if you buy such from SAP etc then you are buying competitive PARITY. Buying ERPs from the vendors can NOT get you a competitive advantage as they can sell the same to your competitors.

                            Since disparate data from different systems is a fact of life data re-integration is a fact of life. And the problem then becomes to perform that data integration as cost effectively as possible over the lifetime of the data integration which is FOREVER. The tool that is the enabler for that is what I have written because it uses the documentation of the data integration rules in the spreadsheet as source code to generate SQL to perform the data integration. (A C++ engine also still exists as it came first).
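
                            As a rough illustration of the idea of using documented mapping rules as the source for generated SQL, here is a minimal Python sketch. It is not the tool described in this comment. The mapping layout, the table and column names, and the function names are all hypothetical, and SQLite is used only so the generated statement can be run end to end.

```python
import sqlite3

# Hypothetical mapping rows, as they might be exported from a spreadsheet.
# Each row documents how one target column is derived from a source table.
MAPPING_ROWS = [
    {"target_table": "dw_customer", "target_column": "customer_name",
     "source_table": "crm_client", "source_expr": "TRIM(client_nm)"},
    {"target_table": "dw_customer", "target_column": "country_code",
     "source_table": "crm_client", "source_expr": "UPPER(ctry)"},
]


def generate_insert_select(rows):
    """Emit one INSERT ... SELECT per (target table, source table) pair."""
    grouped = {}
    for row in rows:
        key = (row["target_table"], row["source_table"])
        grouped.setdefault(key, []).append(row)

    statements = []
    for (target, source), cols in grouped.items():
        target_cols = ", ".join(c["target_column"] for c in cols)
        source_exprs = ", ".join(c["source_expr"] for c in cols)
        statements.append(
            f"INSERT INTO {target} ({target_cols}) "
            f"SELECT {source_exprs} FROM {source}"
        )
    return statements


def run_demo():
    """Run the generated SQL against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crm_client (client_nm TEXT, ctry TEXT)")
    conn.execute("CREATE TABLE dw_customer (customer_name TEXT, country_code TEXT)")
    conn.execute("INSERT INTO crm_client VALUES ('  Acme Ltd ', 'hk')")
    for statement in generate_insert_select(MAPPING_ROWS):
        conn.execute(statement)
    return conn.execute(
        "SELECT customer_name, country_code FROM dw_customer"
    ).fetchall()
```

                            The sketch trusts the mapping rows completely and pastes them into SQL text, which is only reasonable when the mapping document is maintained by the integration team itself. The mapping is kept as plain data so the documentation and the executed rules cannot drift apart.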

                            So if you can re-integrate your data at the lowest possible cost, why would you consider application integration that will be very expensive?

                            And by the way... this "days and weeks" to perform the documentation of a sizable data integration project is not possible. Why? Because human beings are NOT CAPABLE of understanding the source data and mapping it to a target model in that period of time. Data integration requires a human being to understand the data for the very reason that that data is disparate and therefore cannot be integrated by computers... for the foreseeable future.

                            Bob Mack
                            02/13/2012 23:31

                            Ian: Your hypothetical reminds me of some of my past projects!

                            Data Integration by design is an entirely different paradigm for data integration. As such it is difficult to respond in a few short paragraphs, but I’ll give it a try.

                            The data architecture for this new paradigm was designed from the top down. The data universe, that is, the totality of metadata and of data was our model. It was discovered that integrated data was the natural state of data and that it was our data modeling methods that imparted islands of disparate data into our data architectures. Perhaps more important was the discovery that the data universe was finite and therefore has an actual boundary of metadata and of master data. All data models should include these boundary data entities to anchor the data model relative to all other so designed data models. A new form of entity relationship (commonality relationship) links these boundary data entities between data models. The result is the first ever integrated network of data models. Upon instantiation, the result is an integrated network of databases as long as these boundary database tables are properly populated. The population of master data is very important to this paradigm.

                            Your “hypothetical” is a great example of what can be done with this new paradigm. The data integration by design paradigm focuses upon enhancing the native master data of each database. Each database to be integrated represents a consistent and cohesive data set with the exception that the master data is somewhat deficient. Essentially, with the data integration by design paradigm, each database, in its entirety, is recast by a new standardized set of master data while retaining its native set of master data as well. This recast enhances the master data of the existing database and provides multiple access paths deep into the database.

                            It is important to understand that the governance of boundary master data needs to be provided from outside of the individual organizations in order to transcend the organizational boundaries. The boundary master data providers have much of the information that the native master data sets do not and as such will resolve some of your master data issues.

                            The data integration by design paradigm leaves the existing operational database intact. Because of this data integration in place, most of the issues you enumerate are of no importance to me from a data integration point of view. The boundary data entities are applied to the data models. The data access paths are formed between databases.
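
                            The general idea of leaving each operational database intact while adding standardized master data that links it to other databases can be loosely sketched as follows. This is not the patented Data Reintegration Methodology, whose details are not given here. The table names, the column names, and the shared identifier std_party_id are all assumptions made for illustration, and SQLite's ATTACH is used only as a stand-in for two independently designed databases.

```python
import sqlite3


def build_demo_databases():
    """Create two independently designed databases plus hypothetical boundary tables."""
    conn = sqlite3.connect(":memory:")                # first database, schema "main"
    conn.execute("ATTACH DATABASE ':memory:' AS eu")  # second database, schema "eu"
    conn.executescript(
        """
        -- Native tables keep their own keys and column names.
        CREATE TABLE main.customer (cust_no INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE eu.kunde (kunden_id TEXT PRIMARY KEY, nachname TEXT);

        -- Hypothetical boundary tables: map each native key to a shared,
        -- externally governed identifier (std_party_id is an assumed name).
        CREATE TABLE main.boundary_party (std_party_id TEXT, cust_no INTEGER);
        CREATE TABLE eu.boundary_party (std_party_id TEXT, kunden_id TEXT);

        INSERT INTO main.customer VALUES (1, 'Ada Lovelace'), (2, 'Alan Turing');
        INSERT INTO eu.kunde VALUES ('K-9', 'Lovelace');
        INSERT INTO main.boundary_party VALUES ('P-001', 1), ('P-002', 2);
        INSERT INTO eu.boundary_party VALUES ('P-001', 'K-9');
        """
    )
    return conn


def customers_in_both(conn):
    """Follow the shared identifier from one database into the other."""
    return conn.execute(
        """
        SELECT a.std_party_id, c.name, k.nachname
        FROM main.boundary_party AS a
        JOIN eu.boundary_party AS b ON b.std_party_id = a.std_party_id
        JOIN main.customer AS c ON c.cust_no = a.cust_no
        JOIN eu.kunde AS k ON k.kunden_id = b.kunden_id
        ORDER BY a.std_party_id
        """
    ).fetchall()
```

                            In this sketch neither native table is altered, and the cross-database query works only because both boundary tables were populated with the same standardized identifiers, which echoes the point above that the population of master data matters.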

                            The network of integrated databases may be considered as a distributed ODS. Now, within each database, may be formed one or more fact tables to support reporting. The fact tables are also integrated across databases as well as with the operational transactions of the parent database. The fact tables from multiple databases may be consolidated into a single database if desired.

                            Since there are no transformations anywhere within the data architecture, all the data sets are integrated at a high-level. More data integration will follow for each database where the data governance is within the scope of the organization.

                            This is my attempt at a quick explanation of a paradigm that is extremely foreign to most data architects and data modelers. This paradigm is based upon numerous new discoveries and certainly a complete understanding on your part is not possible here.

                            May I suggest that you visit my web site and review the information there as well: www.strins.com.

                            Thanks

                            Doug Stacey
                            02/13/2012 23:51

                            Ok, can't help jumping in here...

                            What causes islands of redundant, disparate data? In my experience it's a 'project driven' data strategy. Project Managers and project teams are under extreme pressure to implement a given solution as quickly as possible, expending the least amount of money possible. Left to their own devices they'll look at what data they require to implement their project and deem it easiest/quickest to create a database with all their needed data in their own domain and write their application directly against it.

                            The only thing that will stop that is strong and effective governance. Each subject area of data must have a system of record declared. Part of the responsibility of being the system of record is to expose your data to the rest of the enterprise through web services. Architectural discipline and strong governance are then required to ensure that subsequent projects that need that data access it through the services rather than grab a copy and persist it themselves.

                            We've built up these islands over years of effort and they aren't torn down over night. Data Classification and strong governance can turn the tide though and set you on the right path.

                            Michele Ho Lewis
                            02/21/2012 05:57

                            Can't agree more, Doug.
                            What causes data isolation? Bad planning, lack of basic data principles, disciplines, procedures, standards, oversight/governance.

                            Ronald van Keekem
                            02/21/2012 06:04

                            Like Michele already said: "Can't agree more, Doug". That's why I gave Doug's analysis a 'Like', although if I look at the root of the problem I should have voted 'DISlike', lol.

                            William Moore
                            02/21/2012 06:14

                            Hi Rob

                            Islands of Disparate Data - occurs not just because of poor DB Design. I've seen it occur quite a bit when modelers are unable to articulate the business language via the data model. Also, it is not uncommon for modelers to not even make attempts to a) Avoid Semantic Mismatches (e.g., let's just drop items into a structure, regardless of its business meaning or intent....what's the harm?) or b) Not pay attention to terms that are more Application Oriented, than on the same level of grain of the overall Business Model, etc. All of these things contribute to a poor Business/Logical Model, which ultimately leads to a poor DB design.

                            Also, sad to say, there is the element of politics. It is not uncommon for modelers to instantiate terms that are not consistent with the Business Language first, then attempt to force-fit the data requirements into those terms, regardless of whether those terms are truly relevant and representative of the data.

                            Last, but not least, Corporate America does not always have the ability to recognize that IT teams are not truly collaborating and building synergy across teams and within groups; or embracing a 'better solution', when the decision-makers decide to use one that is just 'good enough'.....so where's the incentive?

                            MARCO AURELIO CAVALCANTE RIBEIRO
                            02/23/2012 07:58

                            Inconsistent data management policies, insecure managers and directors (even IT pros), and absolutely no knowledge of integrated data management policies are among the most common causes. The old "this is my data" prejudice...

                            Japie Erasmus
                            03/04/2012 22:00

                            In my experience lack of data ownership is the lead cause. In some cases abandonment of data ownership. Service Orientated Architecture can solve some of these issues if managed from a data level. This can only work if the quality of the data is part of the data owner’s job description. Data ownership is not a sideline job.
                            On table level you can combine data in single tables but columns cannot easily be shared between owners.
                            I witnessed a group of persons responsible for managing client limits leaving the bank to be replaced by new staff that simply started changing limits without being trained on the system. The chaos that followed was then blamed on the system.

                            Bala Seetharaman
                            03/04/2012 22:04

                            The key reasons can be classified into four categories:

                            Finance:
                            No unified IT budget
                            Trying to maximize ROI by sub-project
                            Trying to prove low-cost TCO within the business division

                            Governance:
                            Serious data quality issues in the existing system, and attempts to prove some quality improvements
                            No data governance framework
                            Data architects or SMEs who did not understand the E2E scope

                            Impact Assessment:
                            Serious functional/technical/data impact whenever an upstream or downstream system is upgraded
                            SMEs or data stewards who did not consider the sustenance and business continuity impact at all

                            Audience:
                            Business users are not in sync, or they want the silo model to prove that their reports are of world-class quality and presentation (trying different reporting tools)

                            Many more...

                            Jeff Cohen
                            03/25/2012 07:34

                            In larger corporations, I think that the teams building solutions simply don't know that data is already governed by the enterprise. Without a comprehensive metadata strategy, how would they know what data standards exist or where identical data is stored?

                            Ian Posner
                            03/25/2012 09:25

                            Most large companies assign budgets on a departmental basis, as they do goals. In such cases, departmental managers are only interested in meeting their goals and in many cases will only finance IT projects that further this goal.

                            Every large corporation has tension in it, between the forces for centralisation and those for decentralisation. This is as old as the hills. Those in the centre can come up with the strategies, but are incapable of delivering the solutions to the business units as they are too far removed; Those attached to the business users can deliver tactical solutions, but aren't interested in the big picture.

                            The challenge is to meet the demands of business units while coordinating the disparate silos of development where possible or providing achievable enterprise-wide solutions.

                            Andre Linssen
                            03/25/2012 19:35

                            There are a few problems. But the biggest problem is the business itself and the architecture that it wants to enforce. If a business had general architecture principles (as TOGAF proposes), one could imagine that there would be a principle like "Redundancy of data must be avoided".

                            A principle like this could cause serious extra work in a project, so it depends again on the business how strictly it would be enforced.

                            But there's something else that needs to be taken into consideration: applications come and go, but databases grow like a tree. So, it would be a good idea to design your data in a more abstract level. And with each design step it should be taken into consideration how easy it is to make future adjustments.

                            What I am trying to say is that most companies get the IT system they wanted in the past. How sad.

                            Mike P.
                            03/27/2012 04:23

                            What causes islands of disparate data? As Doug said above - Projects.

                            More specifically, projects in the absence of a strong governance framework.

                            A project manager typically has the blinkers on to meet their project's objectives. Sharing, integration, reuse, standards, etc all add time and cost to the individual project without the project itself perceiving that it gains any of the wider business benefits achieved by following a strategy of integration.

                            Without strong governance, the pm will typically take the short-term view as their responsibility ends when the project ends, and they don't have to live with the long-term pain.

                            Latha N.
                            03/27/2012 04:25

                            What causes islands of disparate data? Lack of Data Stewardship.

                            Data management policies focus on warehousing, modeling, governance, reporting, etc. How many organizations focus on data stewardship? There are generally no attempts to gather and hold knowledge on the how-what-where-who of data in an organization. It would be nice to have a data steward who can guide the answers to the key questions:
                            What data is needed?
                            Where is the data?
                            Who owns it?
                            How do we get it?

                            Are there any organizations that have created a group of data stewards whose primary role is to "know" where all the pools of data are and how to harness them?

                            Moshe J.
                            04/16/2012 05:41

                            While much of what has been said faulting IT management and policies is true, the old fact that computerized systems merely model the business is a strong factor here. How many businesses run as consistently and rigidly as good data governance advocates? In many cases that I've seen, different departments constantly are 'doing their own thing' either for expediency or political reasons. Often the underlying enterprise we're trying to model is riddled with inconsistencies and different views of their information needs. How can we hope to accurately model such a world with a consistent view? It's by definition wrong from the get-go!

                            Kim Korab
                            04/16/2012 05:44

                            What causes islands? Growth in companies, information, software, IT, the need for more and faster information, etc. A company that was growing rapidly in the 1960's probably opted for mainframes, then came servers, then PCs or maybe not quite in that order but you get the drift. There is the big issue of data quality! Also, vendors have provided departmental or business process solutions that are targeted only at one specific subject or process, they were faster than going through the mainframes or trying to consolidate data through traditional IT departments and cheap (off-the-shelf solutions). Then there is data quality! I actually remember the trend being mainframes, then "departmental" data stores, then came the data warehouse, then came business intelligence, then came CRM, then ERP, then back to centralization, returning to business unit specific, etc. etc. It is difficult for most companies to swallow the cost of centralizing millions of terabytes of data so that access is efficient. Then there is data quality!!!!! There is also the "big bang" conversions, migrations, centralizing, de-centralizing, data warehousing, data mart (islands), etc. Then there is data quality!!!! The real answer is simply progress ... and never understanding the value of "good" data and what it takes to it.

                            Terry Alters
                            04/16/2012 05:46

                            Businesses are always looking for the silver-bullet solution to their technology woes. And usually this ends up with IT having to support and integrate disparate systems. One healthcare organization I know has separate systems for Hospital Patient Care, Urgent Care Facilities and Physician Practices, none of which are totally integrated and each of which has its own way of identifying a Patient. Recently this same organization wisely contracted to implement a brand new system to handle all these business areas, but will still have to support the legacy systems until the final switch is pulled years down the road. Businesses seek and depend on reliable and integrated technology solutions to help them function in a highly competitive environment, which usually means they acquire the closest thing that meets their requirements, leaving IT with the daunting task of integrating it with everything else. Disparate databases usually map to disparate applications, purchased by desperate organizations looking for the final answer to their prayers.

                            Madhu Guttikonda
                            05/18/2012 04:54

                            I have visited your blog and read the solutions; they are all good. The natural state of data is to be integrated. Proper use of data models, and making them universal, is the answer.


                              Author

                              Robert Mack, Ph.D.
                              President of Strategic Insights, Inc. and Inventor of the Data Reintegration Methodology
