Meta-Data:
Data about data. In terms
of data warehouse it stores information about the structures of both source and
destination data and how to extract, transform, and load data.
It may also maintains network configuration information like ip-addresses
and ports. The meta-data coalition
http://www.mdcinfo.com/
Metadata architecture:
1. Data dictionary (Not Complete
and Too Detailed).
2. Source systems.
3. Connection information.
4. Transformation information
including information on how data is summarized (View Source)
5. Mapping of structures in source
systems to structures in data warehouse systems.
6. Scheduling of transformation
activities.
7. Log information about ETL processes.
8. Exclusionary rules, what data is included and excluded.
9. Aliases that other users might use to think of the data as.
10. Versioning and Change control and logging.
Oracle Data-Dictionary:
Relational tables stored on the system tablespace that hold information
about the Oracle database and current instance. The data-dictionary holds
alot of interesting information but has too problems, one is that it is not
complete and it is too detailed. First it holds only technical detail
information about the database which is not all the information that the user
needs and is not necessarily in a format that the user can understand. The
data dictionary can be extended with the use of Oracle Comments.
SQL> COMMENT ON TABLE customers IS 'Source is CUSTMSTR file within the ORDPROC system.';
SQL> COMMENT ON COLUMN customers.customer_name
IS 'Name
editied to all capital letters during load'
SELECT t.table_name, tc.table_type
FROM all_tables t, all_tab_comments tc
WHERE t.owner = tc.owner
AND t.table_name = tc.table_name(+);
Comments have there limitations including constraints on size, what can be commented on, editing, and it is a manual process. Comments are limited to 4000 characters and is limited to one comment per table or column. They can only be on tables, views, or columns.
Another way to extend the data-dictionary is to create additional tables for metadata information. This has less limitations than using comments, however this is still a manual process.
Many ETL/ETT tools do a much better job and come metadata repositories. While ETL/ETT tools can be expensive to buy and learn, my experience includes custom writing ETL programs in Pro*C and using these tools. The ETL/ETT tools definitely save a lot of time.
Metatdata Repository: Central location for information on all of the above. Will either contain the information itself or knowledge on how to get it. For example it will be able to query the Oracle data-dictionary, parse network configuration files, scheduling files, and/or etl source code. Since meta-data will come from various locations the ability of the various sources to communicate is a make or break issues for a true Metadata Repository.
Documentation:
Much of the documentation is online in the system catalogs and configuration
files and many ETL tools will know about all these locations and can manage them
for you. However some will require
manual maintenance. The project
manager, data analyst, and/or technical writer is usually responsible to
manually maintained documentation. The
other documentation is maintained through use and configuration of the dw
itself. The problem with ad hoc hacking is that
it is not documented or planned out very well.
Metadata repositories and operations force you to plan it better, it is
also self-documenting.
Models and MetaModels: A picture is worth a thousand words. That is basically what a model is. It gives us a visually representation of something we are trying to understand. We understand things better if we can see them. Some common MetaModels are:
And One offered by IBM and Oracle based on the OLAP councils Meta Data API(MDAPI 2.0) which is also based on CORBA: http://www.olapcouncil.org/
MDAPI
2.0 is a industry-standard API for interfacing with OLAP.
CORBA Common Object Request Broker Architecture is an industry standard
architecture for interfacing with Objects across a network.
CORBA is a generic API, MDAPI 2.0 is a specific type of API for interfacing with
OLAP objects.
Metadata Interchange Specification (MDIS): A standard to allow different vendors to exchange their meta-data, it is governed by The Meta-Data Coalition It has over a hundred members including most of the heavy hitters in this industry. The latest version of the specification is available at www.mdis.com The Meta-Data coalition has recently merged with the Object Management Group. A standard being promoted by IBM and Oracle is available at www.onemeaning.com
Query/Reporting Tools: Such as Crystal Reports may also provide some meta-data, particularly on how to query the data-warehouse.
CASE (Computer Aided Systems Engineering) Tools: Such as ERWin and other ER-Diagram tools hold alot of metadata about the design and structure of a database, however this tools are of limited use to data-warehouses and much of that same data is available from the data-dictionary. In fact many ERD case tools use the data-dictionary as their primary metadata. CASE tools specially designed for data-warehouses are inevitable, however by then metadata will all be standardized on XML.