Structure of a relational database is (normally) described in the data dictionary. Fortunately, SQL-99 has provided us with a standard mechanism (INFORMATION_SCHEMA) to access this information. Unfortunately, INFORMATION_SCHEMA does not cover everything required to load and synchronize the volatile data dictionary information with the metadata repository.

Transferring metadata into the repository is a three-step process, involving extraction from the source database, transformation to suitable format and, finally, loading into the repository. XML is used as the ‘transport media’ to intermediate between a source database and the database housing the MMX repository instance. Serialized file as a transfer mechanism has several advantages over direct database connections between two servers, namely:

  • transparency: no need for ‘drilling holes’ into layers of firewalls, DBAs and corporate rules;
  • asynchronicity: data extraction and loading need not take place simultaneously;
  • platform independency: not reliant on details of specific database and OS platforms. 

Metadata gets extracted from a data dictionary by executing SQL queries against INFORMATION_SCHEMA views (joined with some additional technical metadata) and are designed to output their results as serialized, well-formed XML. These queries have to be customized for each database platform considered according to the details of XML support and technical metadata storage provided by the platform. Metadata tags originating from a data dictionary are translated to fit the Relational Database Metamodel in MMX. This transformation is controlled by a XSLT stylesheet providing all necessary tags and classificators needed for load processing. A single XSLT stylesheet is sufficient to handle all different metadata transformations for data dictionary information from an arbitrary relational database. XSLT 2.0 is used for transformation. Transformed XML file contains all the information, including necessary meta-metadata, to be loaded into metadata repository. This process employs ‘XML shredding’ technique and is implemented in form of an SQL stored procedure. One procedure is capable of handling the processing of all metadata originating from a data dictionary, however due to differences in XML support details between different database servers the procedure is unique for each database platform.

INFORMATION_SCHEMA views identify the data objects with the combination of  catalog_name, schema_name and object_name. This uniform hierarchy is used to construct a standard URI to reference the original database object. The reference URI takes the form:


where <database> is either an IP address or a name uniquely identifying the database server; <catalog_name>,<schema_name> and <object_name> identify a database objecthierarchy and are provided by a INFORMATION_SCHEMA view; <additionalelement(s)> is required in case <object_name> is not on the last level of objects requiring identification, eg. <column_name> is required for column metadata.

In case the source metadata is provided in form other than direct extract from adatabase, eg. text file or spreadsheet, customised processing is required to convert the data definitions to well-formed XML format.