mmx metadata framework
...the DNA of your data
MMX metadata framework is a lightweight implementation of the OMG Meta Object Facility (MOF) built on relational database technology. The MMX framework is based on three general concepts:
Metamodel | MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract in nature than metadata models in general. The model consists of only a few abstract entities.
Access layer | Object-oriented methods can be exploited using inheritance to derive the whole data access layer from a small set of primitives created in SQL. MMX Metadata Framework provides several diverse methods of data access to fulfill different requirements.
Generic transformation | A large part of the relationships between different objects in a metadata model are too complex to be described through simple static relations. Instead, a universal data transformation concept is put to use, enabling the definition of transformations, mappings and transitions of any complexity.

The Quest for MOF Compliance

July 20, 2010 19:01 by marx

MMX Framework relies heavily on the concept of 'lightweight implementation of MOF'. This article will outline the mapping of MOF elements to MMX Framework classes and the principles of implementing MOF models in MMX Repository.

MMX Framework is NOT an academic undertaking run for purely scientific purposes in a closed lab on a desert island. It is being developed and maintained for practical purposes in a very pragmatic manner. Therefore, reflecting every single MOF feature and artifact has not been a goal: it is designed with minimal overhead and minimal footprint in mind, hence the term lightweight. Also, MMX Framework is intended as a platform for a metadata repository, so all behavioral aspects of MOF as well as the Factory Class have been deliberately omitted. Further, as the power of Complete MOF (CMOF) is not required for our purposes, we restrict ourselves to the power and expressiveness of Essential MOF (EMOF). So, for starters, here's a complete diagram covering EMOF:

The M2 layer of the MMX physical model consists of just three entities: MD_OBJECT_TYPE, MD_RELATION_TYPE and MD_PROPERTY_TYPE. The centerpiece of the model, MD_OBJECT_TYPE, maps to the NamedElement concept of MOF 2.0 (ModelElement in MOF 1.4). Everything derived from NamedElement is mapped to one of the classes in MMX Core Metamodel. The following table lists all EMOF elements with their respective properties and how they are implemented and mapped against the MMX data model. The mappings are presented in an informal way and are expressed either as MMX data model objects (physical tables, columns) or as MMX Core Metamodel elements (classes, attributes).

EMOF | MMX implementation | Comments
Element | Core::MMXClass | For formal reasons, there's a root class that everything else descends from that corresponds to MOF Element.
Tag | Core::Annotation | Root level metaproperty (md_property_type) inherited by any MMX class.
+name | md_property_type.property_type_nm |
+value | md_property_type.default_value_ds |
NamedElement | Core::<any class> | MOF NamedElement is mapped to any class on MMX M2 layer
+name | md_object_type.object_nm |
Class | Core::<any class> | MOF Class is mapped to any class on MMX M2 layer
+isAbstract | md_object_type.abstract_class_ind |
+ownedAttribute | | Expressed through md_object_type - md_property_type relationship (1:N).
+superClass | Core::SuperClass | Root level metarelation (md_relation_type) expressing containment, inherited by any pair of MMX classes
+ownedOperation | Not applicable |
Package | Core::Metamodel | Package denotes a namespace
+uri | md_object_type.reference_ds | Represents namespace URI
+nestedPackage | Core::SuperClass | nestedPackage/nestingPackage composition is expressed via a root-level metarelation applicable to any pair of MMX classes.
+nestingPackage | Core::SuperClass | see previous
+ownedType | Not applicable | Expressed via MOF Type.package property
Type | md_object_type.object_type_cd | Type is mapped via its subtypes (Class and DataType)
+package | md_object_type.parent_object_type_cd | Package and Class are related via inheritance (parent-child) relationship
TypedElement | Not applicable | TypedElement is mapped via its subtype, MultiplicityElement
+type | md_property_type.data_type_cd | This attribute is inherited and finally implemented by Property element
MultiplicityElement | Not applicable | MultiplicityElement is mapped via its subtype, Property. Other subtypes of MultiplicityElement (Operation and Parameter) are not implemented
+isOrdered | Not implemented yet |
+isUnique | Not implemented yet |
+lower | md_property_type.mandatory_ind | lower = 1 is expressed via mandatory_ind = TRUE. lower = 0 is expressed via mandatory_ind = FALSE
+upper | md_property_type.multiplicity_ind | upper > 1 is expressed via multiplicity_ind = TRUE. upper = 1 is expressed via multiplicity_ind = FALSE
Property | md_property_type |
+isReadOnly | Not implemented yet |
+default | md_property_type.default_value_ds |
+isDerived | Not implemented yet |
+class | md_property_type.object_type_cd | Expressed through md_object_type - md_property_type relationship (1:N)
+isID | Not applicable | Everything implemented in MMX data model always has a unique ID, this is built in
+isComposite | Not applicable | isComposite denotes hierarchical property. In MMX data model hierarchical properties are expressed via parent-child relations
+opposite | Not applicable | opposite applies to MMX relationship types. Associations between MMX properties are not possible
DataType | Core::DataType | MOF has 4 primitive types (Boolean, String, Integer and UnlimitedNatural) that are mirrored by corresponding MMX Core data types
Enumeration | Core::DataType::Enumeration | Enumeration is an MMX Core data type
+ownedLiteral | Not applicable | Expressed via MOF EnumerationLiteral.enumeration property
EnumerationLiteral | md_object domain value instances | EnumerationLiteral values are implemented as instances of their corresponding Enumeration class, on M1 level
+enumeration | md_object.object_type_cd | Enumerations (Value domains) and EnumerationLiteral values (domain values) are associated via md_object.object_type_cd of instance records referring to their enumeration class record in md_object_type
Factory | Not applicable |
+package | Not applicable |
Operation | Not applicable |
+ownedParameter | Not applicable |
+raisedException | Not applicable |
+class | Not applicable |
Parameter | Not applicable |
+operation | Not applicable |

Note. MMX (as well as EMOF) does not support visibilities. All property visibilities expressed in the UML MOF model are ignored and assumed to be public. 
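
The +lower and +upper rows of the table can be illustrated with a small Python sketch. The function name and the use of None for an unbounded upper limit are assumptions made for this example, not part of MMX; the table itself only covers lower = 0 or 1 and upper = 1 or > 1, so treating lower > 1 as mandatory is likewise an assumption.

```python
from typing import Optional, Tuple


def emof_multiplicity_to_mmx(lower: int, upper: Optional[int]) -> Tuple[bool, bool]:
    """Return (mandatory_ind, multiplicity_ind) for an EMOF lower/upper pair.

    upper=None stands for the unbounded '*' value (an assumption of this
    sketch; the table only distinguishes upper = 1 from upper > 1).
    """
    if lower < 0:
        raise ValueError("lower bound cannot be negative")
    if upper is not None and upper < max(lower, 1):
        raise ValueError("upper bound must be at least 1 and not below lower")
    # Table: lower = 1 -> mandatory_ind = TRUE, lower = 0 -> FALSE.
    mandatory_ind = lower >= 1
    # Table: upper > 1 -> multiplicity_ind = TRUE, upper = 1 -> FALSE.
    multiplicity_ind = upper is None or upper > 1
    return mandatory_ind, multiplicity_ind


if __name__ == "__main__":
    for bounds in [(0, 1), (1, 1), (0, None), (1, None)]:
        print(bounds, "->", emof_multiplicity_to_mmx(*bounds))
```

The reverse direction loses information: two boolean flags cannot distinguish, say, 1..5 from 1..*, which is consistent with the 'lightweight' goal stated earlier.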

MOF specifies three capabilities that add on to the modeling constructs from UML Infrastructure:

  • Reflection: Allows discovery and manipulation of metaobjects and metadata
  • Identifiers: Unambiguously distinguishes objects from each other
  • Extension: Allows dynamic annotation of model elements with additional information
As stated in MOF Specification: "In a MOF context, an object's class (i.e it’s metaobject) reveals the nature of the object - its kind, its features. The Reflection Package allows this discovery and manipulation of metaobjects and metadata." MMX supports Reflection through separation of M2 and M1 layers (Classes - Instances, Metaobjects - Objects) and instantiation association (via object_type_cd) between the two. Thus, manipulation and navigation of M1-level instance objects based on knowledge stored on M2 level is made possible.
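
To make the instantiation association concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names and the columns object_type_cd, parent_object_type_cd, object_nm (on md_object_type) and object_id appear in these articles; the object_nm column on md_object, the class name 'Table' and the sample rows are assumptions for illustration, and the real MMX schema has more columns.

```python
import sqlite3
from typing import List


def build_demo_repository() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the M2 and M1 tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE md_object_type (          -- M2: classes
            object_type_cd        INTEGER PRIMARY KEY,
            parent_object_type_cd INTEGER REFERENCES md_object_type(object_type_cd),
            object_nm             TEXT NOT NULL
        );
        CREATE TABLE md_object (               -- M1: instances
            object_id      INTEGER PRIMARY KEY,
            object_type_cd INTEGER NOT NULL REFERENCES md_object_type(object_type_cd),
            object_nm      TEXT NOT NULL       -- hypothetical column
        );
        INSERT INTO md_object_type VALUES (1, NULL, 'MMXClass');
        INSERT INTO md_object_type VALUES (2, 1, 'Table');
        INSERT INTO md_object VALUES (100, 2, 'CUSTOMER');
        INSERT INTO md_object VALUES (101, 2, 'ORDERS');
    """)
    return conn


def class_of(conn: sqlite3.Connection, object_id: int) -> str:
    """Discover the M2 class of an M1 instance via object_type_cd."""
    row = conn.execute(
        "SELECT t.object_nm FROM md_object o "
        "JOIN md_object_type t ON t.object_type_cd = o.object_type_cd "
        "WHERE o.object_id = ?",
        (object_id,),
    ).fetchone()
    if row is None:
        raise KeyError(object_id)
    return row[0]


def instances_of(conn: sqlite3.Connection, class_name: str) -> List[str]:
    """Navigate from an M2 class down to its M1 instances."""
    rows = conn.execute(
        "SELECT o.object_nm FROM md_object o "
        "JOIN md_object_type t ON t.object_type_cd = o.object_type_cd "
        "WHERE t.object_nm = ? ORDER BY o.object_id",
        (class_name,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    repo = build_demo_repository()
    print(class_of(repo, 100))
    print(instances_of(repo, "Table"))
```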

"An element has an identifier in the context of an extent that distinguishes it unambiguously from other elements." MMX Framework is built on database technology so naturally everything in it has a unique key. Every element (class or instance) has actually two Identifiers: one in the form of a database table key, another as a unique URI. On M2 level the table key is a primary key column (SMALLINT) with unique values, and the URI is constructed based on the location of a class in the overall class hierarchy. On M1 level the table key is a primary key column (INTEGER), construction of the unique URI is the responsibility of an application that creates class instances.
 
"It is sometimes necessary to dynamically annotate model elements with additional, perhaps unanticipated, information." For dynamical annotation of models (Extension) Tag element of MOF is implemented as a top-level metaproperty of MMX root class. This property gets inherited by any MMX class and can be used to associate a collection of name-value pairs with model elements.

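The inheritance of a root-level metaproperty can be sketched in a few lines of Python. This is only an illustration of the idea: the dictionaries, the class codes and the property name 'column_count' are hypothetical, and how MMX actually resolves inherited property types is not described here.

```python
from typing import Dict, List, Optional

# object_type_cd -> parent_object_type_cd (None marks the root class).
PARENT: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2}

# object_type_cd -> property types declared directly on that class.
# The root class (1) carries the Annotation metaproperty that implements Tag.
DECLARED: Dict[int, List[str]] = {1: ["Annotation"], 2: ["column_count"], 3: []}


def inherited_property_types(object_type_cd: int) -> List[str]:
    """Collect property types from the root class down to the given class."""
    chain: List[int] = []
    current: Optional[int] = object_type_cd
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    props: List[str] = []
    for type_cd in reversed(chain):  # root first, most specific class last
        props.extend(DECLARED.get(type_cd, []))
    return props


if __name__ == "__main__":
    print(inherited_property_types(3))
```

Because Annotation sits on the root class, every class in the hierarchy ends up with it, which is what lets any model element carry name-value pairs.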

MMX G4 Design Rationale

July 11, 2010 14:00 by mmx

The primary goals for the next major revision (G4) of MMX Metadata Framework are:

  • OMG MOF compliance, providing facilities to map metamodels created in MOF to MMX Data Model and realize those metamodels in MMX Repository;
  • XML Schema compliance, providing methodology for realization of any metamodel expressed as an XML Schema;
  • support for very basic workflow functionality directly in Core MMX Metamodel;
  • inclusion of Dublin Core attributes and simplified RBAC support directly in MMX Core Metamodel.  

Changes from G3 to G4 in MMX Physical Data Model

Data column | Change | Comments
md_relation_type.association_type_cd | remove | UML association types (association, aggregation, composition) are supported via containment_ind and reliance_ind flags in G4
md_relation_type.semantic_type_cd | remove | Semantics of a relationship is expressed directly in relationship type in G4
md_relation_type.transitivity_ind, md_relation_type.symmetry_ind, md_relation_type.reflexivity_ind | remove | Deprecated. Not supported in G4.
md_relation_type.containment_ind | add | Indicates that this is a 'contains' association ('aggregation' in UML terms).
md_relation_type.reliance_ind | add | Indicates that this is an 'owns' or 'relies on' association ('composition' in UML terms). Note that an 'owns' association is also a 'contains' association, so to properly indicate a UML composition both flags should contain True.
md_property_type.domain_cd | change | Refers to an Enumeration class in metamodel. Replaces domain_type_cd from G3, which combined the meaning of both domain_cd and datatype_cd.
md_property_type.datatype_cd | add | Refers to one of the Datatype classes in Core Metamodel.
md_object.public_ind | add | Indicates that the object is publicly visible and RBAC permissions for the object are not to be checked.
md_object.mutable_ind | add | Indicates that the object can be changed. False would freeze the object.
md_object.workflow_num | add | Can be used to tag an object with a workflow state. Note that there are no universal states defined: an application is free to interpret this figure however it chooses to.
md_object.security_id | remove | Deprecated. Simplified RBAC implementation in G4 does not provide support for RBAC Object concept, G4 RBAC API functions realize this functionality in a more direct way instead.
md_property.domain_id | remove | Deprecated. In G4 Enumerations are tracked on M2 level.
md_property.value_id | add | In MOF, a property can be used to identify an object instance. In that case this field would contain object_id of an object.
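
As a rough illustration of how the two new flags replace association_type_cd, the following Python sketch decodes them back into UML terms. The function name is hypothetical. Treating both flags False as a plain association, and rejecting reliance_ind without containment_ind, are assumptions inferred from the comments in the table rather than documented MMX behavior.

```python
def uml_association_kind(containment_ind: bool, reliance_ind: bool) -> str:
    """Decode the G4 flags of md_relation_type into a UML association type."""
    if reliance_ind and not containment_ind:
        # The table notes that an 'owns' association is also a 'contains'
        # association, so this combination is treated as inconsistent here.
        raise ValueError("reliance_ind requires containment_ind to be set as well")
    if reliance_ind:
        return "composition"   # both flags True
    if containment_ind:
        return "aggregation"   # 'contains' only
    return "association"       # neither flag set (assumed plain association)


if __name__ == "__main__":
    for flags in [(False, False), (True, False), (True, True)]:
        print(flags, "->", uml_association_kind(*flags))
```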

Note 1. Enumerations are treated as Core Metamodel classes derived from the DataType class and are realized on M2 level. EnumLiterals are realized as instances of one particular Enumeration class and are therefore stored on M1 level, i.e. md_object.object_type_cd points to an object_type_cd on M2 level (md_object_type.object_type_cd).

Note 2. Core Metamodel provides very simple facilities to implement basic workflow functionality. These simple facilities come in the form of the state_ind, public_ind, mutable_ind and workflow_num fields of the md_object table that, in combination with the published_, edited_, created_ and changed_ timestamps, are sufficient to handle basic workflow management needs. The details of a specific workflow implementation are left to application developers.

Note 3. A simplified, 'low-calorie' RBAC implementation is part of Core Metamodel. User, Role and Permission are implemented as classes of the RBAC metamodel. However, RBAC Object is realized as a property of a Permission object. This allows Permissions with multiple Objects, and both class-based (referring to a class in M2) and object-based (referring to a concrete object in M1) permissions are possible, even in a mixed manner. Similarly, RBAC Operation is realized as a property of a Permission object, with the enumerated value list stored in M1 as EnumLiterals. Again, interpretation of these Operation tokens is up to the application 'owning' those operations. Finally, in addition to standard RBAC features, the classes Privilege and Pattern provide additional functionality required to build easy-to-use permission management applications. Privilege acts as a template for Permission objects, Pattern as a template for Role objects.
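
The permission model in Note 3 can be sketched in Python as follows. This is not the G4 RBAC API: all class, field and function names are hypothetical, and the sketch only shows how a Permission holding several class-based and object-based targets, plus the public_ind shortcut from the table above, could be evaluated.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Permission:
    operations: FrozenSet[str]                      # application-defined tokens
    object_type_cds: FrozenSet[int] = frozenset()   # class-based targets (M2)
    object_ids: FrozenSet[int] = frozenset()        # object-based targets (M1)

    def allows(self, operation: str, object_id: int, object_type_cd: int) -> bool:
        if operation not in self.operations:
            return False
        # A permission may mix class-based and object-based targets.
        return object_id in self.object_ids or object_type_cd in self.object_type_cds


@dataclass
class Role:
    name: str
    permissions: List[Permission] = field(default_factory=list)


def is_permitted(roles: Iterable[Role], operation: str, object_id: int,
                 object_type_cd: int, public_ind: bool = False) -> bool:
    """Check whether any permission of any of the user's roles allows the operation."""
    if public_ind:
        return True  # public objects skip the RBAC check entirely
    return any(p.allows(operation, object_id, object_type_cd)
               for role in roles for p in role.permissions)


if __name__ == "__main__":
    editor = Role("editor", [Permission(frozenset({"read", "edit"}),
                                        object_type_cds=frozenset({2}),
                                        object_ids=frozenset({500}))])
    print(is_permitted([editor], "edit", object_id=100, object_type_cd=2))
```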



XDTL Runtime: Alive and Kicking

March 22, 2010 11:58 by marx

First, a quick recap. XDTL (http://xdtl.org) is an XML-based descriptive language designed for specifying data transformations from one database/storage to another. XDTL syntax is defined in an XML Schema document. XDTL Runtime is a lightweight ETL runtime environment that efficiently and with zero overhead handles most of the typical ETL needs. XDTL Runtime can generate SQL or SAS scripts (or any other executable instructions for that matter) based on templates processed by a template engine.

Now, the 'news' part. XDTL Runtime Version 1.0 (XDTL RT 1.0) is finished and running live! The runtime is written in Java (Sun JRE 1.6 required) and uses Velocity (http://velocity.apache.org/) for template processing. So here's a short primer.

There are two individually addressable units of execution in XDTL: packages and tasks. A package is a container of tasks, and both can be invoked by name. A task consists of steps denoting individual commands that are executed sequentially and cannot be addressed individually. Besides tasks, a package contains three collections: parameters, variables and connections. As in XSLT, $ denotes dereferencing: during execution, everything starting with $ is dereferenced and substituted with a value. In addition, everything between { and } is treated as a JavaScript expression and evaluated.
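
The $ dereferencing step can be imitated with Python's standard string.Template class. This is only an analogy, not XDTL Runtime code: the function name, the rule that variables shadow parameters, and the sample names are assumptions, and the evaluation of { } JavaScript expressions is left out entirely.

```python
from string import Template
from typing import Dict


def dereference(text: str, parameters: Dict[str, str], variables: Dict[str, str]) -> str:
    """Substitute every $name in text with its value from the package scope."""
    # Assumption of this sketch: variables shadow parameters of the same name.
    scope = {**parameters, **variables}
    # substitute() raises KeyError for an unknown name instead of leaving it in place.
    return Template(text).substitute(scope)


if __name__ == "__main__":
    params = {"workdir": "/tmp/xdtl"}
    variables = {"file": "customers.csv"}
    print(dereference("$workdir/$file", params, variables))
```

Note that string.Template also accepts a ${name} form, which is unrelated to the { } expressions of XDTL.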

There are file commands and database commands in XDTL. File commands usually launch operating system processes and are designed to handle files (move, compress, transfer etc.). File commands in XDTL RT 1.0 are:

get: downloads a file with a file transfer utility (ftp, scp, sftp)
put: uploads a file with a file transfer utility (ftp, scp, sftp)
unpack: unpacks a file with an archival utility (zip, arj, tar etc.)
pack: packs a file with an archival utility (zip, arj, tar etc.)
strip: cleanses a text file, e.g. with a stream/regex utility (sed etc.)
move: moves files to another directory, usually for archival with a timestamp
clear: removes everything related to a file/task from a working directory
log: adds a line to standard log output
exec: executes an arbitrary operating system command line (shell) command

Database commands control database operations:

read: transfers a text file into a database table, usually in the staging area
write: transfers a database table into a text file
load: loads a file into a database table with a configured bulk load utility
dump: dumps a database table into a file with a configured bulk dump utility
transaction: wrapper for transaction processing
query: executes a database command (an SQL statement, a SPARQL query etc.)

Then come some control flow commands:

call: invokes another package or another task passing required parameters
if: adds basic conditional control

Finally, while the file and database commands constitute the backbone of XDTL, its heart and soul are the mappings and render commands:

mappings: define or retrieve metadata used to instantiate a procedure (SQL, SAS etc.)
render: merges a code template with corresponding mappings to produce executable SQL statement(s), SAS procedure(s) or any other form of executable code

Mappings and templates are fully decoupled and 'know nothing about each other': it's only during the rendering step that they meet each other to merge into an executable script or set of instructions. This enables a lot of novel opportunities: a specific set of mappings can be used with different templates for different purposes, a specific template can be reused many times with different mappings to handle different data sources, a mapping representing some business logic can be ported to another platform by rendering with another template etc.  Splendid!
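
The decoupling of mappings and templates can be sketched in Python. XDTL Runtime uses Velocity for template processing; here Python's string.Template stands in for it, and the mapping structure, table names and both templates are made up for the example.

```python
from string import Template
from typing import Dict, List, Union

Mapping = Dict[str, Union[str, List[str]]]

# A mapping knows nothing about any template: it is plain metadata.
customer_mapping: Mapping = {
    "source": "stg_customer",
    "target": "dw_customer",
    "columns": ["id", "name"],
}

# Two templates for two platforms; neither knows about a specific mapping.
SQL_TEMPLATE = Template("INSERT INTO $target ($cols) SELECT $cols FROM $source;")
SAS_TEMPLATE = Template("proc sql; insert into $target select $cols from $source; quit;")


def render(template: Template, mapping: Mapping) -> str:
    """Merge a template with a mapping into an executable statement."""
    return template.substitute(
        source=mapping["source"],
        target=mapping["target"],
        cols=", ".join(mapping["columns"]),
    )


if __name__ == "__main__":
    print(render(SQL_TEMPLATE, customer_mapping))
    print(render(SAS_TEMPLATE, customer_mapping))
```

The same render function serves both directions of reuse: one mapping with several templates, or one template with several mappings.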