mmx metadata framework
...the DNA of your data
MMX metadata framework is a lightweight implementation of the OMG Meta Object Facility (MOF) built on relational database technology. The MMX framework
is based on three general concepts:
Metamodel | The MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract in nature than metadata models in general; it consists of only a few abstract entities.
Access layer | Object-oriented methods can be exploited, using inheritance to derive the whole data access layer from a small set of primitives created in SQL. The MMX Metadata Framework provides several diverse methods of data access to fulfill different requirements.
Generic transformation | Many of the relationships between different objects in a metadata model are too complex to be described through simple static relations. Instead, a universal data transformation concept is put to use, enabling the definition of transformations, mappings and transitions of any complexity.

XDTL: An Architectural Perspective

October 11, 2009 21:13 by marx

XDTL stands for eXtensible Data Transformation Language (see the previous post). It is an XML-based language for describing data transformations, most often utilized in loading data into data warehouses or in building complex data processing tasks consisting of a series of data manipulations. The XDTL language definition (XML Schema) has its namespace defined here:

xmlns:xdtl="http://xdtl.org/xdtl"
xsi:schemaLocation="http://xdtl.org/xdtl xdtl.xsd" 

Note: The schema is non-normative and is provided only as a means to validate XDTL instances; as such, it is naturally subject to change as the language evolves.

Architecturally, XDTL involves four major components.

(1) Runtime Engine interpreting XDTL scripts. XDTL is just a language used to build scripts describing data transformations, so it needs an execution mechanism to (pre)process and run those scripts. An XDTL engine (interpreter) assembles the scripts, mappings and templates into a series of executable commands, basically consisting of file and database operations, and runs them. There can be more than one XDTL runtime, each designed for its own purpose and implementing a specific subset of the language definition. An XDTL runtime could also be embedded into another system to provide the low-level plumbing for an application that has to accomplish some ELT functions internally.

(2) Mappings, stored either in the MMX Repository or directly inside an XDTL script. The mappings concept is based on the ideas laid out in [1]. Mappings express the structural dependencies and data dependencies between data 'sources' and 'targets' during different stages of a transformation process and "...describe all data movement-related dependencies as a set of Mapping instances. One instance represents a "single-stage" link between a set of source Entity instances and a set of target Entity instances where every entity plays only a single role, either source or target." [1] There are three basic types of mapping instances: Filter-Mapping (corresponding to SQL's WHERE and HAVING clauses), Join-Mapping (JOINs) and Aggregate-Mapping (GROUP BYs).
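
To make the correspondence concrete, here is a plain SQL statement (the table and column names are illustrative only) annotated with the mapping type that would capture each clause:

SELECT   d.dept_name,
         SUM(e.salary) AS total_salary
FROM     employee e
JOIN     department d              -- Join-Mapping (JOIN)
  ON     e.dept_id = d.dept_id
WHERE    e.status = 'ACTIVE'       -- Filter-Mapping (WHERE)
GROUP BY d.dept_name               -- Aggregate-Mapping (GROUP BY)
HAVING   SUM(e.salary) > 100000;   -- Filter-Mapping (HAVING)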

Implementation of the mappings concept in XDTL involves a set of four collections: Sources (covering source tables), Target (the target table), Columns (column mappings accompanied by flags such as IsJoinKey, IsUpdateableColumn etc.) and Conditions (conditions used in JOIN, WHERE and HAVING clauses). Mappings are either imported from the MMX Metadata Repository in XML format during execution, included from an external file, or explicitly defined in the XDTL script. An arbitrary number of mappings can be cascaded to express transformations of very high complexity. Storing mappings in the Repository opens up numerous opportunities for using the same information in various other applications, e.g. Impact Analysis or Data Quality tools.

(3) SQL templates turned into executable SQL statements. XDTL being an ELT language, the SQL statement represents its single most important functional part. The big question with SQL automation is: how far do you want to go in substituting SQL code with something more abstract? In theory, you could decompose all your SQL statements into series of binary expressions, store them away, and assemble them into SQL statements again as needed, driven by the syntax of one particular SQL dialect. However, this approach usually fails to produce something useful, mostly because SQL is too loosely defined (considering all those vendors and dialects) and trying to cover everything just results in a system too cumbersome for anyone to use. The result would be a sort of 'metaSQL' describing 'real SQL' that is probably an order of magnitude more complex to maintain than hand-coded statements, even with several parallel code bases. And that's exactly what the developers would do: invent a mechanism to bypass the abstract layer and hand-code SQL directly.

(4) Template Engine. Based on our experience (and tons of ETL code written) we have extracted a set of SQL 'patterns' common to ETL (ELT) tasks. The patterns are converted into templates for processing by a template engine (Velocity [2], in particular). The template engine merges patterns and mappings into executable SQL statements. So instead of going as deep as full decomposition, we only separate the mapping (structure) and template (process) parts of SQL. This limits us to a set of predefined templates, but new ones can always be added. The templates are generic and can be used with multiple different mappings/data structures; the mappings are generic as well and can be used in multiple different patterns/templates. The template engine 'instantiates' mappings and templates to create executable SQL statement 'instances'.

As an example, provided with proper mappings, this simple template 

...
## column list of the INSERT clause
#macro( insertClause $Cols )
#foreach( $col in $Cols )
 $col.Target#if( $velocityHasNext ),#end
#end
)
#end
 
## select list, with optional type casts
#macro( selectClause $Cols )
#foreach( $col in $Cols )
 $col.Source#if( $col.Type )::$col.Type#end AS $col.Target
#if( $velocityHasNext ) ,#end
#end
#end
 
## FROM clause: match each join condition to its source table
#macro( fromClause $Srcs $Conds )
#foreach( $cond in $Conds )
#foreach( $src in $Srcs )
#if( $cond.Id == $velocityCount )
 $src.Name $src.Alias ON $cond.Expr
#if( $velocityHasNext )
JOIN#end
#end
#end
#end
#end
 
## WHERE clause: filter conditions joined with AND
#macro( whereClause $Conds )
#foreach( $cond in $Conds )
#if( $velocityCount > 1 )AND #end
 $cond.Expr
#end
#end
 
## generic insert-select statement
#set( $tgt = $Target )
INSERT INTO $tgt.Name (
#insertClause( $Columns )
SELECT
#selectClause( $Columns )
FROM
#fromClause( $Sources $Conditions )
WHERE
#whereClause( $Conditions )

...

would produce the following SQL statement:

INSERT INTO call (
 cust_a, cust_b, type, length, date)
SELECT
 a.cust_id AS cust_a
 , b.cust_id AS cust_b
 , call_type::integer AS type
 , call_length::integer AS length
 , call_date::date AS date
FROM
 cdr c
JOIN customer a
 ON c.phone_a = a.phone_no
JOIN customer b
 ON c.phone_b = b.phone_no
WHERE
 c.phone_a IS NOT NULL

This pretty simple template can actually produce a lot of quite different (and much more complex) SQL statements, all following the same basic pattern (insert-select from a multiple-table join), which is probably one of the most frequent patterns in ELT processes. Of course, in an ideal world, a runtime engine would also have zero footprint and zero overhead, support multiple platforms and multiple SQL dialects...

[1] Stöhr, T.; Müller, R.; Rahm, E.: An Integrative and Uniform Model for Metadata Management in Data Warehousing Environments, 1999. 

[2] http://velocity.apache.org/ 

 



MMX Access Control: RBAC API

July 18, 2009 21:32 by mmx

The MMX Framework has built-in Access Control support as part of its Core Metamodel (http://mmxframework.org/post/2009/06/30/Access-Control-Implementation-in-MMX-Framework.aspx), based on the RBAC System and Administrative Functional Specification (see ANSI INCITS 359-2004, Role Based Access Control). As stated in that document, "The RBAC Functional specification specifies administrative operations for the creation and maintenance of RBAC element sets and relations; administrative review functions for performing administrative queries; and system functions for creating and managing RBAC attributes on user sessions and making access control decisions."

The following set of functions is defined at the Core/Hierarchical RBAC level.

Administrative Commands (see Note 1):

AddUser(user)
DeleteUser(user)
AddRole(role)
DeleteRole(role)
AssignUser(user, role)
DeassignUser(user, role)
GrantPermission(object, operation, role)
RevokePermission(operation, object, role)
AddInheritance(r_asc, r_desc)
DeleteInheritance(r_asc, r_desc)
AddAscendant(r_asc, r_desc)
AddDescendant(r_asc, r_desc)

System Functions (see Note 2):

CreateSession(user, session)
DeleteSession(user, session)
AddActiveRole(user, session, role)
DropActiveRole(user, session, role)
CheckAccess(session, operation, object): BOOLEAN

Review Functions:

AuthorizedUsers(role): USERS
AuthorizedRoles(user): ROLES
RolePermissions(role): PERMS
UserPermissions(user): PERMS
RoleOperationsOnObject(role, obj): OPS
UserOperationsOnObject(user, obj): OPS
SessionRoles(session): ROLES (see Note 2)
SessionPermissions(session): PERMS (see Note 2)

The MMX Framework RBAC API implements AuthorizedUsers, AuthorizedRoles, RolePermissions, UserPermissions, RoleOperationsOnObject and UserOperationsOnObject as Table Functions receiving MMX object identifiers of the proper type as parameter(s) and returning tables (rowsets) as values. The Oracle implementation is contained in the package MMXAC.

An additional function, object_types(obj), which is not part of the RBAC Functional Specification, expands a single RBAC object identifier (denoting an MMX object type or an object type hierarchy) into the list of MMX object types (classes) it covers.
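
For consistency with the package header listed below, a plausible declaration would look like the following; the return collection type name type_table is an assumption, not taken from the actual MMXAC package:

FUNCTION object_types (obj_id IN MD_OBJECT.object_id%TYPE)
RETURN type_table PIPELINED; -- type_table: assumed collection type of object type identifiers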

Note 1: The functionality of the Administrative Commands is provided by an MMX Administrative UI application (e.g. MMX Metadata Navigator).

Note 2: RBAC Sessions are temporary in nature and are not supported by the MMX Framework.

Here are the implementation details on the Oracle platform (package header), for the record:

FUNCTION authorized_users (role_id IN MD_OBJECT.object_id%TYPE)
RETURN user_table PIPELINED;

FUNCTION authorized_roles (user_id IN MD_OBJECT.object_id%TYPE)
RETURN role_table PIPELINED;

FUNCTION role_permissions (role_id IN MD_OBJECT.object_id%TYPE)
RETURN perm_table PIPELINED;

FUNCTION user_permissions (user_id IN MD_OBJECT.object_id%TYPE)
RETURN perm_table PIPELINED;

FUNCTION role_operations_on_object (role_id IN MD_OBJECT.object_id%TYPE, obj_id IN MD_OBJECT.object_id%TYPE)
RETURN op_table PIPELINED;

FUNCTION user_operations_on_object (user_id IN MD_OBJECT.object_id%TYPE, obj_id IN MD_OBJECT.object_id%TYPE)
RETURN op_table PIPELINED;

Note 3: When called without a parameter, authorized_users, authorized_roles, role_permissions and user_permissions return the full list of their respective RBAC class instances.
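
As a usage sketch, the pipelined functions are queried with Oracle's TABLE() collection expression; the identifier values below are hypothetical, and the columns returned depend on the user_table and op_table row types:

SELECT *
FROM TABLE(MMXAC.authorized_users(1001));                  -- users assigned to role 1001

SELECT *
FROM TABLE(MMXAC.user_operations_on_object(2001, 3001));   -- operations user 2001 may perform on object 3001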


 



Versioning

July 3, 2009 21:11 by marx

The MMX Framework supports versioning on three different levels: data, model and metamodel. Which type of versioning is most appropriate depends on the purpose and context of the specific case, the requirements and scope of the application, and so on.

On the data level, every instance (M1) level entity (object, relation or attribute) is equipped with a built-in timestamp structure, version number and status indicator, a combination of which can encode the history of events concerning the entity. Managing these histories, such as instantiating multiple versions of an object or querying for event history, is the responsibility of an application or service. Data level versioning treats an object as an isolated entity and pays no attention to its properties or relationships. This is the simplest form of managing versioning information, but it is sufficient in most cases, as usually only the latest (current) incarnation of an entity is of interest to the consumers of metadata.
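
As an illustration only, the built-in versioning structure of an M1 entity could look like the following table sketch; the column names are assumptions made for the sake of the example, not the actual MMX schema:

CREATE TABLE md_entity_example (
  object_id   INTEGER   NOT NULL,  -- entity identifier
  version_no  INTEGER   NOT NULL,  -- version number
  status_cd   CHAR(1)   NOT NULL,  -- status indicator
  valid_from  TIMESTAMP NOT NULL,  -- timestamp structure: start of validity
  valid_to    TIMESTAMP,           -- timestamp structure: end of validity (open for the current version)
  PRIMARY KEY (object_id, version_no)
);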

On the metamodel level, the contents, structure and relationships subject to the versioning process are part of the metamodel (M2) and are defined as special classes dedicated to this purpose. Sometimes these version classes are descendants of a 'main' class, inheriting all or part of its properties, but usually they are completely independent classes in their own right (e.g. Statistical Activity/Statistical Activity Instance and Classification/Classification Version in the Neuchâtel Terminology model). How metamodel level versioning is treated is entirely up to the metamodel: how and why the versioning classes are defined, and the purpose and context of the application.

Things get more interesting on the model level. Maintaining versions of a model object requires keeping track of its related properties and relations as well, based on the context of the object. To create a version of an object, an application or service has to take the following steps (a SQL sketch follows the list):

- create a copy of the object (MD_OBJECT) with a new id and identical values for the attributes that do not change with the new version;

- add a suffix with a consecutive unique number to the object name to distinguish between different versions (name[1], name[2] etc.);

- create copies, with identical values, of all properties and property hierarchies (MD_PROPERTY) of the object being versioned;

- create copies of all relations (MD_RELATION) of the object being versioned, in either direction;

- add an instance of the special relation type PreviousVersion (inherited from MMX Core) linking the new object instance to the previous version.
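
A minimal sketch of these steps in SQL: MD_OBJECT, MD_PROPERTY and MD_RELATION come from the description above, but all column names and bind variables are assumptions, and a real implementation would wrap the whole sequence in a transaction or stored procedure.

-- 1/2. copy the object with a new id and a version-suffixed name
INSERT INTO md_object (object_id, object_name, object_type_id)
SELECT :new_id, object_name || '[' || :version_no || ']', object_type_id
FROM md_object
WHERE object_id = :old_id;

-- 3. copy all properties of the object being versioned
INSERT INTO md_property (object_id, property_type_id, property_value)
SELECT :new_id, property_type_id, property_value
FROM md_property
WHERE object_id = :old_id;

-- 4. copy all relations, in either direction
INSERT INTO md_relation (object1_id, object2_id, relation_type_id)
SELECT CASE WHEN object1_id = :old_id THEN :new_id ELSE object1_id END,
       CASE WHEN object2_id = :old_id THEN :new_id ELSE object2_id END,
       relation_type_id
FROM md_relation
WHERE :old_id IN (object1_id, object2_id);

-- 5. link the new version to the previous one
INSERT INTO md_relation (object1_id, object2_id, relation_type_id)
VALUES (:new_id, :old_id, :previous_version_type_id);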

Model level versioning does not apply in the case of a class with subclasses, as it is assumed that versioning then takes place on the lowest subclass level.