mmx metadata framework
...the DNA of your data
MMX metadata framework is a lightweight implementation of OMG Metadata Object Facility built on relational database technology. MMX framework
is based on three general concepts:
Metamodel | MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract in nature than metadata models in general. The model consists of only a few abstract entities... see more.
Access layer | Object oriented methods can be exploited using inheritance to derive the whole data access layer from a small set of primitives created in SQL. MMX Metadata Framework provides several diverse methods of data access to fulfill different requirements... see more.
Generic transformation | A large part of relationships between different objects in metadata model are too complex to be described through simple static relations. Instead, universal data transformation concept is put to use enabling definition of transformations, mappings and transitions of any complexity... see more.

XDTL Packages in Five Easy Steps: Beginner

November 14, 2010 13:24 by mmx

Continuing from the previous step, this one actually does something useful, namely transfers some stuff from one location to another.

<?xml version="1.0" encoding="UTF-8"?>
<xdtl:package name="loader.test"
  xmlns:xdtl="http://xdtl.org/xdtl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://xdtl.org/xdtl http://xdtl.org/xdtl/xdtl.xsd">

  <xdtl:parameter name="filename" required="1"></xdtl:parameter>
  <xdtl:parameter name="tablename" required="1"></xdtl:parameter>
  <xdtl:parameter name="dbhost" default="localhost:5432" required="0"></xdtl:parameter>
  <xdtl:parameter name="dbname" default="postgres" required="0"></xdtl:parameter>
  <xdtl:parameter name="dbuser" default="postgres" required="0"></xdtl:parameter>
  <xdtl:parameter name="dbpass" default="" required="0"></xdtl:parameter>

  <xdtl:variable name="tmpdir">/tmp/incoming</xdtl:variable>
  <xdtl:variable name="bakdir">/tmp/incoming/bak</xdtl:variable>
  <xdtl:variable name="nocr">-e 's/\r//g'</xdtl:variable>
  <xdtl:variable name="tempdb">staging</xdtl:variable>

  <xdtl:connection name="local" type="DB">jdbc:postgresql://$dbhost/$dbname?user=$dbuser&amp;password=$dbpass&amp</xdtl:connection>

  <xdtl:tasks>
    <xdtl:task name="load_file">
      <xdtl:steps>
        <xdtl:get source="ftp://www.site.com/$filename.zip" target="$tmpdir" overwrite="1"/>
        <xdtl:unpack source="$tmpdir/$filename.zip" target="$tmpdir" overwrite="1"/>
        <xdtl:strip expr="$nocr" source="$tmpdir/$filename.txt" target="$tmpdir/$filename.tmp" overwrite="1"/>
        <xdtl:read source="$tmpdir/$filename.tmp" target="$tempdb.$filename" connection="$local" type="CSV" overwrite="1" delimiter=";" quote='"'/>
        <xdtl:query cmd="SELECT COUNT(*) FROM pg_tables WHERE tablename='$tablename'" target="exist" connection="$local"/>
        <xdtl:if expr="${exist != 0}">
        <xdtl:query cmd="INSERT INTO $tablename
          SELECT t0.* 
          FROM $tempdb.$filename t0
          LEFT OUTER JOIN $tablename t1 ON t1.id = t0.id
          WHERE t1.id IS NULL" connection="$local"/>
        </xdtl:if>
        <xdtl:move source="$tmpdir/$filename.zip" target="$bakdir/$filename.$xdtlDateCode.rar" overwrite="1"/>
        <xdtl:clear target="$tmpdir/$filename.*"/>
      </xdtl:steps>
    </xdtl:task>
  </xdtl:tasks>
</xdtl:package>

In short, a text file is downloaded (get), unzipped (unpack), stripped from CR's with sed (strip) and gets loaded into staging area (read). If the target table exists the rows that are not in the target table already get INSERTed. Finally, incoming file is archived away (move) and the temp files are trashed (clear). All in all, a typical plain vanilla data warehousing task. At this point, some explanations are probably needed.
 
1. The required parameters, filename and tablename get their values in some other (calling) package outside this one. If not, the package get an error.  The rest of the parameters are optional, ie. they might get their values from a calling procedure, if not they fall back to the defaults. 
 
2. The external utilities (get, unpack, strip etc.) get their context in the global configuration file (resources/globals.xml), like this:
 
<entry key="get">wget %target:-O :% %overwrite::-nc% %source%</entry>
 
%target%, %overwrite% and %source% are evaluated to the actual get element attribute values. 
 
3. xdtlDateCode is a variable that was evaluated to the current date in the startup.js script (there can be an arbitrary number of such variable definitions): 
 
var xdtlDateCode = java.lang.String.format("%1$tY%1$tm%1$td", new Array(new java.util.Date()));
 
To be continued...
 
 
 


XDTL Packages in Five Easy Steps: Basics

October 28, 2010 18:08 by marx

Now that XDTL is taking off there's suddenly a need for a primer. While all the basics have already been covered in a previous article, some hands-on examples would probably help. Here come a couple of scripts (the term 'package' is used from now on), one simple one and some more advanced in the following posts.

As an introduction, the mandatory first program; should be self-explanatory.

<?xml version="1.0" encoding="UTF-8"?>
<xdtl:package name="loader.test"
  xmlns:xdtl="http://xdtl.org/xdtl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://xdtl.org/xdtl http://xdtl.org/xdtl/xdtl.xsd">
  <xdtl:tasks>
    <xdtl:task name="hello">
      <xdtl:steps>
       <xdtl:log msg="hello world!"/>
      </xdtl:steps>
    </xdtl:task>
  </xdtl:tasks>
</xdtl:package>

To be continued, stay on the channel.

 



XDTL Runtime: Alive and Kicking

March 22, 2010 11:58 by marx

First, a quick recap. XDTL (http://xdtl.org) is an XML based descriptional language designed for specifying data transformations from one database/storage to another. XDTL syntax is defined in an XML Schema document. XDTL Runtime is a lightweight ETL runtime environment that efficiently and with zero overhead handles most of the typical ETL needs. XDTL Runtime can generate SQL or SAS scripts (or any other executable instructions for that matter) based on templates processed by a template engine.

Now, the 'news' part. XDTL Runtime Version 1.0 (XDTL RT 1.0) is finished and running live! The runtime is written in Java (Sun JRE 1.6 required) and uses Velocity (http://velocity.apache.org/) for template processing. So here's a short primer.

There are two individually addressable units of execution in XDTL: a package is a container containing tasks, both of them can be invoked by name. A task consists of steps denoting individual commands that are executed sequentially and cannot be addressed individually. Besides tasks, a package contains three collections: parameters, variables and connections. As in XSLT, $ denotes dereferencing: during execution, everything starting with $ is dereferenced and substituted with a value. In addition, everything between { and } is treated as a JavaScript expression and evaluated.   

There are file commands and database commands in XDTL. File commands usually launch operating system processes and are designed to handle files (move, compress, transfer etc.). File commands in XDTL RT 1.0 are:

get: downloads a file with a file transfer utility (ftp, scp, sftp)
put: uploads a file with a file transfer utility (ftp, scp, sftp)
unpack: unpacks a file with an archival utility (zip, arj, tar etc)
pack: pack a file with an archival utility (zip, arj, tar etc)
strip: cleanse a text file, eg. with a stream/regex utility (sed etc)
move: move files to another directory, usually for archival with timestamp
clear: remove everything related to a file/task from a working directory
log: adds a line to standard log output
exec: executes an arbitrary operating system command line (shell) command

Database commands control database operations:

read: transfers a text file into a database table, usually in staging area
write: transfer a database table into a text file
load: configured to load a file into a database table with a bulk load utility
dump: configured to dump a database table into a file with a bulk dump utility
transaction: wrapper for transaction processing
query: executes a database command (an SQL statement, a SPARQL query etc.)

Then come some control flow commands:

call: invokes another package or another task passing required parameters
if: adds basic conditional control

Finally, while the file and database commands constitute the backbone of XDTL, it's heart and soul are mappings and render commands:

mappings: define or retrieve metadata used to instantiate a procedure (SQL, SAS etc.)
render: merges a code template with corresponding mappings to produce executable SQL statement(s), SAS procedure(s) or any other form of executable code

Mappings and templates are fully decoupled and 'know nothing about each other': it's only during the rendering step that they meet each other to merge into an executable script or set of instructions. This enables a lot of novel opportunities: a specific set of mappings can be used with different templates for different purposes, a specific template can be reused many times with different mappings to handle different data sources, a mapping representing some business logic can be ported to another platform by rendering with another template etc.  Splendid!