Functional View on PMQL
PMQL is a query language with XML syntax, intended primarily for interacting with PMML documents. A PMQL query is executed against a repository of prediction models stored in PMML format.
The PMQL language can express the following types of queries:
- scoring against stored PMML models
- building a new model from those already in the repository, using a predefined set of operators. The operators can be PMML-specific, such as model sequencing or selection.
- comparing a sequence of models:
  - functional comparison: function, domain, producer, statistics, metrics for performance comparison
  - structural comparison: model identity, schema compatibility
- searching the metadata for a given characteristic: a given attribute or schema, function type, accuracy, etc.
- full-text search across all the models
Example of a query written in PMQL
<pmql functionType="classification" modelType="regression">
  <field id="1" name="temperature" type="xs:float" optType="continuous"/>
  <field id="2" name="humidity" type="xs:int" optType="continuous"/>
  <field id="3" name="play" type="xs:string" optType="categorical"/>
  <field idref="1" value="12.1"/>
  <field idref="2" value="11"/>
</pmql>
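To make the shape of such a query concrete, the following minimal Python sketch reads the example with the standard library XML parser and separates field declarations from supplied values. It is not part of PMQL or DeVisa: the function name read_query is hypothetical, the closing </pmql> tag is included so that the string parses, and treating a declared field without a value as the field to predict is an assumption made only for illustration.

```python
import xml.etree.ElementTree as ET

# The example query, written with straight quotes and a closing tag.
QUERY = """
<pmql functionType="classification" modelType="regression">
  <field id="1" name="temperature" type="xs:float" optType="continuous"/>
  <field id="2" name="humidity" type="xs:int" optType="continuous"/>
  <field id="3" name="play" type="xs:string" optType="categorical"/>
  <field idref="1" value="12.1"/>
  <field idref="2" value="11"/>
</pmql>
"""

def read_query(text):
    """Split a PMQL query into root attributes, supplied values and open fields."""
    root = ET.fromstring(text)
    declared = {}  # field id -> attributes of the declaring <field>
    values = {}    # field name -> raw string value
    for field in root.iter("field"):
        if "id" in field.attrib:
            declared[field.get("id")] = dict(field.attrib)
    for field in root.iter("field"):
        if "idref" in field.attrib:
            name = declared[field.get("idref")]["name"]
            values[name] = field.get("value")
    # Assumption: a declared field with no supplied value is the one to predict.
    targets = [d["name"] for d in declared.values() if d["name"] not in values]
    return dict(root.attrib), values, targets

attrs, values, targets = read_query(QUERY)
print(attrs["functionType"], values, targets)
```

Under this reading, temperature and humidity carry the instance to score and play is left for the model to predict.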
The query execution engine should be able to:
- validate a PMQL query and construct an abstract tree
- consult the repository metadata to find the appropriate model for the given query; if none is found, raise an error. If several models fulfill the requirements, a sequence of model references is returned. The lookup can be strict (exact schema) or lax (name resolution against a term ontology, or fuzzy matching against the textual term descriptions).
- consult the function metadata to find the required capabilities; if none are found, raise an error (possibly accompanied by a list of suggestions, if the check against the repository metadata succeeded)
- if both of the above checks succeed, have the dispatcher apply the XQuery function to each of the models that satisfy the restrictions
- bundle the outcomes of each function into a valid XML document describing each outcome and the model that produced it
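The difference between the strict and the lax lookup described above can be sketched in a few lines of Python. Everything named here is hypothetical: the CATALOG contents, the find_models function and the 0.75 threshold are illustrative only. Fuzzy matching on field names with difflib stands in for the ontology-based or description-based matching, which this sketch does not attempt.

```python
from difflib import SequenceMatcher

# Hypothetical repository metadata: model reference -> field names of its schema.
CATALOG = {
    "weather-tree": ["temperature", "humidity", "play"],
    "weather-regr": ["temperature_c", "humidity_pct", "play"],
    "iris-nb": ["sepal_length", "petal_length", "species"],
}

def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two field names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matches(query_fields, model_fields, strict, threshold):
    if strict:
        # Strict lookup: the schema must contain exactly the same field names.
        return set(query_fields) == set(model_fields)
    # Lax lookup: every query field must resemble some field of the model.
    return all(
        max(similarity(q, m) for m in model_fields) >= threshold
        for q in query_fields
    )

def find_models(query_fields, strict=True, threshold=0.75):
    """Return references to all models that fit; raise if there are none."""
    found = [
        ref for ref, fields in CATALOG.items()
        if matches(query_fields, fields, strict, threshold)
    ]
    if not found:
        raise LookupError("no model in the repository fits the query schema")
    return found

strict_hits = find_models(["temperature", "humidity", "play"])
lax_hits = find_models(["temperature", "humidity", "play"], strict=False)
print(strict_hits)
print(lax_hits)
```

With this catalog the strict lookup returns only the model with the identical schema, while the lax lookup also accepts the model whose field names differ by a suffix.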
The Phases of Execution of a Query Expressed in PMQL
- annotation: syntactic checking (against PMQL Schema) and semantic checking (against metadata catalog, permission checking)
- rewriting: resolving type and name mismatches and converting the query to an abstract tree.
Types can be resolved by applying the allowed (or default) type conversions as specified in XML Schema.
Name resolution is an active research topic in the context of the Semantic Web. A promising approach is the use of ontologies to mediate between heterogeneous schemas. In this way a PMQL query can be transformed and interpreted with respect to the internal schema of a certain PMML model in the DeVisa repository.
- plan building: the query plan is a pair (model, instances) that contains a reference to the model to be scored against and a reference to an XML document containing the instances, with the schema reinterpreted with respect to that model. If several models satisfy the query requirements, the query plan is a sequence of such execution units.
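The rewriting and plan-building phases can be illustrated with a small Python sketch. It assumes that the annotation phase has already produced the declared field types and, for each candidate model, a mapping from query field names to that model's field names (for example derived from an ontology). The names CASTS, ExecutionUnit, rewrite and build_plan, and the model references used, are hypothetical.

```python
from dataclasses import dataclass

# Assumed conversions for the XML Schema types used in the example query.
CASTS = {"xs:float": float, "xs:int": int, "xs:string": str}

@dataclass
class ExecutionUnit:
    model_ref: str   # reference to the model to score against
    instances: list  # instances rewritten into that model's schema

def rewrite(instance, types, name_map):
    """Cast raw values to their declared types and rename fields for one model."""
    return {
        name_map[name]: CASTS[types[name]](raw)
        for name, raw in instance.items()
    }

def build_plan(instances, types, name_maps):
    """Return one execution unit per candidate model."""
    return [
        ExecutionUnit(ref, [rewrite(i, types, name_map) for i in instances])
        for ref, name_map in name_maps.items()
    ]

types = {"temperature": "xs:float", "humidity": "xs:int"}
name_maps = {
    "weather-tree": {"temperature": "temperature", "humidity": "humidity"},
    "weather-regr": {"temperature": "temperature_c", "humidity": "humidity_pct"},
}
plan = build_plan([{"temperature": "12.1", "humidity": "11"}], types, name_maps)
for unit in plan:
    print(unit.model_ref, unit.instances)
```

Each unit carries its own copy of the instances because the same query values may need different field names for different models.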
The Structure of a PMQL Document for a Classification Task
A PMQL document can specify the exact model to use in the classification task or let the DeVisa PMQL engine decide on the appropriate model.
There are three possibilities:
- Exact Model. The document refers to a model in the DeVisa catalog. The DeVisa engine will use the specified model to execute the classification task.
- Exact Schema. The document refers to a data dictionary but not to a model, although it can specify desired properties. The DeVisa engine will select the appropriate model.
- Match Schema. The document describes a data dictionary (and possibly a mining schema based on that data dictionary). In this case the DeVisa engine is responsible for identifying the matching schema and the appropriate model (should these exist).
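The three possibilities amount to a simple precedence rule, sketched below in Python. The sketch deliberately avoids guessing at PMQL markup: it takes as input what a parser would already have extracted from the document. The names Mode and resolution_mode and the parameter names are hypothetical, as are the reference strings in the usage lines.

```python
from enum import Enum
from typing import Optional, Sequence

class Mode(Enum):
    EXACT_MODEL = "exact model"
    EXACT_SCHEMA = "exact schema"
    MATCH_SCHEMA = "match schema"

def resolution_mode(
    model_ref: Optional[str],
    dictionary_ref: Optional[str],
    described_fields: Sequence[str],
) -> Mode:
    """Decide how the engine should pick a model for the document."""
    if model_ref:
        # The document names a model in the catalog: use it directly.
        return Mode.EXACT_MODEL
    if dictionary_ref:
        # The document names a data dictionary: the engine selects a model.
        return Mode.EXACT_SCHEMA
    if described_fields:
        # The document only describes fields: the engine must match a schema.
        return Mode.MATCH_SCHEMA
    raise ValueError("document names no model, data dictionary or fields")

print(resolution_mode("weather-tree", None, []))
print(resolution_mode(None, "weather-dictionary", []))
print(resolution_mode(None, None, ["temperature", "humidity", "play"]))
```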
PMQL is specified in terms of an XML Schema that can be viewed here. Note that the specification is currently under development.