DeVisa Use Cases



Summary


Name Documentation
DM Consumer The DM Consumer is a WS client application that uses DeVisa DM models.
download A DM Consumer can download a DeVisa model in order to use it internally, e.g., to import it into the DM application.
authenticate The WS client that invokes a certain method might need to be authenticated in order to execute the required function.
admin The Admin use case involves uploading/downloading the DeVisa models without any other processing.
replace

Replace occurs when a model is replaced completely by a newer one. The new model will get the same model ID in the repository.

upload

A DM Producer can upload models to the DeVisa repository via WS methods (SOAP, XML-RPC).

update

Update occurs when a model needs only a certain type of adjustment. This use case should use a specific XML update technique. In most cases, the update procedure is triggered by the system itself.

search

The searching functions allow inspecting the properties of the PMML models in the repository. The search functions conform to, and are therefore limited to, the information that a PMML model can incorporate according to the PMML 3.2 specification.

DeVisa provides searching functions such as:

Selecting the models with desired properties

  • function type: classification, regression, clustering, etc.
  • type of model: tree, SVM, cluster, etc.
  • producer: e.g., all the models belonging to a certain producer application
  • degree of freshness: e.g., models newer than a certain date
  • performance measures
  • field statistics

Selecting the models that conform to a certain schema

  • Exact Schema. The document refers to a data dictionary but does not refer to a model; it can, however, specify desired properties. The DeVisa engine will select the appropriate model.
  • Match Schema. The document describes a data dictionary (and possibly a mining schema according to the data dictionary). In this case the DeVisa engine is responsible for identifying the matching schema and the appropriate model (should these exist).

This type of search is used especially as an extension to the scoring use case.

Full text search in the model repository

This type of search is useful when the client is looking for keywords in the description of the model, in the field names or field descriptions, in the function type, etc.

Future work: create a search mechanism that allows more complex predicates on the search criteria.

scoring

The scoring use case means applying the models to new instances.

The scoring occurs via web service methods.

Depending on the models, there are several types of DeVisa scoring procedures:

  • Classification Scoring.
  • Cluster Scoring.
  • Association rules scoring.
classification scoring The scoring method receives as input a set of instances and one or more classification models and classifies the instances with respect to the models.
clustering scoring The scoring method receives as input a set of instances and one or more clustering models and assigns the instances to the most appropriate cluster in each of the models.
association rules scoring The scoring method receives as input a set of items (instances) and one or more association rule models. It determines all rules of each of the input models whose antecedent itemset is a subset of the input itemset and returns the consequents of these rules as the inferred itemsets. An extension of this procedure computes all rules whose antecedent and consequent itemsets are included in the input itemset. This version is useful to determine which itemsets support which rules.
compose

Model Composition allows the combination of simple models into a single composite PMML model.

PMML version 3.2 supports the combination of decision trees and simple regression models. More general variants would be possible and may be defined in future versions of PMML.

In PMML, model composition uses three syntactic concepts:

  1. The essential elements of a predictive model are captured in elements that can be included in other models.
  2. Embedded models can define new fields, similar to derived fields.
  3. The leaf nodes in a decision tree can contain another predictive model.

In DeVisa simple models can be combined into more complex ones forming new valid PMML documents.

A client application can specify the models subject to composition and the combination method.

DeVisa identifies the specified models. If the models do not exist in the repository then the process stops.

The found models are checked for compatibility. If they are not compatible the process stops.

The new valid model is returned to the user/stored in the repository.

sequencing

Model sequencing is the process through which two or more models are combined into a sequence where the results of one model are used as input in another model.

Model sequencing is supported partially by the PMML specification.

Examples of sequencing:

  • The missing values in a regression model can be replaced by a set of rules (or a decision tree).
  • Several classification models with the same target value can be merged via a voting scheme, i.e., the final classification result can be computed as an average of the results of the initial classifiers. The average can be computed by a regression model.
  • Prediction results may have to be combined with a cost or profit matrix before a decision can be derived. A mailing campaign model may use tree classification to determine response probabilities per customer and channel. The cost matrix can be appended as a regression model that applies cost weighting factors to different channels, e.g., high cost for phone and low cost for email. The final decision is then based on the outcome of the regression model.
selection

Model selection in PMML allows for combining multiple 'embedded models', aka model expressions, into the decision logic that selects one of the models depending on the current input values.

Examples of selection:

  1. A common method for optimizing prediction models is the combination of segmentation and regression. Data are grouped into segments, and for each segment there may be different regression equations. If the segmentation can be expressed by decision rules, then this kind of segment-based regression can be implemented by a decision tree where any leaf node in the tree can contain an embedded regression model.
compare

A client application (DM Consumer) wants to compare two models.

The client needs to specify:

  • The two models to be compared, through an exact reference (model id)
  • The comparison type, which can be syntactic or semantic.

Syntactic Comparison. Two PMML models are compared through an XML differencing approach. Whether eXist's XML diff extension module can be used here remains to be researched.

Semantic Comparison. Two models can be compared from the following points of view:

  • Schema compatibility. This involves a DataDictionary compatibility check and a MiningSchema compatibility check.
  • Function comparison. Checks whether the models fulfil the same prediction task, which algorithms they use to fulfil it, etc.
  • Performance measures (to be researched)

The semantic comparison is useful when the client envisions a model composition and wants to pre-check the compatibility of the models.

statistics

An application can invoke this service to obtain statistics on the models.

Examples of statistics:

  1. frequencies per domain, producer, schema, function type, etc.
  2. statistics based on performance measures (to be researched):
  • sensitivity, specificity, accuracy (these do not appear to be supported by PMML)
  • can performance/assessment be connected with model verification in PMML?

Initially, DeVisa should provide only a full report (PMQL) for point (1).

DM Producer The DM Producer is a WS client application (typically a DM application such as Weka) that uploads models to the DeVisa repository.
DeVisa

Details


DM Consumer
Name Value
Visibility public
Abstract false
Leaf true
Root false
Documentation The DM Consumer is a WS client application that uses DeVisa DM models.
Business Model false

Relationships

Unnamed Association
To
Name Value
End Model Element download
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
To
Name Value
End Model Element search
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
To
Name Value
End Model Element classification scoring
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
To
Name Value
End Model Element clustering scoring
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
To
Name Value
End Model Element sequencing
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
To
Name Value
End Model Element selection
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
To
Name Value
End Model Element compare
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
From
Name Value
End Model Element statistics
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Unnamed Association
From
Name Value
End Model Element association rules scoring
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

download

Name Value
Abstract false
Leaf true
Root false
Documentation A DM Consumer can download a DeVisa model in order to use it internally, e.g., to import it into the DM application.
Rank High
Business Model false

Extension Points

laxly specified model

Relationships

Unnamed Include
To authenticate
Visibility Unspecified
Stereotypes Include

Unnamed Extend
To search
Visibility Unspecified
Stereotypes Extend

Unnamed Generalization
From admin
Substitutable false
Visibility Unspecified

Unnamed Association
From
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

authenticate

Name Value
Abstract false
Leaf true
Root false
Documentation The WS client that invokes a certain method might need to be authenticated in order to execute the required function.
Rank Medium
Business Model false

Relationships

Unnamed Include
From scoring
Visibility Unspecified
Stereotypes Include

Unnamed Include
From compare
Visibility Unspecified
Stereotypes Include

Unnamed Include
From compose
Visibility Unspecified
Stereotypes Include

Unnamed Include
From upload
Visibility Unspecified
Stereotypes Include

Unnamed Include
From download
Visibility Unspecified
Stereotypes Include

Unnamed Include
From search
Visibility Unspecified
Stereotypes Include

Unnamed Include
From statistics
Visibility Unspecified
Stereotypes Include

admin

Name Value
Abstract true
Leaf false
Root false
Documentation The Admin use case involves uploading/downloading the DeVisa models without any other processing.
Rank Unspecified
Business Model false

Relationships

Unnamed Generalization
To replace
Substitutable false
Visibility Unspecified

Unnamed Generalization
To upload
Substitutable false
Visibility Unspecified

Unnamed Generalization
To download
Substitutable false
Visibility Unspecified

Unnamed Generalization
To update
Substitutable false
Visibility Unspecified

Use Case Descriptions

Admin
Super Use Case
Author dianagorea
Date Jan 17, 2008 2:19:00 PM
Brief Description The DM client uploads/downloads the DeVisa models without any other processing.
Preconditions
Post-conditions
Flow of Events
Actor Input System Response
1

replace

Name Value
Abstract false
Leaf true
Root false
Documentation

Replace occurs when a model is replaced completely by a newer one. The new model will get the same model ID in the repository.

Rank Medium
Business Model false

Relationships

Unnamed Extend
From upload
Visibility Unspecified
Stereotypes Extend

Unnamed Generalization
From admin
Substitutable false
Visibility Unspecified

upload

Name Value
Abstract false
Leaf false
Root false
Documentation

A DM Producer can upload models to the DeVisa repository via WS methods (SOAP, XML-RPC).

Rank High
Business Model false

Extension Points

existing model
Documentation If the model that the DM Producer has uploaded already exists in the DeVisa repository and the uploaded model is newer than the existing one, then the existing model is replaced.

Relationships

Unnamed Extend
To replace
Visibility Unspecified
Stereotypes Extend

Unnamed Include
To authenticate
Visibility Unspecified
Stereotypes Include

Unnamed Generalization
From admin
Substitutable false
Visibility Unspecified

Unnamed Association
From
Name Value
End Model Element DM Producer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

Use Case Descriptions

Upload
Super Use Case admin
Author dianagorea
Date Jan 17, 2008 3:04:44 PM
Brief Description
Preconditions The DM Producer has a model expressed in PMML.
Post-conditions The model is stored in the DeVisa PMML repository.
Flow of Events
Actor Input System Response
1 upload request
2 authentication process
3 checks that the PMML is valid
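
The upload flow and the "existing model" extension point can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `Repository` mapping, the `upload` function, its return strings, and the well-formedness check standing in for real PMML validation are all hypothetical and not part of DeVisa. Authentication is omitted.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, Tuple

# Hypothetical in-memory repository: model id -> (timestamp, PMML text).
Repository = Dict[str, Tuple[datetime, str]]


def is_well_formed(pmml_text: str) -> bool:
    """Minimal stand-in for PMML validation: well-formed XML with a PMML root."""
    try:
        root = ET.fromstring(pmml_text)
    except ET.ParseError:
        return False
    # Ignore an optional namespace prefix of the form "{uri}PMML".
    return root.tag.rsplit("}", 1)[-1] == "PMML"


def upload(repo: Repository, model_id: str, created: datetime, pmml_text: str) -> str:
    """Store the model; replace an existing one only if the upload is newer."""
    if not is_well_formed(pmml_text):
        return "rejected"
    if model_id in repo:
        if created <= repo[model_id][0]:
            return "kept existing"
        repo[model_id] = (created, pmml_text)  # the model ID is retained
        return "replaced"
    repo[model_id] = (created, pmml_text)
    return "stored"


repo: Repository = {}
doc = "<PMML version='3.2'/>"
print(upload(repo, "m1", datetime(2008, 1, 1), doc))
print(upload(repo, "m1", datetime(2008, 2, 1), doc))
```

A production check would validate the document against the PMML schema rather than only testing that it parses.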

update

Name Value
Abstract false
Leaf false
Root false
Documentation

Update occurs when a model needs only a certain type of adjustment. This use case should use a specific XML update technique. In most cases, the update procedure is triggered by the system itself.

Rank Unspecified
Business Model false

Relationships

Unnamed Generalization
From admin
Substitutable false
Visibility Unspecified

search

Name Value
Abstract true
Leaf false
Root false
Documentation

The searching functions allow inspecting the properties of the PMML models in the repository. The search functions conform to, and are therefore limited to, the information that a PMML model can incorporate according to the PMML 3.2 specification.

DeVisa provides searching functions such as:

Selecting the models with desired properties

  • function type: classification, regression, clustering, etc.
  • type of model: tree, SVM, cluster, etc.
  • producer: e.g., all the models belonging to a certain producer application
  • degree of freshness: e.g., models newer than a certain date
  • performance measures
  • field statistics

Selecting the models that conform to a certain schema

  • Exact Schema. The document refers to a data dictionary but does not refer to a model; it can, however, specify desired properties. The DeVisa engine will select the appropriate model.
  • Match Schema. The document describes a data dictionary (and possibly a mining schema according to the data dictionary). In this case the DeVisa engine is responsible for identifying the matching schema and the appropriate model (should these exist).

This type of search is used especially as an extension to the scoring use case.

Full text search in the model repository

This type of search is useful when the client is looking for keywords in the description of the model, in the field names or field descriptions, in the function type, etc.

Future work: create a search mechanism that allows more complex predicates on the search criteria.
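
As a rough illustration of selecting models with desired properties, the sketch below filters an in-memory catalog of model metadata by function type, producer, and freshness. The `ModelInfo` record, its field names, and `select_models` are hypothetical and not part of DeVisa or PMML; a real implementation would query the PMML repository instead.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical metadata record extracted from a stored PMML model."""
    model_id: str
    function_type: str  # e.g. "classification", "regression", "clustering"
    model_type: str     # e.g. "tree", "SVM", "cluster"
    producer: str       # producer application that uploaded the model
    created: date       # used for the "degree of freshness" criterion


def select_models(catalog: Iterable[ModelInfo],
                  function_type: Optional[str] = None,
                  producer: Optional[str] = None,
                  newer_than: Optional[date] = None) -> List[ModelInfo]:
    """Return the models matching every criterion that is not None."""
    result = []
    for m in catalog:
        if function_type is not None and m.function_type != function_type:
            continue
        if producer is not None and m.producer != producer:
            continue
        if newer_than is not None and m.created <= newer_than:
            continue
        result.append(m)
    return result


catalog = [
    ModelInfo("m1", "classification", "tree", "Weka", date(2008, 1, 10)),
    ModelInfo("m2", "regression", "regression", "Weka", date(2007, 6, 1)),
    ModelInfo("m3", "classification", "SVM", "OtherApp", date(2008, 2, 1)),
]
recent = select_models(catalog, function_type="classification",
                       newer_than=date(2008, 1, 1))
print([m.model_id for m in recent])
```

Criteria left as `None` are ignored, so the same function covers single-property and combined searches.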

Rank High
Business Model true

Relationships

Unnamed Include
To authenticate
Visibility Unspecified
Stereotypes Include

Unnamed Extend
From scoring
Visibility Unspecified
Stereotypes Extend
Condition The client has specified the model through "Match" or "" case.

Unnamed Extend
From compose
Visibility Unspecified
Stereotypes Extend

Unnamed Extend
From download
Visibility Unspecified
Stereotypes Extend

Unnamed Association
From
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

References

Sequence Diagram/search
Description A complete search sequence diagram (also interaction with the client)
Type Diagram

scoring

Name Value
Abstract true
Leaf false
Root true
Documentation

The scoring use case means applying the models to new instances.

The scoring occurs via web service methods.

Depending on the models, there are several types of DeVisa scoring procedures:

  • Classification Scoring.
  • Cluster Scoring.
  • Association rules scoring.
Rank High
Business Model true

Extension Points

laxly specified model
Documentation

There are three ways in which a client application (a DM Consumer that invokes a scoring method) can request a model in DeVisa.

  • Exact Model. The document refers to a model in the DeVisa catalog; the DeVisa engine will use the specified model to execute the scoring task.
  • Exact Schema. The document refers to a data dictionary but does not refer to a model; it can, however, specify desired properties. The DeVisa engine will select the appropriate model.
  • Match Schema. The document describes a data dictionary (and possibly a mining schema according to the data dictionary). In this case the DeVisa engine is responsible for identifying the matching schema and the appropriate model (should these exist).

Except for the first case, the model is laxly specified.
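
The three request styles could be resolved along the following lines. This is a minimal sketch, not DeVisa's actual logic: the `MODEL_SCHEMA` and `SCHEMAS` tables and the `resolve_models` function are hypothetical, and the matching rule (a stored schema matches when all of its fields appear in the dictionary described by the client) is an assumption made for the example.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical catalog: model id -> name of the data dictionary it uses.
MODEL_SCHEMA: Dict[str, str] = {"m1": "customers", "m2": "customers", "m3": "sensors"}
# Hypothetical data dictionaries: name -> set of field names.
SCHEMAS: Dict[str, FrozenSet[str]] = {
    "customers": frozenset({"age", "income", "channel"}),
    "sensors": frozenset({"temperature", "pressure"}),
}


def resolve_models(model_id: Optional[str] = None,
                   schema_name: Optional[str] = None,
                   fields: Optional[FrozenSet[str]] = None) -> List[str]:
    """Return candidate model ids for a scoring request."""
    if model_id is not None:     # Exact Model
        return [model_id] if model_id in MODEL_SCHEMA else []
    if schema_name is not None:  # Exact Schema
        return sorted(m for m, s in MODEL_SCHEMA.items() if s == schema_name)
    if fields is not None:       # Match Schema (assumed rule: stored fields are a subset)
        matching = {name for name, fs in SCHEMAS.items() if fs <= fields}
        return sorted(m for m, s in MODEL_SCHEMA.items() if s in matching)
    return []


print(resolve_models(model_id="m1"))
print(resolve_models(schema_name="customers"))
print(resolve_models(fields=frozenset({"temperature", "pressure", "humidity"})))
```

In the two laxly specified cases more than one candidate may be returned; choosing among them would depend on the desired properties supplied by the client.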

Relationships

Unnamed Include
To authenticate
Visibility Unspecified
Stereotypes Include

Unnamed Extend
To search
Visibility Unspecified
Stereotypes Extend
Condition The client has specified the model through "Match" or "" case.

Unnamed Generalization
To classification scoring
Substitutable false
Visibility Unspecified

Unnamed Generalization
To clustering scoring
Substitutable false
Visibility Unspecified

Unnamed Generalization
To association rules scoring
Substitutable false
Visibility Unspecified

References

Sequence Diagram/scoring
Type Diagram

Communication Diagram/scoring - Communications
Type Diagram

classification scoring

Name Value
Abstract false
Leaf true
Root false
Documentation The scoring method receives as input a set of instances and one or more classification models and classifies the instances with respect to the models.
Rank High
Business Model true

Relationships

Unnamed Generalization
From scoring
Substitutable false
Visibility Unspecified

Unnamed Association
From
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

References

Communication Diagram/scoring - Communications
Type Diagram

clustering scoring

Name Value
Abstract false
Leaf true
Root false
Documentation The scoring method receives as input a set of instances and one or more clustering models and assigns the instances to the most appropriate cluster in each of the models.
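
A minimal sketch of clustering scoring is given below, assuming center-based models and Euclidean distance. Both are assumptions for the example: PMML clustering models can use other comparison measures, and the `nearest_cluster` and `score_clustering` names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def nearest_cluster(instance: Point, centers: List[Point]) -> int:
    """Index of the center closest to the instance (Euclidean distance assumed)."""
    distances = [math.dist(instance, c) for c in centers]
    return distances.index(min(distances))


def score_clustering(instances: List[Point],
                     models: Dict[str, List[Point]]) -> Dict[str, List[int]]:
    """For each model, assign every instance to its most appropriate cluster."""
    return {name: [nearest_cluster(x, centers) for x in instances]
            for name, centers in models.items()}


# Two toy models, each given as a list of cluster centers.
models = {
    "two_clusters": [(0.0, 0.0), (10.0, 10.0)],
    "three_clusters": [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)],
}
assignments = score_clustering([(1.0, 1.0), (6.0, 6.0), (9.0, 9.0)], models)
print(assignments)
```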
Rank Unspecified
Business Model true

Relationships

Unnamed Generalization
From scoring
Substitutable false
Visibility Unspecified

Unnamed Association
From
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

References

Sequence Diagram/scoring
Type Diagram

association rules scoring

Name Value
Abstract false
Leaf true
Root false
Documentation The scoring method receives as input a set of items (instances) and one or more association rule models. It determines all rules of each of the input models whose antecedent itemset is a subset of the input itemset and returns the consequents of these rules as the inferred itemsets. An extension of this procedure computes all rules whose antecedent and consequent itemsets are included in the input itemset. This version is useful to determine which itemsets support which rules.
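
The two variants described above can be sketched with plain set operations. The `Rule` type and the function names are hypothetical and chosen for illustration; a real implementation would read the rules from the PMML association model.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def infer_itemsets(rules: Iterable[Rule], items: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Consequents of all rules whose antecedent is a subset of the input itemset."""
    return [r.consequent for r in rules if r.antecedent <= items]


def supported_rules(rules: Iterable[Rule], items: FrozenSet[str]) -> List[Rule]:
    """Extension: rules whose antecedent and consequent are both in the input itemset."""
    return [r for r in rules if r.antecedent <= items and r.consequent <= items]


rules = [
    Rule(frozenset({"bread"}), frozenset({"butter"})),
    Rule(frozenset({"bread", "milk"}), frozenset({"eggs"})),
    Rule(frozenset({"beer"}), frozenset({"chips"})),
]
basket = frozenset({"bread", "milk", "eggs"})
print(infer_itemsets(rules, basket))
print(supported_rules(rules, basket))
```

With several input models, the same functions would simply be applied to the rule list of each model in turn.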
Rank High
Business Model true

Relationships

Unnamed Generalization
From scoring
Substitutable false
Visibility Unspecified

Unnamed Association
To
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

compose

Name Value
Abstract true
Leaf false
Root true
Documentation

Model Composition allows the combination of simple models into a single composite PMML model.

PMML version 3.2 supports the combination of decision trees and simple regression models. More general variants would be possible and may be defined in future versions of PMML.

In PMML, model composition uses three syntactic concepts:

  1. The essential elements of a predictive model are captured in elements that can be included in other models.
  2. Embedded models can define new fields, similar to derived fields.
  3. The leaf nodes in a decision tree can contain another predictive model.

In DeVisa simple models can be combined into more complex ones forming new valid PMML documents.

A client application can specify the models subject to composition and the combination method.

DeVisa identifies the specified models. If the models do not exist in the repository then the process stops.

The found models are checked for compatibility. If they are not compatible the process stops.

The new valid model is returned to the user/stored in the repository.
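
The composition steps above (identify the models, check compatibility, produce the composite) can be outlined as follows. This is a sketch under stated assumptions: the `REPOSITORY` table, the `compose` function, and the compatibility rule (identical field sets) are hypothetical, and the result is a plain description rather than a PMML document.

```python
from typing import Dict, List, Optional

# Hypothetical repository: model id -> metadata (only the schema fields matter here).
REPOSITORY: Dict[str, Dict] = {
    "tree1": {"fields": {"age", "income"}},
    "reg1": {"fields": {"age", "income"}},
    "reg2": {"fields": {"temperature"}},
}


def compose(model_ids: List[str], method: str) -> Optional[Dict]:
    """Return a description of the composite model, or None if the process stops."""
    # Step 1: identify the specified models; stop if any of them is missing.
    if not model_ids or any(mid not in REPOSITORY for mid in model_ids):
        return None
    models = [REPOSITORY[mid] for mid in model_ids]
    # Step 2: compatibility check (assumed here: identical field sets).
    if any(m["fields"] != models[0]["fields"] for m in models[1:]):
        return None
    # Step 3: build the composite description.
    return {"method": method, "parts": list(model_ids)}


print(compose(["tree1", "reg1"], "selection"))
print(compose(["tree1", "reg2"], "sequencing"))
```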

Rank High
Business Model true

Extension Points

laxly specified models

Relationships

Unnamed Include
To authenticate
Visibility Unspecified
Stereotypes Include

Unnamed Extend
To search
Visibility Unspecified
Stereotypes Extend

Unnamed Generalization
To sequencing
Substitutable false
Visibility Unspecified

Unnamed Generalization
To selection
Substitutable false
Visibility Unspecified

sequencing

Name Value
Abstract false
Leaf true
Root false
Documentation

Model sequencing is the process through which two or more models are combined into a sequence where the results of one model are used as input in another model.

Model sequencing is supported partially by the PMML specification.

Examples of sequencing:

  • The missing values in a regression model can be replaced by a set of rules (or a decision tree).
  • Several classification models with the same target value can be merged via a voting scheme, i.e., the final classification result can be computed as an average of the results of the initial classifiers. The average can be computed by a regression model.
  • Prediction results may have to be combined with a cost or profit matrix before a decision can be derived. A mailing campaign model may use tree classification to determine response probabilities per customer and channel. The cost matrix can be appended as a regression model that applies cost weighting factors to different channels, e.g., high cost for phone and low cost for email. The final decision is then based on the outcome of the regression model.
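
The voting example can be illustrated with a two-stage sketch: the classifiers run first, and a linear model with equal weights then averages their outputs. The toy classifiers, the field names, and `average_vote` are hypothetical stand-ins for stored models.

```python
from typing import Callable, Dict, List

Instance = Dict[str, float]
Classifier = Callable[[Instance], float]  # returns the probability of the positive class


def average_vote(classifiers: List[Classifier], instance: Instance) -> float:
    """Second stage: a linear (regression-like) model with equal weights 1/n."""
    outputs = [clf(instance) for clf in classifiers]  # first stage
    weight = 1.0 / len(outputs)
    return sum(weight * o for o in outputs)


# Three toy classifiers standing in for models from the repository.
clfs: List[Classifier] = [
    lambda x: 1.0 if x["income"] > 50 else 0.0,
    lambda x: 1.0 if x["age"] < 40 else 0.0,
    lambda x: 0.8,
]
score = average_vote(clfs, {"income": 60.0, "age": 45.0})
print("positive" if score >= 0.5 else "negative")
```

Unequal weights would turn the same second stage into a weighted vote, which is one way a regression model can encode the combination.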
Rank Unspecified
Business Model true

Relationships

Unnamed Generalization
From compose
Substitutable false
Visibility Unspecified

Unnamed Association
From
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

selection

Name Value
Abstract false
Leaf true
Root false
Documentation

Model selection in PMML allows for combining multiple 'embedded models', aka model expressions, into the decision logic that selects one of the models depending on the current input values.

Examples of selection:

  1. A common method for optimizing prediction models is the combination of segmentation and regression. Data are grouped into segments, and for each segment there may be different regression equations. If the segmentation can be expressed by decision rules, then this kind of segment-based regression can be implemented by a decision tree where any leaf node in the tree can contain an embedded regression model.
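
Segment-based regression of this kind can be sketched as a list of leaves, each pairing a decision rule with an embedded regression equation. The segments, coefficients, and the `predict` function below are made up for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]
Predicate = Callable[[Instance], bool]
Regression = Callable[[Instance], float]

# Each leaf pairs a decision rule (the segment) with an embedded regression equation.
LEAVES: List[Tuple[Predicate, Regression]] = [
    (lambda x: x["age"] < 30, lambda x: 1000.0 + 20.0 * x["income"]),
    (lambda x: x["age"] >= 30, lambda x: 500.0 + 35.0 * x["income"]),
]


def predict(instance: Instance) -> float:
    """Select the first leaf whose rule holds and apply its regression model."""
    for rule, regression in LEAVES:
        if rule(instance):
            return regression(instance)
    raise ValueError("no segment matches the instance")


print(predict({"age": 25.0, "income": 10.0}))
print(predict({"age": 40.0, "income": 10.0}))
```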
Rank Unspecified
Business Model true

Relationships

Unnamed Generalization
From compose
Substitutable false
Visibility Unspecified

Unnamed Association
From
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

compare

Name Value
Abstract true
Leaf false
Root true
Documentation

A client application (DM Consumer) wants to compare two models.

The client needs to specify:

  • The two models to be compared, through an exact reference (model id)
  • The comparison type, which can be syntactic or semantic.

Syntactic Comparison. Two PMML models are compared through an XML differencing approach. Whether eXist's XML diff extension module can be used here remains to be researched.

Semantic Comparison. Two models can be compared from the following points of view:

  • Schema compatibility. This involves a DataDictionary compatibility check and a MiningSchema compatibility check.
  • Function comparison. Checks whether the models fulfil the same prediction task, which algorithms they use to fulfil it, etc.
  • Performance measures (to be researched)

The semantic comparison is useful when the client envisions a model composition and wants to pre-check the compatibility of the models.
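
A DataDictionary compatibility check might look like the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The compatibility rule (both documents declare exactly the same data fields, compared by name, optype, and dataType) is an assumption for the example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple


def data_fields(pmml_text: str) -> Set[Tuple[str, str, str]]:
    """Collect (name, optype, dataType) for every DataField in the document."""
    root = ET.fromstring(pmml_text)
    fields = set()
    for elem in root.iter():
        # Drop an optional XML namespace prefix of the form "{uri}".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "DataField":
            fields.add((elem.get("name", ""), elem.get("optype", ""),
                        elem.get("dataType", "")))
    return fields


def dictionaries_compatible(pmml_a: str, pmml_b: str) -> bool:
    """Assumed rule: both documents declare exactly the same data fields."""
    return data_fields(pmml_a) == data_fields(pmml_b)


DOC_A = """<PMML version="3.2"><DataDictionary>
  <DataField name="age" optype="continuous" dataType="double"/>
  <DataField name="channel" optype="categorical" dataType="string"/>
</DataDictionary></PMML>"""
DOC_B = DOC_A.replace('dataType="double"', 'dataType="integer"')

print(dictionaries_compatible(DOC_A, DOC_A))
print(dictionaries_compatible(DOC_A, DOC_B))
```

A looser rule, such as requiring only that one dictionary be a subset of the other, may fit composition better; the strict equality here is just the simplest choice.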

Rank Medium
Business Model true

Relationships

Unnamed Include
To authenticate
Visibility Unspecified
Stereotypes Include

Unnamed Association
From
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

statistics

Name Value
Abstract true
Leaf false
Root true
Documentation

An application can invoke this service to obtain statistics on the models.

Examples of statistics:

  1. frequencies per domain, producer, schema, function type, etc.
  2. statistics based on performance measures (to be researched):
  • sensitivity, specificity, accuracy (these do not appear to be supported by PMML)
  • can performance/assessment be connected with model verification in PMML?

Initially, DeVisa should provide only a full report (PMQL) for point (1).
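
The frequency statistics of point (1) amount to counting models per property value. A minimal sketch, assuming the model metadata is available as simple records (the `catalog` list and the `frequencies` function are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def frequencies(models: Iterable[Mapping[str, str]], key: str) -> Dict[str, int]:
    """Count how many models share each value of the given property."""
    return dict(Counter(m[key] for m in models))


catalog = [
    {"producer": "Weka", "function": "classification"},
    {"producer": "Weka", "function": "clustering"},
    {"producer": "OtherApp", "function": "classification"},
]
print(frequencies(catalog, "producer"))
print(frequencies(catalog, "function"))
```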

Rank Medium
Business Model true

Relationships

Unnamed Include
To authenticate
Visibility Unspecified
Stereotypes Include

Unnamed Association
To
Name Value
End Model Element DM Consumer
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

DM Producer

Name Value
Visibility public
Abstract false
Leaf true
Root false
Documentation The DM Producer is a WS client application (typically a DM application such as Weka) that uploads models to the DeVisa repository.
Business Model false

Relationships

Unnamed Association
To
Name Value
End Model Element upload
Provide Property Getter Method false
Provide Property Setter Method false
Multiplicity Unspecified
Visibility private
Aggregation Kind None
Navigable true
Abstract false
Leaf false
Visibility Unspecified
Derived false

DeVisa

Name Value
Abstract false
Leaf false
Root false

Children

Name Documentation
scoring

The scoring use case means applying the models to new instances.

The scoring occurs via web service methods.

Depending on the models, there are several types of DeVisa scoring procedures:

  • Classification Scoring.
  • Cluster Scoring.
  • Association rules scoring.
compare

A client application (DM Consumer) wants to compare two models.

The client needs to specify:

  • The two models to be compared, through an exact reference (model id)
  • The comparison type, which can be syntactic or semantic.

Syntactic Comparison. Two PMML models are compared through an XML differencing approach. Whether eXist's XML diff extension module can be used here remains to be researched.

Semantic Comparison. Two models can be compared from the following points of view:

  • Schema compatibility. This involves a DataDictionary compatibility check and a MiningSchema compatibility check.
  • Function comparison. Checks whether the models fulfil the same prediction task, which algorithms they use to fulfil it, etc.
  • Performance measures (to be researched)

The semantic comparison is useful when the client envisions a model composition and wants to pre-check the compatibility of the models.

compose

Model Composition allows the combination of simple models into a single composite PMML model.

PMML version 3.2 supports the combination of decision trees and simple regression models. More general variants would be possible and may be defined in future versions of PMML.

In PMML, model composition uses three syntactic concepts:

  1. The essential elements of a predictive model are captured in elements that can be included in other models.
  2. Embedded models can define new fields, similar to derived fields.
  3. The leaf nodes in a decision tree can contain another predictive model.

In DeVisa simple models can be combined into more complex ones forming new valid PMML documents.

A client application can specify the models subject to composition and the combination method.

DeVisa identifies the specified models. If the models do not exist in the repository then the process stops.

The found models are checked for compatibility. If they are not compatible the process stops.

The new valid model is returned to the user/stored in the repository.

authenticate The WS client that invokes a certain method might need to be authenticated in order to execute the required function.
admin The Admin use case involves uploading/downloading the DeVisa models without any other processing.
search

The searching functions allow inspecting the properties of the PMML models in the repository. The search functions conform to, and are therefore limited to, the information that a PMML model can incorporate according to the PMML 3.2 specification.

DeVisa provides searching functions such as:

Selecting the models with desired properties

  • function type: classification, regression, clustering, etc.
  • type of model: tree, SVM, cluster, etc.
  • producer, e.g., all the models belonging to a certain producer application
  • degree of freshness, e.g., models newer than a certain date
  • performance measures
  • field statistics
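Selection by properties can be pictured as a filter over model metadata. The sketch below is illustrative only: the `ModelInfo` record and the `select_models` function are hypothetical names, and only four of the listed properties are covered.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelInfo:
    """Hypothetical metadata record for a PMML model in the repository."""
    model_id: str
    function_type: str  # e.g. "classification", "regression", "clustering"
    model_type: str     # e.g. "tree", "SVM", "cluster"
    producer: str
    created: date


def select_models(catalog, function_type=None, model_type=None,
                  producer=None, newer_than=None):
    """Return the records that satisfy every criterion that is not None."""
    selected = []
    for info in catalog:
        if function_type is not None and info.function_type != function_type:
            continue
        if model_type is not None and info.model_type != model_type:
            continue
        if producer is not None and info.producer != producer:
            continue
        # Degree of freshness: keep only models created after the given date.
        if newer_than is not None and info.created <= newer_than:
            continue
        selected.append(info)
    return selected
```

Criteria left as `None` are ignored, so calling `select_models(catalog)` returns the whole catalog.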

Selecting the models that conform to a certain schema

  • Exact Schema: the document refers to a data dictionary but does not refer to a model, although it can specify desired properties. The DeVisa engine selects the appropriate model.
  • Match Schema: the document describes a data dictionary (and possibly a mining schema based on that data dictionary). In this case the DeVisa engine is responsible for identifying the matching schema and the appropriate model, should these exist.

This type of search is used mainly as an extension to the scoring use case.

Full text search in the model repository

This type of search is useful when the client is looking for keywords in the description of the model, in the field names or field descriptions, in the function type, etc.
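A keyword search of this kind can be sketched as a case-insensitive substring match over the textual parts of each model. The dictionary layout (`id`, `description`, `function_type`, `fields`) is an assumption made for the example and does not reflect how DeVisa stores models.

```python
def searchable_text(model):
    """Collect the text of one model: description, function type, fields."""
    parts = [model["description"], model["function_type"]]
    for name, description in model["fields"]:
        parts.extend([name, description])
    # Join with newlines so that a keyword cannot match across two parts.
    return "\n".join(parts).lower()


def full_text_search(models, keyword):
    """Return the ids of the models whose text contains the keyword."""
    needle = keyword.lower()
    return [m["id"] for m in models if needle in searchable_text(m)]
```

A production repository would more likely rely on an index than on a linear scan, but the matching idea is the same.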

Future work: create a search mechanism that allows more complex predicates on the search criteria.

classification scoring The scoring method receives as input a set of instances and one or more classification models and classifies the instances with respect to the models.
clustering scoring The scoring method receives as input a set of instances and one or more clustering models and assigns the instances to the most appropriate cluster in each of the models.
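For clustering scoring, one simple reading of "most appropriate cluster" is the cluster whose centre is closest to the instance. The sketch below assumes centre-based models and squared Euclidean distance; both are assumptions for illustration, and the function name `assign_clusters` is hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(instances, models):
    """Map each model name to the nearest-centre index of every instance.

    `models` maps a model name to its list of cluster centres.
    """
    assignments = {}
    for name, centres in models.items():
        assignments[name] = [
            min(range(len(centres)),
                key=lambda i: squared_distance(instance, centres[i]))
            for instance in instances
        ]
    return assignments
```

Each instance receives one cluster index per model, which mirrors the "one or more clustering models" wording of the use case.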
sequencing

Model sequencing is the process through which two or more models are combined into a sequence where the results of one model are used as input in another model.

Model sequencing is partially supported by the PMML specification.

Examples of sequencing:

  • The missing values in a regression model can be replaced by a set of rules (or a decision tree).
  • Several classification models with the same target value can be merged via a voting scheme, i.e., the final classification result can be computed as an average of the results of the initial classifiers. The average can be computed by a regression model.
  • Prediction results may have to be combined with a cost or profit matrix before a decision can be derived. A mailing campaign model may use tree classification to determine response probabilities per customer and channel. The cost matrix can be appended as a regression model that applies cost weighting factors to different channels, e.g., high cost for phone and low cost for email. The final decision is then based on the outcome of the regression model.
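The mailing campaign example can be sketched as a two-stage sequence in which the output of the first model is the input of the second. The probabilities, the weights and all function names below are invented for illustration; the first stage is a hand-written stand-in for a tree classifier, not a real PMML model.

```python
def response_probabilities(customer):
    """Stage 1: stand-in for a tree classifier; probabilities per channel."""
    if customer["age"] < 40:
        return {"phone": 0.30, "email": 0.25}
    return {"phone": 0.50, "email": 0.10}


# Stage 2 weights (assumed): an expensive channel gets a small weight.
COST_WEIGHTS = {"phone": 0.4, "email": 1.0}


def weighted_scores(probabilities):
    """Stage 2: regression-like step applying a cost weight per channel."""
    return {channel: COST_WEIGHTS[channel] * p
            for channel, p in probabilities.items()}


def choose_channel(customer):
    """The final decision is based on the outcome of the second stage."""
    scores = weighted_scores(response_probabilities(customer))
    return max(scores, key=scores.get)
```

With these made-up numbers a younger customer is contacted by email, because the phone channel's higher response probability does not outweigh its cost.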
selection

Model selection in PMML allows combining multiple 'embedded models', also known as model expressions, into decision logic that selects one of the models depending on the current input values.

Examples of selection:

  1. A common method for optimizing prediction models is the combination of segmentation and regression. Data are grouped into segments and for each segment there may be different regression equations. If the segmentation can be expressed by decision rules then this kind of segment based regression can be implemented by a decision tree where any leaf node in the tree can contain an embedded regression model.
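Segment-based regression can be sketched as decision rules that pick one regression equation per record. The segments, the coefficients and the field names `region` and `income` are hypothetical; in PMML the same idea would be expressed as a decision tree whose leaf nodes contain embedded regression models.

```python
# Each entry pairs a decision rule with the regression of that segment.
# The last rule always applies and plays the role of the default leaf.
SEGMENTS = [
    (lambda r: r["region"] == "urban", {"intercept": 10.0, "income": 2.0}),
    (lambda r: True, {"intercept": 5.0, "income": 0.5}),
]


def predict(record):
    """Select the first matching segment and evaluate its regression."""
    for applies, coefficients in SEGMENTS:
        if applies(record):
            return (coefficients["intercept"]
                    + coefficients["income"] * record["income"])
    raise ValueError("no segment matched the record")
```

Only one embedded model is evaluated per record, which is what distinguishes selection from sequencing.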
download A DM Consumer can download a DeVisa model in order to use it internally, e.g., to import it into the DM application.
replace

Replace occurs when a model is replaced completely by a newer one. The new model will get the same model ID in the repository.

upload

A DM Producer can upload models to the DeVisa repository via WS methods (SOAP, XML-RPC).

statistics

An application can invoke this service to obtain statistics on the models.

Example of statistics:

  1. Frequencies per domain, producer, schema, function type, etc.
  2. Statistics based on performance measures (to be researched):
     • sensitivity, specificity, accuracy (these do not appear to be supported by PMML)
     • can performance assessment be connected with model verification in PMML?

Initially, DeVisa only needs to provide a full report (PMQL) for point (1).
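The frequency statistics of point (1) amount to counting models per attribute value. The sketch below assumes each model is described by a dictionary with the keys `domain`, `producer`, `schema` and `function_type`; this layout and the function name `frequency_report` are illustrative and are not the PMQL report format.

```python
from collections import Counter

DEFAULT_KEYS = ("domain", "producer", "schema", "function_type")


def frequency_report(models, keys=DEFAULT_KEYS):
    """Count how many models share each value of the given attributes."""
    return {key: Counter(model[key] for model in models) for key in keys}
```

For example, `frequency_report(models)["producer"]` gives the number of models uploaded by each producer.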

update

Update occurs when a model needs only a certain type of adjustment. This use case should use a specific XML update technique. In most cases the update procedure is triggered by the system itself.

association rules scoring The scoring method receives as input a set of items (instances) and one or more association rule models. For each input model, it determines all rules whose antecedent itemset is a subset of the input itemset and returns the consequents of these rules as the inferred itemsets. An extension of this procedure computes all rules whose antecedent and consequent itemsets are both included in the input itemset. This version is useful for determining which itemsets support which rules.
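Both variants of association rule scoring reduce to subset tests on itemsets. In the sketch below a rule is represented as an `(antecedent, consequent)` pair of frozensets and a model as a list of such rules; this representation and the function names are assumptions made for the example.

```python
def infer_consequents(itemset, rules):
    """Return the consequents of all rules whose antecedent is a subset
    of the input itemset."""
    items = frozenset(itemset)
    return [consequent for antecedent, consequent in rules
            if antecedent <= items]


def supported_rules(itemset, rules):
    """Extension: return the rules whose antecedent and consequent are
    both included in the input itemset."""
    items = frozenset(itemset)
    return [(antecedent, consequent) for antecedent, consequent in rules
            if antecedent | consequent <= items]


def score_models(itemset, models):
    """Apply the basic scoring to each of one or more rule models."""
    return {name: infer_consequents(itemset, rules)
            for name, rules in models.items()}
```

Given the itemset {bread, milk, butter} and the rules bread → butter and {bread, milk} → eggs, the basic variant infers both butter and eggs, while the extension reports only bread → butter as supported.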