|
DeVisa Use Cases
Summary
DM Consumer
|
The DM Consumer is a WS client application that uses DeVisa DM models.
|
download
|
A DM Consumer can download a DeVisa model in order to use it internally,
e.g., to import it into the DM application.
|
authenticate
|
The WS client that invokes a certain method might need to be authenticated
in order to execute the required function.
|
admin
|
The Admin use case involves uploading/downloading the DeVisa models
without any other processing.
|
replace
|
Replace occurs when a model is replaced completely by a newer one. The
new model will get the same model ID in the repository.
|
upload
|
A DM Producer can upload models to the DeVisa repository via WS methods
(SOAP, XML-RPC).
|
update
|
Update occurs when a model needs only a certain type of adjustment. This
use case should use a specific XML update technique. The update
procedure is triggered by the system itself in most cases.
|
search
|
The searching functions allow inspecting the properties of the PMML
models in the repository. The search functions conform to, and are
therefore limited to, the information that a PMML model can incorporate
according to the PMML 3.2 specification.
DeVisa provides searching functions such as:
Selecting the models with desired properties
-
function type: classification, regression, clustering etc
-
type of model: tree, SVM, cluster etc
-
producer: e.g., all the models belonging to a certain producer application
-
degree of freshness: e.g., models newer than a certain date
-
performance measures
-
fields statistics
Selecting the models that conform to a certain schema
-
Exact Schema. The document refers to a data dictionary but does not refer
to a model; it can specify desired properties, though. The DeVisa engine
will select the appropriate model.
-
Match Schema. The document describes a data dictionary (and possibly a
mining schema according to the data dictionary). In this case the
DeVisa engine is responsible for identifying the matching schema and
the appropriate model (should these exist).
This type of search is used especially as an extension to the scoring
use case.
Full text search in the model repository
This type of search is useful when the client is looking for keywords in
the description of the model, in the field names or field descriptions,
in the function type, etc.
Future work: create a search mechanism that allows more complex
predicates on the search criteria.
|
scoring
|
The scoring use case means applying the models to new instances.
The scoring occurs via web service methods.
Depending on the models, there are several types of DeVisa scoring
procedures:
-
Classification Scoring.
-
Cluster Scoring.
-
Association rules scoring.
|
classification scoring
|
The scoring method receives as input a set of instances and one or more
classification models and classifies the instances with respect to the
models.
|
clustering scoring
|
The scoring method receives as input a set of instances and one or more
clustering models and assigns the instances to the most appropriate
cluster in each of the models.
|
association rules scoring
|
The scoring method receives as input a set of items (instances) and one or
more association rule models. It determines all rules of each of the input
models whose antecedent itemset is a subset of the input itemset and
returns the consequents of these rules as the inferred itemsets. An
extension of this procedure computes all rules whose antecedent and
consequent itemsets are included in the input itemset. This version is
useful to determine which itemsets support which rules.
|
compose
|
Model Composition allows the combination of simple models into a single
composite PMML model.
PMML version 3.2 supports the combination of decision trees and simple
regression models. More general variants would be possible and may be
defined in future versions of PMML.
In PMML, model composition uses three syntactic concepts:
-
The essential elements of a predictive model are captured in elements
that can be included in other models.
-
Embedded models can define new fields, similar to derived fields.
-
The leaf nodes in a decision tree can contain another predictive model.
In DeVisa simple models can be combined into more complex ones forming
new valid PMML documents.
A client application can specify the models subject to composition and
the combination method.
DeVisa identifies the specified models. If the models do not exist in
the repository then the process stops.
The found models are checked for compatibility. If they are not
compatible the process stops.
The new valid model is returned to the user/stored in the repository.
|
sequencing
|
Model sequencing is the process through which two or more models are
combined into a sequence where the results of one model are used as
input in another model.
Model sequencing is supported partially by the PMML specification.
Examples of sequencing:
-
The missing values in a regression model can be replaced by a set of
rules (or decision tree)
-
Several classification models with the same target value can be merged
via a voting scheme, i.e., the final classification result can be
computed as an average of the results of the initial classifiers. The
average can be computed by a regression model.
-
Prediction results may have to be combined with a cost or profit
matrix before a decision can be derived. A mailing campaign model may
use tree classification to determine response probabilities per
customer and channel. The cost matrix can be appended as a regression
model that applies cost weighting factors to different channels, e.g.,
high cost for phone and low cost for email. The final decision is then
based on the outcome of the regression model.
|
selection
|
Model selection in PMML allows for combining multiple 'embedded models',
aka model expressions, into the decision logic that selects one of the
models depending on the current input values.
Examples of selection
-
A common method for optimizing prediction models is the combination of
segmentation and regression. Data are grouped into segments and for
each segment there may be different regression equations. If the
segmentation can be expressed by decision rules then this kind of
segment based regression can be implemented by a decision tree where
any leaf node in the tree can contain an embedded regression model.
|
compare
|
A client application (DM Consumer) wants to compare two models.
The client needs to specify:
-
The two models to be compared, through an exact reference (model id)
-
The comparison type, which can be syntactic or semantic.
Syntactic Comparison. Two PMML models are compared through an XML
differencing approach. Whether eXist's XML diff extension module can be
used here remains to be researched.
Semantic Comparison. Two models can be compared from the
following points of view:
-
Schema compatibility. This involves a DataDictionary compatibility
check and a MiningSchema compatibility check.
-
Function comparison. This checks whether the models fulfil the same
prediction task, which algorithms are used to fulfil the task, etc.
-
Performance measures (to be researched)
The semantic comparison is useful when the client envisions a model
composition and wants to pre-check the compatibility of the models.
|
statistics
|
An application can invoke this service to obtain statistics on the
models.
Example of statistics:
-
frequencies per domain, producer, schema, or function type etc.
-
?? based on performance measures (to be researched):
-
sensitivity, specificity, accuracy (from what I've seen they are not
supported by PMML)
-
can we connect performance/assessment with model verification in PMML?
At the beginning, DeVisa should be able to provide only a full report
(PMQL) for the first point above (frequencies).
|
DM Producer
|
The DM Producer is a WS client application (typically a DM application
such as Weka) that uploads models to the DeVisa repository.
|
DeVisa
|
|
Details
DM Consumer
Visibility
|
public
|
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
The DM Consumer is a WS client application that uses DeVisa DM models.
|
Business Model
|
false
|
Relationships
To
|
End Model Element
|
download
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
To
|
End Model Element
|
search
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
To
|
End Model Element
|
classification scoring
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
To
|
End Model Element
|
clustering scoring
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
To
|
End Model Element
|
sequencing
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
To
|
End Model Element
|
selection
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
To
|
End Model Element
|
compare
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
From
|
End Model Element
|
statistics
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
From
|
End Model Element
|
association rules scoring
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
download
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
A DM Consumer can download a DeVisa model in order to use it internally,
e.g., to import it into the DM application.
|
Rank
|
High
|
Business Model
|
false
|
Extension Points
Relationships
From
|
admin
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
From
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
authenticate
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
The WS client that invokes a certain method might need to be authenticated
in order to execute the required function.
|
Rank
|
Medium
|
Business Model
|
false
|
Relationships
admin
Abstract
|
true
|
Leaf
|
false
|
Root
|
false
|
Documentation
|
The Admin use case involves uploading/downloading the DeVisa models
without any other processing.
|
Rank
|
Unspecified
|
Business Model
|
false
|
Relationships
To
|
replace
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
To
|
upload
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
To
|
download
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
To
|
update
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
Use Case Descriptions
Super Use Case
|
|
Author
|
dianagorea
|
Date
|
Jan 17, 2008 2:19:00 PM
|
Brief Description
|
The DM client uploads/downloads the DeVisa models without any other processing.
|
Preconditions
|
|
Post-conditions
|
|
Flow of Events
|
|
replace
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
Replace occurs when a model is replaced completely by a newer one. The
new model will get the same model ID in the repository.
|
Rank
|
Medium
|
Business Model
|
false
|
Relationships
From
|
admin
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
upload
Abstract
|
false
|
Leaf
|
false
|
Root
|
false
|
Documentation
|
A DM Producer can upload models to the DeVisa repository via WS methods
(SOAP, XML-RPC).
|
Rank
|
High
|
Business Model
|
false
|
Extension Points
Documentation
|
If the model that the DM Producer has uploaded already exists in the
DeVisa repository and the uploaded model is newer than the existing one,
then the existing model is replaced.
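The replace-if-newer rule can be sketched as follows. This is a minimal illustration rather than DeVisa's implementation: the in-memory `repository` dict, the `upload` function, and the use of a model ID plus a creation timestamp to decide whether a model "already exists" and is "newer" are all assumptions made for the example.

```python
from datetime import datetime

def upload(repository, model_id, pmml_text, created):
    """Store a model; replace an existing one only if the upload is newer.

    repository maps a (hypothetical) model ID to a (created, pmml_text) pair.
    Returns "stored", "replaced", or "kept" to describe what happened.
    """
    existing = repository.get(model_id)
    if existing is None:
        repository[model_id] = (created, pmml_text)
        return "stored"
    if created > existing[0]:
        # Newer upload: it takes over the same model ID.
        repository[model_id] = (created, pmml_text)
        return "replaced"
    return "kept"

repo = {}
first = upload(repo, "iris-tree", "<PMML/>", datetime(2008, 1, 17))
second = upload(repo, "iris-tree", "<PMML version='3.2'/>", datetime(2008, 2, 1))
third = upload(repo, "iris-tree", "<PMML/>", datetime(2008, 1, 1))
```

Returning a status string keeps the three outcomes explicit for the caller; a real service would more likely report them in the WS response.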
|
Relationships
From
|
admin
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
From
|
End Model Element
|
DM Producer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
Use Case Descriptions
Super Use Case
|
admin
|
Author
|
dianagorea
|
Date
|
Jan 17, 2008 3:04:44 PM
|
Brief Description
|
|
Preconditions
|
The DM producer has a model expressed in PMML
|
Post-conditions
|
The model is stored in DeVisa PMML repository.
|
Flow of Events
|
1
|
upload request
|
|
2
|
|
authentication process
|
3
|
|
checks that the PMML is valid
|
|
update
Abstract
|
false
|
Leaf
|
false
|
Root
|
false
|
Documentation
|
Update occurs when a model needs only a certain type of adjustment. This
use case should use a specific XML update technique. The update
procedure is triggered by the system itself in most cases.
|
Rank
|
Unspecified
|
Business Model
|
false
|
Relationships
From
|
admin
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
search
Abstract
|
true
|
Leaf
|
false
|
Root
|
false
|
Documentation
|
The searching functions allow inspecting the properties of the PMML
models in the repository. The search functions conform to, and are
therefore limited to, the information that a PMML model can incorporate
according to the PMML 3.2 specification.
DeVisa provides searching functions such as:
Selecting the models with desired properties
-
function type: classification, regression, clustering etc
-
type of model: tree, SVM, cluster etc
-
producer: e.g., all the models belonging to a certain producer application
-
degree of freshness: e.g., models newer than a certain date
-
performance measures
-
fields statistics
Selecting the models that conform to a certain schema
-
Exact Schema. The document refers to a data dictionary but does not refer
to a model; it can specify desired properties, though. The DeVisa engine
will select the appropriate model.
-
Match Schema. The document describes a data dictionary (and possibly a
mining schema according to the data dictionary). In this case the
DeVisa engine is responsible for identifying the matching schema and
the appropriate model (should these exist).
This type of search is used especially as an extension to the scoring
use case.
Full text search in the model repository
This type of search is useful when the client is looking for keywords in
the description of the model, in the field names or field descriptions,
in the function type, etc.
Future work: create a search mechanism that allows more complex
predicates on the search criteria.
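Property-based selection can be sketched as below, assuming the repository is simply a dict of model ID to PMML text. The helper names (`describe`, `select`, `sample`) and the three property keys are hypothetical, and the sample documents are stripped-down fragments that are not schema-valid PMML; only the PMML 3.2 namespace, the `Header/Application` element and the `functionName` attribute come from the specification.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 3.2 documents.
NS = {"p": "http://www.dmg.org/PMML-3_2"}

def describe(pmml_text):
    """Extract a few searchable properties from one PMML document."""
    root = ET.fromstring(pmml_text)
    app = root.find("p:Header/p:Application", NS)
    # Assumption: the first child carrying functionName is the model element.
    model = next(el for el in root if el.get("functionName"))
    return {
        "producer": app.get("name") if app is not None else None,
        "model_type": model.tag.split("}")[-1],
        "function": model.get("functionName"),
    }

def select(models, **wanted):
    """Return the IDs of models whose properties equal every wanted value."""
    hits = []
    for model_id, text in models.items():
        props = describe(text)
        if all(props.get(key) == value for key, value in wanted.items()):
            hits.append(model_id)
    return hits

def sample(model_tag, function, producer):
    """Build a minimal (not schema-valid) PMML fragment for the demo."""
    return (
        '<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">'
        f'<Header><Application name="{producer}"/></Header>'
        f'<{model_tag} functionName="{function}"/></PMML>'
    )

models = {
    "m1": sample("TreeModel", "classification", "Weka"),
    "m2": sample("RegressionModel", "regression", "Weka"),
    "m3": sample("ClusteringModel", "clustering", "OtherTool"),
}
weka_models = select(models, producer="Weka")
trees = select(models, model_type="TreeModel", function="classification")
```

Equality on a handful of properties is the simplest possible predicate; the more complex predicates mentioned as future work would replace the `all(...)` test.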
|
Rank
|
High
|
Business Model
|
true
|
Relationships
From
|
scoring
|
Visibility
|
Unspecified
|
Stereotypes
|
Extend
|
Condition
|
The client has specified the model through the "Match Schema" or "Exact Schema" case.
|
From
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
References
Description
|
A complete search sequence diagram (also interaction with the client)
|
Type
|
Diagram
|
scoring
Abstract
|
true
|
Leaf
|
false
|
Root
|
true
|
Documentation
|
The scoring use case means applying the models to new instances.
The scoring occurs via web service methods.
Depending on the models, there are several types of DeVisa scoring
procedures:
-
Classification Scoring.
-
Cluster Scoring.
-
Association rules scoring.
|
Rank
|
High
|
Business Model
|
true
|
Extension Points
Documentation
|
There are three ways in which a client application (a DM Consumer that
invokes a scoring method) can request a model in DeVisa:
-
Exact Model. The document refers to a model in the DeVisa catalog. The
DeVisa engine will use the specified model to execute the scoring task.
-
Exact Schema. The document refers to a data dictionary but does not refer
to a model; it can specify desired properties, though. The DeVisa engine
will select the appropriate model.
-
Match Schema. The document describes a data dictionary (and possibly a
mining schema according to the data dictionary). In this case the
DeVisa engine is responsible for identifying the matching schema and
the appropriate model (should these exist).
Except for the first case, the model is only loosely specified.
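The three ways of specifying a model could be resolved along the following lines. This is a sketch under stated assumptions: the request keys (`model_id`, `schema_id`, `fields`, `function`), the catalog layout, and the matching rule used for Match Schema (every field the model needs must be offered with the same data type) are all hypothetical and not taken from DeVisa.

```python
def resolve_model(request, catalog):
    """Pick a model for a scoring request, or return None if nothing fits.

    catalog: list of dicts with hypothetical keys 'id', 'schema_id',
    'fields' (field name -> data type) and 'function'.
    """
    def has_properties(model):
        wanted = request.get("function")
        return wanted is None or model["function"] == wanted

    if "model_id" in request:
        # Exact Model: the request names the model directly.
        candidates = [m for m in catalog if m["id"] == request["model_id"]]
    elif "schema_id" in request:
        # Exact Schema: the request names a stored data dictionary.
        candidates = [m for m in catalog
                      if m["schema_id"] == request["schema_id"]
                      and has_properties(m)]
    else:
        # Match Schema: the request describes its own fields. Assumed rule:
        # every field the model needs must be present with the same type.
        offered = request["fields"]
        candidates = [m for m in catalog
                      if has_properties(m)
                      and all(offered.get(n) == t
                              for n, t in m["fields"].items())]
    return candidates[0] if candidates else None

catalog = [
    {"id": "m1", "schema_id": "s1", "function": "classification",
     "fields": {"age": "integer", "income": "double"}},
    {"id": "m2", "schema_id": "s1", "function": "clustering",
     "fields": {"age": "integer", "income": "double"}},
    {"id": "m3", "schema_id": "s2", "function": "classification",
     "fields": {"petal_length": "double"}},
]
exact_model = resolve_model({"model_id": "m2"}, catalog)
exact_schema = resolve_model({"schema_id": "s1", "function": "clustering"}, catalog)
match_schema = resolve_model(
    {"fields": {"petal_length": "double", "petal_width": "double"}}, catalog)
no_match = resolve_model({"fields": {"height": "double"}}, catalog)
```

Taking the first candidate is a placeholder; choosing "the appropriate model" among several candidates is exactly the part left to the DeVisa engine.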
|
Relationships
To
|
search
|
Visibility
|
Unspecified
|
Stereotypes
|
Extend
|
Condition
|
The client has specified the model through the "Match Schema" or "Exact Schema" case.
|
References
classification scoring
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
The scoring method receives as input a set of instances and one or more
classification models and classifies the instances with respect to the
models.
|
Rank
|
High
|
Business Model
|
true
|
Relationships
From
|
scoring
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
From
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
References
clustering scoring
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
The scoring method receives as input a set of instances and one or more
clustering models and assigns the instances to the most appropriate
cluster in each of the models.
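As an illustration, the assignment step might look like the sketch below. It assumes center-based clustering models with purely numeric fields and uses squared Euclidean distance as the notion of "most appropriate"; PMML clustering models can define other comparison measures, and the `assign_clusters` name and data layout are hypothetical.

```python
def assign_clusters(instances, models):
    """Map each model name to the index of the nearest center per instance.

    models: dict of model name -> list of cluster centers (numeric tuples).
    """
    def nearest(point, centers):
        def squared_distance(i):
            return sum((a - b) ** 2 for a, b in zip(point, centers[i]))
        return min(range(len(centers)), key=squared_distance)

    return {name: [nearest(x, centers) for x in instances]
            for name, centers in models.items()}

instances = [(0.0, 0.0), (9.0, 9.0), (4.0, 5.0)]
models = {
    "two_clusters": [(1.0, 1.0), (10.0, 10.0)],
    "three_clusters": [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)],
}
assignments = assign_clusters(instances, models)
```

The result holds one list of cluster indices per model, in the same order as the input instances.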
|
Rank
|
Unspecified
|
Business Model
|
true
|
Relationships
From
|
scoring
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
From
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
References
association rules scoring
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
The scoring method receives as input a set of items (instances) and one or
more association rule models. It determines all rules of each of the input
models whose antecedent itemset is a subset of the input itemset and
returns the consequents of these rules as the inferred itemsets. An
extension of this procedure computes all rules whose antecedent and
consequent itemsets are included in the input itemset. This version is
useful to determine which itemsets support which rules.
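Both variants of the procedure reduce to subset tests, as the following sketch shows. The representation of a rule as an `(antecedent, consequent)` pair of Python sets and the function names are assumptions made for the example.

```python
def infer_itemsets(itemset, models):
    """For each model, return the consequents of the rules whose antecedent
    is contained in the input itemset."""
    items = frozenset(itemset)
    return {name: [set(cons) for ante, cons in rules
                   if frozenset(ante) <= items]
            for name, rules in models.items()}

def supported_rules(itemset, models):
    """Extension: keep only rules whose antecedent and consequent are both
    contained in the input itemset."""
    items = frozenset(itemset)
    return {name: [(ante, cons) for ante, cons in rules
                   if frozenset(ante) <= items and frozenset(cons) <= items]
            for name, rules in models.items()}

models = {
    "basket": [
        ({"bread"}, {"butter"}),
        ({"bread", "milk"}, {"eggs"}),
        ({"beer"}, {"chips"}),
    ]
}
inferred = infer_itemsets({"bread", "milk"}, models)
supported = supported_rules({"bread", "butter"}, models)
```

With the input `{"bread", "milk"}` the first two rules fire, so butter and eggs are inferred; with `{"bread", "butter"}` only the first rule is fully contained in the input.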
|
Rank
|
High
|
Business Model
|
true
|
Relationships
From
|
scoring
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
To
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
compose
Abstract
|
true
|
Leaf
|
false
|
Root
|
true
|
Documentation
|
Model Composition allows the combination of simple models into a single
composite PMML model.
PMML version 3.2 supports the combination of decision trees and simple
regression models. More general variants would be possible and may be
defined in future versions of PMML.
In PMML, model composition uses three syntactic concepts:
-
The essential elements of a predictive model are captured in elements
that can be included in other models.
-
Embedded models can define new fields, similar to derived fields.
-
The leaf nodes in a decision tree can contain another predictive model.
In DeVisa simple models can be combined into more complex ones forming
new valid PMML documents.
A client application can specify the models subject to composition and
the combination method.
DeVisa identifies the specified models. If the models do not exist in
the repository then the process stops.
The found models are checked for compatibility. If they are not
compatible the process stops.
The new valid model is returned to the user/stored in the repository.
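The composition flow described above can be sketched as a small function. Everything named here is hypothetical: `CompositionError`, the `compose` signature, and the caller-supplied `compatible` and `combine` callables, which stand in for DeVisa's own compatibility check and combination logic.

```python
class CompositionError(Exception):
    """Raised when a composition request cannot be completed."""

def compose(model_ids, method, repository, compatible, combine):
    """Follow the flow above: look up, check compatibility, then combine.

    compatible(models) -> bool and combine(models, method) -> new model are
    supplied by the caller; both are placeholders for DeVisa's own logic.
    """
    missing = [m for m in model_ids if m not in repository]
    if missing:
        raise CompositionError(f"models not in repository: {missing}")
    models = [repository[m] for m in model_ids]
    if not compatible(models):
        raise CompositionError("models are not compatible")
    return combine(models, method)

repository = {"tree": {"target": "risk"}, "reg": {"target": "risk"}}
same_target = lambda models: len({m["target"] for m in models}) == 1
wrap = lambda models, method: {"method": method, "parts": models}

composite = compose(["tree", "reg"], "sequencing", repository, same_target, wrap)
try:
    compose(["tree", "svm"], "selection", repository, same_target, wrap)
    error = None
except CompositionError as exc:
    error = str(exc)
```

Raising an exception models "the process stops"; a web service would presumably translate it into a fault message for the client.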
|
Rank
|
High
|
Business Model
|
true
|
Extension Points
Relationships
To
|
sequencing
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
To
|
selection
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
sequencing
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
Model sequencing is the process through which two or more models are
combined into a sequence where the results of one model are used as
input in another model.
Model sequencing is supported partially by the PMML specification.
Examples of sequencing:
-
The missing values in a regression model can be replaced by a set of
rules (or decision tree)
-
Several classification models with the same target value can be merged
via a voting scheme, i.e., the final classification result can be
computed as an average of the results of the initial classifiers. The
average can be computed by a regression model.
-
Prediction results may have to be combined with a cost or profit
matrix before a decision can be derived. A mailing campaign model may
use tree classification to determine response probabilities per
customer and channel. The cost matrix can be appended as a regression
model that applies cost weighting factors to different channels, e.g.,
high cost for phone and low cost for email. The final decision is then
based on the outcome of the regression model.
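The voting and cost-weighting examples can be made concrete with the sketch below. It operates on plain Python dicts rather than PMML, and the function names, the probabilities, and the multiplicative cost weights (a lower weight standing for a more expensive channel) are illustrative assumptions.

```python
def vote(class_probabilities):
    """Average per-class probabilities from several classifiers.

    class_probabilities: one dict (class label -> probability) per
    classifier, all over the same labels.
    Returns (winning label, averaged probabilities).
    """
    labels = class_probabilities[0].keys()
    n = len(class_probabilities)
    average = {c: sum(p[c] for p in class_probabilities) / n for c in labels}
    return max(average, key=average.get), average

def best_channel(response_probability, weight):
    """Second stage of the mailing example: weight each channel's response
    probability by a cost factor and pick the highest score."""
    scores = {ch: weight[ch] * p for ch, p in response_probability.items()}
    return max(scores, key=scores.get), scores

label, average = vote([
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.4, "no": 0.6},
    {"yes": 0.8, "no": 0.2},
])
channel, scores = best_channel({"phone": 0.5, "email": 0.25},
                               {"phone": 0.25, "email": 1.0})
```

In both cases the second stage is a linear function of the first stage's outputs, which is why it can be expressed as a regression model in the sequence.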
|
Rank
|
Unspecified
|
Business Model
|
true
|
Relationships
From
|
compose
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
From
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
selection
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
Model selection in PMML allows for combining multiple 'embedded models',
aka model expressions, into the decision logic that selects one of the
models depending on the current input values.
Examples of selection
-
A common method for optimizing prediction models is the combination of
segmentation and regression. Data are grouped into segments and for
each segment there may be different regression equations. If the
segmentation can be expressed by decision rules then this kind of
segment based regression can be implemented by a decision tree where
any leaf node in the tree can contain an embedded regression model.
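Segment-based regression can be sketched as an ordered list of decision rules, each paired with its own regression equation. The segment boundaries, field names, and coefficients below are invented for illustration, and the flat rule list stands in for a decision tree with embedded regression models in its leaves.

```python
def predict(record, segments):
    """Walk the decision rules in order and apply the first matching
    segment's regression equation (intercept plus weighted fields)."""
    for rule, intercept, weights in segments:
        if rule(record):
            return intercept + sum(w * record[f] for f, w in weights.items())
    raise ValueError("no segment matches the record")

# Each segment: (decision rule, intercept, {field: coefficient}).
segments = [
    (lambda r: r["age"] < 30, 10.0, {"income": 0.5}),
    (lambda r: True, 20.0, {"income": 0.25}),
]
young = predict({"age": 25, "income": 100.0}, segments)
older = predict({"age": 40, "income": 100.0}, segments)
```

The same input values yield different predictions depending on which segment the record falls into, which is the point of selecting a model by the current input.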
|
Rank
|
Unspecified
|
Business Model
|
true
|
Relationships
From
|
compose
|
Substitutable
|
false
|
Visibility
|
Unspecified
|
From
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
compare
Abstract
|
true
|
Leaf
|
false
|
Root
|
true
|
Documentation
|
A client application (DM Consumer) wants to compare two models.
The client needs to specify:
-
The two models to be compared, through an exact reference (model id)
-
The comparison type, which can be syntactic or semantic.
Syntactic Comparison. Two PMML models are compared through an XML
differencing approach. Whether eXist's XML diff extension module can be
used here remains to be researched.
Semantic Comparison. Two models can be compared from the
following points of view:
-
Schema compatibility. This involves a DataDictionary compatibility
check and a MiningSchema compatibility check.
-
Function comparison. This checks whether the models fulfil the same
prediction task, which algorithms are used to fulfil the task, etc.
-
Performance measures (to be researched)
The semantic comparison is useful when the client envisions a model
composition and wants to pre-check the compatibility of the models.
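One possible starting point for the DataDictionary part of the schema compatibility check is sketched below. It only reports differences between two dictionaries and leaves the decision about what counts as "compatible" open; the function names are hypothetical and the sample documents are stripped-down fragments, not schema-valid PMML.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 3.2 documents.
NS = {"p": "http://www.dmg.org/PMML-3_2"}

def data_fields(pmml_text):
    """Return {field name: data type} from a document's DataDictionary."""
    root = ET.fromstring(pmml_text)
    return {f.get("name"): f.get("dataType")
            for f in root.findall("p:DataDictionary/p:DataField", NS)}

def compare_dictionaries(pmml_a, pmml_b):
    """List fields missing on either side and fields whose types differ."""
    a, b = data_fields(pmml_a), data_fields(pmml_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "type_conflicts": sorted(n for n in a.keys() & b.keys()
                                 if a[n] != b[n]),
    }

def document(fields):
    """Build a minimal (not schema-valid) PMML fragment for the demo."""
    body = "".join(f'<DataField name="{n}" dataType="{t}"/>'
                   for n, t in fields.items())
    return ('<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">'
            f'<DataDictionary>{body}</DataDictionary></PMML>')

report = compare_dictionaries(
    document({"age": "integer", "income": "double"}),
    document({"age": "double", "income": "double", "zip": "string"}),
)
```

A MiningSchema check could follow the same pattern, comparing the field usage of the two models instead of their data types.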
|
Rank
|
Medium
|
Business Model
|
true
|
Relationships
From
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
statistics
Abstract
|
true
|
Leaf
|
false
|
Root
|
true
|
Documentation
|
An application can invoke this service to obtain statistics on the
models.
Example of statistics:
-
frequencies per domain, producer, schema, or function type etc.
-
?? based on performance measures (to be researched):
-
sensitivity, specificity, accuracy (from what I've seen they are not
supported by PMML)
-
can we connect performance/assessment with model verification in PMML?
At the beginning, DeVisa should be able to provide only a full report
(PMQL) for the first point above (frequencies).
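The frequency statistics amount to counting models per metadata value. A minimal sketch, assuming each model's metadata has already been extracted into a dict (the keys and sample values are hypothetical):

```python
from collections import Counter

def frequencies(models, key):
    """Count models per value of one metadata key (e.g. 'producer')."""
    return Counter(m[key] for m in models)

models = [
    {"producer": "Weka", "function": "classification"},
    {"producer": "Weka", "function": "clustering"},
    {"producer": "OtherTool", "function": "classification"},
]
by_function = frequencies(models, "function")
by_producer = frequencies(models, "producer")
```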
|
Rank
|
Medium
|
Business Model
|
true
|
Relationships
To
|
End Model Element
|
DM Consumer
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
DM Producer
Visibility
|
public
|
Abstract
|
false
|
Leaf
|
true
|
Root
|
false
|
Documentation
|
The DM Producer is a WS client application (typically a DM application
such as Weka) that uploads models to the DeVisa repository.
|
Business Model
|
false
|
Relationships
To
|
End Model Element
|
upload
|
Provide Property Getter Method
|
false
|
Provide Property Setter Method
|
false
|
Multiplicity
|
Unspecified
|
Visibility
|
private
|
Aggregation Kind
|
None
|
Navigable
|
true
|
|
Abstract
|
false
|
Leaf
|
false
|
Visibility
|
Unspecified
|
Derived
|
false
|
DeVisa
Abstract
|
false
|
Leaf
|
false
|
Root
|
false
|
Children
scoring
|
The scoring use case means applying the models to new instances.
The scoring occurs via web service methods.
Depending on the models, there are several types of DeVisa scoring
procedures:
-
Classification Scoring.
-
Cluster Scoring.
-
Association rules scoring.
|
compare
|
A client application (DM Consumer) wants to compare two models.
The client needs to specify:
-
The two models to be compared, through an exact reference (model id)
-
The comparison type, which can be syntactic or semantic.
Syntactic Comparison. Two PMML models are compared through an XML
differencing approach. Whether eXist's XML diff extension module can be
used here remains to be researched.
Semantic Comparison. Two models can be compared from the
following points of view:
-
Schema compatibility. This involves a DataDictionary compatibility
check and a MiningSchema compatibility check.
-
Function comparison. This checks whether the models fulfil the same
prediction task, which algorithms are used to fulfil the task, etc.
-
Performance measures (to be researched)
The semantic comparison is useful when the client envisions a model
composition and wants to pre-check the compatibility of the models.
|
compose
|
Model Composition allows the combination of simple models into a single
composite PMML model.
PMML version 3.2 supports the combination of decision trees and simple
regression models. More general variants would be possible and may be
defined in future versions of PMML.
In PMML, model composition uses three syntactic concepts:
-
The essential elements of a predictive model are captured in elements
that can be included in other models.
-
Embedded models can define new fields, similar to derived fields.
-
The leaf nodes in a decision tree can contain another predictive model.
In DeVisa simple models can be combined into more complex ones forming
new valid PMML documents.
A client application can specify the models subject to composition and
the combination method.
DeVisa identifies the specified models. If the models do not exist in
the repository then the process stops.
The found models are checked for compatibility. If they are not
compatible the process stops.
The new valid model is returned to the user/stored in the repository.
|
authenticate
|
The WS client that invokes a certain method might need to be authenticated
in order to execute the required function.
|
admin
|
The Admin use case involves uploading/downloading the DeVisa models
without any other processing.
|
search
|
The searching functions allow inspecting the properties of the PMML
models in the repository. The search functions conform to, and are
therefore limited to, the information that a PMML model can incorporate
according to the PMML 3.2 specification.
DeVisa provides searching functions such as:
Selecting the models with desired properties
-
function type: classification, regression, clustering etc
-
type of model: tree, SVM, cluster etc
-
producer: e.g., all the models belonging to a certain producer application
-
degree of freshness: e.g., models newer than a certain date
-
performance measures
-
fields statistics
Selecting the models that conform to a certain schema
-
Exact Schema. The document refers to a data dictionary but does not refer
to a model; it can specify desired properties, though. The DeVisa engine
will select the appropriate model.
-
Match Schema. The document describes a data dictionary (and possibly a
mining schema according to the data dictionary). In this case the
DeVisa engine is responsible for identifying the matching schema and
the appropriate model (should these exist).
This type of search is used especially as an extension to the scoring
use case.
Full text search in the model repository
This type of search is useful when the client is looking for keywords in
the description of the model, in the field names or field descriptions,
in the function type, etc.
Future work: create a search mechanism that allows more complex
predicates on the search criteria.
|
classification scoring
|
The scoring method receives as input a set of instances and one or more
classification models and classifies the instances with respect to the
models.
|
clustering scoring
|
The scoring method receives as input a set of instances and one or more
clustering models and assigns the instances to the most appropriate
cluster in each of the models.
|
sequencing
|
Model sequencing is the process through which two or more models are
combined into a sequence where the results of one model are used as
input in another model.
Model sequencing is supported partially by the PMML specification.
Examples of sequencing:
-
The missing values in a regression model can be replaced by a set of
rules (or decision tree)
-
Several classification models with the same target value can be merged
via a voting scheme, i.e., the final classification result can be
computed as an average of the results of the initial classifiers. The
average can be computed by a regression model.
-
Prediction results may have to be combined with a cost or profit
matrix before a decision can be derived. A mailing campaign model may
use tree classification to determine response probabilities per
customer and channel. The cost matrix can be appended as a regression
model that applies cost weighting factors to different channels, e.g.,
high cost for phone and low cost for email. The final decision is then
based on the outcome of the regression model.
|
selection
|
Model selection in PMML allows for combining multiple 'embedded models',
aka model expressions, into the decision logic that selects one of the
models depending on the current input values.
Examples of selection
-
A common method for optimizing prediction models is the combination of
segmentation and regression. Data are grouped into segments and for
each segment there may be different regression equations. If the
segmentation can be expressed by decision rules then this kind of
segment based regression can be implemented by a decision tree where
any leaf node in the tree can contain an embedded regression model.
|
download
|
A DM Consumer can download a DeVisa model in order to use it internally,
e.g., to import it into the DM application.
|
replace
|
Replace occurs when a model is replaced completely by a newer one. The
new model will get the same model ID in the repository.
|
upload
|
A DM Producer can upload models to the DeVisa repository via WS methods
(SOAP, XML-RPC).
|
statistics
|
An application can invoke this service to obtain statistics on the
models.
Example of statistics:
-
frequencies per domain, producer, schema, or function type etc.
-
?? based on performance measures (to be researched):
-
sensitivity, specificity, accuracy (from what I've seen they are not
supported by PMML)
-
can we connect performance/assessment with model verification in PMML?
At the beginning, DeVisa should be able to provide only a full report
(PMQL) for the first point above (frequencies).
|
update
|
Update occurs when a model needs only a certain type of adjustment. This
use case should use a specific XML update technique. The update
procedure is triggered by the system itself in most cases.
|
association rules scoring
|
The scoring method receives as input a set of items (instances) and one or
more association rule models. It determines all rules of each of the input
models whose antecedent itemset is a subset of the input itemset and
returns the consequents of these rules as the inferred itemsets. An
extension of this procedure computes all rules whose antecedent and
consequent itemsets are included in the input itemset. This version is
useful to determine which itemsets support which rules.
|
|