Abstract
DeVisa is a framework for unifying the expression of different prediction models using Web technologies. The prediction models are stored in a PMML repository providing the following functions:
Motivation
Data mining is often characterized as being predictive or descriptive. Data mining is predictive in that the models produced from historical data can predict outcomes. It is descriptive in that the model itself can be inspected to understand the essence of the knowledge or patterns found in the data. Some models serve both purposes. For example, a decision tree can not only predict outcomes but also provide human-interpretable rules that explain why a prediction was made. Clustering models not only assign a record to a cluster but also describe each cluster, either through a representative point called a centroid or through a rule that explains why a record is considered part of the cluster. The algorithms for building data mining models are computationally expensive, both because they analyze large volumes of data and because the algorithms themselves are complex. Therefore, it is practical to save the models so that they can be further processed or queried. Furthermore, the true value of data mining does not reside in a set of complex algorithms but in the practical questions it can help answer. DeVisa focuses on maintaining and exploiting a repository of predictive models. Hence, knowledge is treated as data from which new knowledge can be derived. The use of open standards provides wide access to the classification models. Users can search for a useful model, test it online, compare it with other models, and/or combine it with others (using techniques such as bagging or boosting). The models can be refined and enhanced while they are in use. Furthermore, domain experts can test the models on new data and provide direct feedback to the mining experts who developed them.
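Combining models can be illustrated with a weighted vote over the labels predicted by several classifiers. The following is a minimal Python sketch, assuming each repository model can be wrapped as a function from a record (a mapping of field names to values) to a class label; `combine_by_vote`, the toy models `m1` to `m3`, and the sample record are hypothetical and not part of DeVisa. With equal weights it performs the plain majority vote used to aggregate bagged classifiers; with per-model weights it performs the weighted vote that boosting methods such as AdaBoost use for their final prediction. It covers only the aggregation step, not the resampling (bagging) or reweighting (boosting) that produces the member models.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Mapping, Optional, Sequence

# Hypothetical stand-in for a repository model: any callable that maps a
# record (field name -> value) to a class label.
Record = Mapping[str, object]
Model = Callable[[Record], Hashable]


def combine_by_vote(models: Sequence[Model], record: Record,
                    weights: Optional[Sequence[float]] = None) -> Hashable:
    """Return the label with the largest total weight across all models.

    Equal weights give a plain majority vote (bagging-style aggregation);
    per-model weights give a weighted vote (boosting-style aggregation).
    """
    if not models:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    totals: Dict[Hashable, float] = defaultdict(float)
    for model, weight in zip(models, weights):
        totals[model(record)] += weight
    # max() keeps the first label that reaches the top score, so ties go to
    # the label that was predicted first.
    return max(totals, key=totals.get)


# Three toy models (hypothetical) and one record to classify.
def m1(r: Record) -> str:
    return "yes" if r["age"] > 30 else "no"


def m2(r: Record) -> str:
    return "yes" if r["income"] > 50_000 else "no"


def m3(r: Record) -> str:
    return "no"


record = {"age": 42, "income": 40_000}

print(combine_by_vote([m1, m2, m3], record))                   # prints no (2 votes to 1)
print(combine_by_vote([m1, m2, m3], record, [2.5, 1.0, 1.0]))  # prints yes (2.5 vs 2.0)
```

Keeping the models behind a plain callable interface means the vote does not care how each member model was built, which matches the idea of combining independently developed models from a shared repository.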
The use of Web services technology, together with a standard open format specifically designed to express data mining models (PMML), improves the interoperability and scalability of the system.
DeVisa Features
DeVisa is built on top of eXist, a native XML database system. Thus, the model repository takes advantage of all of eXist's database management facilities.
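Because PMML is an XML vocabulary, a stored model can be read and evaluated with ordinary XML tooling, and a decision tree serves both predictive and descriptive purposes. The following minimal Python sketch uses the standard-library `xml.etree.ElementTree` module to score a record against a PMML TreeModel and to return the predicates along the path taken, which act as a human-readable explanation of the prediction. It assumes the PMML 4.4 namespace, handles only `True` and `SimplePredicate` node predicates with six comparison operators, ignores missing values, and returns no prediction when no child predicate matches. The toy document and the names `score_tree`, `OPS`, and `PMML_DOC` are hypothetical; this is an illustration, not DeVisa's own evaluation code.

```python
import xml.etree.ElementTree as ET
from typing import List, Mapping, Optional, Tuple

# Assumption: models use the PMML 4.4 namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

# A tiny hand-written TreeModel (hypothetical, for illustration only).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy credit model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <Node score="high">
      <True/>
      <Node score="low">
        <SimplePredicate field="income" operator="greaterOrEqual" value="50000"/>
      </Node>
      <Node score="high">
        <SimplePredicate field="income" operator="lessThan" value="50000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# Only a subset of the PMML SimplePredicate operators is supported here.
OPS = {
    "equal": lambda a, b: a == b,
    "notEqual": lambda a, b: a != b,
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def _predicate_holds(node: ET.Element,
                     record: Mapping[str, object]) -> Tuple[bool, str]:
    """Evaluate a Node's predicate; return (holds, readable description)."""
    if node.find("pmml:True", NS) is not None:
        return True, "true"
    pred = node.find("pmml:SimplePredicate", NS)
    if pred is None:
        raise NotImplementedError("only True and SimplePredicate are handled")
    field, op, raw = pred.get("field"), pred.get("operator"), pred.get("value")
    if op not in OPS:
        raise NotImplementedError(f"operator {op!r} is not handled")
    actual = record[field]  # missing values are not handled (KeyError)
    # Compare numerically when the record value is a number, else as text.
    expected = float(raw) if isinstance(actual, (int, float)) else raw
    return OPS[op](actual, expected), f"{field} {op} {raw}"


def score_tree(pmml_text: str,
               record: Mapping[str, object]) -> Tuple[Optional[str], List[str]]:
    """Score one record; return (prediction, predicates along the path)."""
    root = ET.fromstring(pmml_text)
    node = root.find("pmml:TreeModel/pmml:Node", NS)
    if node is None:
        raise ValueError("no TreeModel/Node element found")
    holds, text = _predicate_holds(node, record)
    if not holds:
        return None, []
    path = [text]
    while True:
        children = node.findall("pmml:Node", NS)
        if not children:
            return node.get("score"), path  # reached a leaf
        for child in children:
            holds, text = _predicate_holds(child, record)
            if holds:  # descend into the first child whose predicate holds
                node = child
                path.append(text)
                break
        else:
            # No child predicate matched: this sketch returns no prediction.
            return None, path


prediction, path = score_tree(PMML_DOC, {"income": 72000})
print(prediction)          # prints low
print(" AND ".join(path))  # prints true AND income greaterOrEqual 50000
```

Reading the model with a generic XML parser, rather than a tool-specific library, reflects the interoperability argument for PMML: any consumer that understands the schema can use the model. In a repository built on eXist, a similar traversal could instead be written in XQuery and run inside the database.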