Declarative inconsistency handling in relational and semi-structured databases
In many novel database applications, inconsistency becomes an important issue that cannot be addressed with simple techniques like data cleaning. For instance, data conflicts can arise during the integration of independent data sources, and the user may not have the privileges to resolve the inconsistencies. The framework of consistent query answers (CQA) aims to alleviate the impact of inconsistencies on query evaluation by considering all possible (minimal) repairs of the original database. The consistent answers are the answers present in every repair. Because the repairs are not materialized, this approach does not modify the state of the database, i.e., no information is physically removed from the database. In this thesis we advance the research on consistent query answers in several directions. First, we address the open question of the complexity of computing consistent query answers in the presence of universal constraints. We show that in general the problem is Π₂ᵖ-complete, but that for acyclic sets of full tuple-generating dependencies and denial constraints the problem is tractable. Second, we implement a practical system, Hippo, for computing consistent answers to a broad class of queries with respect to an acyclic set of full tuple-generating dependencies and denial constraints. The efficiency of this approach, however, suffers from an excessive number of database calls. We devise several optimizations that address this problem and make our system capable of handling large databases. Next, we investigate extending the CQA framework with user-specified preferences on how to resolve conflicts. In the data integration scenario, for instance, the user may possess partial information on the reliability of the data sources. This kind of information can be used to further refine the quality of consistent query answers.
We propose a general framework of preference-based conflict resolution, identify a set of desired properties, and investigate their computational implications. Finally, we propose adapting the framework of CQA to semi-structured databases. This direction of study is inspired by the observation that many violations of consistency are encountered in the context of XML applications. We propose the framework of valid query answers, study its computational implications, and identify tractable cases. We also present an experimental evaluation of our approach.
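To make the notions of repair and consistent answer concrete, the following is a minimal Python sketch restricted to a single denial constraint (a key dependency). The relation `Emp(name, salary)`, the functions `violates`, `repairs` and `consistent_answers`, and the sample data are hypothetical illustrations and are not taken from Hippo. The sketch materializes every repair by brute-force enumeration of maximal consistent subsets, which is exponential. It shows only the naive definition, not the tractable, non-materializing techniques described above. It also handles only tuple deletions, which is all that a denial constraint requires.

```python
from itertools import combinations

# Hypothetical relation Emp(name, salary) with the key constraint name -> salary,
# written as a denial constraint: no two distinct tuples may share a name.
def violates(t1, t2):
    return t1[0] == t2[0] and t1 != t2

def is_consistent(facts):
    return not any(violates(a, b) for a, b in combinations(facts, 2))

def repairs(db):
    """Return all maximal consistent subsets of db (brute force, exponential)."""
    facts = sorted(db)
    consistent = [
        frozenset(subset)
        for k in range(len(facts) + 1)
        for subset in combinations(facts, k)
        if is_consistent(subset)
    ]
    # Keep only subsets that are not strictly contained in another consistent subset.
    return [s for s in consistent if not any(s < other for other in consistent)]

def consistent_answers(db, query):
    """Answers the query returns in every repair of db (the database itself is untouched)."""
    return set.intersection(*(query(r) for r in repairs(db)))

# Alice has two conflicting salaries, so the database violates the key.
emp = {("alice", 1000), ("alice", 2000), ("bob", 1500)}

def names(r):
    return {name for name, _ in r}

def high_earners(r):
    return {name for name, salary in r if salary > 1200}

print(sorted(consistent_answers(emp, names)))         # ['alice', 'bob']
print(sorted(consistent_answers(emp, high_earners)))  # ['bob']
```

The two repairs both keep Bob's tuple but keep different salaries for Alice. As a result, `alice` is a consistent answer to the name query, while only `bob` is a consistent answer to the salary query.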