Information incompleteness is a major data quality issue, and it is amplified by the growing amount of data collected from unreliable sources. Assessing the completeness of data is crucial for determining the quality of the data and the validity of query answers.
In this thesis, we tackle the problem of extracting and reasoning about complete and missing information in the setting of relative information completeness. We make two contributions: a pattern model that provides minimal covers summarizing the extent of complete and missing data partitions, and a pattern algebra that derives minimal pattern covers for query answers in order to analyze their validity.
The completeness pattern framework opens the way to many applications, particularly those that aim to improve the quality of tasks affected by missing data. Data imputation is a well-known technique for repairing missing data values, but it can incur a prohibitive cost when applied to large data sets. Query-driven imputation offers a better alternative, as it imputes only the data that a query needs. We adopt a rule-based query rewriting technique to impute the answers of aggregation queries that are missing or incorrect because of data incompleteness. We present a novel query rewriting mechanism that is guided by the completeness pattern model and algebra.
We also investigate how our pattern model generalizes to summarize arbitrary data fragments. These summaries can be queried to analyze and compare data fragments in a concise and flexible way.