Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/24256
Title: | DEPENDENCY-PRESERVING GENERALIZATION FOR DATA PUBLISHING |
Authors: | Gorla, Harika |
Advisor: | Chiang, Fei |
Department: | Computing and Software |
Publication Date: | 2019 |
Abstract: | A vast amount of microdata about individuals and entities is collected and published for purposes such as demographic and public health research. However, data in its original form contains sensitive information about individuals, and publishing such data violates their privacy. To resolve this problem, privacy-preserving data publishing (PPDP) proposes many approaches for generating a public version of the data that is practically useful while protecting individuals’ privacy. k-anonymity has emerged as an efficient approach to protecting an individual’s privacy by generalizing and/or suppressing portions of the data so that individuals are indistinguishable in the released data. Existing generalization algorithms focus on minimizing the information loss incurred when attribute values are generalized. Any data dependencies defined over the data may be lost during this generalization step. A data dependency is a formal concept used to describe patterns in data. These patterns are employed during data analysis and data cleaning. A typical data dependency in a database is a Functional Dependency (FD): X -> Y expresses that the values of attribute X uniquely determine the values of attribute Y; e.g., postal code -> province means that the value of postal code uniquely determines the value of province. In this thesis, we study the problem of publishing data with two objectives: first, to protect the identity of individuals in the published data through k-anonymity; second, to provide high utility by preserving the instances of the data dependencies in the released data. We introduce dependency loss as a penalty measure for the anonymized public data. We define and study the problem of dependency-preserving generalization: finding a public database instance that guarantees privacy through k-anonymity and has minimum dependency loss. 
We present two clustering-based generalization algorithms that find such a database instance, and we run experiments showing that our algorithms achieve comparable performance and improved utility in preserving data dependencies. |
URI: | http://hdl.handle.net/11375/24256 |
Appears in Collections: | Open Access Dissertations and Theses |
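
The abstract's postal code -> province example can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the thesis: the function names `fd_holds` and `is_k_anonymous`, the toy rows, and the choice of postal code as the sole quasi-identifier are all assumptions. It shows that one generalization (keeping a three-character postal prefix) can reach 2-anonymity while the FD still holds, whereas a coarser one (suppressing the postal code entirely) also reaches 2-anonymity but breaks the FD. The thesis's actual dependency loss measure and clustering algorithms are not reproduced here.

```python
from collections import Counter


def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs value maps to two different rhs values
        seen.setdefault(key, row[rhs])
    return True


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical toy table (values invented for illustration).
original = [
    {"postal_code": "L8S 4L8", "province": "ON"},
    {"postal_code": "L8S 4K1", "province": "ON"},
    {"postal_code": "H3A 0G4", "province": "QC"},
    {"postal_code": "H3A 1B9", "province": "QC"},
]

# Generalization 1: keep only the first three characters of the postal code.
by_prefix = [{**row, "postal_code": row["postal_code"][:3]} for row in original]

# Generalization 2: suppress the postal code completely.
suppressed = [{**row, "postal_code": "*"} for row in original]

for name, table in [("original", original),
                    ("by_prefix", by_prefix),
                    ("suppressed", suppressed)]:
    print(name,
          "2-anonymous:", is_k_anonymous(table, ["postal_code"], 2),
          "FD holds:", fd_holds(table, "postal_code", "province"))
```

Both generalized tables satisfy the privacy check in this toy setting, but only the prefix version keeps the dependency intact, which is the kind of difference a dependency-aware penalty is meant to capture.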
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Gorla_Harika_201904_MSc.pdf | | 1.34 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.