DEPENDENCY-PRESERVING GENERALIZATION FOR DATA PUBLISHING

Gorla, Harika

dc.contributor.advisor	Chiang, Fei
dc.contributor.author	Gorla, Harika
dc.contributor.department	Computing and Software	en_US
dc.date.accessioned	2019-04-22T18:19:59Z
dc.date.available	2019-04-22T18:19:59Z
dc.date.issued	2019
dc.description.abstract	A vast amount of microdata about individuals and entities is collected and published for purposes such as demographic and public health research. However, data in its original form contains sensitive information about individuals, and publishing it violates their privacy. To resolve this problem, privacy-preserving data publishing (PPDP) offers many approaches for generating a public version of the data that remains practically useful while protecting individuals' privacy. k-anonymity has emerged as an efficient approach that protects privacy by generalizing and/or suppressing portions of the data so that individuals are indistinguishable in the released data. Existing generalization algorithms focus on minimizing the information loss incurred when attribute values are generalized. Any data dependencies defined over the data may be lost during this generalization step. A data dependency is a formal concept used to describe patterns in data; these patterns are employed during data analysis and data cleaning. A typical data dependency in a database is a functional dependency (FD): X -> Y expresses that the values of attribute X uniquely determine the values of attribute Y. For example, postal code -> province means that the value of postal code uniquely determines the value of province. In this thesis, we study the problem of publishing data with two objectives: first, to protect the identity of individuals in the published data through k-anonymity; second, to provide high utility by preserving the instances of the data dependencies in the released data. We introduce dependency loss as a penalty measure for the anonymized public data. We define and study the problem of dependency-preserving generalization: finding a public database instance that guarantees privacy through k-anonymity and has minimum dependency loss.
We present two clustering-based generalization algorithms that find such a database instance, and we run experiments showing that our algorithms achieve comparable performance and improved utility in preserving data dependencies.	en_US
dc.description.degree	Master of Science (MSc)	en_US
dc.description.degreetype	Thesis	en_US
dc.identifier.uri	http://hdl.handle.net/11375/24256
dc.language.iso	en	en_US
dc.title	DEPENDENCY-PRESERVING GENERALIZATION FOR DATA PUBLISHING	en_US
dc.type	Thesis	en_US
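
The two notions the abstract relies on, k-anonymity and a functional dependency X -> Y, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names (`is_k_anonymous`, `fd_holds`), the toy rows, and both generalizations are hypothetical, and the check shown is a plain FD test, not the thesis's dependency loss measure. It only shows how one generalization can keep postal code -> province intact while another, equally anonymous one, breaks it.

```python
from collections import Counter, defaultdict


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds: rows with equal lhs values have equal rhs values."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical toy table; all values are made up for illustration.
original = [
    {"postal_code": "L8S 4L8", "province": "ON", "age": 34},
    {"postal_code": "L8S 4L8", "province": "ON", "age": 36},
    {"postal_code": "H3A 0G4", "province": "QC", "age": 51},
    {"postal_code": "H3A 0G4", "province": "QC", "age": 58},
]

# Assumed generalization A: truncate postal codes, bucket ages by decade.
generalized_a = [
    {"postal_code": "L8S ***", "province": "ON", "age": "30-39"},
    {"postal_code": "L8S ***", "province": "ON", "age": "30-39"},
    {"postal_code": "H3A ***", "province": "QC", "age": "50-59"},
    {"postal_code": "H3A ***", "province": "QC", "age": "50-59"},
]

# Assumed generalization B: suppress postal codes entirely, one wide age range.
generalized_b = [
    {"postal_code": "*** ***", "province": row["province"], "age": "30-59"}
    for row in original
]

qi = ["postal_code", "age"]
print(is_k_anonymous(original, qi, 2))       # original rows are all unique on qi
print(is_k_anonymous(generalized_a, qi, 2))  # two groups of two rows
print(fd_holds(generalized_a, "postal_code", "province"))
print(fd_holds(generalized_b, "postal_code", "province"))
```

In this toy data both generalizations satisfy 2-anonymity on the assumed quasi-identifiers, but only generalization A still satisfies postal code -> province; in B the single suppressed postal code maps to two provinces.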

Files

Original bundle

Name:: Gorla_Harika_201904_MSc.pdf
Size:: 1.31 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.68 KB
Format:: Item-specific license agreed upon to submission

Collections

Open Access Dissertations and Theses