
DEPENDENCY-PRESERVING GENERALIZATION FOR DATA PUBLISHING

dc.contributor.advisor: Chiang, Fei
dc.contributor.author: Gorla, Harika
dc.contributor.department: Computing and Software [en_US]
dc.date.accessioned: 2019-04-22T18:19:59Z
dc.date.available: 2019-04-22T18:19:59Z
dc.date.issued: 2019
dc.description.abstract: A vast amount of microdata about individuals and entities is collected and published for purposes such as demographic and public health research. However, data in its original form contains sensitive information about individuals, and publishing it as-is violates their privacy. To address this problem, privacy-preserving data publishing (PPDP) offers many approaches for generating a public version of the data that remains practically useful while protecting each individual's privacy. k-anonymity has emerged as an efficient approach to protecting individuals' privacy: it generalizes and/or suppresses portions of the data so that each individual is indistinguishable from at least k-1 others in the released data. Existing generalization algorithms focus on minimizing the information loss incurred when generalizing attribute values, and any data dependencies defined over the data may be lost during this step. A data dependency is a formal concept used to describe patterns in data; these patterns are employed during data analysis and data cleaning. A typical data dependency is a Functional Dependency (FD) X -> Y, which expresses that the values of attribute X uniquely determine the values of attribute Y. For example, postal code -> province means that the value of postal code uniquely determines the value of province. In this thesis, we study the problem of publishing data with two objectives: first, to protect the identity of individuals in the published data through k-anonymity; second, to provide high utility by preserving the instances of the data dependencies in the released data. We introduce dependency loss as a penalty measure for the anonymized public data, and we define and study the problem of dependency-preserving generalization: finding a public database instance that guarantees privacy through k-anonymity and has minimum dependency loss.
We present two clustering-based generalization algorithms that find such a database instance, and we report experiments showing that our algorithms achieve comparable performance and improved utility in preserving data dependencies. [en_US]
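The two objectives in the abstract can be made concrete with a small sketch. It assumes a toy table of (postal code, age, province), treats postal code and age as quasi-identifiers, and applies a hypothetical generalization (keep a postal-code prefix followed by "*", bucket age into intervals). The names `generalize`, `is_k_anonymous` and `fd_violating_rows` are illustrative only, and counting the rows in groups that break an FD is a stand-in proxy, not the dependency-loss measure defined in the thesis. The clustering-based algorithms themselves are not reproduced here.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Set, Tuple

Row = Dict[str, Any]


def generalize(row: Row, postal_prefix: int, age_width: int) -> Row:
    """Hypothetical generalization: keep the first `postal_prefix` characters of
    the postal code followed by "*", and map age to an interval of width `age_width`."""
    low = (row["age"] // age_width) * age_width
    return {
        "postal": row["postal"][:postal_prefix] + "*",
        "age": f"{low}-{low + age_width - 1}",
        "province": row["province"],  # not a quasi-identifier; released unchanged
    }


def is_k_anonymous(rows: Sequence[Row], quasi_ids: Sequence[str], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    counts: Dict[Tuple[Any, ...], int] = defaultdict(int)
    for row in rows:
        counts[tuple(row[a] for a in quasi_ids)] += 1
    return all(c >= k for c in counts.values())


def fd_violating_rows(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> int:
    """Hypothetical loss proxy for the FD lhs -> rhs: the number of rows whose
    lhs value co-occurs with more than one distinct rhs value."""
    rhs_seen: Dict[Tuple[Any, ...], Set[Tuple[Any, ...]]] = defaultdict(set)
    sizes: Dict[Tuple[Any, ...], int] = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        rhs_seen[key].add(tuple(row[a] for a in rhs))
        sizes[key] += 1
    return sum(n for key, n in sizes.items() if len(rhs_seen[key]) > 1)


original: List[Row] = [
    {"postal": "K1A 0B1", "age": 34, "province": "ON"},
    {"postal": "K1N 6N5", "age": 36, "province": "ON"},
    {"postal": "J8X 3X7", "age": 52, "province": "QC"},
    {"postal": "J9H 5E1", "age": 57, "province": "QC"},
]
qi = ["postal", "age"]

# Two candidate generalizations of the same table.
prefix_1 = [generalize(r, postal_prefix=1, age_width=10) for r in original]
suppressed = [generalize(r, postal_prefix=0, age_width=10) for r in original]

for name, table in [("original", original), ("prefix_1", prefix_1), ("suppressed", suppressed)]:
    print(name, is_k_anonymous(table, qi, k=2), fd_violating_rows(table, ["postal"], ["province"]))
# original False 0
# prefix_1 True 0
# suppressed True 4
```

Both generalized tables are 2-anonymous, but suppressing the postal code entirely places Ontario and Quebec rows under the same left-hand-side value "*", so postal code -> province no longer holds. A dependency-aware generalization would prefer the `prefix_1` table, which offers the same privacy guarantee while keeping the FD intact.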
dc.description.degree: Master of Science (MSc) [en_US]
dc.description.degreetype: Thesis [en_US]
dc.identifier.uri: http://hdl.handle.net/11375/24256
dc.language.iso: en [en_US]
dc.title: DEPENDENCY-PRESERVING GENERALIZATION FOR DATA PUBLISHING [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: Gorla_Harika_201904_MSc.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission