DEPENDENCY-PRESERVING GENERALIZATION FOR DATA PUBLISHING

Gorla, Harika

Files

Gorla_Harika_201904_MSc.pdf (1.31 MB)

Date

2019

Authors

Gorla, Harika

Abstract

A vast amount of microdata about individuals and entities is collected and published for different purposes, such as demographic and public health research. However, data in its original form contains sensitive information about individuals, and publishing it violates their privacy. To resolve this problem, privacy-preserving data publishing (PPDP) proposes many approaches for generating a public version of the data that remains practically useful while protecting individuals’ privacy. k-anonymity has emerged as an efficient approach to protecting privacy by generalizing and/or suppressing portions of the data so that individuals are indistinguishable in the released data. Existing generalization algorithms focus on minimizing the information loss incurred when attribute values are generalized. Any data dependencies defined over the data may be lost during this generalization step. A data dependency is a formal concept used to describe patterns in data. These patterns are employed during data analysis and data cleaning. A typical data dependency in a database is a Functional Dependency (FD): X -> Y expresses that the values of attribute X uniquely determine the values of attribute Y. For example, postal code -> province means that the value of postal code uniquely determines the value of province. In this thesis, we study the problem of publishing data with two objectives. The first is to protect the identity of individuals in the published data through k-anonymity. The second is to provide high utility by preserving the instances of the data dependencies in the released data. We introduce dependency loss as a penalty measure for the anonymized public data. We define and study the problem of dependency-preserving generalization: finding a public database instance that guarantees privacy through k-anonymity and has minimum dependency loss. We present two clustering-based generalization algorithms that find such a database instance, and we run experiments showing that our algorithms achieve comparable performance and improved utility in preserving data dependencies.
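
The tension the abstract describes between k-anonymity and functional dependencies can be illustrated with a minimal Python sketch. It is not the thesis's algorithms or its dependency loss measure. The helper names (`fd_holds`, `is_k_anonymous`, `generalize_postal`), the toy table, and the prefix-masking generalization scheme are all assumptions made for illustration.

```python
from collections import Counter


def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        # setdefault records the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


def generalize_postal(rows, keep):
    """Hypothetical generalization: keep the first `keep` characters, mask the rest."""
    return [
        {**row, "postal_code": row["postal_code"][:keep]
         + "*" * (len(row["postal_code"]) - keep)}
        for row in rows
    ]


# Toy table (illustrative values only).
rows = [
    {"postal_code": "L8S", "province": "ON"},
    {"postal_code": "L8P", "province": "ON"},
    {"postal_code": "H3A", "province": "QC"},
    {"postal_code": "H3B", "province": "QC"},
]

coarse = generalize_postal(rows, keep=2)      # "L8*", "L8*", "H3*", "H3*"
suppressed = generalize_postal(rows, keep=0)  # "***" for every row

for name, table in [("original", rows), ("coarse", coarse), ("suppressed", suppressed)]:
    print(name,
          "2-anonymous:", is_k_anonymous(table, ["postal_code"], 2),
          "FD holds:", fd_holds(table, "postal_code", "province"))
```

In this toy setting, both generalized tables are 2-anonymous on postal code, but only the coarser prefix masking keeps postal code -> province intact. Full suppression maps rows from different provinces to the same generalized value and so violates the dependency.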

URI

http://hdl.handle.net/11375/24256

Collections

Open Access Dissertations and Theses
