
Algorithms in Privacy & Security for Data Analytics and Machine Learning

dc.contributor.advisor: Samavi, Reza
dc.contributor.author: Liang, Yuting
dc.contributor.department: Computing and Software
dc.date.accessioned: 2020-04-28T05:21:57Z
dc.date.available: 2020-04-28T05:21:57Z
dc.date.issued: 2020
dc.description.abstract: Applications employing very large datasets are increasingly common in this age of Big Data. While these applications provide great benefits in various domains, their usage can be hampered by real-world privacy and security risks. In this work we propose algorithms that aim to provide privacy and security protection in different aspects of these applications. First, we address the problem of data privacy. When the datasets used contain personal information, they must be properly anonymized to protect the privacy of the subjects to whom the records pertain. A popular privacy-preservation technique is the k-anonymity model, which guarantees that any record in the dataset is indistinguishable from at least k-1 other records in terms of quasi-identifiers (i.e., the subset of attributes that can be used to deduce the identity of an individual). Achieving k-anonymity while considering the competing goal of data utility can be a challenge, especially for datasets containing large numbers of records. We formulate k-anonymization as an optimization problem with the objective of maximizing data utility, and propose two practical algorithms for solving this problem. Second, we address the problem of application security, specifically for predictive models using Deep Learning, where adversaries can use minimally perturbed inputs (a.k.a. adversarial examples) to cause a neural network to produce incorrect outputs. We propose an approach that protects against adversarial examples in image-classification networks. The approach relies on two mechanisms: 1) a mechanism that increases robustness at the expense of accuracy; and 2) a mechanism that improves accuracy. We show that an approach combining the two mechanisms can provide protection against adversarial examples while retaining accuracy. We provide experimental results to demonstrate the effectiveness of our algorithms for both problems.
dc.description.degree: Master of Science (MSc)
dc.description.degreetype: Thesis
dc.description.layabstract: Applications employing very large datasets are increasingly common in this age of Big Data. While these applications provide great benefits in various domains, their usage can be hampered by real-world privacy and security risks. In this work we propose algorithms that aim to provide privacy and security protection in different aspects of these applications. We address the problem of data privacy: when the datasets used contain personal information, they must be properly anonymized to protect the privacy of the subjects to whom the records pertain. We propose two practical, utility-centric anonymization algorithms. We also address the problem of application security, specifically for Deep Learning applications where adversaries can use minimally perturbed inputs to cause a neural network to produce incorrect outputs, and propose an approach that protects against these attacks. We provide experimental results to demonstrate the effectiveness of our algorithms for both problems.
dc.identifier.uri: http://hdl.handle.net/11375/25409
dc.language.iso: en
dc.subject: Privacy
dc.subject: Security
dc.subject: Anonymization
dc.subject: Machine Learning
dc.subject: Algorithms
dc.title: Algorithms in Privacy & Security for Data Analytics and Machine Learning
dc.type: Thesis
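
The abstract above defines k-anonymity as the requirement that every record share its quasi-identifier values with at least k-1 other records. The following is a minimal Python sketch of that check, not taken from the thesis; the column names and toy records are illustrative assumptions.

from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    # Count how often each combination of quasi-identifier values occurs.
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    # k-anonymity holds if every combination appears in at least k records.
    return all(c >= k for c in counts.values())

# Toy generalized records; age ranges and truncated ZIP codes act as quasi-identifiers.
records = [
    {"age": "30-39", "zip": "905**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "905**", "diagnosis": "cold"},
    {"age": "40-49", "zip": "416**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "416**", "diagnosis": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True: each group has 2 records
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False: no group has 3 records

The anonymization algorithms in the thesis go further than this check: they choose how to generalize the quasi-identifier values so that k-anonymity holds while data utility is maximized.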

Files

Original bundle
Name: Liang_Yuting_2020Apr_MSc.pdf
Size: 2.72 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon at submission