Welcome to the upgraded MacSphere! We're putting the finishing touches on it; if you notice anything amiss, email macsphere@mcmaster.ca

On the Sample Complexity of Privately Learning Gaussians and their Mixtures

dc.contributor.advisorAshtiani, Hassan
dc.contributor.authorAden-Ali, Ishaq
dc.contributor.departmentComputing and Softwareen_US
dc.date.accessioned2021-08-23T19:58:45Z
dc.date.available2021-08-23T19:58:45Z
dc.date.issued2021
dc.description.abstractMultivariate Gaussians: We provide sample complexity upper bounds for semi-agnostically learning multivariate Gaussians under the constraint of approximate differential privacy. These are the first  finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution. Our bounds are near-optimal in the case when the covariance is known to be the identity, and conjectured to be near-optimal in the general case. From a technical standpoint, we provide analytic tools for arguing the existence of global "locally small" covers from local covers of the space. These are exploited using modifications of recent techniques for for differentially private hypothesis selection. Mixtures of Gaussians: We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We provide the first sample complexity upper bounds for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. To prove our results, we design a new technique for privately learning mixture distributions. A class of distributions F is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from a distribution f in F, outputs a list of distributions, H, such that one of the distributions in H approximates f. We show that if F is privately list-decodable then we can privately learn mixtures of distributions in F. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable.en_US
dc.description.degreeMaster of Science (MSc)en_US
dc.description.degreetypeThesisen_US
dc.description.layabstractIs it possible to estimate an unknown probability distribution given random samples from it? This is a fundamental problem known as distribution learning (or density estimation) that has been studied by statisticians for decades, and in recent years has become a topic of interest for computer scientists. While distribution learning is a mature and well understood problem, in many cases the samples (or data) we observe may consist of sensitive information belonging to individuals and well-known solutions may inadvertently result in the leakage of private information. In this thesis we study distribution learning under the assumption that the data is generated from high-dimensional Gaussians (or their mixtures) with the aim of understanding how many samples an algorithm needs before it can guarantee a good estimate. Furthermore, to protect against leakage of private information, we consider approaches that satisfy differential privacy — the gold standard for modern private data analysis.en_US
dc.identifier.urihttp://hdl.handle.net/11375/26804
dc.language.isoenen_US
dc.subjectPAC learning, Learning Theory, Distribution Learning, Differential Privacy, Mixtures of Gaussiansen_US
dc.titleOn the Sample Complexity of Privately Learning Gaussians and their Mixturesen_US
dc.title.alternativePrivately Learning Gaussians and their Mixturesen_US
dc.typeThesisen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Aden-Ali_Ishaq_2021July_MSc.pdf
Size:
622.48 KB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.68 KB
Format:
Item-specific license agreed upon to submission
Description: