Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/26804
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Ashtiani, Hassan | - |
dc.contributor.author | Aden-Ali, Ishaq | - |
dc.date.accessioned | 2021-08-23T19:58:45Z | - |
dc.date.available | 2021-08-23T19:58:45Z | - |
dc.date.issued | 2021 | - |
dc.identifier.uri | http://hdl.handle.net/11375/26804 | - |
dc.description.abstract | Multivariate Gaussians: We provide sample complexity upper bounds for semi-agnostically learning multivariate Gaussians under the constraint of approximate differential privacy. These are the first finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution. Our bounds are near-optimal in the case when the covariance is known to be the identity, and conjectured to be near-optimal in the general case. From a technical standpoint, we provide analytic tools for arguing the existence of global "locally small" covers from local covers of the space. These are exploited using modifications of recent techniques for differentially private hypothesis selection. Mixtures of Gaussians: We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We provide the first sample complexity upper bounds for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. To prove our results, we design a new technique for privately learning mixture distributions. A class of distributions F is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from a distribution f in F, outputs a list of distributions, H, such that one of the distributions in H approximates f. We show that if F is privately list-decodable then we can privately learn mixtures of distributions in F. Finally, we show that axis-aligned Gaussian distributions are privately list-decodable, thereby proving that mixtures of such distributions are privately learnable. | en_US |
dc.language.iso | en | en_US |
dc.subject | PAC learning, Learning Theory, Distribution Learning, Differential Privacy, Mixtures of Gaussians | en_US |
dc.title | On the Sample Complexity of Privately Learning Gaussians and their Mixtures | en_US |
dc.title.alternative | Privately Learning Gaussians and their Mixtures | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Computing and Software | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Master of Science (MSc) | en_US |
dc.description.layabstract | Is it possible to estimate an unknown probability distribution given random samples from it? This is a fundamental problem known as distribution learning (or density estimation) that has been studied by statisticians for decades, and in recent years has become a topic of interest for computer scientists. While distribution learning is a mature and well understood problem, in many cases the samples (or data) we observe may consist of sensitive information belonging to individuals and well-known solutions may inadvertently result in the leakage of private information. In this thesis we study distribution learning under the assumption that the data is generated from high-dimensional Gaussians (or their mixtures) with the aim of understanding how many samples an algorithm needs before it can guarantee a good estimate. Furthermore, to protect against leakage of private information, we consider approaches that satisfy differential privacy — the gold standard for modern private data analysis. | en_US |
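To make the notion of approximate differential privacy referenced in the abstracts concrete, the following is a minimal illustrative sketch (not the thesis's algorithm) of the standard Gaussian mechanism for privately estimating a mean: each sample is clipped to a bounded range and Gaussian noise calibrated to the resulting sensitivity is added. The function name, parameters, and clipping bound are hypothetical choices for illustration only.

```python
import numpy as np

def private_mean(samples, clip_bound, epsilon, delta, seed=None):
    """Estimate the mean of `samples` with (epsilon, delta)-differential
    privacy via the Gaussian mechanism: clip each sample to
    [-clip_bound, clip_bound], then add noise scaled to the sensitivity."""
    rng = np.random.default_rng(seed)
    clipped = np.clip(np.asarray(samples, dtype=float), -clip_bound, clip_bound)
    n = len(clipped)
    # Changing one sample moves the clipped mean by at most 2*clip_bound / n.
    sensitivity = 2.0 * clip_bound / n
    # Standard Gaussian-mechanism calibration (valid for epsilon <= 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped.mean() + rng.normal(0.0, sigma)
```

With many samples the added noise is small, so accuracy degrades gracefully as the privacy parameters tighten; the thesis's question is how many samples such guarantees require for Gaussians and their mixtures.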
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Aden-Ali_Ishaq_2021July_MSc.pdf | | 622.48 kB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.