Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/27284
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Ayers, Paul | -
dc.contributor.author | Meng, Fanwang | -
dc.date.accessioned | 2022-01-19T20:41:17Z | -
dc.date.available | 2022-01-19T20:41:17Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://hdl.handle.net/11375/27284 | -
dc.description.abstract | Machine learning (ML) has enjoyed great success in chemistry and drug design, from designing synthetic pathways to drug screening to biomolecular property prediction. However, the generalizability and robustness of an ML model depend on high-quality training data, which is often difficult to obtain, especially when the data come from experimental measurements. While one can always discard all data associated with noisy and/or missing values, this often means discarding valuable data. This thesis presents mathematical techniques for addressing this problem and applies them to problems in molecular medicinal chemistry. In Chapter 1, we show that the missing-data problem can be expressed as a matrix completion problem, and we point out how frequently matrix completion problems arise in (bio)chemistry. In Chapter 2, we use matrix completion to impute the missing values in protein-NMR data, and use this as a stepping-stone for understanding protein allostery. This chapter also uses several other techniques from statistical data analysis and machine learning, including denoising (by robust principal component analysis), latent feature identification (by singular-value decomposition), and residue clustering (by a Gaussian mixture model). In Chapter 3, Δ-learning is used to predict free energies of hydration (Δ𝐺). The aim of this study is to correct hydration energies estimated from low-level quantum chemistry calculations with continuum solvation models, without significant additional computation. Extensive feature engineering, 8 different regression algorithms, and Gaussian process regression (with 38 different kernels) were used to construct the predictive models. The optimal model gives an MAE of 0.6249 kcal/mol and an RMSE of 1.0164 kcal/mol. Chapter 4 presents an open-source computational tool, Procrustes, for finding the maximum similarity between matrices. Some examples are given to show how to use Procrustes for chemical and biological problems. Finally, in Chapters 5 and 6, a database of blood-brain barrier (BBB) permeability was curated and combined with resampling strategies to build predictive models. The resulting models show promising performance and are released along with a computational tool, B3clf, for its evaluation. | en_US
dc.language.iso | en | en_US
dc.subject | machine learning | en_US
dc.subject | computational drug design | en_US
dc.subject | matrix completion | en_US
dc.subject | missing value | en_US
dc.subject | Δ-Learning | en_US
dc.subject | imbalanced learning | en_US
dc.title | Overcoming the Curse of Missing and Noisy Data in Computational Drug Design | en_US
dc.type | Thesis | en_US
dc.contributor.department | Chemistry and Chemical Biology | en_US
dc.description.degreetype | Thesis | en_US
dc.description.degree | Doctor of Science (PhD) | en_US
Appears in Collections: Open Access Dissertations and Theses
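
The abstract describes Δ-learning as correcting a cheap, low-level estimate of the hydration free energy rather than predicting the value from scratch. The sketch below illustrates that idea only in its simplest form and is not the thesis's method: it swaps in a one-feature ordinary least-squares fit for the regression algorithms and Gaussian process kernels the thesis reports. The descriptor, the energies, and the function names are all hypothetical, and the toy numbers are invented for illustration. For brevity the correction is evaluated on the same points it was fitted to; a real study would use held-out data.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y ~ a*x + b with a single feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def mae(pred, ref):
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def rmse(pred, ref):
    """Root-mean-square error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

# Toy data, invented for illustration only (NOT taken from the thesis).
feature = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical molecular descriptor
low_level = [-1.0, -2.5, -3.0, -5.5, -6.0]   # cheap estimate, kcal/mol
reference = [-1.5, -2.5, -2.5, -4.5, -4.5]   # target values, kcal/mol

# Delta-learning: learn the residual (reference - low_level),
# not the target itself.
delta = [r - l for r, l in zip(reference, low_level)]
a, b = fit_line(feature, delta)

# Corrected prediction = low-level estimate + learned correction.
corrected = [l + a * f + b for l, f in zip(low_level, feature)]

print("MAE before correction:", mae(low_level, reference))
print("MAE after correction: ", mae(corrected, reference))
print("RMSE after correction:", rmse(corrected, reference))
```

The design point is that the model only has to learn the (usually smaller and smoother) difference between two levels of theory, so the low-level calculation carries most of the physics and the learned term adds little computational cost.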

Files in This Item:
File | Description | Size | Format
Meng_Fanwang_202112_PhD.pdf (access is allowed from 2023-01-10) | Fanwang Meng's Ph.D. thesis | 29.09 MB | Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
