Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/25973
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Anderson, Laura N. | -
dc.contributor.author | Morgenstern, Jason D. | -
dc.date.accessioned | 2020-10-22T19:23:35Z | -
dc.date.available | 2020-10-22T19:23:35Z | -
dc.date.issued | 2020 | -
dc.identifier.uri | http://hdl.handle.net/11375/25973 | -
dc.description | McMaster University MASTER OF PUBLIC HEALTH (2020) Hamilton, Ontario (Health Research Methods, Evidence, and Impact) TITLE: Applying Machine Learning to Determine Nutrients Predictive of Cardiovascular Disease Using Canadian Linked Population-Based Data AUTHOR: Jason D. Morgenstern, B.Sc. (University of Guelph), M.D. (Western University) SUPERVISOR: Professor L.N. Anderson, NUMBER OF PAGES: xv, 121 | en_US
dc.description.abstract | The use of big data and machine learning may help to address some challenges in nutritional epidemiology. The first objective of this thesis was to explore the use of machine learning prediction models in a hypothesis-generating approach to evaluate how detailed dietary features contribute to CVD risk prediction. The second objective was to assess the predictive performance of the models. A population-based retrospective cohort study was conducted using linked Canadian data from 2004 – 2018. Study participants were adults age 20 and older (n = 12,130) who completed the 2004 Canadian Community Health Survey, Cycle 2.2, Nutrition (CCHS 2.2). Statistics Canada has linked the CCHS 2.2 data to the Discharge Abstracts Database and the Canadian Vital Statistics Death database, which were used to determine cardiovascular outcomes (stroke or ischemic heart disease events or deaths). Conditional inference forests were used to develop models. Then, permutation feature importance (PFI) and accumulated local effects (ALEs) were calculated to explore contributions of nutrients to predicted disease. Supplement-use (median PFI (M) = 4.09 x 10^-4, IQR = 8.25 x 10^-7 to 1.11 x 10^-3) and caffeine (M = 2.79 x 10^-4, IQR = -9.11 x 10^-5 to 5.86 x 10^-4) had the highest median PFIs for nutrition-related features. Supplement-use was associated with decreased predicted risk of CVD (accumulated local effects range (ALER) = -3.02 x 10^-4 to 2.76 x 10^-4) and caffeine was associated with increased predicted risk (ALER = -9.96 x 10^-4 to 0.035). The best-performing model had a logarithmic loss of 0.248. Overall, many non-linear relationships were observed, including threshold, j-shaped, and u-shaped. The results of this exploratory study suggest that applying machine learning to the nutritional epidemiology of CVD, particularly using big datasets, may help elucidate risks and improve predictive models. Given the limited application thus far, work such as this could lead to improvements in public health recommendations and policy related to dietary behaviours. | en_US
dc.language.iso | en | en_US
dc.subject | machine learning | en_US
dc.subject | nutritional epidemiology | en_US
dc.subject | public health | en_US
dc.subject | artificial intelligence | en_US
dc.subject | conditional inference forest | en_US
dc.subject | interpretable machine learning | en_US
dc.subject | population health survey | en_US
dc.subject | data linkage | en_US
dc.subject | cardiovascular disease | en_US
dc.subject | nutrition | en_US
dc.subject | prediction | en_US
dc.subject | predictive modeling | en_US
dc.title | Applying Machine Learning to Explore Nutrients Predictive of Cardiovascular Disease Using Canadian Linked Population-Based Data | en_US
dc.title.alternative | Machine Learning to Predict Cardiovascular Disease with Nutrition | en_US
dc.type | Thesis | en_US
dc.contributor.department | Clinical Epidemiology/Clinical Epidemiology & Biostatistics | en_US
dc.description.degreetype | Thesis | en_US
dc.description.degree | Master of Public Health (MPH) | en_US
dc.description.layabstract | This work explores the potential for machine learning to improve the study of diet and disease. In chapter 2, opportunities are identified for big data to make diet easier to measure. Also, we highlight how machine learning could find new, complex relationships between diet and disease. In chapter 3, we apply a machine learning algorithm, called conditional inference forests, to a unique Canadian dataset to predict whether people developed strokes or heart attacks. This dataset included responses to a health survey conducted in 2004, where participants’ responses have been linked to administrative databases that record when people go to hospital or die up until 2017. Using these techniques, we identified aspects of nutrition that predicted disease, including caffeine, alcohol, and supplement-use. This work suggests that machine learning may be helpful in our attempts to understand the relationships between diet and health. | en_US
Appears in Collections: Open Access Dissertations and Theses
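
The abstract in this record reports a logarithmic loss for the best-performing model and ranks features by permutation feature importance (PFI). As a rough illustration of those two quantities only, the sketch below computes binary log loss and a PFI defined as the mean increase in log loss after shuffling one feature. Everything in it is an assumption made for illustration: the toy logistic scorer `toy_predict`, the feature names `caffeine` and `noise`, the synthetic rows, and the choice of log loss as the PFI metric. It does not reproduce the thesis's conditional inference forests, its data, or its exact PFI definition.

```python
import math
import random


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def permutation_importance(predict, rows, y, feature, n_repeats=20, seed=0):
    """Mean increase in log loss when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = log_loss(y, [predict(row) for row in rows])
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        # Copy each row with only the chosen feature replaced by a shuffled value.
        permuted = [dict(row, **{feature: value}) for row, value in zip(rows, column)]
        permuted_loss = log_loss(y, [predict(row) for row in permuted])
        increases.append(permuted_loss - baseline)
    return sum(increases) / n_repeats


def toy_predict(row):
    """Hypothetical scorer: risk depends on 'caffeine' only and ignores 'noise'."""
    z = -3.0 + 2.0 * row["caffeine"]
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic rows (not thesis data): the outcome is 1 exactly when caffeine >= 2.
rows = [{"caffeine": float(i % 4), "noise": float(i % 7)} for i in range(40)]
y = [1 if row["caffeine"] >= 2 else 0 for row in rows]

pfi_caffeine = permutation_importance(toy_predict, rows, y, "caffeine")
pfi_noise = permutation_importance(toy_predict, rows, y, "noise")
print("PFI caffeine:", pfi_caffeine)
print("PFI noise:", pfi_noise)
```

In this toy setup, shuffling `noise` leaves every prediction unchanged, so its importance is zero, while shuffling `caffeine` breaks the link between the feature and the outcome and raises the loss. A feature whose PFI interval spans zero, as reported for caffeine's IQR in the abstract, is one whose shuffling sometimes made the model's loss slightly better rather than worse.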

Files in This Item:
File | Size | Format
morgenstern_jason_d_202010_mph.pdf | 4.43 MB | Adobe PDF
Access is allowed from: 2021-10-15


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
