Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/25973
Title: | Applying Machine Learning to Explore Nutrients Predictive of Cardiovascular Disease Using Canadian Linked Population-Based Data |
Other Titles: | Machine Learning to Predict Cardiovascular Disease with Nutrition |
Authors: | Morgenstern, Jason D. |
Advisor: | Anderson, Laura N. |
Department: | Clinical Epidemiology/Clinical Epidemiology & Biostatistics |
Keywords: | machine learning;nutritional epidemiology;public health;artificial intelligence;conditional inference forest;interpretable machine learning;population health survey;data linkage;cardiovascular disease;nutrition;prediction;predictive modeling |
Publication Date: | 2020 |
Abstract: | The use of big data and machine learning may help to address some challenges in nutritional epidemiology. The first objective of this thesis was to explore the use of machine learning prediction models in a hypothesis-generating approach to evaluate how detailed dietary features contribute to CVD risk prediction. The second objective was to assess the predictive performance of the models. A population-based retrospective cohort study was conducted using linked Canadian data from 2004 – 2018. Study participants were adults age 20 and older (n=12 130 ) who completed the 2004 Canadian Community Health Survey, Cycle 2.2, Nutrition (CCHS 2.2). Statistics Canada has linked the CCHS 2.2 data to the Discharge Abstracts Database and the Canadian Vital Statistics Death database, which were used to determine cardiovascular outcomes (stroke or ischemic heart disease events or deaths). Conditional inference forests were used to develop models. Then, permutation feature importance (PFI) and accumulated local effects (ALEs) were calculated to explore contributions of nutrients to predicted disease. Supplement-use (median PFI (M)=4.09 x 10-4, IQR=8.25 x 10-7 – 1.11 x 10-3) and caffeine (M=2.79 x 10-4, IQR= -9.11 x 10-5 – 5.86 x 10-4) had the highest median PFIs for nutrition-related features. Supplement-use was associated with decreased predicted risk of CVD (accumulated local effects range (ALER)= -3.02 x 10-4 – 2.76 x 10-4) and caffeine was associated with increased predicted risk (ALER= -9.96 x 10-4 – 0.035). The best-performing model had a logarithmic loss of 0.248. Overall, many non-linear relationships were observed, including threshold, j-shaped, and u-shaped. The results of this exploratory study suggest that applying machine learning to the nutritional epidemiology of CVD, particularly using big datasets, may help elucidate risks and improve predictive models. Given the limited application thus far, work such as this could lead to improvements in public health recommendations and policy related to dietary behaviours. |
Description: | McMaster University MASTER OF PUBLIC HEALTH (2020) Hamilton, Ontario (Health Research Methods, Evidence, and Impact) TITLE: Applying Machine Learning to Determine Nutrients Predictive of Cardiovascular Disease Using Canadian Linked Population-Based Data AUTHOR: Jason D. Morgenstern, B.Sc. (University of Guelph), M.D. (Western University) SUPERVISOR: Professor L.N. Anderson, NUMBER OF PAGES: xv, 121 |
URI: | http://hdl.handle.net/11375/25973 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
morgenstern_jason_d_202010_mph.pdf | 4.43 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.