Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/24866
Title: | Automated Text Mining and Ranked List Algorithms for Drug Discovery in Acute Myeloid Leukemia |
Authors: | Tran, Damian |
Advisor: | Hope, Kristin;McArthur, Andrew;Leber, Brian |
Department: | Health Sciences |
Keywords: | Acute myeloid leukemia;Drug discovery;Deep learning;Artificial intelligence;Literature review;Natural language processing;Automated data analysis;Chatbot;Convolutional neural network;Automated pipeline |
Publication Date: | 2019 |
Abstract: | Evidence-based software engineering (EBSE) solutions for drug discovery that are at once effective, affordable, and accessible are lacking. This thesis chronicles the progression and accomplishments of the AiDA (Artificially-intelligent Desktop Assistant) functional artificial intelligence (AI) project for drug discovery in the challenging context of acute myeloid leukemia (AML). AiDA is a highly automated solution that combines natural language processing (NLP) with spreadsheet feature extraction, and it has the potential to disrupt current research investigation methods by using big data and aggregated literature. The completed work includes a text-to-function (T2F) NLP method for automated text interpretation, a ranked-list algorithm for multi-dataset analysis, and a custom multi-purpose neural network engine presented to the user through an open-source graphics engine. Validation of the deep learning engine using the MNIST and CIFAR machine learning benchmark datasets showed performance comparable to state-of-the-art libraries using similar architectures. An n-dimensional word embedding method for handling unstructured natural language data was devised to feed convolutional neural network (CNN) models; across 25 random permutations, these models correctly predicted functional responses for up to 86.64% of more than 300 validation transcripts. The same CNN NLP infrastructure was then used to automate biomedical context recognition in more than 20,000 literature abstracts, with up to 95.7% test accuracy over several permutations. The AiDA platform was used to compile a bidirectional ranked list of potential gene targets for pharmaceuticals by extracting features from leukemia microarray data, and then to mine the PubMed biomedical citation database for recyclable pharmaceutical candidates. Downstream analysis of the candidate therapeutic targets revealed enrichment in AML- and leukemic stem cell (LSC)-related pathways.
The applicability of the AiDA algorithms in whole and part to the larger biomedical research field is explored. |
URI: | http://hdl.handle.net/11375/24866 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
tran_damian_v_201909_MSc.pdf | Main thesis document | 4.21 MB | Adobe PDF |
S6_mutation_enrichments.xlsx | Supplementary Table S6 | 67.93 kB | Microsoft Excel XML |
S7_context_abstracts.xlsx | Supplementary Table S7 | 476.53 kB | Microsoft Excel XML |
S8_context_filtered.tsv | Supplementary Table S8 | 415.05 kB | Unknown |
S1_drug_target_gene_ranks.xlsx | Supplementary Table S1 | 625.98 kB | Microsoft Excel XML |
S3_AI_func_dataset.tsv | Supplementary Table S3 | 66.94 kB | Unknown |
S2_categories_verbal.tsv | Supplementary Table S2 | 44.87 kB | Unknown |
S4_gene_abstracts.tsv | Supplementary Table S4 | 64.52 MB | Unknown |
S5_drug_abstracts.tsv | Supplementary Table S5 | 62.35 MB | Unknown |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.