Automated Text Mining and Ranked List Algorithms for Drug Discovery in Acute Myeloid Leukemia

dc.contributor.advisor: Hope, Kristin
dc.contributor.advisor: McArthur, Andrew
dc.contributor.advisor: Leber, Brian
dc.contributor.author: Tran, Damian
dc.contributor.department: Health Sciences [en_US]
dc.date.accessioned: 2019-10-02T14:17:25Z
dc.date.available: 2019-10-02T14:17:25Z
dc.date.issued: 2019
dc.description.abstract: Evidence-based software engineering (EBSE) solutions for drug discovery that are effective, affordable, and accessible all-in-one are lacking. This thesis chronicles the progression and accomplishments of the AiDA (Artificially-intelligent Desktop Assistant) functional artificial intelligence (AI) project for the purposes of drug discovery in the challenging acute myeloid leukemia context (AML). AiDA is a highly automated combined natural language processing (NLP) and spreadsheet feature extraction solution that harbours potential to disrupt the state of current research investigation methods using big data and aggregated literature. The completed work includes a text-to-function (T2F) NLP method for automated text interpretation, a ranked-list algorithm for multi-dataset analysis, and a custom multi-purpose neural network engine presented to the user using an open-source graphics engine. Validation of the deep learning engine using MNIST and CIFAR machine learning benchmark datasets showed performance comparable to state-of-the-art libraries using similar architectures. An n-dimensional word embedding method for the handling of unstructured natural language data was devised to feed convolutional neural network (CNN) models that over 25 random permutations correctly predicted functional responses to up to 86.64% of over 300 validation transcripts. The same CNN NLP infrastructure was then used to automate biomedical context recognition in >20000 literature abstracts with up to 95.7% test accuracy over several permutations. The AiDA platform was used to compile a bidirectional ranked list of potential gene targets for pharmaceuticals by extracting features from leukemia microarray data, followed by mining of the PubMed biomedical citation database to extract recyclable pharmaceutical candidates. Downstream analysis of the candidate therapeutic targets revealed enrichments in AML- and leukemic stem cell (LSC)-related pathways. The applicability of the AiDA algorithms in whole and part to the larger biomedical research field is explored. [en_US]
dc.description.degree: Master of Science (MSc) [en_US]
dc.description.degreetype: Thesis [en_US]
dc.description.layabstract: Lead generation is an integral requirement of any research organization in all fields and is typically a time-consuming and therefore expensive task. This is due to the requirement of human intuition to be applied iteratively over a large body of evidence. In this thesis, a new technology called the Artificially-intelligent Desktop Assistant (AiDA) is explored in order to provide a large number of leads from accumulated biomedical information. AiDA was created using a combination of classical statistics, deep learning methods, and modern graphical interface engineering. It aims to simplify the interface between the researcher and an assortment of bioinformatics tasks by organically interpreting written text messages and responding with the appropriate task. AiDA was able to identify several potential targets for new pharmaceuticals in acute myeloid leukemia (AML), a cancer of the blood, by reading whole-genome data. It then discovered appropriate therapeutics by automatically scanning through the accumulated body of biomedical research papers. Analysis of the discovered drug targets shows that together, they are involved in key biological processes that are known by the scientific community to be involved in leukemia and other cancers. [en_US]
dc.identifier.uri: http://hdl.handle.net/11375/24866
dc.language.iso: en [en_US]
dc.subject: Acute myeloid leukemia [en_US]
dc.subject: Drug discovery [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Artificial intelligence [en_US]
dc.subject: Literature review [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Automated data analysis [en_US]
dc.subject: Chatbot [en_US]
dc.subject: Convolutional neural network [en_US]
dc.subject: Automated pipeline [en_US]
dc.title: Automated Text Mining and Ranked List Algorithms for Drug Discovery in Acute Myeloid Leukemia [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Now showing 1 - 5 of 9
Name: tran_damian_v_201909_MSc.pdf
Size: 4.11 MB
Format: Adobe Portable Document Format
Description: Main thesis document

Name: S6_mutation_enrichments.xlsx
Size: 67.93 KB
Format: Microsoft Excel XML
Description: Supplementary Table S6

Name: S7_context_abstracts.xlsx
Size: 476.53 KB
Format: Microsoft Excel XML
Description: Supplementary Table S7

Name: S8_context_filtered.tsv
Size: 415.05 KB
Format: Unknown data format
Description: Supplementary Table S8

Name: S1_drug_target_gene_ranks.xlsx
Size: 625.98 KB
Format: Microsoft Excel XML
Description: Supplementary Table S1

License bundle

Now showing 1 - 1 of 1
Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission