Skip navigation
  • Home
  • Browse
    • Communities
      & Collections
    • Browse Items by:
    • Publication Date
    • Author
    • Title
    • Subject
    • Department
  • Sign on to:
    • My MacSphere
    • Receive email
      updates
    • Edit Profile


McMaster University Home Page
  1. MacSphere
  2. Open Access Dissertations and Theses Community
  3. Open Access Dissertations and Theses
Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/25041
Full metadata record
DC FieldValueLanguage
dc.contributor.advisorSekerinski, Emil-
dc.contributor.advisorCopp, John-
dc.contributor.authorYAN, YAN-
dc.date.accessioned2019-10-24T15:30:18Z-
dc.date.available2019-10-24T15:30:18Z-
dc.date.issued2019-
dc.identifier.urihttp://hdl.handle.net/11375/25041-
dc.description.abstractReal-time water quality monitoring using automated systems with sensors is becoming increasingly common, which enables and demands timely identification of unexpected values. Technical issues create anomalies, which at the rate of incoming data can prevent the manual detection of problematic data. This thesis deals with the problem of anomaly detection for water quality data using machine learning and statistic learning approaches. Anomalies in data can cause serious problems in posterior analysis and lead to poor decisions or incorrect conclusions. Five time series anomaly detection techniques: local outlier factor (machine learning), isolation forest (machine learning), robust random cut forest (machine learning), seasonal hybrid extreme studentized deviate (statistic learning approach), and exponential moving average (statistic learning approach) have been analyzed. Extensive experimental analysis of those techniques have been performed on data sets collected from sensors deployed in a wastewater treatment plant. The results are very promising. In the experiments, three approaches successfully detected anomalies in the ammonia data set. With the temperature data set, the local outlier factor successfully detected all twenty-six outliers whereas the seasonal hybrid extreme studentized deviate only detected one anomaly point. The exponential moving average identified ten time ranges with anomalies. Eight of them cover a total of fourteen anomalies. The reproducible experiments demonstrate that local outlier factor is a feasible approach for detecting anomalies in water quality data. Isolation forest and robust random cut forest also rate high anomaly scores for the anomalies. The result of the primary experiment confirms that local outlier factor is much faster than isolation forest, robust random cut forest, seasonal hybrid extreme studentized deviate and exponential moving average.en_US
dc.language.isoenen_US
dc.subjectAnomaly Detectionen_US
dc.subjectWater Qualityen_US
dc.subjectMachine Learningen_US
dc.subjectLocal Outlier Factoren_US
dc.subjectIsolation Foresten_US
dc.subjectRandom Cut Foresten_US
dc.subjectS-H-ESDen_US
dc.subjectEMAen_US
dc.subjectStatistic Learningen_US
dc.titleAnomaly Detection for Water Quality Dataen_US
dc.typeThesisen_US
dc.contributor.departmentComputing and Softwareen_US
dc.description.degreetypeThesisen_US
dc.description.degreeMaster of Computer Science (MCS)en_US
Appears in Collections:Open Access Dissertations and Theses

Files in This Item:
File Description SizeFormat 
YAN_YAN_201906_M.Sc..pdf
Open Access
3.46 MBAdobe PDFView/Open
Show simple item record Statistics


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.

Sherman Centre for Digital Scholarship     McMaster University Libraries
©2022 McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8 | 905-525-9140 | Contact Us | Terms of Use & Privacy Policy | Feedback

Report Accessibility Issue