Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/26844
Title: INVESTIGATING THE EFFECT OF CLUSTER-BASED PREPROCESSING ON SOURCE-TO-SOURCE CODE TRANSLATION
Authors: Loganathan, Akila
Advisor: Paige, Richard
Department: Computing and Software
Publication Date: 2021
Abstract: Numerous programming languages have been proposed over the last 60 years. Programming languages, like other software systems, can become obsolete: their compilers, virtual machines, interpreters and libraries are no longer fit for purpose. As such, programs written in obsolete programming languages may need to be modernized so that they rely instead on modern languages, libraries and tools. Modernization is both a technical and a social process; in this thesis, we focus on its technical aspects, particularly software migration, wherein a program written in one programming language is transformed into an equivalent or similar program written in a different language. Migration is needed because many software systems developed decades ago can no longer be maintained and must be overhauled before new processes that take advantage of recently developed technologies can be implemented. Migrating an existing codebase to a more efficient, modern programming language is often expensive and carries several types of risk: many functionalities may not be implemented correctly after migration (i.e., the migration is inaccurate); concerns about code quality may not be considered until the end of the migration; and for large codebases, the migration process may be slow and may demand substantial resources. Recent advances in Artificial Intelligence for natural language translation have been widely accepted, but their application to programming language translation has been limited by the scarcity of parallel data (i.e., collections of equivalent phrases in a source language paired with their translations in a target language). This thesis presents a preliminary investigation into the use of unsupervised learning methods, specifically a newly proposed K-Means clustering approach for preprocessing and analyzing source code, prior to rule-based code translation.
The thesis investigates this process both in general, abstract terms and specifically in the context of a concrete migration from C++ to Java. The thesis also presents a test set for evaluating such an approach, based on open-source software, which can serve as a general resource for both validating migration approaches and assessing their performance. The test results and our experiments show that the proposed translation approach, which uses unsupervised machine learning for preprocessing, achieves translation accuracy scores of 77.89% and 81.34%, compared with 33.24% and 59.96% for an alternative approach and 37.39% and 41.26% for rule-based translation without the preprocessing step.
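The abstract does not state which features the clustering step uses or what the translation rules are, so the following Python sketch only illustrates the general idea of clustering C++ source lines with K-Means and then applying a rule set chosen per cluster. The regex features (`FEATURES`), the rule table (`RULES`) and the helpers `featurize`, `nearest`, `kmeans` and `translate` are hypothetical and are not taken from the thesis.

```python
import math
import re

# Hypothetical syntactic features (not from the thesis): each C++ line
# becomes a vector of regex match counts.
FEATURES = [
    r"#include",                     # 0: preprocessor includes
    r"std::cout|cout\s*<<",          # 1: console output
    r"\bbool\b|\bint\b|\bdouble\b",  # 2: primitive-type keywords
]

# Hypothetical C++ -> Java rewrite rules, grouped by the feature they target.
RULES = {
    0: [(r"#include\s*<[^>]*>", "// C++ include removed; Java uses import statements")],
    1: [(r"std::cout\s*<<\s*(.+?)\s*<<\s*std::endl\s*;", r"System.out.println(\1);")],
    2: [(r"\bbool\b", "boolean")],
}


def featurize(line):
    """Map one source line to a feature vector of match counts."""
    return [float(len(re.findall(p, line))) for p in FEATURES]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means with a deterministic init (first k distinct points)."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        updated = []
        for c in range(len(centroids)):                   # update step
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return labels, centroids


def translate(lines, k=4):
    """Cluster lines, then apply only the rule groups whose feature is active in the cluster's centroid."""
    labels, centroids = kmeans([featurize(line) for line in lines], k)
    out = []
    for line, label in zip(lines, labels):
        for f, weight in enumerate(centroids[label]):
            if weight > 0:
                for pattern, repl in RULES[f]:
                    line = re.sub(pattern, repl, line)
        out.append(line)
    return out


if __name__ == "__main__":
    cpp = [
        "#include <iostream>",
        "int main() {",
        "    bool done = false;",
        '    std::cout << "hi" << std::endl;',
        "    return 0;",
        "}",
    ]
    for java_line in translate(cpp):
        print(java_line)
```

Running the script replaces the include with a comment, rewrites `bool` to `boolean`, and turns the `std::cout` statement into `System.out.println("hi");`, while `int main() {` and `return 0;` pass through unchanged. The clustering step only decides which rule groups each line is exposed to; a real C++ to Java migration would need far richer features and rules than this toy table.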
URI: http://hdl.handle.net/11375/26844
Appears in Collections:Open Access Dissertations and Theses

Files in This Item:
File: Loganathan_Akila_Finalsubmission2021August_MSc.pdf
Description: Open Access
Size: 732.12 kB
Format: Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
