, March 2018 - Present
This project was completed by three teammates and me as the final project for UWaterloo's Winter 2018 Statistical Learning - Classification
course. We designed and implemented a new CNN architecture that performs facial verification with high accuracy without compromising ease of implementation or speed of execution. The architecture is a modified VGG16 network used for transfer learning, with pre-trained ImageNet weights as the starting point. One of the unique features we introduced is target-specific parameter prediction, which lets the target image weights interact more directly with the candidate weights, as opposed to simple proximity scoring based on absolute distance. The training and testing data for this project are pulled from the Labeled Faces in the Wild Database
. Due to time and resource constraints, we have so far only achieved 97% accuracy on the training data. We have since obtained access to a 4-GPU machine from the Math Department and are continuing the project.
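As a rough illustration of the transfer-learning starting point (a VGG16 base with pre-trained ImageNet weights feeding a small embedding head), here is a minimal Keras sketch. The input resolution, layer sizes, and embedding head are illustrative assumptions, and the target-specific parameter-prediction component is not reproduced here.

```python
# Minimal transfer-learning sketch: frozen VGG16 base (ImageNet weights)
# with a small embedding head. Head sizes are illustrative, not the actual
# architecture; the target-specific parameter prediction is omitted.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_embedding_network(embedding_dim=128):
    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False                      # keep pre-trained weights fixed

    inputs = layers.Input(shape=(224, 224, 3))
    x = base(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(embedding_dim)(x)    # face embedding vector
    return Model(inputs, outputs)

# For simple proximity scoring, two faces are compared by the distance
# between their embeddings; a smaller distance indicates a likely match.
embedder = build_embedding_network()
```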
The current paper draft
provides a clear explanation of the model design and implementation. Our team's main focus this term is to tune the network so that it consistently achieves >95% testing accuracy without increasing runtime, and to produce a final draft to submit to NIPS or CVPR. The Jupyter notebook we have been working on up until recently can be found here
. The home directory for the project is pushed to GitHub
Multiple Imputation for Latent Variables
, Apr 2018
As a part of the Winter 2018 Computational Inference
course final project, my teammate and I designed, implemented, documented, and thoroughly tested an R package that performs multiple imputation and pooled-result analyses on survey data with ordinal responses. We were inspired by a multicultural identity survey data set consisting of ordinal-response questions designed to gauge respondents' opinions on three particular aspects of multicultural identity. These survey design aspects are treated as latent variables that need to be imputed. Our R package performs quantile regression on the survey responses to obtain a discrete-to-continuous variable mapping, then uses a probit model and factor analysis to sample continuous values of the mapped responses. The hypothesized latent variables are imputed multiple times and fitted with a linear model, and the parameters are pooled and estimated using Rubin's rules. The project opened our eyes to the world of survey data analysis and pushed us to be more creative with our problem solving.
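The pooling step uses Rubin's rules, which are standard; as a rough illustration (the package itself is written in R, so the function below is only a hypothetical Python sketch):

```python
# Rubin's rules for pooling one parameter across m imputations
# (illustrative sketch; the actual package is written in R).
import numpy as np

def pool_rubins_rules(estimates, variances):
    """estimates, variances: length-m arrays of per-imputation point
    estimates and their squared standard errors for one parameter."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)

    q_bar = estimates.mean()           # pooled point estimate
    w_bar = variances.mean()           # within-imputation variance
    b = estimates.var(ddof=1)          # between-imputation variance
    t = w_bar + (1 + 1 / m) * b        # total variance
    return q_bar, np.sqrt(t)           # pooled estimate and standard error

# Example: pool a slope coefficient estimated on 5 imputed data sets.
est, se = pool_rubins_rules([0.42, 0.45, 0.40, 0.44, 0.43],
                            [0.010, 0.012, 0.011, 0.009, 0.010])
```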
I have re-organized the submitted final report into an easy-to-navigate web page that can be accessed here
. (Some mathematical symbols on the page may not be supported by older browsers.) Please note that the figures and tables in the report are not generated in real time by the code segments, due to rendering and runtime issues described in the "Discussion" section of the report. This page is a replica of the summary.html
page generated from the respective Rmarkdown file located in my personal branch of the project repository, which can be accessed here
Parametric and Non-Parametric Analyses on Financial Data
, Apr 2018
For my Winter 2018 Statistical Learning - Function Estimation
course's final project, my teammates and I surveyed a variety of parametric and non-parametric models to predict the log-scaled retained earnings of publicly listed U.S. companies. The data set we used was obtained from Quandl
. Working with this data set was tricky: almost half of the companies had a large number of missing columns because they had been delisted. Additionally, the original retained earnings contained outliers that significantly affected the model fit but whose removal would have made the model less general. A lot of work therefore went into data cleaning, pre-processing, and re-scaling.
Prior to model tuning, we used smoothing splines to perform variable selection. The statistical methods surveyed in this project were thin-plate splines
, random forest
, and boosting
. One hundred training and testing sets were re-sampled from the cleaned data using the bootstrap; these data sets were used to calculate the Average Predicted Squared Error (APSE) when tuning each model's hyper-parameters.
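As a rough illustration of the bootstrap/APSE tuning loop (the original analysis was carried out in R; the random forest shown here stands in for any of the surveyed models, and the data arrays are placeholders):

```python
# Bootstrap APSE sketch for hyper-parameter tuning (illustrative only;
# the original analysis was done in R).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def bootstrap_apse(X, y, n_boot=100, **rf_params):
    """Average Predicted Squared Error over bootstrap train/test splits."""
    rng = np.random.default_rng(0)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        train_idx = rng.integers(0, n, size=n)            # bootstrap sample
        test_idx = np.setdiff1d(np.arange(n), train_idx)  # out-of-bag rows
        model = RandomForestRegressor(random_state=0, **rf_params)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return np.mean(errors)  # APSE: average of the per-resample errors

# Compare hyper-parameter settings by their APSE, e.g. number of trees:
# apse_500 = bootstrap_apse(X, y, n_estimators=500)
```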
Like the previous Multiple Imputation project, this project's final report has also been re-compiled into an easy-to-navigate web page, accessible here
. The original Rmarkdown file (which also includes the R code used) and the data set can be found in the GitHub repository here
, May - Aug 2017
Waterloop is the University of Waterloo's team for the Hyperloop Pod Competition held annually at SpaceX headquarters in California.
I worked with the team as a software systems developer between May and August 2017, with a focus on data analysis and telemetry. Major tasks I took on included the design and implementation of a navigation system using IMUs and optical sensors, noise reduction of raw sensor data using support vector regression, and the re-design of the system state diagram. I was fortunate to work with amazingly talented developers and engineers from many different programs, and it was a wonderful learning experience. For more information, please visit our website at
Since the Summer 2017 competition season, major sections of the Waterloop pod's embedded software system have been re-designed, and my previous contributions have been archived as a result. I have migrated my work on navigation and telemetry to an archive directory in my personal GitHub repository, which can be accessed here
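As a rough illustration of the kind of SVR-based noise reduction mentioned above (the archived pod code differs; the signal and hyper-parameters below are made up for the example):

```python
# Smoothing a noisy 1-D sensor channel with support vector regression
# (illustrative sketch; not the actual Waterloop telemetry code).
import numpy as np
from sklearn.svm import SVR

# Simulated accelerometer trace: a smooth signal plus sensor noise.
t = np.linspace(0, 10, 500).reshape(-1, 1)    # time stamps (seconds)
raw = np.sin(t).ravel() + np.random.normal(0, 0.2, t.shape[0])

# Fit an RBF-kernel SVR to the raw readings; epsilon sets how much of the
# noise band is ignored, C trades smoothness against fidelity to the data.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
svr.fit(t, raw)
denoised = svr.predict(t)   # smoothed estimate of the underlying signal
```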
, Sep 2015 - Oct 2016
I have attended 8 hackathons since Sep 2015, where my role has mostly involved designing and
implementing data analysis and machine learning algorithms for our teams' hacks.
Here are some highlights.
- VisualTA, DubHacks, Oct 2016
Our team developed a HoloLens hack that performed real-time image analysis on faces, designed as a
visual aid for teachers with overcrowded classrooms to detect students who look confused.
I designed and trained a decision tree classifier from scratch over the weekend, which achieved
approximately 90% accuracy. Our hack won the prize for Best Use of Data Visualization at DubHacks.
Source code is available on Devpost and GitHub. Click
for more details.
- STEMLabs, HackMIT, Sep 2016
Our team created a HoloLens hack designed as a visual aid to help students visualize tough concepts in STEM.
I worked on the animation scripts written in C#. We were one of the top 10 teams at the final presentation.
Click here for
more details and a demo video.
- Mortgage-Freeman, MLH Prime, Aug 2016
Mortgage-Freeman is a web app that recommends mortgage plans based on the user's credit card history.
I worked on designing and implementing the algorithms for calculating mortgage interest rates,
monthly payments, and duration. I also optimized and reverse-engineered ordinary annuity formulas
and performed regression analysis of U.S. 5/1 adjustable-rate mortgage rates using data from Quandl
and Python libraries such as pandas and NumPy. Our hack was runner-up for Best Use of Capital One's API
and runner-up for Best Developer Tool, presented by GitHub. More details
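For reference, the monthly-payment side of the calculation comes down to the standard ordinary-annuity formula; a short sketch (not the hack's actual code):

```python
# Standard ordinary-annuity formula for a fixed monthly mortgage payment
# (illustrative; the hack's actual implementation may differ).
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment that amortizes `principal` over `years`
    at a nominal `annual_rate` (e.g. 0.04 for 4%), compounded monthly."""
    i = annual_rate / 12            # monthly interest rate
    n = years * 12                  # number of monthly payments
    if i == 0:
        return principal / n
    return principal * i / (1 - (1 + i) ** -n)

# e.g. a $300,000 loan at 4% over 30 years -> about $1,432.25 per month
payment = monthly_payment(300_000, 0.04, 30)
```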
For a complete list, please visit my