Hello, I'm Rosie.

Welcome to my site!

About Me

Hi! My name is Rosie Zou. I am a fourth-year Computer Science student in the Data Science program at the University of Waterloo. Over the past few years, I have developed a focused interest in machine learning and artificial intelligence, and have had the pleasure of working on a number of projects in autonomous vehicles, computer vision, natural language processing, and computational inference. Please scroll down for more details on my work experience and professional skills.

Thanks for stopping by. Cheers!

Work Experience -- To be updated

Undergraduate Research Assistant, University of Waterloo, May 2017 - present




Equity Trading Intern, TD Securities, Apr - Dec 2016

As part of the algorithmic trading team at TD Securities, I completed a variety of data analysis and visualization tasks during my internship, mainly using Python, SQL, and VBScript. Using visualization tools and high-volume historical trade data, I analyzed market segmentation, which helped our sales team create effective marketing campaigns that attracted more institutional clients.


Associate Business Analyst, Scotiabank, Sep - Dec 2015

While working with the Compliance team at Scotiabank, I assisted my Project Manager with the financial reporting software migration project for Scotiabank Ireland in light of the introduction of Basel III regulations in Europe. In addition to the software migration project, I was also responsible for daily and ad-hoc software maintenance requests.

Projects

CSEye, March - Sept 2018

Update: Our paper has been accepted to the Student Abstract track at AAAI 2019. The final camera-ready draft can be found here.

This project was completed by three teammates and me as the final project for UWaterloo's Winter 2018 Statistical Learning - Classification course. We designed and implemented a new CNN architecture that performs facial verification with high accuracy without compromising ease of implementation or speed of execution, using a modified VGG16-based architecture for transfer learning and pre-trained ImageNet weights as a starting point. One of the features we introduced is target-specific parameter prediction, which allows the target-image weights to interact more with the candidate weights, as opposed to simple proximity scoring based on absolute distance. The training and testing data for this project were drawn from the Labeled Faces in the Wild database. Due to time and resource constraints, we were only able to achieve 97% accuracy on the training data. We have since obtained access to a four-GPU machine from the Math Department and are continuing with the project.

The current paper draft provides a clear explanation of the model design and implementation. Our team's main focus this term is tuning the network so that it consistently achieves >95% testing accuracy without increasing the runtime, as well as producing a final draft to submit to NIPS or CVPR. The Jupyter notebook we have been working on can be found here. The home directory for the project is pushed to GitHub.
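For readers unfamiliar with the setup, here is a minimal sketch of a VGG16-based verification network with pre-trained ImageNet weights, written in Python with TensorFlow/Keras. It only illustrates the transfer-learning scaffolding; it is not the CSEye architecture, the target-specific parameter prediction layer is deliberately replaced by a plain absolute-difference head, and all layer sizes are placeholders.

    # Sketch of a VGG16-based verification network (NOT the CSEye architecture).
    # Two images share one frozen ImageNet backbone; a simple pairwise head
    # predicts whether they show the same person.
    import tensorflow as tf
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import VGG16

    def build_verifier(input_shape=(224, 224, 3)):
        # Shared VGG16 backbone with pre-trained ImageNet weights, frozen.
        backbone = VGG16(weights="imagenet", include_top=False, pooling="avg",
                         input_shape=input_shape)
        backbone.trainable = False

        target = layers.Input(shape=input_shape, name="target")
        candidate = layers.Input(shape=input_shape, name="candidate")

        # Embed both images with the same backbone.
        emb_t = backbone(target)
        emb_c = backbone(candidate)

        # Placeholder interaction head: absolute difference of embeddings.
        # (CSEye replaces this with target-specific parameter prediction.)
        diff = layers.Lambda(lambda x: tf.abs(x[0] - x[1]))([emb_t, emb_c])
        x = layers.Dense(256, activation="relu")(diff)
        out = layers.Dense(1, activation="sigmoid", name="same_person")(x)

        model = Model(inputs=[target, candidate], outputs=out)
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

    model = build_verifier()
    model.summary()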


Multiple Imputation for Latent Variables, Apr 2018
As the final project for the Winter 2018 Computational Inference course, my teammate and I designed, implemented, documented, and thoroughly tested an R package that performs multiple imputation as well as pooled result analyses on survey data with ordinal responses. We were inspired by a multicultural-identity survey data set consisting of ordinal-response questions designed to gauge the survey takers' opinions on three particular aspects of multicultural identity. These aspects are treated as latent variables that need to be imputed. Our R package performs quantile regression on the survey responses to obtain a discrete-to-continuous variable mapping, then uses a probit model and factor analysis to sample additional continuous values of the mapped responses. The hypothesized latent variables are imputed multiple times and fitted with a linear model, and the parameters are pooled and estimated using Rubin's Rules. The project opened our eyes to the world of survey data analysis and pushed us to be more creative with our problem solving.
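The pooling step at the end of that pipeline follows Rubin's Rules. The package itself is written in R, but as a rough Python illustration (with made-up numbers, not taken from the project), this is how point estimates and standard errors from the repeated imputations are typically combined:

    # Sketch of Rubin's Rules for pooling estimates across m imputations.
    # Illustration only: the imputation steps (quantile regression, probit
    # model, factor analysis) are not shown, and the data below are invented.
    import numpy as np

    def pool_rubin(estimates, variances):
        """Pool point estimates and squared standard errors from m imputed analyses."""
        estimates = np.asarray(estimates, dtype=float)
        variances = np.asarray(variances, dtype=float)
        m = len(estimates)

        q_bar = estimates.mean()                 # pooled point estimate
        w = variances.mean()                     # within-imputation variance
        b = estimates.var(ddof=1)                # between-imputation variance
        t = w + (1.0 + 1.0 / m) * b              # total variance
        return q_bar, np.sqrt(t)

    # Example with m = 5 hypothetical imputations of one regression coefficient.
    est, se = pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44],
                         [0.010, 0.012, 0.011, 0.009, 0.010])
    print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")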

I have re-organized the submitted final report into an easy-to-navigate web page that can be accessed here. (Some mathematical symbols on the page may not be supported by older browsers.) Please note that the figures and tables in the report are not generated in real time from the code segments, due to the rendering and runtime issues described in the "Discussion" section of the report. This page is a replica of the summary.html page generated from the corresponding R Markdown file located in my personal branch of the project repository, which can be accessed here.


Parametric and Non-Parametric Analyses on Financial Data, Apr 2018
For the final project of my Winter 2018 Statistical Learning - Function Estimation course, my teammates and I surveyed a variety of parametric and non-parametric models to predict the log-scaled retained earnings of publicly listed U.S. companies. The data set we used was obtained from Quandl. Working with it was a little tricky: almost half of the companies had many missing fields because they had been delisted. Additionally, the original retained-earnings values contained outliers that significantly affected the model fit but would have made the model less generalizable if removed. Hence, a lot of work went into data cleaning, pre-processing, and re-scaling.

Prior to model tuning, we used smoothing splines to perform variable selection. The methods surveyed in this project were thin-plate splines, random forests, and boosting. One hundred training and testing sets were resampled from the cleaned data using the bootstrap, and these sets were used to compute the Average Predicted Squared Error (APSE) when tuning each model's hyper-parameters.
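The project was done in R, but the bootstrap/APSE tuning loop is easy to sketch. Below is a rough Python illustration, using scikit-learn's random forest as a stand-in and synthetic placeholder data, of scoring each hyper-parameter value by its average squared prediction error over resampled train/test splits:

    # Sketch of bootstrap resampling + APSE for tuning a single hyper-parameter.
    # Illustration only: the real project used R, and the data here are synthetic.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    # Placeholder data: X is an n x p design matrix, y stands in for
    # log-scaled retained earnings.
    n, p = 500, 10
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

    def apse_for(max_depth, n_boot=20):
        """Average prediction squared error over bootstrap train/test splits."""
        errors = []
        for _ in range(n_boot):
            train = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
            test = np.setdiff1d(np.arange(n), train)  # out-of-bag rows as the test set
            model = RandomForestRegressor(n_estimators=50, max_depth=max_depth,
                                          random_state=0)
            model.fit(X[train], y[train])
            pred = model.predict(X[test])
            errors.append(np.mean((y[test] - pred) ** 2))
        return np.mean(errors)

    # Pick the tree depth with the smallest APSE.
    best = min([2, 4, 8, None], key=apse_for)
    print("depth with lowest APSE:", best)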

Like the previous Multiple Imputation project, this project's final report has also been re-compiled into an easy-to-navigate web page, accessible here. The original R Markdown file (which also includes the R code used) and the data set can be found in the GitHub repository here.


Waterloop, May - Aug 2017
Waterloop is the University of Waterloo's team for the Hyperloop Pod Competition held annually at SpaceX headquarters in California. I worked with the team as a software systems developer between May and August 2017, focusing on data analysis and telemetry. Major tasks I took on included designing and implementing a navigation system using IMUs and optical sensors, reducing noise in raw sensor data with support vector regression, and re-designing the system state diagram. I was fortunate to work with amazingly talented developers and engineers from many different programs, and it was a wonderful learning experience. For more information, please visit our website at teamwaterloop.ca and github.com/teamwaterloop.
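As a rough illustration of the noise-reduction idea (not the team's actual code; the signal, parameters, and units below are made up), here is how support vector regression can smooth a noisy one-dimensional sensor trace in Python with scikit-learn:

    # Sketch of smoothing a noisy 1-D sensor trace with support vector regression.
    # Illustration only: the real Waterloop pipeline, sensors, and parameters
    # live in the team's repository.
    import numpy as np
    from sklearn.svm import SVR

    # Synthetic placeholder signal: acceleration samples with sensor noise.
    t = np.linspace(0, 10, 500)                    # time (s)
    true_accel = np.where(t < 5, 2.0, -1.5)        # accelerate, then brake
    noisy_accel = true_accel + np.random.normal(scale=0.4, size=t.shape)

    # Fit an RBF-kernel SVR to the noisy trace; epsilon controls how much
    # residual noise is tolerated inside the tube around the fit.
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=1.0)
    svr.fit(t.reshape(-1, 1), noisy_accel)
    smoothed = svr.predict(t.reshape(-1, 1))

    print("raw error std     :", np.std(noisy_accel - true_accel))
    print("smoothed error std:", np.std(smoothed - true_accel))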

Update: Since the Summer 2017 competition season, major sections of the Waterloop pod's embedded software system have been re-designed and my previous contributions have been archived as a result. I have migrated my work on navigation and telemetry to an archive directory in my personal GitHub repository, which can be accessed here.


Hackathons, Sep 2015 - Oct 2016
I have attended eight hackathons since September 2015, where my role mostly involved designing and implementing data analysis and machine learning algorithms for our teams' hacks. Here are some highlights.

  • VisualTA, DubHacks, Oct 2016
    Our team developed a HoloLens hack that performs real-time image analysis on faces, designed as a visual aid to help teachers in overcrowded classrooms spot students who look confused. I designed and trained a decision tree algorithm from scratch over the weekend, which achieved approximately 90% accuracy. Our hack won the prize for Best Use of Data Visualization at DubHacks. Source code is available on Devpost and GitHub. Click here for more details.
  • STEMLabs, HackMIT, Sep 2016
    Our team created a HoloLens hack designed as a visual aid to help students visualize tough concepts in STEM. I worked on the animation scripts, written in C#. We were one of the top 10 teams at the final presentation. Click here for more details and a demo video.
  • Mortgage-Freeman, MLH Prime, Aug 2016
    Mortgage-Freeman is a web app that recommends mortgage plans based on the user's credit card history. I designed and implemented the algorithms for calculating mortgage interest rates, monthly payments, and loan duration; I also reverse-engineered and optimized ordinary annuity formulas (a small worked example of the payment formula follows this list) and performed regression analysis of U.S. 5/1 adjustable mortgage rates using data from Quandl and Python libraries such as pandas and NumPy. Our hack was runner-up for Best Use of Capital One's API and runner-up for Best Developer Tool, presented by GitHub. More details here.
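Since the monthly-payment calculation above boils down to the ordinary annuity formula, here is a tiny worked example of it in Python. The helper name and the numbers are made up for illustration and are not taken from the Mortgage-Freeman code.

    # Worked example of the ordinary annuity (fixed monthly payment) formula:
    #   payment = P * r / (1 - (1 + r)^(-n)),  r = monthly rate, n = number of payments.
    def monthly_payment(principal, annual_rate, years):
        """Fixed monthly payment on a fully amortizing loan (ordinary annuity)."""
        r = annual_rate / 12.0        # monthly interest rate
        n = years * 12                # number of monthly payments
        if r == 0:
            return principal / n
        return principal * r / (1.0 - (1.0 + r) ** (-n))

    # e.g. $300,000 borrowed at 4% annual interest over 25 years
    print(round(monthly_payment(300_000, 0.04, 25), 2))   # roughly $1,583.5 per month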
For a complete list, please visit my Devpost page.

More About Me

Yay you're still here! :)

Besides mathematics, I also enjoy singing, cooking, anime, and salsa dancing. Some of my favorite cooking channels are JunsKitchen, Food Wishes, and How To Cake It.

I am a hobbyist photographer. Check out my work on 500px.

Oh and I have a weakness for cute animals, especially this cat.

Epilogue

"I'd like to thank my mama for raising me so well."
-- Rosie Zou, from her Turing Award acceptance speech, 30 years from now



Rosie Zou. All rights reserved.