Charles Lo |

Digital Hardware and Software Engineer
e-mail: charles at charleslo dot net

Research

I am interested in hardware design, reconfigurable computing and machine learning. In the past I have designed high-performance and multi-FPGA accelerators for Object Detectors and neural networks. I was also involved with the early development of the Xilinx SDAccel tools for integrating FPGA accelerators in the OpenCL heterogeneous computing framework. Most recently, I have been applying Gaussian processes in Bayesian optimization techniques to perform design-space exploration of High-Level Synthesis and other hardware generators.

Other Projects

Flexible Gaussian process Library GitHub

Python 3 software used in research to describe hierarchical Gaussian process models
Supports composing squared exponential and linear kernels
Uses Cython-based fast exponential library to compute squared exponential kernel

“Image Labelling using Feature Learning and Boltzmann Machine-Augmented CRFs,” ECE1510 Project Report, 2014
PDF

Evaluated the combination of Neural Networks, Conditional Random Fields (CRFs) and Restricted Boltzmann Machines for image labelling.
Image segmentation was performed to obtain superpixels which were inferred using a neural network.
A CRF was then used to smooth labelling across adjacent superpixels while a global RBM provided location-based labelling.
Results showed the benefit of combining the techniques, but superpixel-based classification held back performance.

“Heterogeneous Stream Computing in SAVI,” ECE1548 Project Report, 2013
PDF

Proposed a method of mapping streaming task graphs on to virtualized heterogeneous resources in a cloud environment.
Compute kernel management and routing inspired by Software Defined Networking to simplify global control.
Preliminary prototype designed with x86 virtual machines, virtualized FPGA kernels and OpenFlow.

“A High-Performance Architecture for Training Viola-Jones Object Detectors,” MASc Project, 2012 Thesis, Paper

Training an ensemble of decision trees is highly task parallel but not well suited for GPUs
Developed a PCIe-FPGA system targeting a Xilinx Virtex-6 device to accelerator this task using a systolic array architecture
Performance of the floorplanned array scaled linearly and outperformed the multi-threaded OpenCV implementation

“Nonlinear Dimensionality Reduction for Music Feature Extraction,” CSC2515 Project Report, 2010
PDF

2-D Visualization of Compressed Features

Experimented with PCA, Autoencoders, LLE and t-SNE as methods for compressing high-dimensional audio features for music genre classification.
Compression would allow for short feature tags to group together similar types of music in large databases.
Results found the t-SNE performed the best in maintaining neighbourhood structure.