Developed OpenACC versions of applications from the NAS Parallel Benchmarks and Rodinia benchmark suites.
Key contributor to the OpenACC benchmarks in SPEC ACCEL V1.0.
OpenACC 1.0 Validation Suite
Designed and implemented a robust, scalable validation infrastructure for evaluating the correctness of compilers' OpenACC 1.0 implementations.
Evaluated four compilers (CAPS, Cray, PGI, and OpenUH) with the suite.
Integrated the validation suite into the test harness infrastructure of the Titan supercomputer at Oak Ridge National Laboratory.
Intern HPC Developer at TOTAL E&P Research and Technology USA
Accelerated Kirchhoff Migration, a production-level seismic imaging application used in the oil and gas industry, on GPUs using two GPU programming models: CUDA and OpenACC.
Analyzed performance bottlenecks and significantly improved performance by applying a range of optimizations. The final GPU version was more than 14x faster than the 10-core parallel CPU version.
Resolved a critical numerical inaccuracy issue in the application.
Intern HPC Developer at Repsol E&P USA
Parallelized Kirchhoff Demigration, a production-level oil and gas industry application, using a hybrid MPI and POSIX threads model.
Designed and developed a framework for dynamic load balancing in MPI applications.
Applied the same framework to parallel I/O, reaching the theoretical peak parallel read performance on a Panasas parallel file system with RAID 1/5 data storage.
Proposed and implemented a fault-tolerance strategy for MPI applications.
Ported the C++ version of the oil and gas Reverse Time Migration (RTM) application to a GPU cluster using a hybrid POSIX threads, CUDA, and MPI model to maximize both intra-node and inter-node parallelism and performance.