Systems. Robotics. Hardware. AI.
An end-to-end ML compiler that lowers MobileNetV2 through four IR layers to custom CUDA kernels, delivering a 4.67× inference speedup over PyTorch eager mode.
A high-performance RISC-V processor with pipelining, branch prediction, and caching.
A high-performance, multi-threaded game engine written in C++ with Lua scripting and physics simulation.
A portable cloth piano using capacitive cloth and pressure-sensitive gloves.
Algorithms for motion planning, kinematics, and control systems in robotics.
Custom system calls, a Unix shell, and memory management in C++.
Implementations of neural networks and reinforcement learning algorithms in Python.
Video CDN implementation and network diagnostic tools.