- Systems
  - C++, CUDA, Python, Java: low-level optimization, memory management, performance profiling
- ML Infrastructure
  - PyTorch (C++ extensions), TensorFlow, CUDA kernel development, NCCL, MPI, TensorRT
- Hardware
  - Cache simulation, performance analysis (Nsight Compute, perf), Roofline modeling, memory hierarchy optimization
- Cloud & Tools
  - AWS (Lambda, CDK, DynamoDB, S3, API Gateway, Bedrock, Cognito), Docker, Linux, Git, Next.js, React, Jest