Agent Runtime Engineer
Connects inference, tool use, memory, evals, world state and interface design into controllable AI systems.
Universal FAW Labs builds AI systems across runtime, interface, media, and embodied experiments. The studio direction is simple: make intelligence observable, steerable, and useful in the places where people and machines actually work together.
Open Sheng-Kai portfolio
Beyond HMI, I want to build interfaces designed for AI itself: tool surfaces where AI can call modules quickly, inspect state, and coordinate its own actions. This will be the next stage of my experiments.
Positioning: AI systems engineer for the 2026 agent stack. I work across inference runtime, MLX porting, agent orchestration, memory, evaluation, world-model state, and HMI so research models become fast, inspectable product systems.
Optimize GPU kernels across frameworks: PyTorch/CUDA-to-MLX ports, tensor layout changes, state_dict surgery, custom op implementation, Metal backend work, parity validation, and profiling.
Design and optimize LLM/agent inference pipelines with KV/prefix cache, chunked prefill, quantization, batching, warm services, and TTFT/decode benchmarks across local and cloud.
Build agent orchestration and memory around task contracts, verifier loops, accepted/rejected memory, semantic recall, route tests, safety gates, and replayable traces.
Develop latent planning systems for robots using visual state, 3DGS reconstruction, pose and motion modeling, afterstate prediction, diffusion repair, and compact scoring for embodied control.
Build HMI surfaces across CLI, Desktop, Browser, VSMONSTER, 2D / 3D FOCUS, and shader controls that support observable state, interruption, review, and recovery.
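To make the task contract, verifier loop, accepted/rejected memory, and replayable trace ideas above concrete, here is a minimal Python sketch. Every name in it (`TaskContract`, `Memory`, `TraceEvent`, `run_with_verifier`, `toy_propose`) is hypothetical and chosen for illustration. It is not the studio's actual runtime, and the toy proposer stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskContract:
    """Hypothetical contract: a goal plus a check the output must pass."""
    goal: str
    verify: Callable[[str], bool]
    max_attempts: int = 3


@dataclass
class Memory:
    """Keeps accepted and rejected outputs apart so later attempts can use both."""
    accepted: List[str] = field(default_factory=list)
    rejected: List[str] = field(default_factory=list)


@dataclass
class TraceEvent:
    """One step of the loop, recorded so the run can be replayed or reviewed."""
    attempt: int
    output: str
    accepted: bool


def run_with_verifier(
    contract: TaskContract,
    propose: Callable[[str, Memory], str],
    memory: Memory,
) -> List[TraceEvent]:
    """Propose, verify, and record until the contract passes or attempts run out."""
    trace: List[TraceEvent] = []
    for attempt in range(1, contract.max_attempts + 1):
        output = propose(contract.goal, memory)
        ok = contract.verify(output)
        trace.append(TraceEvent(attempt, output, ok))
        if ok:
            memory.accepted.append(output)
            break
        memory.rejected.append(output)
    return trace


def toy_propose(goal: str, memory: Memory) -> str:
    # Stand-in for a model call (assumption): tries the next integer
    # after each rejection instead of generating text.
    return str(len(memory.rejected) + 1)


contract = TaskContract(goal="produce the number 3", verify=lambda out: out == "3")
memory = Memory()
trace = run_with_verifier(contract, toy_propose, memory)

for event in trace:
    print(event)
```

The design choice worth noting is that rejected outputs are stored rather than discarded, so the proposer can condition on past failures, and the trace records every attempt in order so a reviewer can replay the run step by step.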