Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three …
arXiv:2604.02344v1. Abstract: WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true …
Jędrzej Maczan