Letting Tensors Soar: R-Fork Accelerates Large Model Weight Loading
We introduce Tensor R-Fork, a novel weight-loading method that uses efficient cross-node device-to-device interconnects to load tensors, with zero copies, from a running SGLang instance directly into a new one. This reduces weight loading time from minutes to seconds.