
LLM (1)

[Paper Review] AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Abstract: Model parallelism can additionally be used for the statistical multiplexing of multiple devices. We explore the new trade-off space and present a novel serving system. Background: A cloud serving system should satisfy the SLO on latency. Intra-operator parallelism: a single operator is partitioned across multiple d..
2025. 3. 2.

