Jack Min Ong
Yea sure, here it is: https://github.com/huggingface/blog/pull/3309
Nice article! Was really informative to know how all the frameworks are thinking about the problem.
On the MoE LoRA part, prime-rl actually supports MoE LoRA as well: https://github.com/PrimeIntellect-ai/prime-rl/blob/main/src/prime_rl/trainer/models/layers/lora/multi_moe.py.
vLLM releases have supported per-expert LoRA loading and serving for MoEs for a while: https://github.com/vllm-project/vllm/blob/2488a82f89b15ad2ebed12160dcc423d44210db2/vllm/lora/ops/triton_ops/fused_moe_lora_op.py#L158.
SGL has an unmerged PR to support MoE LoRAs: https://github.com/sgl-project/sglang/pull/14105.
Support for expert-parallel inference with MoE LoRAs is currently in the works for both vLLM and SGL, as far as I'm aware.
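For anyone unfamiliar, the core idea behind per-expert MoE LoRA is just giving each routed expert its own low-rank delta on top of its frozen base weight. Here's a minimal NumPy sketch of that forward pass (function and variable names are mine, not from any of the codebases linked above, and this ignores the fused/Triton kernel details those implementations actually use):

```python
import numpy as np

def moe_lora_forward(x, W, A, B, topk_idx, gate, alpha=2.0):
    """Forward one token through top-k routed experts, each with its own LoRA.

    x:        (d_in,) input token activation
    W:        (E, d_out, d_in) frozen base expert weights
    A:        (E, r, d_in) per-expert LoRA down-projections
    B:        (E, d_out, r) per-expert LoRA up-projections
    topk_idx: indices of the experts the router selected for this token
    gate:     router weights for those experts (same length as topk_idx)
    alpha:    LoRA scaling factor
    """
    out = np.zeros(W.shape[1])
    for e, g in zip(topk_idx, gate):
        base = W[e] @ x                     # frozen expert output
        delta = B[e] @ (A[e] @ x)           # low-rank LoRA update for expert e
        out += g * (base + alpha * delta)   # gate-weighted combination
    return out
```

With `A` initialized to zeros (the usual LoRA init), the delta vanishes and this reduces exactly to the base MoE forward, which is what makes per-expert adapters cheap to bolt on.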
Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

