ERAI News

slime - A native RL post-training framework for LLMs and agentic workflows

Python 6.5k stars 2 hours ago

Highlights

  • Freshness within the slot: the repo's updated_at was 20:58 and its pushed_at was 20:07 Saigon time, both well inside the 15:00-21:00 scan window.
  • Popularity: 6,501 stars, 941 forks, and 195 stars gained today on GitHub Trending Python.
  • System scope: connects Megatron with SGLang and adds a Data Buffer, rollout, reward, verifier, and agentic workflows without splitting them into several disjoint frameworks.
  • Production evidence: the README states that slime is behind the post-training runs of GLM-4.5 through GLM-5.2 and also serves as the foundation for several other agentic RL systems.

Diagram

flowchart LR
  A[Prompts and data] --> B[Data Buffer]
  B --> C[Rollout SGLang]
  C --> D[Reward and Verifier]
  D --> E[Training Megatron]
  E --> F[Parameter sync]
  F --> C
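
The loop in the diagram can be sketched in plain Python. This is only a toy illustration of the control flow (buffer, rollout, verifier reward, training step, then back to rollout); it does not use slime, Megatron, or SGLang, and every name in it (`Sample`, `ToyPolicy`, `verifier`, `run_loop`) is hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    response: str = ""
    reward: float = 0.0


class ToyPolicy:
    """Hypothetical stand-in for both the rollout engine and the trainer."""

    def __init__(self):
        self.version = 0  # bumped once per training step

    def generate(self, prompt, rng):
        # Pretend rollout: answer an "a+b" prompt, sometimes off by one.
        a, b = map(int, prompt.split("+"))
        return str(a + b + rng.choice([0, 0, 1]))

    def train(self, batch):
        # Pretend training: a real trainer would take a gradient step here.
        self.version += 1
        return sum(s.reward for s in batch) / len(batch)


def verifier(sample):
    # Rule-based reward: 1.0 if the response equals the true sum, else 0.0.
    a, b = map(int, sample.prompt.split("+"))
    return 1.0 if int(sample.response) == a + b else 0.0


def run_loop(prompts, steps, batch_size, seed=0):
    rng = random.Random(seed)
    buffer = deque(prompts)  # prompts and data enter the buffer
    policy = ToyPolicy()
    mean_rewards = []
    for _ in range(steps):
        batch = []
        for _ in range(batch_size):
            prompt = buffer[0]
            buffer.rotate(-1)  # reuse prompts round-robin
            sample = Sample(prompt)
            sample.response = policy.generate(prompt, rng)  # rollout
            sample.reward = verifier(sample)  # reward and verifier
            batch.append(sample)
        mean_rewards.append(policy.train(batch))  # training
        # Parameter sync: a real system would push new weights to the
        # rollout engine here; in this toy both share one object.
    return policy.version, mean_rewards
```

Calling `run_loop(["1+2", "3+4"], steps=3, batch_size=4)` runs three rollout-then-train iterations and returns the final policy version together with the mean reward of each step.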

Summary

slime is one of today's most notable repos from the perspective of AI training infrastructure. Rather than treating RL post-training as a set of one-off scripts, the project explicitly positions itself as a native framework for the Megatron + SGLang pipeline, focused on RL scaling, data generation, and agentic workloads that involve tools, verifiers, and multi-turn rollouts.

What makes slime noteworthy is not only its star growth but also the fact that its latest update falls squarely within this evening's slot window. That shows the repo is still under active development, while the README speaks directly to a market need: how to turn reinforcement learning from a collection of experimental demos into infrastructure that can be debugged, profiled, reproduced, and operated over the long term.

Details

In the wave of open-source models, most of the attention goes to new checkpoints. Seen from the angle of underlying competitive capability, however, the real value lies in the post-training framework, and slime targets exactly that layer. Its README describes two core capabilities: high-performance training by connecting Megatron with SGLang, and flexible data generation through data generation interfaces and a rollout engine. Notably, the project does not try to wrap every backend under one fragile abstraction layer. Instead, it commits to a single clear path: deep optimization for Megatron + SGLang.

There is a strategic reason for this choice. RL for LLMs and agentic workflows tends to break easily when the system is layered behind too many wrappers. slime keeps the control surface of Megatron and SGLang close to upstream, so teams can directly use their arguments, topology, PD disaggregation, delta weight sync, and external rollout engines. That matters a great deal to teams that need to tune throughput, caching, routing, and parameter synchronization on real GPU clusters rather than simply wanting an "easy to use" API.
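
To make the idea of delta weight sync concrete, here is a minimal sketch under one plausible reading of the term: after a training step, only the parameters that changed are sent to the rollout engine instead of the full set. The article does not describe how slime implements this, so the functions `compute_delta` and `apply_delta` and the dict-of-lists weight format are assumptions made purely for illustration.

```python
def compute_delta(old, new, tol=0.0):
    """Return only the entries of `new` that differ from `old` (hypothetical helper)."""
    delta = {}
    for name, values in new.items():
        prev = old.get(name)
        # New or reshaped parameters are always sent in full.
        if prev is None or len(prev) != len(values):
            delta[name] = list(values)
            continue
        # Send the parameter only if some element moved by more than tol.
        if any(abs(a - b) > tol for a, b in zip(prev, values)):
            delta[name] = list(values)
    return delta


def apply_delta(weights, delta):
    """Overlay the changed entries onto a copy of the receiver's weights."""
    updated = {name: list(values) for name, values in weights.items()}
    for name, values in delta.items():
        updated[name] = list(values)
    return updated
```

For example, if only parameter `b` changes between two versions, `compute_delta` returns a dict containing just `b`, and applying it to the old weights reproduces the new ones. A production system would operate on sharded GPU tensors rather than Python lists.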

Another strength is that slime does not wall agentic RL off into a separate branch. In the README, the multi-agent, search, fully async, and coding-agent RL examples all go through the same data buffer and rollout loop. This design lets different workloads share one training core instead of each problem class building its own framework. From an enterprise and research-infrastructure standpoint, that is a sign of a system more likely to scale over the long term.
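
The shared-core design described above can be illustrated with a small sketch in which different workloads plug their own rollout function into one common loop. This is not slime's API; the registry, the rollout functions, and the reward function below are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

Trajectory = Dict[str, object]
RolloutFn = Callable[[str], Trajectory]


def single_turn_rollout(prompt: str) -> Trajectory:
    # One model response, no tools.
    return {"prompt": prompt, "turns": [f"answer to {prompt}"]}


def tool_rollout(prompt: str, max_turns: int = 3) -> Trajectory:
    # Several pretend tool calls followed by a final answer.
    turns = [f"tool_call_{i}({prompt})" for i in range(max_turns)]
    turns.append("final answer")
    return {"prompt": prompt, "turns": turns}


def length_reward(traj: Trajectory) -> float:
    # Toy reward that favors shorter trajectories.
    return 1.0 / len(traj["turns"])


REGISTRY: Dict[str, RolloutFn] = {
    "single_turn": single_turn_rollout,
    "tool": tool_rollout,
}


def shared_loop(
    buffer: List[Tuple[str, str]],
    registry: Dict[str, RolloutFn],
    reward_fn: Callable[[Trajectory], float],
) -> List[Trajectory]:
    """Run every (kind, prompt) item in the buffer through one common loop."""
    results = []
    for kind, prompt in buffer:
        traj = registry[kind](prompt)  # workload-specific rollout
        traj["reward"] = reward_fn(traj)  # shared reward step
        results.append(traj)
    return results
```

Adding a new workload in this sketch means registering one more rollout function; the buffer handling and reward step stay unchanged, which is the property the paragraph above attributes to the shared data buffer and rollout loop.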

The "battle-tested" claim is not just marketing either. The repo states that it is behind the post-training runs of GLM-4.5 through GLM-5.2 and lists an extended ecosystem that includes Miles, vime, Relax, OpenClaw-RL, TritonForge, and qqr. In other words, slime is gradually becoming a common substrate for many RL directions, from post-training large models to optimizing coding agents and verifiable environments.

Of course, slime is not a project for a broad audience. It requires GPU operations skills and an understanding of Megatron, SGLang, and RL data. For that very reason, the fact that the repo keeps being updated close to scan time and sits in the top of Trending Python is a notable signal: the open-source race is moving from "which model is new" to "which infrastructure helps train and improve models and agents faster and more reliably". Among tonight's three candidates, slime has the most verifiable freshness within the 6-hour window.

