ERAI News

Hyper-Extract bien RAG kieu mau thanh CLI trich xuat tri thuc co cau truc

Python 1.8k stars 2 giờ trước
Hyper-Extract bien RAG kieu mau thanh CLI trich xuat tri thuc co cau truc

Điểm nổi bật

  • Stars: 1,797 stars va 124 stars tang them trong bang Trending hom nay.
  • Cau truc tri thuc: ho tro 8 kieu du lieu tu list, set den graph, hypergraph va spatio-temporal graph.
  • Kho template: co 80+ YAML templates cho finance, legal, medical, industry va general.
  • Dong goi san pham: cung cap CLI he parse, he search, he show va kha nang chay local voi vLLM.

Biểu đồ

flowchart LR A[Tai lieu phi cau truc] --> B[Hyper-Extract CLI] B --> C[Template YAML] B --> D[LLM va embedder] C --> E[Graph hoac hypergraph] D --> E E --> F[Search va visualize]

Tóm tắt

Hyper-Extract noi bat khong phai vi tuyen bo "RAG tot hon", ma vi no dong goi mot bai toan kho thanh cong cu de dung ngay. Thay vi yeu cau team tu ghep extractor, schema, storage va visualization, repo dua ra mot duong day lien lac: chon template, parse tai lieu, sinh knowledge abstract co kieu du lieu ro rang, roi truy van va xem truc quan.

Gia tri cua du an nam o viec no bien phan "schema design" thanh tai san co the tai su dung. Voi nhung doi muon bien bao cao, hop dong, nghien cuu hay ghi chu noi bo thanh tri thuc cau truc, day la mot huong thuc dung hon nhieu so voi cach chi nem chunk vao vector store roi ky vong prompt se giai quyet phan con lai.

Chi tiết

README cua Hyper-Extract rat ro ve dinh vi: day la mot framework trich xuat tri thuc bang LLM, nhung muc tieu khong dung o entity extraction co ban. Du an day tham vong sang mot tap doi tuong co kieu du lieu manh hon, gom graph, hypergraph, temporal graph va spatio-temporal graph. Dieu do quan trong vi no thay doi cach doi san pham nghi ve "nho" cua AI. Thay vi mot kho chunk de retrieve, ban co mot tap quan he, thuc the va dinh danh co the tien hoa theo thoi gian.

Lop gia tri lon nhat cua repo nam o bo template. Hon 80 mau YAML duoc chia theo domain, cho phep user nhay thang vao cac bai toan cu the nhu earnings graph, academic graph hay biography graph ma khong can tu thiet ke schema tu dau. Voi cac team khong co qua nhieu hieu biet ve knowledge engineering, day la mot cach cat giam do ma sat rat lon. Template cung giup tao tinh on dinh cho output, mot yeu to song con neu du lieu trich xuat se duoc dua vao workflow xep hang, kiem tra hoac ra quyet dinh.

Du an cung co cach tiep can hop ly voi triet ly local-first. README dua thang vi du chay Qwen3.5-9B va bge-m3 qua vLLM, nghia la tai lieu co the duoc xu ly on-premise. O giai doan ma doanh nghiep bat dau dong boi toan AI governance va data residency, thong diep nay rat hop ngu canh. Hyper-Extract khong bi khoa vao mot nha cung cap model duy nhat; no dua structured output capability len lam giao dien chinh.

Ve ky thuat, ba lop Auto-Types, Methods va Templates la cach to chuc rat de hieu. Auto-Types dinh nghia kieu du lieu dich. Methods cho biet dung extraction engine nao. Templates gom schema va quy uoc domain. Su tach lop nay bien repo thanh nen tang hon la mot bo script. Day la ly do no dang co co hoi len thanh mot "middleware" cho nhung doi xay graph memory, doc intelligence hoac compliance pipeline.

Diem can than trong la du an van phu thuoc nang vao chat luong structured output cua model va su dung schema dung nganh. Neu team chon template sai hoac ky vong extract qua muc, ket qua van co the dep nhung mong. Dù vậy, doi voi slot 9h nay, Hyper-Extract xung dang duoc xem la mot trong nhung repo AI dang hot nhat vi no bien bai toan tri thuc co cau truc thanh mot thao tac CLI thuc su su dung duoc.

Nguồn

© 2024 AI News. All rights reserved.