AI & ML interests

We build legal AI models that help legal tech firms ship smarter products, faster.

Recent Activity

abdurrahmanbutler updated a dataset 14 days ago
isaacus/legal-rag-qa
umarbutler updated a Space 21 days ago
isaacus/README

Articles

abdurrahmanbutler posted an update 14 days ago
Isaacus just shipped a major update to semchunk: AI-powered chunking based on a document’s knowledge graph representation⚡

This isn’t a tweak on existing semantic chunking. It’s an entirely new paradigm, built on hierarchical document segmentation rather than heuristics or standard embedding-based semantic approaches.

We benchmarked our AI chunking mode across a full RAG pipeline against popular alternatives like LangChain, Chonkie, and our own non-AI semantic chunker. The results were clear: semchunk’s AI mode delivered a 15% relative improvement in RAG correctness over Chonkie. It also produced chunks that a human evaluator judged more aesthetically coherent and readable, while being faster than all other chunking methods when run on a consumer PC.

These gains are powered by Isaacus' Kanon 2 Enricher model, which performs hierarchical document segmentation and directly powers our AI chunking mode.

As far as we know, semchunk is one of the first chunking libraries to offer true AI-powered, hierarchical-segmentation-based chunking, and the results show how much better RAG can get when chunking improves.
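For intuition, hierarchical-segmentation-based chunking can be sketched in a few lines: once a document has been segmented into a nested tree, chunks are packed greedily while never splitting a structural leaf across two chunks. This is a toy stdlib sketch of the idea only, not semchunk’s actual API and not Kanon 2 Enricher’s output format:

```python
# Toy illustration of hierarchical-segmentation-based chunking (NOT semchunk's
# API): given a document already segmented into a nested hierarchy, emit chunks
# that respect structural boundaries instead of arbitrary token cut-offs.

def chunk_tree(node: dict, max_chars: int) -> list[str]:
    """Greedily pack a segment tree into chunks of at most `max_chars`
    characters, never splitting a leaf segment across two chunks."""
    # Leaves carry text; internal nodes carry children (e.g. section -> clauses).
    leaves = []

    def collect(n):
        if "text" in n:
            leaves.append(n["text"])
        for child in n.get("children", []):
            collect(child)

    collect(node)

    chunks, current = [], ""
    for leaf in leaves:
        if current and len(current) + len(leaf) + 1 > max_chars:
            chunks.append(current)
            current = leaf
        else:
            current = f"{current} {leaf}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = {
    "children": [
        {"text": "1. Definitions. 'Agreement' means this contract."},
        {"text": "2. Term. This Agreement commences on signing."},
        {"text": "3. Termination. Either party may terminate with notice."},
    ]
}
print(chunk_tree(doc, max_chars=100))
```

Because chunk boundaries always coincide with segment boundaries, no chunk ever starts mid-clause, which is the property the AI mode generalizes using a learned document hierarchy.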

https://huggingface.co/blog/isaacus/introducing-ai-chunking-to-semchunk
umarbutler published an article 14 days ago

Introducing AI chunking to semchunk

umarbutler posted an update 15 days ago
Isaacus, the AI research company building legal superintelligence, is hiring!

We're looking for passionate engineers who love to build and tinker and want to have an impact on the world. Specifically, we're hiring:
• ML engineers (Australia).
• Data engineers (Australia).
• Full-stack engineers (Australia).
• DevRel engineers (Australia, San Francisco, and London).
• DevOps engineers (Australia, San Francisco, and London).

If you'd like to be a founding employee at one of the few VC-backed LLM research labs in the world, receive generous equity compensation, and work alongside other highly motivated, highly skilled engineers, get in touch: https://isaacus.com/careers
abdurrahmanbutler posted an update 27 days ago
Isaacus just shipped a new state-of-the-art model, this time focused on reranking for legal RAG.

Although Kanon 2 Embedder already represents the frontier of legal-domain retrieval, we know that not everyone is ready to re-embed their entire corpus. We also know there is still accuracy left on the table for teams handling highly sensitive legal work.

Enter Kanon 2 Reranker: the world’s best legal reranking model.

We tested it across both production RAG pipelines and standalone retrieval tasks, and the results were remarkable.

Not only does it outperform the competition in a category where there are still very few serious alternatives, it also delivers major retrieval accuracy gains over our standalone embedder. Those improvements translated into exceptional downstream performance.
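The embed-then-rerank pattern behind these gains is a standard two-stage pipeline: a cheap first-stage retriever narrows the corpus to a handful of candidates, then a stronger reranker rescores each query/candidate pair. A minimal sketch of the pattern, with word-overlap stand-ins in place of Isaacus's actual models:

```python
# Two-stage retrieval: a cheap first-stage retriever narrows the corpus,
# then a (normally much stronger) reranker rescores the survivors.
# Both scorers below are word-overlap stand-ins, NOT Isaacus's models.

def first_stage_score(query: str, doc: str) -> float:
    """Stand-in for an embedding model's similarity score."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def rerank_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder reranker, which reads query and
    document jointly; here overlap is simply weighted by brevity."""
    return first_stage_score(query, doc) / (1 + len(doc.split()) / 50)

def retrieve(query: str, corpus: list[str], k: int = 3, top_n: int = 1) -> list[str]:
    # Stage 1: keep the k best candidates by the cheap score.
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d), reverse=True)[:k]
    # Stage 2: rescore only the survivors with the stronger scorer.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:top_n]

corpus = [
    "The court held that the contract was void for uncertainty.",
    "A contract requires offer, acceptance, and consideration.",
    "The weather in Sydney was sunny.",
]
query = "what does a contract require"
print(retrieve(query, corpus))
```

Swapping the reranker while holding everything else fixed, as in the benchmark described below, amounts to changing only `rerank_score`.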

In our final test, we compared Voyage AI by MongoDB 2.5 Rerank with Kanon 2 Reranker on Legal RAG Bench, using identical embedding models, generative models, and pipeline hyperparameters. The only difference was the reranker.

The result: Kanon 2 Reranker decisively outperformed Voyage 2.5 Rerank.

On holdout questions, the head-to-head margin was one of the most extreme we have seen: for every 1 question Voyage got right and we got wrong, there were 6 questions we got right and Voyage got wrong.

We share an example in the blog post where Voyage Rerank actually underperforms Kanon 2 Embedder on its own, delivering the wrong context to the LLM. In that case, not using a reranker at all would have led to the correct answer.

All in all, I’m immensely proud of the performance gains we’ve achieved.
But as we always say, the best benchmark is your own data.

So redeem your free credits, give Kanon 2 Reranker a try, and see firsthand the difference our models can make:
https://huggingface.co/blog/isaacus/kanon-2-reranker
umarbutler published an article 27 days ago
Kanon 2 Reranker: the most powerful reranker for legal RAG

umarbutler posted an update about 1 month ago
This awesome visualization by @abdurrahmanbutler tracks how reliant the High Court of Australia has been on UK precedents over time.

Back in the early 1900s, up to 70% of citations in High Court decisions were from the UK. Today, that number sits around 20%.

This change seems to have happened gradually as Australia gained more and more independence from the UK, culminating in the Australia Acts of 1986, where we see a nice bump in the proportion of Australian cases cited.

These insights would not be possible without our latest legal AI model, Kanon 2 Enricher, which we used to extract dates and citations from High Court decisions in isaacus/open-australian-legal-corpus and categorize citations by jurisdiction. You can learn about Kanon 2 Enricher here: https://isaacus.com/blog/kanon-2-enricher.
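The proportion in the chart is straightforward to compute once each citation carries a year and a jurisdiction. A sketch over hypothetical extracted records (the field names here are illustrative, not Kanon 2 Enricher’s actual output schema):

```python
from collections import defaultdict

# Hypothetical extracted citation records; the real data comes from running
# Kanon 2 Enricher over High Court decisions and tagging each citation's
# jurisdiction. Field names are illustrative only.
citations = [
    {"year": 1905, "jurisdiction": "UK"},
    {"year": 1905, "jurisdiction": "UK"},
    {"year": 1907, "jurisdiction": "AU"},
    {"year": 1990, "jurisdiction": "AU"},
    {"year": 1992, "jurisdiction": "AU"},
    {"year": 1995, "jurisdiction": "UK"},
]

def uk_share_by_decade(records: list[dict]) -> dict[int, float]:
    """Fraction of citations pointing to UK authorities, per decade."""
    totals, uk = defaultdict(int), defaultdict(int)
    for r in records:
        decade = (r["year"] // 10) * 10
        totals[decade] += 1
        uk[decade] += r["jurisdiction"] == "UK"
    return {d: uk[d] / totals[d] for d in sorted(totals)}

print(uk_share_by_decade(citations))
```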
abdurrahmanbutler posted an update about 1 month ago
🚀 𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗞𝗮𝗻𝗼𝗻 𝟮 𝗘𝗻𝗿𝗶𝗰𝗵𝗲𝗿: 𝘁𝗵𝗲 𝘄𝗼𝗿𝗹𝗱’𝘀 𝗳𝗶𝗿𝘀𝘁 𝗵𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝗶𝗰𝗮𝗹 𝗴𝗿𝗮𝗽𝗵𝗶𝘁𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹

Today we’re publicly releasing Kanon 2 Enricher, and with it, an entirely new class of AI model that we’re calling a hierarchical graphitization model.
This is fundamentally different from both universal extraction models and generative models.

As a hierarchical graphitization model, Kanon 2 Enricher natively outputs a 𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗴𝗿𝗮𝗽𝗵 rather than tokens, which makes it architecturally incapable of hallucinating or inventing text that wasn’t present in the input.

What that enables in practice is unlike any other model or ML architecture on the market:

• 𝗡𝗼 𝗵𝗮𝗹𝗹𝘂𝗰𝗶𝗻𝗮𝘁𝗶𝗼𝗻𝘀 🤖
It cannot hallucinate. All references and links are stored as spans, meaning exact character offsets anchored to the original text.

• 𝗛𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝗶𝗰𝗮𝗹 𝘀𝗲𝗴𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻, 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻 📑
It deconstructs a document’s full nested hierarchy, down to chapters, sections, clauses, schedules, signatures, and even individual sentences, and classifies each span with dozens of contextual features.

• 𝗘𝗻𝘁𝗶𝘁𝘆 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻, 𝗱𝗶𝘀𝗮𝗺𝗯𝗶𝗴𝘂𝗮𝘁𝗶𝗼𝗻, 𝗮𝗻𝗱 𝗹𝗶𝗻𝗸𝗶𝗻𝗴 🔗
It resolves what references actually point to, then links entities, citations, and cross-references into a single coherent graph.

• 𝗚𝗿𝗮𝗽𝗵-𝗳𝗶𝗿𝘀𝘁 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 🏃‍➡️
Small enough to run locally on a consumer PC with sub-second latency, and it stays reliable on long documents where frontier generative models often degrade.
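The no-hallucination guarantee above is mechanical: because every node in the output graph stores character offsets rather than generated text, each extracted reference can be verified by slicing the source document. A toy sketch of that invariant, with an illustrative graph shape that is not Kanon 2 Enricher’s actual schema:

```python
# Toy illustration of span-anchored output: instead of generating text, the
# model emits (start, end) offsets into the original document, so every node
# can be checked against the source. Schema is illustrative only.

text = "Section 3. Termination. Either party may terminate under clause 3."

graph = {
    "nodes": [
        {"id": "seg-1", "label": "section_heading", "span": (0, 10)},
        {"id": "ref-1", "label": "cross_reference", "span": (57, 65)},
    ],
    "edges": [
        {"source": "ref-1", "target": "seg-1", "relation": "refers_to"},
    ],
}

def resolve(node: dict) -> str:
    """Recover a node's text purely by slicing the source document."""
    start, end = node["span"]
    return text[start:end]

# Every node's text is, by construction, a verbatim substring of the input:
for node in graph["nodes"]:
    assert resolve(node) in text
```

A generative model could emit a citation string that appears nowhere in the input; a span can only ever point at text that is actually there.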

To read more about our new model, check out our latest Hugging Face article:
https://huggingface.co/blog/isaacus/introducing-kanon-2-enricher
abdurrahmanbutler published an article about 1 month ago
Introducing Kanon 2 Enricher — the world’s first hierarchical graphitization model
