legal-nli-roberta-base

Model Overview

The legal-nli-roberta-base model is a fine-tuned RoBERTa-Base model specialized in Legal Natural Language Inference (NLI). Its purpose is to analyze the logical relationship between two pieces of text from legal documents (e.g., a court ruling and a proposed amendment, or a contract clause and a statement of fact). It classifies the relationship into one of three standard NLI categories: Entailment, Contradiction, or Neutral.

Model Architecture

  • Base Model: RoBERTa-Base (a Robustly Optimized BERT Pretraining Approach).
  • Task: Sequence Classification (RobertaForSequenceClassification) on text pairs.
  • Mechanism: The model takes a pair of texts, the Premise (e.g., the contract clause) and the Hypothesis (e.g., the claim about the clause), joined by RoBERTa's separator tokens (</s></s>). The Transformer encoder processes the concatenated input, and the final representation of the start-of-sequence token (<s>, RoBERTa's equivalent of BERT's [CLS]) is passed to a classification head; see the encoding sketch after this list.
  • Labels:
    • Entailment (0): The Hypothesis must be true if the Premise is true.
    • Contradiction (1): The Hypothesis must be false if the Premise is true.
    • Neutral (2): The Premise neither confirms nor refutes the Hypothesis; it could be true or false.
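
The Mechanism bullet above can be made concrete with a minimal encoding sketch. It uses the public roberta-base tokenizer as a stand-in, on the assumption that the fine-tuned checkpoint keeps the standard RoBERTa tokenizer; the premise/hypothesis texts are invented:

from transformers import AutoTokenizer

# Stand-in tokenizer; the fine-tuned model is assumed to inherit it unchanged.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

premise = "The supplier shall deliver the goods by December 31, 2025."
hypothesis = "The supplier must deliver the goods in 2025."

# Pair encoding follows the pattern: <s> premise </s></s> hypothesis </s>
encoded = tokenizer(premise, hypothesis)
print(tokenizer.decode(encoded["input_ids"]))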

Intended Use

  • Contract Review Automation: Checking if a new clause contradicts an existing term, or if a statement of compliance is supported by the contract text (a batch-scoring sketch follows this list).
  • Legal Research: Analyzing case law to determine if a new judgment supports or refutes a previous ruling or legal principle.
  • Compliance Checks: Verifying internal company policies against external regulations.
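
As referenced above, a plausible batch-scoring pattern for contract review, shown as a sketch only: the model ID is the placeholder used in this card, and the clause texts are invented.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "YourOrg/legal-nli-roberta-base"  # placeholder ID from this card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

new_clause = "The licensee may sublicense the software to third parties."
existing_terms = [
    "The licensee shall not transfer any rights under this agreement.",
    "The licensor grants the licensee a non-exclusive license.",
]

# Each existing term is the Premise; the proposed clause is the Hypothesis.
batch = tokenizer(
    existing_terms,
    [new_clause] * len(existing_terms),
    truncation=True,
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**batch).logits

for term, pred in zip(existing_terms, logits.argmax(dim=-1)):
    print(f"{model.config.id2label[pred.item()]}: {term}")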

Limitations and Ethical Considerations

  • Domain-Specific Logic: Legal documents often rely on highly nuanced, domain-specific definitions and complex nested dependencies (e.g., "if A, then B, unless C is true"). The model may struggle with these deep, long-range logical structures.
  • No Substitute for Counsel: This model is a tool for rapid analysis, not a provider of legal advice. Final interpretation and decision-making must be handled by qualified legal professionals.
  • Data Bias: The model is trained on specific corpora of legal texts. If the training data skews toward certain jurisdictions or legal systems, performance on others may degrade significantly.
  • Max Length: The model's context window is limited (max_position_embeddings=514, which in RoBERTa corresponds to 512 usable tokens). Long premises (e.g., entire contract sections) must be truncated, with potential loss of critical information; a chunking workaround is sketched below.
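
One common workaround for the length limit, shown here as a sketch rather than a feature of this model (the window/stride sizes and the first-non-Neutral aggregation rule are arbitrary choices), is to slide a window over the premise and aggregate chunk-level predictions:

import torch

def classify_long_premise(premise, hypothesis, tokenizer, model,
                          window=400, stride=200):
    """Slide a token window over a long premise and aggregate verdicts.

    Note: decoding and re-encoding chunks can shift token boundaries
    slightly; acceptable for a rough sketch.
    """
    token_ids = tokenizer(premise, add_special_tokens=False)["input_ids"]
    predictions = []
    for start in range(0, max(len(token_ids) - window, 0) + 1, stride):
        chunk_text = tokenizer.decode(token_ids[start:start + window])
        enc = tokenizer(chunk_text, hypothesis, truncation=True,
                        return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits
        predictions.append(logits.argmax(dim=-1).item())
    # Report the first non-Neutral verdict, if any (Neutral is id 2 per this card).
    non_neutral = [p for p in predictions if p != 2]
    return model.config.id2label[non_neutral[0] if non_neutral else 2]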

Example Code

To analyze the relationship between a legal premise and a hypothesis:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "YourOrg/legal-nli-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define Premise and Hypothesis
premise = "The supplier is obligated to deliver all specified goods no later than December 31, 2025, provided that the initial payment is received by October 1st."
hypothesis = "The supplier must deliver the goods by December 31, 2025, regardless of the payment date."

# Encode the text pair
encoded_input = tokenizer(
    premise, 
    hypothesis, 
    truncation=True, 
    padding=True, 
    return_tensors="pt"
)

# Inference
with torch.no_grad():
    outputs = model(**encoded_input)
    logits = outputs.logits

# Get the predicted label
predicted_class_id = logits.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]

print(f"Premise: {premise[:50]}...")
print(f"Hypothesis: {hypothesis}")
print(f"NLI Relationship: **{predicted_label}**") 
# Expected Output: Contradiction (due to the "provided that" condition)
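
To surface a confidence score alongside the label, the logits can be converted to class probabilities (continuing directly from the snippet above):

# Softmax over the three NLI classes for the single input pair
probs = torch.softmax(logits, dim=-1)[0]
for class_id, p in enumerate(probs):
    print(f"{model.config.id2label[class_id]}: {p.item():.3f}")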