🦅 Earlybird: Fast & Accurate AI Text Detection

Earlybird is a lightweight, high-speed AI text detection model designed to classify text as either Human-Written or AI-Generated.

Built on the efficient DistilRoBERTa architecture, it was fine-tuned on the W.O.R.M. (Wait, Original or Machine) dataset.

⚡ Model Stats

  • Architecture: DistilRoBERTa (82M parameters)
  • Primary Task: Binary Classification (Human vs. AI)
  • Context Window: 512 Tokens
  • Inference Speed: <50ms (CPU) / <5ms (GPU)
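
Because the context window is 512 tokens, longer documents have to be truncated or split before classification. The following is a minimal sketch of one possible splitting strategy, assuming the token IDs have already been produced by the model's tokenizer. The function name `chunk_token_ids`, the overlap of 32 tokens, and the two slots reserved for start/end special tokens are illustrative assumptions, not part of Earlybird.

```python
from typing import List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int],
    max_tokens: int = 512,
    reserved: int = 2,
    overlap: int = 32,
) -> List[List[int]]:
    """Split token IDs into overlapping windows that fit the context window.

    `reserved` leaves room for the start/end special tokens that
    RoBERTa-style tokenizers add to a single sequence (assumed to be two).
    """
    window = max_tokens - reserved
    if window <= 0:
        raise ValueError("max_tokens must exceed reserved")
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")

    step = window - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(list(token_ids[start:start + window]))
        # Stop once the current window has reached the end of the input.
        if start + window >= len(token_ids):
            break
    return chunks
```

Each chunk can then be classified separately. How the per-chunk predictions are combined (majority vote, mean score, or something else) is left open here, since the card does not describe it.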

🚀 Overview

Earlybird is designed for rapid, real-time detection. Unlike generative Large Language Models (LLMs), which are slow and resource-heavy, Earlybird uses a distilled encoder architecture. This allows it to process text in milliseconds, making it well suited to high-volume applications such as content moderation, academic integrity checks, and spam filtering.
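
Latency depends heavily on hardware, batch size, and text length, so it is worth measuring on the target machine. The sketch below is a small timing harness, not an official benchmark script. `measure_latency_ms` is a hypothetical helper, and `classify` stands for any callable that takes one string, such as a wrapper around the model.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency_ms(
    classify: Callable[[str], object],
    texts: Sequence[str],
    warmup: int = 3,
) -> Dict[str, float]:
    """Time one classify() call per text and summarize in milliseconds."""
    if not texts:
        raise ValueError("texts must not be empty")

    # Warm-up calls are excluded so one-time setup cost does not skew results.
    for text in texts[:warmup]:
        classify(text)

    timings = []
    for text in texts:
        start = time.perf_counter()
        classify(text)
        timings.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }
```

Reporting the median alongside the mean keeps a single slow call from dominating the result.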

The model analyzes stylistic patterns, perplexity, and token transitions to determine whether a text was written by a human or generated by a model such as GPT-4, Claude, Llama, or Mistral.
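
A minimal sketch of how the model might be called through the Hugging Face `transformers` text-classification pipeline. The repository name is the one listed on the model page. The helper names `load_classifier` and `classify` are hypothetical, and the label strings returned depend on the checkpoint's configuration, which this card does not document.

```python
from typing import Any, Callable, Dict, List

# Repository name as listed on the model page.
MODEL_ID = "noumenon-labs/Earlybird-fast"

Classifier = Callable[..., List[Dict[str, Any]]]


def load_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Build a text-classification pipeline (needs transformers and torch)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("text-classification", model=model_id)


def classify(text: str, classifier: Classifier) -> Dict[str, Any]:
    """Return the top label and its score, truncating input to 512 tokens."""
    result = classifier(text, truncation=True, max_length=512)[0]
    return {"label": result["label"], "score": float(result["score"])}
```

Typical use would be `clf = load_classifier()` once at startup, followed by `classify(some_text, clf)` per request. Inspecting the first few outputs is the safest way to learn which label string corresponds to human-written text.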

📚 Training Data

Earlybird was trained on Mega-WORM, a unified dataset curated from four major open-source collections. The training data was rigorously filtered to ensure high-quality prose, focusing on texts with sufficient context (essays, blog posts, articles).

📊 Performance Benchmarks

The model is most accurate on medium and long-form text (100 words or more). Accuracy is noticeably lower on very short texts.

Detailed Length Breakdown

Text Category | Word Count      | Accuracy | Performance
Short Text    | <100 words      | 76.31%   | ⚠️ Weak
Medium Text   | 100 - 300 words | 96.48%   | ✅ Excellent
Long Text     | 300+ words      | 95.01%   | ✅ Excellent
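
The length buckets can be expressed as a small helper that looks up the reported accuracy for a given input. This is a sketch under two assumptions the card does not confirm: words are counted by splitting on whitespace, and a text of exactly 300 words falls into the long bucket. `length_category` and `REPORTED_ACCURACY` are illustrative names.

```python
# Accuracy figures copied from the length breakdown in this card.
REPORTED_ACCURACY = {"short": 0.7631, "medium": 0.9648, "long": 0.9501}


def length_category(text: str) -> str:
    """Map a text to a length bucket using a whitespace word count."""
    words = len(text.split())
    if words < 100:
        return "short"
    if words < 300:
        return "medium"
    return "long"  # assumption: exactly 300 words counts as long
```

For example, `REPORTED_ACCURACY[length_category(text)]` gives a rough prior on how much to trust a prediction for `text`.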

Overall Metrics

Metric           | Score
Overall Accuracy | 89.43%

⚠️ Important Limitations

  • Short Text Instability: As shown in the benchmarks, the model's accuracy drops significantly (to ~76%) on texts under 100 words (e.g., short tweets, single sentences). It is not recommended for use on short social media comments without human review.
  • Context Requirement: The model relies on analyzing sentence structure and paragraph flow. Without enough words, it lacks the context needed to make a high-confidence prediction.
  • False Positives: Highly formal, academic human writing can occasionally be flagged as AI due to its rigid structure.
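
The short-text limitation can be turned into a simple triage rule that routes risky predictions to a human reviewer. The sketch below is one possible policy, not a recommendation from the model authors. The 100-word floor comes from the benchmarks in this card, while the 0.90 score threshold and the names `Decision` and `triage` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    label: str
    score: float
    needs_review: bool
    reason: str


def triage(
    text: str,
    label: str,
    score: float,
    min_words: int = 100,     # below this, reported accuracy is about 76%
    min_score: float = 0.90,  # hypothetical confidence threshold
) -> Decision:
    """Flag predictions on short or low-confidence inputs for human review."""
    words = len(text.split())
    if words < min_words:
        return Decision(label, score, True, f"only {words} words (< {min_words})")
    if score < min_score:
        return Decision(label, score, True, f"score {score:.2f} below {min_score:.2f}")
    return Decision(label, score, False, "ok")
```

This gate does not address the false-positive case for formal academic writing, which cannot be detected from length or score alone.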

Model repository: noumenon-labs/Earlybird-fast