Library: PyTorch
Languages: Spanish, English
Tags: Mixture of Experts, mamba, ssm, reasoning, chain-of-thought

Aethelred-7B-v2.0 (In-Training)

⚠️ Status: Work in Progress

The model is still in training, so it is not yet a finished product or a stable release. Current loss: ~5.6 at step 15,000.
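For a rough sense of scale, assume the reported loss is the mean per-token cross-entropy in nats. The card does not say this, and an MoE training loss may also include an auxiliary load-balancing term. Under that assumption, it converts to perplexity as follows:

```latex
\mathrm{PPL} = e^{\mathcal{L}} = e^{5.6} \approx 270
```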

Architecture

  • Hybrid Core: Mamba-2 + Sliding Window Attention (Flash Attention 3)
  • MoE: 16 Experts (4 active)
  • Optimizer: GaLore + Sophia-G + Muon
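The Mamba-2 half of the hybrid core is a selective state-space model. The sketch below writes the scalar-decay recurrence in the style of Mamba-2 as a plain sequential loop for one input channel, to show how each step decays and updates a hidden state. It is a simplified illustration under stated assumptions: a single head, a scalar input channel, and no learned projections, convolution, or gating. The real layer uses fused, chunked kernels. `ssm_scan` and its parameter names are hypothetical and not taken from this repository.

```python
import math
from typing import List


def ssm_scan(
    x: List[float],          # input sequence for one channel, length T
    delta: List[float],      # per-step step sizes (delta_t > 0), length T
    A: float,                # scalar decay parameter (A < 0), shared by all state dims
    B: List[List[float]],    # per-step input projections B_t, shape T x N
    C: List[List[float]],    # per-step output projections C_t, shape T x N
    D: float = 0.0,          # skip-connection weight
) -> List[float]:
    """Sequential selective-SSM recurrence with a scalar decay per step (hypothetical helper).

    h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t
    y_t = <C_t, h_t> + D * x_t
    """
    n = len(B[0])
    h = [0.0] * n  # hidden state starts at zero
    ys = []
    for t, x_t in enumerate(x):
        a_t = math.exp(delta[t] * A)  # input-dependent decay, in (0, 1) when A < 0
        h = [a_t * h_i + delta[t] * b_i * x_t for h_i, b_i in zip(h, B[t])]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C[t], h)) + D * x_t)
    return ys
```

For example, feeding an impulse `[1.0, 0.0, 0.0]` with `delta = 1` and `A = log(0.5)` yields outputs of roughly `1.0, 0.5, 0.25`. The state halves at every step because nothing new enters it. The sequential loop costs O(T) per channel. Mamba-2's actual kernels compute the same kind of recurrence in parallel chunks.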
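The attention half of the hybrid core is restricted to a sliding window. The naive reference below shows what that restriction means: each query attends only to itself and the previous `window - 1` keys. The card does not give the window size, so `window` is a hypothetical parameter here, and both function names are illustrative. Flash Attention 3 computes this inside a fused GPU kernel without ever materializing the full T x T score matrix. This O(T^2) loop only demonstrates the masking.

```python
import math
from typing import List

Matrix = List[List[float]]


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j: causal and within `window`."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [[j <= i and i - j < window for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_attention(q: Matrix, k: Matrix, v: Matrix, window: int) -> Matrix:
    """Naive scaled dot-product attention restricted to a causal sliding window."""
    seq_len, d_k = len(q), len(q[0])
    mask = sliding_window_mask(seq_len, window)
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for i in range(seq_len):
        # Masked-out keys get -inf, so they receive zero weight after the softmax.
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j])) if mask[i][j] else -math.inf
            for j in range(seq_len)
        ]
        m = max(scores)  # finite: the diagonal (j == i) is always allowed
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) / z for c in range(len(v[0]))])
    return out
```

With `window=1`, every position attends only to itself, so the output equals `v`. As `window` grows toward the sequence length, the layer approaches ordinary causal attention.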
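"16 Experts (4 active)" means only a quarter of the expert networks run for each token. The sketch below shows one common way to do that routing: take the top 4 router logits and softmax over just those to get gate weights. The card does not describe the router. Top-k with softmax gating, the absence of a load-balancing loss, and the names `route_token` and `moe_layer` are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

NUM_EXPERTS = 16  # "16 Experts" on the model card
TOP_K = 4         # "4 active" on the model card

Vector = List[float]


def route_token(router_logits: Vector, k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and return (expert_index, gate) pairs.

    Gates are a softmax over the selected logits only, so they sum to 1. This is
    equivalent to renormalizing a full softmax over all experts down to the top k.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} router logits, got {len(router_logits)}")
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_layer(x: Vector, router_logits: Vector, experts: List[Callable[[Vector], Vector]]) -> Vector:
    """Gate-weighted sum of the selected experts' outputs; unselected experts are never called."""
    y = [0.0] * len(x)
    for idx, gate in route_token(router_logits):
        expert_out = experts[idx](x)
        y = [acc + gate * val for acc, val in zip(y, expert_out)]
    return y
```

For example, with router logits `0.0, 1.0, ..., 15.0` the router selects experts 15, 14, 13 and 12, and their gates decrease in that order. The compute cost per token therefore scales with the 4 active experts rather than all 16.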

Repository: J4HDx/Aethelred-7B-v2