J4HDx/Aethelred-7B-v2

Tags: Mixture of Experts, chain-of-thought


Aethelred-7B-v2.0 (In-Training)

⚠️ Status: Work in Progress

The model is still training, so this checkpoint is neither a finished product nor a stable release. Current loss: ~5.6 at step 15,000.
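To put the reported loss in perspective, it can be converted to perplexity. The sketch below assumes the figure is a mean per-token cross-entropy measured in nats (natural log); the card does not say this, so treat the result as a rough reading rather than an official metric. The helper name `perplexity` is illustrative only.

```python
import math

def perplexity(mean_loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_loss_nats)

# Approximate value reported on the card at step 15,000.
# Assumption: the loss is a natural-log cross-entropy averaged per token.
reported_loss = 5.6
ppl = perplexity(reported_loss)
print(f"perplexity ~ {ppl:.0f}")  # prints "perplexity ~ 270"
```

Under that assumption, a loss of ~5.6 corresponds to a perplexity of roughly 270, which is consistent with the card's warning that training is at an early stage.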

Architecture

Hybrid Core: Mamba-2 + Sliding Window Attention (Flash Attention 3)
MoE: 16 Experts (4 active)
Optimizer: GaLore + Sophia-G + Muon
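The "16 Experts (4 active)" line can be illustrated with a minimal routing sketch. The card does not describe the router, so everything below is an assumption: top-k selection over router logits, a softmax renormalised over the selected experts only, and toy scalar "experts". The names `route`, `moe_output`, `NUM_EXPERTS`, and `TOP_K` are hypothetical and not taken from the model's code.

```python
import math
import random

NUM_EXPERTS = 16  # from the card
TOP_K = 4         # experts active at once, from the card

def route(logits, k=TOP_K):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Assumption: weights are a softmax over only the selected logits,
    so they sum to 1. Real routers may normalise differently.
    """
    if k > len(logits):
        raise ValueError("k cannot exceed the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_output(x, logits, experts, k=TOP_K):
    """Weighted sum of the selected experts' outputs for one input x."""
    return sum(w * experts[i](x) for i, w in route(logits, k))

# Toy demo: each hypothetical expert just scales a scalar input by 1..16.
rng = random.Random(0)
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(router_logits)
y = moe_output(1.0, router_logits, experts)
print(len(selected), round(sum(w for _, w in selected), 6))
```

Only the four selected experts are evaluated per input, which is why a 16-expert layer costs roughly a quarter of the expert compute of running every expert densely.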



Model tree for J4HDx/Aethelred-7B-v2

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Finetuned (1111): this model

Datasets used to train J4HDx/Aethelred-7B-v2