https://huggingface.co/nightmedia/Qwen3-14B-Scientist-BF16
Once again hopeful that it will work this time :)
I added a few more models to make it fun
- internlm/JanusCoder-14B
- Azure99/Blossom-V6.3-14B
- TeichAI/Qwen3-14B-Polaris-Alpha-Distill
- TeichAI/Qwen3-14B-Gemini-3-Pro-Preview-High-Reasoning-Distill
- TeichAI/Qwen3-14B-Claude-4.5-Opus-High-Reasoning-Distill
- MegaScience/Qwen3-14B-MegaScience
- Jasaxion/MathSmith-HC-Qwen3-14B-ShortCoT
Accept me; I will queue a bit later, as I need to go somewhere right now and the queue is busy lol. If there are any more models, let me know so I can queue them when I come back
Well, since you are willing, here is a MoE that would be nice in GGUF
https://huggingface.co/nightmedia/Qwen3-30B-A3B-Architect7
GAIR/SR-Scientist-30B
NousResearch/nomos-1
YOYO-AI/Qwen3-30B-A3B-YOYO-V2
YOYO-AI/Qwen3-30B-A3B-YOYO-V4
miromind-ai/MiroThinker-v1.0-30B
I get these metrics in MLX:
mxfp4 0.551,0.692,0.876,0.749,0.422,0.794,0.691
qx64-hi 0.561,0.725,0.879,0.753,0.468,0.794,0.686
qx86-hi 0.563,0.737,0.878,0.758,0.448,0.803,0.698
qx86-hi PPL: 4.392 ± 0.026
It will probably taper off by q3 and act like a normal MoE
nightmedia/Qwen3-14B-Scientist-BF16: no architectures entry (malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)")
architect7 was added a week ago apparently: https://hf.tst.eu/model#Qwen3-30B-A3B-Architect7-i1-GGUF
Not sure why the JSON thing happens; it seems like you already have it. Perhaps something is wrong with the formatting, or a comma got forgotten somewhere?
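The "character offset 0" complaint can be reproduced locally. Here is a minimal stdlib sketch (the validator's actual implementation is not known, so this is an approximation) showing that an empty body, e.g. a saved 404 response, fails exactly at offset 0, while a missing key gives the "no architectures entry" message:

```python
import json

def check_config(text: str) -> str:
    """Parse a config.json body and check for the 'architectures' entry."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as e:
        return f"malformed JSON at character offset {e.pos}"
    if "architectures" not in cfg:
        return "no architectures entry"
    return "ok"

# An empty body (what an unauthenticated fetch of a gated repo can yield)
# fails before the first character, i.e. at offset 0.
print(check_config(""))  # malformed JSON at character offset 0

# A well-formed config with the expected key passes.
print(check_config('{"architectures": ["Qwen3ForCausalLM"], "dtype": "bfloat16"}'))  # ok
```

So an error at offset 0 usually means the validator never saw the real file at all, not that a comma is missing somewhere inside it.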
That’s okay, I will dig into why that happens, probably you have a strict set of rules and I am missing something… I’ll fix it.
I mean I can try to force it, but because of the queue it will take a while to get an answer. Do you want to fix it, or should I try to force it?
No, give me a bit to double check. It might be my chat templates that I included—copy-pasted it, and might contain an “atom of garbage” :)
If I have the jinja template, there’s no need to put it in the config
I was also tired last night; I meant the other Architect:
https://huggingface.co/nightmedia/Qwen3-30B-A3B-Architect5
This is a version without nomos, different performance profile
The MoEs had no issues; mergekit has problems with dense models… and probably I was doing way too many things at once
OK, I removed the chat templates from tokenizer_config.json; it should be fine now. Let's try again 🙏😀
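For reference, the removal amounts to dropping the `chat_template` key from the tokenizer config. A minimal sketch (the field values here are placeholders, not the real file contents):

```python
import json

# Hypothetical tokenizer_config.json contents; the real file has many more fields.
cfg = {
    "tokenizer_class": "Qwen2Tokenizer",
    "chat_template": "{% for message in messages %}...{% endfor %}",
}

# Drop the embedded template; the separate chat_template.jinja file
# in the repo is used instead, so the config entry is redundant.
cfg.pop("chat_template", None)

print(json.dumps(cfg, indent=2))
```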
HTTP 404, please make it public =)
nightmedia/Qwen3-14B-Scientist-BF16: no architectures entry (malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)")
that's so weird, let's ask @nicoboss
I found the dtype was set to float32; it probably got carried over from a merge. Changed it to:
"dtype": "bfloat16",
@RichardErkhov
This is probably a side effect of queuing a gated model. The JSON syntax validator doesn't use the HuggingFace token you provide, so it tries to validate the 404 error page as JSON and complains. If you provide a HuggingFace token you know has access to the model, it is safe to ignore the error and simply force-add it. Should the JSON really be invalid, it will simply fail after downloading, and you can review the issue using llmc audit. If the JSON is invalid, the HuggingFace model card will usually also put a warning on top, as it does for all the latest Nemotron models, where NVIDIA puts corrupted JSON on purpose despite knowing everyone hates them for doing so.
Please also make sure to specify nico1 as the worker for gated models. Otherwise, importance matrix computation will fail if the model gets queued to rich1 or marco, as those expect nico1 to redownload the model for importance matrix computation, and the HuggingFace token would be missing on nico1.
Ah yes, I forgot the token! How could I! OK, give me 5 minutes and I'll queue with the token
Added both models, let's see how it goes