Add MMLU-Pro evaluation result (64.4)
#116 opened 5 days ago
by
burtenshaw
Add GSM8K evaluation result
#115 opened 17 days ago
by
burtenshaw
Add GSM8K evaluation result (89.3%)
#114 opened 17 days ago
by
burtenshaw
Add GSM8K evaluation result
#113 opened 17 days ago
by
burtenshaw
Add GSM8K evaluation result
#112 opened 17 days ago
by
burtenshaw
Production deployment considerations
1
#111 opened about 1 month ago
by
Cagnicolas
dememe4301
#110 opened 2 months ago
by
kubilayarikan
Update inference/model.py
#109 opened 3 months ago
by
Crossberry
Update README.md
#107 opened 6 months ago
by
reactkick
Remove redundant code
#106 opened 7 months ago
by
GloomScythe
MTP Integration: Unexpectedly High Loss with Loaded Weights
#105 opened 7 months ago
by
parambole
add AIBOM
👍
1
#104 opened 8 months ago
by
RiccardoDav
Update tokenizer_config.json
#101 opened 9 months ago
by
Akshay47
DeepSeek V3 model Bad Cases Genuine User Open Reviews and Comments Collection
#99 opened 9 months ago
by
DeepNLP
Make config params float to avoid warning in Transformers
#97 opened 10 months ago
by
Rocketknight1
Point to latest checkpoint
#96 opened 10 months ago
by
victor
how to convert model to bf16
#95 opened 11 months ago
by
Saicy
Update README.md
#94 opened 11 months ago
by
Alirezaaa123456
Deepseek V3
#93 opened 11 months ago
by
cybercyb
【Q】shared_head weights of MTP
👀
4
#92 opened 11 months ago
by
huang11
fix for transformers 4.49 compatibility.
1
#91 opened 11 months ago
by
katuni4ka
Update README.md
#90 opened 11 months ago
by
baishihao
A Small Issue in the Code Implementation of Auxiliary-Loss-Free Load Balancing Expert Bias
#89 opened 11 months ago
by
liyang31163150
Fix generation with latest transformers
#88 opened 11 months ago
by
kylesayrs
Add pipeline tag
#86 opened 12 months ago
by
nielsr
Some of the safetensor files are not marked as safe
#85 opened 12 months ago
by
tanmaylaud
Update README.md
#84 opened 12 months ago
by
MTayira
ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float8_e4m3fn
#82 opened 12 months ago
by
ajtakto
Update README.md
#81 opened 12 months ago
by
deleted
Update README.md
#80 opened 12 months ago
by
zhup
Update README.md
#79 opened 12 months ago
by
zhup
chat
#77 opened 12 months ago
by
rojithonline
DeepSeek-V3-lite naming conventions?
❤️
1
6
#76 opened 12 months ago
by
AlphaGaO
torch.distributed.DistNetworkError
#75 opened 12 months ago
by
yu19920006607
remove reference to deprecated transformers code
2
#74 opened about 1 year ago
by
winglian
Update README.md
#73 opened about 1 year ago
by
SamimSaikia
DeepSeek R1 answer ChatGPT ??
😔
1
4
#72 opened about 1 year ago
by
valerebron
ValueError: Unrecognized configuration class <class 'transformers_modules.configuration_deepseek.DeepseekV3Config'> to build an AutoTokenizer.
11
#69 opened about 1 year ago
by
ajtakto
Parallelized script
#67 opened about 1 year ago
by
ajtakto
I am getting an error message while executing pip install -r requirements.txt
6
#64 opened about 1 year ago
by
yu19920006607
`aux_loss_alpha` should be 1e-4 instead of 1e-3?
#61 opened about 1 year ago
by
cuichenx
captcha not loading on edge
#60 opened about 1 year ago
by
leo-smi
Upload shreya.zip
#59 opened about 1 year ago
by
Msdthala
Upload IMG_20250111_184317.jpg
#58 opened about 1 year ago
by
Sajalhero
Auxiliary-loss-free expert routing
2
#56 opened about 1 year ago
by
qing9
AI Games
#55 opened about 1 year ago
by
ChickenUJHAYIUSGU
Upload IMG_0509 4.HEIC
#54 opened about 1 year ago
by
borhanrabbany
how to run inference with MTP?
#53 opened about 1 year ago
by
duanyu