Add MMLU-Pro evaluation result (64.4)

#116 opened 5 days ago by

burtenshaw

Add GSM8K evaluation result

#115 opened 17 days ago by

burtenshaw

Add GSM8K evaluation result (89.3%)

#114 opened 17 days ago by

burtenshaw

Add GSM8K evaluation result

#113 opened 17 days ago by

burtenshaw

Add GSM8K evaluation result

#112 opened 17 days ago by

burtenshaw

Production deployment considerations

#111 opened about 1 month ago by

Cagnicolas

dememe4301

#110 opened 2 months ago by

kubilayarikan

Update inference/model.py

#109 opened 3 months ago by

Crossberry

Update README.md

#107 opened 6 months ago by

reactkick

Remove redundant code

#106 opened 7 months ago by

GloomScythe

MTP Integration: Unexpectedly High Loss with Loaded Weights

#105 opened 7 months ago by

parambole

add AIBOM

👍 1

#104 opened 8 months ago by

RiccardoDav

deep

#103 opened 8 months ago by

xyall

sunde

#102 opened 9 months ago by

hjj1962

Update tokenizer_config.json

#101 opened 9 months ago by

Akshay47

DeepSeek V3 model Bad Cases Genuine User Open Reviews and Comments Collection

#99 opened 9 months ago by

DeepNLP

Make config params float to avoid warning in Transformers

#97 opened 10 months ago by

Rocketknight1

Point to latest checkpoint

#96 opened 10 months ago by

victor

how to convert model to bf16

#95 opened 11 months ago by

Saicy

Update README.md

#94 opened 11 months ago by

Alirezaaa123456

Deepseek V3

#93 opened 11 months ago by

cybercyb

[Q] shared_head weights of MTP

👀 4

#92 opened 11 months ago by

huang11

fix for transformers 4.49 compatibility.

#91 opened 11 months ago by

katuni4ka

Update README.md

#90 opened 11 months ago by

baishihao

A Small Issue in the Code Implementation of Auxiliary-Loss-Free Load Balancing Expert Bias

#89 opened 11 months ago by

liyang31163150

Fix generation with latest transformers

#88 opened 11 months ago by

kylesayrs

Add pipeline tag

#86 opened 12 months ago by

nielsr

Some of the safetensor files are not marked as safe

#85 opened 12 months ago by

tanmaylaud

Update README.md

#84 opened 12 months ago by

MTayira

ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float8_e4m3fn

#82 opened 12 months ago by

ajtakto

Update README.md

#81 opened 12 months ago by deleted

Update README.md

#80 opened 12 months ago by

zhup

Update README.md

#79 opened 12 months ago by

zhup

chat

#77 opened 12 months ago by

rojithonline

DeepSeek-V3-lite naming conventions?

❤️ 1

#76 opened 12 months ago by

AlphaGaO

torch.distributed.DistNetworkError

#75 opened 12 months ago by

yu19920006607

remove reference to deprecated transformers code

#74 opened about 1 year ago by

winglian

Update README.md

#73 opened about 1 year ago by

SamimSaikia

DeepSeek R1 answer ChatGPT ??

😔 1

#72 opened about 1 year ago by

valerebron

ValueError: Unrecognized configuration class <class 'transformers_modules.configuration_deepseek.DeepseekV3Config'> to build an AutoTokenizer.

#69 opened about 1 year ago by

ajtakto

Parallelized script

#67 opened about 1 year ago by

ajtakto

I am getting an error message while executing pip install -r requirements.txt

#64 opened about 1 year ago by

yu19920006607

`aux_loss_alpha` should be 1e-4 instead of 1e-3?

#61 opened about 1 year ago by

cuichenx

captcha not loading on edge

#60 opened about 1 year ago by

leo-smi

Upload shreya.zip

#59 opened about 1 year ago by

Msdthala

Upload IMG_20250111_184317.jpg

#58 opened about 1 year ago by

Sajalhero

Auxiliary-loss-free expert routing

#56 opened about 1 year ago by

qing9

AI Games

#55 opened about 1 year ago by

ChickenUJHAYIUSGU

Upload IMG_0509 4.HEIC

#54 opened about 1 year ago by

borhanrabbany

How to run inference with MTP?

#53 opened about 1 year ago by

duanyu