Travis King

travisking

AI & ML interests

have you heard of generative AI?

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

Paper • 2512.20757 • Published 9 days ago • 16

liked a model 8 days ago

nvidia/NitroGen

Updated 14 days ago • 426

upvoted a paper 10 days ago

Are We on the Right Way to Assessing LLM-as-a-Judge?

Paper • 2512.16041 • Published 14 days ago • 32

upvoted a paper 13 days ago

Hierarchical Dataset Selection for High-Quality Data Sharing

Paper • 2512.10952 • Published 21 days ago • 1

liked a dataset 15 days ago

nvidia/Nemotron-PII

Viewer • Updated 15 days ago • 200k • 1.68k • 47

upvoted 2 papers 17 days ago

Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Paper • 2512.11150 • Published 21 days ago • 4

BEAVER: An Efficient Deterministic LLM Verifier

Paper • 2512.05439 • Published 27 days ago • 35

New activity in mistralai/Devstral-Small-2-24B-Instruct-2512 17 days ago

base model inconsistent with architecture claims

#17 opened 17 days ago by travisking

upvoted a paper 21 days ago

Towards a Science of Scaling Agent Systems

Paper • 2512.08296 • Published 23 days ago • 13

liked a model 21 days ago

Motif-Technologies/Motif-2-12.7B-Reasoning

Text Generation • 13B • Updated 20 days ago • 640 • 34

liked a Space 24 days ago

Evaluation Guidebook

📝

221

Display benchmark evaluation data for LLMs

liked 2 models 28 days ago

nvidia/Qwen3-Nemotron-32B-GenRM-Principle

Text Generation • 33B • Updated Oct 30, 2025 • 849 • 11

nvidia/Llama-3.3-Nemotron-70B-Reward-Principle

Text Generation • 71B • Updated Oct 30, 2025 • 72 • 5

upvoted 2 collections 29 days ago

Skywork-Reward-V2

Collection

Scaling preference data curation to the extreme • 9 items • Updated Jul 4, 2025 • 26

Reward Models 10-2025

Collection

A collection of great reward models for research and production • 7 items • Updated 9 days ago • 12

liked a Space 29 days ago

JudgeBench Leaderboard

🏆

Generate a leaderboard for evaluating language models

liked a dataset about 1 month ago

nex-agi/agent-sft

Preview • Updated 23 days ago • 555 • 101

upvoted a collection about 1 month ago

Olmo 3 Pre-training

Collection

All artifacts related to Olmo 3 pre-training • 10 items • Updated 9 days ago • 32

liked a dataset about 1 month ago

allenai/dolma3_longmino_mix-100B-1125

Updated Nov 24, 2025 • 76k • 6

liked a model about 1 month ago

p-e-w/gpt-oss-20b-heretic

Text Generation • 21B • Updated Nov 16, 2025 • 1.53k • 70
