Hugging Face
623.8 TFLOPS
Bowen Peng
bloc97
36 followers · 3 following
AI & ML interests
Machine Learning, Computer Graphics, Language Models
Recent Activity
liked a model 8 days ago: ideogram-ai/ideogram-4-fp8
upvoted a paper 15 days ago: JLT: Clean-Latent Prediction in Latent Diffusion Transformers
upvoted a paper 24 days ago: Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation
bloc97's activity
commented on a paper about 1 month ago
Efficient Pre-Training with Token Superposition
Paper • 2605.06546 • Published May 7 • 46 • 8
New activity in PsycheFoundation/consilience-40b-7Y9v38s5 11 months ago
fixed typo in readme • #1 opened about 1 year ago by johnpotter
New activity in NousResearch/Nous-Capybara-34B over 2 years ago
How did you train this without going OOM in RAM & VRAM? • 3 • #15 opened over 2 years ago by vicplus
New activity in NousResearch/Yarn-Mistral-7b-128k over 2 years ago
VRAM usage for full 128k tokens • 7 • #5 opened over 2 years ago by Hypersniper
sliding_window = 131072? Sliding window attention doesn't work for 128? • 1 • #4 opened over 2 years ago by keyishen
New activity in NousResearch/Yarn-Llama-2-13b-64k almost 3 years ago
Hardware requirements for the model. • 2 • #1 opened almost 3 years ago by Sc0urge