mpasila's Collections
Pre-training dataset prep
Updated Oct 26, 2024
Some datasets I should probably use.
JeanKaddour/minipile • Updated Jun 20, 2023 • 1.01M • 4.27k • 149
wikimedia/wikipedia • Updated Jan 9, 2024 • 61.6M • 179k • 1.26k
neuralwork/arxiver • Updated Nov 1, 2024 • 63.4k • 2.97k • 368
ohsuz/tiny-textbooks-edu • Updated Jun 11, 2024 • 3.31M • 163 • 2
ohsuz/tiny-code-textbooks-edu • Updated Jun 11, 2024 • 1.84M • 17 • 2