kirch's Collections
Scotch & SOTA 🥃 Pt. 1: Big Boi LLM 🚛
Scotch & SOTA 🥃 Pt. 2: Quantized Small Boi LLM 👉👈
Scotch & SOTA 🥃 Pt. 3: Image Sorcery 🔮
Scotch & SOTA 🥃 Pt. 4: Pre-Training Datasets 📜
Scotch & SOTA 🥃 Pt. 5: Instruction Tuning Datasets 👩🏫
Scotch & SOTA 🥃 Pt. 6: Dialogue Tuning Datasets 💬
Scotch & SOTA 🥃 Pt. 7: Human Feedback Datasets 🫣
Scotch & SOTA 🥃 Pt. 4: Multi-Modal 🔀
Scotch & SOTA 🥃 Pt. 4: Pre-Training Datasets 📜
Updated Mar 2
We gotta start somewhere; these JSONL files aren't gonna train themselves.
allenai/dolma • Updated Apr 17, 2024 • 2.76k downloads • 1.02k likes
allenai/peS2o • Updated Oct 13, 2024 • 8.72k downloads • 196 likes
tiiuae/falcon-refinedweb • Updated Jun 20, 2023 • 968M rows • 41.8k downloads • 904 likes
CarperAI/pilev2-dev • Updated Mar 13, 2023 • 12 downloads • 26 likes
AlgorithmicResearchGroup/arxiv_cplusplus_research_code • Updated Sep 4, 2024 • 1.63M rows • 47 downloads • 9 likes
bigcode/the-stack • Updated Apr 13, 2023 • 546M rows • 12.2k downloads • 982 likes
bigcode/starcoderdata • Updated May 16, 2023 • 207M rows • 27.6k downloads • 494 likes
euirim/goodwiki • Updated Sep 11, 2023 • 44.8k rows • 169 downloads • 54 likes
nampdn-ai/tiny-textbooks • Updated Jul 3, 2024 • 420k rows • 477 downloads • 168 likes
nampdn-ai/tiny-codes • Updated Sep 30, 2023 • 1.63M rows • 2.06k downloads • 287 likes
roneneldan/TinyStories • Updated Aug 12, 2024 • 2.14M rows • 97.5k downloads • 956 likes
nampdn-ai/tiny-bridgedict • Updated Aug 4, 2023 • 17.6k rows • 10 downloads • 18 likes
nampdn-ai/tiny-webtext • Updated Aug 27, 2023 • 2.32M rows • 100 downloads • 34 likes