Bilingual LMs (L1 {es, fr, de, pl, tr, ar, zh} + L2 en) trained on Cultura-X (L1) and FineWebEdu (L2)
Suchir Salhan
suchirsalhan
AI & ML interests
Multilinguality and Cognitively-Inspired AI. Tokenization, Pretraining, Interpretability & Alignment.
Recent Activity
updated a model 1 day ago: MultilingualUnigramLM/las-code-tokenizers-OLMo-2-1124-7B-ass_bat_c_plus27
published a model 1 day ago: MultilingualUnigramLM/las-code-tokenizers-OLMo-2-1124-7B-ass_bat_c_plus27
updated a dataset 2 days ago: MultilingualUnigramLM/The-Stack-10K