Dataset?

by 0xbitches - opened Apr 5, 2024

Apr 5, 2024

Hi, the model card claims the model is trained using publicly available datasets, but I cannot seem to find any references to what these datasets are. Are there plans to include the actual datasets used?

euclaise

Apr 5, 2024

The pie charts list each of the datasets.

0xbitches

Apr 5, 2024

I see. I guess my question, then, is whether there are plans to organize the dataset and publish the selected subsets so that the model training is reproducible.

YikangS

JetMoE org Apr 5, 2024

Yes, we plan to give more details about the data selection and mixture in our technical report.
