JINSOO LLM 1B - Korean Language Model

๋ชจ๋ธ ์„ค๋ช…

This is a 1B-parameter Korean BASE language model developed for experimental purposes at the IBDP lab of Changwon National University. Because collecting data for a personal project was difficult, the tokenizer from beomi/KoAlpaca-Polyglot-5.8B was reused. Apart from the tokenizer, the model is a Transformer-based decoder implemented from scratch in PyTorch.

To study the effect of positional encoding, RoPE was removed and no positional encoding of any kind was used. A future V2 release will add positional encoding to build a complete BASE model suitable for SFT.

The model was trained on roughly 13B tokens, so it is somewhat undertrained; at the end of training, the output distribution was lightly polished with high-quality data.

Architecture features (sketched in code after this list):

  • RMSNorm: efficient normalization layer
  • SwiGLU: improved activation function (FFN multiplier = 3x)
  • Flash Attention: memory-efficient attention mechanism
  • Weight Tying: shared embedding/output layer weights
  • Causal Masking: autoregressive text generation
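
Together with the decision to drop positional encoding entirely, the list above pins down the shape of each layer. Below is a minimal PyTorch sketch of one decoder block under those choices; the pre-norm layout, module names, and bias-free projections are illustrative assumptions, not the repository's exact implementation. Flash Attention and causal masking come from F.scaled_dot_product_attention with is_causal=True, and no RoPE or other positional signal is applied.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: scale only, no mean-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """Gated FFN: silu(gate(x)) * up(x), projected back to the model width."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class DecoderBlock(nn.Module):
    def __init__(self, dim=2048, n_heads=16, ffn_dim=6144):
        super().__init__()
        self.n_heads = n_heads
        self.attn_norm = RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, ffn_dim)  # 6144 = 3 * 2048, the 3x FFN multiplier

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
        # Split heads; note: no RoPE or any other positional encoding is applied.
        q, k, v = (z.view(b, t, self.n_heads, -1).transpose(1, 2) for z in (q, k, v))
        # Flash Attention kernel with the causal mask handled internally.
        a = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.out(a.transpose(1, 2).reshape(b, t, d))
        return x + self.ffn(self.ffn_norm(x))

# Weight tying would be applied at the model level, e.g.:
#   self.lm_head.weight = self.token_embedding.weight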

๋ชจ๋ธ ๊ตฌ์กฐ

Parameters: ~1.37B
Architecture: Custom decoder-only transformer (implemented in PyTorch)

Hyperparameters:
  hidden_size: 2048
  num_hidden_layers: 24
  num_attention_heads: 16
  intermediate_size: 6144
  max_position_embeddings: 2048
  vocab_size: 30004
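
As a sanity check, the ~1.37B figure can be recovered from these hyperparameters. The arithmetic below assumes bias-free linear layers, two RMSNorms per block plus a final one, and tied input/output embeddings (both listed above); it is a back-of-the-envelope count, not an exact accounting.

# Back-of-the-envelope parameter count from the hyperparameters above.
hidden, layers, ffn, vocab = 2048, 24, 6144, 30004

embedding = vocab * hidden                  # counted once thanks to weight tying
attn_per_layer = 4 * hidden * hidden        # Q, K, V, and output projections
ffn_per_layer = 3 * hidden * ffn            # SwiGLU gate, up, and down projections
norm_params = layers * 2 * hidden + hidden  # two RMSNorms per block + a final norm

total = embedding + layers * (attn_per_layer + ffn_per_layer) + norm_params
print(f"~{total / 1e9:.2f}B parameters")    # ~1.37B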

Usage

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# ๋ชจ๋ธ ๋ฐ ํ† ํฌ๋‚˜์ด์ € ๋กœ๋“œ
model = AutoModelForCausalLM.from_pretrained(
    "DokHee/jinsoo-llm-1b-korean",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("DokHee/jinsoo-llm-1b-korean")

# Text generation
prompt = "1,2,3,4"
# Sample response: 1,2,3,4,5,6,7,8,9,10,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78

# Alternative prompt and its sample response (verbatim model output, kept in
# Korean; it illustrates the coherence limits noted under Limitations):
# prompt = "์‚ฌ๋žŒ์˜ ์‹ฌ์žฅ๋ณ‘์€ "
# response: ์‚ฌ๋žŒ์˜ ์‹ฌ์žฅ๋ณ‘์€ ฮฒ ์•„๋ฏธ๋…ธ๊ณ„ ์˜ ๋ฉด์—ญ์ฒด๊ณ„๊ฐ€ ๊ณผ๋„ํ•˜๊ฒŒ ํ™œ์„ฑํ™”๋˜์–ด ์žˆ์–ด ๋…ธํ™”๊ฐ€ ์ง„ํ–‰๋˜๋ฉด ์„ธํฌ๊ฐ€ ์ฃฝ๊ฒŒ ๋˜๊ณ , ์ด๋Š” ์„ธํฌ๊ฐ€ ์ฃฝ๊ฒŒ ๋˜์–ด ์„ธํฌ๊ฐ€ ์ฃฝ๊ฒŒ ๋˜๋Š” ๊ฒƒ์ด๋‹ค.
# ์ด๋Ÿฌํ•œ ๋…ธํ™”๋Š” ์œ ์ „์ , ํ™˜๊ฒฝ์ , ์‚ฌํšŒ์  ์š”์ธ๋“ค๋กœ ์ธํ•ด ๋ฐœ์ƒํ•˜๋ฉฐ, ์ด๋Ÿฌํ•œ ํ™˜๊ฒฝ์  ์š”์ธ๋“ค์€ ๋…ธํ™” ๊ณผ์ •์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๋‹ค์–‘ํ•œ ์š”์ธ๋“ค์„ ํฌํ•จํ•œ๋‹ค.
# ๋…ธํ™” ๊ณผ์ •์—์„œ๋Š” ์œ ์ „์ , ํ™˜๊ฒฝ์  ์š”์ธ๋“ค๋กœ ์ธํ•ด ๋ฐœ์ƒํ•˜๋Š” ์œ ์ „์  ์š”์ธ๋“ค์ด ๋ณตํ•ฉ์ ์œผ๋กœ ์ž‘์šฉํ•˜์—ฌ ๋…ธํ™” ๊ณผ์ •์„ ๊ฐ€์†ํ™”์‹œํ‚ค๊ฒŒ ๋œ๋‹ค.
# ์œ ์ „์  ์š”์ธ๋“ค์€ ์œ ์ „ ...

inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.56,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

  • Tokenizer: beomi/KoAlpaca-Polyglot-5.8B (PAD token added; see the sketch after this list)
  • Training environment: NVIDIA GPU, PyTorch 2.7.0
  • Mixed precision: BFloat16
  • Optimizer: AdamW (lr=1e-5, weight_decay=0.01)
  • Context length: 2048 tokens
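
A minimal sketch of the PAD-token addition mentioned above, using the standard Hugging Face tokenizer API. The exact PAD string is an assumption; the card only states that a PAD token was added, which is consistent with the vocab_size of 30004.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("beomi/KoAlpaca-Polyglot-5.8B")

# The card states a PAD token was added to the tokenizer; the exact string is
# unknown, so "<|pad|>" is a placeholder chosen for illustration.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})

# The model's token-embedding matrix must then cover the enlarged vocabulary
# (the card's vocab_size of 30004 would already account for the added token).
print(len(tokenizer))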

Limitations

  • The model is specialized for Korean; performance in other languages may degrade.
  • With 1B parameters, complex reasoning is limited compared to larger models.
  • Because no RoPE and no SFT or other post-training were applied, the model can generate Korean text but lacks contextual coherence.

๋ผ์ด์„ ์Šค

Apache 2.0

Credits

  • Tokenizer: beomi/KoAlpaca-Polyglot-5.8B
  • Architecture: Transformer-based decoder implemented from scratch in PyTorch

๋ฌธ์˜

If you have issues or questions, please open an issue in the model repository.
