umd-zhou-lab/claude2_alpaca
How to use umd-zhou-lab/claude2-alpaca-13B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="umd-zhou-lab/claude2-alpaca-13B")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("umd-zhou-lab/claude2-alpaca-13B")
model = AutoModelForCausalLM.from_pretrained("umd-zhou-lab/claude2-alpaca-13B")
How to use umd-zhou-lab/claude2-alpaca-13B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "umd-zhou-lab/claude2-alpaca-13B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "umd-zhou-lab/claude2-alpaca-13B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or run the model with Docker:
docker model run hf.co/umd-zhou-lab/claude2-alpaca-13B
How to use umd-zhou-lab/claude2-alpaca-13B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "umd-zhou-lab/claude2-alpaca-13B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "umd-zhou-lab/claude2-alpaca-13B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "umd-zhou-lab/claude2-alpaca-13B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "umd-zhou-lab/claude2-alpaca-13B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use umd-zhou-lab/claude2-alpaca-13B with Docker Model Runner:
docker model run hf.co/umd-zhou-lab/claude2-alpaca-13B
This model was trained by fine-tuning Llama 2 on the Claude2-Alpaca instruction data.
The primary use of this model is research on large language models and chatbots. The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.
We use the prompt template from Stanford Alpaca.
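For reference, a minimal sketch of building that prompt in Python. The two template strings follow the Stanford Alpaca repository; the helper name `build_prompt` and the sample instruction are illustrative assumptions, not part of this model's code.

```python
# Stanford Alpaca prompt templates (with and without an input field).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Hypothetical helper: fill the Alpaca template for one example."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


# Example: an instruction with additional context.
prompt = build_prompt("Summarize the text.", "Llamas are domesticated camelids.")
print(prompt)
```

The resulting string can be passed as the prompt to whichever inference path is used (the Transformers pipeline or the `prompt` field of the completions API).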
| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
|---|---|---|---|---|---|
| Model (13B) | 128 | 1e-5 | 5 | 2048 | 0 |
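The global batch size in the table is the product of the per-device batch size, the number of devices, and the gradient accumulation steps. The card does not say how the 128 was split, so the values below (8 devices, per-device batch of 4) are purely illustrative assumptions, and `grad_accum_steps` is a hypothetical helper.

```python
def grad_accum_steps(global_batch: int, per_device_batch: int, num_devices: int) -> int:
    """Return the accumulation steps needed to reach the global batch size."""
    per_step = per_device_batch * num_devices
    if global_batch % per_step != 0:
        raise ValueError("global batch must be divisible by per_device_batch * num_devices")
    return global_batch // per_step


GLOBAL_BATCH_SIZE = 128  # from the hyperparameter table

# Assumed hardware split (not stated in the model card).
steps = grad_accum_steps(GLOBAL_BATCH_SIZE, per_device_batch=4, num_devices=8)
print(steps)  # 4
```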
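Compared to Llama-2-chat, our models score higher on the average of ARC, HellaSwag, MMLU, and TruthfulQA at both sizes (57.78 vs. 56.335 at 7B; 61.29 vs. 59.935 at 13B), while scoring slightly lower on Alpaca_Eval.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Alpaca_Eval | Avg Length |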
|---|---|---|---|---|---|---|---|
| Llama-2-7b-chat | 56.335 | 52.9 | 78.55 | 48.32 | 45.57 | 71.37 | 1479 |
| Llama-2-13b-chat | 59.935 | 59.04 | 81.94 | 54.64 | 44.12 | 81.09 | 1513 |
| claude_alpaca-7b | 57.78 | 56.66 | 81.17 | 46.58 | 46.71 | 71.23 | 1066 |
| claude_alpaca-13b | 61.29 | 61.18 | 84.08 | 55.74 | 44.18 | 78.93 | 1127 |
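The reported average appears to be the plain mean of the ARC, HellaSwag, MMLU, and TruthfulQA scores, with Alpaca_Eval excluded. This is inferred from the numbers rather than stated in the card. A short check, using only values copied from the table:

```python
# (ARC, HellaSwag, MMLU, TruthfulQA) scores and the reported average per model.
scores = {
    "Llama-2-7b-chat": ([52.9, 78.55, 48.32, 45.57], 56.335),
    "Llama-2-13b-chat": ([59.04, 81.94, 54.64, 44.12], 59.935),
    "claude_alpaca-7b": ([56.66, 81.17, 46.58, 46.71], 57.78),
    "claude_alpaca-13b": ([61.18, 84.08, 55.74, 44.18], 61.29),
}


def benchmark_average(values):
    """Arithmetic mean of the benchmark scores."""
    return sum(values) / len(values)


for name, (values, reported) in scores.items():
    computed = benchmark_average(values)
    print(f"{name}: computed {computed:.3f}, reported {reported}")
```

Each computed mean agrees with the reported value to within 0.01.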
Please consider citing our work if you find our code, data, or models useful. Thank you!
@misc{claude2-alpaca,
author = {Lichang Chen and Khalid Saifullah and Ming Li and Tianyi Zhou and Heng Huang},
title = {Claude2-Alpaca: Instruction tuning datasets distilled from claude},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Lichang-Chen/claude2-alpaca}},
}