Instructions to use froggeric/WestLake-10.7B-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use froggeric/WestLake-10.7B-v2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="froggeric/WestLake-10.7B-v2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("froggeric/WestLake-10.7B-v2")
model = AutoModelForCausalLM.from_pretrained("froggeric/WestLake-10.7B-v2")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use froggeric/WestLake-10.7B-v2 with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "froggeric/WestLake-10.7B-v2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "froggeric/WestLake-10.7B-v2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/froggeric/WestLake-10.7B-v2
```
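Since the server exposes an OpenAI-compatible `/v1/completions` endpoint, the curl call above can also be reproduced from Python with just the standard library. A minimal sketch (the URL, payload fields, and the `build_completion_request` helper name mirror the curl example above and are not an official client API):

```python
import json
from urllib import request

def build_completion_request(prompt: str, base_url: str = "http://localhost:8000") -> request.Request:
    """Build an OpenAI-compatible /v1/completions request (same payload as the curl example)."""
    payload = {
        "model": "froggeric/WestLake-10.7B-v2",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
    }
    return request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("Once upon a time,")
print(req.full_url)                    # → http://localhost:8000/v1/completions
print(json.loads(req.data)["prompt"])  # → Once upon a time,
```

To actually send the request, pass `req` to `urllib.request.urlopen` once the server is running; the same sketch works for any server exposing this API if you change `base_url` accordingly.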
- SGLang
How to use froggeric/WestLake-10.7B-v2 with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "froggeric/WestLake-10.7B-v2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "froggeric/WestLake-10.7B-v2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "froggeric/WestLake-10.7B-v2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "froggeric/WestLake-10.7B-v2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use froggeric/WestLake-10.7B-v2 with Docker Model Runner:
```shell
docker model run hf.co/froggeric/WestLake-10.7B-v2
```
Benchmark Evals?
Just discovered this model, and I agree its writing and reasoning depth seem greatly improved.
Are you going to submit this to the Hugging Face leaderboard? I'm interested in seeing its benchmarks.
Nice work!
I just tried to compare another 7B model (not this one) with its extended version (using the same config) on the Open LLM Leaderboard. Here is what I get:
| Metric | diff | Extended(10.7b) | Origin(7b) |
|---|---|---|---|
| Avg. | -3.76 | 69.75 | 73.51 |
| AI2 Reasoning Challenge (25-Shot) | -3.07 | 68.09 | 71.16 |
| HellaSwag (10-Shot) | -0.66 | 87.10 | 87.76 |
| MMLU (5-Shot) | -0.34 | 64.43 | 64.77 |
| TruthfulQA (0-shot) | -0.97 | 64.28 | 65.25 |
| Winogrande (5-shot) | -0.31 | 82.72 | 83.03 |
| GSM8k (5-shot) | -17.21 | 51.86 | 69.07 |
But the effect in chat seems good and stable. Thanks for this great config!
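The `diff` column in the table above is just the extended score minus the original score; a quick check of the reported numbers:

```python
# Reported (extended 10.7B, original 7B) scores from the table above.
scores = {
    "ARC":        (68.09, 71.16),
    "HellaSwag":  (87.10, 87.76),
    "MMLU":       (64.43, 64.77),
    "TruthfulQA": (64.28, 65.25),
    "Winogrande": (82.72, 83.03),
    "GSM8k":      (51.86, 69.07),
}

# Per-benchmark difference (extended minus original):
for name, (extended, origin) in scores.items():
    print(f"{name}: {extended - origin:+.2f}")

# Average of the extended scores:
avg = sum(extended for extended, _ in scores.values()) / len(scores)
print(f"Avg: {avg:.2f}")  # → Avg: 69.75
```

GSM8k accounts for nearly all of the average drop: the other five benchmarks each lose less than a point, while GSM8k loses over 17.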
@seyf1elislam Interesting, thanks for sharing. Nothing a little fine-tuning couldn't fix with potentially a higher ceiling on evals like MMLU.
@senseable Exactly: we have the potential to build some amazing larger models with the great Mistral-7B as a base. Your fine-tune is the perfect starting point. I think the process should go fine-tune > self-merge > fine-tune > self-merge > fine-tune > etc
After each self-merge, reapplying the original fine-tune should help realign the layers and remove the errors the self-merge introduces. It should also produce a new model that can itself be further self-merged. If you would like to try reapplying your WestLake fine-tune to this 10.7B self-merge, I would love to see how far we can push it. I expect the next good self-merge could yield a 16-20B model, and maybe it is possible to push all the way to 34B.
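For reference, a self-merge of the kind described above is typically expressed as a mergekit passthrough config that interleaves overlapping layer ranges of the base model. The sketch below is illustrative only (SOLAR-style ranges turning a 32-layer 7B model into 48 layers); the model id and layer ranges are assumptions, not the exact recipe used for WestLake-10.7B-v2:

```yaml
# Hypothetical mergekit passthrough self-merge (model id and layer ranges are illustrative).
slices:
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [0, 24]
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [8, 32]
merge_method: passthrough
dtype: float16
```

Running `mergekit-yaml` on a config like this produces the duplicated-layer model, which can then be fine-tuned again before the next self-merge step.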
Here is the HF Open LLM Leaderboard comparison:
| Metric | diff | WestLake-10.7B-v2 | WestLake-7B-v2 |
|---|---|---|---|
| Avg. | -5.14 | 70.28 | 75.42 |
| AI2 Reasoning Challenge (25-Shot) | -1.88 | 71.16 | 73.04 |
| HellaSwag (10-Shot) | -0.72 | 87.93 | 88.65 |
| MMLU (5-Shot) | -0.90 | 63.81 | 64.71 |
| TruthfulQA (0-shot) | -2.15 | 64.91 | 67.06 |
| Winogrande (5-shot) | -1.58 | 85.40 | 86.98 |
| GSM8k (5-shot) | -19.18 | 48.45 | 67.63 |