Instructions to use froggeric/WestLake-10.7B-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use froggeric/WestLake-10.7B-v2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="froggeric/WestLake-10.7B-v2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("froggeric/WestLake-10.7B-v2")
model = AutoModelForCausalLM.from_pretrained("froggeric/WestLake-10.7B-v2")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use froggeric/WestLake-10.7B-v2 with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "froggeric/WestLake-10.7B-v2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "froggeric/WestLake-10.7B-v2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/froggeric/WestLake-10.7B-v2
```
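Since the server exposes an OpenAI-compatible `/v1/completions` endpoint, the curl call above can also be reproduced from Python with just the standard library. A minimal sketch (the URL, payload fields, and the `build_completion_request` helper name mirror the curl example above and are not an official client API):

```python
import json
from urllib import request

def build_completion_request(prompt: str, base_url: str = "http://localhost:8000") -> request.Request:
    """Build an OpenAI-compatible /v1/completions request (same payload as the curl example)."""
    payload = {
        "model": "froggeric/WestLake-10.7B-v2",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
    }
    return request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("Once upon a time,")
print(req.full_url)                    # → http://localhost:8000/v1/completions
print(json.loads(req.data)["prompt"])  # → Once upon a time,
```

To actually send the request, pass `req` to `urllib.request.urlopen` once the server is running; the same sketch works for any server exposing this API if you change `base_url` accordingly.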
- SGLang
How to use froggeric/WestLake-10.7B-v2 with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "froggeric/WestLake-10.7B-v2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "froggeric/WestLake-10.7B-v2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "froggeric/WestLake-10.7B-v2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "froggeric/WestLake-10.7B-v2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use froggeric/WestLake-10.7B-v2 with Docker Model Runner:
```shell
docker model run hf.co/froggeric/WestLake-10.7B-v2
```
Benchmark Evals?
Just discovered this model, and I agree its writing and reasoning depth seem greatly improved.
Are you going to submit this to the Hugging Face leaderboard? I'm interested in seeing its benchmarks.
Nice work!
I just tried to compare another 7B model (not this one) with its extended version (using the same config) on the Open LLM Leaderboard. Here is what I get:
| Metric | diff | Extended(10.7b) | Origin(7b) |
|---|---|---|---|
| Avg. | -3.76 | 69.75 | 73.51 |
| AI2 Reasoning Challenge (25-Shot) | -3.07 | 68.09 | 71.16 |
| HellaSwag (10-Shot) | -0.66 | 87.10 | 87.76 |
| MMLU (5-Shot) | -0.34 | 64.43 | 64.77 |
| TruthfulQA (0-shot) | -0.97 | 64.28 | 65.25 |
| Winogrande (5-shot) | -0.31 | 82.72 | 83.03 |
| GSM8k (5-shot) | -17.21 | 51.86 | 69.07 |
But the effect in chat seems good and stable. Thanks for this great config!
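The `diff` column in the table above is just the extended score minus the original score; a quick check of the reported numbers:

```python
# Reported (extended 10.7B, original 7B) scores from the table above.
scores = {
    "ARC":        (68.09, 71.16),
    "HellaSwag":  (87.10, 87.76),
    "MMLU":       (64.43, 64.77),
    "TruthfulQA": (64.28, 65.25),
    "Winogrande": (82.72, 83.03),
    "GSM8k":      (51.86, 69.07),
}

# Per-benchmark difference (extended minus original):
for name, (extended, origin) in scores.items():
    print(f"{name}: {extended - origin:+.2f}")

# Average of the extended scores:
avg = sum(extended for extended, _ in scores.values()) / len(scores)
print(f"Avg: {avg:.2f}")  # → Avg: 69.75
```

GSM8k accounts for nearly all of the average drop: the other five benchmarks each lose less than a point, while GSM8k loses over 17.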
@seyf1elislam Interesting, thanks for sharing. Nothing a little fine-tuning couldn't fix with potentially a higher ceiling on evals like MMLU.
@senseable Exactly: we have the potential to build some amazing larger models with the great Mistral-7B as a base. Your fine-tune is the perfect starting point. I think the process should go fine-tune > self-merge > fine-tune > self-merge > fine-tune > etc
After each self-merge, reapplying the original fine-tune should help realign the layers and remove the errors the self-merge introduces. It should also produce a new model that can itself be further self-merged. If you would like to try reapplying your WestLake fine-tune to this 10.7B self-merge, I would love to see how far we can push it. I expect the next good self-merge could yield a 16-20B model, and maybe it is possible to push all the way to 34B.
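For reference, a self-merge of the kind described above is typically expressed as a mergekit passthrough config that interleaves overlapping layer ranges of the base model. The sketch below is illustrative only (SOLAR-style ranges turning a 32-layer 7B model into 48 layers); the model id and layer ranges are assumptions, not the exact recipe used for WestLake-10.7B-v2:

```yaml
# Hypothetical mergekit passthrough self-merge (model id and layer ranges are illustrative).
slices:
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [0, 24]
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [8, 32]
merge_method: passthrough
dtype: float16
```

Running `mergekit-yaml` on a config like this produces the duplicated-layer model, which can then be fine-tuned again before the next self-merge step.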
Here is the HF Open LLM Leaderboard comparison:
| Metric | diff | WestLake-10.7B-v2 | WestLake-7B-v2 |
|---|---|---|---|
| Avg. | -5.14 | 70.28 | 75.42 |
| AI2 Reasoning Challenge (25-Shot) | -1.88 | 71.16 | 73.04 |
| HellaSwag (10-Shot) | -0.72 | 87.93 | 88.65 |
| MMLU (5-Shot) | -0.90 | 63.81 | 64.71 |
| TruthfulQA (0-shot) | -2.15 | 64.91 | 67.06 |
| Winogrande (5-shot) | -1.58 | 85.40 | 86.98 |
| GSM8k (5-shot) | -19.18 | 48.45 | 67.63 |