This is a decensored version of deepseek-ai/deepseek-coder-33b-instruct, made using Heretic v1.0.1
Abliteration parameters
| Parameter | Value |
|---|---|
| direction_index | 25.61 |
| attn.o_proj.max_weight | 1.49 |
| attn.o_proj.max_weight_position | 37.32 |
| attn.o_proj.min_weight | 1.45 |
| attn.o_proj.min_weight_distance | 31.10 |
| mlp.down_proj.max_weight | 0.81 |
| mlp.down_proj.max_weight_position | 56.20 |
| mlp.down_proj.min_weight | 0.62 |
| mlp.down_proj.min_weight_distance | 5.57 |
Performance
| Metric | This model | Original model (deepseek-ai/deepseek-coder-33b-instruct) |
|---|---|---|
| KL divergence | 0.02 | 0 (by definition) |
| Refusals | 70/100 | 97/100 |
[🏠Homepage] | [🤖 Chat with DeepSeek Coder] | [Discord] | [Wechat(微信)]
1. Introduction of Deepseek Coder
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
Massive Training Data: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
Highly Flexible & Scalable: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
Superior Model Performance: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
Advanced Code Completion Capabilities: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.
2. Model Summary
deepseek-coder-33b-instruct is a 33B parameter model initialized from deepseek-coder-33b-base and fine-tuned on 2B tokens of instruction data.
- Home Page: DeepSeek
- Repository: deepseek-ai/deepseek-coder
- Chat With DeepSeek Coder: DeepSeek-Coder
3. How to Use
Here give some examples of how to use our model.
Chat Model Inference
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
messages=[
{ 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# tokenizer.eos_token_id is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
4. License
This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.
See the LICENSE-MODEL for more details.
5. Contact
If you have any questions, please raise an issue or contact us at [email protected].
Important Disclaimer
This model has been modified to remove safety guardrails and refusal behaviors.
Intended Use
- Research and educational purposes
- Understanding model behavior and limitations
- Creative writing and roleplay with consenting adults
- Red-teaming and safety research
Not Intended For
- Generating harmful, illegal, or unethical content
- Harassment, abuse, or malicious activities
- Misinformation or deception
- Any use that violates applicable laws
User Responsibility
By using this model, you acknowledge that:
- You are solely responsible for how you use this model and any content it generates
- The model creator accepts no liability for misuse or harmful outputs
- You will comply with all applicable laws and ethical guidelines
- You understand this model may produce inaccurate, biased, or inappropriate content
Technical Note
This model was created using abliteration techniques that suppress the "refusal direction" in the model's activation space. This does not add new capabilities—it only removes trained refusal behaviors from the base model.
Use responsibly. You have been warned.
- Downloads last month
- 15