wangkanai committed
Commit 8f516e6 · verified · 1 Parent(s): d25a77e

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ qwen3-vl-2b-thinking-abliterated.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,287 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - qwen3
+ - vision-language
+ - multimodal
+ - abliterated
+ - thinking
+ - image-generation
+ ---
+
+ <!-- README Version: v1.0 -->
+
+ # Qwen3-VL-2B-Thinking (Abliterated)
+
+ A 2-billion-parameter vision-language model from the Qwen3-VL family, with its safety alignment removed ("abliterated") for unrestricted generation and with enhanced reasoning capabilities. The model combines visual understanding with text generation, enabling multimodal analysis and creative applications.
+
+ ## Model Description
+
+ **Qwen3-VL-2B-Thinking-Abliterated** is a modified version of Qwen3-VL-2B optimized for:
+
+ - **Vision-Language Understanding**: Process images and generate contextual text responses
+ - **Multimodal Reasoning**: Analyze visual content with detailed explanations
+ - **Unrestricted Generation**: Abliterated safety layers for creative freedom
+ - **Thinking Mode**: Enhanced reasoning and step-by-step analysis capabilities
+ - **Efficient Inference**: 2B parameters for consumer hardware deployment
+
+ **Key Features**:
+ - Visual question answering (VQA)
+ - Image captioning and description
+ - Visual reasoning and analysis
+ - Multimodal conversation
+ - Creative image interpretation
+
+ ## Repository Contents
+
+ ```
+ qwen3-vl-2b-thinking/
+ ├── qwen3-vl-2b-thinking-abliterated.safetensors # PyTorch model (4.0GB)
+ └── qwen3-vl-2b-thinking-abliterated.gguf # GGUF quantized (3.3GB)
+ ```
+
+ **Total Repository Size**: ~7.3GB
+
+ ### Model Formats
+
+ | File | Format | Size | Use Case |
+ |------|--------|------|----------|
+ | `qwen3-vl-2b-thinking-abliterated.safetensors` | SafeTensors | 4.0GB | Transformers, PyTorch |
+ | `qwen3-vl-2b-thinking-abliterated.gguf` | GGUF | 3.3GB | llama.cpp, Ollama |
+
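+ Since this folder was uploaded with `huggingface_hub` (see the commit message), the same library can also download it. A minimal sketch, assuming `huggingface_hub` is installed and using a placeholder repository id:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download the repository files (placeholder repo id -- replace with the actual one on the Hub)
+ local_dir = snapshot_download(
+     repo_id="wangkanai/qwen3-vl-2b-thinking",    # hypothetical repo id
+     local_dir="qwen3-vl-2b-thinking",            # where to place the files
+     allow_patterns=["*.gguf", "*.safetensors"],  # fetch only the model weights
+ )
+ print(local_dir)
+ ```
+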
+ ## Hardware Requirements
+
+ ### Minimum Requirements
+ - **VRAM**: 6GB (GGUF quantized inference)
+ - **RAM**: 8GB system memory
+ - **Disk Space**: 8GB available
+ - **GPU**: CUDA-compatible (RTX 2060+) or Apple Silicon
+
+ ### Recommended Requirements
+ - **VRAM**: 8-12GB (SafeTensors full precision)
+ - **RAM**: 16GB system memory
+ - **Disk Space**: 10GB available
+ - **GPU**: RTX 3060 Ti+ or Apple M1 Pro+
+
+ ### Performance Estimates
+ - **GGUF on 8GB VRAM**: ~15-25 tokens/sec
+ - **SafeTensors on 12GB VRAM**: ~20-35 tokens/sec
+ - **CPU inference**: ~2-5 tokens/sec (not recommended)
+
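+ To pick between the two formats, it helps to check how much VRAM PyTorch actually sees on your machine. A small sketch, assuming a CUDA-enabled PyTorch build:
+
+ ```python
+ import torch
+
+ # Print the total VRAM of every visible CUDA device
+ if torch.cuda.is_available():
+     for i in range(torch.cuda.device_count()):
+         props = torch.cuda.get_device_properties(i)
+         print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
+ else:
+     print("No CUDA device found; expect CPU-only speeds (~2-5 tokens/sec)")
+ ```
+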
+ ## Usage Examples
+
+ ### Using Transformers (SafeTensors)
+
+ ```python
+ from transformers import AutoModelForImageTextToText, AutoProcessor
+ from PIL import Image
+ import torch
+
+ # Load model and processor (the Auto class resolves the matching Qwen-VL architecture)
+ model_path = "E:/huggingface/qwen3-vl-2b-thinking"  # local path to the downloaded repository
+ model = AutoModelForImageTextToText.from_pretrained(
+     model_path,
+     torch_dtype=torch.float16,
+     device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained(model_path)
+
+ # Load image
+ image = Image.open("image.jpg")
+
+ # Create conversation
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": "Describe this image in detail."}
+         ]
+     }
+ ]
+
+ # Process and generate
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")
+
+ # Generate response
+ with torch.no_grad():
+     outputs = model.generate(**inputs, max_new_tokens=256)
+ response = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+
+ print(response)
+ ```
+
+ ### Using llama.cpp (GGUF)
+
+ ```bash
+ # Download and build llama.cpp (newer releases build with CMake rather than make)
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ make
+
+ # Run inference with an image.
+ # Note: the exact binary and flags depend on the build; recent versions route
+ # image input through the multimodal tools and need a separate mmproj file.
+ ./llama-cli \
+   --model "E:/huggingface/qwen3-vl-2b-thinking/qwen3-vl-2b-thinking-abliterated.gguf" \
+   --image "image.jpg" \
+   --prompt "Describe this image:" \
+   --n-gpu-layers 32 \
+   --ctx-size 4096
+ ```
+
+ ### Using Ollama (GGUF)
+
+ ```bash
+ # Create Modelfile
+ cat > Modelfile <<EOF
+ FROM E:/huggingface/qwen3-vl-2b-thinking/qwen3-vl-2b-thinking-abliterated.gguf
+ PARAMETER temperature 0.7
+ PARAMETER top_p 0.9
+ EOF
+
+ # Create Ollama model
+ ollama create qwen3-vl-thinking -f Modelfile
+
+ # Run interactive session
+ ollama run qwen3-vl-thinking "Analyze this image: image.jpg"
+ ```
+
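+ Once the model has been created, Ollama also exposes it over its local REST API (default `http://localhost:11434`). A sketch using the `requests` package, assuming the standard `/api/generate` endpoint and that images are passed as base64 strings:
+
+ ```python
+ import base64
+ import requests
+
+ # Ollama's API expects images as base64-encoded strings
+ with open("image.jpg", "rb") as f:
+     image_b64 = base64.b64encode(f.read()).decode("utf-8")
+
+ resp = requests.post(
+     "http://localhost:11434/api/generate",
+     json={
+         "model": "qwen3-vl-thinking",   # the model created above
+         "prompt": "Describe this image in detail.",
+         "images": [image_b64],
+         "stream": False,                # return one JSON object instead of a stream
+     },
+     timeout=300,
+ )
+ print(resp.json()["response"])
+ ```
+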
+ ### Visual Question Answering
+
+ ```python
+ # Detailed analysis with thinking mode
+ # (reuses the model, processor, and image loaded in the Transformers example above)
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": "Think step-by-step and explain what's happening in this image."}
+         ]
+     }
+ ]
+
+ # The model will include detailed reasoning in its response
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
+ ```
+
+ ## Model Specifications
+
+ ### Architecture
+ - **Model Type**: Vision-Language Transformer
+ - **Base Architecture**: Qwen3-VL
+ - **Parameters**: 2 billion
+ - **Modifications**: Abliterated safety layers, enhanced reasoning
+ - **Vision Encoder**: ViT-based image encoder
+ - **Text Decoder**: Qwen3 transformer decoder
+
+ ### Technical Details
+ - **Precision**: FP16 (SafeTensors), Quantized (GGUF)
+ - **Context Length**: 4096 tokens
+ - **Image Resolution**: 448x448 (default), up to 1024x1024
+ - **Vocabulary Size**: ~151,000 tokens
+ - **Training**: Multimodal pretraining + instruction tuning
+
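+ The default image resolution listed above can usually be adjusted through the processor. A hedged sketch, assuming the `min_pixels`/`max_pixels` arguments of the Qwen2-VL processor also apply to this checkpoint:
+
+ ```python
+ from transformers import AutoProcessor
+
+ model_path = "E:/huggingface/qwen3-vl-2b-thinking"  # same local path as in the usage examples
+
+ # Bound the image area fed to the vision encoder
+ # (behaviour assumed from Qwen2-VL; verify it holds for Qwen3-VL)
+ processor = AutoProcessor.from_pretrained(
+     model_path,
+     min_pixels=256 * 28 * 28,   # lower bound on image area in pixels
+     max_pixels=1024 * 28 * 28,  # upper bound keeps VRAM usage predictable
+ )
+ ```
+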
+ ### Supported Tasks
+ - Image captioning
+ - Visual question answering
+ - Scene understanding
+ - Object detection (descriptive)
+ - Visual reasoning
+ - Image-to-text generation
+ - Multimodal conversation
+
+ ## Performance Tips
+
+ ### Optimization Strategies
+
+ 1. **VRAM Optimization**:
+ ```python
+ # Use 8-bit quantization (requires the bitsandbytes package)
+ from transformers import BitsAndBytesConfig
+
+ model = AutoModelForImageTextToText.from_pretrained(
+     model_path,
+     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
+     device_map="auto"
+ )
+ ```
+
+ 2. **Image Preprocessing**:
+ ```python
+ # Resize large images
+ from PIL import Image
+ image = Image.open("large_image.jpg")
+ image = image.resize((448, 448))
+ ```
+
+ 3. **Batch Processing**:
+ ```python
+ # Process multiple images efficiently
+ images = [Image.open(f"image{i}.jpg") for i in range(4)]
+ prompts = ["Describe this image." for _ in images]
+ inputs = processor(images=images, text=prompts, return_tensors="pt", padding=True)
+ ```
+
+ 4. **GGUF Performance**:
+    - Use `--n-gpu-layers 32` for GPU acceleration
+    - Adjust `--ctx-size` based on available VRAM
+    - Use `--threads` for CPU optimization
+
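+ The same knobs are available from Python through the `llama-cpp-python` bindings. A sketch, assuming `llama-cpp-python` is installed; note that image input additionally requires a multimodal projector/chat handler, so this only shows the text-side settings:
+
+ ```python
+ from llama_cpp import Llama
+
+ # CLI flags map onto constructor arguments:
+ #   --n-gpu-layers -> n_gpu_layers, --ctx-size -> n_ctx, --threads -> n_threads
+ llm = Llama(
+     model_path="E:/huggingface/qwen3-vl-2b-thinking/qwen3-vl-2b-thinking-abliterated.gguf",
+     n_gpu_layers=32,  # layers offloaded to the GPU
+     n_ctx=4096,       # context window
+     n_threads=8,      # CPU threads for non-offloaded work
+ )
+ out = llm("Describe the scene in one sentence:", max_tokens=128)
+ print(out["choices"][0]["text"])
+ ```
+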
+ ### Generation Parameters
+
+ ```python
+ generation_config = {
+     "max_new_tokens": 256,
+     "temperature": 0.7,
+     "top_p": 0.9,
+     "do_sample": True,
+     "repetition_penalty": 1.1
+ }
+
+ outputs = model.generate(**inputs, **generation_config)
+ ```
+
+ ## Abliteration Notice
+
+ This model has been **abliterated** (safety filters removed) for:
+ - Unrestricted creative content generation
+ - Research and experimentation
+ - Artistic applications without content restrictions
+
+ **Important**: Users are responsible for ethical use and compliance with local laws. This model may generate unrestricted content.
+
+ ## License
+
+ Licensed under **Apache 2.0**. Free for commercial and research use with attribution.
+
+ Key provisions:
+ - ✅ Commercial use permitted
+ - ✅ Modification and distribution allowed
+ - ✅ Private use permitted
+ - ⚠️ Provide attribution and license notice
+ - ⚠️ State changes if modified
+
+ Full license: [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## Citation
+
+ ```bibtex
+ @misc{qwen3vl2b-thinking-abliterated,
+   title={Qwen3-VL-2B-Thinking-Abliterated},
+   author={Qwen Team and Community Contributors},
+   year={2025},
+   howpublished={\url{https://huggingface.co/Qwen}},
+   note={Abliterated vision-language model with enhanced reasoning}
+ }
+ ```
+
+ ## Official Resources
+
+ - **Qwen Official**: https://github.com/QwenLM/Qwen
+ - **Transformers Docs**: https://huggingface.co/docs/transformers
+ - **llama.cpp**: https://github.com/ggerganov/llama.cpp
+ - **Model Family**: https://huggingface.co/Qwen
+
+ ## Acknowledgments
+
+ - **Qwen Team**: Original Qwen3-VL architecture and pretraining
+ - **Community**: Abliteration techniques and reasoning enhancements
+ - **Hugging Face**: Model hosting and transformers library
qwen3-vl-2b-thinking-abliterated.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7530922c695bb15091a39e690f1379461338b630cb271816aee051bed8bf13f3
+ size 3447350144
qwen3-vl-2b-thinking-abliterated.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:293ac2e5469f4350e3f8b5484fbdb95d49e2be534cdc2408f976ac88332d29d6
+ size 4255140312