Upload folder using huggingface_hub

- .gitattributes +1 -0
- README.md +287 -0
- qwen3-vl-2b-thinking-abliterated.gguf +3 -0
- qwen3-vl-2b-thinking-abliterated.safetensors +3 -0
.gitattributes
CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+qwen3-vl-2b-thinking-abliterated.gguf filter=lfs diff=lfs merge=lfs -text
README.md
ADDED

@@ -0,0 +1,287 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- qwen3
- vision-language
- multimodal
- abliterated
- thinking
---

<!-- README Version: v1.0 -->

# Qwen3-VL-2B-Thinking (Abliterated)

A 2-billion-parameter vision-language model from the Qwen3-VL family, with refusal behavior removed via abliteration and a "thinking" mode for step-by-step reasoning. The model pairs visual understanding with text generation for multimodal analysis and creative applications.

## Model Description

**Qwen3-VL-2B-Thinking-Abliterated** is a modified version of Qwen3-VL-2B optimized for:

- **Vision-Language Understanding**: Processes images and generates contextual text responses
- **Multimodal Reasoning**: Analyzes visual content with detailed explanations
- **Unrestricted Generation**: Refusal behavior removed via abliteration
- **Thinking Mode**: Step-by-step reasoning and analysis
- **Efficient Inference**: 2B parameters, sized for consumer hardware

**Key Features**:
- Visual question answering (VQA)
- Image captioning and description
- Visual reasoning and analysis
- Multimodal conversation
- Creative image interpretation

## Repository Contents

```
qwen3-vl-2b-thinking/
├── qwen3-vl-2b-thinking-abliterated.safetensors   # PyTorch model (4.0GB)
└── qwen3-vl-2b-thinking-abliterated.gguf          # GGUF quantized (3.3GB)
```

**Total Repository Size**: ~7.3GB

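The files can be fetched directly from the Hub with `huggingface_hub`, the same library used to upload this folder. A minimal sketch; the repo id below is a placeholder, substitute this repository's actual id:

```python
from huggingface_hub import hf_hub_download

repo_id = "your-namespace/qwen3-vl-2b-thinking"  # placeholder repo id

# Each call returns the local cache path of the downloaded file.
gguf_path = hf_hub_download(repo_id=repo_id, filename="qwen3-vl-2b-thinking-abliterated.gguf")
safetensors_path = hf_hub_download(repo_id=repo_id, filename="qwen3-vl-2b-thinking-abliterated.safetensors")
print(gguf_path, safetensors_path, sep="\n")
```
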
### Model Formats

| File | Format | Size | Use Case |
|------|--------|------|----------|
| `qwen3-vl-2b-thinking-abliterated.safetensors` | SafeTensors | 4.0GB | Transformers, PyTorch |
| `qwen3-vl-2b-thinking-abliterated.gguf` | GGUF | 3.3GB | llama.cpp, Ollama |

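Downloads can be verified against the SHA-256 digests and byte sizes recorded in this commit's Git LFS pointer files. A minimal sketch (adjust the directory to wherever the files live locally):

```python
import hashlib
from pathlib import Path

# Expected digests/sizes, taken from the LFS pointers in this commit.
EXPECTED = {
    "qwen3-vl-2b-thinking-abliterated.gguf":
        ("7530922c695bb15091a39e690f1379461338b630cb271816aee051bed8bf13f3", 3447350144),
    "qwen3-vl-2b-thinking-abliterated.safetensors":
        ("293ac2e5469f4350e3f8b5484fbdb95d49e2be534cdc2408f976ac88332d29d6", 4255140312),
}

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a large file in 1 MiB chunks to avoid loading it into RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

model_dir = Path("E:/huggingface/qwen3-vl-2b-thinking")  # adjust as needed
for name, (digest, size) in EXPECTED.items():
    path = model_dir / name
    ok = path.stat().st_size == size and sha256_of(path) == digest
    print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```
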
## Hardware Requirements

### Minimum Requirements
- **VRAM**: 6GB (GGUF quantized inference)
- **RAM**: 8GB system memory
- **Disk Space**: 8GB available
- **GPU**: CUDA-compatible (RTX 2060+) or Apple Silicon

### Recommended Requirements
- **VRAM**: 8-12GB (SafeTensors, FP16)
- **RAM**: 16GB system memory
- **Disk Space**: 10GB available
- **GPU**: RTX 3060 Ti+ or Apple M1 Pro+

### Performance Estimates
- **GGUF on 8GB VRAM**: ~15-25 tokens/sec
- **SafeTensors on 12GB VRAM**: ~20-35 tokens/sec
- **CPU inference**: ~2-5 tokens/sec (not recommended)

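To see which tier your machine falls into before choosing a format, a quick check with standard PyTorch calls:

```python
import torch

# Report the available accelerator and its memory.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"CUDA GPU: {props.name}, {props.total_memory / 2**30:.1f} GiB VRAM")
elif torch.backends.mps.is_available():
    print("Apple Silicon (MPS) available; GPU memory is shared with system RAM")
else:
    print("No GPU detected; expect slow CPU-only inference")
```
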
## Usage Examples

### Using Transformers (SafeTensors)

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
import torch

# Load model and processor. AutoModelForImageTextToText resolves to the
# matching Qwen3-VL class (requires a transformers release with Qwen3-VL support).
model_path = "E:/huggingface/qwen3-vl-2b-thinking"
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)

# Load image
image = Image.open("image.jpg")

# Create conversation
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

# Process and generate
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (not the prompt)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
generated = outputs[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(generated, skip_special_tokens=True)[0]

print(response)
```

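For long outputs you can stream tokens to stdout as they are generated rather than waiting for the full decode. A minimal sketch using transformers' `TextStreamer`, continuing from the example above (it assumes the processor exposes its tokenizer as `processor.tokenizer`, as Qwen VL processors typically do):

```python
from transformers import TextStreamer

# Print decoded tokens as generation proceeds; skip echoing the prompt.
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=256, streamer=streamer)
```
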
### Using llama.cpp (GGUF)

```bash
# Build llama.cpp (CMake is the supported build system)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Run inference with an image. Vision models go through the multimodal CLI
# (llama-mtmd-cli) and also need the vision projector (mmproj) GGUF, which is
# not included in this repository -- obtain it separately.
./build/bin/llama-mtmd-cli \
    --model "E:/huggingface/qwen3-vl-2b-thinking/qwen3-vl-2b-thinking-abliterated.gguf" \
    --mmproj "mmproj.gguf" \
    --image "image.jpg" \
    -p "Describe this image:" \
    --n-gpu-layers 32 \
    --ctx-size 4096
```

### Using Ollama (GGUF)

```bash
# Create Modelfile
cat > Modelfile <<EOF
FROM E:/huggingface/qwen3-vl-2b-thinking/qwen3-vl-2b-thinking-abliterated.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create Ollama model
ollama create qwen3-vl-thinking -f Modelfile

# Run with a prompt; for multimodal models Ollama attaches image
# paths it finds in the prompt
ollama run qwen3-vl-thinking "Analyze this image: ./image.jpg"
```

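Ollama also serves a local REST API, with images passed base64-encoded. A minimal sketch against the documented `/api/generate` endpoint, assuming the default port 11434 and the model name created above:

```python
import base64
import json
import urllib.request

# Base64-encode the image as required by the Ollama API.
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "qwen3-vl-thinking",
    "prompt": "Describe this image.",
    "images": [image_b64],
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```
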
### Visual Question Answering

```python
# Detailed analysis with thinking mode
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Think step-by-step and explain what's happening in this image."}
        ]
    }
]

# The model will include detailed reasoning in its response
```

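Qwen3-style thinking models usually emit their reasoning inside `<think>...</think>` tags ahead of the final answer. Assuming this model follows that convention (verify against real output), a small helper can separate the two, reusing `response` from the Transformers example above:

```python
def split_thinking(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer), assuming Qwen-style <think> tags."""
    if "</think>" in response:
        reasoning, _, answer = response.partition("</think>")
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", response.strip()  # no tags found: treat everything as the answer

reasoning, answer = split_thinking(response)
print("Answer:", answer)
```
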
## Model Specifications

### Architecture
- **Model Type**: Vision-language transformer
- **Base Architecture**: Qwen3-VL
- **Parameters**: 2 billion
- **Modifications**: Abliteration (refusal removal), enhanced reasoning
- **Vision Encoder**: ViT-based image encoder
- **Text Decoder**: Qwen3 transformer decoder

### Technical Details
- **Precision**: FP16 (SafeTensors), quantized (GGUF)
- **Context Length**: 4096 tokens
- **Image Resolution**: 448x448 (default), up to 1024x1024
- **Vocabulary Size**: ~151,000 tokens
- **Training**: Multimodal pretraining + instruction tuning

### Supported Tasks
- Image captioning
- Visual question answering
- Scene understanding
- Object detection (descriptive)
- Visual reasoning
- Image-to-text generation
- Multimodal conversation

## Performance Tips

### Optimization Strategies

1. **VRAM Optimization**:
   ```python
   # 8-bit quantization via bitsandbytes (CUDA GPUs only)
   from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

   model = AutoModelForImageTextToText.from_pretrained(
       model_path,
       quantization_config=BitsAndBytesConfig(load_in_8bit=True),
       device_map="auto"
   )
   ```

2. **Image Preprocessing**:
   ```python
   # Downscale large images before encoding; the processor resizes too,
   # but smaller inputs reduce preprocessing time and memory
   from PIL import Image
   image = Image.open("large_image.jpg")
   image = image.resize((448, 448))
   ```

3. **Batch Processing**:
   ```python
   # Process multiple images in one padded batch
   images = [Image.open(f"image{i}.jpg") for i in range(4)]
   prompts = ["Describe this image." for _ in images]
   inputs = processor(images=images, text=prompts, padding=True, return_tensors="pt")
   ```

4. **GGUF Performance**:
   - Use `--n-gpu-layers 32` to offload layers to the GPU
   - Adjust `--ctx-size` to fit available VRAM
   - Use `--threads` to tune CPU-side work

### Generation Parameters

```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
    "repetition_penalty": 1.1
}

outputs = model.generate(**inputs, **generation_config)
```

## Abliteration Notice

This model has been **abliterated**: its built-in refusal behavior has been removed to allow:
- Unrestricted creative content generation
- Research and experimentation
- Artistic applications without content restrictions

**Important**: Users are responsible for ethical use and compliance with local laws. This model may generate unrestricted content.

## License

Licensed under **Apache 2.0**. Free for commercial and research use with attribution.

Key provisions:
- ✅ Commercial use permitted
- ✅ Modification and distribution allowed
- ✅ Private use permitted
- ⚠️ Provide attribution and license notice
- ⚠️ State changes if modified

Full license: [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Citation

```bibtex
@misc{qwen3vl2b-thinking-abliterated,
  title={Qwen3-VL-2B-Thinking-Abliterated},
  author={Qwen Team and Community Contributors},
  year={2025},
  howpublished={\url{https://huggingface.co/Qwen}},
  note={Abliterated vision-language model with enhanced reasoning}
}
```

## Official Resources

- **Qwen Official**: https://github.com/QwenLM/Qwen
- **Transformers Docs**: https://huggingface.co/docs/transformers
- **llama.cpp**: https://github.com/ggerganov/llama.cpp
- **Model Family**: https://huggingface.co/Qwen

## Acknowledgments

- **Qwen Team**: Original Qwen3-VL architecture and pretraining
- **Community**: Abliteration techniques and reasoning enhancements
- **Hugging Face**: Model hosting and transformers library

qwen3-vl-2b-thinking-abliterated.gguf
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7530922c695bb15091a39e690f1379461338b630cb271816aee051bed8bf13f3
size 3447350144

qwen3-vl-2b-thinking-abliterated.safetensors
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:293ac2e5469f4350e3f8b5484fbdb95d49e2be534cdc2408f976ac88332d29d6
size 4255140312