---
language:
- en
license: mpl-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- lightning
- hermes-3
- utility
- on-device
- text-generation
- finetune
datasets:
- NousResearch/Hermes-3-Dataset
pipeline_tag: text-generation
inference: true
model_creator: TitleOS
---

# ⚡ Lightning-1.7B

<div align="center">
<img src="https://img.shields.io/badge/Model-Lightning--1.7B-blue?style=for-the-badge&logo=huggingface" alt="Model Name">
<img src="https://img.shields.io/badge/Base-Qwen3--1.7B-orange?style=for-the-badge" alt="Base Model">
<img src="https://img.shields.io/badge/License-MPL_2.0-brightgreen?style=for-the-badge" alt="License">
</div>

<br>

**Lightning-1.7B** is a high-efficiency utility model designed for edge computing and low-latency workflows. Fine-tuned from the **Qwen3-1.7B** base on the **NousResearch Hermes-3 dataset**, Lightning serves as a bridge between raw analytic logic and creative inference.

While it offers improved logic, Q&A, and coding over its base, its true strength lies in its **enhanced creativity** and **utility functions**. It is engineered to be the perfect "sidecar" model: small enough to run on-device with minimal memory impact, yet smart enough to handle complex metadata generation tasks.

## 🚀 Key Features

* **Ultra-Lightweight:** At 1.7B parameters, it runs efficiently on consumer hardware, laptops, and even mobile devices with minimal VRAM usage.
* **Hermes-Powered Creativity:** Leveraging the Hermes-3 dataset, Lightning moves beyond robotic responses, offering nuanced understanding for tasks that require a "human touch," such as summarizing tone or generating creative search queries.
* **Utility Specialist:** Specifically optimized for background tasks like tagging, title generation, and creating search queries from conversation context.
* **Low Latency:** Designed for speed, making it ideal for real-time applications where response time is critical.

## 🎯 Use Cases

Lightning-1.7B is best utilized not as a general chatbot, but as a specialized **Analytic & Utility Engine**:

1. **Conversation Auto-Titling:** Accurately summarizing long context windows into punchy, relevant titles.
2. **Search Query Generation:** Converting user intent or conversation history into optimized search engine queries.
3. **Onboard Tagging:** Analyzing text streams to apply metadata tags (e.g., sentiment, topic, urgency) locally without API calls.
4. **JSON Formatting:** Extracting structured data from unstructured text with higher reliability than standard small models.

## 💻 Quickstart

You can run Lightning-1.7B using the `transformers` library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TitleOS/Lightning-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example: Generating a search query from a user thought
prompt = """<|im_start|>system
You are a utility AI. Generate a specific Google search query based on the user's confused thought.<|im_end|>
<|im_start|>user
I remember there was this movie about a guy who lives in a computer but doesn't know it, and takes a red pill?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.3,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: "movie guy lives in computer takes red pill matrix plot"
```
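
If you'd rather not hand-write ChatML, the tokenizer's chat template can build the prompt for you. Below is a minimal sketch of the onboard-tagging/JSON use case; the system prompt and tag schema are illustrative assumptions, not part of the model's training setup.

```python
# Minimal sketch: letting the tokenizer's chat template render ChatML for a
# tagging task. The system prompt and tag schema here are illustrative only.
messages = [
    {
        "role": "system",
        "content": "You are a utility AI. Reply with a JSON object containing "
                   "'sentiment' and 'topic' tags for the user's text.",
    },
    {
        "role": "user",
        "content": "The new firmware update bricked my router and support won't answer.",
    },
]

# Renders the messages with the model's template and appends the assistant
# prefix so generation starts in the right place.
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

If the fine-tune inherits Qwen3's stock chat template, `apply_chat_template` may also accept `enable_thinking=False` to suppress reasoning traces; check the tokenizer config before relying on it.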

Merged FP16 and Quantizations:

* **FP16:** https://huggingface.co/TitleOS/Lightning-1.7B
* **Q4_K_M:** https://huggingface.co/TitleOS/Lightning-1.7B-Q4_K_M-GGUF
* **Q8_0:** https://huggingface.co/TitleOS/Lightning-1.7B-Q8_0-GGUF
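
The GGUF builds run under llama.cpp and its bindings. Here is a minimal sketch using `llama-cpp-python` for the auto-titling use case; the filename glob is an assumption about the repo's contents, so adjust it to the actual GGUF file name.

```python
# Minimal sketch: running the Q4_K_M GGUF build with llama-cpp-python.
# The filename glob is an assumption; point it at the actual file in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TitleOS/Lightning-1.7B-Q4_K_M-GGUF",
    filename="*Q4_K_M.gguf",  # downloads the first matching GGUF from the Hub
    n_ctx=4096,               # room for a long conversation to summarize
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Generate a short, punchy title for the conversation below."},
        {"role": "user", "content": "We spent an hour debugging a flaky CI pipeline caused by a stale cache."},
    ],
    max_tokens=32,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])
```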

## 📊 Performance & Benchmarks

Lightning-1.7B punches above its weight class. By sacrificing some of the breadth of general world knowledge found in larger models, it concentrates its capacity on instruction following and creative interpretation.

* **Logic & Coding:** Slight improvement over the base Qwen3-1.7B.
* **Creativity & Nuance:** Significant improvement due to Hermes-3 fine-tuning.
* **Memory Footprint:** ~3.5 GB VRAM in FP16, under 2 GB in 4-bit/8-bit quantization.
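
To reach the sub-2 GB figure with the `transformers` weights, the model can be loaded in 4-bit. A minimal sketch, assuming a CUDA device and the `bitsandbytes` package are available:

```python
# Minimal sketch: 4-bit loading via bitsandbytes to shrink the memory footprint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "TitleOS/Lightning-1.7B",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("TitleOS/Lightning-1.7B")
```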

## 🔧 Training Details

* **Base Model:** Qwen3-1.7B
* **Dataset:** NousResearch/Hermes-3-Dataset
* **Fine-tuning Approach:** LoRA (alpha 32, r 16), focused on preserving the base model's speed while injecting the "Hermes" personality and instruction-following capabilities.
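
For reference, those hyperparameters map onto a PEFT `LoraConfig` roughly as sketched below; the target modules are a common choice for Qwen-style attention blocks, not a confirmed detail of this training run.

```python
# Rough sketch of the stated LoRA hyperparameters as a PEFT config.
# target_modules is an assumption (typical for Qwen-style attention blocks),
# not a confirmed detail of this model's training run.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```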

## ⚠️ Limitations

* **Knowledge Cutoff:** As a small model, Lightning does not possess vast encyclopedic knowledge. It is best used for processing the text given to it in the context window rather than retrieving facts.
* **Complex Reasoning:** While logic is improved, multi-step mathematical reasoning or complex coding challenges should be offloaded to larger models (7B+).

## 📜 License

This model is released under the Mozilla Public License 2.0 (MPL-2.0).

Created by TitleOS.