Ellbendls committed
Commit 3ed7704 · verified · 1 Parent(s): 956bd39

Update README.md

Files changed (1)
  1. README.md +144 -7

README.md CHANGED
@@ -1,11 +1,148 @@
  # Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF

- Derived GGUF exports of `Ellbendls/Qwen-3-4b-Text_to_SQL`

- Files:
- - Base: `Qwen-3-4b-Text_to_SQL-F16.gguf`
- - Quants: Qwen-3-4b-Text_to_SQL-q6_k.gguf

- Converted and quantized with llama.cpp.
- Attribution: base model `Ellbendls/Qwen-3-4b-Text_to_SQL` (see original license).
- Generated: 2025-09-17T07:58:42.430503Z
 
+
+ ---
+ library_name: gguf
+ license: apache-2.0
+ base_model:
+ - Ellbendls/Qwen-3-4b-Text_to_SQL
+ - Qwen/Qwen3-4B-Instruct-2507
+ tags:
+ - gguf
+ - llama.cpp
+ - qwen
+ - text-to-sql
+ - sql
+ - instruct
+ language:
+ - eng
+ - zho
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ pipeline_tag: text-generation
+ ---
+
  # Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF

+ Quantized GGUF builds of `Ellbendls/Qwen-3-4b-Text_to_SQL` for fast CPU/GPU inference with llama.cpp-compatible runtimes.
+
+ - **Base model**. Fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** for Text-to-SQL.
+ - **License**. Apache-2.0 (inherited from the base model). Keep attribution.
+ - **Purpose**. Turn natural language into SQL. When the schema is missing, the model can infer a simple schema and then produce SQL.
+
+ ## Files
+
+ Base and quantized variants:
+
+ - `Qwen-3-4b-Text_to_SQL-F16.gguf` – reference float16 export
+ - `Qwen-3-4b-Text_to_SQL-q2_k.gguf`
+ - `Qwen-3-4b-Text_to_SQL-q3_k_m.gguf`
+ - `Qwen-3-4b-Text_to_SQL-q4_k_s.gguf`
+ - `Qwen-3-4b-Text_to_SQL-q4_k_m.gguf` ← good default
+ - `Qwen-3-4b-Text_to_SQL-q5_k_m.gguf`
+ - `Qwen-3-4b-Text_to_SQL-q6_k.gguf`
+ - `Qwen-3-4b-Text_to_SQL-q8_0.gguf` ← near-lossless, larger
+
+ Conversion and quantization were done with `llama.cpp`.
+
+ ## Recommended pick
+
+ - **Q4_K_M**. Best balance of speed and quality for laptops and small servers.
+ - **Q5_K_M**. Higher quality, slightly more RAM/VRAM.
+ - **Q8_0**. Highest quality among the quants. Use it if you have headroom.
+
+ ## Approximate memory needs
+
+ These are ballpark figures for a 4B model. Real usage varies by runtime and context length.
+
+ - Q4_K_M: 3–4 GB RAM/VRAM
+ - Q5_K_M: 4–5 GB
+ - Q8_0: 6–8 GB
+ - F16: 10–12 GB
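The figures above can be sanity-checked with simple bits-per-weight arithmetic. A minimal sketch, assuming roughly 4 billion parameters and commonly quoted (approximate) bits-per-weight values for each quant type; it estimates the file size only, and the RAM/VRAM ranges above add KV cache and runtime overhead on top:

```python
# Rough GGUF file-size arithmetic: parameters * bits-per-weight / 8 bytes.
# The bits-per-weight values are approximate community figures, not exact.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5, "F16": 16.0}

def approx_file_size_gb(n_params: float, quant: str) -> float:
    """Approximate GGUF file size in GB for a given quant type."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# ~4B parameters; actual memory needs also include KV cache and overhead.
for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{approx_file_size_gb(4e9, quant):.1f} GB file")
```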
+
+ ## Quick start
+
+ ### llama.cpp (CLI)
+
+ CPU only:
+ ```bash
+ ./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
+ -p "Generate SQL to get average salary by department in 2024." \
+ -n 256 -t 6
+ ```
+
+ NVIDIA GPU offload (build with `-DGGML_CUDA=ON`; older llama.cpp releases used `-DLLAMA_CUBLAS=ON`):
+
+ ```bash
+ ./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
+ -p "Generate SQL to get average salary by department in 2024." \
+ -n 256 -ngl 999 -t 6
+ ```
+
+ ### Python (llama-cpp-python)
+
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama(model_path="Qwen-3-4b-Text_to_SQL-q4_k_m.gguf", n_ctx=4096, n_gpu_layers=35)  # set n_gpu_layers=0 for CPU-only
+ prompt = "Generate SQL to list total orders and revenue by month for 2024."
+ out = llm(prompt, max_tokens=256, temperature=0.2, top_p=0.9)
+ print(out["choices"][0]["text"].strip())
+ ```
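Model replies often wrap the SQL in markdown fences or add commentary around it. A small post-processing sketch (a hypothetical helper, not part of the model) that pulls out the first SQL statement:

```python
import re

def extract_sql(text: str) -> str:
    """Pull the first SQL statement out of a model response.

    Prefers a fenced ```sql block; otherwise falls back to the first
    line that starts with a common SQL keyword.
    """
    fence = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if fence:
        return fence.group(1).strip()
    for line in text.splitlines():
        if re.match(r"\s*(SELECT|WITH|INSERT|UPDATE|DELETE|CREATE)\b", line, re.IGNORECASE):
            return line.strip()
    return text.strip()

reply = "Here is the query:\n```sql\nSELECT dept, AVG(salary) FROM employees GROUP BY dept;\n```"
print(extract_sql(reply))  # SELECT dept, AVG(salary) FROM employees GROUP BY dept;
```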
+
+ ### LM Studio / Kobold / text-generation-webui
+
+ * Select the `.gguf` file and load it.
+ * Set temperature to 0.1–0.3 for deterministic SQL.
+ * Use a system prompt to anchor behavior.
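One way to anchor behavior is a fixed system message that pins the output format. A sketch of a reusable message builder (the system-prompt wording is an assumption, not shipped with the model), usable with `llama-cpp-python`'s `create_chat_completion` or any OpenAI-style chat API:

```python
from typing import Optional

def build_messages(question: str, schema: Optional[str] = None) -> list:
    """Build an OpenAI-style chat message list for Text-to-SQL.

    The system prompt below is an illustrative example.
    """
    system = (
        "You are a Text-to-SQL assistant. "
        "Answer with a single SQL statement and nothing else. "
        "If no schema is given, infer a minimal one and still return only SQL."
    )
    user = question if schema is None else f"Schema:\n{schema}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages("Average salary by department in 2024?")
# Pass to a loaded model, e.g.:
# out = llm.create_chat_completion(messages=messages, temperature=0.2)
```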
+
+ ## Model details
+
+ * **Base**. `Qwen/Qwen3-4B-Instruct-2507` (32k context, multilingual).
+ * **Fine-tune**. Trained on `gretelai/synthetic_text_to_sql`.
+ * **Task**. NL → SQL, with simple schema inference when needed.
+ * **Languages**. Works best in English; can follow prompts in several other languages inherited from the base model.
+
+ ## Conversion reproducibility
+
+ Export used:
+
+ ```bash
+ python convert_hf_to_gguf.py /path/to/hf_model --outtype f16 --outfile Qwen-3-4b-Text_to_SQL-F16.gguf
+ ```
+
+ Quantization used:
+
+ ```bash
+ ./llama-quantize Qwen-3-4b-Text_to_SQL-F16.gguf Qwen-3-4b-Text_to_SQL-q4_k_m.gguf Q4_K_M
+ # likewise for q2_k, q3_k_m, q4_k_s, q5_k_m, q6_k, q8_0
+ ```
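The per-quant commands above can be generated in one loop. A sketch that prints the commands as a dry run (drop the `echo` to execute them, assuming `llama-quantize` sits in the current directory):

```shell
set -euo pipefail

base="Qwen-3-4b-Text_to_SQL"

# Quant types matching the file list above.
for q in Q2_K Q3_K_M Q4_K_S Q4_K_M Q5_K_M Q6_K Q8_0; do
    # Lower-cased quant name for the output filename, as in the file list.
    out="${base}-$(echo "$q" | tr '[:upper:]' '[:lower:]').gguf"
    echo ./llama-quantize "${base}-F16.gguf" "$out" "$q"
done
```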
+
+ ## Intended use and limits
+
+ * **Use**. Analytics, reporting, dashboards, data exploration, SQL prototyping.
+ * **Limits**. No database connectivity; it only generates SQL text. Validate and test queries before production use, and provide the real schema for best accuracy.
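A cheap validation step, before anything touches production, is checking that the generated SQL at least parses and executes against an empty copy of the schema. A sketch using Python's stdlib `sqlite3` (the schema and queries are illustrative, and SQLite's dialect may differ from your target database):

```python
import sqlite3

def sql_runs(schema_sql: str, query: str) -> bool:
    """Return True if `query` parses and executes against an empty
    in-memory database built from `schema_sql` (SQLite dialect)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(schema_sql)
        con.execute(query)  # tables are empty, so execution is cheap
        return True
    except sqlite3.Error:
        return False
    finally:
        con.close()

schema = "CREATE TABLE employees (id INTEGER, dept TEXT, salary REAL);"
good = "SELECT dept, AVG(salary) FROM employees GROUP BY dept;"
bad = "SELEC dept FROM employees;"  # typo: should fail
print(sql_runs(schema, good))  # True
print(sql_runs(schema, bad))   # False
```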
+
+ ## Attribution
+
+ * Base model: [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+ * Fine-tuned model: [`Ellbendls/Qwen-3-4b-Text_to_SQL`](https://huggingface.co/Ellbendls/Qwen-3-4b-Text_to_SQL)
+
+ ## License
+
+ Apache-2.0. Include the upstream license and NOTICE when redistributing the weights. Do not imply endorsement by Qwen or the original authors.
+
+ ## Changelog
+
+ * 2025-09-17. Initial GGUF release. Added `q2_k`, `q3_k_m`, `q4_k_s`, `q4_k_m`, `q5_k_m`, `q6_k`, `q8_0`, and F16.
 
 