Instructions to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QuantFactory/UIGEN-FX-4B-Preview-GGUF")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("QuantFactory/UIGEN-FX-4B-Preview-GGUF", dtype="auto")

llama-cpp-python

How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/UIGEN-FX-4B-Preview-GGUF",
	filename="UIGEN-FX-4B-Preview.Q2_K.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M

LM Studio
Jan

vLLM

How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/UIGEN-FX-4B-Preview-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/UIGEN-FX-4B-Preview-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M

SGLang

How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantFactory/UIGEN-FX-4B-Preview-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/UIGEN-FX-4B-Preview-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantFactory/UIGEN-FX-4B-Preview-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/UIGEN-FX-4B-Preview-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama
How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M
```

Unsloth Studio new

How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/UIGEN-FX-4B-Preview-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/UIGEN-FX-4B-Preview-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/UIGEN-FX-4B-Preview-GGUF to start chatting

Docker Model Runner
How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/UIGEN-FX-4B-Preview-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/UIGEN-FX-4B-Preview-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.UIGEN-FX-4B-Preview-GGUF-Q4_K_M

List all available models

lemonade list

QuantFactory/UIGEN-FX-4B-Preview-GGUF

This is quantized version of Tesslate/UIGEN-FX-4B-Preview created using llama.cpp

Original Model Card

@@@ ALERT @@@ — Research Preview: some generations may not be ready for production. Use repeat_penalty or frequency_penalty at >= 1.1

Tesslate • UIGEN Series

UIGEN-FX-4B-Preview

FX = “Frontend Engineer.” This upgrade in the UIGEN line focuses on better visual polish, functional structure, and web-ready markup to ship cleaner, more complete websites from a single prompt.

Open weights Web-only bias Mobile-first output Minimal JS by default

Model Repo Try in Designer Demos Website Discord

What is FX?

UIGEN-FX-4B-Preview is a 4B parameter UI generation model tuned to behave like a frontend engineer across 22 Frameworks.

Why 4B?

Small enough for laptops and fast iteration, while keeping strong structure and visual consistency. FX emphasizes layout rhythm, spacing, and component composition to reduce cleanup work.

Quickstart

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Tesslate/UIGEN-FX-4B-Preview"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
prompt = """Make a single-file landing page for 'LatticeDB'.
Style: modern, generous whitespace, Tailwind, rounded-xl, soft gradients.
Sections: navbar, hero (headline + 2 CTAs), features grid, pricing (3 tiers),
FAQ accordion, footer. Constraints: semantic HTML, no external JS."""
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2200, temperature=0.6, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))

vLLM

vllm serve Tesslate/UIGEN-FX-4B-Preview \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 65536 \
  --gpu-memory-utilization 0.92

sglang

python -m sglang.launch_server \
  --model-path Tesslate/UIGEN-FX-4B-Preview \
  --host 0.0.0.0 --port 5000 \
  --mem-fraction-static 0.94 \
  --attention-backend flashinfer \
  --served-model-name UIGEN-FX-4B-Preview

Tip: Lower temperature (0.4–0.6) yields stricter, cleaner markup; raise it for more visual variety.

Suggested Inference Settings

Param	Value	Notes
`temperature`	0.6	Balance creativity & consistency (lower if quantized)
`top_p`	0.9	Nucleus sampling
`top_k`	40	Optional vocab restriction
`max_new_tokens`	1200–2500	Single-file sites often fit < 1800
`repetition_penalty`	1.08–1.15	Reduces repetitive classes/markup

Prompts that work well

Starter

Make a single-file landing page for "RasterFlow" (GPU video pipeline).
Style: modern tech, muted palette, Tailwind, rounded-xl, subtle gradients.
Sections: navbar, hero (big headline + 2 CTAs), logos row, features (3x cards),
code block (copyable), pricing (3 tiers), FAQ accordion, footer.
Constraints: semantic HTML, no external JS. Return ONLY the HTML code.

Design-strict

Use an 8pt spacing system. Palette: slate with indigo accents.
Typography scale: 14/16/18/24/36/56. Max width: 1200px.
Avoid shadows > md; prefer borders/dividers; keep line-length ~68ch.

Quantization & VRAM (example)

Format	Footprint	Notes
BF16	~8.1 GB	Fastest, best fidelity
GGUF Q5_K_M	~2.9 GB	Great quality/size trade-off
GGUF Q4_K_M	~2.5 GB	Laptop-friendly

Intended Use & Scope

Primary:React, Tailwind, Javascript, Static Site generators, Python Frontend WebUI
Secondary: Component blocks (hero, pricing, FAQ) for manual composition

Limitations

Accessibility: headings/labels are encouraged; ARIA coverage may need review.
Complex widgets: JS kept minimal unless requested; consider post-edit for heavy interactivity.

Ethical Considerations

Use rights-cleared assets when adding logos/images.
Curate prompts responsibly; review outputs before production use.

Training Summary (research preview)

Base: Qwen/Qwen3-4B-Instruct-2507
Objective: Web-only bias; reward semantic structure, spacing rhythm, responsive blocks; improved visual polish vs earlier UIGEN releases.
Data: Curated HTML/CSS/Tailwind snippets, component libraries, synthetic page specs & layout constraints.
Recipe: SFT with format constraints → instruction tuning → preference optimization on style/structure.
Context: effective ~64k; default generations sized for practical single-file pages.

Community

Examples: uigenoutput.tesslate.com
Discord: discord.gg/EcCpcTv93U
Website: tesslate.com

“FX aims to ship what a frontend engineer would: clean structure first, pretty pixels second.” — Tesslate Team

Citation

@misc{tesslate_uigen_fx_4b_2025,
  title   = {UIGEN-FX-4B-Preview: Frontend Engineer-tuned web generation (Research Preview)},
  author  = {Tesslate Team},
  year    = {2025},
  url     = {https://huggingface.co/Tesslate/UIGEN-FX-4B-Preview}
}

Downloads last month: 157

GGUF

Model size

4B params

Architecture

qwen3

Hardware compatibility

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Model tree for QuantFactory/UIGEN-FX-4B-Preview-GGUF

Base model

Qwen/Qwen3-4B-Instruct-2507

Quantized

(243)

this model