Commit History

Fix device placement for tokenizer outputs before model inference
64c014e

jeanbaptdzd committed on

Refactor: Address code shortcomings and align with HF best practices
dc14519

jeanbaptdzd committed on

Remove chat_service.py abstraction layer
c77ec91

jeanbaptdzd committed on

Fix OpenAI API compatibility: support tool_choice='required' and response_format
a82e45b

jeanbaptdzd committed on

feat: Enable tool calls support in OpenAI API
895a63f

jeanbaptdzd committed on

feat: Add rate limiting, stats tracking, and fix critical issues
67befa7

jeanbaptdzd committed on

feat: Add input validation and type hints
f28306b

jeanbaptdzd committed on

refactor: DRY improvements and optimize Dockerfile
16c2a22

jeanbaptdzd committed on

Refactor: Remove RAG, upgrade vLLM 0.9.2, add optimization mode
da484d7

jeanbaptdzd committed on

Add detailed error logging to vLLM provider and router
772dd21

jeanbaptdzd committed on

feat: FastAPI vLLM service with OpenAI-compatible endpoints and PRIIPs extractor
6851411

jeanbaptdzd committed on