Fix device placement for tokenizer outputs before model inference 64c014e jeanbaptdzd committed 26 days ago
Refactor: Address code shortcomings and align with HF best practices dc14519 jeanbaptdzd committed 26 days ago
Fix OpenAI API compatibility: support tool_choice='required' and response_format a82e45b jeanbaptdzd committed 30 days ago
feat: Add rate limiting, stats tracking, and fix critical issues 67befa7 jeanbaptdzd committed on Nov 17
Refactor: Remove RAG, upgrade vLLM 0.9.2, add optimization mode da484d7 jeanbaptdzd committed on Nov 2
feat: FastAPI vLLM service with OpenAI-compatible endpoints and PRIIPs extractor 6851411 jeanbaptdzd committed on Oct 28