vllm.entrypoints.openai.api_server ¶
build_and_serve async ¶
build_and_serve(
    renderer_client: RendererClient,
    engine_client: EngineClient | None,
    listen_address: str,
    sock: socket,
    args: Namespace,
    **uvicorn_kwargs,
) -> Task
Build the FastAPI app, initialize its state, and start serving.
Returns the shutdown task for the caller to await.
Source code in vllm/entrypoints/openai/api_server.py
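A minimal usage sketch, assuming the clients, listen address, socket, and parsed args have already been prepared elsewhere (for example via setup_server and build_async_clients_from_engine_args); the variable wiring here is illustrative, not the library's own serving path:

# Sketch only: `renderer_client`, `engine_client`, `listen_address`,
# `sock`, and `args` are assumed to already exist.
shutdown_task = await build_and_serve(
    renderer_client,
    engine_client,
    listen_address,
    sock,
    args,
)
# The returned task completes once the server is asked to shut down.
await shutdown_task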
build_async_clients_from_engine_args async ¶
build_async_clients_from_engine_args(
    engine_args: AsyncEngineArgs,
    *,
    usage_context: UsageContext = OPENAI_API_SERVER,
    disable_frontend_multiprocessing: bool = False,
    client_config: dict[str, Any] | None = None,
) -> AsyncIterator[tuple[RendererClient, EngineClient]]
Create a co-located (RendererClient, EngineClient) pair backed by AsyncLLM.
Source code in vllm/entrypoints/openai/api_server.py
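A sketch of obtaining the client pair. The AsyncIterator return type suggests asynccontextmanager-style usage, which is an assumption here, as is the placeholder model name:

import asyncio

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.entrypoints.openai.api_server import (
    build_async_clients_from_engine_args,
)

async def main() -> None:
    engine_args = AsyncEngineArgs(model="facebook/opt-125m")  # placeholder model
    # Assumption: the AsyncIterator return type indicates an
    # @asynccontextmanager, so the pair is entered with `async with`.
    async with build_async_clients_from_engine_args(engine_args) as (
        renderer_client,
        engine_client,
    ):
        ...  # hand the pair to build_and_serve or other app setup

asyncio.run(main())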
init_renderer_state async ¶
init_renderer_state(
    renderer_client: RendererClient,
    state: State,
    args: Namespace,
) -> None
Initialize app state for a render-only server (no EngineClient).
Sets up only the services that are meaningful without an inference engine: model listing, tokenization, and chat/completion rendering.
Source code in vllm/entrypoints/openai/api_server.py
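A sketch of wiring a render-only app, assuming `app` is the FastAPI instance whose state is being initialized and that a RendererClient and parsed args are already in hand:

# Sketch: `app`, `renderer_client`, and `args` are assumed to exist.
# Only engine-free services (model listing, tokenization, rendering)
# are set up on `app.state`.
await init_renderer_state(renderer_client, app.state, args)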
run_server async ¶
Run a single-worker API server.
Source code in vllm/entrypoints/openai/api_server.py
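A sketch of launching the server programmatically. The parser helpers below come from vllm.entrypoints.openai.cli_args and vllm.utils, though exact module paths can shift between releases, and the model name is a placeholder:

import asyncio

from vllm.entrypoints.openai.api_server import run_server
from vllm.entrypoints.openai.cli_args import make_arg_parser
from vllm.utils import FlexibleArgumentParser

# Build the full OpenAI-compatible server arg parser, then parse
# a minimal command line.
parser = make_arg_parser(FlexibleArgumentParser())
args = parser.parse_args(["--model", "facebook/opt-125m"])  # placeholder model
asyncio.run(run_server(args))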
run_server_worker async ¶
Run a single API server worker.
Source code in vllm/entrypoints/openai/api_server.py
setup_server ¶
Validate API server args, set up the signal handler, and create a socket ready to serve.
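A sketch tying setup_server to run_server_worker. The (listen_address, sock) return shape of setup_server is an assumption, chosen to match the inputs run_server_worker takes; this mirrors what run_server does internally:

# Sketch: `args` is the parsed CLI namespace. The assumed return shape
# of setup_server is (listen_address, sock), which the worker consumes.
listen_address, sock = setup_server(args)
await run_server_worker(listen_address, sock, args)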