9.1 CRITICAL
- CVSS version (CVSS): 3.1
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): Low (L)
- Privileges Required (PR): None (N)
- User Interaction (UI): None (N)
- Scope (S): Unchanged (U)
- Confidentiality (C): High (H)
- Integrity (I): None (N)
- Availability (A): High (H)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): Low (L)
- Modified Privileges Required (MPR): None (N)
- Modified User Interaction (MUI): None (N)
- Modified Confidentiality (MC): High (H)
- Modified Scope (MS): Unchanged (U)
- Modified Integrity (MI): None (N)
- Modified Availability (MA): High (H)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM: OpenAI auth bypass
vLLM is an inference and serving engine for large language models (LLMs). From 0.3.0 until 0.22.0, a vulnerability in ASGI web servers and starlette's trust on those web servers enables an authentication bypass of the OpenAI API AuthenticationMiddleware. It allows to use the API without providing the configured VLLM_API_KEY or --api-key. This vulnerability is fixed in 0.22.0.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-94f4-hr76-p5j6 x_refsource_CONFIRM
-
https://github.com/vllm-project/vllm/pull/43426 x_refsource_MISC
-
https://x41-dsec.de/lab/advisories/x41-2026-002-starlette x_refsource_MISC
Affected products
- ==>= 0.3.0, < 0.22.0
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>
5.3 MEDIUM
- CVSS version (CVSS): 4.0
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): Low (L)
- Attack Requirement (AT): None (N)
- Privileges Required (PR): None (N)
- User Interaction (UI): Passive (P)
- Vulnerable System Impact Confidentiality (VC): Low (L)
- Vulnerable System Impact Integrity (VI): Low (L)
- Vulnerable System Impact Availability (VA): None (N)
- Subsequent System Impact Confidentiality (SC): None (N)
- Subsequent System Impact Integrity (SI): None (N)
- Subsequent System Impact Availability (SA): None (N)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): Low (L)
- Modified Attack Requirement (MAT): None (N)
- Modified Privileges Required (MPR): None (N)
- Modified User Interaction (MUI): Passive (P)
- Modified Vulnerable System Impact Confidentiality (MVC): Low (L)
- Modified Vulnerable System Impact Integrity (MVI): Low (L)
- Modified Vulnerable System Impact Availability (MVA): None (N)
- Modified Subsequent System Impact Confidentiality (MSC): Negligible (N)
- Modified Subsequent System Impact Integrity (MSI): Negligible (N)
- Modified Subsequent System Impact Availability (MSA): Negligible (N)
- Safety (S): Not Defined (X)
- Automatable (AU): Not Defined (X)
- Recovery (R): Not Defined (X)
- Value Density (V): Not Defined (X)
- Vulnerability Response Effort (RE): Not Defined (X)
- Provider Urgency (U): Not Defined (X)
- Confidentiality Req. (CR): Not Defined (X)
- Integrity Req. (IR): Not Defined (X)
- Availability Req. (AR): Not Defined (X)
- Exploit Maturity (E): Not Defined (X)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4 x_refsource_CONFIRM
-
https://github.com/vllm-project/vllm/pull/44971 x_refsource_MISC
Affected products
- ==>= 0.5.5, < 0.23.1rc0
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>
6.5 MEDIUM
- CVSS version (CVSS): 3.1
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): Low (L)
- Privileges Required (PR): Low (L)
- User Interaction (UI): None (N)
- Scope (S): Unchanged (U)
- Confidentiality (C): None (N)
- Integrity (I): None (N)
- Availability (A): High (H)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): Low (L)
- Modified Privileges Required (MPR): Low (L)
- Modified User Interaction (MUI): None (N)
- Modified Confidentiality (MC): None (N)
- Modified Scope (MS): Unchanged (U)
- Modified Integrity (MI): None (N)
- Modified Availability (MA): High (H)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM: OOM Denial of Service via Audio Decompression Bomb
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.23.1rc0, vLLM's /v1/audio/transcriptions endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. This vulnerability is fixed in 0.23.1rc0.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-6pr9-rp53-2pmc x_refsource_CONFIRM
-
https://github.com/vllm-project/vllm/pull/44970 x_refsource_MISC
Affected products
- ==< 0.23.1rc0
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>
8.8 HIGH
- CVSS version (CVSS): 3.1
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): Low (L)
- Privileges Required (PR): None (N)
- User Interaction (UI): Required (R)
- Scope (S): Unchanged (U)
- Confidentiality (C): High (H)
- Integrity (I): High (H)
- Availability (A): High (H)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): Low (L)
- Modified Privileges Required (MPR): None (N)
- Modified User Interaction (MUI): Required (R)
- Modified Confidentiality (MC): High (H)
- Modified Scope (MS): Unchanged (U)
- Modified Integrity (MI): High (H)
- Modified Availability (MA): High (H)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM: Dependency Confusion Vulnerability in vLLM Dockerfile
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.1, the vLLM Dockerfile is vulnerable to a dependency confusion attack through the flashinfer-jit-cache package. The package is installed from a custom index (flashinfer.ai/whl/) using --extra-index-url, but the package name was not registered on PyPI, and UV_INDEX_STRATEGY="unsafe-best-match" is set globally. An attacker who registers flashinfer-jit-cache on PyPI with version 0.6.11.post2 can execute arbitrary code as root during the Docker build and backdoor every resulting container image, enabling exfiltration of all user prompts, API credentials, and model data from production vLLM deployments This vulnerability is fixed in 0.22.1.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-jrf6-vqxq-pjv2 x_refsource_CONFIRM
Affected products
- ==< 0.22.1
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>
6.9 MEDIUM
- CVSS version (CVSS): 4.0
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): Low (L)
- Attack Requirement (AT): None (N)
- Privileges Required (PR): None (N)
- User Interaction (UI): None (N)
- Vulnerable System Impact Confidentiality (VC): None (N)
- Vulnerable System Impact Integrity (VI): None (N)
- Vulnerable System Impact Availability (VA): Low (L)
- Subsequent System Impact Confidentiality (SC): None (N)
- Subsequent System Impact Integrity (SI): None (N)
- Subsequent System Impact Availability (SA): None (N)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): Low (L)
- Modified Attack Requirement (MAT): None (N)
- Modified Privileges Required (MPR): None (N)
- Modified User Interaction (MUI): None (N)
- Modified Vulnerable System Impact Confidentiality (MVC): None (N)
- Modified Vulnerable System Impact Integrity (MVI): None (N)
- Modified Vulnerable System Impact Availability (MVA): Low (L)
- Modified Subsequent System Impact Confidentiality (MSC): Negligible (N)
- Modified Subsequent System Impact Integrity (MSI): Negligible (N)
- Modified Subsequent System Impact Availability (MSA): Negligible (N)
- Safety (S): Not Defined (X)
- Automatable (AU): Not Defined (X)
- Recovery (R): Not Defined (X)
- Value Density (V): Not Defined (X)
- Vulnerability Response Effort (RE): Not Defined (X)
- Provider Urgency (U): Not Defined (X)
- Confidentiality Req. (CR): Not Defined (X)
- Integrity Req. (IR): Not Defined (X)
- Availability Req. (AR): Not Defined (X)
- Exploit Maturity (E): Not Defined (X)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM: temperature=NaN and temperature=Infinity bypass validation and propagate to GPU kernels
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.23.1rc0, ll temperature validation gates use comparison operators (<, >), which silently evaluate to False for NaN and for positive Infinity in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. This vulnerability is fixed in 0.23.1rc0.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-7h4p-rffg-7823 x_refsource_CONFIRM
-
https://github.com/vllm-project/vllm/pull/45116 x_refsource_MISC
Affected products
- ==< 0.23.1rc0
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>
6.5 MEDIUM
- CVSS version (CVSS): 3.1
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): High (H)
- Privileges Required (PR): None (N)
- User Interaction (UI): None (N)
- Scope (S): Unchanged (U)
- Confidentiality (C): Low (L)
- Integrity (I): High (H)
- Availability (A): None (N)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): High (H)
- Modified Privileges Required (MPR): None (N)
- Modified User Interaction (MUI): None (N)
- Modified Confidentiality (MC): Low (L)
- Modified Scope (MS): Unchanged (U)
- Modified Integrity (MI): High (H)
- Modified Availability (MA): None (N)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM: Artifact Pin Decay in vLLM allows pinned deployments to load unpinned code, weights, and processors
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.0, vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a model. A deployment that supplies --revision or --code-revision can still load dynamic code, GGUF files, image processors, retrieval side weights, or same-repository subfolder weights/config from an unpinned/default revision. This is a supply-chain integrity issue for pinned vLLM deployments. Operators can believe they are serving a reviewed model revision while vLLM resolves behavior-affecting nested or sibling artifacts outside that reviewed revision. This vulnerability is fixed in 0.22.0.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-3ww4-5jv9-j5gm x_refsource_CONFIRM
-
https://github.com/vllm-project/vllm/pull/42616 x_refsource_MISC
-
https://huntr.com/bounties/3f1e24c0-87d2-4f6c-a705-820f380879ac x_refsource_MISC
Affected products
- ==< 0.22.0
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>
5.3 MEDIUM
- CVSS version (CVSS): 3.1
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): Low (L)
- Privileges Required (PR): None (N)
- User Interaction (UI): None (N)
- Scope (S): Unchanged (U)
- Confidentiality (C): Low (L)
- Integrity (I): None (N)
- Availability (A): None (N)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): Low (L)
- Modified Privileges Required (MPR): None (N)
- Modified User Interaction (MUI): None (N)
- Modified Confidentiality (MC): Low (L)
- Modified Scope (MS): Unchanged (U)
- Modified Integrity (MI): None (N)
- Modified Availability (MA): None (N)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM: incomplete CVE-2026-22778 fix leaks PIL repr addresses via Anthropic router
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.23.1rc0, the fix for CVE-2026-22778, which introduced a sanitize_message helper that strips object-repr memory addresses from error messages before they reach the client, is incomplete: several response paths echo str(exc) directly to clients without calling sanitize_message. The unsanitized sites include the Anthropic API router in vllm/entrypoints/anthropic/api_router.py (the POST /v1/messages and POST /v1/messages/count_tokens handlers), the Server-Sent Events streaming converter in vllm/entrypoints/anthropic/serving.py, and the realtime speech-to-text WebSocket in vllm/entrypoints/speech_to_text/realtime/connection.py. These paths catch the exception inside the route coroutine and construct the JSONResponse themselves, bypassing the sanitizing global FastAPI exception handler, and WebSocket frames do not traverse that handler chain at all. Using the same primitive as the parent issue, an unauthenticated attacker can send malformed image bytes through the Anthropic Messages API image content parts so that PIL.Image.open raises an UnidentifiedImageError whose message contains the BytesIO object repr, leaking the heap memory address verbatim in the error.message field of the response body. This vulnerability is fixed in 0.23.1rc0.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-hgg8-fqqc-vfmw x_refsource_CONFIRM
-
https://github.com/vllm-project/vllm/pull/45119 x_refsource_MISC
Affected products
- ==< 0.23.1rc0
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>
7.5 HIGH
- CVSS version (CVSS): 3.1
- Attack Vector (AV): Network (N)
- Attack Complexity (AC): High (H)
- Privileges Required (PR): None (N)
- User Interaction (UI): Required (R)
- Scope (S): Unchanged (U)
- Confidentiality (C): High (H)
- Integrity (I): High (H)
- Availability (A): High (H)
- Modified Attack Vector (MAV): Network (N)
- Modified Attack Complexity (MAC): High (H)
- Modified Privileges Required (MPR): None (N)
- Modified User Interaction (MUI): Required (R)
- Modified Confidentiality (MC): High (H)
- Modified Scope (MS): Unchanged (U)
- Modified Integrity (MI): High (H)
- Modified Availability (MA): High (H)
by @LeSuisse Activity log
- Created suggestion
- @LeSuisse accepted
- @LeSuisse published on GitHub
vLLM: Security Check Bypass via assert Statement in Activation Function Loading Allows Arbitrary Code Execution
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.0, an assert-based security check in vLLM's activation function loading allows any unauthenticated attacker to achieve arbitrary code execution on the server by publishing a malicious HuggingFace model, when vLLM runs in Python optimized mode (python -O or PYTHONOPTIMIZE=1). This vulnerability is fixed in 0.22.0.
References
-
https://github.com/vllm-project/vllm/security/advisories/GHSA-q8gq-377p-jq3r x_refsource_CONFIRM
-
https://huntr.com/bounties/dcb05b04-e625-41e7-adbc-bbae0cc2d64c x_refsource_MISC
Affected products
- ==< 0.22.0
Matching in nixpkgs
pkgs.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.pkgsRocm.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
pkgs.python313Packages.vllm
High-throughput and memory-efficient inference and serving engine for LLMs
Package maintainers
-
@happysalada Raphael Megzari <raphael@megzari.com>
-
@daniel-fahey Daniel Fahey <daniel.fahey+nixpkgs@pm.me>
-
@LunNova Luna Nova <nixpkgs-maintainer@lunnova.dev>
-
@CertainLach Yaroslav Bolyukin <iam@lach.pw>