CVE-2026-53923
MEDIUM
5.3
CVSS 4.0
Description
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.5.5 up to, but not including, 0.23.1rc0, tensor dimensions are subject to integer truncation in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu), so only part of the tensor is processed. The output tensor is allocated at full size with torch::empty, which leaves the memory uninitialized, but the dequantize CUDA kernel writes only the truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in that GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, which constitutes information disclosure. The vulnerability is fixed in 0.23.1rc0.
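The arithmetic behind this class of bug can be sketched in a few lines of host-side C++. This is an illustration only, not vLLM's code: it assumes, purely for the example, a 64-bit element count narrowed to 32 bits, and the helper names `count_seen_by_32bit_param`, `unfilled_elements`, and `fits_in_int32` are hypothetical. The actual types and the actual fix are in the pull request and commit listed under References.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not vLLM code): the element count a kernel would see
// if a 64-bit count were narrowed to 32 bits on the way in. The count is
// modeled as the low 32 bits, read as unsigned for simplicity.
inline std::int64_t count_seen_by_32bit_param(std::int64_t numel) {
    // Conversion to uint32_t is well-defined in C++: it keeps the low 32 bits.
    const std::uint32_t low_bits = static_cast<std::uint32_t>(numel);
    return static_cast<std::int64_t>(low_bits);
}

// Hypothetical helper: number of output elements the kernel never writes,
// which therefore keep whatever the uninitialized allocation contained.
inline std::int64_t unfilled_elements(std::int64_t numel) {
    return numel - count_seen_by_32bit_param(numel);
}

// One possible guard (an assumption, not necessarily the upstream fix):
// refuse to launch when the count cannot be represented in 32 bits.
inline bool fits_in_int32(std::int64_t numel) {
    return numel >= 0 &&
           numel <= std::numeric_limits<std::int32_t>::max();
}
```

For example, with a count of 2^32 + 1024 elements, the narrowed value is 1024, so 2^32 elements of the full-size output would be left holding stale memory. A guard such as `fits_in_int32` rejects that input before any kernel launch; widening the parameter type is the other common remedy.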
Metadata
Severity & Metrics
5.3 MEDIUM (CVSS 4.0)
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N
Affected products (1)
| Vendor | Product | Platform | Versions |
|---|---|---|---|
| vllm-project | vllm | — | >= 0.5.5, < 0.23.1rc0 |
Weakness (CWE)
CVSS scores (1)
| Score | Severity | Version | Source | Vector |
|---|---|---|---|---|
| 5.3 | MEDIUM | 4.0 | cna | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N |
References (3)
- https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4
- https://github.com/vllm-project/vllm/pull/44971
- https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e