CVE-2026-53923

MEDIUM
5.3
CVSS 4.0
Description
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.5.5 up to, but not including, 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty, which leaves its memory uninitialized, but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, which constitutes information disclosure. The vulnerability is fixed in 0.23.1rc0.
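The failure mode described above can be illustrated with a small host-side simulation. This is a sketch under stated assumptions, not the vLLM kernel code: the names `narrow`, `simulated_dequantize`, and `STALE` are hypothetical, a Python list stands in for the GPU buffer, and the sketch assumes each dimension is narrowed before the element count is computed. A 16-bit width is used in the demo only so the buffer stays small; the same wraparound applies at 32 bits.

```python
STALE = "stale"  # sentinel standing in for leftover data from an earlier allocation


def narrow(value: int, bits: int = 32) -> int:
    """Mimic a C-style narrowing conversion to a signed `bits`-wide integer."""
    wrapped = value & ((1 << bits) - 1)   # keep only the low `bits` bits
    if wrapped >= 1 << (bits - 1):        # reinterpret the top bit as the sign
        wrapped -= 1 << bits
    return wrapped


def simulated_dequantize(rows: int, cols: int, bits: int = 32) -> list:
    """Allocate a full-size output, then fill only the truncated element count."""
    total = rows * cols
    output = [STALE] * total              # plays the role of an uninitialized buffer
    # Assumption: each dimension is narrowed before the element count is computed.
    processed = narrow(rows, bits) * narrow(cols, bits)
    for i in range(max(0, min(processed, total))):
        output[i] = 0.0                   # value the "kernel" actually writes
    return output


if __name__ == "__main__":
    # 70,000 columns do not fit in 16 bits, so only part of the buffer is written.
    out = simulated_dequantize(rows=4, cols=70_000, bits=16)
    print("elements:", len(out), "still stale:", out.count(STALE))
```

In this toy run the 70,000-column dimension wraps to 4,464, so 17,856 of the 280,000 elements are written and the remaining 262,144 keep their previous contents. That untouched tail is the part that could expose another tenant's data in the real setting.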

Metadata

CVE ID
CVE-2026-53923
State
PUBLISHED
Assigner
GitHub_M
Reserved
2026-06-11 15:46 UTC
Published
2026-06-22 21:55 UTC
Last updated
2026-06-22 21:55 UTC
Primary CWE
CWE-681
CWE-681: Incorrect Conversion between Numeric Types
Vendor / Product
vllm-project / vllm
Sources
cve.org  ·  NVD

Severity & Metrics

5.3 MEDIUM CVSS 4.0
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N
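The vector string above packs the base metrics into slash-separated `key:value` pairs. As a minimal sketch, the hypothetical helper `parse_cvss_vector` below splits the string into a dictionary; the metric names in `CVSS4_BASE_METRICS` follow the CVSS v4.0 base metric group, and the helper does no validation of metric values or score computation.

```python
# Abbreviation -> metric name for the CVSS v4.0 base metric group.
CVSS4_BASE_METRICS = {
    "AV": "Attack Vector",
    "AC": "Attack Complexity",
    "AT": "Attack Requirements",
    "PR": "Privileges Required",
    "UI": "User Interaction",
    "VC": "Vulnerable System Confidentiality",
    "VI": "Vulnerable System Integrity",
    "VA": "Vulnerable System Availability",
    "SC": "Subsequent System Confidentiality",
    "SI": "Subsequent System Integrity",
    "SA": "Subsequent System Availability",
}


def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS vector string into {'version': ..., 'AV': ..., ...}."""
    prefix, *pairs = vector.split("/")
    if not prefix.startswith("CVSS:"):
        raise ValueError(f"not a CVSS vector: {vector!r}")
    metrics = {"version": prefix.split(":", 1)[1]}
    for pair in pairs:
        key, value = pair.split(":", 1)   # e.g. "UI:P" -> ("UI", "P")
        metrics[key] = value
    return metrics


if __name__ == "__main__":
    parsed = parse_cvss_vector(
        "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N"
    )
    for key, value in parsed.items():
        print(f"{CVSS4_BASE_METRICS.get(key, key)}: {value}")
```

Read this way, the vector describes a network-reachable, low-complexity attack that needs no privileges but does need passive user interaction, with low confidentiality and integrity impact on the vulnerable system and no impact on subsequent systems.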
Affected products (1)
Vendor | Product | Platform | Versions
vllm-project | vllm | - | >= 0.5.5, < 0.23.1rc0
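To check whether an installed vLLM version falls inside the affected range, a half-open comparison against the two bounds is enough. The sketch below is illustrative only: `version_key` and `is_affected` are hypothetical helpers, and the parser handles just the `X.Y.Z` and `X.Y.ZrcN` forms that appear in this advisory (other suffixes such as `.post1` or `+cu121` raise `ValueError`).

```python
import re

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?")


def version_key(version: str) -> tuple:
    """Return a sortable key; release candidates sort before the final release."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, patch, rc = match.groups()
    # (0, N) marks release candidate N; (1, 0) marks the final release.
    stage = (0, int(rc)) if rc is not None else (1, 0)
    return (int(major), int(minor), int(patch)) + stage


def is_affected(version: str) -> bool:
    """True when 0.5.5 <= version < 0.23.1rc0, the range listed in this entry."""
    return version_key("0.5.5") <= version_key(version) < version_key("0.23.1rc0")


if __name__ == "__main__":
    for candidate in ("0.5.4", "0.5.5", "0.23.0", "0.23.1rc0", "0.23.1"):
        print(candidate, is_affected(candidate))
```

The upper bound is exclusive, so 0.23.1rc0 itself and everything after it are reported as not affected.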
Weakness (CWE)
CWE | Source | Description
CWE-200 | cna | CWE-200: Exposure of Sensitive Information to an Unauthorized Actor
CWE-681 | cna | CWE-681: Incorrect Conversion between Numeric Types
CVSS scores (1)
Score | Severity | Version | Source | Vector
5.3 | MEDIUM | 4.0 | cna | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N
References (3)