NVIDIA Vice President Says AI Computing Costs Far Exceed Employee Salaries, Prompting Debate Over AI Economics

May 28, 2026

AI computing costs, NVIDIA executive remarks, questions over AI economics

NVIDIA Vice President Bryan Catanzaro recently stated that his team's AI computing costs have far exceeded employee salaries. This information has been verified through multiple sources.

Why Computing Costs Exceed Labor Costs

AI systems run on large GPU clusters. In current mainstream training workflows, for example, a single training run for a 100-billion-parameter model requires thousands of high-performance GPUs running continuously for weeks. Electricity, data center cooling, and hardware depreciation make up the main expenses. Catanzaro's statement refers to cumulative consumption across day-to-day inference and iteration, not to a single experiment.

Employee salaries are fixed costs, while computing demand scales linearly with model size and user volume. According to data as of May 2026, the daily electricity and depreciation cost of a single GPU card is already close to the monthly salary of engineers in some regions. Once deployment reaches large scale, the gap in total cost widens further.
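The contrast between a fixed payroll and linearly scaling compute can be sketched in a few lines. This is a toy model, not a figure from Catanzaro or NVIDIA: the function names, the per-user cost parameter, and the sample numbers are all hypothetical, and real compute spending also depends on model size and utilization.

```python
import math


def monthly_compute_cost(users: int, cost_per_user: float) -> float:
    """Compute spend under the linear-scaling assumption (hypothetical model)."""
    return users * cost_per_user


def breakeven_users(monthly_payroll: float, cost_per_user: float) -> int:
    """Smallest user count at which compute spend strictly exceeds payroll."""
    if cost_per_user <= 0:
        raise ValueError("cost_per_user must be positive")
    return math.floor(monthly_payroll / cost_per_user) + 1


# Illustrative numbers only: a 100,000 payroll and 0.5 of compute per user.
payroll = 100_000.0
per_user = 0.5
threshold = breakeven_users(payroll, per_user)
print(threshold, monthly_compute_cost(threshold, per_user))
```

Below the threshold, payroll dominates. Above it, every additional user adds compute cost while payroll stays flat, which is the widening gap described above.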

Technical Cost Breakdown

Computing expenses can be broken down into three parts: hardware procurement, electricity and cooling, and software scheduling efficiency. New hardware generations arrive about every 18 months, and declining utilization of older cards leads to hidden waste. Electricity costs depend on data center location. Liquid cooling can lower PUE (power usage effectiveness, the ratio of total facility power to IT equipment power), but the upfront retrofit costs are high. If scheduling algorithms fail to keep GPUs fully utilized, idle time directly inflates the cost per task.

Hardware depreciation: Mainstream training cards have a lifespan of about 3 years
Electricity share: In some clusters, electricity already accounts for over 40% of total cost
Utilization rate: Unoptimized clusters commonly see idle rates above 30%
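To make the interaction between these three factors concrete, here is a minimal sketch of the cost of one hour of useful GPU work. It assumes straight-line depreciation, a card that draws full power even when idle, and PUE applied as a simple multiplier on card power. The function name, its parameters, and the sample values are hypothetical and are not taken from the article's sources.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_useful_gpu_hour(
    card_price: float,
    lifespan_years: float,
    power_kw: float,
    pue: float,
    price_per_kwh: float,
    idle_rate: float,
) -> float:
    """Cost of one hour of useful work on a single card (simplified model)."""
    if not 0.0 <= idle_rate < 1.0:
        raise ValueError("idle_rate must be in [0, 1)")
    # Straight-line depreciation spread over every wall-clock hour of the lifespan.
    depreciation_per_hour = card_price / (lifespan_years * HOURS_PER_YEAR)
    # Facility-level electricity: card draw scaled up by PUE for cooling and overhead.
    electricity_per_hour = power_kw * pue * price_per_kwh
    wall_clock_cost = depreciation_per_hour + electricity_per_hour
    # Idle hours still cost money, so divide by the fraction of time doing useful work.
    return wall_clock_cost / (1.0 - idle_rate)


# Illustrative inputs: 3-year lifespan and a 30% idle rate, as in the list above.
busy = cost_per_useful_gpu_hour(26_280, 3, 1.0, 1.5, 0.10, idle_rate=0.0)
idle = cost_per_useful_gpu_hour(26_280, 3, 1.0, 1.5, 0.10, idle_rate=0.3)
print(round(busy, 4), round(idle, 4), round(idle / busy, 4))
```

In this model a 30% idle rate raises the cost of each useful hour by a factor of 1 / 0.7, roughly 1.43, regardless of the other inputs.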

Impact on the Industry

Changes in cost structure are forcing companies to reassess AI project priorities. Startups are more inclined to call external APIs than to build their own clusters, which lets them avoid fixed asset investments. Large tech companies are accelerating R&D on in-house chips and liquid cooling, aiming to reduce per-token inference cost by an order of magnitude.

In 2026, some cloud service providers adjusted their pricing strategies, charging higher fees for GPU instances that are occupied long term. This move directly reflects the scarcity of underlying computing power.

Future Optimization Paths

The main directions for reducing computing costs are model compression, sparse computing, and new architectures. Model distillation can reduce parameter count while keeping the loss of accuracy under control. Sparse attention mechanisms activate only the computational units that are needed. New chips are optimized for specific operators and can reach a higher theoretical peak utilization than general-purpose GPUs.
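As a rough illustration of why sparse attention saves compute, the sketch below counts query-key score computations for dense attention and for a sliding-window pattern, where each token attends only to neighbors within a fixed radius. Sliding-window attention is used here as one common sparse pattern chosen for illustration; the article does not say which mechanism is meant, and the function names are hypothetical.

```python
def dense_attention_pairs(seq_len: int) -> int:
    """Every token attends to every token: seq_len squared score computations."""
    return seq_len * seq_len


def windowed_attention_pairs(seq_len: int, radius: int) -> int:
    """Each token attends only to positions within `radius` of itself."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    total = 0
    for i in range(seq_len):
        lo = max(0, i - radius)            # leftmost position token i attends to
        hi = min(seq_len - 1, i + radius)  # rightmost position token i attends to
        total += hi - lo + 1
    return total


# Illustrative sequence length and window radius.
n, r = 4096, 128
dense = dense_attention_pairs(n)
sparse = windowed_attention_pairs(n, r)
print(dense, sparse, round(sparse / dense, 4))
```

The dense count grows quadratically with sequence length, while the windowed count grows roughly linearly for a fixed radius, so the savings increase as sequences get longer.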