Vista and Cambium Launch First Hybrid AI Inference Cloud

Vista Equity Partners and its infrastructure platform Cambium are placing a $500 million bet that the AI inference market doesn't belong to Nvidia alone. On Monday, the firms launched Vector Core Compute, which they bill as the world's first inference cloud to dynamically route workloads across CPUs, GPUs, and specialized AI chips called reconfigurable dataflow units (RDUs). The pitch: match or beat GPU performance on most enterprise AI tasks while cutting costs by 40%.

The announcement arrives as enterprises grapple with spiraling AI infrastructure bills and as inference, the production phase of AI in which models answer queries at scale, accounts for a growing share of cloud budgets. Vector's architecture sidesteps the GPU-only approach that has made Nvidia the trillion-dollar incumbent, instead treating silicon as a workload-specific resource.

"We're not anti-GPU," said Vista Managing Director Brian Sheth in a statement. "We're anti-waste. Most inference jobs don't need a $30,000 H100 to run efficiently. They need the right chip for the job, and a layer smart enough to route them there."

Vector launches with six data centers across North America and commitments from 14 Vista portfolio companies — including marketing automation platform Marketo, analytics firm Alliant, and edtech giant PowerSchool — to migrate inference workloads over the next 18 months. That gives Vector an initial customer base representing over $200 million in annual AI compute spend, according to company figures.

The Silicon Mix Strategy

Vector's technical foundation rests on SambaNova's RDU chips, which the company claims deliver better throughput-per-watt than GPUs for transformer-based models under 70 billion parameters. The platform layers an orchestration engine — developed in-house by Cambium — on top of that hardware, analyzing incoming inference requests and routing them to CPUs for lightweight tasks, RDUs for mid-range transformer workloads, and GPUs only when necessary.

That routing logic is the product's differentiator. Vector's orchestrator evaluates model size, batch requirements, latency tolerance, and input token count in real time, then assigns the job to the cheapest chip capable of meeting service-level agreements. The company estimates that 60-70% of enterprise inference requests can run on CPUs or RDUs without degrading user experience.
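Cambium has not published how its orchestrator works, so the following is only a minimal sketch of the idea described above: filter out chip tiers that cannot meet a request's constraints, then pick the cheapest one left. The class names, capacity limits, latency model, and the RDU price are all hypothetical; only the CPU and GPU per-token prices echo figures reported elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipTier:
    # Every figure here is an illustrative placeholder, not a Vector spec.
    name: str
    cost_per_1k_tokens: float   # USD per 1,000 tokens
    max_model_params_b: float   # largest servable model, billions of parameters
    max_batch_size: int
    ms_per_input_token: float   # crude linear latency model (assumption)


@dataclass(frozen=True)
class InferenceRequest:
    model_params_b: float
    batch_size: int
    input_tokens: int
    latency_budget_ms: float    # stands in for the service-level agreement


TIERS = [
    ChipTier("cpu", 0.006, 7, 4, 0.50),
    ChipTier("rdu", 0.009, 70, 32, 0.10),   # RDU price is assumed
    ChipTier("gpu", 0.012, 400, 128, 0.05),
]


def estimated_latency_ms(tier: ChipTier, req: InferenceRequest) -> float:
    """Estimate latency as a per-token cost times the input length."""
    return tier.ms_per_input_token * req.input_tokens


def can_serve(tier: ChipTier, req: InferenceRequest) -> bool:
    """True if the tier fits the model, the batch, and the latency budget."""
    return (
        req.model_params_b <= tier.max_model_params_b
        and req.batch_size <= tier.max_batch_size
        and estimated_latency_ms(tier, req) <= req.latency_budget_ms
    )


def route(req: InferenceRequest, tiers=TIERS) -> ChipTier:
    """Return the cheapest tier that satisfies every constraint."""
    eligible = [t for t in tiers if can_serve(t, req)]
    if not eligible:
        raise ValueError("no chip tier can meet the request's constraints")
    return min(eligible, key=lambda t: t.cost_per_1k_tokens)


if __name__ == "__main__":
    small_chat = InferenceRequest(3, 1, 200, 500)
    print(route(small_chat).name)
```

In this toy model a small chatbot request with a loose latency budget lands on the CPU tier, while tightening the budget or raising the model size pushes the same request up to an RDU or GPU. A production router would also need live queue depth and measured latency rather than a fixed per-token constant.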

"The hyperscalers sell you GPU instances because that's what they have," said Cambium CEO Ashwin Ballal. "We're chip-agnostic. If a customer's chatbot can run on an Intel Xeon at one-tenth the cost of an A100, why wouldn't they?"

Vector's initial hardware deployment includes 2,000 SambaNova RDUs, 5,000 Nvidia A100 and H100 GPUs, and dedicated CPU clusters optimized for inference. The company plans to add AMD Instinct MI300 accelerators and Cerebras wafer-scale engines by Q4 2026, further diversifying the silicon mix.

Vista's Portfolio Play

The Vector launch leverages Vista's $100 billion in assets under management and 80-plus software portfolio companies as a built-in distribution channel. Vista's playbook — acquire enterprise software companies, centralize back-office functions, and push portfolio-wide infrastructure standards — positions Vector as the default inference provider across the firm's holdings.

Fourteen Vista companies have committed to pilot programs starting in Q3 2026, with migration roadmaps targeting 50-80% of inference workloads within 12 months. Those companies currently spend an estimated $200-250 million annually on AI inference across AWS, Azure, and Google Cloud, according to interviews with portfolio CIOs.

The scale matters. If Vector successfully migrates even half that spend, it captures workloads that cost $100 million or more a year on hyperscaler clouds, before adding external customers. At Vector's promised 40% discount, that volume would translate to roughly $60-75 million in annual revenue. Vista's strategy mirrors Amazon's path with AWS: build infrastructure to serve internal needs, then sell excess capacity externally once the unit economics prove out.

Vista isn't the first PE firm to verticalize infrastructure. Blackstone's QTS and Brookfield's data center portfolio follow similar logic. But Vector represents the first large-scale attempt to build a portfolio-wide AI cloud optimized for inference rather than training.

Provider	Architecture	Primary Use Case	Estimated Cost per 1M Tokens
AWS Bedrock (GPU)	Nvidia A100/H100	Training + Inference	$15-25
Google Vertex AI	TPU v5 + GPU	Training + Inference	$12-20
Azure OpenAI Service	Nvidia A100	Inference	$10-18
Vector Core Compute	CPU + RDU + GPU	Inference	$6-12 (est.)

Vector's pricing starts at $0.006 per 1,000 tokens for CPU-routed inference and scales to $0.012 for GPU-accelerated workloads, according to early customer contracts reviewed by this publication. That undercuts hyperscaler inference pricing by 30-50% depending on model size and latency requirements.
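The per-1,000-token prices above and the per-million figures in the table use different units, so a short conversion helps line them up. The sketch below uses only numbers from this article. Pairing the low end of one range with the low end of another is an assumption about how to read the ranges, not a method the company has described.

```python
# Prices per 1,000 tokens, as reported in the article.
VECTOR_CPU_PER_1K = 0.006
VECTOR_GPU_PER_1K = 0.012

# (low, high) estimated cost per 1M tokens, copied from the comparison table.
HYPERSCALER_PER_1M = {
    "AWS Bedrock (GPU)": (15.0, 25.0),
    "Google Vertex AI": (12.0, 20.0),
    "Azure OpenAI Service": (10.0, 18.0),
}


def per_million(price_per_1k: float) -> float:
    """Convert a per-1,000-token price to a per-1,000,000-token price."""
    return price_per_1k * 1_000


def discount_pct(ours: float, theirs: float) -> float:
    """Percentage by which `ours` undercuts `theirs`."""
    return (1 - ours / theirs) * 100


vector_low = per_million(VECTOR_CPU_PER_1K)
vector_high = per_million(VECTOR_GPU_PER_1K)

for provider, (low, high) in HYPERSCALER_PER_1M.items():
    # Endpoint-to-endpoint pairing is an assumption, see the note above.
    print(
        f"{provider}: {discount_pct(vector_low, low):.0f}% at the low end, "
        f"{discount_pct(vector_high, high):.0f}% at the high end"
    )
```

The conversion confirms that $0.006 and $0.012 per 1,000 tokens correspond to the table's $6-12 per million. A like-for-like discount would require matching model sizes and latency targets, which the table does not break out.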

Why the Timing Works

The launch capitalizes on a structural shift in AI economics. Through 2024, most enterprise AI budgets went to training: buying or renting GPUs to fine-tune foundation models. But as models commoditize and enterprises move from pilots to production, inference costs now dominate. OpenAI's API business, for example, generates over 80% of revenue from inference calls, not training.

The RDU Gambit

Vector's reliance on SambaNova's RDUs introduces technical and strategic risk. RDUs are dataflow architectures optimized for specific AI workloads — particularly transformers and large language models — but they're not general-purpose. If a customer's workload doesn't map cleanly to dataflow execution, performance craters.

SambaNova claims its SN40L chip delivers 500 teraflops of AI performance at half the power draw of comparable GPUs, but third-party benchmarks remain sparse. The company has raised over $1 billion in venture funding and counts several large enterprises as customers, but it's still a distant challenger to Nvidia's CUDA ecosystem.

If RDUs underperform in production or SambaNova stumbles as a company, Vector's differentiation evaporates. The platform becomes just another GPU cloud with higher operational complexity.

Vector is hedging that risk by deploying multiple chip types and maintaining flexibility to swap silicon as performance and economics evolve. But the orchestration layer — Cambium's proprietary routing engine — only becomes defensible if it consistently delivers better price-performance than static GPU deployments.

"The orchestrator is the moat," said Ballal. "The chips change every 18 months. The routing logic that saves customers 40% doesn't."

What About the Hyperscalers?

Amazon, Google, and Microsoft aren't standing still. AWS launched Inferentia2 chips in 2023, custom silicon designed specifically for inference workloads. Google's TPU v5 offers similar cost advantages for TensorFlow models. Azure is deploying AMD MI300 accelerators alongside Nvidia hardware.

But hyperscalers face a structural disadvantage: they sell infrastructure, not outcomes. A customer running inference on AWS pays for instance hours, whether those hours are efficiently utilized or not. Vector's model charges per token processed, aligning cost with actual usage and creating pressure to optimize every workload.
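The difference between the two billing models comes down to utilization. As a rough sketch, with a hypothetical hourly rate and throughput that are not drawn from any provider's price list, the effective per-token cost of a rented instance rises as the instance sits idle, while a per-token price stays flat:

```python
def instance_cost_per_million(
    hourly_rate_usd: float,
    peak_tokens_per_hour: float,
    utilization: float,
) -> float:
    """Effective USD per 1M tokens on an instance billed by the hour."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_served = peak_tokens_per_hour * utilization
    return hourly_rate_usd / tokens_served * 1_000_000


def breakeven_utilization(
    hourly_rate_usd: float,
    peak_tokens_per_hour: float,
    per_token_price_per_million: float,
) -> float:
    """Utilization below which per-token billing is the cheaper option."""
    cost_at_full_load = hourly_rate_usd / peak_tokens_per_hour * 1_000_000
    return cost_at_full_load / per_token_price_per_million


if __name__ == "__main__":
    # Hypothetical instance: $4 per hour, 1M tokens per hour at full load.
    for u in (1.0, 0.5, 0.25):
        cost = instance_cost_per_million(4.0, 1_000_000, u)
        print(f"utilization {u:.0%}: ${cost:.2f} per 1M tokens")
```

Under these made-up numbers, an instance that costs $4 per million tokens at full load costs four times that at 25% utilization. Against a flat $8 per million tokens, the hourly instance wins only while it stays more than half busy.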

Competitive Landscape

Vector enters a crowded market. Specialized inference providers like Replicate, Baseten, and Modal have raised hundreds of millions to build similar platforms, though most rely primarily on GPUs rather than hybrid architectures. CoreWeave, backed by Nvidia and valued at $19 billion, operates GPU-only clouds optimized for inference but at premium pricing.

The real competition comes from hyperscalers with embedded customer relationships. Migrating inference workloads off AWS or Azure requires rewriting deployment pipelines, retraining DevOps teams, and accepting vendor risk from a startup. That friction is why most enterprises default to their existing cloud provider even when pricing isn't competitive.

Vector's edge is Vista's portfolio. Those 80-plus companies aren't evaluating Vector against a blank slate; they're being steered toward it by the firm that owns them. That captive demand solves the cold-start problem most infrastructure startups face.

But it also limits Vector's total addressable market. If the business never escapes Vista's orbit, it's a cost-optimization play, not a platform business. The next 18 months will reveal whether Vector can sign customers who don't share a cap table with its parent company.

Economics and Scaling

Vector's initial $500 million capital commitment covers data center buildout, chip procurement, and 24 months of operating expenses. The company projects breakeven at $150 million in annual recurring revenue, assuming 60% gross margins — comparable to established cloud infrastructure providers.

Reaching that threshold depends on customer concentration. If Vector signs 20-30 large enterprises each spending $5-10 million annually, the unit economics work. If growth comes from hundreds of smaller customers, customer acquisition costs and support overhead could delay profitability.
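The arithmetic behind that scenario can be checked directly from the figures above. The sketch below assumes that "breakeven" means gross profit exactly covers fixed operating costs. That reading is an inference, not something the company has stated.

```python
import math

BREAKEVEN_ARR = 150_000_000   # USD, from the company's projection
GROSS_MARGIN = 0.60           # from the company's projection


def arr(customers: int, spend_each: int) -> int:
    """Annual recurring revenue if every customer spends the same amount."""
    return customers * spend_each


def customers_needed(spend_each: int, target: int = BREAKEVEN_ARR) -> int:
    """Smallest customer count whose combined spend reaches the target."""
    return math.ceil(target / spend_each)


# Range implied by 20-30 enterprises spending $5-10 million each.
arr_low = arr(20, 5_000_000)
arr_high = arr(30, 10_000_000)

# Implied fixed cost base under the breakeven assumption stated above.
implied_fixed_costs = BREAKEVEN_ARR * GROSS_MARGIN

if __name__ == "__main__":
    print(f"ARR range: ${arr_low:,} to ${arr_high:,}")
    print(f"Customers needed at $5M each: {customers_needed(5_000_000)}")
    print(f"Customers needed at $10M each: {customers_needed(10_000_000)}")
    print(f"Implied fixed costs: ${implied_fixed_costs:,.0f}")
```

The 20-30 customer scenario spans $100 million to $300 million in recurring revenue, so the low end falls short of the $150 million threshold. Reaching it takes 30 customers at $5 million each or 15 at $10 million each.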

Metric	Year 1 (2026)	Year 3 (2028)	Year 5 (2030)
Customer Count	14 (Vista portfolio)	50-75	150-200
Annual Inference Volume	50B tokens	500B tokens	2T+ tokens
Revenue (est.)	$30-50M	$200-300M	$750M-1B
Data Centers	6 (North America)	15 (NA + Europe)	25+ (Global)

Vector plans to add European data centers in Q1 2027 and Asia-Pacific facilities by 2028, contingent on hitting revenue milestones. International expansion introduces latency and data residency challenges that could complicate the routing logic: sending a job to a cheaper chip pays off only when that chip is geographically close to the user, and round-tripping to a distant RDU cluster can add enough latency to cancel out the savings.

Cambium declined to disclose Vector's current utilization rates or whether the platform is processing live production workloads beyond pilot programs. The first real test arrives in Q4 2026, when Marketo plans to migrate 80% of its AI-powered email recommendation engine to Vector infrastructure.

What the Market Thinks

Analyst reactions split along predictable lines. Infrastructure bulls see Vector as a viable challenge to hyperscaler inference margins. Skeptics note that most "GPU killer" chips have failed to dislodge Nvidia, and enterprise inertia favors incumbents.

"Vista has the distribution and capital to make this work in the near term," said Patrick Moorhead, founder of Moor Insights & Strategy. "The question is whether the orchestration layer delivers enough value to justify the operational complexity. If you save 40% but add three months of migration work, most CIOs pass."

SambaNova's involvement raises questions about exclusivity and chip supply. If Vector's success depends on prioritized access to RDUs and SambaNova simultaneously sells those chips to other cloud providers, Vector loses its architectural advantage. The companies have not disclosed whether Vector holds exclusive rights to SambaNova capacity or pricing tiers.

One portfolio CIO, speaking on background, called the Vector mandate "a Vista tax with better PR." The executive noted that his team would have preferred to negotiate volume discounts directly with AWS rather than migrate to untested infrastructure. "We'll do the pilot because we have to, but if latency or reliability slips, we're back on Bedrock in a week."

Open Questions

Vector's launch leaves several strategic ambiguities unresolved. Will the platform eventually support model training, or remain inference-only? Can it profitably serve small and mid-market customers, or does the orchestration complexity require enterprise deal sizes?

Most importantly: does Vector eventually spin out and pursue external funding, or remain a captive Vista utility? If it stays internal, the business model caps out once Vista's portfolio companies are fully migrated. If it spins out, it competes head-to-head with better-capitalized rivals and loses its guaranteed customer base.

Vista has a track record of building internal platforms — including Vista Consulting Group and Vista Credit Partners — that remain wholly owned subsidiaries rather than independent entities. That structure works when the goal is cost arbitrage across the portfolio, not building a standalone platform business.

For now, Vector exists in the in-between: funded like a startup, positioned like a platform, but operating primarily as portfolio infrastructure. Whether that's a transitional phase or the permanent model is the story's next chapter.