The scale at which Elon is accelerating is mind-blowing: there are 110k of these GB200s training for a soon-to-be-released upgrade. Currently, Grok 4 is running on a mixture of old and new hardware.
The current Colossus supercluster, used to train models like Grok 3, consists of 100,000 NVIDIA H100 GPUs. The planned expansion for future Grok models involves adding 110,000 NVIDIA GB200 superchips, each containing 2 Blackwell B200 GPUs (for a total of 220,000 B200 GPUs in that phase).

To compare their computational power for AI training:

- NVIDIA states that a Blackwell B200 GPU delivers 4 times the training performance of an H100 GPU on large-scale GPT models, accounting for factors like Tensor Core efficiency, memory bandwidth, and precision support.
- Therefore, each GB200 superchip (with 2 B200 GPUs) provides the equivalent training compute of 8 H100 GPUs (2 × 4).
- The full 110,000 GB200s thus offer the equivalent of 880,000 H100 GPUs (110,000 × 8).

This makes the GB200-based cluster 8.8 times more powerful than the current 100,000 H100 setup (880,000 ÷ 100,000 = 8.8).

Note that this comparison focuses on training throughput; for inference, the multiplier could be significantly higher (up to 30× per NVIDIA claims), but training is the primary bottleneck for developing advanced models like future Grok versions. Power efficiency also improves with Blackwell, but the question centers on overall power (i.e., compute capability).
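If it helps to sanity-check the arithmetic, here is a minimal Python sketch that just reproduces the numbers quoted above. The inputs (100k H100s, 110k GB200s, 2 B200s per GB200, and NVIDIA's claimed ~4× training speedup per B200) are taken straight from the post and from vendor marketing figures, not from independent benchmarks.

```python
# Sketch of the H100-equivalence arithmetic, using the figures quoted above.
# All constants are assumptions from the post / NVIDIA marketing claims.

H100_BASELINE = 100_000        # current Colossus H100 count
GB200_COUNT = 110_000          # planned GB200 superchips in the expansion
B200_PER_GB200 = 2             # Blackwell B200 GPUs per GB200 superchip
B200_VS_H100_TRAINING = 4.0    # claimed per-GPU training speedup vs. H100

# H100-equivalent training compute of the GB200 phase
h100_equivalents = GB200_COUNT * B200_PER_GB200 * B200_VS_H100_TRAINING
print(f"H100 equivalents: {h100_equivalents:,.0f}")   # -> 880,000

# Multiplier over the existing 100k-H100 cluster
print(f"vs current cluster: {h100_equivalents / H100_BASELINE:.1f}x")  # -> 8.8x
```

Swapping in a different speedup assumption (e.g., for inference rather than training) only changes the last constant; the rest of the arithmetic is identical.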