===== Hardware =====
  
GPT-3 175B model: Microsoft built a supercomputer with 285,000 CPU cores and 10,000 Nvidia V100 GPUs [[https://news.microsoft.com/source/features/innovation/openai-azure-supercomputer/|exclusively for OpenAI]], hosted in Azure. Researchers calculated that, had the A100 been available at the time, OpenAI could have trained the 175-billion-parameter GPT-3 in about 34 days on 1,024 A100 GPUs, at a cost of roughly $5M in compute time alone.
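
As a plausibility check on that estimate, a widely used back-of-envelope rule puts total training compute at roughly 6 FLOPs per parameter per training token. The sketch below applies it to GPT-3's published figures (175B parameters, roughly 300B training tokens); the 35% hardware utilization is an assumed value chosen for illustration, not a reported number.

<code python>
# Rough GPT-3 training-time estimate using the common
# "compute ~ 6 * parameters * tokens" approximation.
# The utilization value is an assumption for illustration.

params = 175e9            # GPT-3 parameter count
tokens = 300e9            # approximate GPT-3 training tokens
a100_peak_flops = 312e12  # A100 peak dense BF16 tensor throughput (FLOP/s)
num_gpus = 1024
utilization = 0.35        # assumed fraction of peak actually sustained

total_flops = 6 * params * tokens                   # ~3.15e23 FLOPs
cluster_flops = num_gpus * a100_peak_flops * utilization
days = total_flops / cluster_flops / 86400
print(f"Estimated training time: {days:.0f} days")  # ~33 days
</code>

Multiplying the resulting ~800,000 GPU-hours by a typical cloud price per A100-hour lands in the low single-digit millions of dollars, the same order as the $5M figure quoted above.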

Meta used 16,000 Nvidia H100 GPUs to train the Llama 3.1 [[https://venturebeat.com/ai/meta-unleashes-its-most-powerful-ai-model-llama-3-1-with-405b-parameters/|405B model]].
  
===== Evaluation =====
  
https://www.philschmid.de/llm-evaluation