Embedding models:
Embedding models are used to represent your documents using a sophisticated numerical representation. Embedding models take text as input, and return a long list of numbers used to capture the semantics of the text. These embedding models have been trained to represent text this way, and help enable many applications, including search!
[[https://www.marktechpost.com/2024/05/28/nv-embed-nvidias-groundbreaking-embedding-model-dominates-mteb-benchmarks/|NV Embed model]]
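
A minimal sketch of what an embedding model does in practice, using the sentence-transformers library (the model name, documents, and query below are illustrative choices, not a reference to NV Embed):

<code python>
# Encode documents as vectors and rank them against a query by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # example model: maps text to 384-dim vectors

docs = ["The cat sat on the mat.",
        "GPUs accelerate neural network training."]
doc_vectors = model.encode(docs)                  # one vector per document

query_vector = model.encode("What hardware speeds up deep learning?")
print(util.cos_sim(query_vector, doc_vectors))    # the second document scores highest
</code>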
  
Frameworks/GUIs:
- Tensors: A tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. Tensors are a generalisation of scalars and vectors; a scalar is a zero-rank tensor, and a vector is a first-rank tensor. The rank (or order) of a tensor is defined by the number of directions (and hence the dimensionality of the array) required to describe it. It can be thought of as a multidimensional numerical array. An example of a tensor would be 1000 video frames of 640×480 size (see the NumPy sketch after this list).
- Vector: Vectors are used to represent both the input data (features) and the output data (labels or predictions). Each data point is represented as a feature vector, where each component of the vector corresponds to a specific feature or attribute of the data.
- Temperature: Temperature in AI settings controls the diversity and creativity of the generated text. Values range between 0 and 2, depending on the model. Higher temperature values produce more diverse but less logically coherent output, while lower values lead to more focused and deterministic text (see the softmax sketch after this list).
- [[https://neo4j.com/blog/graphrag-manifesto/|GraphRAG]]: RAG using a graph database instead of, or in addition to, a vector database to return additional knowledge context to an LLM when answering a question or request.
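
To make the tensor and vector ranks above concrete, a small NumPy sketch (the arrays are made-up examples):

<code python>
# Tensor ranks illustrated with NumPy arrays.
import numpy as np

scalar = np.array(3.14)              # rank-0 tensor: a single number
vector = np.array([0.1, 0.9, 0.0])   # rank-1 tensor: e.g. a feature vector
frames = np.zeros((1000, 480, 640))  # rank-3 tensor: 1000 greyscale frames of 640×480 (height × width)

print(scalar.ndim, vector.ndim, frames.ndim)  # prints: 0 1 3
</code>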
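
And a hedged sketch of how temperature is typically applied to a model's raw scores (logits) before sampling; the logits here are invented for illustration:

<code python>
# Temperature scaling: divide logits by T before the softmax.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.array(logits) / temperature  # low T sharpens the distribution, high T flattens it
    exps = np.exp(scaled - scaled.max())     # subtract the max for numerical stability
    return exps / exps.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # near-deterministic: almost all mass on the top token
print(softmax_with_temperature(logits, 1.5))  # more diverse: probabilities flatten out
</code>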
===== Evolution of AI =====
  
GPT-3 175B model: Microsoft built a supercomputer with 285,000 CPU cores and 10,000 Nvidia V100 GPUs [[https://news.microsoft.com/source/features/innovation/openai-azure-supercomputer/|exclusively for OpenAI]], hosted in Azure. Researchers calculated that, had the A100 been available at the time, training the 175-billion-parameter GPT-3 could have taken 34 days on 1,024 A100 GPUs, costing about $5M in compute time alone.
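
A back-of-envelope check of that figure (the per-GPU-hour price is an assumption for illustration; the source does not state the rate used):

<code python>
# 1,024 A100s for 34 days at an assumed ~$6 per GPU-hour lands near $5M.
gpus = 1024
days = 34
rate_per_gpu_hour = 6         # assumed cloud rate in USD, not from the article
gpu_hours = gpus * days * 24  # 835,584 GPU-hours
print(f"{gpu_hours:,} GPU-hours, ~${gpu_hours * rate_per_gpu_hour / 1e6:.1f}M")
</code>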
  
Llama 3.1 used 16,000 Nvidia H100 GPUs to train the [[https://venturebeat.com/ai/meta-unleashes-its-most-powerful-ai-model-llama-3-1-with-405b-parameters/|405B model]].
  
===== Evaluation =====

https://www.philschmid.de/llm-evaluation