====== AI Info ======
===== General info =====

Large language models vs small language models:
https://

https://
https://
https://

Transformer model file types:
GGUF: binary model format optimised for quick loading and saving. Uses GGML as executor and is used by the llama.cpp framework. Successor file format to GGML, GGMF and GGJT.
PyTorch: models are usually trained in PyTorch; the format can be converted into GGUF. (https://
Safetensors: Hugging Face's format for storing tensors safely (no arbitrary code execution on load, unlike pickle-based PyTorch files) and quickly.

Instruct/chat models: instruct-tuned models are fine-tuned to follow single-turn instructions, while chat-tuned models are optimised for multi-turn conversation (base models simply continue text).

Embedding models:
Embedding models are used to represent your documents using a sophisticated numerical representation. They take text as input and return a long list of numbers that captures the semantics of the text. These models have been trained to represent text this way, and they help enable many applications, including semantic search and retrieval-augmented generation.
[[https://
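
To make this concrete, here is a minimal illustrative sketch (not from the page above): it requests embeddings from a locally running Ollama server and compares two texts by cosine similarity. The endpoint and the model name nomic-embed-text are assumptions; any locally served embedding model would do.

<code python>
# Illustrative sketch: embed two texts and compare them by cosine similarity.
# Assumes a local Ollama server with an embedding model already pulled,
# e.g. `ollama pull nomic-embed-text` (model name is an assumption).
import json
import math
import urllib.request

def embed(text, model="nomic-embed-text"):
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1 = embed("The cat sat on the mat.")
v2 = embed("A feline rested on the rug.")
print(cosine(v1, v2))  # semantically similar texts score close to 1.0
</code>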
+ | |||
+ | Frameworks/ | ||
+ | llama.cpp: https:// | ||
+ | GGML: Tensor library for machine learning https:// | ||
+ | GPT4All: https:// | ||
+ | |||
===== Terminology =====
- Inference: the process that a trained machine learning model uses to draw conclusions from brand-new data (i.e. coming up with an answer to a question).
- Q2/Q5/etc: quantisation levels used by llama.cpp/GGUF models; lower numbers mean smaller files with less precision, higher numbers mean larger files with better quality.
- SOTA: acronym for State-Of-The-Art; refers to the best models currently available for a given AI task.
- GPT: Generative Pre-trained Transformer.
- Parameters: 7B/13B/70B etc. refer to the number of parameters in a model, in billions; more parameters generally means higher quality but also higher memory and compute requirements.
- LoRA (Low-Rank Adaptation): a fine-tuning method developed by Microsoft researchers in 2021. LoRA is a type of Parameter-Efficient Fine-Tuning (PEFT).
- RAG: Retrieval-Augmented Generation is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources (databases, files etc).
- Tensors: a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. Tensors are a generalisation of scalars and vectors: a scalar is a rank-zero tensor and a vector is a rank-one tensor. The rank (or order) of a tensor is defined by the number of directions (and hence the dimensionality of the array) required to describe it. It can be thought of as a multidimensional numerical array; an example would be 1000 video frames of 640×480 size.
- Vector: vectors are used to represent both the input data (features) and the output data (labels or predictions).
- Temperature: a sampling parameter that controls how random a model's output is; low values make responses more deterministic, high values more varied (see the sketch after this list).
- [[https://
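
As an illustration of the Temperature term above (my own toy example, not from the page): temperature divides the model's raw scores (logits) before the softmax, so low temperatures sharpen the distribution and high temperatures flatten it.

<code python>
# Toy sketch of temperature-scaled sampling over three candidate tokens.
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(sample_with_temperature(logits, temperature=0.2))  # almost always token 0
print(sample_with_temperature(logits, temperature=2.0))  # noticeably more varied
</code>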
===== Evolution of AI =====

Traditional AI: focuses on analysing historical data and making future numeric predictions.

Generative AI: allows computers to produce brand-new outputs that are often indistinguishable from human-generated content. Can become toxic or show harmful behaviour (early ChatGPT-like systems).

Generative AI with RLHF: Reinforcement Learning from Human Feedback (RLHF) is a method of adjusting the training of the AI with human preferences. Feedback collected from humans on the model's outputs is used to steer it towards preferred responses.
Red-teaming attempts to trigger toxic responses so that an AI can be trained to avoid them, but it requires teams and time to test and come up with triggers. A new AI trained to be curious and hunt for triggers can automate this; when supposedly safe AIs were tested this way, almost 200 toxic responses were triggered quickly. https://

Constitutional AI: provides a transparent method of reducing the toxicity and harmful behaviour exhibited by generative language models. It uses a set of rules or principles that act as a "constitution" for the model.

https://

A possible next step is a more structured approach: Symbolica is trying to remove the unknowable black box in generative AI's decision making with a more rigorous, scientific foundation.
https://

Objective-driven AI: https://
===== The Big Players =====

^Company^Source^Model^Type/Notes^
|OpenAI|Closed|ChatGPT3.5/ | |
|OpenAI|Closed|DALL-E/ | |
|OpenAI|Open|[[https:// | |
|Anthropic|Closed|Claude/ | |
|Stability.AI|Open|[[https:// | |
|Stability.AI|Open|[[https:// | |
|Microsoft|Closed|[[https:// | |
|Microsoft|Open|[[https:// | |
|Microsoft|Open|[[https:// | |
|Google|Free|[[https:// | |
|Google|Free|[[https:// | |
|Meta|Open|[[https:// | |
|Meta|Closed|Meta AI|Imagine (image) and chat (Llama)|
|MidJourney|Closed|MidJourney|Image generator AI|
|Falcon AI|Open|FalconLLM|chat AI https:// |
|Mistral|Open|Mixtral and others|outperforms Llama2 70B as well as ChatGPT 3.5, with faster inference|
|Mistral|Open|[[https:// | |
|Reka|Closed|Reka Core|on par with GPT-4 and Claude 3 Opus|
|Alibaba|Open|[[https:// | |
|HuggingFace|Open|[[https:// | |
https://
https://
https://
https://
https://

Open=Open Source, Free=Closed source with local use, Closed=Closed source and only free or paid cloud usage.
+ | |||
+ | Not confirmed info about copilot (enterprise? | ||
+ | Microsoft 365 Copilot is grounded against your tenant data, while ChatGPT (including the Pro version) is not. Grounded means - having access to the data and using it as its " | ||
+ | In addition, Microsoft 365 Copilot is not getting trained on your company data and information. | ||
+ | Whenever you prompt/send a query, it has to look for the information. | ||
+ | Once the response was given, Copilot forgets about what it had just found to ensure customer data stays within the customer' | ||
+ | |||
+ | |||
===== Features / Use =====

Free versions:

Claude 3 Sonnet: free, no live internet access, data up to Aug 2023. Human-like writing style, verbose, and the largest context window (memory of the chat) of the group: 200,000 tokens by default, up to 1M. Needs a Google account and phone number verification. Images can be uploaded; no image generation.

Gemini 1: free, needs a Google account. Live internet access. Good for programming,

ChatGPT 3.5,

Mistral 7B / Orca: on par with the Llama2 70B model for local use, outperforms GPT-4 in some tasks, good for text analysis. (https://

Copilot: based on ChatGPT-4 with DALL-E plus MS enhancements and integration with MS infrastructure. Live internet access. Text chat works without an account; image generation requires a personal MS account.
===== News / Bugs / others =====

https://
https://

https://
https://

https://
https://

https://
https://
https://
https://
https://

===== Image/Video AIs =====

https://

https://

https://

https://

Cerule: https://

===== Redbox / Claude =====

Redbox is a framework in development by the UK government to assist civil servants. It's meant to be used primarily with Anthropic's Claude models.

https://
https://
https://
https://

===== Text summarisers =====

https://

https://
-> .env file support fork: https://
https://

https://

===== llama / ollama =====

Ollama is a management and compatibility layer on top of llama.cpp that translates chat requests into prompts for the models.
llama.cpp is a framework for running Llama-family models locally.
Llama / Llama2 are model families from Meta.
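
As a quick illustration (my own sketch, not from the page), here is a minimal chat request against a local Ollama server's REST API; the model name llama3 is an assumption, so substitute whatever model you have pulled.

<code python>
# Minimal sketch: one non-streaming chat request to a local Ollama server.
import json
import urllib.request

payload = {
    "model": "llama3",  # assumption: any locally pulled chat model works
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "stream": False,    # request a single complete JSON reply, not a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])
</code>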
+ | |||
+ | https:// | ||
+ | https:// | ||
+ | |||
===== QAnything =====

https://

===== PrivateGPT =====

see [[ai:


===== single file AI llamafile =====

llamafiles are single-file bundles of an AI model together with a web-based GUI and an OpenAI-compatible API, working on Windows, Linux and macOS.
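
For illustration (my own sketch, with assumptions): a running llamafile exposes an OpenAI-compatible endpoint, by default on port 8080, so it can be queried like any OpenAI-style server.

<code python>
# Sketch: query a locally running llamafile via its OpenAI-compatible API.
# Port 8080 is the default; adjust if the llamafile was started differently.
import json
import urllib.request

payload = {
    "model": "local",  # the single bundled model answers regardless of this name
    "messages": [{"role": "user", "content": "In one sentence, what is a llamafile?"}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
</code>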
+ | |||
+ | https:// | ||
+ | https:// | ||
+ | https:// | ||
+ | https:// | ||
+ | https:// | ||
+ | https:// | ||
+ | https:// | ||
+ | |||
===== Coding AI =====

https://

===== Hardware =====

Nvidia Tesla H100 'Hopper'
Nvidia B100 and B200 announced to succeed the H100
Nvidia Tesla A100 'Ampere'
Nvidia Tesla A100 'Ampere'
Nvidia Tesla V100 'Volta'
AMD MI300X: cost around USD10,
Intel Gaudi 3 announced for Q3/24, claiming 1.5x the performance of the Nvidia H100

Ollama reqs:
Any modern CPU with at least 4 cores is recommended.
For running 13B models, a CPU with at least 8 cores is recommended.
You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
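
These RAM figures follow a rough rule of thumb: a quantised model needs about parameters × bits-per-weight / 8 bytes for the weights, plus overhead for context and buffers. The sketch below is my own illustration of that estimate, not an Ollama formula.

<code python>
# Rough back-of-envelope estimate of RAM needed for a quantised model.
# Assumptions: ~5 bits per weight (Q5-style quantisation) and ~20% overhead.
def estimate_ram_gib(params_billion, bits_per_weight=5, overhead=1.2):
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

for size in (7, 13, 33):
    print(f"{size}B @ ~5 bits/weight: ~{estimate_ram_gib(size):.1f} GiB")
# roughly 4.9, 9.1 and 23.1 GiB, in line with the 8/16/32 GB guidance above
</code>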
+ | |||
+ | Real world stats: | ||
+ | CPU only, mistral-7b-openorca.Q5_K_M model, 32GB RAM, Intel Core i5-1135G7 @ 2.40GHz (NUC11) | ||
+ | Prompt eval 7.5-8 tokens/ | ||
+ | 58 seconds model load time. | ||
+ | |||
+ | |||
+ | Examples of computing power required to generate/ | ||
+ | [[https:// | ||
+ | |||
+ | GPT-3 175B model: Microsoft built a supercomputer with 285,000 CPU codes and 10,000 Nvidia V100 GPUs [[https:// | ||
+ | |||
+ | Llama 3.1 used 16,000 Nvidia H100 GPUs to train the [[https:// | ||
+ | |||
===== Evaluation =====

https://