Lecture

Large Language Models

21 Mar 2024 • Richard Kuo

Introduction to LLMs, Deep LLM, Time-series LLMs, Applications, etc.

History of LLMs

A Survey of Large Language Models
Since the introduction of the Transformer model in 2017, large language models (LLMs) have evolved significantly.
ChatGPT saw 1.6B visits in May 2023.
Meta also released three versions of LLaMA-2 (7B, 13B, 70B), free for commercial use, in July 2023.

Timeline of large language models (>10B)

Growth of compute memory relative to Transformer size

AI and Memory Wall
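
As a rough illustration of why memory becomes a bottleneck as models grow, the sketch below estimates the memory needed just to hold model weights. It assumes 2 bytes per parameter (fp16) and uses the nominal LLaMA-2 sizes mentioned earlier; the helper name is hypothetical, and activations, KV cache, and optimizer state are ignored.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate weight storage in GB (1 GB = 1e9 bytes); fp16 by default."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts (assumption: the rounded sizes, not exact counts)
for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"LLaMA-2 {name}: ~{weight_memory_gb(params):.0f} GB of weights in fp16")
```

Under these assumptions the 70B model needs about ten times the weight memory of the 7B model, which is more than a single typical accelerator provides.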

Large Language Models

Open LLM Leaderboard

Transformer

Paper: Attention Is All You Need

ChatGPT

ChatGPT: Optimizing Language Models for Dialogue
ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022.

SOLAR-10.7B ~ Depth Upscaling

Paper: SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Code: https://huggingface.co/upstage/SOLAR-10.7B-v1.0
Depth-upscaled SOLAR-10.7B performs remarkably well: it outperforms models with up to 30B parameters and even surpasses the recent Mixtral 8x7B model.
The researchers applied state-of-the-art instruction fine-tuning methods, including supervised fine-tuning (SFT) and direct preference optimization (DPO), to a diverse set of datasets. The resulting model, SOLAR-10.7B-Instruct-v1.0, achieves a Model H6 score of 74.20, demonstrating its effectiveness in single-turn dialogue scenarios.
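
The depth up-scaling step named in the heading can be illustrated on a list of layer indices. As described in the SOLAR paper, a 32-layer base model is duplicated, the last 8 layers are dropped from one copy and the first 8 from the other, and the two are concatenated into a 48-layer model. The sketch below is only an illustration: the function name and the use of integers in place of transformer blocks are assumptions, and the continued pretraining that follows up-scaling is omitted.

```python
def depth_upscale(layers, m):
    """Concatenate two copies of `layers`: one without its last m layers,
    the other without its first m layers."""
    n = len(layers)
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < number of layers")
    first_copy = layers[: n - m]   # final m layers removed
    second_copy = layers[m:]       # initial m layers removed
    return first_copy + second_copy


base = list(range(32))             # stand-in for 32 transformer blocks
scaled = depth_upscale(base, m=8)
print(len(scaled))                 # 48
```

Layers 8 through 23 of the base model appear twice in the result, which is where the extra depth comes from.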

Qwen (通义千问)

model: Qwen/Qwen1.5-7B-Chat
Blog: Introducing Qwen1.5
Code: https://github.com/QwenLM/Qwen1.5
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-qwen1-5

Yi (零一万物)

model: 01-ai/Yi-6B-Chat
Paper: CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
Paper: Yi: Open Foundation Models by 01.AI

Orca-Math

Paper: Orca-Math: Unlocking the potential of SLMs in Grade School Math
Dataset: https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k

Breeze (達哥)

model: MediaTek-Research/Breeze-7B-Instruct-v0_1
Paper: Breeze-7B Technical Report
Blog: Breeze-7B: An open-source Traditional Chinese model fine-tuned from Mistral-7B

Bailong (白龍)

Bilingual transfer learning based on QLoRA and zip-tie embedding
model: INX-TEXT/Bailong-instruct-7B

TAIDE

model: taide/TAIDE-LX-7B-Chat

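TAIDE-LX-7B: A model based on LLaMA2-7b and continuously pretrained using only Traditional Chinese data. It suits use cases where users will fine-tune the model further. Because the pretrained model has not undergone fine-tuning or preference alignment, it may produce malicious or unsafe output, so use it with care.
TAIDE-LX-7B-Chat: Based on TAIDE-LX-7B, with instruction tuning to strengthen common office tasks and multi-turn question-answering dialogue. It suits chat and task-assistance use cases. A 4-bit quantized version of TAIDE-LX-7B-Chat is also provided. The quantized model is offered mainly for user convenience; it may reduce performance and cause other unexpected problems, which users should keep in mind.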
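
To see why a 4-bit quantized model can behave differently from the original, the sketch below applies symmetric round-to-nearest 4-bit quantization to a handful of weights and measures the round-trip error. This is a generic scheme chosen for illustration, not necessarily the method used for the TAIDE quantized model; the function names and sample weights are hypothetical.

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit integer codes in [-7, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.53, 0.07, 0.91, -0.33]   # made-up example values
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print(f"max round-trip error: {max_error:.4f} (bounded by scale / 2 = {scale / 2:.4f})")
```

Each weight can move by up to half a quantization step, and these small shifts across billions of weights are one source of the performance changes mentioned above.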

Gemma

model: google/gemma-1.1-7b-it
Blog: Gemma: Introducing new state-of-the-art open models
Kaggle: https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora

Claude 3

Introducing the next generation of Claude

InflectionAI

Blog: Inflection AI releases its new foundation model "Inflection-2.5", with capabilities approaching GPT-4!

Phind-70B

Blog: Introducing Phind-70B – closing the code quality gap with GPT-4 Turbo while running 4x faster
Blog: Phind - An intelligent search engine for engineers
Phind-70B is significantly faster than GPT-4 Turbo, running at 80+ tokens per second to GPT-4 Turbo’s ~20 tokens per second. We’re able to achieve this by running NVIDIA’s TensorRT-LLM library on H100 GPUs, and we’re working on optimizations to further increase Phind-70B’s inference speed.
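
To put the quoted throughput figures in concrete terms, the arithmetic below estimates how long each model would take to generate a response. The rates are the ones quoted above; the 500-token response length is an assumed example, and the helper name is hypothetical.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


response_tokens = 500                                   # assumed response length
phind_time = generation_seconds(response_tokens, 80)    # 80 tokens/s
gpt4_turbo_time = generation_seconds(response_tokens, 20)  # ~20 tokens/s
speedup = gpt4_turbo_time / phind_time

print(f"Phind-70B: {phind_time} s, GPT-4 Turbo: {gpt4_turbo_time} s, speedup: {speedup}x")
```

The ratio of the two rates gives the 4x figure in the blog title, independent of the response length chosen.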

LLaMA-3

model: meta-llama/Meta-Llama-3-8B-Instruct
Code: https://github.com/meta-llama/llama3/

Phi-3

model: microsoft/Phi-3-mini-4k-instruct
Blog: Introducing Phi-3: Redefining what’s possible with SLMs

Octopus v4

model: NexaAIDev/Octopus-v4
Paper: Octopus v4: Graph of language models
Code: https://github.com/NexaAI/octopus-v4
design demo

LLM running locally

ollama

ollama -v
ollama
ollama pull llava
ollama run llava

Code: Github
Examples:

langchain-python-rag-privategpt
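
Besides the command line, a locally running Ollama server can be called from Python over its REST API. The sketch below builds a request for the `/api/generate` endpoint on the default port 11434 using only the standard library. The helper names and prompt are illustrative, and actually sending the request assumes the Ollama server is running and the model has been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(model, prompt):
    """Create a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


request = build_request("llava", "Why is the sky blue?")
print(request.get_method(), request.full_url)
```

With the server running, `generate("llava", "Why is the sky blue?")` should return the model's reply as a string.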

PrivateGPT

Code: https://github.com/zylon-ai/private-gpt

LM Studio

GPT4All

chmod +x gpt4all-installer-linux.run
./gpt4all-installer-linux.run
cd ~/gpt4all
./bin/chat

GPT4FREE

pip install g4f

Deep LLM

Deep Language Networks

Paper: Joint Prompt Optimization of Stacked LLMs using Variational Inference
Code: https://github.com/microsoft/deep-language-networks

Constitutional AI

Paper: Constitutional AI: Harmlessness from AI Feedback
Two key phases:

Supervised Learning Phase (SL Phase)
- Step 1: Learning starts from samples drawn from the initial model.
- Step 2: From these samples, the model generates self-critiques and revisions.
- Step 3: Fine-tune the original model on these revisions.
Reinforcement Learning Phase (RL Phase)
- Step 1: The model uses samples from the fine-tuned model.
- Step 2: Use a model to compare outputs sampled from the initial model and the fine-tuned model.
- Step 3: Decide which sample is better (the judgment that human labelers make in RLHF).
- Step 4: Train a new “preference model” on the resulting dataset of AI preferences. This preference model then provides the reward signal for RL training. This is RLAIF (Reinforcement Learning from AI Feedback).
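
The data flow of the two phases listed above can be sketched with plain Python callables standing in for language models. This is a toy illustration, not the paper's implementation: the function names, prompt strings, and stub models are all hypothetical, and the actual fine-tuning and RL training steps are left out.

```python
def supervised_phase(model, prompts, critique_request, revision_request):
    """SL phase: sample, self-critique, revise; return (prompt, revision) pairs."""
    dataset = []
    for prompt in prompts:
        response = model(prompt)
        critique = model(f"{prompt}\n{response}\n{critique_request}")
        revision = model(f"{prompt}\n{response}\n{critique}\n{revision_request}")
        dataset.append((prompt, revision))
    return dataset


def ai_preference_labels(model_a, model_b, judge, prompts):
    """RL phase: a judge model picks the better of two samples for each prompt."""
    labels = []
    for prompt in prompts:
        a, b = model_a(prompt), model_b(prompt)
        chosen, rejected = (a, b) if judge(prompt, a, b) == 0 else (b, a)
        labels.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return labels


# Stub "models" so the sketch runs without any LLM (purely illustrative)
def toy_model(text):
    if "Rewrite" in text:
        return "revised answer"
    if "Critique" in text:
        return "the answer is harmful"
    return "draft answer"


prompts = ["How do I pick a lock?"]
sft_data = supervised_phase(
    toy_model, prompts, "Critique the response.", "Rewrite the response."
)

initial = lambda p: "rude answer"
finetuned = lambda p: "polite answer"
judge = lambda prompt, a, b: 1 if "polite" in b else 0   # returns index of better sample
labels = ai_preference_labels(initial, finetuned, judge, prompts)
print(sft_data)
print(labels)
```

The `sft_data` pairs would feed the fine-tuning in the SL phase, and the `labels` records would train the preference model that supplies the RL reward.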