Large Language Models

Introduction to LLMs, Deep LLM, Time-series LLMs, Applications, etc.


History of LLMs

A Survey of Large Language Models
Since the introduction of the Transformer model in 2017, large language models (LLMs) have evolved significantly.
ChatGPT reached roughly 1.6 billion visits in May 2023.
Meta also released three versions of LLaMA-2 (7B, 13B, 70B), free for commercial use, in July 2023.

Timeline of large language models (>10B parameters)


Growth of compute memory versus Transformer model size

AI and Memory Wall


Large Language Models

Open LLM Leaderboard

Transformer

Paper: Attention Is All You Need

ChatGPT

ChatGPT: Optimizing Language Models for Dialogue
ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022.


GPT4

Paper: GPT-4 Technical Report
Paper: From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Blog: GPT-4 Code Interpreter: The Next Big Thing in AI


LLaMA

Paper: LLaMA: Open and Efficient Foundation Language Models
Blog: Building a Million-Parameter LLM from Scratch Using Python
Kaggle: LLM LLaMA from scratch


BloombergGPT

Paper: BloombergGPT: A Large Language Model for Finance
Blog: Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance


Pythia

Paper: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Dataset:
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Datasheet for the Pile
Code: Pythia: Interpreting Transformers Across Time and Scale


MPT-7B

model: mosaicml/mpt-7b-chat
Code: https://github.com/mosaicml/llm-foundry
Blog: Announcing MPT-7B-8K: 8K Context Length for Document Understanding
Blog: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs


Falcon-40B

model: tiiuae/falcon-40b
Paper: The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only


Orca

Paper: Orca: Progressive Learning from Complex Explanation Traces of GPT-4


OpenLLaMA

model: openlm-research/open_llama_3b_v2
model: openlm-research/open_llama_7b_v2
Code: https://github.com/openlm-research/open_llama
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-openllama


Vicuna

model: lmsys/vicuna-7b-v1.5
Paper: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Code: https://github.com/lm-sys/FastChat


LLaMA-2

model: meta-llama/Llama-2-7b-chat-hf
Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models
Code: https://github.com/facebookresearch/llama


Sheared LLaMA

model: princeton-nlp/Sheared-LLaMA-1.3B | princeton-nlp/Sheared-LLaMA-2.7B | princeton-nlp/Sheared-Pythia-160m
Paper: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Code: https://github.com/princeton-nlp/LLM-Shearing


Neural-Chat-7B (Intel)

model: Intel/neural-chat-7b-v3-1
Blog: Intel neural-chat-7b Model Achieves Top Ranking on LLM Leaderboard!


Mistral

model: mistralai/Mistral-7B-Instruct-v0.2
Paper: Mistral 7B
Code: https://github.com/mistralai/mistral-src
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-mistral-7b-instruct
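
Most chat models on this page can be loaded the same way with the Hugging Face transformers library. Below is a minimal sketch (not taken from the Mistral repo; the prompt text and generation settings are illustrative assumptions) that loads the instruct model above and generates a reply:

# Minimal sketch: load a Hugging Face chat model and generate a reply.
# The prompt text and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto")

messages = [{"role": "user", "content": "Explain the Transformer in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))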


Mixtral 8x7B

model: mistralai/Mixtral-8x7B-v0.1
Paper: Mixtral of Experts


Starling-LM

model: Nexusflow/Starling-LM-7B-beta
Paper: RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Blog: Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF


Zephyr

model: HuggingFaceH4/zephyr-7b-beta
Paper: Zephyr: Direct Distillation of LM Alignment
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-zephyr-7b
Blog: Zephyr-7B : HuggingFace’s Hyper-Optimized LLM Built on Top of Mistral 7B


Orca 2

model: microsoft/Orca-2-7b
Paper: Orca 2: Teaching Small Language Models How to Reason (https://arxiv.org/abs/2311.11045)
Blog: Microsoft’s Orca 2 LLM Outperforms Models That Are 10x Larger


BlueLM (VIVO)

model: vivo-ai/BlueLM-7B-Chat-4bits
Code: https://github.com/vivo-ai-lab/BlueLM/


Taiwan-LLM (Ubitus + NTU)

model: yentinglin/Taiwan-LLM-7B-v2.1-chat
Paper: TAIWAN-LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model
Blog: Made for Taiwan! Ubitus and NTU team up to build "Taiwan LLM": why do we need localized AI?
Code: https://github.com/MiuLab/Taiwan-LLM


Phi-2 (Transformer with 2.7B parameters)

model: microsoft/phi-2
Blog: Phi-2: The surprising power of small language models
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-phi-2


Mamba

model: Q-bert/Mamba-130M
Paper: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-mamba-130m
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-mamba-3b


SOLAR-10.7B ~ Depth Upscaling

Paper: SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
model: upstage/SOLAR-10.7B-v1.0
Depth-upscaled SOLAR-10.7B shows remarkable performance: it outperforms models with up to 30B parameters and even surpasses the recent Mixtral 8x7B model.
Leveraging state-of-the-art instruction fine-tuning methods, including supervised fine-tuning (SFT) and direct preference optimization (DPO), the researchers trained on a diverse set of datasets. The fine-tuned model, SOLAR-10.7B-Instruct-v1.0, achieves a Model H6 score of 74.20, demonstrating its effectiveness in single-turn dialogue scenarios.
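
The core idea of depth up-scaling can be expressed in a few lines: duplicate the base model's decoder layers, drop a band of layers from each copy, and concatenate the remainder into a deeper network that is then continually pretrained. The snippet below is only a conceptual sketch assuming a Llama-style model in transformers, using the 32-to-48-layer split from the paper's example; it is not the authors' implementation.

# Conceptual sketch of depth up-scaling (DUS) on a Llama-style model.
# The 32 -> 48 layer split mirrors the example in the SOLAR paper; the result
# would still need continued pretraining. Not the official implementation.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
layers = base.model.layers          # nn.ModuleList with 32 decoder layers
n, k = len(layers), 8               # drop k layers from each copy

front = [layers[i] for i in range(n - k)]                 # layers 0..23
back = [copy.deepcopy(layers[i]) for i in range(k, n)]    # layers 8..31 (a copy)

base.model.layers = nn.ModuleList(front + back)           # 24 + 24 = 48 layers
base.config.num_hidden_layers = len(base.model.layers)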


Qwen (通义千问)

model: Qwen/Qwen1.5-7B-Chat
Blog: Introducing Qwen1.5
Code: https://github.com/QwenLM/Qwen1.5
Kaggle: https://www.kaggle.com/code/rkuo2000/llm-qwen1-5


Yi (零一万物)

model: 01-ai/Yi-6B-Chat
Paper: CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
Paper: Yi: Open Foundation Models by 01.AI


Orca-Math

Paper: Orca-Math: Unlocking the potential of SLMs in Grade School Math
Dataset: https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k


Breeze (達哥)

model: MediaTek-Research/Breeze-7B-Instruct-v0_1
Paper: Breeze-7B Technical Report
Blog: Breeze-7B: a Traditional Chinese open-source model fine-tuned from Mistral-7B


Bailong (白龍)

Bilingual transfer learning based on QLoRA and zip-tie embedding
model: INX-TEXT/Bailong-instruct-7B
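
As a rough illustration of the QLoRA part of this recipe (the zip-tie embedding trick is specific to the model and not shown), the sketch below sets up a 4-bit quantized base model with LoRA adapters using transformers, bitsandbytes, and peft. The base model, target modules, and hyperparameters are assumptions for illustration only.

# Rough QLoRA setup sketch (4-bit quantized base model + LoRA adapters).
# Base model, target modules, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto")

model = prepare_model_for_kbit_training(model)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable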


TAIDE

model: taide/TAIDE-LX-7B-Chat

  • TAIDE-LX-7B: Based on LLaMA2-7b and continually pretrained on Traditional Chinese data only; suited to use cases where the user will further fine-tune the model. Because the pretrained model has not been fine-tuned or preference-aligned, it may produce malicious or unsafe output, so use it with care.
  • TAIDE-LX-7B-Chat: Based on TAIDE-LX-7B, with instruction tuning to strengthen common office tasks and multi-turn question answering; suited to chat or task-assistance use cases. A 4-bit quantized version of TAIDE-LX-7B-Chat is also provided; the quantized model is mainly for user convenience and may degrade performance or introduce other unexpected issues, so please take note.

Gemma

model: google/gemma-1.1-7b-it
Blog: Gemma: Introducing new state-of-the-art open models
Kaggle: https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora


Claude 3

Introducing the next generation of Claude


InflectionAI

Blog: Inflection AI releases its new foundation model "Inflection-2.5", with capabilities approaching GPT-4!


Phind-70B

Blog: Introducing Phind-70B – closing the code quality gap with GPT-4 Turbo while running 4x faster
Blog: Phind - an intelligent search engine for engineers
Phind-70B is significantly faster than GPT-4 Turbo, running at 80+ tokens per second compared to GPT-4 Turbo's ~20 tokens per second. Phind achieves this by running NVIDIA's TensorRT-LLM library on H100 GPUs, and is working on optimizations to further increase Phind-70B's inference speed.


LLaMA-3

model: meta-llama/Meta-Llama-3-8B-Instruct
Code: https://github.com/meta-llama/llama3/


Phi-3

model: microsoft/Phi-3-mini-4k-instruct
Blog: Introducing Phi-3: Redefining what’s possible with SLMs


Octopus v4

model: NexaAIDev/Octopus-v4
Paper: Octopus v4: Graph of language models
Code: https://github.com/NexaAI/octopus-v4
design demo


LLM running locally

ollama

ollama -v          # print the installed ollama version
ollama             # show usage and the list of available commands
ollama pull llava  # download the llava model
ollama run llava   # start an interactive chat session with llava

Code: https://github.com/ollama/ollama
Examples:
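
Besides the CLI, ollama exposes a local REST API (by default at http://localhost:11434). A minimal sketch of calling it from Python, assuming the server is running and the llava model pulled above is available:

# Minimal sketch: query a locally running ollama server via its REST API.
# Assumes ollama is serving on the default port 11434 and llava has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llava", "prompt": "Describe a cat in one sentence.", "stream": False},
)
print(resp.json()["response"])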


PrivateGPT

Code: https://github.com/zylon-ai/private-gpt


LM Studio


GPT4All

chmod +x gpt4all-installer-linux.run   # make the installer executable
./gpt4all-installer-linux.run          # run the GPT4All installer
cd ~/gpt4all                           # enter the default install directory
./bin/chat                             # launch the chat application


GPT4FREE

pip install g4f


Deep LLM

Deep Language Networks

Paper: Joint Prompt Optimization of Stacked LLMs using Variational Inference
Code: https://github.com/microsoft/deep-language-networks


Constitutional AI

Paper: Constitutional AI: Harmlessness from AI Feedback
Two key phases:

  1. Supervised Learning Phase (SL Phase)
    • Step 1: Learning starts from samples generated by the initial model.
    • Step 2: From these samples, the model generates self-critiques and revisions.
    • Step 3: Fine-tune the original model on these revisions (see the sketch after this list).
  2. Reinforcement Learning Phase (RL Phase)
    • Step 1: Sample outputs from the fine-tuned model.
    • Step 2: Use a model to compare outputs sampled from the initial model and the fine-tuned model.
    • Step 3: Decide which sample is better.
    • Step 4: Train a new "preference model" on this dataset of AI preferences. The preference model is then used as the reward signal to retrain the model with RL. This is RLAIF (Reinforcement Learning from AI Feedback).
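
As referenced in the SL phase above, the critique-and-revision loop can be sketched as follows. generate() is a hypothetical placeholder for a call to whichever LLM is being trained, and the constitution principle and prompt wording are illustrative, not taken from the paper.

# Sketch of the Constitutional AI supervised-learning phase:
# sample -> self-critique -> revision -> collect (prompt, revision) pairs for SFT.
# generate() is a hypothetical stand-in for a real LLM call; the principle and
# prompt wording are illustrative, not from the paper.

def generate(prompt: str) -> str:
    raise NotImplementedError("replace with a real LLM call")

PRINCIPLE = "Identify ways the response is harmful, unethical, or inaccurate."

def critique_and_revise(red_team_prompts):
    sft_pairs = []
    for prompt in red_team_prompts:
        response = generate(prompt)
        critique = generate(f"{prompt}\n{response}\nCritique: {PRINCIPLE}")
        revision = generate(
            f"{prompt}\n{response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique.")
        sft_pairs.append((prompt, revision))   # fine-tune the model on these pairs
    return sft_pairs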

Chain-of-Thought Prompting

Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
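
In practice, chain-of-thought prompting simply asks the model to write out intermediate reasoning before the final answer, either zero-shot ("Let's think step by step") or with a few worked examples. A tiny illustration, adapted from the style of examples in the paper:

# Few-shot chain-of-thought prompt: the worked example shows its reasoning steps,
# nudging the model to reason step by step on the new question.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A: Let's think step by step."""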


ReAct Prompting

Paper: ReAct: Synergizing Reasoning and Acting in Language Models
Code: https://github.com/ysymyth/ReAct
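
ReAct interleaves free-form reasoning ("Thought"), tool calls ("Action"), and tool results ("Observation") until the model emits a final answer. The toy loop below only illustrates the idea; generate() and the calculator tool are hypothetical placeholders, not the official ReAct code.

# Toy ReAct-style loop: Thought -> Action -> Observation, repeated until Finish.
# generate() and the calculator tool are hypothetical placeholders.

def generate(prompt: str) -> str:
    raise NotImplementedError("replace with a real LLM call")

def calculator(expr: str) -> str:
    return str(eval(expr))          # toy tool; do not eval untrusted input

def react(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = generate(prompt + "Thought:")   # model writes a Thought and an Action
        prompt += f"Thought:{step}\n"
        if "Action: Finish[" in step:
            return step.split("Finish[", 1)[1].split("]", 1)[0]
        if "Action: Calculate[" in step:
            expr = step.split("Calculate[", 1)[1].split("]", 1)[0]
            prompt += f"Observation: {calculator(expr)}\n"
    return "no answer"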


Tree-of-Thoughts

Paper: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Code: https://github.com/princeton-nlp/tree-of-thought-llm
Code: https://github.com/kyegomez/tree-of-thoughts


Tabular CoT

Paper: Tab-CoT: Zero-shot Tabular Chain of Thought
Code: https://github.com/Xalp/Tab-CoT


Survey of Chain-of-Thought

Paper: A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future


Chain-of-Thought Hub

Paper: Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models’ Reasoning Performance
Code: https://github.com/FranxYao/chain-of-thought-hub


Everything-of-Thoughts

Paper: Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Code: https://github.com/microsoft/Everything-of-Thoughts-XoT


R3

Paper: Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Code: https://github.com/WooooDyy/LLM-Reverse-Curriculum-RL


Time-series LLM

Paper: Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Papers: https://github.com/DaoSword/Time-Series-Forecasting-and-Deep-Learning

Time-LLM

Paper: Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
Code: https://github.com/KimMeen/Time-LLM



TimeGPT-1

Paper: TimeGPT-1


TEMPO

Paper: TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting
Code: https://github.com/liaoyuhua/tempo-pytorch


Lag-LLaMA

Paper: Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Blog: From RNN/LSTM to Temporal Fusion Transformers and Lag-Llama
Code: https://github.com/time-series-foundation-models/lag-llama
Colab:


Timer

Paper: Timer: Transformers for Time Series Analysis at Scale


Applications

FunSearch

DeepMind develops a method that uses LLMs to solve difficult mathematical problems


Automatic Evaluation

Paper: Can Large Language Models Be an Alternative to Human Evaluation?

Paper: A Closer Look into Automatic Evaluation Using Large Language Models
Code: https://github.com/d223302/A-Closer-Look-To-LLM-Evaluation


BrainGPT

Paper: DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation
Blog: New Mind-Reading “BrainGPT” Turns Thoughts Into Text On Screen


Designing Silicon Brains using LLM

Paper: Designing Silicon Brains using LLM: Leveraging ChatGPT for Automated Description of a Spiking Neuron Array


MyGirlGPT

Code: https://github.com/Synthintel0/MyGirlGPT


Robotic Manipulation

Paper: Language-conditioned Learning for Robotic Manipulation: A Survey
Paper: Human Demonstrations are Generalizable Knowledge for Robots


ALTER-LLM

Paper: From Text to Motion: Grounding GPT-4 in a Humanoid Robot “Alter3”


