Model Fine-Tuning

Model Training

Three Phases of Model Training

Blog: Demystifying the Three Stages of LLM Training: A Deep Dive into SFT and the Key RLHF Technique (解密 LLM 訓練三部曲:深入解析 SFT 與關鍵的 RLHF 技術)

  • Phase 1 (Self-Supervised Pre-Training): Pre-trained LLM
  • Phase 2 (Supervised Fine-Tuning): SFT LLM
  • Phase 3 (Reinforcement Learning from Human Feedback): Reward Model and Final Model
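
For orientation, the three phases are commonly written as the objectives below. This is a sketch in standard textbook notation, not taken from the linked post, which may use different symbols. Assumed notation: x is a prompt, y a response, y_w and y_l the preferred and rejected responses in a human comparison, r_phi the reward model, pi_SFT the Phase 2 model, and beta a KL penalty weight.

```latex
% Phase 1: next-token prediction on unlabeled text
\mathcal{L}_{\text{PT}}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t})

% Phase 2: the same loss, restricted to demonstration responses y given prompts x
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t} \log p_\theta(y_t \mid x, y_{<t})

% Phase 3a: reward model trained on pairwise human preferences
\mathcal{L}_{\text{RM}}(\phi) = -\log \sigma\bigl(r_\phi(x, y_w) - r_\phi(x, y_l)\bigr)

% Phase 3b: policy optimized against the reward model, kept close to the SFT model
\max_{\theta} \; \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\bigl[r_\phi(x, y)\bigr]
  - \beta \, \mathrm{KL}\bigl(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\text{SFT}}(\cdot \mid x)\bigr)
```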

Blog: RLHF: Reinforcement Learning from Human Feedback


Pre-Train & Alignment (SFT, RLHF)


Post-Training & Forgetting


Build a Large Language Model (From Scratch)

Book: Make AI Speak Well! A Hands-On Guide to Building an LLM (Large Language Model) from Scratch (讓 AI 好好說話!從頭打造 LLM (大型語言模型) 實戰秘笈)

Build A Reasoning Model (From Scratch)


Model Fine-Tuning

Open-Source AI Cookbook


SmolVLM with TRL (for ChartQA)

Blog: Fine-tuning SmolVLM with TRL on a consumer GPU
Model: HuggingFaceTB/SmolVLM-Instruct
Dataset: HuggingFaceM4/ChartQA


SmolVLM with TRL (for Invoice Parser)

Prompt: Fine-tuning Invoice Parser
Model: HuggingFaceTB/SmolVLM-Instruct
Dataset: mychen76/invoices-and-receipts_ocr_v1


VLM with DPO (for Super GPT-4V)

Blog: Preference Optimization for Vision Language Models with TRL (使用 TRL 對視覺語言模型進行偏好最佳化)
Model: HuggingFaceM4/idefics2-8b
Dataset: openbmb/RLAIF-V-Dataset
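
As a reminder of what DPO optimizes, here is a minimal sketch of the per-pair DPO loss in plain Python. It does not use TRL and is not the code from the linked post. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. Inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).

    All names and the default beta are illustrative, not TRL's API.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(m)) == log(1 + exp(-m)); pick the numerically stable branch.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2. The loss falls below that as the policy shifts probability toward the chosen response relative to the rejected one.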


LLM with GRPO (for GSM8K)

Blog: Advanced GRPO Fine-tuning for Mathematical Reasoning with Multi-Reward Training
Model: Qwen/Qwen2.5-3B-Instruct
Dataset: openai/gsm8k
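
The two ingredients of GRPO on GSM8K can be sketched as a correctness reward and a group-relative advantage. This is a simplified illustration in plain Python, not the multi-reward setup from the linked post and not TRL's implementation. All function names are hypothetical. The sketch assumes the reference answer ends with `#### <number>`, as GSM8K solutions do, and uses the population standard deviation with a small `eps`, which is one common choice.

```python
import re
import statistics


def extract_final_number(text):
    """Return the last number that appears in text as a string, or None."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def correctness_reward(completion, reference_answer):
    """1.0 if the last number in the completion equals the GSM8K gold answer, else 0.0."""
    gold = reference_answer.split("####")[-1].strip().replace(",", "")
    pred = extract_final_number(completion)
    if pred is None:
        return 0.0
    try:
        return 1.0 if float(pred) == float(gold) else 0.0
    except ValueError:
        return 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Each completion is scored relative to the other completions sampled for the same prompt, so no separate value model is needed. If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.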


VLM with GRPO (for Reasoning)

Blog: Post training a VLM for reasoning with GRPO using TRL
Model: Qwen/Qwen2.5-VL-3B-Instruct
Dataset: lmms-lab/multimodal-open-r1-8k-verified


Intelligence Benchmarks

OpenAI Evals


SWE-Bench

Paper: SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Code: https://github.com/SWE-bench/SWE-bench


MLE-Bench

Paper: MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Code: https://github.com/openai/mle-bench


SWE-Lancer

Paper: SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

PaperBench

Paper: PaperBench: Evaluating AI’s Ability to Replicate AI Research
Code: https://github.com/openai/frontier-evals


SWE-Bench Pro

Paper: SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
Code: https://github.com/scaleapi/SWE-bench_Pro-os


GDPval

Paper: GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
Dataset: https://huggingface.co/datasets/openai/gdpval

  • 220 real-world knowledge tasks across 44 occupations.
  • Each task consists of a text prompt and a set of supporting reference files.

ToolOrchestra

Paper: ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Code: https://github.com/NVlabs/ToolOrchestra



This site was last updated December 19, 2025.