GenAI

Generative AI

生成式人工智慧職缺

AI 教材
AIGC 教材
GenAI-projects 教材

範例程式: git clone https://github.com/rkuo2000/GenAI


1. Text-to-Image

Image Creators

Bing-Create tutorial

Midjourney

Leonardo.ai

civitai

SeaArt.ai

TensorArt

<img width="50%" height="50%" src="https://github.com/rkuo2000/GenAI/raw/main/assets/Tensor.Art_Flux_girl.png"

OpenArt.ai

goenhance.ai

fluxpro.ai

SD 3.5

ComfyUI Now Supports Stable Diffusion 3.5!


ComfyUI

ComfyUI


本地部署Flux.1 最強文生圖大模型! Comfyui 一鍵安裝,簡單又方便

Flux1-dev-fp8 model files

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
mv ~/Downloads/flux1-dev-fp8.safetensors ~/ComfyUI/models/unet/
mv ~/Downloads/t5xxl_fp8_e4m3fn.safetensors ~/ComfyUI/models/clip/
mv ~/Downloads/clip_l.safetensors ~/ComfyUI/models/clip/
mv ~/Downloads/ae.safetensors ~/ComfyUI/models/vae/
python main.py
  1. open Browser at http:127.0.0.1:8188

  2. drag flux_dev_fp8_example.png to browser window to generate the work-flow chart

  1. edit text in CLIP Text Encode (Positive Prompt)
  2. click Queue Prompt to generate image

examples:

pretty Asian woman was holding the flowers in her hands, Korean Model, real photo style, full body shot.

One girl, long hair, model, white background, white shirt, khaki Capri pants, khaki loafers, sitting on a stool, lazy pose, slightly tilting head, smiling, Asian beauty, loose-ting clothes, inting clothes , slightly raised foot, half-body shot, Canon R5 camera style, blurred background, indoor, natural light, some sunlight shining on the face,9 : 16.


WebUI

Stable Diffusion WebUI


AI繪畫(Stable Diffusion),在WebUI Forge和ComfyUI使用

  1. Download WebUI-Forge
  2. Decompress 7z x webui_forge_cu124_torch24.7z
  3. Rename mv webui_forge_cu124_torch24 WebUI-Forge
  4. Run ./webui.sh

Krita

安裝與 ComfyUI 工作流匯入(建築景觀與室內設計應用)

FLUX.1[dev]模型在Krita完美整合


2. Text-to-3D

gTranslate + SDXL-Lightning + TripoSR + Blender


Image-to-3D

Zero123+++


TripoSR

Kaggle: https://www.kaggle.com/code/rkuo2000/triposr


Depth Pro

Code: https://github.com/apple/ml-depth-pro Kaggle: https://www.kaggle.com/code/rkuo2000/depth-pro


3. Text-to-Video

Tune-A-Video


Open-VCLIP


Dynamic Scene Transformer (DyST)


Text-to-Motion-Retrieval


Stable Video Diffusion

SV4D
SV4D was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution


Runway Gen3

Gen-3 Alpha Prompting Guide


Imagine.Art

<img width="50%" height="50%" src="https://github.com/rkuo2000/GenAI/raw/main/assets/ImagineArt_flying_cat_wearing_superman_suit.png"


RenderNet AI


SORA


Meta MovieGen


4. Text-to-Avatar

GAN 教材

HeyGen

sample

Hedra

Tutorial

LivePortrait

Tutorial

Demo


MuskTalk

ComfyUI-MuseTalk
<video src=https://github.com/TMElyralab/MuseTalk/assets/163980830/b2a879c2-e23a-4d39-911d-51f0343218e4 controls preload></video>


artflow.ai

Charactor Builder


5. Text-to-Song

Suno 教學

Tuneform

Specterr

Vizzy


ChatGPT(作詞) + SunoAI(作曲) + RVC WebUI (轉換人聲)

RVC-WebUI開源專案教學

RVC WebUI


Generative Speech


6. Text-to-Speech


7. Audio-to-Text (ASR)

webkitSpeechRecognition

Blog: 語音辨識API

asr.html

Google Speech Demo


Whisper


local ASR+LLM Server running on GPU

  1. run server on local PC (with GPU): python whisper_llm_server.py
  2. Generate audio file: python ../gTTS.py "Hello, how are you?" en
  3. Post Audio to Server: python post_audio.py

8. Text-to-Text (LLMs)

Large Language Models 教材
Prompt Engineering 教材

git clone https://github.com/rkuo2000/GenAI
cd GenAI/Text-to-Text

local LLM Server & Client


Colab running LLM Server


Colab running ASR+LLM Server

  1. Open colab to run pyngrok_Whisper_LLM_Server.ipynb on Colab T4
  2. Generate audio file: python ../gTTS.py "Hello, how are you?" en
  3. Post Audio to Server: python post_audio.py

Ollama

ollama library

ollama list
ollama run llama3.2

ollama chat/generate

ollama speak


LM Studio


Gemini API

get Gemini API Key


gemini.html


Gemini_Talk App 教學

MIT App Inventor 2 example for using Google Gemini

(三星手機使用三星文字轉語音引擎應用程式, 語言設繁體中文會講不出話, 要改成簡體中文, 或使用英文)


9. LLM Fine-Tuning

LLM Fine-Tuning 教材

PEFT

fine-tune-gemma-7b-it-for-sentiment-analysis
fine-tune-llama-3-for-sentiment-analysis

LoRA

fine-tune-gemma-models-in-keras-using-lora

exmaples


10. Image-to-Text (VLM)


examples


VLM servers

For running server, (use one of the following)

  1. python llava_server.py
  2. python llava_next_server.py
  3. python phi3-vision_server.py

For running client, (post image & text to VLM server)
python post_imgtxt.py images/barefeet1.jpg


ASR + VLM servers

  1. python whisper_llava_server.py
  2. python ../gTTS.py "這是什麼有名的台南美食?" zh (TTS)
  3. python post_imgau.py (client)

Gemini API


11. RAG

RAG 教材

Sampe Codes


RAG Builder


12. Agent

Agent 教材

openai/swarm

Kaggle: rkuo2000/swarm-llama3-groq
Colab: colab_Swarm_Llama3_Groq.ipynb


參考書籍

LLM 大型語言模型的絕世祕笈


最強 AI 投資分析:打造自己的股市顧問機器人,股票趨勢分析×年報解讀×選股推薦×風險管理