AI Music & Voice

08 Dec 2023 • Richard Kuo

This introduction covers Music Separation, DeepSinger, Voice Conversion, Voice Cloning, etc.

Music Separation

Blog: Jukebox
Model: modified from VQ-VAE-2
Paper: Jukebox: A Generative Model for Music
Colab: Interacting with Jukebox

Blog: Microsoft’s AI generates voices that sing in Chinese and English
Paper: DeepSinger: Singing Voice Synthesis with Data Mined From the Web
Demo: DeepSinger: Singing Voice Synthesis with Data Mined From the Web

The alignment model based on the architecture of automatic speech recognition

The architecture of the singing model

The inference process of singing voice synthesis

Paper: An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Blog: Voice Cloning Using Deep Learning

Blog: Deep Voice 3: Scaling Text to Speech with Convolutional Sequence Learning
Paper: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Code: r9y9/deepvoice3_pytorch
Code: Kyubyong/deepvoice3

Paper: Neural Voice Cloning with a Few Samples
Code: SforAiDl/Neural-Voice-Cloning-With-Few-Samples

Blog: Voice Cloning: Corentin’s Improvisation On SV2TTS
Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Code: CorentinJ/Real-Time-Voice-Cloning

Synthesizer: The synthesizer is Tacotron 2 without WaveNet
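To make the data flow concrete, here is a minimal sketch of the three-stage SV2TTS pipeline described in the paper above: a speaker encoder produces a fixed-size embedding from reference audio, a synthesizer turns text plus that embedding into a mel spectrogram, and a separate vocoder turns the spectrogram into a waveform. Everything in the block is an assumption made for illustration: the function names, the constants `EMBED_DIM`, `N_MELS`, and `HOP`, and the function bodies, which are NumPy placeholders rather than trained models. It does not reproduce the API of CorentinJ/Real-Time-Voice-Cloning.

```python
import numpy as np

# All names and sizes below are hypothetical, chosen only for illustration.
EMBED_DIM = 256   # assumed speaker-embedding size
N_MELS = 80       # assumed number of mel channels
HOP = 200         # assumed waveform samples per mel frame


def speaker_encoder(reference_wav: np.ndarray) -> np.ndarray:
    """Placeholder encoder: map reference audio to a unit-norm embedding."""
    rng = np.random.default_rng(0)  # fixed seed so the projection is stable
    projection = rng.standard_normal((EMBED_DIM, 16))
    # Summarize the audio as 16 chunk-level RMS energies.
    chunks = np.array_split(reference_wav, 16)
    feats = np.array([np.sqrt(np.mean(c ** 2)) for c in chunks])
    emb = projection @ feats
    return emb / (np.linalg.norm(emb) + 1e-8)


def synthesizer(text: str, embedding: np.ndarray,
                frames_per_char: int = 5) -> np.ndarray:
    """Placeholder synthesizer: text + embedding -> mel of shape (N_MELS, frames)."""
    n_frames = frames_per_char * len(text)
    # Condition on the speaker by broadcasting embedding values across frames.
    return np.tile(embedding[:N_MELS, None], (1, n_frames))


def vocoder(mel: np.ndarray) -> np.ndarray:
    """Placeholder vocoder: mel spectrogram -> waveform of frames * HOP samples."""
    frame_energy = mel.mean(axis=0)
    return np.repeat(frame_energy, HOP)


def clone_voice(reference_wav: np.ndarray, text: str) -> np.ndarray:
    """Chain the three stages: encoder -> synthesizer -> vocoder."""
    embedding = speaker_encoder(reference_wav)
    mel = synthesizer(text, embedding)
    return vocoder(mel)


if __name__ == "__main__":
    wav = np.random.default_rng(1).standard_normal(16000)
    audio = clone_voice(wav, "hello")
    print(audio.shape)
```

The point of the split is that the speaker encoder can be trained on its own for speaker verification and then reused to condition the synthesizer, which is what lets a short reference clip stand in for a new speaker.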

SV2TTS Toolbox

Paper: MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
Code: marcoppasini/MelGAN-VC

Paper: Vocoder-free End-to-End Voice Conversion with Transformer Network
Code: kaen2891/kaen2891.github.io

Paper: ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network
Demo: ConVoice: Real-Time Zero-Shot Voice Style Transfer

This site was last updated June 29, 2024.