AI Music & Voice
This introduction includes Music Seperationm, Deep Singer, Voice Conversion, Voice Cloning, etc.
Music Seperation
Spleeter
Paper: Spleeter: A FAST AND STATE-OF-THE ART MUSIC SOURCE
SEPARATION TOOL WITH PRE-TRAINED MODELS
Code: deezer/spleeter
Wave-U-Net
Paper: Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Code: f90/Wave-U-Net
Hyper Wave-U-Net
Paper: Improving singing voice separation with the Wave-U-Net using Minimum Hyperspherical Energy
Code: jperezlapillo/hyper-wave-u-net
MHE regularisation:
Demucs
Paper: Music Source Separation in the Waveform Domain
Code: facebookresearch/demucs
Deep Singer
OpenAI Jukebox
Blog: Jukebox
model modified from VQ-VAE-2
Paper: Jukebox: A Generative Model for Music
Colab: Interacting with Jukebox
DeepSinger
Blog: Microsoft’s AI generates voices that sing in Chinese and English
Paper: DeepSinger: Singing Voice Synthesis with Data Mined From the Web
Demo: DeepSinger: Singing Voice Synthesis with Data Mined From the Web
The alignment model based on the architecture of automatic speech recognition
The architecture of the singing model
The inference process of singing voice synthesis
Voice Conversion
Paper: An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Blog: Voice Cloning Using Deep Learning
Deep Voice 3
Blog: Deep Voice 3: Scaling Text to Speech with Convolutional Sequence Learning
Paper: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Code: r9y9/deepvoice3_pytorch
Code: Kyubyong/deepvoice3
Neural Voice Cloning
Paper: Neural Voice Cloning with a Few Samples
Code: SforAiDl/Neural-Voice-Cloning-With-Few-Samples
SV2TTS
Blog: Voice Cloning: Corentin’s Improvisation On SV2TTS
Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Code: CorentinJ/Real-Time-Voice-Cloning
Synthesizer : The synthesizer is Tacotron2 without Wavenet
SV2TTS Toolbox
MelGAN-VC
Paper: MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
Code: marcoppasini/MelGAN-VC
Vocoder-free End-to-End Voice Conversion
Paper: Vocoder-free End-to-End Voice Conversion with Transformer Network
Code: kaen2891/kaen2891.github.io
ConVoice
Paper: ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network
Demo: ConVoice: Real-Time Zero-Shot Voice Style Transfer
This site was last updated June 29, 2024.