AI Music & Voice

This introduction covers Music Separation, Deep Singer, Voice Conversion, Voice Cloning, and related topics.


Music Separation

Spleeter

Paper: Spleeter: A Fast and State-of-the-Art Music Source Separation Tool with Pre-Trained Models
Code: deezer/spleeter
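Spectrogram-based separators in this family typically estimate one soft mask per source and multiply it with the mixture spectrogram. The sketch below shows plain ratio masking with the per-source magnitudes supplied directly (in a real system a network predicts them). The function names and the `power` exponent are illustrative assumptions, not Spleeter's API.

```python
def ratio_masks(source_mags, power=2.0, eps=1e-10):
    """Soft ratio masks from per-source magnitude estimates.

    source_mags: one 2-D list [freq][time] of magnitudes per source.
    At every time-frequency bin the returned masks sum to (almost) 1.
    """
    n_freq, n_time = len(source_mags[0]), len(source_mags[0][0])
    masks = [[[0.0] * n_time for _ in range(n_freq)] for _ in source_mags]
    for f in range(n_freq):
        for t in range(n_time):
            powered = [s[f][t] ** power for s in source_mags]
            total = sum(powered) + eps  # eps avoids division by zero in silent bins
            for k, p in enumerate(powered):
                masks[k][f][t] = p / total
    return masks


def apply_mask(mixture_mag, mask):
    """Element-wise product of a mask with the mixture magnitude spectrogram."""
    return [[m * x for m, x in zip(mrow, xrow)] for mrow, xrow in zip(mask, mixture_mag)]


# Toy example: one frequency bin, two frames, sources "vocals" and "accompaniment".
masks = ratio_masks([[[3.0, 0.0]], [[1.0, 2.0]]])
vocals = apply_mask([[4.0, 2.0]], masks[0])
```

Because the masks sum to one, the separated estimates always add back up to the mixture, which is one reason masking is a popular design choice.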


Wave-U-Net

Paper: Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Code: f90/Wave-U-Net
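Wave-U-Net operates directly on the waveform, shrinking the time resolution on the way down and restoring it on the way up, with skip connections that join matching levels. The sketch below assumes the simplest choices for these three operations (keep every other sample, midpoint interpolation, centre-crop then stack as channels); the helper names are hypothetical and no learned layers are included.

```python
def decimate(x):
    """Downsample by 2 by keeping every other time step."""
    return x[::2]


def upsample_linear(x):
    """Upsample by 2 by inserting the midpoint between neighbouring samples.

    Output length is 2 * len(x) - 1 (no extrapolation past the last sample).
    """
    out = []
    for a, b in zip(x, x[1:]):
        out.extend([a, (a + b) / 2.0])
    out.append(x[-1])
    return out


def crop_and_concat(skip, upsampled):
    """Centre-crop the (longer) skip signal and pair it with the upsampled one.

    Returns a list of [skip_sample, upsampled_sample] channel pairs per time step.
    """
    diff = len(skip) - len(upsampled)
    assert diff >= 0, "skip connection must be at least as long as the upsampled signal"
    start = diff // 2
    cropped = skip[start:start + len(upsampled)]
    return [list(pair) for pair in zip(cropped, upsampled)]
```

With odd-length inputs, `decimate` followed by `upsample_linear` returns a signal of the original length, which keeps skip connections aligned.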


Hyper Wave-U-Net

Paper: Improving singing voice separation with the Wave-U-Net using Minimum Hyperspherical Energy
Code: jperezlapillo/hyper-wave-u-net
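MHE regularisation: a Minimum Hyperspherical Energy penalty that encourages the network's normalised filters to spread out evenly over a hypersphere.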
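The quantity that MHE regularisation minimises can be sketched as the hyperspherical energy of a set of weight vectors: normalise each vector to unit length, then sum an inverse-distance kernel over all ordered pairs. This sketch assumes the kernel f(z) = z^(-s) for s > 0 and log(1/z) for s = 0; how the term is weighted and added to the Wave-U-Net training loss is not shown.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def hyperspherical_energy(weights, s=1.0, eps=1e-8):
    """Sum of f_s(||w_i - w_j||) over ordered pairs i != j of normalised vectors."""
    units = [normalize(w) for w in weights]
    energy = 0.0
    for i, a in enumerate(units):
        for j, b in enumerate(units):
            if i == j:
                continue
            dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if s > 0:
                energy += (dist + eps) ** (-s)
            else:
                energy += math.log(1.0 / (dist + eps))
    return energy
```

Opposite filters have lower energy than orthogonal ones, so minimising this term pushes filters apart and reduces redundancy.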


Demucs

Paper: Music Source Separation in the Waveform Domain
Code: facebookresearch/demucs


Deep Singer

OpenAI Jukebox

Blog: Jukebox
Paper: Jukebox: A Generative Model for Music (model modified from VQ-VAE-2)
Colab: Interacting with Jukebox
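The core step inherited from VQ-VAE is vector quantisation: each continuous encoder output is replaced by its nearest entry in a learned codebook, so audio becomes a sequence of discrete codes that an autoregressive prior can model. The sketch below shows only this nearest-neighbour lookup with a fixed toy codebook; codebook learning, the straight-through gradient, and Jukebox's multi-level hierarchy are omitted, and `quantize` is a hypothetical name.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2).

    Returns (codes, quantized), where quantized[i] == codebook[codes[i]].
    """
    codes = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        codes.append(min(range(len(codebook)), key=dists.__getitem__))
    quantized = [codebook[k] for k in codes]
    return codes, quantized


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
codes, q = quantize([[0.2, 0.1], [4.0, 4.5], [0.9, 1.2]], codebook)
```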


DeepSinger

Blog: Microsoft’s AI generates voices that sing in Chinese and English
Paper: DeepSinger: Singing Voice Synthesis with Data Mined From the Web
Demo: DeepSinger: Singing Voice Synthesis with Data Mined From the Web

The alignment model, based on the architecture of automatic speech recognition (ASR)

The architecture of the singing model

The inference process of singing voice synthesis
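The alignment model's job is to tell the singing model how long each phoneme lasts. A simple way to turn an attention alignment into phoneme durations, and then to expand phoneme-level states to frame level, is sketched below. The argmax counting rule and the `length_regulate` expansion are illustrative assumptions, not necessarily the exact procedure used in DeepSinger.

```python
def durations_from_alignment(attention):
    """Count, per phoneme, the frames that attend to it most strongly.

    attention: list of frames, each a list of weights over phonemes.
    """
    n_phonemes = len(attention[0])
    durations = [0] * n_phonemes
    for frame in attention:
        best = max(range(n_phonemes), key=frame.__getitem__)
        durations[best] += 1
    return durations


def length_regulate(phoneme_states, durations):
    """Repeat each phoneme-level state once for every frame it lasts."""
    expanded = []
    for state, d in zip(phoneme_states, durations):
        expanded.extend([state] * d)
    return expanded


attention = [
    [0.9, 0.1, 0.0],
    [0.6, 0.4, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
    [0.0, 0.1, 0.9],
]
durations = durations_from_alignment(attention)
frames = length_regulate(["a", "b", "c"], durations)
```

The durations always sum to the number of frames, so the expanded sequence lines up with the target mel-spectrogram.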


Voice Conversion

Paper: An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Blog: Voice Cloning Using Deep Learning


Deep Voice 3

Blog: Deep Voice 3: Scaling Text to Speech with Convolutional Sequence Learning
Paper: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Code: r9y9/deepvoice3_pytorch
Code: Kyubyong/deepvoice3


Neural Voice Cloning

Paper: Neural Voice Cloning with a Few Samples
Code: SforAiDl/Neural-Voice-Cloning-With-Few-Samples


SV2TTS

Blog: Voice Cloning: Corentin’s Improvisation On SV2TTS
Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Code: CorentinJ/Real-Time-Voice-Cloning

Synthesizer: Tacotron 2 without WaveNet
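SV2TTS conditions the synthesizer on a fixed-size speaker embedding produced by a separately trained speaker-verification encoder, concatenating that embedding onto the synthesizer's encoder output at every time step. The sketch below assumes a GE2E-style recipe for the utterance embedding (average the L2-normalised partial embeddings, then re-normalise) and uses cosine similarity to compare speakers. All helper names are hypothetical and are not the Real-Time-Voice-Cloning API.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def utterance_embedding(partial_embeddings):
    """Average partial (per-window) embeddings and re-normalise the result."""
    dim = len(partial_embeddings[0])
    count = len(partial_embeddings)
    mean = [sum(p[i] for p in partial_embeddings) / count for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Similarity used to compare speaker embeddings (1.0 = same direction)."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def condition_encoder_outputs(encoder_outputs, speaker_embedding):
    """Concatenate the same speaker embedding onto every encoder time step."""
    return [list(frame) + list(speaker_embedding) for frame in encoder_outputs]
```

Because only the embedding changes between speakers, a new voice can be cloned from a short reference clip without retraining the synthesizer.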

SV2TTS Toolbox


MelGAN-VC

Paper: MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
Code: marcoppasini/MelGAN-VC
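One simple way to apply a model trained on fixed-width spectrogram segments to arbitrarily long input is to split the spectrogram along time, convert each chunk, and concatenate the results. The sketch below assumes that inference-time scheme with zero-padding of the last chunk; `generator` is a hypothetical stand-in for a trained converter, and the training-time tricks MelGAN-VC uses to keep chunk boundaries coherent are not shown.

```python
def split_time(spec, width):
    """Split spec [freq][time] into chunks of `width` frames, zero-padding the last."""
    n_time = len(spec[0])
    chunks = []
    for start in range(0, n_time, width):
        chunk = []
        for row in spec:
            piece = row[start:start + width]
            chunk.append(piece + [0.0] * (width - len(piece)))
        chunks.append(chunk)
    return chunks, n_time


def join_time(chunks, n_time):
    """Concatenate chunks along time and drop the padding frames."""
    n_freq = len(chunks[0])
    return [sum((c[f] for c in chunks), [])[:n_time] for f in range(n_freq)]


def convert_long(spec, width, generator):
    """Run a fixed-width `generator` over a spectrogram of any length."""
    chunks, n_time = split_time(spec, width)
    return join_time([generator(c) for c in chunks], n_time)


spec = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
doubled = convert_long(spec, 2, lambda c: [[2 * x for x in row] for row in c])
```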


Vocoder-free End-to-End Voice Conversion

Paper: Vocoder-free End-to-End Voice Conversion with Transformer Network
Code: kaen2891/kaen2891.github.io


ConVoice

Paper: ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network
Demo: ConVoice: Real-Time Zero-Shot Voice Style Transfer



This site was last updated June 29, 2024.