A state-of-the-art web UI for Retrieval-based Voice Conversion (RVC) — featuring fast inference, model downloading, voice splitting, training, real-time conversion, and a full command-line interface.
Note
Advanced RVC Inference will no longer receive frequent updates. Going forward, development will focus on security patches, dependency updates, and occasional feature improvements, as the project is already stable and mature. Pull requests are still welcome and will be reviewed.
| Category | Details |
|---|---|
| Voice Inference | Single & batch audio conversion, TTS synthesis, pitch shifting, F0 autotune, formant shifting, audio cleaning, and Whisper-based transcription |
| Audio Separation | Vocal/instrumental isolation using UVR5 models (MDX-Net, Roformer, BS-Roformer), karaoke separation, reverb removal, and denoising |
| Real-Time Conversion | Live microphone voice conversion with VAD (Voice Activity Detection) and low-latency processing |
| Training Pipeline | End-to-end training from dataset creation (YouTube/local), preprocessing, feature extraction, and model training with overtraining detection |
| Model Management | Download models from URLs (HuggingFace, direct links), create .index files, model format conversion, and reference set creation |
| Extra Tools | F0 extraction, voice fusion, SRT subtitle generation, model info reader, and configurable settings |
| CLI | Full command-line interface for all operations — rvc-cli with subcommands for inference, separation, training, and more |
| Downloads Tab | Built-in model and asset downloader accessible directly from the web UI |
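Pitch shifting (listed in the feature table above) is specified in semitones. The corresponding frequency ratio follows the equal-tempered relation `ratio = 2^(semitones/12)`; a minimal sketch of that arithmetic (illustrative only, not project code):

```python
def semitones_to_ratio(semitones: float) -> float:
    # Equal-tempered pitch: each semitone multiplies frequency by 2**(1/12)
    return 2.0 ** (semitones / 12.0)

# +12 semitones = one octave up = double the frequency
print(semitones_to_ratio(12))             # → 2.0
print(round(semitones_to_ratio(-12), 3))  # one octave down → 0.5
```

This is why the examples below use `-p 12` for "one octave up".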
Advanced RVC Inference supports an extensive range of pitch extraction algorithms:
Standard Methods:
rmvpe · crepe-full · fcpe · harvest · pyin · hybrid
Extended Methods (30+):
mangio-crepe-tiny/small/medium/large/full · crepe-tiny/small/medium/large/full · fcpe-legacy · fcpe-previous · rmvpe-clipping · rmvpe-medfilt · hpa-rmvpe · hpa-rmvpe-medfilt · dio · yin · swipe · piptrack · penn · mangio-penn · djcm · swift · pesto · and more
Hybrid Methods (combine two algorithms):
hybrid[pm+dio] · hybrid[pm+crepe-tiny] · hybrid[pm+crepe] · hybrid[pm+fcpe] · hybrid[pm+rmvpe] · hybrid[crepe-tiny+crepe] · hybrid[dio+crepe] · and more combinations
rmvpe is the recommended default for most use cases, offering the best balance of speed and accuracy.
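Hybrid methods combine the per-frame estimates of two extractors into one F0 contour. The project's exact combination logic is not shown here; a common approach, sketched below with hypothetical helper names, is to average the two estimates on frames where both methods detect voicing:

```python
def combine_f0(contour_a, contour_b):
    # Hypothetical hybrid combination: average the two estimates on frames
    # where both methods detect voicing (f0 > 0); treat a frame as unvoiced
    # (f0 = 0) if either method reports it as unvoiced.
    combined = []
    for a, b in zip(contour_a, contour_b):
        if a > 0 and b > 0:
            combined.append((a + b) / 2.0)
        else:
            combined.append(0.0)
    return combined

pm    = [110.0, 112.0, 0.0,   220.0]  # e.g. contour from "pm"
rmvpe = [112.0, 110.0, 111.0, 222.0]  # e.g. contour from "rmvpe"
print(combine_f0(pm, rmvpe))  # → [111.0, 111.0, 0.0, 221.0]
```

Averaging smooths out per-method octave errors, which is why hybrids like `hybrid[pm+rmvpe]` can be more robust than either method alone.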
- Python 3.10, 3.11, or 3.12
- PyTorch ≥ 2.3.1 (with CUDA support recommended for GPU acceleration)
- FFmpeg installed and available in your system PATH
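A quick way to confirm your interpreter meets the Python requirement above (a standalone check, not part of the package):

```python
import sys

def python_supported(version=sys.version_info[:2]):
    # Supported interpreters per the requirements above: 3.10, 3.11, 3.12
    return (3, 10) <= tuple(version) <= (3, 12)

print("Python %d.%d supported:" % sys.version_info[:2], python_supported())
```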
```bash
pip install git+https://github.com/ArkanDash/Advanced-RVC-Inference.git
```

For GPU-accelerated ONNX inference, also install the GPU build of ONNX Runtime:

```bash
pip install onnxruntime-gpu
```

Alternatively, install from source:

```bash
git clone https://github.com/ArkanDash/Advanced-RVC-Inference.git
cd Advanced-RVC-Inference
pip install -r requirements.txt
```

Click the badge below to open the notebook directly in Colab — everything installs and runs with a single click:
Launch the Gradio web UI — this is the easiest way to get started:
```bash
# Using the GUI entry point
rvc-gui

# Or via Python module
python -m advanced_rvc_inference.app.gui

# With a public share link
python -m advanced_rvc_inference.app.gui --share
```

The web interface will be available at http://localhost:7860 by default.
The rvc-cli tool provides full access to all features directly from the terminal. For the complete command reference, see the CLI Guide.
```bash
# Show all available commands
rvc-cli --help
```

```bash
# Basic conversion
rvc-cli infer -m model.pth -i input.wav -o output.wav

# With pitch shift (one octave up = +12 semitones)
rvc-cli infer -m model.pth -i input.wav -p 12 -o output.wav

# With a specific F0 method and format
rvc-cli infer -m model.pth -i input.wav --f0_method crepe-full -f flac
```

```bash
# Separate vocals from instrumental
rvc-cli uvr -i song.mp3

# Use a specific UVR model
rvc-cli uvr -i song.mp3 --model BS-Roformer
```

```bash
# Download from HuggingFace or direct URL
rvc-cli download -l "https://un5nj085u7ht3exwhj5g.julianrbryant.com/user/model/resolve/main/model.pth"
```

```bash
# Show system info, GPU status, and installed models
rvc-cli info
rvc-cli list-models
rvc-cli list-f0-methods
```

The Gradio web interface is organized into several tabs, each dedicated to a specific workflow:
The main workspace for voice conversion. Supports single file conversion, batch processing on folders, audio separation (UVR5), Whisper-based transcription, and TTS synthesis. Fine-tune parameters like pitch shift, filter radius, index rate, F0 method, formant shifting, audio cleaning, and more.
Perform live voice conversion using your microphone. Configure input/output devices, pitch, and conversion parameters for real-time processing with minimal latency.
Complete training pipeline accessible from the web UI:
- Create Dataset — Build training data from YouTube URLs or local audio files, with optional vocal separation and cleaning
- Create Reference — Generate reference audio sets for improved inference quality
- Train — Train RVC models with configurable epochs, batch size, optimizer, overtraining detection, and more
Built-in model and asset downloader. Paste URLs from HuggingFace or other sources to download models directly into the correct directory.
Additional utilities:
- Model Reader — Inspect model metadata and configuration
- Model Converter — Convert between model formats (v1/v2, PyTorch/ONNX)
- F0 Extract — Extract pitch contours from audio files
- Fusion — Blend two voice models together
- SRT Generator — Create subtitle files from audio
- Settings — Configure application preferences
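The Fusion tool blends two voice models into one. Conceptually this amounts to a weighted average of the two models' parameters; a toy illustration using plain dicts in place of checkpoint state dicts (the project's actual fusion logic may differ):

```python
def fuse_models(state_a, state_b, alpha=0.5):
    # Weighted blend of two parameter dicts: alpha * A + (1 - alpha) * B.
    # Real RVC checkpoints hold tensors; plain floats keep this sketch
    # dependency-free.
    assert state_a.keys() == state_b.keys(), "models must share architecture"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

model_a = {"layer.weight": 1.0, "layer.bias": 0.0}
model_b = {"layer.weight": 3.0, "layer.bias": 2.0}
print(fuse_models(model_a, model_b, alpha=0.25))
# → {'layer.weight': 2.5, 'layer.bias': 1.5}
```

Blending only works between models with identical architectures, which is why the sketch checks that both key sets match.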
```
Advanced-RVC-Inference/
├── advanced_rvc_inference/
│   ├── app/
│   │   ├── gui.py            # Main entry point & Gradio app
│   │   └── tabs/
│   │       ├── inference/    # Inference, separation, TTS, Whisper
│   │       ├── realtime/     # Real-time mic conversion
│   │       ├── training/     # Dataset creation, extraction, training
│   │       ├── downloads/    # Model downloader tab
│   │       └── extra/        # Extra tools (fusion, SRT, settings, etc.)
│   ├── api/
│   │   └── cli.py            # Full CLI interface (rvc-cli)
│   ├── configs/              # Model configs (v1, v2, ringformer, etc.)
│   ├── core/                 # Core utilities (UI, process, restart)
│   ├── library/              # ML backends (predictors, embedders, ONNX)
│   ├── rvc/
│   │   ├── infer/            # Inference engine & audio conversion
│   │   ├── realtime/         # Real-time voice conversion
│   │   └── train/            # Preprocessing, extraction, training
│   ├── uvr/                  # UVR5 audio separation library
│   └── utils/                # Shared variables & utilities
├── Advanced-RVC.ipynb        # Google Colab notebook
├── rvc-cli.sh                # CLI wrapper script
├── requirements.txt          # Python dependencies
└── pyproject.toml            # Package configuration
```
Make sure you have the CUDA toolkit installed and PyTorch built with CUDA support:

```bash
# Install PyTorch with CUDA 11.8
pip install torch torchvision torchaudio --index-url https://un5n68bzwetvqf6gvupwuh4k1eja2.julianrbryant.com/whl/cu118

# Install PyTorch with CUDA 12.1
pip install torch torchvision torchaudio --index-url https://un5n68bzwetvqf6gvupwuh4k1eja2.julianrbryant.com/whl/cu121
```

Verify your GPU is detected:

```bash
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

FFmpeg is required for audio processing. Install it via your package manager:
```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows — download from https://un5pe2rk7afd6zm5.julianrbryant.com/download.html and add to PATH
```

If you encounter OOM (out-of-memory) errors during inference or training, try the following:

- CLI: Add `--checkpointing` to your command
- Web UI: Enable the "Checkpointing" toggle in the inference tab
- Reduce batch size during training
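Checkpointing saves memory by storing only a subset of intermediate activations and recomputing the rest on demand. A dependency-free toy illustrating the trade-off (conceptual only, not the project's implementation):

```python
def toy_layer(x, i):
    # Stand-in for a network layer: cheap, deterministic arithmetic
    return x * 2 + i

def forward_full(x, n_layers):
    # Baseline: every intermediate activation stays in memory
    acts = [x]
    for i in range(n_layers):
        acts.append(toy_layer(acts[-1], i))
    return acts[-1], len(acts)          # (output, activations stored)

def forward_checkpointed(x, n_layers, every=4):
    # Keep only every `every`-th activation; anything in between can be
    # recomputed from the nearest checkpoint, trading compute for memory
    stored = 1                          # the input itself
    cur = x
    for i in range(n_layers):
        cur = toy_layer(cur, i)
        if (i + 1) % every == 0:
            stored += 1
    return cur, stored

out_a, mem_a = forward_full(1.0, 12)
out_b, mem_b = forward_checkpointed(1.0, 12)
print(out_a == out_b, mem_a, mem_b)    # same output, 13 vs 4 activations
```

The output is identical either way; only the memory/compute balance changes, which is why checkpointing helps with OOM at the cost of slower runs.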
```bash
# If FAISS fails on Python 3.12+
pip install faiss-cpu --upgrade

# If ONNX Runtime causes issues on macOS
pip install onnxruntime --upgrade

# For NVIDIA GPUs, ensure the GPU variant of ONNX Runtime
pip install onnxruntime-gpu
```

Contributions are welcome! Whether it's bug fixes, new features, or documentation improvements, feel free to open a pull request. Please ensure your changes pass any existing tests and follow the project's coding conventions.
The use of the converted voice for the following purposes is strictly prohibited:

- Criticizing or attacking individuals
- Advocating for or opposing specific political positions, religions, or ideologies
- Publicly posting strongly stimulating (explicit) content without appropriate content warnings or age restrictions
- Selling voice models or generated voice clips
- Impersonating the original owner of the voice with malicious intent
- Fraudulent uses such as identity theft or scam phone calls
This project builds upon the work of several open-source repositories and their contributors:
| Repository | Owner | Purpose |
|---|---|---|
| Vietnamese-RVC | Phạm Huỳnh Anh | Core RVC implementation |
| Applio | IAHispano | UI/UX inspiration & components |
| python-audio-separator | Nomad Karaoke | UVR5 audio separation |
| whisper | OpenAI | Speech-to-text transcription |
| BigVGAN | Nvidia | Vocoder implementation |
This project is licensed under the MIT License — see the LICENSE file for details.
- GitHub: ArkanDash/Advanced-RVC-Inference
- Discord: Join the community
- CLI Guide: Wiki - CLI Guide
- Issues: Report a bug