MagicMixTTS
$25
MercyfulKing
Overview:
MagicMix TTS is a desktop application designed for synthesizing text into speech using various Text-to-Speech (TTS) engines. It provides a user-friendly graphical interface (GUI) to manage voices, input text, control playback, and save the generated audio. Its standout feature is the ability to use different TTS technologies, including the unique voice mixing capabilities of the Kokoro engine and the voice cloning features of OpenVoice V2 with multi-lingual base speakers via MeloTTS.
Core Features:
Text-to-Speech Synthesis: Converts provided text into audible speech.
Multiple TTS Engine Support: Integrates different TTS engines, offering diverse voice options and capabilities.
Real-time Audio Playback: Streams generated audio for immediate playback.
Audio Output Saving: Saves the generated speech as a WAV audio file.
Playback Controls: Offers standard controls including Start, Stop, Pause, and Resume.
Voice Management: Allows selection and configuration of voices depending on the chosen engine.
Speech Speed Control: Allows adjusting speech speed using a slider.
Text Input Flexibility: Accepts text via direct typing, pasting, or uploading from files.
User-Friendly Interface: Provides a graphical interface built with PySide6 for intuitive operation.
Supported TTS Engines:
1. Kokoro Engine:
Voice Mixing: Allows selecting multiple built-in voices and assigning weights (using sliders) to create unique voice mixes ("Voice Formulas").
Predefined Voices: Comes with a set of predefined voices categorized by accent/language (US, UK, ES, FR, IN, IT, BR) and gender.
Favorites System: Users can save frequently used voice mixes (formulas) with custom aliases for easy recall. Favorites can be loaded and deleted.
Streaming Playback: Leverages RealtimeTTS for efficient streaming.
Speech Speed Control: Allows adjusting speech speed using a slider.
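As an illustration of what a weighted "Voice Formula" could mean numerically, the sketch below normalizes slider values and takes a weighted average of per-voice vectors. This is an assumption made for illustration only: the listing does not say how Kokoro combines voices internally, and the function names, the toy two-element vectors, and the voice name `Adam` are hypothetical.

```python
def normalize_weights(weights):
    """Scale non-negative slider values so the kept weights sum to 1.0."""
    positive = {name: w for name, w in weights.items() if w > 0}
    total = sum(positive.values())
    if total <= 0:
        raise ValueError("at least one voice needs a positive weight")
    return {name: w / total for name, w in positive.items()}


def blend_voices(embeddings, weights):
    """Weighted average of equal-length voice vectors (hypothetical model)."""
    shares = normalize_weights(weights)
    length = len(next(iter(embeddings.values())))
    mixed = [0.0] * length
    for name, share in shares.items():
        vector = embeddings[name]
        if len(vector) != length:
            raise ValueError(f"voice {name!r} has a different vector length")
        for i, value in enumerate(vector):
            mixed[i] += share * value
    return mixed


# Toy data: real voice representations would be much larger.
embeddings = {"Bella": [1.0, 0.0], "Adam": [0.0, 1.0]}
mix = blend_voices(embeddings, {"Bella": 75, "Adam": 25})
```

With the toy data, a 75/25 slider split yields a vector that sits three quarters of the way toward the first voice.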
2. OpenVoice V2 Engine:
Voice Cloning: Specializes in cloning voices from reference audio files. Users select a single `.wav` file to define the target voice timbre.
Multi-Lingual/Accent Base: Uses MeloTTS to provide the base speech in various languages/accents (English US/UK/AU/IN, Spanish, French, Chinese, Japanese, Korean). Select the desired base language/accent from a dropdown.
Custom Voice Support: Allows users to upload their own `.wav` files to be used as reference voices. Note: User-provided `.wav` files should contain at least 10 seconds of speech.
Chunked Generation: Generates audio in sentence chunks internally for efficient processing and streaming playback.
Pause/Resume: Supports pausing (typically after the current sentence finishes generating) and resuming playback.
Speech Speed Control: Allows adjusting speech speed using a slider.
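The sentence-chunked generation described above can be pictured with a simple splitter. This is a minimal sketch that assumes sentences end in `.`, `!`, or `?` followed by whitespace; the application's actual chunking rules are not documented here, and `split_sentences` is a hypothetical name.

```python
import re


def split_sentences(text):
    """Split text after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


chunks = split_sentences("Hello there. How are you? Fine!")
for chunk in chunks:
    # A real engine would synthesize and stream each chunk here;
    # a pause request could be honored between iterations.
    print(chunk)
```

Processing one sentence at a time is also what makes the pause point fall on a sentence boundary: the loop can only stop between chunks.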
Voice Management Details:
Kokoro: Multi-select voices with sliders. Favorites: save a mix as a favorite, or delete one, via the UI.
OpenVoice V2: Single-select reference `.wav`. Upload New Voice or Delete via UI. Select base language/accent via dropdown.
Input Options:
Direct Input: Type or paste text directly into the main text area.
File Upload: Load text content from various file formats:
-- Plain Text (`.txt`)
-- PDF (`.pdf`)
-- Microsoft Word Documents (`.docx`, `.doc`)
Clear Text: A button is provided to quickly clear the input text area.
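Loading text from the listed file types usually comes down to dispatching on the file extension. The sketch below is one hedged way to do that, not the application's own code: `load_text` is a hypothetical helper, and the third-party packages `pypdf` and `python-docx` are assumptions that are imported only when a matching file is opened. Legacy `.doc` files are left unhandled in this sketch because `python-docx` reads only `.docx`.

```python
from pathlib import Path


def load_text(path):
    """Return the text content of a .txt, .pdf, or .docx file."""
    path = Path(path)
    ext = path.suffix.lower()
    if ext == ".txt":
        return path.read_text(encoding="utf-8")
    if ext == ".pdf":
        from pypdf import PdfReader  # third-party; assumed to be installed
        reader = PdfReader(str(path))
        # extract_text() may return None or "" for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if ext == ".docx":
        import docx  # python-docx; assumed to be installed
        document = docx.Document(str(path))
        return "\n".join(p.text for p in document.paragraphs)
    if ext == ".doc":
        raise NotImplementedError("legacy .doc is not handled in this sketch")
    raise ValueError(f"unsupported file type: {ext!r}")
```

Keeping the optional imports inside the branches means plain-text loading works even when the PDF and Word libraries are missing.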
Playback Controls:
Start Playback: Initiates the TTS process and audio playback using the selected engine, voice(s), and input text.
Stop/Cancel Playback: Immediately halts audio playback and cancels the ongoing TTS generation process.
Pause Playback: Temporarily suspends audio playback. For OpenVoice V2, pausing typically occurs after the current sentence is generated and played.
Resume Playback: Continues playback from where it was paused.
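The four controls above behave like a small state machine. The following sketch shows one plausible set of transitions; the state names, the `next_state` helper, and the rule that invalid actions are ignored are all assumptions for illustration rather than the application's documented behavior.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PLAYING = "playing"
    PAUSED = "paused"


# (current state, action) -> new state
TRANSITIONS = {
    (State.IDLE, "start"): State.PLAYING,
    (State.PLAYING, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.PLAYING,
    (State.PLAYING, "stop"): State.IDLE,
    (State.PAUSED, "stop"): State.IDLE,
}


def next_state(state, action):
    """Return the new state, or the unchanged state for an invalid action."""
    return TRANSITIONS.get((state, action), state)


state = State.IDLE
for action in ["start", "pause", "resume", "stop"]:
    state = next_state(state, action)
```

A table like this is also a convenient place to drive which buttons are enabled: an action is available only if the current state has an entry for it.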
Output:
Real-time Audio: Audio is played through the system's default sound output device.
Saved WAV Files: Every successful playback session automatically saves the generated audio to a `.wav` file.
-- Location: Files are saved in an `output` directory.
-- Naming: Filenames are automatically generated based on the voice name(s) and a timestamp (e.g., `Bella_20231027_103000.wav` or `CustomVoice_20231027_103500.wav`).
-- Access: A dedicated "Open Output Folder" button in the MixMenu allows easy access to the saved audio files.
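The naming pattern above can be reproduced with a few lines of standard-library code. This is a sketch only: `output_path` is a hypothetical helper, and joining several voice names with underscores is an assumption, since the listing shows examples for a single name.

```python
from datetime import datetime
from pathlib import Path


def output_path(voice_names, when=None, out_dir="output"):
    """Build <out_dir>/<voices>_<YYYYMMDD_HHMMSS>.wav."""
    when = when or datetime.now()
    stem = "_".join(voice_names)
    return Path(out_dir) / f"{stem}_{when:%Y%m%d_%H%M%S}.wav"


example = output_path(["Bella"], datetime(2023, 10, 27, 10, 30, 0))
print(example.name)
```

For the timestamp used here, the file name matches the first example in the list above, `Bella_20231027_103000.wav`.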
User Interface (UI):
Slide-Out Menu ("MixMenu"): A collapsible side panel containing:
-- Engine selection dropdown.
-- Voice selection list (adapts to the selected engine).
-- Kokoro-specific features (Favorites dropdown, Save/Delete Favorite buttons).
-- OpenVoice V2-specific features (Select Base Language/Accent dropdown, Upload/Delete New Voice button).
-- "Open Output Folder" button.
-- Speech Speed slider.
Visual Feedback: Buttons change appearance (color, text) to indicate state (enabled, disabled, playing, stopping, pausing).
Theme: Features a dark color scheme for comfortable viewing.
Technical Requirements:
- OS: Windows 10/11 recommended.
- CPU: Modern multi-core recommended (i5/Ryzen 5 or better). On CPU, Kokoro runs faster than OpenVoice V2.
- RAM: 8 GB minimum; 16 GB+ recommended.
- GPU: **NVIDIA GPU w/ CUDA (6GB+ VRAM Recommended)** for significantly faster performance, especially OpenVoice V2. CPU-only operation is possible but much slower.
- Storage: ~12 GB free space (app, models, outputs). SSD recommended.
MagicMixTTS installer files: run the `.exe` file to install.