MagicMixTTS Pro (DEMO/TRIAL VERSION)

==================================
MagicMixTTS Pro (DEMO/TRIAL VERSION)
==================================

DEMO/TRIAL VERSION LIMITATIONS:

- Speech to Text: Disabled.
- File Saving: Disabled.
- AI Chat: Disabled
- Speed Slider: Disabled.
- Upload New Voices: Disabled.
- Voices per Engine: Limit of 2.
- Character limit: 200 characters.

Overview:

MagicMixTTS Pro is a versatile desktop application for synthesizing text into high-quality speech across a wide range of languages and accents. It features a user-friendly, tabbed interface to manage voices, control playback, and organize generated audio. Its standout feature is the integration of multiple TTS engines, offering unique capabilities from the creative voice mixing of the Kokoro engine to the advanced voice cloning and multilingual synthesis of OpenVoice and CoquiTTS. The application is now enhanced with an AI Chat Assistant for generating conversational text and an Audio Library for managing your creations, making it a comprehensive tool for global voice content creation. Additionally, MagicMixTTS Pro includes a Speech-to-Text feature integrated into both the Text to Speech and AI Chat tabs, allowing users to input text via voice for seamless interaction.

Core Features:

Tabbed Interface: Separate, dedicated workspaces for Text-to-Speech, AI Chat, and the Audio Library.
Multiple TTS Engines: Integrates Kokoro, OpenVoice, F5-TTS, StyleTTS2, Chatterbox, and CoquiTTS.
Multilingual and Multi-Accent Support: Generate speech in numerous languages and accents. Selectable base languages in OpenVoice V2 (e.g., Spanish, French, Chinese, Japanese, Korean) and broad language support in CoquiTTS allow for international voice generation.
AI Chat Assistant: Generate text using powerful local AI models (GGUF format).
Audio & Model Library Management: Play, rename, and delete your saved audio files and manage cached AI models from within the app.
Real-time Audio Playback: Streams generated audio for immediate playback.
Audio Output Saving: Saves generated speech as a WAV audio file.
Advanced Playback Controls: Offers Start, Stop, and Pause/Resume controls.
Voice Management: Allows selection and configuration of voices for each engine.
Speech Speed Control: Adjust speech speed for most engines.
Text Input Flexibility: Accepts text via direct typing, pasting, or uploading from .txt, .pdf, and .docx files.

AI Chat Assistant:

Local AI Models: Run GGUF AI models locally on your machine. Pre-configured for models like Phi-4 Mini and Llama-3.2.
Custom Model Support: Load your own custom GGUF models from HuggingFace repositories.
Interactive Chat: Engage in conversation with the AI in a familiar chat interface.
System Prompt: Set a system prompt to guide the AI's responses, enhancing context and relevance. (i.e., "You are a helpful assistant.", "You are an experienced writer.", etc.)
Parameter Control: Adjust AI response creativity with a Temperature slider and manage conversation memory with a selectable Context Size.
Seamless Integration:

AI Response -> TTS Input: Instantly send the AI's response to the main TTS tab for modification or synthesis in a different voice.
Speak AI Response: Toggle on to have the AI's responses automatically spoken using the currently selected TTS engine and voice.
Speech-to-Text Input: Use the integrated microphone button to input text via voice directly into the AI Chat tab.
Start Conversation: Speak directly into the AI Chat tab and have a dialog with the AI. (Note: Speak into the microphone, when silence is detected your text will be sent to the AI.)

Audio Library & Model Cache:
Dual-Pane View: The "Library" tab is split into two sections for comprehensive asset management.
Audio Library (Top Pane):

Centralized Management: Provides a list of all your generated audio files.
Detailed View: View file name, creation date, and audio length at a glance.
Direct Playback: Play, pause, and stop any audio file directly from the library list.
File Operations: Right-click or use dedicated buttons to Play, Rename or Delete files.

AI Model Cache (Bottom Pane):

Disk Space Management: Lists all downloaded GGUF models from the AI Chat tab, showing their repository name and disk size.
Delete Cached Models: Safely delete entire cached model repositories to free up disk space. This is especially useful as AI models can be very large.

Supported TTS Engines:

Kokoro Engine:
Voice Mixing: Select multiple built-in voices and assign weights (using sliders) to create unique voice mixes ("Voice Formulas").
Predefined Voices: Comes with a set of predefined voices categorized by country (US, UK, ES, FR, IN, IT, BR) accent/language and gender icons.
Favorites System: Save frequently used voice mixes (formulas) with custom aliases for easy recall.
Spoken Word Highlighting: Visually highlights words in the input text as they are spoken.
Speech Speed Control: Allows adjusting speech speed using a slider.
Speech-to-Text Input: Use the integrated microphone button to input text via voice directly into the Text-to-Speech tab.

OpenVoice V2 Engine:
Voice Cloning: Specializes in cloning voices from reference audio files. Users select a single .wav file to define the target voice timbre.
Multi-Lingual/Accent Base: Uses MeloTTS to provide the base speech in various languages/accents (English US/UK/AU/IN, Spanish, French, Chinese, Japanese, Korean).
Custom Voice Support: Allows users to upload their own .wav files to be used as reference voices. *Note: User provided '.wav' should be 10 seconds of speech for optimum results.
Speech Speed Control: Allows adjusting speech speed using a slider.

F5-TTS Engine:
Robust Synthesis & Cloning: Provides reliable text-to-speech output and supports voice cloning from user-provided .wav reference audio files. *Note: User provided '.wav' should be 10 seconds of speech for optimum results.
Open Model: Utilizes an Apache 2.0 licensed model (OpenF5-TTS-Base) for broad usability.
Speech Speed Control: Allows adjusting speech speed using a slider.

StyleTTS2 Engine:
Expressive & Stylistic Synthesis: Generates highly natural, expressive, and emotionally rich speech.
Advanced Voice Cloning: Clone voices from short reference .wav files, capturing both timbre and speaking style. *Note: User provided '.wav' should be 10 seconds of speech for optimum results.
Fine-grained Control: Adjust Alpha (timbre), Beta (prosody), Diffusion Steps (audio refinement), and Embedding Scale (style strength) for detailed customization.
Multi-speaker Support: Works with a wide range of English voices and accents.
Speech Speed Control: Allows adjusting speech speed using a slider.

Chatterbox Engine:
Voice Cloning: Clones voices from reference audio files (.wav). Select a single .wav file. *Note: User provided '.wav' should be 10 seconds of speech for optimum results.
Parameters: Adjust Exaggeration (emotion/style strength), CFG Weight (guidance scale), and Temperature (randomness) via sliders.
Custom Voice Support: Allows users to upload their own .wav files.
General Use:

The default settings (exaggeration=0.5, cfg_weight=0.5, temperature=0.8) work well for most prompts.
Exaggeration controls the emotional intensity of the speech. Higher values (0.7 or above) produce more expressive speech, while lower values (0.3 or below) yield more neutral tones.
CFG Weight adjusts how closely the output follows the prompt. If the reference speaker has a fast speaking style, lowering cfg_weight to around 0.3 can improve pacing.

Expressive or Dramatic Speech:

For more expressive or dramatic speech, increase the exaggeration to around 0.7 or higher and lower cfg_weight values (e.g. ~0.3).
Higher exaggeration tends to speed up speech; reducing cfg_weight helps compensate with slower, more deliberate pacing.

CoquiTTS Engine:
High-Quality Multilingual Synthesis: Supports many languages and accents using open-source CoquiTTS models.
Voice Cloning: Clone voices from user-provided .wav reference files. *Note: User provided '.wav' should be 10 seconds of speech for optimum results.
Streaming Playback: Real-time audio streaming for immediate feedback.
Speech Speed Control: Adjustable speech speed for flexible output.
Disclaimer: The CoquiTTS model is licensed under the Coqui Public Model License, which prohibits commercial use of the model and its outputs. By installing this application, you agree to Coqui's license terms. The creator of this software does not distribute the CoquiTTS model with the application bundle. The application code is licensed under the Mozilla Public License Version 2.0, but you must comply with the Coqui model license for any use of CoquiTTS.

Technical Requirements:

OS: Windows 10/11 recommended.
CPU: Modern multi-core recommended (i5/Ryzen 5 or better).
RAM: 8GB minimum, 16GB+ Strongly Recommended for optimal performance with multiple engines and AI Chat.
GPU: NVIDIA GPU w/ CUDA (8GB+ VRAM Recommended) for significantly faster performance with all engines.
Storage: ~15 GB free space (application, all models, dependencies, and outputs). SSD recommended.

Name a fair price:

I want this!

5 downloads

MixMagicTTS DEMO/TRIAL VERSION installer files - just click the exe file.

No refunds allowed