OpenSpeech: Self-Hosted Text-to-Speech Made Simple
OpenSpeech is a powerful, self-hosted web application that transforms text into natural-sounding speech using OpenAI's TTS API. With support for long-form content, multi-format audio export, and an intuitive web interface, OpenSpeech makes text-to-speech conversion accessible and efficient for personal and professional use.
Whether you need to convert articles, documents, or notes into audio format, OpenSpeech provides a seamless experience with advanced features like automatic audio stitching, storage management, and dark mode support.
GitHub Repository: https://github.com/binuengoor/OpenSpeech
Docker Image: ghcr.io/binuengoor/openspeech
Overview
Title: Convert Text to Speech with OpenSpeech - Your Personal TTS Service
Description: A self-hosted Node.js application for converting text to high-quality speech using OpenAI's API, with support for long documents, multiple voices, and flexible audio formats.
Tags: Text-to-Speech, TTS, OpenAI, Docker, Audio Processing, Self-Hosted
Key Features
- Long-Form Text Support: Automatically splits text longer than 4096 characters into chunks and seamlessly stitches them together
- Multiple Voice Options: Access all OpenAI TTS voices with language filtering and easy selection
- Flexible Audio Formats: Export in MP3, Opus, AAC, FLAC, WAV, or PCM formats
- Document Upload: Convert Word documents (.docx) and PDF files directly to speech
- Audio Stitching: Combine multiple audio chunks into a single file using FFmpeg
- Storage Management: Built-in storage monitoring with automatic cleanup options
- Dark Mode: Eye-friendly interface with automatic theme detection
- Speed Control: Adjust the speaking speed of generated audio from 0.25x to 4.0x
- Custom Endpoints: Support for OpenAI-compatible TTS services
- File Management: Download, play, and manage generated audio files directly in the interface
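The long-form splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenSpeech's actual code: the function name splitIntoChunks is hypothetical, and so is the strategy of breaking at the last sentence end that fits (falling back to a space, then to a hard cut). The 4096-character limit is the per-request input limit of OpenAI's speech endpoint.

```typescript
// Hypothetical sketch: split text into chunks of at most maxChars characters,
// breaking after the last sentence end (". ", "! ", "? ") that fits, else at the
// last space, and cutting mid-word only when a chunk contains no space at all.
const MAX_TTS_CHARS = 4096;

function splitIntoChunks(text: string, maxChars = MAX_TTS_CHARS): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    let cut = Math.max(
      head.lastIndexOf(". "),
      head.lastIndexOf("! "),
      head.lastIndexOf("? ")
    );
    if (cut > 0) {
      cut += 1; // keep the punctuation mark with the current chunk
    } else {
      cut = head.lastIndexOf(" ");
      if (cut <= 0) cut = maxChars; // no usable space: hard cut
    }
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each returned chunk would then be sent as a separate TTS request, and the resulting audio files stitched in the same order.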
Installation/Setup
Option 1: Using Docker (Recommended)
Step 1: Pull the Docker Image
docker pull ghcr.io/binuengoor/openspeech:latest
Step 2: Run the Container
docker run -d \
  -p 3000:3000 \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/output:/app/output" \
  -e NODE_ENV=production \
  --name openspeech \
  ghcr.io/binuengoor/openspeech:latest
Step 3: Access the Application
Open your browser and navigate to http://localhost:3000
Option 2: Using Docker Compose
Step 1: Create docker-compose.yml
services:
  openspeech:
    image: ghcr.io/binuengoor/openspeech:latest
    container_name: openspeech
    ports:
      - "3000:3000"
    volumes:
      - ./data:/app/data
      - ./output:/app/output
    environment:
      - NODE_ENV=production
    restart: unless-stopped
Step 2: Start the Service
docker-compose up -d
Option 3: Build from Source
Step 1: Clone the Repository
git clone https://github.com/binuengoor/OpenSpeech.git
cd OpenSpeech
Step 2: Install Dependencies
npm install
Step 3: Configure Settings
Copy the example settings file:
cp data/settings.example.json data/settings.json
Step 4: Start the Application
npm start
The application will be available at http://localhost:3000
Usage Guide
Initial Configuration
- Open Settings: Click the ⚙️ settings icon in the sidebar
- Enter API Key: Add your OpenAI API key
- Select Service Type: Choose OpenAI or a custom endpoint
- Test Connection: Verify your settings work correctly
- Save Settings: Your configuration is stored securely
Converting Text to Speech
- Enter Text: Type or paste your text into the main input area (supports any length)
- Select Voice: Choose from available voices and filter by language if desired
- Adjust Speed: Set the speaking speed between 0.25x and 4.0x
- Choose Format: Select your preferred audio format (MP3, Opus, AAC, etc.)
- Enable Stitching: Toggle "Combine audio chunks" for seamless long-form audio
- Generate: Click "Generate Speech" and wait for processing
- Play/Download: Use the built-in player or download the audio file
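Behind the Generate button, each chunk becomes one request to a speech endpoint. The sketch below shows how the options above could map onto OpenAI's POST /v1/audio/speech request body (model, input, voice, response_format, speed). The buildSpeechRequest helper, its option names, and the default model tts-1 are illustrative assumptions, not OpenSpeech's internals; for a custom endpoint only the base URL changes, assuming the service is OpenAI-compatible.

```typescript
// Hypothetical helper: map the UI choices onto a request for an
// OpenAI-compatible speech endpoint. Names and defaults are illustrative.
type AudioFormat = "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";

interface SpeechOptions {
  apiKey: string;
  voice: string;        // e.g. "alloy"
  input: string;        // a single chunk of text
  format: AudioFormat;  // sent as response_format
  speed: number;        // 0.25 to 4.0
  baseUrl?: string;     // custom OpenAI-compatible service (assumed shape)
  model?: string;       // assumed default: "tts-1"
}

interface SpeechRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildSpeechRequest(opts: SpeechOptions): SpeechRequest {
  if (opts.speed < 0.25 || opts.speed > 4.0) {
    throw new RangeError(`speed must be between 0.25 and 4.0, got ${opts.speed}`);
  }
  if (opts.input.length === 0 || opts.input.length > 4096) {
    throw new RangeError("input must be 1 to 4096 characters; split longer text first");
  }
  // Strip trailing slashes so "https://host/" and "https://host" behave alike.
  const base = (opts.baseUrl || "https://api.openai.com").replace(/\/+$/, "");
  return {
    url: `${base}/v1/audio/speech`,
    headers: {
      Authorization: `Bearer ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: opts.model || "tts-1",
      input: opts.input,
      voice: opts.voice,
      response_format: opts.format,
      speed: opts.speed,
    }),
  };
}
```

With Node.js 18 or later, the result could be sent with the built-in fetch, as fetch(req.url, { method: "POST", headers: req.headers, body: req.body }); the response body is the encoded audio for that chunk.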
Document Upload
- Click Upload Button: Located above the text input area
- Select File: Choose a .docx or .pdf document
- Automatic Extraction: Text is extracted and populated in the input field
- Generate Speech: Follow the standard text-to-speech workflow
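A sketch of what could happen between upload and the populated input field, assuming the server picks a parser by file extension (Mammoth for .docx, pdf-parse for .pdf, as listed under Technical Details) and then tidies the raw text so that text extracted from PDFs reads as continuous paragraphs. Both function names are hypothetical, and the calls into Mammoth and pdf-parse themselves are omitted.

```typescript
// Hypothetical sketch: choose a parser by file extension and tidy the
// extracted text before it is placed in the input field.
type Extractor = "mammoth" | "pdf-parse";

function pickExtractor(fileName: string): Extractor | null {
  const lower = fileName.toLowerCase();
  if (lower.endsWith(".docx")) return "mammoth";
  if (lower.endsWith(".pdf")) return "pdf-parse";
  return null; // unsupported type: reject the upload
}

function normalizeExtractedText(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n")            // normalize Windows/old Mac line endings
    .replace(/(\w)-\n(\w)/g, "$1$2")    // rejoin words hyphenated across lines (may also merge real compounds)
    .replace(/([^\n])\n(?!\n)/g, "$1 ") // single newline inside a paragraph becomes a space
    .replace(/[ \t]+/g, " ")            // collapse runs of spaces and tabs
    .replace(/\n{3,}/g, "\n\n")         // keep at most one blank line between paragraphs
    .trim();
}
```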
Storage Management
Monitor your storage usage in the sidebar:
- View used space and file count
- Set maximum storage limits
- Enable automatic cleanup when the threshold is reached
- Manually delete individual files or clear all at once
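The automatic cleanup option can be pictured as a small selection step. This sketch assumes cleanup removes the oldest files first (Advanced Settings describes it as removing old files) and that the threshold is a percentage of the maximum storage, as CLEANUP_THRESHOLD=90 suggests; selectFilesToDelete and StoredFile are hypothetical names.

```typescript
// Hypothetical sketch of threshold-based cleanup: once usage reaches
// thresholdPercent of maxBytes, pick the oldest files until usage drops
// back below the threshold.
interface StoredFile {
  name: string;
  sizeBytes: number;
  createdAt: number; // epoch milliseconds
}

function selectFilesToDelete(
  files: StoredFile[],
  maxBytes: number,
  thresholdPercent: number
): StoredFile[] {
  const limit = (maxBytes * thresholdPercent) / 100;
  let used = files.reduce((sum, f) => sum + f.sizeBytes, 0);
  if (used < limit) return []; // threshold not reached: nothing to do
  const oldestFirst = files.slice().sort((a, b) => a.createdAt - b.createdAt);
  const victims: StoredFile[] = [];
  for (const f of oldestFirst) {
    if (used < limit) break;
    victims.push(f);
    used -= f.sizeBytes;
  }
  return victims;
}
```

With a 500 MB limit and a 90% threshold, cleanup starts at 450 MB and stops as soon as usage falls below it, so recent files are kept whenever possible.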
Configuration Options
Environment Variables
# API Configuration
OPENAI_API_KEY=your-api-key-here
CUSTOM_ENDPOINT=https://your-custom-endpoint.com
SERVICE_TYPE=openai
# Storage Settings
MAX_STORAGE_MB=500
AUTO_CLEANUP=true
CLEANUP_THRESHOLD=90
# Server Configuration
PORT=3000
NODE_ENV=production
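One way a Node.js server could read these variables is sketched below. It is not OpenSpeech's loader: the names loadConfig and AppConfig are hypothetical, the defaults simply reuse the example values above (with auto-cleanup off unless set to true, and NODE_ENV defaulting to development), and the environment is passed in as a plain object so the function can be called with process.env or with test values.

```typescript
// Hypothetical loader for the variables above. Pass process.env in Node.js;
// any plain object works, which keeps the function easy to test.
type Env = Record<string, string | undefined>;

interface AppConfig {
  openaiApiKey: string | undefined;
  customEndpoint: string | undefined;
  serviceType: string;
  maxStorageMb: number;
  autoCleanup: boolean;
  cleanupThreshold: number; // percent of maxStorageMb
  port: number;
  nodeEnv: string;
}

// Parse a base-10 integer, falling back when the variable is unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const n = value === undefined ? NaN : parseInt(value, 10);
  return Number.isFinite(n) ? n : fallback;
}

function loadConfig(env: Env): AppConfig {
  return {
    openaiApiKey: env["OPENAI_API_KEY"],
    customEndpoint: env["CUSTOM_ENDPOINT"],
    serviceType: env["SERVICE_TYPE"] || "openai",
    maxStorageMb: intOr(env["MAX_STORAGE_MB"], 500),
    autoCleanup: (env["AUTO_CLEANUP"] || "").toLowerCase() === "true",
    // Clamp the threshold to the 1 to 100 percent range.
    cleanupThreshold: Math.min(100, Math.max(1, intOr(env["CLEANUP_THRESHOLD"], 90))),
    port: intOr(env["PORT"], 3000),
    nodeEnv: env["NODE_ENV"] || "development",
  };
}
```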
Advanced Settings
- Custom Endpoints: Use OpenAI-compatible TTS services
- Storage Limits: Prevent disk space issues with configurable limits
- Auto-Cleanup: Automatically remove old files when storage threshold is reached
- Administrator Locks: Lock certain settings to prevent user modification
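Administrator locks amount to a merge rule: user changes apply everywhere except on locked keys. The sketch below is hypothetical; the Settings fields, applySettings, and the lockedKeys list are illustrative and not OpenSpeech's settings schema.

```typescript
// Hypothetical sketch: locked keys always keep the administrator's value,
// while every other setting may be overridden by the user.
interface Settings {
  voice: string;
  speed: number;
  format: string;
  endpoint: string;
}

// Generic helper so the per-key assignment type-checks for any key.
function copyKey<K extends keyof Settings>(to: Settings, from: Settings, key: K): void {
  to[key] = from[key];
}

function applySettings(
  adminDefaults: Settings,
  userChanges: Partial<Settings>,
  lockedKeys: ReadonlyArray<keyof Settings>
): Settings {
  const merged: Settings = { ...adminDefaults, ...userChanges };
  for (const key of lockedKeys) {
    copyKey(merged, adminDefaults, key); // locked: discard any user change
  }
  return merged;
}
```

The generic copyKey helper exists because TypeScript rejects a direct merged[key] = adminDefaults[key] when key is a union of keys whose values have different types.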
Technical Details
Architecture
- Backend: Node.js with Express.js
- Frontend: Vanilla JavaScript with modern CSS
- Audio Processing: FFmpeg for format conversion and concatenation
- Document Parsing: Mammoth (DOCX) and pdf-parse (PDF)
- Storage: Local file system with metadata tracking
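For the concatenation step, FFmpeg's concat demuxer can join chunks that share a codec without re-encoding (-c copy), and -safe 0 allows absolute paths in the list file. Whether OpenSpeech uses this exact invocation is an assumption; buildConcatJob and the file paths are hypothetical. Raw PCM output has no header for FFmpeg to probe, so it is not covered by this sketch.

```typescript
// Hypothetical sketch: prepare an FFmpeg concat-demuxer job that joins audio
// chunks of the same codec into one file without re-encoding.
interface ConcatJob {
  listFile: string; // contents to write to listPath
  args: string[];   // arguments for the ffmpeg executable
}

// In a concat list, paths go in single quotes; a literal ' becomes '\''.
function quoteForConcatList(path: string): string {
  return `'${path.replace(/'/g, "'\\''")}'`;
}

function buildConcatJob(chunkPaths: string[], listPath: string, outputPath: string): ConcatJob {
  const listFile = chunkPaths.map((p) => `file ${quoteForConcatList(p)}`).join("\n") + "\n";
  const args = [
    "-y",           // overwrite the output file if it exists
    "-f", "concat", // read inputs through the concat demuxer
    "-safe", "0",   // allow absolute paths in the list file
    "-i", listPath,
    "-c", "copy",   // stream copy: no re-encoding
    outputPath,
  ];
  return { listFile, args };
}
```

In Node.js, listFile would be written to listPath and the arguments passed to child_process.spawn("ffmpeg", args).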
Multi-Platform Support
- Docker images built for both linux/amd64 and linux/arm64
- Compatible with Intel/AMD and ARM-based systems (including Apple Silicon)
- All dependencies are pure JavaScript or have pre-built binaries
Security Features
- Secure API key storage
- Environment variable protection
- Sensitive data excluded from version control
- Production-ready with NODE_ENV optimizations
Example Workflow
Scenario: Converting a 10,000-character article to speech
- Paste your article into the text input
- System automatically detects it will be split into ~3 chunks
- Select "Alloy" voice with 1.0x speed
- Choose MP3 format
- Enable "Combine audio chunks"
- Click "Generate Speech"
- OpenSpeech processes each chunk and stitches them together
- Play the seamless audio directly in the browser or download for offline use
Additional Information
Requirements
- Docker (recommended) or Node.js 18+
- OpenAI API key with TTS access
- FFmpeg (included in the Docker image; install it separately when building from source)
- Modern web browser
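When building from source, the Node.js 18+ requirement could be checked at startup. A minimal sketch; meetsNodeRequirement is a hypothetical helper, and in Node.js the version string would come from process.version (for example "v18.19.0").

```typescript
// Hypothetical helper: compare the major version in a Node.js version
// string against the documented minimum (18 by default).
function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  return match !== null && Number(match[1]) >= minMajor;
}
```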
Browser Compatibility
- Chrome/Edge 90+
- Firefox 88+
- Safari 14+
Performance
- Chunk processing: ~2-5 seconds per chunk
- Audio stitching: ~1-2 seconds for typical files
- Storage-efficient with automatic cleanup options
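Applied to the 10,000-character example above, and assuming chunks are generated one after another with the minimum possible number of splits, these figures give a rough end-to-end estimate (a back-of-the-envelope calculation, not a measurement):

```latex
\begin{aligned}
n &= \left\lceil \frac{10\,000}{4096} \right\rceil = \lceil 2.44 \rceil = 3 \\
T &\approx n \times (2\ \text{to}\ 5\,\mathrm{s}) + (1\ \text{to}\ 2\,\mathrm{s}) = 7\ \text{to}\ 17\,\mathrm{s}
\end{aligned}
```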
Customization
- Adjust chunk size for different use cases
- Modify audio quality settings
- Configure storage policies
- Set default voice and speed preferences
Troubleshooting
API Key Issues
- Verify your OpenAI API key has TTS access
- Check for proper key format (OpenAI keys start with sk-)
- Test connection using the built-in connection test
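When the connection test fails, the HTTP status code usually narrows down the cause. The mapping below is a hypothetical troubleshooting aid (explainTtsError and looksLikeOpenAiKey are not part of OpenSpeech), based on the general meaning of these status codes in OpenAI's API; custom endpoints may use codes differently.

```typescript
// Hypothetical troubleshooting helpers: a quick key-prefix sanity check and
// a mapping from common HTTP status codes to likely causes.
function looksLikeOpenAiKey(key: string): boolean {
  return key.trim().startsWith("sk-");
}

function explainTtsError(status: number): string {
  if (status === 401) return "Invalid or missing API key: re-enter it in Settings.";
  if (status === 403) return "The key or account is not permitted to use this model or endpoint.";
  if (status === 429) return "Rate limit or quota exceeded: wait, or check your plan and billing.";
  if (status === 400) return "Bad request: check voice, format, speed, and input length (max 4096 characters).";
  if (status >= 500) return "Server-side error: retry after a short delay.";
  return `Unexpected status ${status}.`;
}
```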
Storage Problems
- Monitor storage usage in the sidebar
- Enable automatic cleanup to prevent disk-full errors
- Manually delete old files from the file management panel
Audio Quality
- Use lossless formats (FLAC, WAV) for best quality
- Adjust speed settings for better comprehension
- Test different voices to find the best match for your content
Future Enhancements
- Support for additional TTS providers (Azure, Google Cloud)
- Batch processing for multiple documents
- Audio editing and trimming capabilities
- Playlist creation and management
- API endpoint for programmatic access
Transform your text into professional-quality speech with OpenSpeech. Deploy in minutes, customize to your needs, and run text-to-speech conversion on your own infrastructure, limited only by your TTS provider's quotas and pricing. Get started today! 🎙️