Pair a Custom AI Voice with Your nocensor.ai Character
Chris · 9 min read

The Gap Between a Face and a Voice
An AI character with a face and no voice has half an identity. The image generation pipeline handles appearance; without a parallel audio system, text replies, live calls, and greeting audio all fall back to platform defaults or stay silent entirely. nocensor.ai's custom AI voice character pairing assigns a persistent ElevenLabs voice directly to the character record, so every mode that generates audio draws from the same source.
A character built on nocensor.ai carries two layers of identity that move together: the visual output generated by the LoRA determines how images and video frames look, and the voice assignment determines what the character sounds like when it speaks. Neither is tied to a hardcoded system character — both can be customized independently and saved to the character permanently.
What Voice + LoRA Pairing Adds to Your AI Character

A character LoRA on nocensor.ai encodes visual identity: facial structure, skin tone, hair, and the stylistic traits that make one generated image recognizable as "the same person" as another. Without a voice assignment, that identity stays purely visual — text chat has no audio component, voice calls use whatever default voice the system falls back to, and greetings play nothing.
Assigning a custom AI voice character changes what the character can do. With a paired voice:
- Text chat responses can be read aloud using the assigned voice model, with optional auto-play on new messages
- Voice calls route through the ElevenLabs ConvAI agent tied to that voice, maintaining the character's vocal personality through a real-time conversation
- Greetings — the introductory audio clip that plays when a user opens a conversation — are generated from the character's voice model, not a generic system voice
The pairing is stored on the character record. Once set, the voice follows the character across every feature that supports audio output — no per-session reconfiguration required.
How nocensor.ai Connects a Voice to a Character LoRA

The connection between voice and visual character is maintained at the character level in nocensor.ai's database. Each character record holds an elevenlabs_voice_id — a reference to a voice created in ElevenLabs Voice Design or cloned from a sample. When any audio-generating feature fires for that character — voice calls, greeting generation, or text-to-speech in chat — it reads the voice ID from the record and passes it directly to the ElevenLabs API.
The ElevenLabs ConvAI agent associated with the character is referenced by a separate elevenlabs_agent_id field. For voice calls specifically, the agent handles real-time conversation, interruption detection, and response generation. The voice model assigned to that agent determines the audio output. When nocensor.ai pairs a voice to a character, it updates both fields together — the voice ID for TTS and greetings, and the agent configuration for live calls.
The visual output (LoRA-generated images) and the audio output (ElevenLabs voice model) are independent systems linked only through the shared character record. Changing the LoRA leaves the voice unchanged; updating the voice ID leaves the image model unchanged. Both can evolve independently as the character's visual or vocal identity is refined.
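In outline, the linkage looks like the sketch below. The record shape and helper wiring are illustrative (nocensor.ai's internal schema is not public), but the elevenlabs_voice_id and elevenlabs_agent_id fields are the ones described above, and the text-to-speech call targets ElevenLabs' public POST /v1/text-to-speech/{voice_id} endpoint.

```typescript
// Illustrative sketch: how a character record could link a LoRA to a voice.
// Field names mirror the article; everything beyond elevenlabs_voice_id and
// elevenlabs_agent_id is an assumption. The TTS call is ElevenLabs' real REST API.

interface CharacterRecord {
  id: string;
  loraId: string;               // visual identity (image generation)
  elevenlabs_voice_id: string;  // audio identity (chat TTS + greetings)
  elevenlabs_agent_id: string;  // ConvAI agent for live voice calls
}

async function speakAsCharacter(
  character: CharacterRecord,
  text: string
): Promise<ArrayBuffer> {
  // Every audio feature reads the same voice ID from the character record
  // and passes it straight to the ElevenLabs API.
  const res = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${character.elevenlabs_voice_id}`,
    {
      method: "POST",
      headers: {
        "xi-api-key": process.env.ELEVENLABS_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
    }
  );
  if (!res.ok) throw new Error(`TTS failed: ${res.status}`);
  return res.arrayBuffer(); // audio/mpeg by default
}
```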
Choosing the Right Voice for Your AI Character

ElevenLabs provides access to a voice library covering hundreds of pre-made options across accent, gender, age, and tone. nocensor.ai passes voice IDs from this library directly — any voice available through ElevenLabs Voice Design or cloned from a reference sample works in the pairing. The question is which voice fits the character's intended personality.
Accent and regional variety. A character built to roleplay as a British academic reads differently when the voice carries a Received Pronunciation accent versus a neutral American broadcast voice. ElevenLabs' library separates voices by language and region — filtering by accent before selecting a voice cuts down the candidate list substantially.
Age and register. ElevenLabs classifies voices by approximate speaker age and vocal register (conversational, narrative, authoritative, expressive). A character intended for casual intimate chat benefits from a conversational-register voice at a natural speaking pace. A character positioned as a mentor or guide suits a measured, lower-register option. Mismatched register — an authoritative voice on a playful character — creates cognitive dissonance that users notice within a few exchanges.
Stability vs. expressiveness. ElevenLabs exposes two settings that shape how a voice performs: stability (higher values produce more consistent, even delivery; lower values allow more expressive variation from one generation to the next) and similarity boost (how closely the output tracks the reference voice). For voice calls, higher stability produces more predictable responses under latency. For recorded greetings, lower stability and higher expressiveness can sound more natural because the output is generated once and played back, not streamed live (see the request sketch after these considerations).
Language match. If the character is intended for French-language interactions — nocensor.ai supports French across the interface — the paired voice must natively support French TTS output. ElevenLabs' multilingual models handle this, but the selected voice must be flagged as supporting the target language to avoid degraded output quality.
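To make the sliders concrete, here is what they look like in an ElevenLabs request body. stability and similarity_boost are the real voice_settings parameters; the specific values are illustrative starting points, not settings documented by nocensor.ai.

```typescript
// voice_settings in an ElevenLabs TTS request body. Values are illustrative:
// a call-oriented profile favors stability; a one-shot greeting can afford
// more expressive variation. model_id "eleven_multilingual_v2" covers
// non-English targets such as French.

const callSettings = {
  model_id: "eleven_multilingual_v2",
  voice_settings: {
    stability: 0.7,        // higher = more consistent delivery while streaming
    similarity_boost: 0.8, // how closely output tracks the reference voice
  },
};

const greetingSettings = {
  model_id: "eleven_multilingual_v2",
  voice_settings: {
    stability: 0.35,       // lower = more expressive; fine for one-shot playback
    similarity_boost: 0.8,
  },
};
```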
Setting Up Voice Pairing in nocensor.ai

Voice pairing is configured from the Characters section on nocensor.ai. The character detail view exposes the voice configuration alongside the LoRA assignment and other character settings.
The process involves three steps:
1. Select an ElevenLabs voice from the available library, or enter a voice ID directly if the voice was created in a separate ElevenLabs account.
2. Save the voice assignment to the character record.
3. Optionally, generate a test greeting to verify the voice output before finalizing the pairing.
The greeting generation step is the fastest way to hear what the assigned voice actually sounds like in the context of the character's persona. The greeting text is passed through the character's system prompt before audio generation, so the output reflects the character's personality and speaking style rather than a generic TTS test phrase. If the voice does not fit after hearing the greeting, the ID can be changed without affecting any other character settings.
After the voice is saved, nocensor.ai propagates the assignment to the ElevenLabs ConvAI agent configuration. Voice calls initiated for that character use the updated agent automatically — no manual agent editing in the ElevenLabs dashboard is required.
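A minimal sketch of what the save step implies, assuming hypothetical platform-side helpers (updateCharacter and syncAgentVoice are illustrative names, not a documented nocensor.ai API):

```typescript
// Hypothetical sketch of the pairing save step. The helpers below are
// illustrative stubs standing in for nocensor.ai's internal persistence
// and ElevenLabs agent-sync logic.
declare function updateCharacter(
  id: string,
  patch: Record<string, string>
): Promise<void>;
declare function syncAgentVoice(
  characterId: string,
  voiceId: string
): Promise<void>;

async function pairVoice(characterId: string, voiceId: string): Promise<void> {
  // 1. Persist the voice ID on the character record
  //    (chat TTS and greeting generation read this field).
  await updateCharacter(characterId, { elevenlabs_voice_id: voiceId });

  // 2. Propagate the same voice to the ConvAI agent so live calls match.
  //    The platform does this automatically; no manual agent editing in
  //    the ElevenLabs dashboard is required.
  await syncAgentVoice(characterId, voiceId);
}
```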
Voice Consistency Across Chats, Voice Calls, and Greetings

A custom AI voice character on nocensor.ai is not a single-feature add-on. The same voice ID surfaces across three separate interaction modes, each of which handles audio differently.
Chat TTS generates a complete audio clip from the text response once the full reply has been written. The voice model processes the entire response text at once, producing a single audio file that plays back in the chat interface. This mode produces the most natural intonation because the model has the full sentence structure before generating audio — no mid-sentence pauses from streaming latency.
Voice calls stream audio in real time through the ElevenLabs ConvAI agent. The agent listens, responds, and speaks using the character's assigned voice, creating a live telephone-style conversation rather than a turn-based text exchange. Latency is a function of network conditions and ElevenLabs server load — nocensor.ai introduced regional routing for Europe and India that reduces round-trip time for users outside North America.
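For a private ConvAI agent, ElevenLabs issues a short-lived signed WebSocket URL that the client connects to. The sketch below uses ElevenLabs' documented signed-URL route; how nocensor.ai brokers the API key server-side is an assumption.

```typescript
// Sketch: opening a ConvAI session for a character's agent. The signed-URL
// endpoint is ElevenLabs' documented route for private agents; the
// surrounding wiring is illustrative.

async function startVoiceCall(agentId: string): Promise<WebSocket> {
  const res = await fetch(
    `https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=${agentId}`,
    { headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! } }
  );
  if (!res.ok) throw new Error(`Signed URL request failed: ${res.status}`);
  const { signed_url } = await res.json();

  // Real-time audio in both directions streams over this socket.
  return new WebSocket(signed_url);
}
```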
Greetings are pre-generated audio files played when a user opens a conversation with a character for the first time in a session. The greeting text is rendered through the character's voice model at greeting-generation time and stored as a clip rather than generated on demand. Greetings load instantly as a result — opening a conversation triggers a playback request for the stored file, not a live API call to ElevenLabs.
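The generate-once, play-many behavior can be sketched as a cache-or-generate lookup. All helper names here are hypothetical; only the stored-clip design is described by the platform.

```typescript
// Hypothetical sketch of greeting playback: generate once, store, replay.
// Storage helpers are illustrative stubs, not a documented nocensor.ai API.
declare function getStoredGreeting(characterId: string): Promise<ArrayBuffer | null>;
declare function storeGreeting(characterId: string, audio: ArrayBuffer): Promise<void>;
declare function generateGreetingAudio(characterId: string): Promise<ArrayBuffer>;

async function getGreetingAudio(characterId: string): Promise<ArrayBuffer> {
  const cached = await getStoredGreeting(characterId);
  if (cached) return cached; // instant playback: no live ElevenLabs call

  // First open in a session with no stored clip: render through the
  // character's voice model once, then persist for later playback.
  const audio = await generateGreetingAudio(characterId);
  await storeGreeting(characterId, audio);
  return audio;
}
```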
All three modes read from the same voice ID on the character record. If the voice is changed, subsequent chat TTS requests, new voice call sessions, and regenerated greetings all use the updated voice. Existing stored greeting clips retain the old voice until explicitly regenerated.
Custom LoRA Voices vs. System Character Voices

nocensor.ai's system characters — the pre-built companions available without a custom LoRA — come with pre-assigned voice configurations. These voices are set by nocensor.ai and are not adjustable by users: system characters are designed as complete packages, and the voice is part of the character's established identity.
Custom characters — those built from a user-trained LoRA or assembled with a custom face model — have no default voice. The character becomes fully voiced only after a voice ID is assigned. Until that point, chat operates in text-only mode, and voice calls are unavailable for that character.
The distinction matters when choosing between using a system character versus building a custom one. System characters offer instant voice capability with no setup. Custom characters require the voice pairing step but give complete control over both the visual appearance (through the LoRA) and the voice (through the ElevenLabs voice library). For users building a long-running AI companion with a specific aesthetic and personality — rather than sampling the platform's ready-made options — the custom path produces a character where face, voice, and persona are all user-chosen rather than fixed at platform level.
A practical middle ground: start with a system character to explore the voice call and chat features without configuration, then build a custom LoRA once it is clear which interaction style fits. The voice pairing step adds roughly five minutes to character setup — most of that time goes to selecting a voice from the ElevenLabs library, not to the configuration itself.
Conclusion
Voice and face are the two primary identity signals that make an AI character feel like a persistent entity rather than a stateless response generator. nocensor.ai's pairing system stores both at the character level and routes each to the appropriate output pipeline — visual generation through the LoRA, audio through ElevenLabs — without requiring reconfiguration per session.
The effect is most pronounced for characters used across multiple interaction modes: text replies, live calls, and greeting audio all resolve to the same voice model, producing a character that sounds consistent regardless of how the conversation is accessed. That consistency is what separates a custom AI voice character from a character that only looks distinct but sounds generic. Both LoRA configuration and voice selection are available from the Characters section.