Enroll voice for identification
Enroll Voice
Create a unique voiceprint for identity verification
POST
Overview
The /voiceid/enroll endpoint creates a unique, non-reversible voiceprint (embedding) for a user or speaker. This voiceprint can later be used to verify whether a voice sample belongs to the same speaker.
Voice Storage: In this first implementation, the service does not store voiceprints or audio files. You must store the returned embedding vector yourself for future verification.
Authentication
Your API key for authentication
Request Parameters
Clean voice sample to enroll
Minimum duration: >= 10 seconds
Supported formats: WAV, MP3, AAC, FLAC, OGG, M4A
Best practices:
- Use high-quality audio recordings
- Record in a quiet environment
- Ensure clear speech without background noise
- Use consistent recording conditions for best results
Optional user identifier for tracking and organization
Example: "user_12345" or "john.doe@example.com"
Response
Voice embedding vector (256-dimensional array of floating-point numbers)
- Dimension: 256 floats
- Encoding: float32
- Usage: Store this array securely for future verification calls
This is the voiceprint that uniquely identifies the speaker’s voice characteristics.
Time taken to process the audio and generate the embedding in seconds
Model version used to generate the embedding
Example: "titanet-large"
Whether deepfake detection was performed during enrollment
- true: The voice sample was checked for authenticity during enrollment
- false: Deepfake detection was skipped
Example Request
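This page confirms only the path (/voiceid/enroll), the POST method, and the multipart/form-data body type. The sketch below shows one way such a request could be assembled with the Python standard library. The host, the API key header name, and the form field names are all hypothetical placeholders, so check them against your account details before use. The function builds the request without sending it.

```python
import urllib.request
import uuid

# Assumptions, NOT confirmed by this page: host, header name, form field names.
BASE_URL = "https://api.example.com"  # hypothetical placeholder host
API_KEY_HEADER = "x-api-key"          # hypothetical header name
FILE_FIELD = "file"                   # hypothetical form field for the audio
USER_ID_FIELD = "user_id"             # hypothetical form field for the identifier


def build_enroll_request(audio_bytes, filename, api_key, user_id=None):
    """Build (but do not send) a multipart/form-data POST to /voiceid/enroll."""
    boundary = uuid.uuid4().hex
    parts = []
    if user_id is not None:
        # Optional text field carrying the user identifier.
        parts.append((
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{USER_ID_FIELD}"\r\n\r\n'
            f'{user_id}\r\n'
        ).encode())
    # File field carrying the raw audio bytes.
    parts.append((
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode() + audio_bytes + b"\r\n")
    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        BASE_URL + "/voiceid/enroll",
        data=b"".join(parts),
        method="POST",
        headers={
            API_KEY_HEADER: api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once the placeholders are replaced with real values, the request can be sent with `urllib.request.urlopen(request)`.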
Example Response
200 - Success
400 - Audio Too Short
401 - Unauthorized
422 - Invalid Format
Error Codes
| Code | Description | Solution |
|---|---|---|
| 400 | Validation error | Check audio duration (>= 10s) and format |
| 401 | Unauthorized | Verify API key is valid and active |
| 422 | Unsupported format | Convert to supported audio format |
| 500 | Processing failed | Retry, or contact support if the error persists |
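A client can encode the table above as a small lookup so that only the 500 case is retried. The following is a minimal sketch. The action labels and the exponential backoff schedule are illustrative assumptions, since this page says only "Retry" and does not specify a retry policy.

```python
# Suggested next step per status code, taken from the error table.
# The action labels themselves are hypothetical names for illustration.
ERROR_ACTIONS = {
    400: "fix_input",      # check audio duration (>= 10 s) and format
    401: "check_api_key",  # verify the API key is valid and active
    422: "convert_audio",  # convert to a supported audio format
    500: "retry",          # retry, then contact support if it persists
}


def enroll_error_action(status_code):
    """Return the suggested action for a status code, or 'unknown'."""
    return ERROR_ACTIONS.get(status_code, "unknown")


def retry_delays(max_attempts=3, base_seconds=1.0):
    """Exponential backoff delays in seconds (an assumed policy)."""
    return [base_seconds * 2 ** attempt for attempt in range(max_attempts)]
```

Client errors (400, 401, 422) are not retried in this sketch because resending the same request would fail in the same way.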
Best Practices
Recording Quality
- Use high-quality microphones in quiet environments
- Record at least 10-15 seconds for better accuracy
- Avoid background noise, echo, or distortion
- Use consistent recording conditions across enrollments
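Checking the sample length locally before upload avoids a round trip that would end in a 400 response. The sketch below does this for WAV files only, using Python's standard `wave` module. The other supported formats would need a third-party decoder, which is not shown. The function names are illustrative.

```python
import wave

MIN_SECONDS = 10.0  # minimum duration stated on this page


def wav_duration_seconds(path_or_file):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path_or_file, "rb") as wav:
        # Duration is the frame count divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path_or_file, min_seconds=MIN_SECONDS):
    """True if the WAV sample meets the minimum enrollment duration."""
    return wav_duration_seconds(path_or_file) >= min_seconds
```

This measures total file length, not the amount of actual speech, so a recording with long silences can pass the check and still enroll poorly.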
Storage & Security
- Store embedding vectors securely (encrypted database)
- Never store raw audio files unless required by compliance
- Associate embeddings with user IDs in your system
- Implement access controls for voiceprint data
Multiple Enrollments
- Consider enrolling multiple samples per user for better accuracy
- Average multiple embeddings or use the best quality sample
- Re-enroll periodically if voice characteristics change
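Averaging several embeddings for one user can be sketched as an element-wise mean. The helper name is illustrative. This page does not say whether the verification endpoint expects any further normalization of the averaged vector, so none is applied here. Average only embeddings produced by the same model version.

```python
EMBEDDING_DIM = 256  # dimension stated on this page


def average_embeddings(embeddings):
    """Element-wise mean of several 256-dimensional embeddings."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    for embedding in embeddings:
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    count = len(embeddings)
    # zip(*embeddings) groups the i-th component of every embedding together.
    return [sum(components) / count for components in zip(*embeddings)]
```

For example, `average_embeddings([first, second, third])` returns a single 256-element list that can be stored in place of the individual enrollments.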
Vector Storage
- Store as JSON array or binary format (e.g., NumPy array)
- Ensure precision is maintained (float32)
- Consider compression for large-scale deployments
- Implement versioning if model changes occur
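One way to keep float32 precision and record the model version is to pack the vector into a fixed-size binary blob and store it next to the version string. The sketch below uses Python's standard `struct` module. The record layout and field names are hypothetical and not something the API prescribes.

```python
import struct

EMBEDDING_DIM = 256                # dimension stated on this page
_FORMAT = f"<{EMBEDDING_DIM}f"     # little-endian float32, 4 bytes per value


def pack_embedding(embedding):
    """Pack a 256-float embedding into 1024 bytes of float32."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    return struct.pack(_FORMAT, *embedding)


def unpack_embedding(blob):
    """Restore the list of floats from a blob made by pack_embedding."""
    return list(struct.unpack(_FORMAT, blob))


def make_record(user_id, embedding, model_version):
    """Bundle the blob with the model version (hypothetical layout)."""
    return {
        "user_id": user_id,
        "model_version": model_version,  # e.g. "titanet-large"
        "embedding": pack_embedding(embedding),
    }
```

Each blob is 1024 bytes, compared with several kilobytes for the same vector written as a JSON array of decimal strings. Storing the model version with each record makes it possible to find embeddings that need re-enrollment after a model change.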
Use Cases
Customer Authentication
Enroll customer voices during account setup for voice-based authentication
Fraud Prevention
Verify caller identity in call centers to prevent account takeover
Access Control
Secure access to sensitive systems using voice biometrics
Compliance
Meet regulatory requirements for identity verification
Related Endpoints
Verify Voice Identity
Use the enrolled voiceprint to verify if a voice sample matches
Authorizations
Body
multipart/form-data
Upload a clean voice sample (>= 10 seconds) to generate a voice embedding.
Response
Voice embedding generated successfully
Voice embedding vector (256-dimensional)
Example:
[
-0.021047543734312057,
-0.024001557379961014,
-0.011682840064167976
]
Time taken to process the audio and generate the embedding in seconds
Example:
13.586525440216064
Model version used to generate the embedding
Example:
"titanet-large"
Whether deepfake detection was performed during enrollment
Example:
true