POST /voiceid/enroll
Enroll voice for identification
curl --request POST \
  --url https://api.aurigin.ai/v1/voiceid/enroll \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-api-key: <api-key>' \
  --form audio_file='@example-file' \
  --form 'user_id=<string>'

Overview

The /voiceid/enroll endpoint creates a unique, non-reversible voiceprint (embedding) for a user or speaker. This voiceprint can later be used to verify whether a voice sample belongs to the same speaker.
Voice Storage: Voiceprints and audio files are not stored by the API in the initial implementation. You must store the returned embedding vector yourself for future verification.
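Since the embedding has to be kept on the client side, the following is a minimal sketch of one way to persist it, assuming a local SQLite database and float32 binary storage. The table layout and the helper names (init_store, save_embedding, load_embedding) are hypothetical and not part of the API. The blob uses the platform's native byte order, so it is only portable between machines with the same endianness.

```python
import sqlite3
from array import array


def init_store(conn):
    """Create the (hypothetical) voiceprints table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voiceprints ("
        "user_id TEXT PRIMARY KEY, "
        "model_version TEXT NOT NULL, "
        "dim INTEGER NOT NULL, "
        "vector BLOB NOT NULL)"
    )
    conn.commit()


def save_embedding(conn, user_id, embedding, model_version):
    """Store one embedding as a float32 blob, replacing any earlier enrollment."""
    blob = array("f", embedding).tobytes()  # "f" is a 4-byte C float
    conn.execute(
        "INSERT OR REPLACE INTO voiceprints VALUES (?, ?, ?, ?)",
        (user_id, model_version, len(embedding), blob),
    )
    conn.commit()


def load_embedding(conn, user_id):
    """Return (embedding, model_version), or None if the user is not enrolled."""
    row = conn.execute(
        "SELECT model_version, vector FROM voiceprints WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None
    vector = array("f")
    vector.frombytes(row[1])
    return list(vector), row[0]
```

Keeping model_version next to the vector makes it possible to detect embeddings produced by an older model later on. Encryption at rest and access control are left out of this sketch.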

Authentication

X-Api-Key
string
required
Your API key for authentication

Request Parameters

audio_file
file
required
Clean voice sample to enroll
Minimum duration: >= 10 seconds
Supported formats: WAV, MP3, AAC, FLAC, OGG, M4A
Best practices:
  • Use high-quality audio recordings
  • Record in a quiet environment
  • Ensure clear speech without background noise
  • Use consistent recording conditions for best results
user_id
string
Optional user identifier for tracking and organization
Example: "user_12345" or "john.doe@example.com"
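Because samples shorter than 10 seconds are rejected, a client may want to check the duration before uploading. The sketch below does this for uncompressed WAV files only, using Python's standard wave module; the other supported formats would need a different decoder. The helper names are hypothetical.

```python
import wave

MIN_SECONDS = 10.0  # minimum duration stated for audio_file


def wav_duration_seconds(source):
    """Return the duration of a PCM WAV file.

    `source` may be a file path or a binary file object opened for reading.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough(source, minimum=MIN_SECONDS):
    """True if the WAV sample meets the minimum enrollment duration."""
    return wav_duration_seconds(source) >= minimum
```

A local check like this only avoids a round trip; the server-side validation remains authoritative.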

Response

embedding
array
required
Voice embedding vector (256-dimensional array of floating-point numbers)
Dimension: 256 floats
Encoding: float32
Usage: Store this array securely for future verification calls
This is the voiceprint that uniquely identifies the speaker’s voice characteristics.
processing_time
number
required
Time taken to process the audio and generate the embedding in seconds
model_version
string
required
Model version used to generate the embedding
Example: "titanet-large"
deepfake_checked
boolean
required
Whether deepfake detection was performed during enrollment
true: The voice sample was checked for authenticity during enrollment
false: Deepfake detection was skipped
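The four response fields above can be checked on the client before the embedding is stored. The following is a minimal sketch; EnrollResult and parse_enroll_response are hypothetical names, and the vector length is deliberately not enforced here, so compare it against the dimension you expect.

```python
import json
from dataclasses import dataclass
from typing import List

REQUIRED_FIELDS = ("embedding", "processing_time", "model_version", "deepfake_checked")


@dataclass
class EnrollResult:
    embedding: List[float]
    processing_time: float
    model_version: str
    deepfake_checked: bool


def parse_enroll_response(body: str) -> EnrollResult:
    """Parse a 200 response body and validate the documented fields."""
    data = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    embedding = data["embedding"]
    if not isinstance(embedding, list) or not embedding:
        raise ValueError("embedding must be a non-empty array")
    for value in embedding:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("embedding must contain only numbers")
    if not isinstance(data["deepfake_checked"], bool):
        raise ValueError("deepfake_checked must be a boolean")

    return EnrollResult(
        embedding=[float(v) for v in embedding],
        processing_time=float(data["processing_time"]),
        model_version=str(data["model_version"]),
        deepfake_checked=data["deepfake_checked"],
    )
```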

Example Request

curl -X POST "https://api.aurigin.ai/v1/voiceid/enroll" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -F "audio_file=@user_voice_sample.wav" \
  -F "user_id=john_doe"
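For clients that do not shell out to curl, the same request can be assembled with Python's standard library. This is a sketch under assumptions: build_enroll_request is a hypothetical helper, the multipart body is built by hand, and the per-file Content-Type is guessed from the file name, which the API may or may not inspect.

```python
import mimetypes
import urllib.request
import uuid

ENROLL_URL = "https://api.aurigin.ai/v1/voiceid/enroll"


def build_enroll_request(api_key, audio_bytes, filename, user_id=None):
    """Build (but do not send) a multipart/form-data enrollment request."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    if user_id is not None:
        # Optional plain-text form field
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="user_id"\r\n\r\n'
            f"{user_id}\r\n".encode("utf-8")
        )
    # Required file field carrying the raw audio bytes
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        ENROLL_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The request can then be sent with urllib.request.urlopen(request); non-2xx responses raise urllib.error.HTTPError, whose body can be read for the error details.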

Example Response

200 - Success
{
  "embedding": [
    -0.021047543734312057,
    -0.024001557379961014,
    -0.011682840064167976,
    0.01862935535609722,
    0.030680431053042412,
    0.012365893460810184,
    0.022932840511202812,
    -0.02653215453028679,
    0.03283798322081566,
    0.02938467636704445,
    0.016921166330575943,
    0.00456651858985424,
    0.004668207839131355,
    0.019676126539707184,
    -0.026682903990149498,
    0.007073606364428997,
    0.02817155048251152,
    -0.01307824905961752,
    -0.005019648000597954,
    0.014194189570844173,
    -0.003651274833828211,
    -0.013507496565580368,
    0.0035653149243444204,
    0.012558531947433949,
    -0.008742935955524445,
    -0.01317399088293314,
    -0.02584591507911682,
    -0.04106520116329193,
    0.015779513865709305,
    -0.012857845053076744,
    -0.023655345663428307,
    0.008520219475030899,
    -0.004576820880174637,
    0.01908191852271557,
    -0.006375580094754696,
    -0.01133985910564661,
    -0.024633223190903664,
    -0.006668175105005503,
    -0.023163946345448494,
    -0.027255428954958916,
    -0.010428240522742271,
    0.04642370715737343,
    -0.004443565383553505,
    -0.028582191094756126,
    0.012907243333756924,
    -0.0254207793623209,
    -0.009619798511266708,
    -0.0018211244605481625,
    0.013937097042798996,
    0.0079690245911479,
    -0.009932221844792366,
    0.010009655728936195,
    0.014746781438589096,
    -0.01825474575161934,
    0.02056814543902874,
    -0.015499094501137733,
    0.014383708126842976,
    0.018852459266781807,
    0.04513779282569885,
    0.0038257490377873182,
    -0.016881395131349564,
    0.008533456362783909,
    0.046820636838674545,
    -0.0068024941720068455,
    -0.0074867974035441875,
    -0.009024945087730885,
    -0.010750235989689827,
    0.014376042410731316,
    0.00045313822920434177,
    -0.011689898557960987,
    0.010902578942477703,
    -0.0036655054427683353,
    -0.040745947510004044,
    0.008332440629601479,
    -0.007740195374935865,
    0.018386486917734146,
    0.014798511750996113,
    -0.0181171502918005,
    -0.0034209152217954397,
    0.030579103156924248,
    -0.0024134316481649876,
    0.01719391718506813,
    0.01807386614382267,
    0.01511380448937416,
    -0.00795607827603817,
    0.028305811807513237,
    -0.00027965829940512776,
    -0.007430190220475197,
    -0.0031233076006174088,
    -0.011055421084165573,
    0.0068847681395709515,
    0.0034877152647823095,
    -0.02300192229449749,
    -0.02018667571246624,
    0.0021206610836088657,
    0.005942602641880512,
    0.01252029649913311,
    0.04239252582192421,
    -0.01830153912305832,
    -0.012159381993114948,
    -0.018324917182326317,
    -0.023927999660372734,
    0.006663294974714518,
    -0.0029664169996976852,
    0.010410767048597336,
    0.0032423739321529865,
    -0.0066847712732851505,
    0.010331455618143082,
    0.016090447083115578,
    -0.007336160633713007,
    -0.0020536279771476984,
    -0.00913260318338871,
    -0.0005651549436151981,
    0.06328430026769638,
    -0.01791723072528839,
    -0.014669392257928848,
    -0.005876319482922554,
    -0.0239340141415596,
    -0.01594495214521885,
    0.01723731867969036,
    0.02702360972762108,
    0.009339643642306328,
    0.02721589058637619,
    0.0026582039427012205,
    -0.005184155888855457,
    -0.024671178311109543,
    0.002516157692298293,
    0.023639554157853127,
    -0.005006634164601564,
    -0.024672605097293854,
    -0.004195123910903931,
    -0.02357458509504795,
    0.004583366215229034,
    -0.01516904216259718,
    -0.019425570964813232,
    -0.04407735541462898,
    0.05062895640730858,
    -0.00015267534763552248,
    0.022621162235736847,
    -0.009689589962363243,
    0.014427929185330868,
    0.012994356453418732,
    -0.019761765375733376,
    0.01788155920803547,
    -0.004554769955575466,
    0.000334456650307402,
    0.011400381103157997,
    0.0038625148590654135,
    0.012409945949912071,
    -0.007220016792416573,
    -0.042426858097314835,
    -0.020146694034337997,
    -0.002771963831037283,
    0.016492728143930435,
    -0.003412176389247179,
    0.03808671608567238,
    -0.020053472369909286,
    0.0017303947824984789,
    0.009819664061069489,
    0.04932665452361107,
    0.005291991867125034,
    -0.0247158482670784,
    -0.019858039915561676,
    0.028651162981987,
    0.024598274379968643,
    -0.023336008191108704,
    -0.014375682920217514,
    0.014731616713106632,
    -0.011884283274412155,
    -0.024553121998906136,
    -0.04059327021241188,
    0.015357284806668758,
    0.026866627857089043,
    -0.0030987237114459276,
    -0.0018339564558118582,
    -0.02220608852803707,
    0.0018473020754754543,
    0.03432437777519226,
    0.005599610507488251,
    -0.020381037145853043,
    0.03150622174143791,
    0.020770778879523277,
    -0.005547627341002226,
    0.032829105854034424,
    0.020428897812962532,
    -0.007288635242730379,
    -0.002440565498545766,
    -0.018194381147623062,
    0.005998202133923769,
    -0.018759630620479584,
    0.009391428902745247,
    0.006601202767342329
  ],
  "processing_time": 13.586525440216064,
  "model_version": "titanet-large",
  "deepfake_checked": true
}
400 - Audio Too Short
{
  "error": "validation_error",
  "message": "Audio file too short: 8.5s (minimum 10.0s required)",
  "status": 400,
  "correlation_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
401 - Unauthorized
{
  "error": "unauthorized",
  "message": "Invalid or inactive API key",
  "status": 401,
  "correlation_id": "8b738992-c4eb-4f19-870a-900e6830d147"
}
422 - Invalid Format
{
  "error": "unsupported_format",
  "message": "Provided audio format is not supported",
  "status": 422,
  "correlation_id": "48cd2c61-9cbe-4761-8a8f-f65aabbd22b9"
}

Error Codes

Code  Description         Solution
400   Validation error    Check audio duration (>= 10s) and format
401   Unauthorized        Verify API key is valid and active
422   Unsupported format  Convert to a supported audio format
500   Processing failed   Retry, or contact support if the error persists
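The table above suggests that only 500 is worth retrying, while 400, 401 and 422 need a change on the caller's side. A minimal sketch of that policy follows. The names EnrollError and call_with_retry are hypothetical, the transport is abstracted as a send() callable returning (status, body), and the backoff schedule is an assumption. The 500 error body is assumed to have the same shape as the documented error examples.

```python
import json
import time

RETRYABLE_STATUSES = {500}  # per the table: retry only on processing failures


class EnrollError(Exception):
    """Carries the fields of the documented error body."""

    def __init__(self, status, code, message, correlation_id=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.correlation_id = correlation_id


def error_from_body(status, body):
    """Build an EnrollError, tolerating bodies that are not JSON objects."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return EnrollError(
        status,
        data.get("error", "unknown"),
        data.get("message", ""),
        data.get("correlation_id"),
    )


def call_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, retrying 500s with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status == 200:
            return json.loads(body)
        if status not in RETRYABLE_STATUSES or attempt == attempts - 1:
            raise error_from_body(status, body)
        sleep(base_delay * 2 ** attempt)
```

Logging the correlation_id from the raised error makes it easier to reference a failed request when contacting support.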

Best Practices

  • Use high-quality microphones in quiet environments
  • Record 10-15 seconds or more of speech for better accuracy
  • Avoid background noise, echo, or distortion
  • Use consistent recording conditions across enrollments
  • Store embedding vectors securely (encrypted database)
  • Never store raw audio files unless required by compliance
  • Associate embeddings with user IDs in your system
  • Implement access controls for voiceprint data
  • Consider enrolling multiple samples per user for better accuracy
  • Average multiple embeddings or use the best quality sample
  • Re-enroll periodically if voice characteristics change
  • Store as JSON array or binary format (e.g., NumPy array)
  • Ensure precision is maintained (float32)
  • Consider compression for large-scale deployments
  • Implement versioning if model changes occur
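The suggestion to average multiple embeddings can be sketched as an element-wise mean over several enrollments of the same speaker. This is an illustration only: average_embeddings is a hypothetical helper, and this reference does not state whether verification expects averaged or normalized vectors, so treat the result as an assumption to validate. The version check reflects the versioning advice in the list above.

```python
def average_embeddings(samples):
    """Return (mean_embedding, model_version) for one speaker.

    `samples` is a list of (embedding, model_version) pairs, one per
    enrollment call. Vectors from different models or of different
    lengths are rejected rather than mixed.
    """
    if not samples:
        raise ValueError("at least one embedding is required")
    versions = {version for _, version in samples}
    if len(versions) != 1:
        raise ValueError("embeddings come from different model versions")
    lengths = {len(embedding) for embedding, _ in samples}
    if len(lengths) != 1:
        raise ValueError("embeddings have different dimensions")

    count = len(samples)
    vectors = [embedding for embedding, _ in samples]
    # zip(*vectors) walks the vectors column by column
    mean = [sum(column) / count for column in zip(*vectors)]
    return mean, versions.pop()
```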

Use Cases

Customer Authentication

Enroll customer voices during account setup for voice-based authentication

Fraud Prevention

Verify caller identity in call centers to prevent account takeover

Access Control

Secure access to sensitive systems using voice biometrics

Compliance

Meet regulatory requirements for identity verification

Authorizations

x-api-key
string
header
required

Body

multipart/form-data

Upload a clean voice sample (>= 10 seconds) to generate a voice embedding.

audio_file
file
required

Clean voice sample (>= 10 seconds)

user_id
string

Optional user identifier for tracking

Response

Voice embedding generated successfully

embedding
number<float>[]

Voice embedding vector (256-dimensional)

Example:
[
-0.021047543734312057,
-0.024001557379961014,
-0.011682840064167976
]
processing_time
number<float>

Time taken to process the audio and generate the embedding in seconds

Example:

13.586525440216064

model_version
string

Model version used to generate the embedding

Example:

"titanet-large"

deepfake_checked
boolean

Whether deepfake detection was performed during enrollment

Example:

true