POST /predict
AI-voice detection
curl --request POST \
  --url https://api.aurigin.ai/v1/predict \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-api-key: <api-key>' \
  --form file='@example-file' \
  --form device=api \
  --form model=apollo-4-2026-01-16 \
  --form threshold=0.5
{
  "prediction_id": "pred_9b6ff057a7f7",
  "global": {
    "score": 0.975,
    "confidence": 0.95,
    "result": "spoofed",
    "reason": null
  },
  "segments": [
    {
      "start": 0,
      "end": 5,
      "scores": [
        0.98
      ],
      "confidence": 0.96,
      "result": "spoofed"
    },
    {
      "start": 5,
      "end": 10,
      "scores": [
        0.97
      ],
      "confidence": 0.94,
      "result": "spoofed"
    }
  ],
  "model": "apollo-4-2026-01-16",
  "processing_time": 1.23,
  "audio_duration": 10,
  "warnings": []
}
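
For callers who prefer Python to curl, the request above can be sketched with the standard library alone. This is a minimal illustration rather than an official client: the helper names `encode_multipart` and `build_predict_request` are hypothetical, and the `application/octet-stream` part type is an assumption. The URL, header name, and form fields are taken from the curl example.

```python
import urllib.request
import uuid

API_URL = "https://api.aurigin.ai/v1/predict"  # from the curl example above


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one part named "file" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode()
    )
    # Assumption: a generic binary part type; the docs do not name one.
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_predict_request(api_key, file_name, file_bytes, threshold=0.5,
                          device="api", model="apollo-4-2026-01-16"):
    """Build (but do not send) a POST request mirroring the curl example."""
    fields = {"device": device, "model": model, "threshold": threshold}
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "x-api-key": api_key},
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the response body with `json.load`.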

Authorizations

x-api-key
string
header
required

Body

Provide the audio either as a direct file upload or as a presigned URL. Include device to identify the calling client and, optionally, model to select a model version.

file
file
required
device
enum<string>

Optional. Type of device or client making the request.

Available options: macos, windows, web_app, api
model
enum<string> | null

Optional model version to run. Currently only apollo-4-2026-01-16 is available for this endpoint.

Available options: apollo-4-2026-01-16
prediction_id
string

Optional custom prediction identifier

silence_threshold
number
default:80

Maximum percentage of silence allowed in a segment (0.0-100.0). Segments whose silence exceeds this threshold are skipped and excluded from model inference. Lower values are stricter (more segments are skipped); higher values are more permissive. Set to 100 to analyze virtually all segments regardless of silence.

Required range: 0 <= x <= 100
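
The filtering rule described for silence_threshold can be illustrated with a short sketch. It assumes the silence percentage of each segment is already known (the documentation does not say how silence is measured) and that "exceeds" means strictly greater than the threshold. The function name `segments_to_analyze` is hypothetical and not part of the API.

```python
def segments_to_analyze(silence_percentages, silence_threshold=80.0):
    """Return the indices of segments that would be kept for inference.

    Assumption: a segment is skipped only when its silence percentage is
    strictly greater than silence_threshold.
    """
    if not 0 <= silence_threshold <= 100:
        raise ValueError("silence_threshold must be between 0 and 100")
    return [
        index
        for index, silence in enumerate(silence_percentages)
        if silence <= silence_threshold
    ]


# Four segments with 10%, 85%, 80% and 100% silence.
kept_default = segments_to_analyze([10, 85, 80, 100])         # default of 80
kept_permissive = segments_to_analyze([10, 85, 80, 100], 100)  # keep everything
```

Under these assumptions the default keeps the first and third segments, while a threshold of 100 keeps all four.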
threshold
number
default:0.5

Decision threshold for classifying audio as bonafide or spoofed (0.0-1.0). Lower values flag more audio as spoofed, which catches more spoofs at the cost of more false positives; higher values reduce false positives. Confidence scores are derived from how far the raw score lies from this threshold.

Required range: 0 <= x <= 1
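
The effect of moving the threshold can be sketched as a simple comparison. Two assumptions are made here: higher scores indicate spoofed audio (consistent with the example response, where a score of 0.975 at threshold 0.5 yields "spoofed"), and a score exactly equal to the threshold counts as spoofed. The documentation states neither the boundary behavior nor the confidence formula, so confidence is not reproduced. `classify` is a hypothetical helper.

```python
def classify(score, threshold=0.5):
    """Label a raw score as "spoofed" or "bonafide".

    Assumptions: higher scores mean spoofed, and a score equal to the
    threshold is treated as spoofed.
    """
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    return "spoofed" if score >= threshold else "bonafide"


# The same borderline score flips label as the threshold moves.
strict = classify(0.3, threshold=0.2)   # lower threshold flags more audio
lenient = classify(0.3, threshold=0.5)  # default threshold
```

Lowering the threshold to 0.2 flags the 0.3 score as spoofed, while the default of 0.5 leaves it bonafide.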

Response

Global verdict plus per-segment predictions and confidence scores.

prediction_id
string

Unique identifier for this prediction

global
object
segments
object[]

Array of segment-level predictions. Each segment represents a 5-second window of the audio.

model
string

Model version used for prediction

processing_time
number<float>

Time taken to process the audio in seconds

audio_duration
number<float>

Duration of the audio file in seconds

warnings
string[]

Array of warning messages, if any
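
A client will usually reduce this response to a few fields. The sketch below does so for a response shaped like the example on this page; the helper name `summarize` and the derived key names `spoofed_ranges` and `analyzed_seconds` are illustrative and not part of the API.

```python
import json

# Abbreviated copy of the example response shown on this page.
EXAMPLE = json.loads("""
{
  "prediction_id": "pred_9b6ff057a7f7",
  "global": {"score": 0.975, "confidence": 0.95, "result": "spoofed", "reason": null},
  "segments": [
    {"start": 0, "end": 5, "scores": [0.98], "confidence": 0.96, "result": "spoofed"},
    {"start": 5, "end": 10, "scores": [0.97], "confidence": 0.94, "result": "spoofed"}
  ],
  "model": "apollo-4-2026-01-16",
  "processing_time": 1.23,
  "audio_duration": 10,
  "warnings": []
}
""")


def summarize(response):
    """Collect the global verdict and the time ranges flagged as spoofed."""
    segments = response["segments"]
    return {
        "prediction_id": response["prediction_id"],
        "result": response["global"]["result"],
        "score": response["global"]["score"],
        # (start, end) pairs in seconds for segments labelled spoofed
        "spoofed_ranges": [
            (s["start"], s["end"]) for s in segments if s["result"] == "spoofed"
        ],
        # Total seconds covered by the returned segments
        "analyzed_seconds": sum(s["end"] - s["start"] for s in segments),
        "warnings": response.get("warnings", []),
    }


summary = summarize(EXAMPLE)
```

Comparing `analyzed_seconds` with `audio_duration` gives a rough indication of how much audio the returned segments cover.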