Analyze Video Endpoint
POST /api/analyze-video
Creates a new video analysis task. The server responds immediately with a `video_id`; the analysis itself runs asynchronously.
Request Format
- Content-Type: `multipart/form-data`
Parameters
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `file` | binary | ✔︎ | — | Video to analyze (any common container/codec) |
| `visual_analytics` | bool | ✖︎ | false | Enables visual analytics (see Analysis Types below) |
Response Format
Success Response (200 OK)
```json
{
  "video_id": "a1b2c3d4-e5f6-e7",
  "status": "pending"
}
```
Analysis Types
- Speech Analytics (Always included): Provides comprehensive speech analysis including transcript, filler words, pauses, sentiment, and speaking patterns
- Visual Analytics (Optional): When `visual_analytics=true`, adds gesture analysis, posture detection, eye contact patterns, and facial expression analysis
Example: cURL
```bash
# Example request to the analyze-video endpoint
curl -X POST "https://api.aiframe.ai/api/analyze-video" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F file=@"/path/to/video.mov" \
  -F "visual_analytics=true"
```
Sample Output
When you retrieve the completed analysis via the Get Status endpoint, the `result` field will contain detailed analytics data:
Speech Analytics Output (Always Included)
```json
{
  "Audio Section": "-----------**********************----------------",
  "transcript": "Transcript of video",
  "summary": {
    "duration_sec": 47.4,
    "filler_word_count": 11,
    "words_per_minute": 155.7,
    "avg_word_duration": 0.269,
    "speech_to_pause_ratio": 0.699,
    "emotion": "neutral",
    "speaking_style": {
      "style": "neutral",
      "readability_grade": 4.416428571428572,
      "lexical_diversity": 0.4148148148148148
    }
  },
  "filler_words": {
    "count": 11,
    "instances": [
      {
        "word": "uhm",
        "start": 4.42,
        "end": 4.74,
        "confidence": 0.0
      }
    ]
  },
  "pauses": {
    "total": 4,
    "long_pauses": 4,
    "time_series": [
      {
        "start": 2.58,
        "end": 3.98,
        "duration": 1.4,
        "is_long": true
      }
    ]
  },
  "sentiment_words": [
    {
      "time": 7.5,
      "word": "importantly",
      "compound": 0.3182
    }
  ],
  "pos": {
    "NOUN": [
      {
        "word": "video",
        "timestamps": [
          {
            "start": 1.06,
            "end": 1.38
          }
        ]
      }
    ],
    "VERB": [],
    "ADP": [],
    "ADJ": [],
    "ADV": [],
    "INTJ": []
  }
}
```
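As a rough illustration of how a client might consume this payload, the sketch below prints the filler-word moments and long pauses using only the fields shown above. `speech_result` is assumed to hold the parsed `result` field; the function name is illustrative.

```python
# Sketch: walk the speech analytics payload shown above.
# Assumes `speech_result` is the parsed `result` field (a Python dict).

def summarize_speech(speech_result: dict) -> None:
    summary = speech_result["summary"]
    print(f"{summary['duration_sec']}s at {summary['words_per_minute']} wpm, "
          f"{summary['filler_word_count']} filler words")

    # Each filler-word instance carries start/end timestamps in seconds.
    for filler in speech_result["filler_words"]["instances"]:
        print(f"  filler '{filler['word']}' at {filler['start']:.2f}s")

    # Pauses the service flagged as long.
    for pause in speech_result["pauses"]["time_series"]:
        if pause["is_long"]:
            print(f"  long pause ({pause['duration']:.1f}s) at {pause['start']:.2f}s")
```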
Visual Analytics Output (When `visual_analytics=true`)
```json
{
  "Video Section": "-----------**********************----------------",
  "gesture_summary": {
    "open_hand": 0,
    "pinch_or_point": 0,
    "none": 0
  },
  "posture_summary": {
    "raised_hand": 0,
    "hands_on_hips": 313,
    "left_hand_high": 0,
    "right_hand_high": 0,
    "left_hand_forward": 1456,
    "right_hand_forward": 1456
  },
  "merged_visual_time_series": [
    {
      "time": 0.0,
      "pose_detected": true,
      "raised_hand": false,
      "hands_on_hips": true,
      "left_hand_level_label": "low",
      "right_hand_level_label": "low",
      "left_hand_height": 0.0712,
      "right_hand_height": 0.0591,
      "left_hand_depth": -0.8676,
      "right_hand_depth": -1.0745,
      "gesture_left": "open_hand",
      "gesture_right": "open_hand",
      "eye_openness": 0.009172727272727272,
      "gaze_direction": null,
      "emotion": "happy",
      "gaze_direction_x": "center",
      "gaze_direction_y": "center",
      "gaze_angle_x": 2.4581818181818185,
      "gaze_angle_y": -0.06000000000000001,
      "mouth_width": 0.08699408173561096,
      "mouth_open": 2.0802021026611328e-05,
      "smile_ratio": 3990.1842874762315
    }
  ],
  "gesture_summary_word_sync": {
    "filler_summary": {
      "hands_on_hips": 0.36363636363636365,
      "left_hand_level_label": "low",
      "right_hand_level_label": "low",
      "gesture_left": "open_hand",
      "gesture_right": "open_hand",
      "emotion": "happy",
      "frame_times": [4.43, 5.9, 6.5, 8.13]
    },
    "pause_summary": {},
    "sentiment_positive_summary": {},
    "sentiment_negative_summary": {}
  }
}
```
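Since `merged_visual_time_series` is a list of per-frame entries, clients typically aggregate it. The sketch below is one possible aggregation using only the fields shown above; treating centered gaze as a proxy for eye contact is our interpretation, not a documented guarantee. `visual_result` is assumed to hold the parsed `result` field.

```python
from collections import Counter

# Sketch: aggregate per-frame entries from `merged_visual_time_series`.
# Assumes `visual_result` is the parsed `result` field (a Python dict).

def summarize_visuals(visual_result: dict) -> None:
    frames = visual_result["merged_visual_time_series"]
    if not frames:
        return

    # Share of frames with gaze centered on both axes -- a rough proxy
    # for eye contact with the camera.
    centered = sum(
        f["gaze_direction_x"] == "center" and f["gaze_direction_y"] == "center"
        for f in frames
    )
    print(f"gaze centered in {centered / len(frames):.0%} of frames")

    # Most frequently detected facial emotion across frames.
    emotions = Counter(f["emotion"] for f in frames if f.get("emotion"))
    if emotions:
        print("dominant emotion:", emotions.most_common(1)[0][0])
```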
Key Metrics Explained
Speech Analytics:
- `duration_sec`: Total video duration in seconds
- `filler_word_count`: Number of filler words detected (um, uh, etc.)
- `words_per_minute`: Speaking pace
- `speech_to_pause_ratio`: Ratio of speech time to pause time
- `readability_grade`: Complexity level of the language used
- `lexical_diversity`: Vocabulary richness (unique words / total words); see the sketch after this list
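As a rough illustration of how the pace and diversity metrics are defined, here is a naive computation from a transcript. This is not the service's implementation; its tokenization and normalization may differ.

```python
# Illustrative only -- the service's own tokenization and normalization
# may differ from this naive whitespace split.

def lexical_diversity(transcript: str) -> float:
    words = transcript.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def words_per_minute(transcript: str, duration_sec: float) -> float:
    return len(transcript.split()) / (duration_sec / 60)

sample = "the quick fox and the slow fox"
print(lexical_diversity(sample))      # 5 unique / 7 total ≈ 0.714
print(words_per_minute(sample, 3.0))  # 7 words in 3 s -> 140.0
```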
Visual Analytics:
- `gesture_summary`: Count of different hand gestures detected
- `posture_summary`: Time spent in different postures
- `eye_openness`: Eye engagement level (0-1 scale)
- `gaze_direction_x/y`: Where the person is looking (left/center/right, up/center/down)
- `smile_ratio`: Facial expression positivity
- `gesture_summary_word_sync`: Correlates gestures with specific speech patterns
Retrieving Results
Once the analysis is complete, use the Get Status endpoint to retrieve the results:
- The `result` field will contain the analysis data
- Speech analytics are always included
- Visual analytics are included if requested during submission
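Putting it together, a minimal polling loop might look like the sketch below. The status route shown is illustrative only (substitute the path documented for the Get Status endpoint), and the assumption that completed tasks leave the "pending" state is ours.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"
# NOTE: illustrative path -- substitute the route documented for the
# Get Status endpoint.
STATUS_URL = "https://api.aiframe.ai/api/video-status/{video_id}"

def wait_for_result(video_id: str, interval_sec: float = 5.0) -> dict:
    """Poll until the task leaves 'pending', then return the `result` field."""
    while True:
        resp = requests.get(
            STATUS_URL.format(video_id=video_id),
            headers={"X-API-Key": API_KEY},
        )
        resp.raise_for_status()
        body = resp.json()
        if body["status"] != "pending":  # assumed: finished tasks leave "pending"
            return body["result"]
        time.sleep(interval_sec)
```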
For detailed response formats and error codes, please see the API Reference (Swagger).