Qwen3-VL - Vision Language Model
Powered by Qwen3-VL-235B-A22B-Instruct on ZeroGPU.
Capabilities:
- Image understanding and VQA
- Video analysis and description
- OCR and text extraction
- Multi-frame temporal reasoning
API Endpoints for EagleEye:
- POST /call/api_analyze_image - Single image analysis
- POST /call/api_analyze_video - Video analysis
- POST /call/api_analyze_frames - Multi-frame analysis
API Usage for EagleEye Integration
Image Analysis
from gradio_client import Client

client = Client("magboola/qwen3vl-zerogpu")
result = client.predict(
    image_base64="base64_encoded_image",
    prompt="What is in this image?",
    task="vqa",
    max_tokens=1024,
    api_name="/api_analyze_image"
)
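The image_base64 parameter expects the raw image bytes encoded as a base64 string. A small helper for producing that value (the function name is illustrative, not part of the API):

```python
import base64
from pathlib import Path

def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes into the ASCII string the API expects."""
    return base64.b64encode(data).decode("ascii")

def encode_image_file(path: str) -> str:
    """Read an image file from disk and base64-encode its contents."""
    return encode_image_bytes(Path(path).read_bytes())
```

For example, image_base64=encode_image_file("photo.jpg") would replace the "base64_encoded_image" placeholder above.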
Video Analysis
result = client.predict(
    video_url="https://example.com/video.mp4",
    prompt="Describe what happens in this video.",
    task="describe",
    max_tokens=2048,
    fps=1.0,
    api_name="/api_analyze_video"
)
Multi-Frame Analysis
import json
result = client.predict(
frames_base64=json.dumps(["frame1_b64", "frame2_b64", ...]),
timestamps=json.dumps([0.0, 1.0, 2.0, ...]),
prompt="What action is being performed?",
max_tokens=512,
api_name="/api_analyze_frames"
)
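Both frames_base64 and timestamps are JSON-encoded strings of parallel lists. A helper that builds the pair from raw frame bytes (a sketch; the helper name is not part of the API):

```python
import base64
import json

def build_frames_payload(frames: list[bytes], timestamps: list[float]) -> tuple[str, str]:
    """Pair raw frame bytes with their timestamps and return the two
    JSON strings expected by /api_analyze_frames (frames_base64, timestamps)."""
    if len(frames) != len(timestamps):
        raise ValueError("each frame needs exactly one timestamp")
    frames_b64 = [base64.b64encode(f).decode("ascii") for f in frames]
    return json.dumps(frames_b64), json.dumps(timestamps)
```

The two returned strings plug directly into the frames_base64 and timestamps arguments above.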