Multimodal AI Data Infrastructure
The only open source Python library providing incremental storage, transformation, indexing, and orchestration of multimodal data.
$ pip install pixeltable
Key Features
Multimodal Storage
Images, videos, audio, docs
Incremental Updates
Only process what changed
Vector Search
Built-in similarity search
Versioning
Time travel & lineage
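As a rough illustration of "only process what changed", the sketch below models a computed column in plain Python: a value is recomputed only when its row is new or its input differs from the cached one. The class and method names (IncrementalColumn, update, result) are hypothetical and are not part of the Pixeltable API; this is a conceptual toy, not how the library is implemented.

```python
from typing import Any, Callable, Dict, List, Tuple


class IncrementalColumn:
    """Toy computed column: caches (input, output) per row id."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn
        self._cache: Dict[int, Tuple[Any, Any]] = {}

    def update(self, rows: Dict[int, Any]) -> List[int]:
        """Recompute only new or changed rows; return the ids that were recomputed."""
        recomputed = []
        for row_id, value in rows.items():
            cached = self._cache.get(row_id)
            if cached is not None and cached[0] == value:
                continue  # input unchanged, keep the cached output
            self._cache[row_id] = (value, self.fn(value))
            recomputed.append(row_id)
        return recomputed

    def result(self, row_id: int) -> Any:
        """Return the cached output for a row."""
        return self._cache[row_id][1]


# Hypothetical data: row id -> weather label
table = {1: "fog", 2: "rain"}
column = IncrementalColumn(len)

first_pass = column.update(table)    # both rows are new
table[3] = "construction"            # one new row arrives
second_pass = column.update(table)   # only the new row is processed
```

After the second call, rows 1 and 2 are served from the cache and only row 3 is computed.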
Monday 9:00 AM: New autonomous vehicle data arrives from the fleet
2 TB processed and 50K frames prioritized in 30 minutes, versus 3 days manually
ml-engineer_example.py
import pixeltable as pxt
from pixeltable.iterators import FrameIterator
# yolox import path may vary by Pixeltable version
from pixeltable.ext.functions.yolox import yolox

# Connect to existing data sources without migration
vehicles = pxt.create_table('fleet.raw_data', {
    'video': pxt.Video,           # S3 references, no data movement
    'sensor_metadata': pxt.Json,  # From existing RDBMS
    'route_id': pxt.String,
    'weather': pxt.String
})

# Import weekend's data (2TB of video + metadata)
vehicles.insert_from_s3('s3://fleet-data/2025-01-06/')
vehicles.sync_metadata_from_db('postgresql://fleet_db/sensor_readings')

# Automatic frame extraction with YOLOX detection
frames = pxt.create_view('fleet.frames', vehicles,
    iterator=FrameIterator.create(video=vehicles.video, fps=1))

frames.add_computed_column(
    detections=yolox(frames.frame, model_id='yolox_l', threshold=0.6)
)

# Quality assessment for annotation priority
@pxt.udf
def annotation_priority(detections: dict, weather: str) -> float:
    edge_cases = ['fog', 'rain', 'construction']
    weather_mult = 2.0 if weather in edge_cases else 1.0
    confidence_penalty = 1.0 - detections.get('avg_confidence', 0.8)
    return weather_mult * confidence_penalty

frames.add_computed_column(
    priority=annotation_priority(frames.detections, vehicles.weather)
)

# Send high-priority frames to Label Studio
high_priority = frames.where(frames.priority > 1.5)
pxt.io.sync_label_studio_project(
    ls_project_name='highway-fog-annotations',
    view=high_priority,
    config=label_studio_config
)
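To see which frames the 1.5 cutoff actually selects, the scoring rule from the example can be run as plain Python, without the @pxt.udf decorator or a Pixeltable install. This is a minimal sketch: the avg_confidence key and the edge-case list come from the example above, while the four sample inputs are made up for illustration.

```python
EDGE_CASES = ('fog', 'rain', 'construction')
THRESHOLD = 1.5  # same cutoff as frames.priority > 1.5


def annotation_priority(detections: dict, weather: str) -> float:
    """Plain-Python copy of the scoring rule from the example."""
    weather_mult = 2.0 if weather in EDGE_CASES else 1.0
    confidence_penalty = 1.0 - detections.get('avg_confidence', 0.8)
    return weather_mult * confidence_penalty


# Illustrative inputs (not real fleet data)
foggy_uncertain = annotation_priority({'avg_confidence': 0.2}, 'fog')
foggy_confident = annotation_priority({'avg_confidence': 0.9}, 'fog')
clear_uncertain = annotation_priority({'avg_confidence': 0.2}, 'clear')
missing_score = annotation_priority({}, 'rain')  # falls back to 0.8

scores = {
    'foggy_uncertain': foggy_uncertain,
    'foggy_confident': foggy_confident,
    'clear_uncertain': clear_uncertain,
    'missing_score': missing_score,
}
selected = [name for name, score in scores.items() if score > THRESHOLD]
```

With this rule, a frame clears the threshold only in edge-case weather and with an average confidence below 0.25, since 2.0 * (1 - c) > 1.5 requires c < 0.25. Clear-weather frames top out at a priority of 1.0 and are never selected.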