Iterate on Your Data, Not Your Infrastructure
```python
# Video → frames → detections → export
import pixeltable as pxt
from pixeltable.iterators import FrameIterator
from pixeltable.functions import yolox, openai

# 01 Acquire — create table with multimodal types
videos = pxt.create_table('ml.videos', {
    'video': pxt.Video,
    'title': pxt.String,
    'source': pxt.String,
})

# 02 Enrich — extract frames, detect objects, describe
frames = pxt.create_view('ml.frames', videos,
    iterator=FrameIterator.create(video=videos.video, fps=1)
)
frames.add_computed_column(
    detections=yolox(frames.frame, model_id='yolox_s', threshold=0.5)
)
frames.add_computed_column(
    caption=openai.vision(
        prompt='Describe this frame in one sentence.',
        image=frames.frame, model='gpt-4o-mini'
    )
)

# 03 Curate — filter and query enriched data
results = frames.where(frames.caption.like('%person%')).order_by(
    frames.pos_msec
).select(frames.frame, frames.detections, frames.caption).collect()

# 04 Export — to ML-ready formats
from pixeltable.io import export_parquet
export_parquet(frames, 'training_data/')
df = frames.select(frames.frame, frames.detections).collect().to_pandas()
```
`ml.videos`

| Column | Type | Computed With |
|---|---|---|
| video | Video | |
| title | String | |
| source | String | |
`ml.frames` (view)

| Column | Type | Computed With |
|---|---|---|
| frame | Image | FrameIterator(fps=1) |
| detections | Json | yolox(frame) |
| caption | String | openai.vision(frame) |
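Once those tables exist, a single insert is all it takes to run the whole pipeline. A minimal sketch, assuming the `ml.videos` table from the snippet above (the video URL is a placeholder):

```python
import pixeltable as pxt

videos = pxt.get_table('ml.videos')

# One insert triggers everything declared above:
# frame extraction, object detection, and captioning run automatically.
videos.insert([{
    'video': 'https://example.com/clip.mp4',  # placeholder URL
    'title': 'Demo clip',
    'source': 'upload',
}])
```

No orchestrator is involved; the computed columns on `ml.frames` fill in as the new rows arrive.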
Multimodal Data, Made Simple
Video, audio, images, and documents as first-class data types — with storage, orchestration, and retrieval unified under one table interface.
Replace Complexity with One-Liners
Every capability that used to require a separate system is now a single function call.
| Capability | Traditional Approach | What You Write |
|---|---|---|
| Video storage | S3 bucket + IAM + upload scripts | `pxt.create_table(..., {'video': pxt.Video})` |
| Frame extraction | FFmpeg scripts + output management | `FrameIterator.create(video=..., fps=1)` |
| Object detection | Model serving + GPU management + batch scripts | `add_computed_column(detections=yolox(...))` |
| Vision descriptions | API client + retry logic + rate limiting + result storage | `add_computed_column(description=openai.vision(...))` |
| Audio extraction | FFmpeg + temp file management | `add_computed_column(audio=extract_audio(...))` |
| Transcription | Whisper API client + chunking + storage | `add_computed_column(transcript=transcribe(...))` |
| Image search | CLIP embedding + Pinecone + sync scripts | `add_embedding_index('frame', embedding=clip.using(...))` |
| Text search | Another embedding pipeline + another index | `add_embedding_index('description', string_embed=...)` |
| Orchestration | Airflow DAG + dependency config + monitoring | Automatic — insert triggers everything |
| Versioning | Custom tracking across all services | Automatic — `table.history()`, `table.revert()` |
| Incremental updates | Custom diffing logic per service | Automatic — only new/changed rows are processed |
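The image-search row, for instance, collapses into two statements. A sketch assuming the `ml.frames` view from the first snippet; the CLIP model id is illustrative:

```python
import pixeltable as pxt
from pixeltable.functions.huggingface import clip

frames = pxt.get_table('ml.frames')

# One line replaces the CLIP + vector-DB + sync-script stack
frames.add_embedding_index(
    'frame', embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)

# Text-to-image search over the indexed frames
sim = frames.frame.similarity('a person riding a bike')
top = frames.order_by(sim, asc=False).limit(5).select(frames.frame, sim).collect()
```

The index stays in sync on its own: rows inserted later are embedded and searchable without any backfill step.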
The Anatomy of a Multimodal AI App
Clone the Starter Kit and ship a full-stack multimodal AI app in minutes. Upload docs, images, and videos — search across all of them — chat with an 8-step tool-calling agent.
```python
import pixeltable as pxt
from pixeltable.functions import openai, image as pxt_image
from pixeltable.functions.anthropic import messages, invoke_tools
from pixeltable.functions.document import document_splitter
from pixeltable.functions.huggingface import sentence_transformer, clip
from pixeltable.functions.video import extract_audio, frame_iterator
import config, functions

pxt.create_dir("app", if_exists="ignore")
sentence_embed = sentence_transformer.using(model_id=config.EMBEDDING_MODEL_ID)

# ── 1. Document Pipeline ─────────────────────────────────────
documents = pxt.create_table("app.documents", {
    "document": pxt.Document, "timestamp": pxt.Timestamp,
})
chunks = pxt.create_view("app.chunks", documents,
    iterator=document_splitter(document=documents.document,
        separators="page, sentence", metadata="title, heading, page"),
)
chunks.add_embedding_index("text", string_embed=sentence_embed)

@pxt.query
def search_documents(query_text: str):
    sim = chunks.text.similarity(query_text)
    return chunks.where(sim > 0.5).order_by(sim, asc=False).limit(20)

# ── 2. Image Pipeline ────────────────────────────────────────
images = pxt.create_table("app.images", {
    "image": pxt.Image, "timestamp": pxt.Timestamp,
})
images.add_computed_column(
    thumbnail=pxt_image.b64_encode(pxt_image.thumbnail(images.image, size=(320, 320)))
)
images.add_embedding_index("image",
    embedding=clip.using(model_id=config.CLIP_MODEL_ID))

# ── 3. Video Pipeline ────────────────────────────────────────
videos = pxt.create_table("app.videos", {
    "video": pxt.Video, "timestamp": pxt.Timestamp,
})
video_frames = pxt.create_view("app.video_frames", videos,
    iterator=frame_iterator(video=videos.video, keyframes_only=True))
video_frames.add_embedding_index("frame",
    embedding=clip.using(model_id=config.CLIP_MODEL_ID))
videos.add_computed_column(audio=extract_audio(videos.video, format="mp3"))
# audio → Whisper transcription → sentence splitting → embedding (chained views)

# ── 4. Chat History ──────────────────────────────────────────
chat_history = pxt.create_table("app.chat_history", {
    "role": pxt.String, "content": pxt.String,
    "conversation_id": pxt.String, "timestamp": pxt.Timestamp,
})
chat_history.add_embedding_index("content", string_embed=sentence_embed)

# ── 5. Agent Pipeline (8-step tool-calling workflow) ─────────
tools = pxt.tools(functions.web_search, search_video_transcripts)

agent = pxt.create_table("app.agent", {
    "prompt": pxt.String, "timestamp": pxt.Timestamp,
    "initial_system_prompt": pxt.String,
    "final_system_prompt": pxt.String,
    "max_tokens": pxt.Int, "temperature": pxt.Float,
})

# Step 1: Initial LLM call with tool selection
agent.add_computed_column(
    initial_response=messages(
        model=config.CLAUDE_MODEL_ID,
        messages=[{"role": "user", "content": agent.prompt}],
        tools=tools, tool_choice=tools.choice(required=True),
    )
)
# Step 2: Execute selected tools
agent.add_computed_column(tool_output=invoke_tools(tools, agent.initial_response))
# Step 3: Parallel RAG context retrieval
agent.add_computed_column(doc_context=search_documents(agent.prompt))
agent.add_computed_column(image_context=search_images(agent.prompt))
agent.add_computed_column(video_frame_context=search_video_frames(agent.prompt))
agent.add_computed_column(chat_memory_context=search_chat_history(agent.prompt))
# Steps 4-6: Assemble multimodal context + final messages
agent.add_computed_column(
    multimodal_context=functions.assemble_context(
        agent.prompt, agent.tool_output, agent.doc_context, agent.chat_memory_context,
    )
)
agent.add_computed_column(
    final_messages=functions.assemble_final_messages(
        agent.history_context, agent.multimodal_context,
        image_context=agent.image_context, video_frame_context=agent.video_frame_context,
    )
)
# Step 7: Final LLM reasoning
agent.add_computed_column(
    final_response=messages(model=config.CLAUDE_MODEL_ID, messages=agent.final_messages)
)
# Step 8: Extract answer
agent.add_computed_column(answer=agent.final_response.content[0].text)
```
5 data pipelines + 8-step agent workflow — all declarative. One file replaces hundreds of lines of glue code.
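Using the agent is itself just an insert and a select. A sketch assuming the `app.agent` table defined above; the prompt, system prompts, and parameter values are placeholders:

```python
from datetime import datetime
import pixeltable as pxt

agent = pxt.get_table('app.agent')

# Inserting a prompt row runs all eight computed-column steps in order
agent.insert([{
    'prompt': 'What do the uploaded videos show?',  # placeholder prompt
    'timestamp': datetime.now(),
    'initial_system_prompt': 'You are a helpful assistant.',
    'final_system_prompt': 'Answer using the retrieved context.',
    'max_tokens': 1024,
    'temperature': 0.7,
}])

# The final answer lands in the last computed column
print(agent.select(agent.answer).collect())
```

Every intermediate step (tool calls, retrieved context, final messages) is stored as a column, so the full trace of each agent run is queryable after the fact.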
Developing with AI Tools
Pixeltable's declarative API means AI coding assistants get it right on the first try. Ten lines of code give you a persistent, versioned, incrementally-optimized pipeline.
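As an illustration of that claim, a roughly ten-line sketch of a persistent semantic-search pipeline (the directory, table name, and model id are illustrative):

```python
import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer

pxt.create_dir('demo', if_exists='ignore')
notes = pxt.create_table('demo.notes', {'text': pxt.String})
embed = sentence_transformer.using(model_id='all-MiniLM-L6-v2')  # illustrative model id
notes.add_embedding_index('text', string_embed=embed)
notes.insert([{'text': 'Computed columns and indexes update automatically on insert.'}])

sim = notes.text.similarity('what happens on insert?')
print(notes.order_by(sim, asc=False).limit(1).select(notes.text, sim).collect())
```

The table, index, and data all persist across processes; rerunning the script picks up where it left off rather than rebuilding from scratch.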
Your Backend for Multimodal AI
`pip install pixeltable` — your entire AI data stack

| Instead of ... | Pixeltable gives you ... |
|---|---|
| PostgreSQL / MySQL | `pxt.create_table()` — schema is Python, versioned automatically |
| Pinecone / Weaviate / Qdrant | `add_embedding_index()` — one line, stays in sync |
| S3 / boto3 / blob storage | `pxt.Image` / `Video` / `Audio` / `Document` — native types with caching |
| Airflow / Prefect / Celery | Computed columns — trigger on insert, no orchestrator needed |
| LangChain / LlamaIndex (RAG) | `@pxt.query` + `.similarity()` — computed column chaining |
| pandas / polars (multimodal) | `.sample()`, `add_computed_column()` — prototype to production |
| DVC / MLflow / W&B | `history()`, `revert()`, time travel — built-in snapshots |
| Custom retry / rate-limit / caching | Built into every AI integration — results cached, only new rows recomputed |
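The versioning row, for example, translates to two calls. A sketch assuming a previously created table such as `ml.frames`:

```python
import pixeltable as pxt

frames = pxt.get_table('ml.frames')
frames.history()   # inspect the table's version history
frames.revert()    # roll back the most recent change
```

Every schema change and insert creates a new version automatically, so there is no separate experiment-tracking service to wire up.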
What Can You Build?
Pixeltable's primitives compose into any multimodal AI workflow
Everything You Need to Know
Common questions about building with Pixeltable
Every Era of Data Gets an Owner
Oracle for relational. Snowflake for analytics. Databricks for batch.
The multimodal data plane is next.