The AI stack is being rebuilt from the ground up, and most people only see one layer of it. CS 153: Frontier Systems, hosted at Stanford, walks through the entire architecture, from energy and silicon to foundation models and the applications reshaping how we work, create, and discover. Each week, a leader from a different layer of the stack joins to share what they're actually building and what they've learned doing it. This spring's speakers include Jensen Huang, Sam Altman, Lisa Su, Satya Nadella, Andrej Karpathy, Ben Horowitz, and founders from Sesame, Roblox, Periodic Labs, and more. The instructors are Anjney Midha (founder, AMP PBC) and Mike Abbott (ex-GM/Apple/Twitter/KP).
In week three of CS 153, the instructor hosts Amit Jain from Luma to discuss "Unified Intelligence Systems" as a follow-up to a prior lecture on visual intelligence. Jain recounts his Apple work on LiDAR for projects including Titan and Vision Pro, and how early exploration of generative models and differentiable 3D led to founding Luma with an initial focus on large-scale 3D capture. Luma then shifted to generative video in 2023 to leverage the scale of internet video data, releasing the Dream Machine model in March 2024 and rapidly reaching millions of users, while building preference-based feedback loops and human annotation pipelines. Jain explains Luma's multimodal AI factory (pretraining, post-training, deployment, and reinforcement learning), its security constraints for studio clients, and a move toward unified transformer architectures that jointly reason across text, images, video, and audio to enable end-to-end creative and professional workflows.
In this CS 153 "Frontier Systems" session, Anjney Midha welcomes Andreas Blattmann, co-founder of Black Forest Labs and co-creator of Stable Diffusion, for a discussion on the visual intelligence frontier and how frontier AI "factories" scale. Blattmann recounts his path from mechanical engineering to a Heidelberg PhD lab, developing latent diffusion to train image generators efficiently and enabling Stable Diffusion's 2022 release. They contrast earlier unimodal content-creation models with today's push toward unified multimodal systems spanning images, video, and audio, plus action prediction for computer use and robotics, emphasizing observation and interaction loops. Using Flux as a case study, they cover pre-training, mid-training, post-training, distillation for speed, customer feedback driving image editing and character consistency, and why open weights enable customization. They also discuss Self Flow for multimodal alignment, safety guardrails, EU compliance, data labeling strategies, diffusion vs. autoregressive tradeoffs, and skepticism about explicit 3D representations.
In week two of CS 153 ("AI Coachella"), Anjney Midha interviews Mati Staniszewski, founder and CEO of ElevenLabs, tracing the company's origins from an early Discord text-to-speech bot to a fast-growing frontier audio and speech platform. Mati explains ElevenLabs' initial focus on solving AI dubbing, inspired by Poland's single-voice film narration; the shift to prioritizing emotional, natural-sounding text-to-speech for creators; and the evolution from cascaded pipelines (transcription, translation/LLM, and speech generation) toward real-time voice agents. They discuss tradeoffs between cascaded versus fused multimodal systems, efforts to detect and convey emotion, safety and voice-authentication limits, on-device model deployment, collaboration with teams like Sesame, and business lessons on PLG plus enterprise deployment, team structure, pricing from customer value, and growth to over $430M in revenue with roughly 450 employees.
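The cascaded pipeline mentioned above can be sketched as three stages whose outputs feed forward in sequence. This is a minimal illustration of the cascade structure only; the stage functions below are hypothetical stubs, not ElevenLabs APIs or models.

```python
# Sketch of a cascaded dubbing pipeline: speech-to-text, then translation
# (e.g. via an LLM), then text-to-speech. All three stages are stand-in
# stubs for illustration; real systems would call trained models here.

def transcribe(audio: bytes) -> str:
    """Stage 1: speech-to-text. Stubbed to return fixed text."""
    return "hello world"

def translate(text: str, target_lang: str) -> str:
    """Stage 2: translation. Stubbed to tag the target language."""
    return f"[{target_lang}] {text}"

def synthesize(text: str) -> bytes:
    """Stage 3: text-to-speech. Stubbed to return encoded text."""
    return text.encode("utf-8")

def dub(audio: bytes, target_lang: str) -> bytes:
    """Run the cascade end to end; each stage's output feeds the next."""
    return synthesize(translate(transcribe(audio), target_lang))
```

The fused-multimodal alternative discussed in the episode would collapse these stages into a single model, trading the cascade's modularity and debuggability for lower latency and the ability to carry emotion and prosody through without lossy intermediate text.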
Anjney Midha opens the quarter of Stanford's CS 153 Frontier Systems by framing the course as a speaker-led "AI Coachella," emphasizing relationships, fun, and "obsessing over what you love" as a life heuristic. He introduces his background and the course goal of real-world preparedness, then outlines the modern AI stack from capital and data centers through chips, cloud, models, applications, and governance. Midha reviews how AI development has industrialized, especially through reinforcement learning and continuous post-training, and argues that "context" and verifiable feedback loops determine where progress accelerates and where value accrues, citing examples like IDE access conflicts and sovereign AI needs. He then deep-dives into compute infrastructure, showing how capabilities and revenue correlate with compute buildouts, why GPU prices can rise, how infrastructure cycles resemble past commodity booms, and why compute remains non-fungible without standards and institutions.