Machine Learning Street Talk (MLST)

Eiso Kant (CTO poolside) - Superhuman Coding Is Coming!

Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, particularly focused on software development. Their unique strategy is reinforcement learning from code execution feedback, which is an important axis for scaling AI capabilities beyond just increasing model size or data volume. Kant predicts human-level AI in knowledge work could be achieved within 18-36 months, outlining poolside's vision to dramatically increase software development productivity and accessibility.
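
As a rough illustration of what "code execution feedback" can mean as a training signal, the sketch below scores a candidate program by running it against a test snippet and returning a binary reward. This is not poolside's implementation: the function name `execution_reward`, the binary reward, and the use of an unsandboxed `exec` are all assumptions made for brevity (a real system would isolate execution and enforce timeouts).

```python
def execution_reward(candidate_code: str, test_code: str) -> float:
    """Hypothetical reward: 1.0 if the candidate passes the tests, else 0.0.

    Assumption: both arguments are Python source strings, and the tests
    signal failure by raising (for example via `assert`).
    WARNING: exec() runs arbitrary code; this toy version has no sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the candidate's functions
        exec(test_code, namespace)       # run the tests against them
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all score 0.
        return 0.0
    return 1.0


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5"
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    print(execution_reward(good, tests))
    print(execution_reward(bad, tests))
```

In a reinforcement learning loop, a score like this would stand in for a human preference label: the model proposes code, the environment executes it, and the outcome is fed back as reward.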

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

Eiso Kant:

https://x.com/eisokant

https://poolside.ai/

TRANSCRIPT:

https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0

TOC:

1. Foundation Models and AI Strategy

[00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development

[00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision

[00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs

2. Reinforcement Learning and Model Economics

[00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches

[00:22:06] 2.2 Model Economics and Experimental Optimization

3. Enterprise AI Implementation

[00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure

[00:26:00] 3.2 Enterprise-First Business Model and Market Focus

[00:27:05] 3.3 Foundation Models and AGI Development Approach

[00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements

4. LLM Architecture and Performance

[00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization

[00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs

[00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons

[00:43:26] 4.4 Balancing Creativity and Determinism in AI Models

[00:50:01] 4.5 AI-Assisted Software Development Evolution

5. AI Systems Engineering and Scalability

[00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges

[00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends

[01:01:25] 5.3 Distributed Systems and Engineering Complexity

[01:01:50] 5.4 GenAI Architecture and Scalability Patterns

[01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation

6. AI Safety and Future Capabilities

[01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches

[01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems

[01:16:27] 6.3 AI vs Human Capabilities in Software Development

[01:33:45] 6.4 Enterprise Deployment and Security Architecture

CORE REFS (see shownotes for URLs/more refs):

[00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (Key finding on synthetic data risk)

[00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. (Seminal NLP technique)

[00:22:15] OpenAI O3 model's breakthrough performance on ARC Prize Challenge, OpenAI (Significant AI reasoning benchmark achievement)

[00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (Influential AI definition/philosophy)

[00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (Details on a major new model)

[00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. (Key paper on LLM scaling)

[00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (Influential "Bitter Lesson" perspective)

2025-04-02

The Compendium - Connor Leahy and Gabriel Alfour

Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination," where a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species.

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT + REFS + NOTES:

https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0

https://www.thecompendium.ai/

https://en.wikipedia.org/wiki/Connor_Leahy

https://www.conjecture.dev/about

https://substack.com/@gabecc?

TOC:

1. AI Intelligence and Safety Fundamentals

[00:00:00] 1.1 Understanding Intelligence and AI Capabilities

[00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges

[00:10:18] 1.3 Human vs Animal Intelligence Debate

[00:18:00] 1.4 AI Regulation and Risk Assessment Approaches

[00:26:14] 1.5 Competing AI Development Ideologies

2. Economic and Social Impact

[00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios

[00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics

[00:37:40] 2.3 Ethical Frameworks and AI Governance Debates

[00:40:52] 2.4 AI Alignment Evolution and Technical Challenges

3. Technical Governance Framework

[00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness

[00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models

[00:57:35] 3.3 Limitations of Current Boundedness Approaches

[00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions

4. Democratic Implementation and Coordination

[00:59:20] 4.1 Governance Design and Measurement Challenges

[01:00:10] 4.2 Democratic Institutions and Experimental Governance

[01:14:10] 4.3 Political Engagement and AI Safety Advocacy

[01:25:30] 4.4 Practical AI Safety Measures and International Coordination

CORE REFS:

[00:01:45] The Compendium (2023), Leahy et al.

https://pdf.thecompendium.ai/the_compendium.pdf

[00:06:50] Geoffrey Hinton Leaves Google, BBC News

https://www.bbc.com/news/world-us-canada-65452940

[00:10:00] ARC-AGI, Chollet

https://arcprize.org/arc-agi

[00:13:25] A Brief History of Intelligence, Bennett

https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343

[00:25:35] Statement on AI Risk, Center for AI Safety

https://www.safe.ai/work/statement-on-ai-risk

[00:26:15] Machines of Loving Grace, Amodei

https://darioamodei.com/machines-of-loving-grace

[00:26:35] The Techno-Optimist Manifesto, Andreessen

https://a16z.com/the-techno-optimist-manifesto/

[00:31:55] Techno-Feudalism, Varoufakis

https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270

[00:42:40] Introducing Superalignment, OpenAI

https://openai.com/index/introducing-superalignment/

[00:47:20] Three Laws of Robotics, Asimov

https://www.britannica.com/topic/Three-Laws-of-Robotics

[00:50:00] Symbolic AI (GOFAI), Haugeland

https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence

[00:52:30] Intent Alignment, Christiano

https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety

[00:55:10] Large Language Model Alignment: A Survey, Jiang et al.

http://arxiv.org/pdf/2309.15025

[00:55:40] Constitutional Checks and Balances, Bok

https://plato.stanford.edu/entries/montesquieu/

<trunc, see PDF>

2025-03-30

ARC Prize v2 Launch! (Francois Chollet and Mike Knoop)

We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge.

https://arcprize.org/

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT:

https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0

TOC:

1. ARC v2 Core Design & Objectives

[00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture

[00:03:16] 1.2 Test-Time Optimization and AGI Assessment

[00:06:24] 1.3 Human-AI Capability Analysis

[00:13:02] 1.4 OpenAI o3 Initial Performance Results

2. ARC Technical Evolution

[00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements

[00:21:12] 2.2 Human Validation Methodology

[00:26:05] 2.3 Task Design and Gaming Prevention

[00:29:11] 2.4 Intelligence Measurement Framework

3. O3 Performance & Future Challenges

[00:38:50] 3.1 O3 Comprehensive Performance Analysis

[00:43:40] 3.2 System Limitations and Failure Modes

[00:49:30] 3.3 Program Synthesis Applications

[00:53:00] 3.4 Future Development Roadmap

REFS:

[00:00:15] On the Measure of Intelligence, François Chollet

https://arxiv.org/abs/1911.01547

[00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop

https://arcprize.org/

[00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team

https://arcprize.org/blog/oai-o3-pub-breakthrough

[00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.

https://arxiv.org/abs/2201.11903

[00:21:45] ARC-v2 benchmark tasks, Mike Knoop

https://arcprize.org/blog/introducing-arc-agi-public-leaderboard

[00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.

https://arxiv.org/html/2412.04604v2

[00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt

https://arxiv.org/abs/2412.04604

[00:48:55] The Bitter Lesson, Rich Sutton

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

[00:53:30] Decoding strategies in neural text generation, Sina Zarrieß

https://www.mdpi.com/2078-2489/12/9/355/pdf

2025-03-24

Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman)

Mohamed Osman joins to discuss MindsAI's highest scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and flexibility of the network.
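
The show notes mention an ensemble voting mechanism without giving details. As a minimal sketch of the general idea, assuming the ensemble produces several candidate output grids for one ARC task and the most frequent grid wins, a plurality vote could look like the following. The `Grid` alias and the `vote` function are hypothetical names, not the MindsAI code.

```python
from collections import Counter

# Assumption: an ARC grid is a list of rows, each row a list of colour ints.
Grid = list[list[int]]


def vote(predictions: list[Grid]) -> Grid:
    """Return the grid predicted most often (ties go to the earliest seen)."""
    if not predictions:
        raise ValueError("need at least one prediction to vote")
    # Grids are converted to nested tuples so they can be counted as dict keys.
    counts = Counter(tuple(tuple(row) for row in grid) for grid in predictions)
    winner, _ = counts.most_common(1)[0]
    return [list(row) for row in winner]


if __name__ == "__main__":
    a = [[1, 0], [0, 1]]
    b = [[1, 1], [0, 1]]
    print(vote([a, b, a]))
```

The candidates might come from differently augmented views of the same task or from different fine-tuned checkpoints; that choice is outside what this sketch assumes.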

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT + REFS:

https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0

Mohamed Osman (Tufa Labs)

https://x.com/MohamedOsmanML

Jack Cole (Tufa Labs)

https://x.com/MindsAI_Jack

How and why deep learning for ARC paper:

https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

TOC:

1. Abstract Reasoning Foundations

[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview

[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning

[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture

[00:20:26] 1.4 Technical Implementation with Long T5 Model

2. ARC Solution Architectures

[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions

[00:27:54] 2.2 Model Generalization and Function Generation Challenges

[00:32:53] 2.3 Input Representation and VLM Limitations

[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration

[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches

3. Advanced Systems Integration

[00:43:00] 3.1 DreamCoder Evolution and LLM Integration

[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs

[00:54:15] 3.3 ARC v2 Development and Performance Scaling

[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations

[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution

REFS:

[00:01:32] Original ARC challenge paper, François Chollet

https://arxiv.org/abs/1911.01547

[00:06:55] DreamCoder, Kevin Ellis et al.

https://arxiv.org/abs/2006.08381

[00:12:50] Deep Learning with Python, François Chollet

https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438

[00:13:35] Deep Learning with Python, François Chollet

https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438

[00:13:35] Influence of pretraining data for reasoning, Laura Ruis

https://arxiv.org/abs/2411.12580

[00:17:50] Latent Program Networks, Clement Bonnet

https://arxiv.org/html/2411.08706v1

[00:20:50] T5, Colin Raffel et al.

https://arxiv.org/abs/1910.10683

[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.

https://arxiv.org/abs/2411.02272

[00:34:15] Six finger problem, Chen et al.

https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf

[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B

[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.

https://arxiv.org/html/2412.04604v2

[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis

https://arxiv.org/html/2503.15540

[00:54:25] Abstraction and Reasoning Corpus, François Chollet

https://github.com/fchollet/ARC-AGI

[00:57:10] O3 breakthrough on ARC-AGI, OpenAI

https://arcprize.org/

[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell

https://arxiv.org/abs/2305.07141

[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W.

http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf

2025-03-22

GSMSymbolic paper - Iman Mirzadeh (Apple)

Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation.

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT + RESEARCH:

https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0

TOC:

1. Intelligence vs Achievement in AI Systems

[00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems

[00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess

[00:10:10] 1.3 Language Models and Distribution Learning Limitations

[00:14:47] 1.4 Research Methodology and Theoretical Frameworks

2. Intelligence Measurement and Learning

[00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning

[00:29:00] 2.2 Intelligence Definition and Measurement Approaches

[00:34:35] 2.3 Learning Capabilities and Agency in AI Systems

[00:39:26] 2.4 Abstract Reasoning and Symbol Understanding

3. LLM Performance and Evaluation

[00:47:15] 3.1 Scaling Laws and Fundamental Limitations

[00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks

[00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs

[01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment

REFS:

[00:01:00] AlphaZero chess AI system, Silver et al.

https://arxiv.org/abs/1712.01815

[00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan

https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184

[00:11:35] Cross-entropy loss in language modeling, Voita

http://lena-voita.github.io/nlp_course/language_modeling.html

[00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al.

https://arxiv.org/abs/2410.05229

[00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn

https://www.sciencedirect.com/science/article/pii/001002779090014B

[00:28:55] Brain-to-body mass ratio scaling laws, Sutskever

https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training

[00:29:40] On the Measure of Intelligence, Chollet

https://arxiv.org/abs/1911.01547

[00:33:30] On definition of intelligence, Gignac et al.

https://www.sciencedirect.com/science/article/pii/S0160289624000266

[00:35:30] Defining intelligence, Wang

https://cis.temple.edu/~wangp/papers.html

[00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene

https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884

[00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander

https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

[00:43:15] Chain-of-thought prompting, Wei et al.

https://arxiv.org/abs/2201.11903

[00:47:20] Test-time scaling laws in machine learning, Brown

https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058

[00:47:50] Scaling Laws for Neural Language Models, Kaplan et al.

https://arxiv.org/abs/2001.08361

[00:55:15] Tensor product variable binding, Smolensky

https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

[01:08:45] GSM-8K dataset, OpenAI

https://huggingface.co/datasets/openai/gsm8k

2025-03-19

Reasoning, Robustness, and Human Feedback in AI - Max Bartolo (Cohere)

Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.

Max Bartolo (Cohere):

https://www.maxbartolo.com/

https://cohere.com/command

TRANSCRIPT:

https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0

TOC:

1. Model Reasoning and Verification

[00:00:00] 1.1 Model Consistency and Reasoning Verification

[00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis

[00:10:28] 1.3 AI Application Development and Model Deployment

[00:14:24] 1.4 AI Alignment and Human Feedback Limitations

2. Evaluation and Bias Assessment

[00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment

[00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior

[00:32:43] 2.3 Adversarial Examples and Model Robustness

3. Benchmarking Systems and Methods

[00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches

[00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics

[00:50:33] 3.3 Evolution of Model Benchmarking Methods

[00:51:15] 3.4 Hierarchical Capability Testing Framework

[00:52:35] 3.5 Benchmark Platforms and Tools

4. Model Architecture and Performance

[00:55:15] 4.1 Cohere's Model Development Process

[01:00:26] 4.2 Model Quantization and Performance Evaluation

[01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards

[01:08:27] 4.4 Training Progression and Technical Challenges

5. Future Directions and Challenges

[01:13:48] 5.1 Context Window Evolution and Trade-offs

[01:22:47] 5.2 Enterprise Applications and Future Challenges

REFS:

[00:03:10] Research at Cohere with Laura Ruis et al., Max Bartolo, Laura Ruis et al.

https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20

[00:04:15] Influence functions in machine learning, Koh & Liang

https://arxiv.org/abs/1703.04730

[00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.

https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf

[00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmann

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

[00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AI

https://huggingface.co/CohereForAI/c4ai-command-a-03-2025

[00:13:30] OpenInterpreter

https://github.com/KillianLucas/open-interpreter

[00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsom

https://arxiv.org/abs/2309.16349

[00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.

https://arxiv.org/abs/2404.16019

[00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry

https://arxiv.org/abs/1905.02175

[00:43:00] DynaBench platform paper, Douwe Kiela et al.

https://aclanthology.org/2021.naacl-main.324.pdf

[00:50:15] Sara Hooker's work on compute limitations, Sara Hooker

https://arxiv.org/html/2407.05694v1

[00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.

https://arxiv.org/abs/2207.10062

[01:04:35] DROP, Dheeru Dua et al.

https://arxiv.org/abs/1903.00161

[01:07:05] GSM8k, Cobbe et al.

https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

[01:09:30] ARC, François Chollet

https://github.com/fchollet/ARC-AGI

[01:15:50] Command A, Cohere

https://cohere.com/blog/command-a

[01:22:55] Enterprise search using LLMs, Cohere

https://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers

2025-03-19

Tau Language: The Software Synthesis Future (sponsored)

This sponsored episode features mathematician Ohad Asor discussing logical approaches to AI, focusing on the limitations of machine learning and introducing the Tau language for software development and blockchain tech. Asor argues that machine learning cannot guarantee correctness. Tau allows logical specification of software requirements, automatically creating provably correct implementations with potential to revolutionize distributed systems. The discussion highlights program synthesis, software updates, and applications in finance and governance.

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT + RESEARCH:

https://www.dropbox.com/scl/fi/t849j6v1juk3gc15g4rsy/TAU.pdf?rlkey=hh11h2mhog3ncdbeapbzpzctc&dl=0

Tau:

https://tau.net/

Tau Language:

https://tau.ai/tau-language/

Research:

https://tau.net/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

TOC:

1. Machine Learning Foundations and Limitations

[00:00:00] 1.1 Fundamental Limitations of Machine Learning and PAC Learning Theory

[00:04:50] 1.2 Transductive Learning and the Three Curses of Machine Learning

[00:08:57] 1.3 Language, Reality, and AI System Design

[00:12:58] 1.4 Program Synthesis and Formal Verification Approaches

2. Logical Programming Architecture

[00:31:55] 2.1 Safe AI Development Requirements

[00:32:05] 2.2 Self-Referential Language Architecture

[00:32:50] 2.3 Boolean Algebra and Logical Foundations

[00:37:52] 2.4 SAT Solvers and Complexity Challenges

[00:44:30] 2.5 Program Synthesis and Specification

[00:47:39] 2.6 Overcoming Tarski's Undefinability with Boolean Algebra

[00:56:05] 2.7 Tau Language Implementation and User Control

3. Blockchain-Based Software Governance

[01:09:10] 3.1 User Control and Software Governance Mechanisms

[01:18:27] 3.2 Tau's Blockchain Architecture and Meta-Programming Capabilities

[01:21:43] 3.3 Development Status and Token Implementation

[01:24:52] 3.4 Consensus Building and Opinion Mapping System

[01:35:29] 3.5 Automation and Financial Applications

CORE REFS (more in pinned comment):

[00:03:45] PAC (Probably Approximately Correct) Learning framework, Leslie Valiant

https://en.wikipedia.org/wiki/Probably_approximately_correct_learning

[00:06:10] Boolean Satisfiability Problem (SAT), Various

https://en.wikipedia.org/wiki/Boolean_satisfiability_problem

[00:13:55] Knowledge as Justified True Belief (JTB), Matthias Steup

https://plato.stanford.edu/entries/epistemology/

[00:17:50] Wittgenstein's concept of the limits of language, Ludwig Wittgenstein

https://plato.stanford.edu/entries/wittgenstein/

[00:21:25] Boolean algebras, Ohad Asor

https://tau.net/tau-language-research/

[00:26:10] The Halting Problem

https://plato.stanford.edu/entries/turing-machine/#HaltProb

[00:30:25] Alfred Tarski (1901-1983), Mario Gómez-Torrente

https://plato.stanford.edu/entries/tarski/

[00:41:50] DPLL

https://www.cs.princeton.edu/~zkincaid/courses/fall18/readings/SATHandbook-CDCL.pdf

[00:49:50] Tarski's undefinability theorem (1936), Alfred Tarski

https://plato.stanford.edu/entries/tarski-truth/

[00:51:45] Boolean Algebra mathematical foundations, J. Donald Monk

https://plato.stanford.edu/entries/boolalg-math/

[01:02:35] Belief Revision Theory and AGM Postulates, Sven Ove Hansson

https://plato.stanford.edu/entries/logic-belief-revision/

[01:05:35] Quantifier elimination in atomless boolean algebra, H. Jerome Keisler

https://people.math.wisc.edu/~hkeisler/random.pdf

[01:08:35] Quantifier elimination in Tau language specification, Ohad Asor

https://tau.ai/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

[01:11:50] Tau Net blockchain platform

https://tau.net/

[01:19:20] Tau blockchain's innovative approach treating blockchain code itself as a contract

https://tau.net/Whitepaper.pdf

2025-03-12

John Palazza - Vice President of Global Sales @ CentML (sponsored)

John Palazza from CentML joins us in this sponsored interview to discuss the critical importance of infrastructure optimization in the age of Large Language Models and Generative AI. We explore how enterprises can transition from the innovation phase to production and scale, highlighting the significance of efficient GPU utilization and cost management. The conversation covers the open-source versus proprietary model debate, the rise of AI agents, and the need for platform independence to avoid vendor lock-in, as well as emerging trends in AI infrastructure and the pivotal role of strategic partnerships.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT:

https://www.dropbox.com/scl/fi/dnjsygrgdgq5ng5fdlfjg/JOHNPALAZZA.pdf?rlkey=hl9wyydi9mj077rbg5acdmo3a&dl=0

John Palazza:

Vice President of Global Sales @ CentML

https://www.linkedin.com/in/john-p-b34655/

TOC:

1. Enterprise AI Organization and Strategy

[00:00:00] 1.1 Organizational Structure and ML Ownership

[00:02:59] 1.2 Infrastructure Efficiency and GPU Utilization

[00:07:59] 1.3 Platform Centralization vs Team Autonomy

[00:11:32] 1.4 Enterprise AI Adoption Strategy and Leadership

2. MLOps Infrastructure and Resource Management

[00:15:08] 2.1 Technology Evolution and Enterprise Integration

[00:19:10] 2.2 Enterprise MLOps Platform Development

[00:22:15] 2.3 AI Interface Evolution and Agent-Based Solutions

[00:25:47] 2.4 CentML's Infrastructure Solutions

[00:30:00] 2.5 Workload Abstraction and Resource Allocation

3. LLM Infrastructure Optimization and Independence

[00:33:10] 3.1 GPU Optimization and Cost Efficiency

[00:36:47] 3.2 AI Efficiency and Innovation Challenges

[00:41:40] 3.3 Cloud Provider Strategy and Infrastructure Control

[00:46:52] 3.4 Platform Independence and Vendor Lock-in

[00:50:53] 3.5 Technical Innovation and Growth Strategy

REFS:

[00:01:25] Apple Acquires GraphLab, Apple Inc.

https://techcrunch.com/2016/08/05/apple-acquires-turi-a-machine-learning-company/

[00:03:50] Bain Tech Report 2024, Bain & Company

https://www.bain.com/insights/topics/technology-report/

[00:04:50] PaaS vs IaaS Efficiency, Gartner

https://www.gartner.com/en/newsroom/press-releases/2024-11-19-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-total-723-billion-dollars-in-2025

[00:14:55] Fashion Quote, Oscar Wilde

https://www.amazon.com/Complete-Works-Oscar-Wilde-Collins/dp/0007144369

[00:15:30] PointCast Network, PointCast Inc.

https://en.wikipedia.org/wiki/Push_technology

[00:18:05] AI Bain Report, Bain & Company

https://www.bain.com/insights/how-generative-ai-changes-the-game-in-tech-services-tech-report-2024/

[00:20:40] Uber Michelangelo, Uber Engineering Team

https://www.uber.com/en-SE/blog/michelangelo-machine-learning-platform/

[00:20:50] Algorithmia Acquisition, DataRobot

https://www.datarobot.com/newsroom/press/datarobot-is-acquiring-algorithmia-enhancing-leading-mlops-architecture-for-the-enterprise/

[00:22:55] Fine Tuning vs RAG, Heydar Soudani, Evangelos Kanoulas & Faegheh Hasibi.

https://arxiv.org/html/2403.01432v2

[00:24:40] LLM Agent Survey, Lei Wang et al.

https://arxiv.org/abs/2308.11432

[00:26:30] CentML CServe, CentML

https://docs.centml.ai/apps/llm

[00:29:15] CentML Snowflake, Snowflake

https://www.snowflake.com/en/engineering-blog/optimize-llms-with-llama-snowflake-ai-stack/

[00:30:15] NVIDIA H100 GPU, NVIDIA

https://www.nvidia.com/en-us/data-center/h100/

[00:33:25] CentML's 60% savings, CentML

https://centml.ai/platform/

2025-03-10

Transformers Need Glasses! - Federico Barbero

Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!".

Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts.

Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making.

But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.
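
One way to see why softmax limits sharp decisions is a small numerical sketch. It assumes a single attention head where one "target" token has a logit a fixed margin above all the others; the `gap` value and the function names are illustrative assumptions, not taken from the paper. With a bounded margin, the weight on the target token shrinks as the context grows.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_attention_weight(n_tokens: int, gap: float = 5.0) -> float:
    """Attention weight on one target token whose logit exceeds the rest by `gap`.

    Assumption: every other token shares the same logit (0.0).
    """
    logits = [gap] + [0.0] * (n_tokens - 1)
    return softmax(logits)[0]


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, target_attention_weight(n))
```

Because the denominator grows with the number of competing tokens, a fixed logit margin cannot keep attention concentrated on one token in long contexts, which fits the episode's point that counting and copying degrade as sequences get longer.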

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

https://federicobarbero.com/

TRANSCRIPT + RESEARCH:

https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0

TOC:

1. Transformer Limitations: Token Detection & Representation

[00:00:00] 1.1 Transformers fail at single token detection

[00:02:45] 1.2 Representation collapse in transformers

[00:03:21] 1.3 Experiment: LLMs fail at copying last tokens

[00:18:00] 1.4 Attention sharpness limitations in transformers

2. Transformer Limitations: Information Flow & Quantization

[00:18:50] 2.1 Unidirectional information mixing

[00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers

[00:21:50] 2.3 Diagonal attention heads as expensive no-ops in LLaMA/Gemma

[00:27:14] 2.4 Sequence entropy affects transformer model distinguishability

[00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse

[00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms

3. Transformers and the Nature of Reasoning

[00:40:30] 3.1 Turing completeness conditions in transformers

[00:43:23] 3.2 Transformers struggle with sequential tasks

[00:45:50] 3.3 Windowed attention as solution to information compression

[00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning

[01:00:35] 3.5 Epistemic foraging introduced

REFS:

[00:01:05] Transformers Need Glasses!, Barbero et al.

https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf

[00:05:30] Softmax is Not Enough, Veličković et al.

https://arxiv.org/abs/2410.01104

[00:11:30] Adv Alg Lecture 15, Chawla

https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf

[00:15:05] Graph Attention Networks, Veličković

https://arxiv.org/abs/1710.10903

[00:19:15] Extract Training Data, Carlini et al.

https://arxiv.org/pdf/2311.17035

[00:31:30] 1-bit LLMs, Ma et al.

https://arxiv.org/abs/2402.17764

[00:38:35] LLMs Solve Math, Nikankin et al.

https://arxiv.org/html/2410.21272v1

[00:38:45] Subitizing, Railo

https://link.springer.com/10.1007/978-1-4419-1428-6_578

[00:43:25] NN & Chomsky Hierarchy, Delétang et al.

https://arxiv.org/abs/2207.02098

[00:51:05] Measure of Intelligence, Chollet

https://arxiv.org/abs/1911.01547

[00:52:10] AlphaZero, Silver et al.

https://pubmed.ncbi.nlm.nih.gov/30523106/

[00:55:10] Golden Gate Claude, Anthropic

https://www.anthropic.com/news/golden-gate-claude

[00:56:40] Chess Positions, Chase & Simon

https://www.sciencedirect.com/science/article/abs/pii/0010028573900042

[01:00:35] Epistemic Foraging, Friston

https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full

2025-03-08

Sakana AI - Chris Lu, Robert Tjarko Lange, Cong Lu

We speak with Sakana AI, who are building nature-inspired methods that could fundamentally transform how we develop AI systems.

The guests include Chris Lu, a researcher who recently completed his DPhil at Oxford University under Prof. Jakob Foerster's supervision, where he focused on meta-learning and multi-agent systems. Chris is the first author of the DiscoPOP paper, which demonstrates how language models can discover and design better training algorithms. Also joining is Robert Tjarko Lange, a founding member of Sakana AI who specializes in evolutionary algorithms and large language models. Robert leads research at the intersection of evolutionary computation and foundation models, and is completing his PhD at TU Berlin on evolutionary meta-learning. The discussion also features Cong Lu, currently a Research Scientist at Google DeepMind's Open-Endedness team, who previously helped develop The AI Scientist and Intelligent Go-Explore.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

* DiscoPOP - A framework where language models discover their own optimization algorithms

* EvoLLM - Using language models as evolution strategies for optimization

* The AI Scientist - A fully automated system that conducts scientific research end-to-end

* Neural Attention Memory Models (NAMMs) - Evolved memory systems that make transformers both faster and more accurate

TRANSCRIPT + REFS:

https://www.dropbox.com/scl/fi/gflcyvnujp8cl7zlv3v9d/Sakana.pdf?rlkey=woaoo82943170jd4yyi2he71c&dl=0

Robert Tjarko Lange

https://roberttlange.com/

Chris Lu

https://chrislu.page/

Cong Lu

https://www.conglu.co.uk/

Sakana

https://sakana.ai/blog/

TOC:

1. LLMs for Algorithm Generation and Optimization

[00:00:00] 1.1 LLMs generating algorithms for training other LLMs

[00:04:00] 1.2 Evolutionary black-box optim using neural network loss parameterization

[00:11:50] 1.3 DiscoPOP: Non-convex loss function for noisy data

[00:20:45] 1.4 External entropy Injection for preventing Model collapse

[00:26:25] 1.5 LLMs for black-box optimization using abstract numerical sequences

2. Model Learning and Generalization

[00:31:05] 2.1 Fine-tuning on teacher algorithm trajectories

[00:31:30] 2.2 Transformers learning gradient descent

[00:33:00] 2.3 LLM tokenization biases towards specific numbers

[00:34:50] 2.4 LLMs as evolution strategies for black box optimization

[00:38:05] 2.5 DiscoPOP: LLMs discovering novel optimization algorithms

3. AI Agents and System Architectures

[00:51:30] 3.1 ARC challenge: Induction vs. transformer approaches

[00:54:35] 3.2 LangChain / modular agent components

[00:57:50] 3.3 Debate improves LLM truthfulness

[01:00:55] 3.4 Time limits controlling AI agent systems

[01:03:00] 3.5 Gemini: Million-token context enables flatter hierarchies

[01:04:05] 3.6 Agents follow own interest gradients

[01:09:50] 3.7 Go-Explore algorithm: archive-based exploration

[01:11:05] 3.8 Foundation models for interesting state discovery

[01:13:00] 3.9 LLMs leverage prior game knowledge

4. AI for Scientific Discovery and Human Alignment

[01:17:45] 4.1 Encoding Alignment & Aesthetics via Reward Functions

[01:20:00] 4.2 AI Scientist: Automated Open-Ended Scientific Discovery

[01:24:15] 4.3 DiscoPOP: LLM for Preference Optimization Algorithms

[01:28:30] 4.4 Balancing AI Knowledge with Human Understanding

[01:33:55] 4.5 AI-Driven Conferences and Paper Review

2025-03-01

Clement Bonnet - Can Latent Program Networks Solve Abstract Reasoning?

Clement Bonnet discusses his novel approach to the ARC (Abstraction and Reasoning Corpus) challenge. Unlike approaches that rely on fine-tuning LLMs or generating samples at inference time, Clement's method encodes input-output pairs into a latent space, optimizes this representation with a search algorithm, and decodes outputs for new inputs. This end-to-end architecture uses a VAE loss, including reconstruction and prior losses.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT + RESEARCH OVERVIEW:

https://www.dropbox.com/scl/fi/j7m0gaz1126y594gswtma/CLEMMLST.pdf?rlkey=y5qvwq2er5nchbcibm07rcfpq&dl=0

Clem and Matthew:

https://www.linkedin.com/in/clement-bonnet16/

https://github.com/clement-bonnet

https://mvmacfarlane.github.io/

TOC

1. LPN Fundamentals

[00:00:00] 1.1 Introduction to ARC Benchmark and LPN Overview

[00:05:05] 1.2 Neural Networks' Challenges with ARC and Program Synthesis

[00:06:55] 1.3 Induction vs Transduction in Machine Learning

2. LPN Architecture and Latent Space

[00:11:50] 2.1 LPN Architecture and Latent Space Implementation

[00:16:25] 2.2 LPN Latent Space Encoding and VAE Architecture

[00:20:25] 2.3 Gradient-Based Search Training Strategy

[00:23:39] 2.4 LPN Model Architecture and Implementation Details

3. Implementation and Scaling

[00:27:34] 3.1 Training Data Generation and re-ARC Framework

[00:31:28] 3.2 Limitations of Latent Space and Multi-Thread Search

[00:34:43] 3.3 Program Composition and Computational Graph Architecture

4. Advanced Concepts and Future Directions

[00:45:09] 4.1 AI Creativity and Program Synthesis Approaches

[00:49:47] 4.2 Scaling and Interpretability in Latent Space Models

REFS

[00:00:05] ARC benchmark, Chollet

https://arxiv.org/abs/2412.04604

[00:02:10] Latent Program Spaces, Bonnet, Macfarlane

https://arxiv.org/abs/2411.08706

[00:07:45] Kevin Ellis work on program generation

https://www.cs.cornell.edu/~ellisk/

[00:08:45] Induction vs transduction in abstract reasoning, Li et al.

https://arxiv.org/abs/2411.02272

[00:17:40] VAEs, Kingma, Welling

https://arxiv.org/abs/1312.6114

[00:27:50] re-ARC, Hodel

https://github.com/michaelhodel/re-arc

[00:29:40] Grid size in ARC tasks, Chollet

https://github.com/fchollet/ARC-AGI

[00:33:00] Critique of deep learning, Marcus

https://arxiv.org/vc/arxiv/papers/2002/2002.06177v1.pdf

2025-02-19

Prof. Jakob Foerster - ImageNet Moment for Reinforcement Learning?

Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI, explain how AI is moving beyond just mimicking human behaviour to creating truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT/REFS:

https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0

Prof. Jakob Foerster

https://x.com/j_foerst

https://www.jakobfoerster.com/

University of Oxford Profile:

https://eng.ox.ac.uk/people/jakob-foerster/

Chris Lu:

https://chrislu.page/

TOC

1. GPU Acceleration and Training Infrastructure

[00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview

[00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL

[00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions

[00:08:40] 1.4 JAX Implementation and Technical Acceleration

2. Learning Frameworks and Policy Optimization

[00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework

[00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms

[00:21:47] 2.3 Language Models and Benchmark Challenges

[00:28:15] 2.4 Creativity and Meta-Learning in AI Systems

3. Multi-Agent Systems and Decentralization

[00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence

[00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems

[00:42:44] 3.3 Democratic Control and Decentralization of AI Development

[00:46:14] 3.4 Open Source AI and Alignment Challenges

[00:49:31] 3.5 Collaborative Models for AI Development

REFS

[00:00:05] ARC Benchmark, Chollet

https://github.com/fchollet/ARC-AGI

[00:03:05] DRL Doesn't Work, Irpan

https://www.alexirpan.com/2018/02/14/rl-hard.html

[00:05:55] AI Training Data, Data Provenance Initiative

https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html

[00:06:10] JaxMARL, Foerster et al.

https://arxiv.org/html/2311.10090v5

[00:08:50] M-FOS, Lu et al.

https://arxiv.org/abs/2205.01447

[00:09:45] JAX Library, Google Research

https://github.com/jax-ml/jax

[00:12:10] Kinetix, Mike and Michael

https://arxiv.org/abs/2410.23208

[00:12:45] Genie 2, DeepMind

https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/

[00:14:42] Mirror Learning, Grudzien, Kuba et al.

https://arxiv.org/abs/2208.01682

[00:16:30] Discovered Policy Optimisation, Lu et al.

https://arxiv.org/abs/2210.05639

[00:24:10] Goodhart's Law, Goodhart

https://en.wikipedia.org/wiki/Goodhart%27s_law

[00:25:15] LLM ARChitect, Franzen et al.

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

[00:28:55] AlphaGo, Silver et al.

https://arxiv.org/pdf/1712.01815.pdf

[00:30:10] Meta-learning, Lu, Towers, Foerster

https://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf

[00:31:30] Emergence of Pragmatics, Yuan et al.

https://arxiv.org/abs/2001.07752

[00:34:30] AI Safety, Amodei et al.

https://arxiv.org/abs/1606.06565

[00:35:45] Intentional Stance, Dennett

https://plato.stanford.edu/entries/ethics-ai/

[00:39:25] Multi-Agent RL, Zhou et al.

https://arxiv.org/pdf/2305.10091

[00:41:00] Open Source Generative AI, Foerster et al.

https://arxiv.org/abs/2405.08597

<trunc, see PDF/YT>

2025-02-18

Daniel Franzen & Jan Disselhoff - ARC Prize 2024 winners

Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners of the ARC Prize 2024. Filmed at Tufa Labs in Zurich, they reveal how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

Jan Disselhoff

https://www.linkedin.com/in/jan-disselhoff-1423a2240/

Daniel Franzen

https://github.com/da-fr

ARC Prize: http://arcprize.org/

TRANSCRIPT AND BACKGROUND READING:

https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0

TOC

1. Solution Architecture and Strategy Overview

[00:00:00] 1.1 Initial Solution Overview and Model Architecture

[00:04:25] 1.2 LLM Capabilities and Dataset Approach

[00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies

[00:14:08] 1.4 Sampling Methods and Search Implementation

[00:17:52] 1.5 ARC vs Language Model Context Comparison

2. LLM Search and Model Implementation

[00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation

[00:27:04] 2.2 Symmetry Augmentation and Model Architecture

[00:30:11] 2.3 Model Intelligence Characteristics and Performance

[00:37:23] 2.4 Tokenization and Numerical Processing Challenges

3. Advanced Training and Optimization

[00:45:15] 3.1 DFS Token Selection and Probability Thresholds

[00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs

[00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention

[00:56:10] 3.4 Training Infrastructure and Optimization Experiments

[01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns

REFS

[00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

[00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell

https://arxiv.org/html/2411.14215

[00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel

https://github.com/michaelhodel/re-arc

[00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.

https://arxiv.org/html/2408.00724v2

[00:16:55] Language model reachability space exploration, University of Toronto

https://www.youtube.com/watch?v=Bpgloy1dDn0

[00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

[00:41:20] GPT tokenization approach for numbers, OpenAI

https://platform.openai.com/docs/guides/text-generation/tokenizer-examples

[00:46:25] DFS in AI search strategies, Russell & Norvig

https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997

[00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.

https://www.pnas.org/doi/10.1073/pnas.1611835114

[00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.

https://arxiv.org/abs/2106.09685

[00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA

https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/

[01:04:55] Original MCTS in computer Go, Yifan Jin

https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf

2025-02-12

Sepp Hochreiter - LSTM: The Comeback Story?

Sepp Hochreiter is the inventor of LSTM (Long Short-Term Memory) networks, a foundational technology in AI. Sepp discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT AND BACKGROUND READING:

https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0

Prof. Sepp Hochreiter

https://www.nx-ai.com/

https://x.com/hochreitersepp

https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en

TOC:

1. LLM Evolution and Reasoning Capabilities

[00:00:00] 1.1 LLM Capabilities and Limitations Debate

[00:03:16] 1.2 Program Generation and Reasoning in AI Systems

[00:06:30] 1.3 Human vs AI Reasoning Comparison

[00:09:59] 1.4 New Research Initiatives and Hybrid Approaches

2. LSTM Technical Architecture

[00:13:18] 2.1 LSTM Development History and Technical Background

[00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity

[00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison

[00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential

3. Industrial Applications and Neuro-Symbolic AI

[00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages

[00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project

[00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches

[00:51:29] 3.4 Evolution of AI Paradigms and System Thinking

[00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison

[00:58:12] 3.6 NXAI Company and Industrial AI Applications

REFS:

[00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber)

https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory

[00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov)

https://link.springer.com/article/10.1007/BF02478259

[00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors)

https://www.arxiv.org/pdf/2502.03671

[00:09:05] AlphaGo's Move 37 demonstrating creative AI (Google DeepMind)

https://deepmind.google/research/breakthroughs/alphago/

[00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier)

https://tufalabs.ai

[00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.)

https://arxiv.org/abs/2405.04517

[00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.)

https://arxiv.org/abs/2205.14135

[00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey)

https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx

[00:36:10] Mamba 2 state space model architecture (Albert Gu et al.)

https://arxiv.org/abs/2312.00752

[00:46:00] Austria's Pi AI project integrating symbolic & neural AI (Hochreiter et al.)

https://www.jku.at/en/institute-of-machine-learning/research/projects/

[00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.)

https://openreview.net/forum?id=7PGluppo4k

[00:49:30] JKU Linz's historical and neuro-symbolic research (Sepp Hochreiter)

https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/

YT: https://www.youtube.com/watch?v=8u2pW2zZLCs

<truncated, see show notes/YT>

2025-02-12

Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero

Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Goto https://tufalabs.ai/

***

Randall Balestriero

https://x.com/randall_balestr

https://randallbalestriero.github.io/

Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

TOC:

- Introduction

- 00:00:00: Introduction

- Neural Network Geometry and Spline Theory

- 00:01:41: Neural Network Geometry and Spline Theory

- 00:07:41: Deep Networks Always Grok

- 00:11:39: Grokking and Adversarial Robustness

- 00:16:09: Double Descent and Catastrophic Forgetting

- Reconstruction Learning

- 00:18:49: Reconstruction Learning

- 00:24:15: Frequency Bias in Neural Networks

- Geometric Analysis of Neural Networks

- 00:29:02: Geometric Analysis of Neural Networks

- 00:34:41: Adversarial Examples and Region Concentration

- LLM Safety and Geometric Analysis

- 00:40:05: LLM Safety and Geometric Analysis

- 00:46:11: Toxicity Detection in LLMs

- 00:52:24: Intrinsic Dimensionality and Model Control

- 00:58:07: RLHF and High-Dimensional Spaces

- Conclusion

- 01:02:13: Neural Tangent Kernel

- 01:08:07: Conclusion

REFS:

[00:01:35] Humayun - Deep network geometry & input space partitioning

https://arxiv.org/html/2408.04809v1

[00:03:55] Balestriero & Paris - Linking deep networks to adaptive spline operators

https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

[00:13:55] Song et al. - Gradient-based white-box adversarial attacks

https://arxiv.org/abs/2012.14965

[00:16:05] Humayun, Balestriero & Baraniuk - Grokking phenomenon & emergent robustness

https://arxiv.org/abs/2402.15555

[00:18:25] Humayun - Training dynamics & double descent via linear region evolution

https://arxiv.org/abs/2310.12977

[00:20:15] Balestriero - Power diagram partitions in DNN decision boundaries

https://arxiv.org/abs/1905.08443

[00:23:00] Frankle & Carbin - Lottery Ticket Hypothesis for network pruning

https://arxiv.org/abs/1803.03635

[00:24:00] Belkin et al. - Double descent phenomenon in modern ML

https://arxiv.org/abs/1812.11118

[00:25:55] Balestriero et al. - Batch normalization's regularization effects

https://arxiv.org/pdf/2209.14778

[00:29:35] EU - EU AI Act 2024 with compute restrictions

https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf

[00:39:30] Humayun, Balestriero & Baraniuk - SplineCam: Visualizing deep network geometry

https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf

[00:40:40] Carlini - Trade-offs between adversarial robustness and accuracy

https://arxiv.org/pdf/2407.20099

[00:44:55] Balestriero & LeCun - Limitations of reconstruction-based learning methods

https://openreview.net/forum?id=ez7w0Ss4g9

(truncated, see shownotes PDF)

2025-02-08

Nicholas Carlini (Google DeepMind)

Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Goto https://tufalabs.ai/

***

Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0

TOC:

1. ML Security Fundamentals

[00:00:00] 1.1 ML Model Reasoning and Security Fundamentals

[00:03:04] 1.2 ML Security Vulnerabilities and System Design

[00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior

[00:13:20] 1.4 Model Training, RLHF, and Calibration Effects

2. Model Evaluation and Research Methods

[00:19:40] 2.1 Model Reasoning and Evaluation Metrics

[00:24:37] 2.2 Security Research Philosophy and Methodology

[00:27:50] 2.3 Security Disclosure Norms and Community Differences

3. LLM Applications and Best Practices

[00:44:29] 3.1 Practical LLM Applications and Productivity Gains

[00:49:51] 3.2 Effective LLM Usage and Prompting Strategies

[00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code

4. Advanced LLM Research and Architecture

[00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience

[01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges

[01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction

REFS:

[00:01:15] Nicholas Carlini's personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/

[00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/

[00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644

[00:05:20] Computer Fraud and Abuse Act (CFAA) - primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud

[00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html

[00:16:10] Paper: "Self-Play Preference Optimization for Language Model Alignment" (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675

[00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774

[00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation

[00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241

[00:27:40] Nicholas Carlini's essay "Why I Attack" (June 2024) - motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html

[00:34:05] Google Project Zero's 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html

[00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878

[01:04:05] Rust's ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html

[01:10:05] Paper: "Stealing Part of a Production Language Model" (Carlini et al., March 2024) - extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634

[01:10:55] First model stealing paper (Tramèr et al., 2016) - attacking ML APIs via prediction - https://arxiv.org/abs/1609.02943

2025-01-25

Subbarao Kambhampati - Do o1 models search?

Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.

* How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see

* The evolution from traditional Large Language Models to more sophisticated reasoning systems

* The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably

* Why O1's improved performance comes with substantial computational costs

* The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)

* The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Goto https://tufalabs.ai/

***

TOC:

1. **O1 Architecture and Reasoning Foundations**

[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations

[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning

[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach

[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities

2. **Monte Carlo Methods and Model Deep-Dive**

[00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation

[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems

[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations

[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior

[00:51:41] 2.5 O1 Response Patterns and Performance Analysis

3. **System Design and Real-World Applications**

[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models

[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1

[01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems

[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches

[01:26:08] 3.5 Hybrid Architecture Implementation Strategies

Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

REFS:

[00:02:00] Monty Python (1975)

Witch trial scene: flawed logical reasoning.

https://www.youtube.com/watch?v=zrzMhU_4m-g

[00:04:00] Cade Metz (2024)

Microsoft-OpenAI partnership evolution and control dynamics.

https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html

[00:07:25] Kojima et al. (2022)

Zero-shot chain-of-thought prompting ('Let's think step by step').

https://arxiv.org/pdf/2205.11916

[00:12:50] DeepMind Research Team (2023)

Multi-bot game solving with external and internal planning.

https://deepmind.google/research/publications/139455/

[00:15:10] Silver et al. (2016)

AlphaGo's Monte Carlo Tree Search guided by policy and value networks.

https://www.nature.com/articles/nature16961

[00:16:30] Kambhampati, S. et al. (2024)

Evaluates O1's planning in "Strawberry Fields" benchmarks.

https://arxiv.org/pdf/2410.02162

[00:29:30] Alibaba AIDC-AI Team (2024)

MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.

https://arxiv.org/html/2411.14405

[00:31:30] Kambhampati, S. (2024)

Explores LLM "reasoning vs retrieval" debate.

https://arxiv.org/html/2403.04121v2

[00:37:35] Wei, J. et al. (2022)

Chain-of-thought prompting (introduces last-letter concatenation).

https://arxiv.org/pdf/2201.11903

[00:42:35] Barbero, F. et al. (2024)

Transformer attention and "information over-squashing."

https://arxiv.org/html/2406.04267v2

[00:46:05] Ruis, L. et al. (2024)

Influence functions to understand procedural knowledge in LLMs.

https://arxiv.org/html/2411.12580v1

(truncated - continued in shownotes/transcript doc)

2025-01-23

How Do AI Models Actually Think? - Laura Ruis

Laura Ruis, a PhD student at University College London and researcher at Cohere, explains her groundbreaking research into how large language models (LLMs) perform reasoning tasks, the fundamental mechanisms underlying LLM reasoning capabilities, and whether these models primarily rely on retrieval or develop procedural knowledge.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Goto https://tufalabs.ai/

***

TOC

1. LLM Foundations and Learning

1.1 Scale and Learning in Language Models [00:00:00]

1.2 Procedural Knowledge vs Fact Retrieval [00:03:40]

1.3 Influence Functions and Model Analysis [00:07:40]

1.4 Role of Code in LLM Reasoning [00:11:10]

1.5 Semantic Understanding and Physical Grounding [00:19:30]

2. Reasoning Architectures and Measurement

2.1 Measuring Understanding and Reasoning in Language Models [00:23:10]

2.2 Formal vs Approximate Reasoning and Model Creativity [00:26:40]

2.3 Symbolic vs Subsymbolic Computation Debate [00:34:10]

2.4 Neural Network Architectures and Tensor Product Representations [00:40:50]

3. AI Agency and Risk Assessment

3.1 Agency and Goal-Directed Behavior in Language Models [00:45:10]

3.2 Defining and Measuring Agency in AI Systems [00:49:50]

3.3 Core Knowledge Systems and Agency Detection [00:54:40]

3.4 Language Models as Agent Models and Simulator Theory [01:03:20]

3.5 AI Safety and Societal Control Mechanisms [01:07:10]

3.6 Evolution of AI Capabilities and Emergent Risks [01:14:20]

REFS:

[00:01:10] Procedural Knowledge in Pretraining & LLM Reasoning

Ruis et al., 2024

https://arxiv.org/abs/2411.12580

[00:03:50] EK-FAC Influence Functions in Large LMs

Grosse et al., 2023

https://arxiv.org/abs/2308.03296

[00:13:05] Surfaces and Essences: Analogy as the Core of Cognition

Hofstadter & Sander

https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

[00:13:45] Wittgenstein on Language Games

https://plato.stanford.edu/entries/wittgenstein/

[00:14:30] Montague Semantics for Natural Language

https://plato.stanford.edu/entries/montague-semantics/

[00:19:35] The Chinese Room Argument

David Cole

https://plato.stanford.edu/entries/chinese-room/

[00:19:55] ARC: Abstraction and Reasoning Corpus

François Chollet

https://arxiv.org/abs/1911.01547

[00:24:20] Systematic Generalization in Neural Nets

Lake & Baroni, 2023

https://www.nature.com/articles/s41586-023-06668-3

[00:27:40] Open-Endedness & Creativity in AI

Tim Rocktäschel

https://arxiv.org/html/2406.04268v1

[00:30:50] Fodor & Pylyshyn on Connectionism

https://www.sciencedirect.com/science/article/abs/pii/0010027788900315

[00:31:30] Tensor Product Representations

Smolensky, 1990

https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

[00:35:50] DreamCoder: Wake-Sleep Program Synthesis

Kevin Ellis et al.

https://courses.cs.washington.edu/courses/cse599j1/22sp/papers/dreamcoder.pdf

[00:36:30] Compositional Generalization Benchmarks

Ruis, Lake et al., 2022

https://arxiv.org/pdf/2202.10745

[00:40:30] RNNs & Tensor Products

McCoy et al., 2018

https://arxiv.org/abs/1812.08718

[00:46:10] Formal Causal Definition of Agency

Kenton et al.

https://arxiv.org/pdf/2208.08345v2

[00:48:40] Agency in Language Models

Sumers et al.

https://arxiv.org/abs/2309.02427

[00:55:20] Heider & Simmel's Moving Shapes Experiment

https://www.nature.com/articles/s41598-024-65532-0

[01:00:40] Language Models as Agent Models

Jacob Andreas, 2022

https://arxiv.org/abs/2212.01681

[01:13:35] Pragmatic Understanding in LLMs

Ruis et al.

https://arxiv.org/abs/2210.14986

2025-01-20

Jurgen Schmidhuber on Humans co-existing with AIs

Jürgen Schmidhuber, the father of generative AI, challenges current AI narratives, revealing that early deep learning work is, in his opinion, misattributed and actually originated in Ukraine and Japan. He discusses his early work on linear transformers and artificial curiosity, which preceded modern developments, shares his expansive vision of AI colonising space, and explains his groundbreaking 1991 consciousness model. Schmidhuber dismisses fears of human-AI conflict, arguing that superintelligent AI scientists will be fascinated by their own origins and motivated to protect life rather than harm it, while being more interested in other superintelligent AIs and in cosmic expansion than in earthly matters. He offers unique insights into how humans and AI might coexist. This is the long-awaited, previously unreleased second part of the interview we filmed last time.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Goto https://tufalabs.ai/

***

Interviewer: Tim Scarfe

TOC:

[00:00:00] The Nature and Motivations of AI

[00:02:08] Influential Inventions: 20th vs. 21st Century

[00:05:28] Transformer and GPT: A Reflection

The revolutionary impact of modern language models, the 1991 linear transformer, linear vs. quadratic scaling, the fast weight controller, and fast weight matrix memory.

[00:11:03] Pioneering Contributions to AI and Deep Learning

The invention of the transformer, pre-trained networks, the first GANs, the role of predictive coding, and the emergence of artificial curiosity.

[00:13:58] AI's Evolution and Achievements

The role of compute, breakthroughs in handwriting recognition and computer vision, the rise of GPU-based CNNs, achieving superhuman results, and Japanese contributions to CNN development.

[00:15:40] The Hardware Lottery and GPUs

GPUs as a serendipitous advantage for AI, the gaming-AI parallel, and Nvidia's strategic shift towards AI.

[00:19:58] AI Applications and Societal Impact

AI-powered translation breaking communication barriers, AI in medicine for imaging and disease prediction, and AI's potential for human enhancement and sustainable development.

[00:23:26] The Path to AGI and Current Limitations

Distinguishing large language models from AGI, challenges in replacing physical world workers, and AI's difficulty in real-world versus board games.

[00:25:56] AI and Consciousness

Simulating consciousness through unsupervised learning, chunking and automatizing neural networks, data compression, and self-symbols in predictive world models.

[00:30:50] The Future of AI and Humanity

Transition from AGIs as tools to AGIs with their own goals, the role of humans in an AGI-dominated world, and the concept of Homo Ludens.

[00:38:05] The AI Race: Europe, China, and the US

Europe's historical contributions, current dominance of the US and East Asia, and the role of venture capital and industrial policy.

[00:50:32] Addressing AI Existential Risk

The obsession with AI existential risk, commercial pressure for friendly AIs, AI vs. hydrogen bombs, and the long-term future of AI.

[00:58:00] The Fermi Paradox and Extraterrestrial Intelligence

Expanding AI bubbles as an explanation for the Fermi paradox, dark matter and encrypted civilizations, and Earth as the first to spawn an AI bubble.

[01:02:08] The Diversity of AI and AI Ecologies

The unrealism of a monolithic super intelligence, diverse AIs with varying goals, and intense competition and collaboration in AI ecologies.

[01:12:21] Final Thoughts and Closing Remarks

REFERENCES:

See pinned comment on YT: https://youtu.be/fZYUqICYCAk

2025-01-16

Yoshua Bengio - Designing out Agency for Safe AI

Professor Yoshua Bengio is a pioneer in deep learning and Turing Award winner. Bengio talks about AI safety, why goal-seeking "agentic" AIs might be dangerous, and his vision for building powerful AI tools without giving them agency. Topics include reward tampering risks, instrumental convergence, global AI governance, and how non-agent AIs could revolutionize science and medicine while reducing existential threats. Perfect for anyone curious about advanced AI risks and how to manage them responsibly.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

Goto https://tufalabs.ai/

***

Interviewer: Tim Scarfe

Yoshua Bengio:

https://x.com/Yoshua_Bengio

https://scholar.google.com/citations?user=kukA0LcAAAAJ&hl=en

https://yoshuabengio.org/

https://en.wikipedia.org/wiki/Yoshua_Bengio

TOC:

1. AI Safety Fundamentals

[00:00:00] 1.1 AI Safety Risks and International Cooperation

[00:03:20] 1.2 Fundamental Principles vs Scaling in AI Development

[00:11:25] 1.3 System 1/2 Thinking and AI Reasoning Capabilities

[00:15:15] 1.4 Reward Tampering and AI Agency Risks

[00:25:17] 1.5 Alignment Challenges and Instrumental Convergence

2. AI Architecture and Safety Design

[00:33:10] 2.1 Instrumental Goals and AI Safety Fundamentals

[00:35:02] 2.2 Separating Intelligence from Goals in AI Systems

[00:40:40] 2.3 Non-Agent AI as Scientific Tools

[00:44:25] 2.4 Oracle AI Systems and Mathematical Safety Frameworks

3. Global Governance and Security

[00:49:50] 3.1 International AI Competition and Hardware Governance

[00:51:58] 3.2 Military and Security Implications of AI Development

[00:56:07] 3.3 Personal Evolution of AI Safety Perspectives

[01:00:25] 3.4 AI Development Scaling and Global Governance Challenges

[01:12:10] 3.5 AI Regulation and Corporate Oversight

4. Technical Innovations

[01:23:00] 4.1 Evolution of Neural Architectures: From RNNs to Transformers

[01:26:02] 4.2 GFlowNets and Symbolic Computation

[01:30:47] 4.3 Neural Dynamics and Consciousness

[01:34:38] 4.4 AI Creativity and Scientific Discovery

SHOWNOTES (Transcript, references, best clips etc):

https://www.dropbox.com/scl/fi/ajucigli8n90fbxv9h94x/BENGIO_SHOW.pdf?rlkey=38hi2m19sylnr8orb76b85wkw&dl=0

CORE REFS (full list in shownotes and pinned comment):

[00:00:15] Bengio et al.: "AI Risk" Statement

https://www.safe.ai/work/statement-on-ai-risk

[00:23:10] Bengio on reward tampering & AI safety (Harvard Data Science Review)

https://hdsr.mitpress.mit.edu/pub/w974bwb0

[00:40:45] Munk Debate on AI existential risk, featuring Bengio

https://munkdebates.com/debates/artificial-intelligence

[00:44:30] "Can a Bayesian Oracle Prevent Harm from an Agent?" (Bengio et al.) on oracle-to-agent safety

https://arxiv.org/abs/2408.05284

[00:51:20] Bengio (2024) memo on hardware-based AI governance verification

https://yoshuabengio.org/wp-content/uploads/2024/08/FlexHEG-Memo_August-2024.pdf

[01:12:55] Bengio's involvement in EU AI Act code of practice

https://digital-strategy.ec.europa.eu/en/news/meet-chairs-leading-development-first-general-purpose-ai-code-practice

[01:27:05] Complexity-based compositionality theory (Elmoznino, Jiralerspong, Bengio, Lajoie)

https://arxiv.org/abs/2410.14817

[01:29:00] GFlowNet Foundations (Bengio et al.) for probabilistic inference

https://arxiv.org/pdf/2111.09266

[01:32:10] Discrete attractor states in neural systems (Nam, Elmoznino, Bengio, Lajoie)

https://arxiv.org/pdf/2302.06403

2025-01-15

Francois Chollet - ARC reflections - NeurIPS 2024

François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

Goto https://tufalabs.ai/

***

Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say):

https://arcprize.org/blog/oai-o3-pub-breakthrough

TOC:

1. Introduction and Opening

[00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François's Long-Standing Hybrid View

[00:00:48] 1.2 "Why Do They Call You a Symbolist?" - Addressing Misconceptions

[00:01:31] 1.3 Defining Reasoning

3. ARC Competition 2024 Results and Evolution

[00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2

[00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions

[00:13:17] 3.3 Two Winning Approaches: Deep Learning-Guided Program Synthesis and Test-Time Training

4. Transduction vs. Induction in ARC

[00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization

[00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search

5. ARC-2 Development and Future Directions

[00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2

[00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets

[00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology

6. Program Synthesis Approaches

[00:30:18] 6.1 Induction vs. Transduction

[00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks

[00:34:23] 6.3 Combining Induction and Transduction

[00:37:05] 6.4 Multi-View Insight and Overfitting Regulation

7. Latent Space and Graph-Based Synthesis

[00:38:17] 7.1 Clément Bonnet's Latent Program Search Approach

[00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search

[00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation

[00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions

8. Compute Efficiency and Lifelong Learning

[00:48:05] 8.1 Symbolic Process for Architecture Generation

[00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy

[00:52:20] 8.3 Learning New Building Blocks for Future Tasks

9. AI Reasoning and Future Development

[00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning

[00:56:30] 9.2 Reconciling Symbolic and Connectionist Views

[01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency

[01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability

10. Program Synthesis and Research Lab

[01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis

[01:09:55] 10.2 Democratizing Programming and Natural Language Instruction

11. Frontier Models and O1 Architecture

[01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass

[01:16:55] 11.2 o1's Natural Language Program Generation and Test-Time Compute Scaling

[01:19:35] 11.3 Logarithmic Gains with Deeper Search

12. ARC Evaluation and Human Intelligence

[01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues

[01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor

[01:26:16] 12.3 Closing Remarks and Future Directions

SHOWNOTES PDF:

https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0

2025-01-09

Jeff Clune - Agent AI Needs Darwin

AI professor Jeff Clune ruminates on open-ended evolutionary algorithms: systems designed to generate novel and interesting outcomes forever. Drawing inspiration from nature's boundless creativity, Clune and his collaborators aim to build "Darwin Complete" search spaces, where any computable environment can be simulated. By harnessing the power of large language models and reinforcement learning, these AI agents continuously develop new skills, explore uncharted domains, and even cooperate with one another in complex tasks.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

Goto https://tufalabs.ai/

***

A central theme throughout Clune's work is "interestingness": an elusive quality that nudges AI agents toward genuinely original discoveries. Rather than rely on narrowly defined metrics, which often fail due to Goodhart's Law, Clune employs language models to serve as proxies for human judgment. In doing so, he ensures that "interesting" always reflects authentic novelty, opening the door to unending innovation.

Yet with these extraordinary possibilities come equally significant risks. Clune says we need AI safety measures, particularly as the technology matures into powerful, open-ended forms. Potential pitfalls include agents inadvertently causing harm or malicious actors subverting AI's capabilities for destructive ends. To mitigate this, Clune advocates for prudent governance involving democratic coalitions, regulation of cutting-edge models, and global alignment protocols.

Jeff Clune:

https://x.com/jeffclune

http://jeffclune.com/

(Interviewer: Tim Scarfe)

TOC:

1. Introduction

[00:00:00] 1.1 Overview and Opening Thoughts

2. Sponsorship

[00:03:00] 2.1 TufaAI Labs and CentML

3. Evolutionary AI Foundations

[00:04:12] 3.1 Open-Ended Algorithm Development and Abstraction Approaches

[00:07:56] 3.2 Novel Intelligence Forms and Serendipitous Discovery

[00:11:46] 3.3 Frontier Models and the 'Interestingness' Problem

[00:30:36] 3.4 Darwin Complete Systems and Evolutionary Search Spaces

4. System Architecture and Learning

[00:37:35] 4.1 Code Generation vs Neural Networks Comparison

[00:41:04] 4.2 Thought Cloning and Behavioral Learning Systems

[00:47:00] 4.3 Language Emergence in AI Systems

[00:50:23] 4.4 AI Interpretability and Safety Monitoring Techniques

5. AI Safety and Governance

[00:53:56] 5.1 Language Model Consistency and Belief Systems

[00:57:00] 5.2 AI Safety Challenges and Alignment Limitations

[01:02:07] 5.3 Open Source AI Development and Value Alignment

[01:08:19] 5.4 Global AI Governance and Development Control

6. Advanced AI Systems and Evolution

[01:16:55] 6.1 Agent Systems and Performance Evaluation

[01:22:45] 6.2 Continuous Learning Challenges and In-Context Solutions

[01:26:46] 6.3 Evolution Algorithms and Environment Generation

[01:35:36] 6.4 Evolutionary Biology Insights and Experiments

[01:48:08] 6.5 Personal Journey from Philosophy to AI Research

Shownotes:

We craft detailed show notes for each episode with high quality transcript and references and best parts bolded.

https://www.dropbox.com/scl/fi/fz43pdoc5wq5jh7vsnujl/JEFFCLUNE.pdf?rlkey=uu0e70ix9zo6g5xn6amykffpm&st=k2scxteu&dl=0

2025-01-04

Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)

Neel Nanda, a senior research scientist at Google DeepMind, leads their mechanistic interpretability team. In this extensive interview, he discusses his work trying to understand how neural networks function internally. At just 25 years old, Nanda has quickly become a prominent voice in AI research after completing his pure mathematics degree at Cambridge in 2020.

Nanda reckons that machine learning is unique because we create neural networks that can perform impressive tasks (like complex reasoning and software engineering) without understanding how they work internally. He compares this to having computer programs that can do things no human programmer knows how to write. His work focuses on "mechanistic interpretability" - attempting to uncover and understand the internal structures and algorithms that emerge within these networks.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/

***

SHOWNOTES, TRANSCRIPT, ALL REFERENCES (DON'T MISS!):

https://www.dropbox.com/scl/fi/36dvtfl3v3p56hbi30im7/NeelShow.pdf?rlkey=pq8t7lyv2z60knlifyy17jdtx&st=kiutudhc&dl=0

We riff on:

* How neural networks develop meaningful internal representations beyond simple pattern matching

* The effectiveness of chain-of-thought prompting and why it improves model performance

* The importance of hands-on coding over extensive paper reading for new researchers

* His journey from Cambridge to working with Chris Olah at Anthropic and eventually Google DeepMind

* The role of mechanistic interpretability in AI safety

NEEL NANDA:

https://www.neelnanda.io/

https://scholar.google.com/citations?user=GLnX3MkAAAAJ&hl=en

https://x.com/NeelNanda5

Interviewer - Tim Scarfe

TOC:

1. Part 1: Introduction

[00:00:00] 1.1 Introduction and Core Concepts Overview

2. Part 2: Outside Interview

[00:06:45] 2.1 Mechanistic Interpretability Foundations

3. Part 3: Main Interview

[00:32:52] 3.1 Mechanistic Interpretability

4. Neural Architecture and Circuits

[01:00:31] 4.1 Biological Evolution Parallels

[01:04:03] 4.2 Universal Circuit Patterns and Induction Heads

[01:11:07] 4.3 Entity Detection and Knowledge Boundaries

[01:14:26] 4.4 Mechanistic Interpretability and Activation Patching

5. Model Behavior Analysis

[01:30:00] 5.1 Golden Gate Claude Experiment and Feature Amplification

[01:33:27] 5.2 Model Personas and RLHF Behavior Modification

[01:36:28] 5.3 Steering Vectors and Linear Representations

[01:40:00] 5.4 Hallucinations and Model Uncertainty

6. Sparse Autoencoder Architecture

[01:44:54] 6.1 Architecture and Mathematical Foundations

[02:22:03] 6.2 Core Challenges and Solutions

[02:32:04] 6.3 Advanced Activation Functions and Top-k Implementations

[02:34:41] 6.4 Research Applications in Transformer Circuit Analysis

7. Feature Learning and Scaling

[02:48:02] 7.1 Autoencoder Feature Learning and Width Parameters

[03:02:46] 7.2 Scaling Laws and Training Stability

[03:11:00] 7.3 Feature Identification and Bias Correction

[03:19:52] 7.4 Training Dynamics Analysis Methods

8. Engineering Implementation

[03:23:48] 8.1 Scale and Infrastructure Requirements

[03:25:20] 8.2 Computational Requirements and Storage

[03:35:22] 8.3 Chain-of-Thought Reasoning Implementation

[03:37:15] 8.4 Latent Structure Inference in Language Models

2024-12-07

Jonas Hübotter (ETH) - Test Time Inference

Jonas Hübotter, PhD student at ETH Zurich's Institute for Machine Learning, discusses his groundbreaking research on test-time computation and local learning. He demonstrates how smaller models can outperform models up to 30x larger through strategic test-time computation and introduces a novel paradigm combining inductive and transductive learning approaches.

Using Bayesian linear regression as a surrogate model for uncertainty estimation, Jonas explains how models can efficiently adapt to specific tasks without massive pre-training. He draws an analogy to Google Earth's variable resolution system to illustrate dynamic resource allocation based on task complexity.
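
For readers unfamiliar with the idea, here is a minimal sketch of how Bayesian linear regression yields an uncertainty estimate. It is not the implementation from the episode or the papers: the feature map `(1, x)`, the prior precision `alpha`, the noise precision `beta`, and the function names are all illustrative assumptions. The point it shows is that predictive variance grows for inputs far from the observed data, which is the signal a system can use to decide where extra data or computation is worth spending.

```python
# Minimal Bayesian linear regression for y = w0 + w1 * x + noise.
# Assumptions (illustrative, not from the episode): Gaussian prior
# w ~ N(0, I / alpha) and Gaussian noise with precision beta.

def fit_posterior(xs, ys, alpha=1.0, beta=25.0):
    """Return the posterior mean and covariance of the weights (w0, w1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    # Posterior precision A = alpha * I + beta * Phi^T Phi, with phi(x) = (1, x).
    a00 = alpha + beta * n
    a01 = beta * sum_x
    a11 = alpha + beta * sum_xx
    det = a00 * a11 - a01 * a01
    # Posterior covariance S = A^{-1} (closed form for a 2x2 matrix).
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det
    # Posterior mean m = beta * S * Phi^T y.
    b0 = beta * sum(ys)
    b1 = beta * sum(x * y for x, y in zip(xs, ys))
    mean = (s00 * b0 + s01 * b1, s01 * b0 + s11 * b1)
    cov = ((s00, s01), (s01, s11))
    return mean, cov


def predict(x, mean, cov, beta=25.0):
    """Return the predictive mean and variance at input x."""
    mu = mean[0] + mean[1] * x
    # Variance = noise variance + phi(x)^T S phi(x).
    var = 1.0 / beta + cov[0][0] + 2.0 * x * cov[0][1] + x * x * cov[1][1]
    return mu, var


if __name__ == "__main__":
    mean, cov = fit_posterior([0.0, 0.5, 1.0], [0.1, 0.6, 1.1])
    for x in (0.5, 5.0):
        mu, var = predict(x, mean, cov)
        print(f"x={x}: mean={mu:.3f}, variance={var:.3f}")
```

Querying a point inside the training range and one far outside it shows the variance rising with distance from the data; a surrogate like this is cheap to update, which is why it is attractive for deciding what to learn from at test time.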

The conversation explores the future of AI architecture, envisioning systems that continuously learn and adapt beyond current monolithic models. Jonas concludes by proposing hybrid deployment strategies combining local and cloud computation, suggesting a future where compute resources are allocated based on task complexity rather than fixed model size.

This research represents a significant shift in machine learning, prioritizing intelligent resource allocation and adaptive learning over traditional scaling approaches.

SPONSOR MESSAGES:

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/

Transcription, references and show notes PDF download:

https://www.dropbox.com/scl/fi/cxg80p388snwt6qbp4m52/JonasFinal.pdf?rlkey=glk9mhpzjvesanlc14rtpvk4r&st=6qwi8n3x&dl=0

Jonas Hübotter

https://jonhue.github.io/

https://scholar.google.com/citations?user=pxi_RkwAAAAJ

Transductive Active Learning: Theory and Applications (NeurIPS 2024)

https://arxiv.org/pdf/2402.15898

EFFICIENTLY LEARNING AT TEST-TIME: ACTIVE FINE-TUNING OF LLMS (SIFT)

https://arxiv.org/pdf/2410.08020

TOC:

1. Test-Time Computation Fundamentals

[00:00:00] Intro

[00:03:10] 1.1 Test-Time Computation and Model Performance Comparison

[00:05:52] 1.2 Retrieval Augmentation and Machine Teaching Strategies

[00:09:40] 1.3 In-Context Learning vs Fine-Tuning Trade-offs

2. System Architecture and Intelligence

[00:15:58] 2.1 System Architecture and Intelligence Emergence

[00:23:22] 2.2 Active Inference and Constrained Agency in AI

[00:29:52] 2.3 Evolution of Local Learning Methods

[00:32:05] 2.4 Vapnik's Contributions to Transductive Learning

3. Resource Optimization and Local Learning

[00:34:35] 3.1 Computational Resource Allocation in ML Models

[00:35:30] 3.2 Historical Context and Traditional ML Optimization

[00:37:55] 3.3 Variable Resolution Processing and Active Inference in ML

[00:43:01] 3.4 Local Learning and Base Model Capacity Trade-offs

[00:48:04] 3.5 Active Learning vs Local Learning Approaches

4. Information Retrieval and Model Interpretability

[00:51:08] 4.1 Information Retrieval and Nearest Neighbor Limitations

[01:03:07] 4.2 Model Interpretability and Surrogate Models

[01:15:03] 4.3 Bayesian Uncertainty Estimation and Surrogate Models

5. Distributed Systems and Deployment

[01:23:56] 5.1 Memory Architecture and Controller Systems

[01:28:14] 5.2 Evolution from Static to Distributed Learning Systems

[01:38:03] 5.3 Transductive Learning and Model Specialization

[01:41:58] 5.4 Hybrid Local-Cloud Deployment Strategies

2024-12-01

How AI Could Be A Mathematician's Co-Pilot by 2026 (Prof. Swarat Chaudhuri)

Professor Swarat Chaudhuri from the University of Texas at Austin and visiting researcher at Google DeepMind discusses breakthroughs in AI reasoning, theorem proving, and mathematical discovery. Chaudhuri explains his groundbreaking work on COPRA (a GPT-based prover agent) and shares insights on neurosymbolic approaches to AI.

Professor Swarat Chaudhuri:

https://www.cs.utexas.edu/~swarat/

SPONSOR MESSAGES:

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on ARC and AGI, they just acquired MindsAI - the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Goto https://tufalabs.ai/

TOC:

[00:00:00] 0. Introduction / CentML ad, Tufa ad

1. AI Reasoning: From Language Models to Neurosymbolic Approaches

[00:02:27] 1.1 Defining Reasoning in AI

[00:09:51] 1.2 Limitations of Current Language Models

[00:17:22] 1.3 Neuro-symbolic Approaches and Program Synthesis

[00:24:59] 1.4 COPRA and In-Context Learning for Theorem Proving

[00:34:39] 1.5 Symbolic Regression and LLM-Guided Abstraction

2. AI in Mathematics: Theorem Proving and Concept Discovery

[00:43:37] 2.1 AI-Assisted Theorem Proving and Proof Verification

[01:01:37] 2.2 Symbolic Regression and Concept Discovery in Mathematics

[01:11:57] 2.3 Scaling and Modularizing Mathematical Proofs

[01:21:53] 2.4 COPRA: In-Context Learning for Formal Theorem-Proving

[01:28:22] 2.5 AI-driven theorem proving and mathematical discovery

3. Formal Methods and Challenges in AI Mathematics

[01:30:42] 3.1 Formal proofs, empirical predicates, and uncertainty in AI mathematics

[01:34:01] 3.2 Characteristics of good theoretical computer science research

[01:39:16] 3.3 LLMs in theorem generation and proving

[01:42:21] 3.4 Addressing contamination and concept learning in AI systems

REFS:

00:04:58 The Chinese Room Argument, https://plato.stanford.edu/entries/chinese-room/

00:11:42 Software 2.0, https://medium.com/@karpathy/software-2-0-a64152b37c35

00:11:57 Solving Olympiad Geometry Without Human Demonstrations, https://www.nature.com/articles/s41586-023-06747-5

00:13:26 Lean, https://lean-lang.org/

00:15:43 A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go Through Self-Play, https://www.science.org/doi/10.1126/science.aar6404

00:19:24 DreamCoder (Ellis et al., PLDI 2021), https://arxiv.org/abs/2006.08381

00:24:37 The Lambda Calculus, https://plato.stanford.edu/entries/lambda-calculus/

00:26:43 Neural Sketch Learning for Conditional Program Generation, https://arxiv.org/pdf/1703.05698

00:28:08 Learning Differentiable Programs With Admissible Neural Heuristics, https://arxiv.org/abs/2007.12101

00:31:03 Symbolic Regression With a Learned Concept Library (Grayeli et al., NeurIPS 2024), https://arxiv.org/abs/2409.09359

00:41:30 Formal Verification of Parallel Programs, https://dl.acm.org/doi/10.1145/360248.360251

01:00:37 Training Compute-Optimal Large Language Models, https://arxiv.org/abs/2203.15556

01:18:19 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, https://arxiv.org/abs/2201.11903

01:18:42 Draft, Sketch, and Prove: Guiding Formal Theorem Provers With Informal Proofs, https://arxiv.org/abs/2210.12283

01:19:49 Learning Formal Mathematics From Intrinsic Motivation, https://arxiv.org/pdf/2407.00695

01:20:19 An In-Context Learning Agent for Formal Theorem-Proving (Thakur et al., CoLM 2024), https://arxiv.org/pdf/2310.04353

01:23:58 Learning to Prove Theorems via Interacting With Proof Assistants, https://arxiv.org/abs/1905.09381

01:39:58 An In-Context Learning Agent for Formal Theorem-Proving (Thakur et al., CoLM 2024), https://arxiv.org/pdf/2310.04353

01:42:24 Programmatically Interpretable Reinforcement Learning (Verma et al., ICML 2018), https://arxiv.org/abs/1804.02477

2024-11-25

Nora Belrose - AI Development, Safety, and Meaning

Nora Belrose, Head of Interpretability Research at EleutherAI, discusses critical challenges in AI safety and development. The conversation begins with her technical work on concept erasure in neural networks through LEACE (LEAst-squares Concept Erasure), while highlighting how neural networks' progression from simple to complex learning patterns could have important implications for AI safety.

Many fear that advanced AI will pose an existential threat -- pursuing its own dangerous goals once it's powerful enough. But Belrose challenges this popular doomsday scenario with a fascinating breakdown of why it doesn't add up.

Belrose also provides a detailed critique of current AI alignment approaches, particularly examining "counting arguments" and their limitations when applied to AI safety. She argues that the Principle of Indifference may be insufficient for addressing existential risks from advanced AI systems. The discussion explores how emergent properties in complex AI systems could lead to unpredictable and potentially dangerous behaviors that simple reductionist approaches fail to capture.

The conversation concludes by exploring broader philosophical territory, where Belrose discusses her growing interest in Buddhism's potential relevance to a post-automation future. She connects concepts of moral anti-realism with Buddhist ideas about emptiness and non-attachment, suggesting these frameworks might help humans find meaning in a world where AI handles most practical tasks. Rather than viewing this automated future with alarm, she proposes that Zen Buddhism's emphasis on spontaneity and presence might complement a society freed from traditional labor.

SPONSOR MESSAGES:

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier, focussed on ARC and AGI. They just acquired MindsAI, the current winners of the ARC challenge. Are you interested in working on ARC, or getting involved in their events? Go to https://tufalabs.ai/

Nora Belrose:

https://norabelrose.com/

https://scholar.google.com/citations?user=p_oBc64AAAAJ&hl=en

https://x.com/norabelrose

SHOWNOTES:

https://www.dropbox.com/scl/fi/38fhsv2zh8gnubtjaoq4a/NORA_FINAL.pdf?rlkey=0e5r8rd261821g1em4dgv0k70&st=t5c9ckfb&dl=0

TOC:

1. Neural Network Foundations

[00:00:00] 1.1 Philosophical Foundations and Neural Network Simplicity Bias

[00:02:20] 1.2 LEACE and Concept Erasure Fundamentals

[00:13:16] 1.3 LISA Technical Implementation and Applications

[00:18:50] 1.4 Practical Implementation Challenges and Data Requirements

[00:22:13] 1.5 Performance Impact and Limitations of Concept Erasure

2. Machine Learning Theory

[00:32:23] 2.1 Neural Network Learning Progression and Simplicity Bias

[00:37:10] 2.2 Optimal Transport Theory and Image Statistics Manipulation

[00:43:05] 2.3 Grokking Phenomena and Training Dynamics

[00:44:50] 2.4 Texture vs Shape Bias in Computer Vision Models

[00:45:15] 2.5 CNN Architecture and Shape Recognition Limitations

3. AI Systems and Value Learning

[00:47:10] 3.1 Meaning, Value, and Consciousness in AI Systems

[00:53:06] 3.2 Global Connectivity vs Local Culture Preservation

[00:58:18] 3.3 AI Capabilities and Future Development Trajectory

4. Consciousness Theory

[01:03:03] 4.1 4E Cognition and Extended Mind Theory

[01:09:40] 4.2 Thompson's Views on Consciousness and Simulation

[01:12:46] 4.3 Phenomenology and Consciousness Theory

[01:15:43] 4.4 Critique of Illusionism and Embodied Experience

[01:23:16] 4.5 AI Alignment and Counting Arguments Debate

(TRUNCATED, TOC embedded in MP3 file with more information)

2024-11-17

Why Your GPUs are underutilised for AI - CentML CEO Explains

Prof. Gennady Pekhimenko (CEO of CentML, UofT) joins us in this *sponsored episode* to dive deep into AI system optimization and enterprise implementation. From NVIDIA's technical leadership model to the rise of open-source AI, Pekhimenko shares insights on bridging the gap between academic research and industrial applications. Learn about "dark silicon," GPU utilization challenges in ML workloads, and how modern enterprises can optimize their AI infrastructure. The conversation explores why some companies achieve only 10% GPU efficiency and practical solutions for improving AI system performance. A must-watch for anyone interested in the technical foundations of enterprise AI and hardware optimization.

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Cheaper, faster, no commitments, pay as you go, scale massively, simple to set up. Check it out!

https://centml.ai/pricing/

SPONSOR MESSAGES:

MLST is also sponsored by Tufa AI Labs - https://tufalabs.ai/

They are hiring cracked ML engineers/researchers to work on ARC and build AGI!

SHOWNOTES (diarised transcript, TOC, references, summary, best quotes etc)

https://www.dropbox.com/scl/fi/w9kbpso7fawtm286kkp6j/Gennady.pdf?rlkey=aqjqmncx3kjnatk2il1gbgknk&st=2a9mccj8&dl=0

TOC:

1. AI Strategy and Leadership

[00:00:00] 1.1 Technical Leadership and Corporate Structure

[00:09:55] 1.2 Open Source vs Proprietary AI Models

[00:16:04] 1.3 Hardware and System Architecture Challenges

[00:23:37] 1.4 Enterprise AI Implementation and Optimization

[00:35:30] 1.5 AI Reasoning Capabilities and Limitations

2. AI System Development

[00:38:45] 2.1 Computational and Cognitive Limitations of AI Systems

[00:42:40] 2.2 Human-LLM Communication Adaptation and Patterns

[00:46:18] 2.3 AI-Assisted Software Development Challenges

[00:47:55] 2.4 Future of Software Engineering Careers in AI Era

[00:49:49] 2.5 Enterprise AI Adoption Challenges and Implementation

3. ML Infrastructure Optimization

[00:54:41] 3.1 MLOps Evolution and Platform Centralization

[00:55:43] 3.2 Hardware Optimization and Performance Constraints

[01:05:24] 3.3 ML Compiler Optimization and Python Performance

[01:15:57] 3.4 Enterprise ML Deployment and Cloud Provider Partnerships

4. Distributed AI Architecture

[01:27:05] 4.1 Multi-Cloud ML Infrastructure and Optimization

[01:29:45] 4.2 AI Agent Systems and Production Readiness

[01:32:00] 4.3 RAG Implementation and Fine-Tuning Considerations

[01:33:45] 4.4 Distributed AI Systems Architecture and Ray Framework

5. AI Industry Standards and Research

[01:37:55] 5.1 Origins and Evolution of MLPerf Benchmarking

[01:43:15] 5.2 MLPerf Methodology and Industry Impact

[01:50:17] 5.3 Academic Research vs Industry Implementation in AI

[01:58:59] 5.4 AI Research History and Safety Concerns

2024-11-13

Eliezer Yudkowsky and Stephen Wolfram on AI X-risk

Eliezer Yudkowsky and Stephen Wolfram discuss artificial intelligence and its potential existential risks. They traversed fundamental questions about AI safety, consciousness, computational irreducibility, and the nature of intelligence.

The discourse centered on Yudkowsky's argument that advanced AI systems pose an existential threat to humanity, primarily due to the challenge of alignment and the potential for emergent goals that diverge from human values. Wolfram, while acknowledging potential risks, approached the topic from his signature measured perspective, emphasizing the importance of understanding computational systems' fundamental nature and questioning whether AI systems would necessarily develop the kind of goal-directed behavior Yudkowsky fears.

***

MLST IS SPONSORED BY TUFA AI LABS!

The current winners of the ARC challenge, MindsAI, are part of Tufa AI Labs. They are hiring ML engineers. Are you interested? Please go to https://tufalabs.ai/

***

TOC:

1. Foundational AI Concepts and Risks

[00:00:01] 1.1 AI Optimization and System Capabilities Debate

[00:06:46] 1.2 Computational Irreducibility and Intelligence Limitations

[00:20:09] 1.3 Existential Risk and Species Succession

[00:23:28] 1.4 Consciousness and Value Preservation in AI Systems

2. Ethics and Philosophy in AI

[00:33:24] 2.1 Moral Value of Human Consciousness vs. Computation

[00:36:30] 2.2 Ethics and Moral Philosophy Debate

[00:39:58] 2.3 Existential Risks and Digital Immortality

[00:43:30] 2.4 Consciousness and Personal Identity in Brain Emulation

3. Truth and Logic in AI Systems

[00:54:39] 3.1 AI Persuasion Ethics and Truth

[01:01:48] 3.2 Mathematical Truth and Logic in AI Systems

[01:11:29] 3.3 Universal Truth vs Personal Interpretation in Ethics and Mathematics

[01:14:43] 3.4 Quantum Mechanics and Fundamental Reality Debate

4. AI Capabilities and Constraints

[01:21:21] 4.1 AI Perception and Physical Laws

[01:28:33] 4.2 AI Capabilities and Computational Constraints

[01:34:59] 4.3 AI Motivation and Anthropomorphization Debate

[01:38:09] 4.4 Prediction vs Agency in AI Systems

5. AI System Architecture and Behavior

[01:44:47] 5.1 Computational Irreducibility and Probabilistic Prediction

[01:48:10] 5.2 Teleological vs Mechanistic Explanations of AI Behavior

[02:09:41] 5.3 Machine Learning as Assembly of Computational Components

[02:29:52] 5.4 AI Safety and Predictability in Complex Systems

6. Goal Optimization and Alignment

[02:50:30] 6.1 Goal Specification and Optimization Challenges in AI Systems

[02:58:31] 6.2 Intelligence, Computation, and Goal-Directed Behavior

[03:02:18] 6.3 Optimization Goals and Human Existential Risk

[03:08:49] 6.4 Emergent Goals and AI Alignment Challenges

7. AI Evolution and Risk Assessment

[03:19:44] 7.1 Inner Optimization and Mesa-Optimization Theory

[03:34:00] 7.2 Dynamic AI Goals and Extinction Risk Debate

[03:56:05] 7.3 AI Risk and Biological System Analogies

[04:09:37] 7.4 Expert Risk Assessments and Optimism vs Reality

8. Future Implications and Economics

[04:13:01] 8.1 Economic and Proliferation Considerations

SHOWNOTES (transcription, references, summary, best quotes etc):

https://www.dropbox.com/scl/fi/3st8dts2ba7yob161dchd/EliezerWolfram.pdf?rlkey=b6va5j8upgqwl9s2muc924vtt&st=vemwqx7a&dl=0

2024-11-11

Pattern Recognition vs True Intelligence - Francois Chollet

Francois Chollet, a prominent AI expert and creator of ARC-AGI, discusses intelligence, consciousness, and artificial intelligence.

Chollet explains that real intelligence isn't about memorizing information or having lots of knowledge - it's about being able to handle new situations effectively. This is why he believes current large language models (LLMs) have "near-zero intelligence" despite their impressive abilities. They're more like sophisticated memory and pattern-matching systems than truly intelligent beings.

***

MLST IS SPONSORED BY TUFA AI LABS!

The current winners of the ARC challenge, MindsAI, are part of Tufa AI Labs. They are hiring ML engineers. Are you interested? Please go to https://tufalabs.ai/

***

He introduced his "Kaleidoscope Hypothesis," which suggests that while the world seems infinitely complex, it's actually made up of simpler patterns that repeat and combine in different ways. True intelligence, he argues, involves identifying these basic patterns and using them to understand new situations.

Chollet also talked about consciousness, suggesting it develops gradually in children rather than appearing all at once. He believes consciousness exists in degrees - animals have it to some extent, and even human consciousness varies with age and circumstances (like being more conscious when learning something new versus doing routine tasks).

On AI safety, Chollet takes a notably different stance from many in Silicon Valley. He views AGI development as a scientific challenge rather than a religious quest, and doesn't share the apocalyptic concerns of some AI researchers. He argues that intelligence itself isn't dangerous - it's just a tool for turning information into useful models. What matters is how we choose to use it.

ARC-AGI Prize:

https://arcprize.org/

Francois Chollet:

https://x.com/fchollet

Shownotes:

https://www.dropbox.com/scl/fi/j2068j3hlj8br96pfa7bi/CHOLLET_FINAL.pdf?rlkey=xkbr7tbnrjdl66m246w26uc8k&st=0a4ec4na&dl=0

TOC:

1. Intelligence and Model Building

[00:00:00] 1.1 Intelligence Definition and ARC Benchmark

[00:05:40] 1.2 LLMs as Program Memorization Systems

[00:09:36] 1.3 Kaleidoscope Hypothesis and Abstract Building Blocks

[00:13:39] 1.4 Deep Learning Limitations and System 2 Reasoning

[00:29:38] 1.5 Intelligence vs. Skill in LLMs and Model Building

2. ARC Benchmark and Program Synthesis

[00:37:36] 2.1 Intelligence Definition and LLM Limitations

[00:41:33] 2.2 Meta-Learning System Architecture

[00:56:21] 2.3 Program Search and Occam's Razor

[00:59:42] 2.4 Developer-Aware Generalization

[01:06:49] 2.5 Task Generation and Benchmark Design

3. Cognitive Systems and Program Generation

[01:14:38] 3.1 System 1/2 Thinking Fundamentals

[01:22:17] 3.2 Program Synthesis and Combinatorial Challenges

[01:31:18] 3.3 Test-Time Fine-Tuning Strategies

[01:36:10] 3.4 Evaluation and Leakage Problems

[01:43:22] 3.5 ARC Implementation Approaches

4. Intelligence and Language Systems

[01:50:06] 4.1 Intelligence as Tool vs Agent

[01:53:53] 4.2 Cultural Knowledge Integration

[01:58:42] 4.3 Language and Abstraction Generation

[02:02:41] 4.4 Embodiment in Cognitive Systems

[02:09:02] 4.5 Language as Cognitive Operating System

5. Consciousness and AI Safety

[02:14:05] 5.1 Consciousness and Intelligence Relationship

[02:20:25] 5.2 Development of Machine Consciousness

[02:28:40] 5.3 Consciousness Prerequisites and Indicators

[02:36:36] 5.4 AGI Safety Considerations

[02:40:29] 5.5 AI Regulation Framework

2024-11-07

The Elegant Math Behind Machine Learning - Anil Ananthaswamy

Anil Ananthaswamy is an award-winning science writer and former staff writer and deputy news editor for the London-based New Scientist magazine.

Machine learning systems are making life-altering decisions for us: approving mortgage loans, determining whether a tumor is cancerous, or deciding if someone gets bail. They now influence developments and discoveries in chemistry, biology, and physics: the study of genomes, extrasolar planets, even the intricacies of quantum systems. And all this before large language models such as ChatGPT came on the scene.

We are living through a revolution in machine learning-powered AI that shows no signs of slowing down. This technology is based on relatively simple mathematical ideas, some of which go back centuries, including linear algebra and calculus, the stuff of seventeenth- and eighteenth-century mathematics. It took the birth and advancement of computer science and the kindling of 1990s computer chips designed for video games to ignite the explosion of AI that we see today. In this enlightening book, Anil Ananthaswamy explains the fundamental math behind machine learning, while suggesting intriguing links between artificial and natural intelligence. Might the same math underpin them both?

As Ananthaswamy resonantly concludes, to make safe and effective use of artificial intelligence, we need to understand its profound capabilities and limitations, the clues to which lie in the math that makes machine learning possible.

Why Machines Learn: The Elegant Math Behind Modern AI:

https://amzn.to/3UAWX3D

https://anilananthaswamy.com/

Sponsor message:

DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?

Interested? Apply for an ML research position: benjamin@tufa.ai

Shownotes:

https://www.dropbox.com/scl/fi/wpv22m5jxyiqr6pqfkzwz/anil.pdf?rlkey=9c233jo5armr548ctwo419n6p&st=xzhahtje&dl=0

Chapters:

1. ML Fundamentals and Prerequisites

[00:00:00] 1.1 Differences Between Human and Machine Learning

[00:00:35] 1.2 Mathematical Prerequisites and Societal Impact of ML

[00:02:20] 1.3 Author's Journey and Book Background

[00:11:30] 1.4 Mathematical Foundations and Core ML Concepts

[00:21:45] 1.5 Bias-Variance Tradeoff and Modern Deep Learning

2. Deep Learning Architecture

[00:29:05] 2.1 Double Descent and Overparameterization in Deep Learning

[00:32:40] 2.2 Mathematical Foundations and Self-Supervised Learning

[00:40:05] 2.3 High-Dimensional Spaces and Model Architecture

[00:52:55] 2.4 Historical Development of Backpropagation

3. AI Understanding and Limitations

[00:59:13] 3.1 Pattern Matching vs Human Reasoning in ML Models

[01:00:20] 3.2 Mathematical Foundations and Pattern Recognition in AI

[01:04:08] 3.3 LLM Reliability and Machine Understanding Debate

[01:12:50] 3.4 Historical Development of Deep Learning Technologies

[01:15:21] 3.5 Alternative AI Approaches and Bio-inspired Methods

4. Ethical and Neurological Perspectives

[01:24:32] 4.1 Neural Network Scaling and Mathematical Limitations

[01:31:12] 4.2 AI Ethics and Societal Impact

[01:38:30] 4.3 Consciousness and Neurological Conditions

[01:46:17] 4.4 Body Ownership and Agency in Neuroscience

2024-11-04

Michael Levin - Why Intelligence Isn't Limited To Brains.

Professor Michael Levin explores the revolutionary concept of diverse intelligence, demonstrating how cognitive capabilities extend far beyond traditional brain-based intelligence. Drawing from his groundbreaking research, he explains how even simple biological systems like gene regulatory networks exhibit learning, memory, and problem-solving abilities. Levin introduces key concepts like "cognitive light cones" - the scope of goals a system can pursue - and shows how these ideas are transforming our approach to cancer treatment and biological engineering. His insights challenge conventional views of intelligence and agency, with profound implications for both medicine and artificial intelligence development. This deep discussion reveals how understanding intelligence as a spectrum, from molecular networks to human minds, could be crucial for humanity's future technological development. Contains technical discussion of biological systems, cybernetics, and theoretical frameworks for understanding emergent cognition.

Prof. Michael Levin

https://as.tufts.edu/biology/people/faculty/michael-levin

https://x.com/drmichaellevin

Sponsor message:

DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?

Interested? Apply for an ML research position: benjamin@tufa.ai

TOC

1. Intelligence Fundamentals and Evolution

[00:00:00] 1.1 Future Evolution of Human Intelligence and Consciousness

[00:03:00] 1.2 Science Fiction's Role in Exploring Intelligence Possibilities

[00:08:15] 1.3 Essential Characteristics of Human-Level Intelligence and Relationships

[00:14:20] 1.4 Biological Systems Architecture and Intelligence

2. Biological Computing and Cognition

[00:24:00] 2.1 Agency and Intelligence in Biological Systems

[00:30:30] 2.2 Learning Capabilities in Gene Regulatory Networks

[00:35:37] 2.3 Biological Control Systems and Competency Architecture

[00:39:58] 2.4 Scientific Metaphors and Polycomputing Paradigm

3. Systems and Collective Intelligence

[00:43:26] 3.1 Embodiment and Problem-Solving Spaces

[00:44:50] 3.2 Perception-Action Loops and Biological Intelligence

[00:46:55] 3.3 Intelligence, Wisdom and Collective Systems

[00:53:07] 3.4 Cancer and Cognitive Light Cones

[00:57:09] 3.5 Emergent Intelligence and AI Agency

Shownotes:

https://www.dropbox.com/scl/fi/i2vl1vs009thg54lxx5wc/LEVIN.pdf?rlkey=dtk8okhbsejryiu2vrht19qp6&st=uzi0vo45&dl=0

REFS:

[0:05:30] A Fire Upon the Deep - Vernor Vinge sci-fi novel on AI and consciousness

[0:05:35] Maria Chudnovsky - MacArthur Fellow, Princeton mathematician, graph theory expert

[0:14:20] Bow-tie architecture in biological systems - Network structure research by Csete & Doyle

[0:15:40] Richard Watson - Southampton Professor, evolution and learning systems expert

[0:17:00] Levin paper on human issues in AI and evolution

[0:19:00] Bow-tie architecture in Darwin's agential materialism - Levin

[0:22:55] Philip Goff - Work on panpsychism and consciousness in Galileo's Error

[0:23:30] Strange Loop - Hofstadter's work on self-reference and consciousness

[0:25:00] The Hard Problem of Consciousness - Van Gulick

[0:26:15] Daniel Dennett - Theories on consciousness and intentional systems

[0:29:35] Principle of Least Action - Light path selection in physics

[0:29:50] Free Energy Principle - Friston's unified behavioral framework

[0:30:35] Gene regulatory networks - Learning capabilities in biological systems

[0:36:55] Minimal networks with learning capacity - Levin

[0:38:50] Multi-scale competency in biological systems - Levin

[0:41:40] Polycomputing paradigm - Biological computation by Bongard & Levin

[0:45:40] Collective intelligence in biology - Levin et al.

[0:46:55] Niche construction and stigmergy - Torday

[0:53:50] Tasmanian Devil Facial Tumor Disease - Transmissible cancer research

[0:55:05] Cognitive light cone - Computational boundaries of self - Levin

[0:58:05] Cognitive properties in sorting algorithms - Zhang, Goldstein & Levin

2024-10-24

Speechmatics CTO - Next-Generation Speech Recognition

Will Williams is CTO of Speechmatics in Cambridge. In this sponsored episode, he shares deep technical insights into modern speech recognition technology and system architecture. The episode covers several key technical areas:

* Speechmatics' hybrid approach to ASR, which focusses on unsupervised learning methods, achieving comparable results with 100x less data than fully supervised approaches. Williams explains why this is more efficient and generalizable than end-to-end models like Whisper.

* Their production architecture implementing multiple operating points for different latency-accuracy trade-offs, with careful latency padding (up to 1.8 seconds) to ensure consistent user experience. The system uses lattice-based decoding with language model integration for improved accuracy.

* The challenges and solutions in real-time ASR, including their approach to diarization (speaker identification), handling cross-talk, and implicit source separation. Williams explains why these problems remain difficult even with modern deep learning approaches.

* Their testing and deployment infrastructure, including the use of mirrored environments for catching edge cases in production, and their strategy of maintaining global models rather than allowing customer-specific fine-tuning.

* Technical evolution in ASR, from early days of custom CUDA kernels and manual memory management to modern frameworks, with Williams offering interesting critiques of current PyTorch memory management approaches and arguing for more efficient direct memory allocation in production systems.

Get coding with their API! This is their URL:

https://www.speechmatics.com/

DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?

MLST is sponsored by Tufa Labs:

Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.

Interested? Apply for an ML research position: benjamin@tufa.ai

TOC

1. ASR Core Technology & Real-time Architecture

[00:00:00] 1.1 ASR and Diarization Fundamentals

[00:05:25] 1.2 Real-time Conversational AI Architecture

[00:09:21] 1.3 Neural Network Streaming Implementation

[00:12:49] 1.4 Multi-modal System Integration

2. Production System Optimization

[00:29:38] 2.1 Production Deployment and Testing Infrastructure

[00:35:40] 2.2 Model Architecture and Deployment Strategy

[00:37:12] 2.3 Latency-Accuracy Trade-offs

[00:39:15] 2.4 Language Model Integration

[00:40:32] 2.5 Lattice-based Decoding Architecture

3. Performance Evaluation & Ethical Considerations

[00:44:00] 3.1 ASR Performance Metrics and Capabilities

[00:46:35] 3.2 AI Regulation and Evaluation Methods

[00:51:09] 3.3 Benchmark and Testing Challenges

[00:54:30] 3.4 Real-world Implementation Metrics

[01:00:51] 3.5 Ethics and Privacy Considerations

4. ASR Technical Evolution

[01:09:00] 4.1 WER Calculation and Evaluation Methodologies

[01:10:21] 4.2 Supervised vs Self-Supervised Learning Approaches

[01:21:02] 4.3 Temporal Learning and Feature Processing

[01:24:45] 4.4 Feature Engineering to Automated ML

5. Enterprise Implementation & Scale

[01:27:55] 5.1 Future AI Systems and Adaptation

[01:31:52] 5.2 Technical Foundations and History

[01:34:53] 5.3 Infrastructure and Team Scaling

[01:38:05] 5.4 Research and Talent Strategy

[01:41:11] 5.5 Engineering Practice Evolution

Shownotes:

https://www.dropbox.com/scl/fi/d94b1jcgph9o8au8shdym/Speechmatics.pdf?rlkey=bi55wvktzomzx0y5sic6jz99y&st=6qwofv8t&dl=0

2024-10-24

Dr. Sanjeev Namjoshi - Active Inference

Dr. Sanjeev Namjoshi, a machine learning engineer who recently submitted a book on Active Inference to MIT Press, discusses the theoretical foundations and practical applications of Active Inference, the Free Energy Principle (FEP), and Bayesian mechanics. He explains how these frameworks describe how biological and artificial systems maintain stability by minimizing uncertainty about their environment.

DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?

MLST is sponsored by Tufa Labs:

Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.

Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.

Interested? Apply for an ML research position: benjamin@tufa.ai

Namjoshi traces the evolution of these fields from early 2000s neuroscience research to current developments, highlighting how Active Inference provides a unified framework for perception and action through variational free energy minimization. He contrasts this with traditional machine learning approaches, emphasizing Active Inference's natural capacity for exploration and curiosity through epistemic value.
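
For readers unfamiliar with the term, variational free energy is usually written as below. This is the standard textbook form, not a formula quoted from the episode, and the symbols are assumptions chosen for illustration: o for observations, s for hidden states, p for the agent's generative model, and q for its approximate posterior over hidden states.

```latex
% Variational free energy of an approximate posterior q(s), given observations o.
% Notation (o, s, p, q) is assumed here for illustration.
\begin{aligned}
F[q] &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
     &= D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o) \\
     &\ge -\ln p(o)
\end{aligned}
```

Because the KL term is non-negative, F is an upper bound on surprise, -ln p(o). Lowering F by adjusting q corresponds to perception (q moves toward the true posterior), while lowering it by changing which observations o are encountered corresponds to action, which is the sense in which a single objective can cover both.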

He sees Active Inference as being at a similar stage to deep learning in the early 2000s - poised for significant breakthroughs but requiring better tools and wider adoption. While acknowledging current computational challenges, he emphasizes Active Inference's potential advantages over reinforcement learning, particularly its principled approach to exploration and planning.

Dr. Sanjeev Namjoshi

https://snamjoshi.github.io/

TOC:

1. Theoretical Foundations: AI Agency and Sentience

[00:00:00] 1.1 Intro

[00:02:45] 1.2 Free Energy Principle and Active Inference Theory

[00:11:16] 1.3 Emergence and Self-Organization in Complex Systems

[00:19:11] 1.4 Agency and Representation in AI Systems

[00:29:59] 1.5 Bayesian Mechanics and Systems Modeling

2. Technical Framework: Active Inference and Free Energy

[00:38:37] 2.1 Generative Processes and Agent-Environment Modeling

[00:42:27] 2.2 Markov Blankets and System Boundaries

[00:44:30] 2.3 Bayesian Inference and Prior Distributions

[00:52:41] 2.4 Variational Free Energy Minimization Framework

[00:55:07] 2.5 VFE Optimization Techniques: Generalized Filtering vs DEM

3. Implementation and Optimization Methods

[00:58:25] 3.1 Information Theory and Free Energy Concepts

[01:05:25] 3.2 Surprise Minimization and Action in Active Inference

[01:15:58] 3.3 Evolution of Active Inference Models: Continuous to Discrete Approaches

[01:26:00] 3.4 Uncertainty Reduction and Control Systems in Active Inference

4. Safety and Regulatory Frameworks

[01:32:40] 4.1 Historical Evolution of Risk Management and Predictive Systems

[01:36:12] 4.2 Agency and Reality: Philosophical Perspectives on Models

[01:39:20] 4.3 Limitations of Symbolic AI and Current System Design

[01:46:40] 4.4 AI Safety Regulation and Corporate Governance

5. Socioeconomic Integration and Modeling

[01:52:55] 5.1 Economic Policy and Public Sentiment Modeling

[01:55:21] 5.2 Free Energy Principle: Libertarian vs Collectivist Perspectives

[01:58:53] 5.3 Regulation of Complex Socio-Technical Systems

[02:03:04] 5.4 Evolution and Current State of Active Inference Research

6. Future Directions and Applications

[02:14:26] 6.1 Active Inference Applications and Future Development

[02:22:58] 6.2 Cultural Learning and Active Inference

[02:29:19] 6.3 Hierarchical Relationship Between FEP, Active Inference, and Bayesian Mechanics

[02:33:22] 6.4 Historical Evolution of Free Energy Principle

[02:38:52] 6.5 Active Inference vs Traditional Machine Learning Approaches

Transcript and shownotes with refs and URLs:

https://www.dropbox.com/scl/fi/qj22a660cob1795ej0gbw/SanjeevShow.pdf?rlkey=w323r3e8zfsnve22caayzb17k&st=el1fdgfr&dl=0

2024-10-22

Joscha Bach - Why Your Thoughts Aren't Yours.

Dr. Joscha Bach discusses advanced AI, consciousness, and cognitive modeling. He presents consciousness as a virtual property emerging from self-organizing software patterns, challenging panpsychism and materialism. Bach introduces "Cyberanima," reinterpreting animism through information processing, viewing spirits as self-organizing software agents.

He addresses limitations of current large language models and advocates for smaller, more efficient AI models capable of reasoning from first principles. Bach describes his work with Liquid AI on novel neural network architectures for improved expressiveness and efficiency.

The interview covers AI's societal implications, including regulation challenges and impact on innovation. Bach argues for balancing oversight with technological progress, warning against overly restrictive regulations.

Throughout, Bach frames consciousness, intelligence, and agency as emergent properties of complex information processing systems, proposing a computational framework for cognitive phenomena and reality.

SPONSOR MESSAGE:

DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)? MLST is sponsored by Tufa Labs. Focus: ARC, LLMs, test-time compute, active inference, System 2 reasoning, and more. Future plans: expanding to complex environments like Warcraft 2 and StarCraft 2. Interested? Apply for an ML research position: benjamin@tufa.ai

TOC

[00:00:00] 1.1 Consciousness and Intelligence in AI Development

[00:07:44] 1.2 Agency, Intelligence, and Their Relationship to Physical Reality

[00:13:36] 1.3 Virtual Patterns and Causal Structures in Consciousness

[00:25:49] 1.4 Reinterpreting Concepts of God and Animism in Information Processing Terms

[00:32:50] 1.5 Animism and Evolution as Competition Between Software Agents

2. Self-Organizing Systems and Cognitive Models in AI

[00:37:59] 2.1 Consciousness as self-organizing software

[00:45:49] 2.2 Critique of panpsychism and alternative views on consciousness

[00:50:48] 2.3 Emergence of consciousness in complex systems

[00:52:50] 2.4 Neuronal motivation and the origins of consciousness

[00:56:47] 2.5 Coherence and Self-Organization in AI Systems

3. Advanced AI Architectures and Cognitive Processes

[00:57:50] 3.1 Second-Order Software and Complex Mental Processes

[01:01:05] 3.2 Collective Agency and Shared Values in AI

[01:05:40] 3.3 Limitations of Current AI Agents and LLMs

[01:06:40] 3.4 Liquid AI and Novel Neural Network Architectures

[01:10:06] 3.5 AI Model Efficiency and Future Directions

[01:19:00] 3.6 LLM Limitations and Internal State Representation

4. AI Regulation and Societal Impact

[01:31:23] 4.1 AI Regulation and Societal Impact

[01:49:50] 4.2 Open-Source AI and Industry Challenges

Refs in shownotes and MP3 metadata

Shownotes:

https://www.dropbox.com/scl/fi/g28dosz19bzcfs5imrvbu/JoschaInterview.pdf?rlkey=s3y18jy192ktz6ogd7qtvry3d&st=10z7q7w9&dl=0

2024-10-20

Decompiling Dreams: A New Approach to ARC? - Alessandro Palmarini

Alessandro Palmarini is a post-baccalaureate researcher at the Santa Fe Institute working under the supervision of Melanie Mitchell. He completed his undergraduate degree in Artificial Intelligence and Computer Science at the University of Edinburgh. Palmarini's current research focuses on developing AI systems that can efficiently acquire new skills from limited data, inspired by François Chollet's work on measuring intelligence. His work builds upon the DreamCoder program synthesis system, introducing a novel approach called "dream decompiling" to improve library learning in inductive program synthesis. Palmarini is particularly interested in addressing the Abstraction and Reasoning Corpus (ARC) challenge, aiming to create AI systems that can perform abstract reasoning tasks more efficiently than current approaches. His research explores the balance between computational efficiency and data efficiency in AI learning processes.

DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)? MLST is sponsored by Tufa Labs. Focus: ARC, LLMs, test-time compute, active inference, System 2 reasoning, and more. Future plans: expanding to complex environments like Warcraft 2 and StarCraft 2. Interested? Apply for an ML research position: benjamin@tufa.ai

TOC:

1. Intelligence Measurement in AI Systems

[00:00:00] 1.1 Defining Intelligence in AI Systems

[00:02:00] 1.2 Research at Santa Fe Institute

[00:04:35] 1.3 Impact of Gaming on AI Development

[00:05:10] 1.4 Comparing AI and Human Learning Efficiency

2. Efficient Skill Acquisition in AI

[00:06:40] 2.1 Intelligence as Skill Acquisition Efficiency

[00:08:25] 2.2 Limitations of Current AI Systems in Generalization

[00:09:45] 2.3 Human vs. AI Cognitive Processes

[00:10:40] 2.4 Measuring AI Intelligence: Chollet's ARC Challenge

3. Program Synthesis and ARC Challenge

[00:12:55] 3.1 Philosophical Foundations of Program Synthesis

[00:17:14] 3.2 Introduction to Program Induction and ARC Tasks

[00:18:49] 3.3 DreamCoder: Principles and Techniques

[00:27:55] 3.4 Trade-offs in Program Synthesis Search Strategies

[00:31:52] 3.5 Neural Networks and Bayesian Program Learning

4. Advanced Program Synthesis Techniques

[00:32:30] 4.1 DreamCoder and Dream Decompiling Approach

[00:39:00] 4.2 Beta Distribution and Caching in Program Synthesis

[00:45:10] 4.3 Performance and Limitations of Dream Decompiling

[00:47:45] 4.4 Alessandro's Approach to ARC Challenge

[00:51:12] 4.5 Conclusion and Future Discussions

Refs:

Full reference list in the YouTube video description, show notes and MP3 metadata

Show Notes: https://www.dropbox.com/scl/fi/x50201tgqucj5ba2q4typ/Ale.pdf?rlkey=0ubvk7p5gtyx1gpownpdadim8&st=5pniu3nq&dl=0

2024-10-19

It's Not About Scale, It's About Abstraction - Francois Chollet

François Chollet discusses the limitations of Large Language Models (LLMs) and proposes a new approach to advancing artificial intelligence. He argues that current AI systems excel at pattern recognition but struggle with logical reasoning and true generalization.

This was Chollet's keynote talk at AGI-24, filmed in high quality. We will be releasing a full interview with him shortly. A teaser clip from that interview is played in the intro!

Chollet introduces the Abstraction and Reasoning Corpus (ARC) as a benchmark for measuring AI progress towards human-like intelligence. He explains the concept of abstraction in AI systems and proposes combining deep learning with program synthesis to overcome current limitations. Chollet suggests that breakthroughs in AI might come from outside major tech labs and encourages researchers to explore new ideas in the pursuit of artificial general intelligence.

TOC

1. LLM Limitations and Intelligence Concepts

[00:00:00] 1.1 LLM Limitations and Composition

[00:12:05] 1.2 Intelligence as Process vs. Skill

[00:17:15] 1.3 Generalization as Key to AI Progress

2. ARC-AGI Benchmark and LLM Performance

[00:19:59] 2.1 Introduction to ARC-AGI Benchmark

[00:20:05] 2.2 Introduction to ARC-AGI and the ARC Prize

[00:23:35] 2.3 Performance of LLMs and Humans on ARC-AGI

3. Abstraction in AI Systems

[00:26:10] 3.1 The Kaleidoscope Hypothesis and Abstraction Spectrum

[00:30:05] 3.2 LLM Capabilities and Limitations in Abstraction

[00:32:10] 3.3 Value-Centric vs Program-Centric Abstraction

[00:33:25] 3.4 Types of Abstraction in AI Systems

4. Advancing AI: Combining Deep Learning and Program Synthesis

[00:34:05] 4.1 Limitations of Transformers and Need for Program Synthesis

[00:36:45] 4.2 Combining Deep Learning and Program Synthesis

[00:39:59] 4.3 Applying Combined Approaches to ARC Tasks

[00:44:20] 4.4 State-of-the-Art Solutions for ARC

Shownotes (new!): https://www.dropbox.com/scl/fi/i7nsyoahuei6np95lbjxw/CholletKeynote.pdf?rlkey=t3502kbov5exsdxhderq70b9i&st=1ca91ewz&dl=0

[0:01:15] Abstraction and Reasoning Corpus (ARC): AI benchmark (François Chollet)

https://arxiv.org/abs/1911.01547

[0:05:30] Monty Hall problem: Probability puzzle (Steve Selvin)

https://www.tandfonline.com/doi/abs/10.1080/00031305.1975.10479121

[0:06:20] LLM training dynamics analysis (Tirumala et al.)

https://arxiv.org/abs/2205.10770

[0:10:20] Transformer limitations on compositionality (Dziri et al.)

https://arxiv.org/abs/2305.18654

[0:10:25] Reversal Curse in LLMs (Berglund et al.)

https://arxiv.org/abs/2309.12288

[0:19:25] Measure of intelligence using algorithmic information theory (François Chollet)

https://arxiv.org/abs/1911.01547

[0:20:10] ARC-AGI: GitHub repository (François Chollet)

https://github.com/fchollet/ARC-AGI

[0:22:15] ARC Prize: $1,000,000+ competition (François Chollet)

https://arcprize.org/

[0:33:30] System 1 and System 2 thinking (Daniel Kahneman)

https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555

[0:34:00] Core knowledge in infants (Elizabeth Spelke)

https://www.harvardlds.org/wp-content/uploads/2017/01/SpelkeKinzler07-1.pdf

[0:34:30] Embedding interpretive spaces in ML (Tennenholtz et al.)

https://arxiv.org/abs/2310.04475

[0:44:20] Hypothesis Search with LLMs for ARC (Wang et al.)

https://arxiv.org/abs/2309.05660

[0:44:50] Ryan Greenblatt's high score on ARC public leaderboard

https://arcprize.org/

2024-10-12

Bold AI Predictions From Cohere Co-founder

Ivan Zhang, co-founder of Cohere, discusses the company's enterprise-focused AI solutions. He explains Cohere's early emphasis on embedding technology and training models for secure environments.

Zhang highlights their implementation of Retrieval-Augmented Generation in healthcare, significantly reducing doctor preparation time. He explores the shift from monolithic AI models to heterogeneous systems and the importance of improving various AI system components. Zhang shares insights on using synthetic data to teach models reasoning, the democratization of software development through AI, and how his gaming skills transfer to running an AI company.

He advises young developers to fully embrace AI technologies and offers perspectives on AI reliability, potential risks, and future model architectures.

https://cohere.com/

https://ivanzhang.ca/

https://x.com/1vnzh

TOC:

00:00:00 Intro

00:03:20 AI & Language Model Evolution

00:06:09 Future AI Apps & Development

00:09:29 Impact on Software Dev Practices

00:13:03 Philosophical & Societal Implications

00:16:30 Compute Efficiency & RAG

00:20:39 Adoption Challenges & Solutions

00:22:30 GPU Optimization & Kubernetes Limits

00:24:16 Cohere's Implementation Approach

00:28:13 Gaming's Professional Influence

00:34:45 Transformer Optimizations

00:36:45 Future Models & System-Level Focus

00:39:20 Inference-Time Computation & Reasoning

00:42:05 Capturing Human Thought in AI

00:43:15 Research, Hiring & Developer Advice

REFS:

00:02:31 Cohere, https://cohere.com/

00:02:40 The Transformer architecture, https://arxiv.org/abs/1706.03762

00:03:22 The Innovator's Dilemma, https://www.amazon.com/Innovators-Dilemma-Technologies-Management-Innovation/dp/1633691780

00:09:15 The actor model, https://en.wikipedia.org/wiki/Actor_model

00:14:35 John Searle's Chinese Room Argument, https://plato.stanford.edu/entries/chinese-room/

00:18:00 Retrieval-Augmented Generation, https://arxiv.org/abs/2005.11401

00:18:40 Retrieval-Augmented Generation, https://docs.cohere.com/v2/docs/retrieval-augmented-generation-rag

00:35:39 Let's Verify Step by Step, https://arxiv.org/pdf/2305.20050

00:39:20 Adaptive Inference-Time Compute, https://arxiv.org/abs/2410.02725

00:43:20 Ryan Greenblatt ARC entry, https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

Disclaimer: This show is part of our Cohere partnership series

2024-10-10

Open-Ended AI: The Key to Superhuman Intelligence? - Prof. Tim Rocktäschel

Prof. Tim Rocktäschel, AI researcher at UCL and Google DeepMind, talks about open-ended AI systems. These systems aim to keep learning and improving on their own, like evolution does in nature.

Ad: Are you a hardcore ML engineer who wants to work for Daniel Cahn at SlingshotAI building AI for mental health? Send him an email! - danielc@slingshot.xyz

TOC:

00:00:00 Introduction to Open-Ended AI and Key Concepts

00:01:37 Tim Rocktäschel's Background and Research Focus

00:06:25 Defining Open-Endedness in AI Systems

00:10:39 Subjective Nature of Interestingness and Learnability

00:16:22 Open-Endedness in Practice: Examples and Limitations

00:17:50 Assessing Novelty in Open-ended AI Systems

00:20:05 Adversarial Attacks and AI Robustness

00:24:05 Rainbow Teaming and LLM Safety

00:25:48 Open-ended Research Approaches in AI

00:29:05 Balancing Long-term Vision and Exploration in AI Research

00:37:25 LLMs in Program Synthesis and Open-Ended Learning

00:37:55 Transition from Human-Based to Novel AI Strategies

00:39:00 Expanding Context Windows and Prompt Evolution

00:40:17 AI Intelligibility and Human-AI Interfaces

00:46:04 Self-Improvement and Evolution in AI Systems

Show notes (New!) https://www.dropbox.com/scl/fi/5avpsyz8jbn4j1az7kevs/TimR.pdf?rlkey=pqjlcqbtm3undp4udtgfmie8n&st=x50u1d1m&dl=0

REFS:

00:01:47 - UCL DARK Lab (Rocktäschel) - AI research lab focusing on RL and open-ended learning - https://ucldark.com/

00:02:31 - GENIE (Bruce) - Generative interactive environment from unlabelled videos - https://arxiv.org/abs/2402.15391

00:02:42 - Promptbreeder (Fernando) - Self-referential LLM prompt evolution - https://arxiv.org/abs/2309.16797

00:03:05 - Picbreeder (Secretan) - Collaborative online image evolution - https://dl.acm.org/doi/10.1145/1357054.1357328

00:03:14 - Why Greatness Cannot Be Planned (Stanley) - Book on open-ended exploration - https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237

00:04:36 - NetHack Learning Environment (Küttler) - RL research in procedurally generated game - https://arxiv.org/abs/2006.13760

00:07:35 - Open-ended learning (Clune) - AI systems for continual learning and adaptation - https://arxiv.org/abs/1905.10985

00:07:35 - OMNI (Zhang) - LLMs modeling human interestingness for exploration - https://arxiv.org/abs/2306.01711

00:10:42 - Observer theory (Wolfram) - Computationally bounded observers in complex systems - https://writings.stephenwolfram.com/2023/12/observer-theory/

00:15:25 - Human-Timescale Adaptation (Rocktäschel) - RL agent adapting to novel 3D tasks - https://arxiv.org/abs/2301.07608

00:16:15 - Open-Endedness for AGI (Hughes) - Importance of open-ended learning for AGI - https://arxiv.org/abs/2406.04268

00:16:35 - POET algorithm (Wang) - Open-ended approach to generate and solve challenges - https://arxiv.org/abs/1901.01753

00:17:20 - AlphaGo (Silver) - AI mastering the game of Go - https://deepmind.google/technologies/alphago/

00:20:35 - Adversarial Go attacks (Dennis) - Exploiting weaknesses in Go AI systems - https://www.ifaamas.org/Proceedings/aamas2024/pdfs/p1630.pdf

00:22:00 - Levels of AGI (Morris) - Framework for categorizing AGI progress - https://arxiv.org/abs/2311.02462

00:24:30 - Rainbow Teaming (Samvelyan) - LLM-based adversarial prompt generation - https://arxiv.org/abs/2402.16822

00:25:50 - Why Greatness Cannot Be Planned (Stanley) - 'False compass' and 'stepping stone collection' concepts - https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237

00:27:45 - AI Debate (Khan) - Improving LLM truthfulness through debate - https://proceedings.mlr.press/v235/khan24a.html

00:29:40 - Gemini (Google DeepMind) - Advanced multimodal AI model - https://deepmind.google/technologies/gemini/

00:30:15 - How to Take Smart Notes (Ahrens) - Effective note-taking methodology - https://www.amazon.com/How-Take-Smart-Notes-Nonfiction/dp/1542866502

(truncated, see shownotes)

2024-10-05

Ben Goertzel on "Superintelligence"

Ben Goertzel discusses AGI development, transhumanism, and the potential societal impacts of superintelligent AI. He predicts human-level AGI by 2029 and argues that the transition to superintelligence could happen within a few years after. Goertzel explores the challenges of AI regulation, the limitations of current language models, and the need for neuro-symbolic approaches in AGI research. He also addresses concerns about resource allocation and cultural perspectives on transhumanism.

TOC:

[00:00:00] AGI Timeline Predictions and Development Speed

[00:00:45] Limitations of Language Models in AGI Development

[00:02:18] Current State and Trends in AI Research and Development

[00:09:02] Emergent Reasoning Capabilities and Limitations of LLMs

[00:18:15] Neuro-Symbolic Approaches and the Future of AI Systems

[00:20:00] Evolutionary Algorithms and LLMs in Creative Tasks

[00:21:25] Symbolic vs. Sub-Symbolic Approaches in AI

[00:28:05] Language as Internal Thought and External Communication

[00:30:20] AGI Development and Goal-Directed Behavior

[00:35:51] Consciousness and AI: Expanding States of Experience

[00:48:50] AI Regulation: Challenges and Approaches

[00:55:35] Challenges in AI Regulation

[00:59:20] AI Alignment and Ethical Considerations

[01:09:15] AGI Development Timeline Predictions

[01:12:40] OpenCog Hyperon and AGI Progress

[01:17:48] Transhumanism and Resource Allocation Debate

[01:20:12] Cultural Perspectives on Transhumanism

[01:23:54] AGI and Post-Scarcity Society

[01:31:35] Challenges and Implications of AGI Development

New! PDF Show notes: https://www.dropbox.com/scl/fi/fyetzwgoaf70gpovyfc4x/BenGoertzel.pdf?rlkey=pze5dt9vgf01tf2wip32p5hk5&st=svbcofm3&dl=0

Refs:

00:00:15 Ray Kurzweil's AGI timeline prediction, Ray Kurzweil, https://en.wikipedia.org/wiki/Technological_singularity

00:01:45 Ben Goertzel: SingularityNET founder, Ben Goertzel, https://singularitynet.io/

00:02:35 AGI Conference series, AGI Conference Organizers, https://agi-conf.org/2024/

00:03:55 Ben Goertzel's contributions to AGI, Wikipedia contributors, https://en.wikipedia.org/wiki/Ben_Goertzel

00:11:05 Chain-of-Thought prompting, Subbarao Kambhampati, https://arxiv.org/abs/2405.04776

00:11:35 Algorithmic information content, Pieter Adriaans, https://plato.stanford.edu/entries/information-entropy/

00:12:10 Turing completeness in neural networks, Various contributors, https://plato.stanford.edu/entries/turing-machine/

00:16:15 AlphaGeometry: AI for geometry problems, Trieu, Li, et al., https://www.nature.com/articles/s41586-023-06747-5

00:18:25 Shane Legg and Ben Goertzel's collaboration, Shane Legg, https://en.wikipedia.org/wiki/Shane_Legg

00:20:00 Evolutionary algorithms in music generation, Yanxu Chen, https://arxiv.org/html/2409.03715v1

00:22:00 Peirce's theory of semiotics, Charles Sanders Peirce, https://plato.stanford.edu/entries/peirce-semiotics/

00:28:10 Chomsky's view on language, Noam Chomsky, https://chomsky.info/1983____/

00:34:05 Greg Egan's 'Diaspora', Greg Egan, https://www.amazon.co.uk/Diaspora-post-apocalyptic-thriller-perfect-MIRROR/dp/0575082097

00:40:35 'The Consciousness Explosion', Ben Goertzel & Gabriel Axel Montes, https://www.amazon.com/Consciousness-Explosion-Technological-Experiential-Singularity/dp/B0D8C7QYZD

00:41:55 Ray Kurzweil's books on singularity, Ray Kurzweil, https://www.amazon.com/Singularity-Near-Humans-Transcend-Biology/dp/0143037889

00:50:50 California AI regulation bills, California State Senate, https://sd18.senate.ca.gov/news/senate-unanimously-approves-senator-padillas-artificial-intelligence-package

00:56:40 Limitations of Compute Thresholds, Sara Hooker, https://arxiv.org/abs/2407.05694

00:56:55 'Taming Silicon Valley', Gary F. Marcus, https://www.penguinrandomhouse.com/books/768076/taming-silicon-valley-by-gary-f-marcus/

01:09:15 Kurzweil's AGI prediction update, Ray Kurzweil, https://www.theguardian.com/technology/article/2024/jun/29/ray-kurzweil-google-ai-the-singularity-is-nearer

2024-10-02

Taming Silicon Valley - Prof. Gary Marcus

AI expert Prof. Gary Marcus doesn't mince words about today's artificial intelligence. He argues that despite the buzz, chatbots like ChatGPT aren't as smart as they seem and could cause real problems if we're not careful.

Marcus is worried about tech companies putting profits before people. He thinks AI could make fake news and privacy issues even worse. He's also concerned that a few big tech companies have too much power. Looking ahead, Marcus believes the AI hype will die down as reality sets in. He wants to see AI developed in smarter, more responsible ways. His message to the public? We need to speak up and demand better AI before it's too late.

Buy Taming Silicon Valley:

https://amzn.to/3XTlC5s

Gary Marcus:

https://garymarcus.substack.com/

https://x.com/GaryMarcus

Interviewer:

Dr. Tim Scarfe

(Refs in top comment)

TOC

[00:00:00] AI Flaws, Improvements & Industry Critique

[00:16:29] AI Safety Theater & Image Generation Issues

[00:23:49] AI's Lack of World Models & Human-like Understanding

[00:31:09] LLMs: Superficial Intelligence vs. True Reasoning

[00:34:45] AI in Specialized Domains: Chess, Coding & Limitations

[00:42:10] AI-Generated Code: Capabilities & Human-AI Interaction

[00:48:10] AI Regulation: Industry Resistance & Oversight Challenges

[00:54:55] Copyright Issues in AI & Tech Business Models

[00:57:26] AI's Societal Impact: Risks, Misinformation & Ethics

[01:23:14] AI X-risk, Alignment & Moral Principles Implementation

[01:37:10] Persistent AI Flaws: System Limitations & Architecture Challenges

[01:44:33] AI Future: Surveillance Concerns, Economic Challenges & Neuro-Symbolic AI

YT version with refs: https://youtu.be/o9MfuUoGlSw

2024-09-24

Prof. Mark Solms - The Hidden Spring

Prof. Mark Solms, a neuroscientist and psychoanalyst, discusses his groundbreaking work on consciousness, challenging conventional cortex-centric views and emphasizing the role of brainstem structures in generating consciousness and affect.

MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

Key points discussed:

The limitations of vision-centric approaches to consciousness studies.

Evidence from decorticated animals and hydranencephalic children supporting the brainstem's role in consciousness.

The relationship between homeostasis, the free energy principle, and consciousness.

Critiques of behaviorism and modern theories of consciousness.

The importance of subjective experience in understanding brain function.

The discussion also explored broader topics:

The potential impact of affect-based theories on AI development.

The role of the SEEKING system in exploration and learning.

Connections between neuroscience, psychoanalysis, and philosophy of mind.

Challenges in studying consciousness and the limitations of current theories.

Mark Solms:

https://neuroscience.uct.ac.za/contacts/mark-solms

Show notes and transcript: https://www.dropbox.com/scl/fo/roipwmnlfmwk2e7kivzms/ACjZF-VIGC2-Suo30KcwVV0?rlkey=53y8v2cajfcgrf17p1h7v3suz&st=z8vu81hn&dl=0

TOC (*) are best bits

00:00:00 1. Intro: Challenging vision-centric approaches to consciousness *

00:02:20 2. Evidence from decorticated animals and hydranencephalic children *

00:07:40 3. Emotional responses in hydranencephalic children

00:10:40 4. Brainstem stimulation and affective states

00:15:00 5. Brainstem's role in generating affective consciousness *

00:21:50 6. Dual-aspect monism and the mind-brain relationship

00:29:37 7. Information, affect, and the hard problem of consciousness *

00:37:25 8. Wheeler's participatory universe and Chalmers' theories

00:48:51 9. Homeostasis, free energy principle, and consciousness *

00:59:25 10. Affect, voluntary behavior, and decision-making

01:05:45 11. Psychoactive substances, REM sleep, and consciousness research

01:12:14 12. Critiquing behaviorism and modern consciousness theories *

01:24:25 13. The SEEKING system and exploration in neuroscience

Refs:

1. Mark Solms' book "The Hidden Spring" [00:20:34] (MUST READ!)

https://amzn.to/3XyETb3

2. Karl Friston's free energy principle [00:03:50]

https://www.nature.com/articles/nrn2787

3. Hydranencephaly condition [00:07:10]

https://en.wikipedia.org/wiki/Hydranencephaly

4. Periaqueductal gray (PAG) [00:08:57]

https://en.wikipedia.org/wiki/Periaqueductal_gray

5. Positron Emission Tomography (PET) [00:13:52]

https://en.wikipedia.org/wiki/Positron_emission_tomography

6. Paul MacLean's triune brain theory [00:03:30]

https://en.wikipedia.org/wiki/Triune_brain

7. Baruch Spinoza's philosophy of mind [00:23:48]

https://plato.stanford.edu/entries/spinoza-epistemology-mind

8. Claude Shannon's "A Mathematical Theory of Communication" [00:32:15]

https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

9. Francis Crick's "The Astonishing Hypothesis" [00:39:57]

https://en.wikipedia.org/wiki/The_Astonishing_Hypothesis

10. Frank Jackson's Knowledge Argument [00:40:54]

https://plato.stanford.edu/entries/qualia-knowledge/

11. Mesolimbic dopamine system [01:11:51]

https://en.wikipedia.org/wiki/Mesolimbic_pathway

12. Jaak Panksepp's SEEKING system [01:25:23]

https://en.wikipedia.org/wiki/Jaak_Panksepp#Affective_neuroscience

2024-09-18

Patrick Lewis (Cohere) - Retrieval Augmented Generation

Dr. Patrick Lewis, who coined the term RAG (Retrieval Augmented Generation) and now works at Cohere, discusses the evolution of language models, RAG systems, and challenges in AI evaluation.

MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

Key topics covered:

- Origins and evolution of Retrieval Augmented Generation (RAG)

- Challenges in evaluating RAG systems and language models

- Human-AI collaboration in research and knowledge work

- Word embeddings and the progression to modern language models

- Dense vs sparse retrieval methods in information retrieval

The discussion also explored broader implications and applications:

- Balancing faithfulness and fluency in RAG systems

- User interface design for AI-augmented research tools

- The journey from chemistry to AI research

- Challenges in enterprise search compared to web search

- The importance of data quality in training AI models

Patrick Lewis: https://www.patricklewis.io/

Cohere Command Models, check them out - they are amazing for RAG!

https://cohere.com/command

TOC

00:00:00 1. Intro to RAG

00:05:30 2. RAG Evaluation: Poll framework & model performance

00:12:55 3. Data Quality: Cleanliness vs scale in AI training

00:15:13 4. Human-AI Collaboration: Research agents & UI design

00:22:57 5. RAG Origins: Open-domain QA to generative models

00:30:18 6. RAG Challenges: Info retrieval, tool use, faithfulness

00:42:01 7. Dense vs Sparse Retrieval: Techniques & trade-offs

00:47:02 8. RAG Applications: Grounding, attribution, hallucination prevention

00:54:04 9. UI for RAG: Human-computer interaction & model optimization

00:59:01 10. Word Embeddings: Word2Vec, GloVe, and semantic spaces

01:06:43 11. Language Model Evolution: BERT, GPT, and beyond

01:11:38 12. AI & Human Cognition: Sequential processing & chain-of-thought

Refs:

1. Retrieval Augmented Generation (RAG) paper / Patrick Lewis et al. [00:27:45]

https://arxiv.org/abs/2005.11401

2. LAMA (LAnguage Model Analysis) probe / Petroni et al. [00:26:35]

https://arxiv.org/abs/1909.01066

3. KILT (Knowledge Intensive Language Tasks) benchmark / Petroni et al. [00:27:05]

https://arxiv.org/abs/2009.02252

4. Word2Vec algorithm / Tomas Mikolov et al. [01:00:25]

https://arxiv.org/abs/1301.3781

5. GloVe (Global Vectors for Word Representation) / Pennington et al. [01:04:35]

https://nlp.stanford.edu/projects/glove/

6. BERT (Bidirectional Encoder Representations from Transformers) / Devlin et al. [01:08:00]

https://arxiv.org/abs/1810.04805

7. 'The Language Game' book / Nick Chater and Morten H. Christiansen [01:11:40]

https://amzn.to/4grEUpG

Disclaimer: This is the sixth video from our Cohere partnership. We were not told what to say in the interview. Filmed in Seattle in June 2024.

2024-09-16

Ashley Edwards - Genie Paper (DeepMind/Runway)

Ashley Edwards, who was working at DeepMind when she co-authored the Genie paper and is now at Runway, covered several key aspects of the Genie AI system and its applications in video generation, robotics, and game creation.

MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

Genie's approach to learning interactive environments, balancing compression and fidelity.

The use of latent action models and VQ-VAE models for video processing and tokenization.

Challenges in maintaining action consistency across frames and integrating text-to-image models.

Evaluation metrics for AI-generated content, such as FID and PS&R diff metrics.

The discussion also explored broader implications and applications:

The potential impact of AI video generation on content creation jobs.

Applications of Genie in game generation and robotics.

The use of foundation models in robotics and the differences between internet video data and specialized robotics data.

Challenges in mapping AI-generated actions to real-world robotic actions.

Ashley Edwards: https://ashedwards.github.io/

TOC (*) are best bits

00:00:00 1. Intro to Genie & Brave Search API: Trade-offs & limitations *

00:02:26 2. Genie's Architecture: Latent action, VQ-VAE, video processing *

00:05:06 3. Genie's Constraints: Frame consistency & image model integration

00:07:26 4. Evaluation: FID, PS&R diff metrics & latent induction methods

00:09:44 5. AI Video Gen: Content creation impact, depth & parallax effects

00:11:39 6. Model Scaling: Training data impact & computational trade-offs

00:13:50 7. Game & Robotics Apps: Gamification & action mapping challenges *

00:16:16 8. Robotics Foundation Models: Action space & data considerations *

00:19:18 9. Mask-GPT & Video Frames: Real-time optimization, RL from videos

00:20:34 10. Research Challenges: AI value, efficiency vs. quality, safety

00:24:20 11. Future Dev: Efficiency improvements & fine-tuning strategies

Refs:

1. Genie (learning interactive environments from videos) / Ashley and DM colleagues [00:01]

https://arxiv.org/abs/2402.15391

2. VQ-VAE (Vector Quantized Variational Autoencoder) / Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu [02:43]

https://arxiv.org/abs/1711.00937

3. FID (Fréchet Inception Distance) metric / Martin Heusel et al. [07:37]

https://arxiv.org/abs/1706.08500

4. PS&R (Precision and Recall) metric / Mehdi S. M. Sajjadi et al. [08:02]

https://arxiv.org/abs/1806.00035

5. Vision Transformer (ViT) architecture / Alexey Dosovitskiy et al. [12:14]

https://arxiv.org/abs/2010.11929

6. Genie (robotics foundation models) / Google DeepMind [17:34]

https://deepmind.google/research/publications/60474/

7. Chelsea Finn's lab work on robotics datasets / Chelsea Finn [17:38]

https://ai.stanford.edu/~cbfinn/

8. Imitation from observation in reinforcement learning / YuXuan Liu [20:58]

https://arxiv.org/abs/1707.03374

9. Waymo's autonomous driving technology / Waymo [22:38]

https://waymo.com/

10. Gen3 model release by Runway / Runway [23:48]

https://runwayml.com/

11. Classifier-free guidance technique / Jonathan Ho and Tim Salimans [24:43]

https://arxiv.org/abs/2207.12598

2024-09-13

Cohere's SVP Technology - Saurabh Baji

Saurabh Baji discusses Cohere's approach to developing and deploying large language models (LLMs) for enterprise use.

* Cohere focuses on pragmatic, efficient models tailored for business applications rather than pursuing the largest possible models.

* They offer flexible deployment options, from cloud services to on-premises installations, to meet diverse enterprise needs.

* Retrieval-augmented generation (RAG) is highlighted as a critical capability, allowing models to leverage enterprise data securely.

* Cohere emphasizes model customization, fine-tuning, and tools like reranking to optimize performance for specific use cases.

* The company has seen significant growth, transitioning from developer-focused to enterprise-oriented services.

* Major customers like Oracle, Fujitsu, and TD Bank are using Cohere's models across various applications, from HR to finance.

* Baji predicts a surge in enterprise AI adoption over the next 12-18 months as more companies move from experimentation to production.

* He emphasizes the importance of trust, security, and verifiability in enterprise AI applications.

The interview provides insights into Cohere's strategy, technology, and vision for the future of enterprise AI adoption.

https://www.linkedin.com/in/saurabhbaji/

https://x.com/sbaji

https://cohere.com/

https://cohere.com/business

MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

TOC (* marks the best bits)

00:00:00 1. Introduction and Background

00:04:24 2. Cloud Infrastructure and LLM Optimization

00:06:43 2.1 Model deployment and fine-tuning strategies *

00:09:37 3. Enterprise AI Deployment Strategies

00:11:10 3.1 Retrieval-augmented generation in enterprise environments *

00:13:40 3.2 Standardization vs. customization in cloud services *

00:18:20 4. AI Model Evaluation and Deployment

00:18:20 4.1 Comprehensive evaluation frameworks *

00:21:20 4.2 Key components of AI model stacks *

00:25:50 5. Retrieval Augmented Generation (RAG) in Enterprise

00:32:10 5.1 Pragmatic approach to RAG implementation *

00:33:45 6. AI Agents and Tool Integration

00:33:45 6.1 Leveraging tools for AI insights *

00:35:30 6.2 Agent-based AI systems and diagnostics *

00:42:55 7. AI Transparency and Reasoning Capabilities

00:49:10 8. AI Model Training and Customization

00:57:10 9. Enterprise AI Model Management

01:02:10 9.1 Managing AI model versions for enterprise customers *

01:04:30 9.2 Future of language model programming *

01:06:10 10. AI-Driven Software Development

01:06:10 10.1 AI bridging human expression and task achievement *

01:08:00 10.2 AI-driven virtual app fabrics in enterprise *

01:13:33 11. Future of AI and Enterprise Applications

01:21:55 12. Cohere's Customers and Use Cases

01:21:55 12.1 Cohere's growth and enterprise partnerships *

01:27:14 12.2 Diverse customers using generative AI *

01:27:50 12.3 Industry adaptation to generative AI *

01:29:00 13. Technical Advantages of Cohere Models

01:29:00 13.1 Handling large context windows *

01:29:40 13.2 Low latency impact on developer productivity *

Disclaimer: This is the fifth video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. Filmed in Seattle in Aug 2024.

2024-09-12

David Hanson's Vision for Sentient Robots

David Hanson, CEO of Hanson Robotics and creator of the humanoid robot Sophia, explores the intersection of artificial intelligence, ethics, and human potential. In this thought-provoking interview, Hanson discusses his vision for developing AI systems that embody the best aspects of humanity while pushing beyond our current limitations, aiming to achieve what he calls "super wisdom."

YT version: https://youtu.be/LFCIEhlsozU

The interview with David Hanson covers:

The importance of incorporating biological drives and compassion into AI systems

Hanson's concept of "existential pattern ethics" as a basis for AI morality

The potential for AI to enhance human intelligence and wisdom

Challenges in developing artificial general intelligence (AGI)

The need to democratize AI technologies globally

Potential future advancements in human-AI integration and their societal impacts

Concerns about technological augmentation exacerbating inequality

The role of ethics in guiding AI development and deployment

Hanson advocates for creating AI systems that embody the best aspects of humanity while surpassing current human limitations, aiming for "super wisdom" rather than just artificial super intelligence.

David Hanson:

https://www.hansonrobotics.com/david-hanson/

https://www.youtube.com/watch?v=9u1O954cMmE

TOC

1. Introduction and Background [0:00:00]

1.1. David Hanson's interdisciplinary background [0:01:49]

1.2. Introduction to Sophia, the realistic robot [0:03:27]

2. Human Cognition and AI [0:03:50]

2.1. Importance of social interaction in cognition [0:03:50]

2.2. Compassion as distinguishing factor [0:05:55]

2.3. AI augmenting human intelligence [0:09:54]

3. Developing Human-like AI [0:13:17]

3.1. Incorporating biological drives in AI [0:13:17]

3.2. Creating AI with agency [0:20:34]

3.3. Implementing flexible desires in AI [0:23:23]

4. Ethics and Morality in AI [0:27:53]

4.1. Enhancing humanity through AI [0:27:53]

4.2. Existential pattern ethics [0:30:14]

4.3. Expanding morality beyond restrictions [0:35:35]

5. Societal Impact of AI [0:38:07]

5.1. AI adoption and integration [0:38:07]

5.2. Democratizing AI technologies [0:38:32]

5.3. Human-AI integration and identity [0:43:37]

6. Future Considerations [0:50:03]

6.1. Technological augmentation and inequality [0:50:03]

6.2. Emerging technologies for mental health [0:50:32]

6.3. Corporate ethics in AI development [0:52:26]

This was filmed at AGI-24

2024-09-10

The Fabric of Knowledge - David Spivak

David Spivak, a mathematician known for his work in category theory, discusses a wide range of topics related to intelligence, creativity, and the nature of knowledge. He explains category theory in simple terms and explores how it relates to understanding complex systems and relationships.

We discuss abstract concepts like collective intelligence, the importance of embodiment in understanding the world, and how we acquire and process knowledge. Spivak shares his thoughts on creativity, discussing where it comes from and how it might be modeled mathematically.

A significant portion of the discussion focuses on the impact of artificial intelligence on human thinking and its potential role in the evolution of intelligence. Spivak also touches on the importance of language, particularly written language, in transmitting knowledge and shaping our understanding of the world.

David Spivak

http://www.dspivak.net/

TOC:

00:00:00 Introduction to category theory and functors

00:04:40 Collective intelligence and sense-making

00:09:54 Embodiment and physical concepts in knowledge acquisition

00:16:23 Creativity, open-endedness, and AI's impact on thinking

00:25:46 Modeling creativity and the evolution of intelligence

00:36:04 Evolution, optimization, and the significance of AI

00:44:14 Written language and its impact on knowledge transmission

REFS:

Mike Levin's work

https://scholar.google.com/citations?user=luouyakAAAAJ&hl=en

Eric Smith's videos on complexity and early life

https://www.youtube.com/watch?v=SpJZw-68QyE

Richard Dawkins' book "The Selfish Gene"

https://amzn.to/3X73X8w

Carl Sagan's statement about the cosmos knowing itself

https://amzn.to/3XhPruK

Herbert Simon's concept of "satisficing"

https://plato.stanford.edu/entries/bounded-rationality/

DeepMind paper on open-ended systems

https://arxiv.org/abs/2406.04268

Karl Friston's work on active inference

https://direct.mit.edu/books/oa-monograph/5299/Active-InferenceThe-Free-Energy-Principle-in-Mind

MIT category theory lectures by David Spivak (available on the Topos Institute channel)

https://www.youtube.com/watch?v=UusLtx9fIjs

2024-09-05

Jürgen Schmidhuber - Neural and Non-Neural AI, Reasoning, Transformers, and LSTMs

Jürgen Schmidhuber, the father of generative AI, shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.

YT version: https://youtu.be/DP454c1K_vQ

TOC

00:00:00 Intro

00:03:38 Reasoning

00:13:09 Potential AI Breakthroughs Reducing Computation Needs

00:20:39 Memorization vs. Generalization in AI

00:25:19 Approach to the ARC Challenge

00:29:10 Perceptions of ChatGPT and AGI

00:58:45 Abstract Principles of Jürgen's Approach

01:04:17 Analogical Reasoning and Compression

01:05:48 Breakthroughs in 1991: the P, the G, and the T in ChatGPT and Generative AI

01:15:50 Use of LSTM in Language Models by Tech Giants

01:21:08 Neural Network Aspect Ratio Theory

01:26:53 Reinforcement Learning Without Explicit Teachers

Refs:

* "Annotated History of Modern AI and Deep Learning" (2022 survey by Schmidhuber):

* Chain Rule For Backward Credit Assignment (Leibniz, 1676)

* First Neural Net / Linear Regression / Shallow Learning (Gauss & Legendre, circa 1800)

* First 20th Century Pioneer of Practical AI (Quevedo, 1914)

* First Recurrent NN (RNN) Architecture (Lenz, Ising, 1920-1925)

* AI Theory: Fundamental Limitations of Computation and Computation-Based AI (Gödel, 1931-34)

* Unpublished ideas about evolving RNNs (Turing, 1948)

* Multilayer Feedforward NN Without Deep Learning (Rosenblatt, 1958)

* First Published Learning RNNs (Amari and others, ~1972)

* First Deep Learning (Ivakhnenko & Lapa, 1965)

* Deep Learning by Stochastic Gradient Descent (Amari, 1967-68)

* ReLUs (Fukushima, 1969)

* Backpropagation (Linnainmaa, 1970); precursor (Kelley, 1960)

* Backpropagation for NNs (Werbos, 1982)

* First Deep Convolutional NN (Fukushima, 1979); later combined with Backprop (Waibel 1987, Zhang 1988).

* Metalearning or Learning to Learn (Schmidhuber, 1987)

* Generative Adversarial Networks / Artificial Curiosity / NN Online Planners (Schmidhuber, Feb 1990; see the G in Generative AI and ChatGPT)

* NNs Learn to Generate Subgoals and Work on Command (Schmidhuber, April 1990)

* NNs Learn to Program NNs: Unnormalized Linear Transformer (Schmidhuber, March 1991; see the T in ChatGPT)

* Deep Learning by Self-Supervised Pre-Training. Distilling NNs (Schmidhuber, April 1991; see the P in ChatGPT)

* Experiments with Pre-Training; Analysis of Vanishing/Exploding Gradients, Roots of Long Short-Term Memory / Highway Nets / ResNets (Hochreiter, June 1991, further developed 1999-2015 with other students of Schmidhuber)

* LSTM journal paper (1997, most cited AI paper of the 20th century)

* xLSTM (Hochreiter, 2024)

* Reinforcement Learning Prompt Engineer for Abstract Reasoning and Planning (Schmidhuber 2015)

* Mindstorms in Natural Language-Based Societies of Mind (2023 paper by Schmidhuber's team)

https://arxiv.org/abs/2305.17066

* Bremermann's physical limit of computation (1982)

EXTERNAL LINKS

CogX 2018 - Professor Juergen Schmidhuber

https://www.youtube.com/watch?v=17shdT9-wuA

Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability (Neural Networks, 1997)

https://sferics.idsia.ch/pub/juergen/loconet.pdf

The paradox at the heart of mathematics: Gödel's Incompleteness Theorem - Marcus du Sautoy

https://www.youtube.com/watch?v=I4pQbo5MQOs

(Refs truncated; full version in the YouTube video description)

2024-08-28

"AI should NOT be regulated at all!" - Prof. Pedro Domingos

Professor Pedro Domingos is an AI researcher and professor of computer science. He expresses skepticism about current AI regulation efforts and argues for faster AI development rather than slowing it down. He also discusses the need for new innovations to fulfil the promises of current AI techniques.

Show notes:

* Domingos' views on AI regulation and why he believes it's misguided

* His thoughts on the current state of AI technology and its limitations

* Discussion of his novel "2040", a satirical take on AI and tech culture

* Explanation of his work on "tensor logic", which aims to unify neural networks and symbolic AI

* Critiques of other approaches in AI, including those of OpenAI and Gary Marcus

* Thoughts on the AI "bubble" and potential future developments in the field

Prof. Pedro Domingos:

https://x.com/pmddomingos

2040: A Silicon Valley Satire [Pedro's new book]

https://amzn.to/3T51ISd

TOC:

00:00:00 Intro

00:06:31 Bio

00:08:40 Filmmaking skit

00:10:35 AI and the wisdom of crowds

00:19:49 Social Media

00:27:48 Master algorithm

00:30:48 Neurosymbolic AI / abstraction

00:39:01 Language

00:45:38 Chomsky

01:00:49 2040 Book

01:18:03 Satire as a shield for criticism?

01:29:12 AI Regulation

01:35:15 Gary Marcus

01:52:37 Copyright

01:56:11 Stochastic parrots come home to roost

02:00:03 Privacy

02:01:55 LLM ecosystem

02:05:06 Tensor logic

Refs:

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World [Pedro Domingos]

https://amzn.to/3MiWs9B

Rebooting AI: Building Artificial Intelligence We Can Trust [Gary Marcus]

https://amzn.to/3AAywvL

Flash Boys [Michael Lewis]

https://amzn.to/4dUGm1M

2024-08-25

Adversarial Examples and Data Modelling - Andrew Ilyas (MIT)

Andrew Ilyas is a PhD student at MIT who is about to start as a professor at CMU. We discuss data modeling and how datasets influence model predictions, adversarial examples in machine learning and why they occur, robustness in machine learning models, black-box attacks on machine learning systems, biases in data collection and dataset creation (particularly in ImageNet), and self-selection bias in data and methods to address it.

Andrew's site:

https://andrewilyas.com/

https://x.com/andrew_ilyas

TOC:

00:00:00 - Introduction and Andrew's background

00:03:52 - Overview of the machine learning pipeline

00:06:31 - Data modeling paper discussion

00:26:28 - TRAK: Evolution of data modeling work

00:43:58 - Discussion on abstraction, reasoning, and neural networks

00:53:16 - "Adversarial Examples Are Not Bugs, They Are Features" paper

01:03:24 - Types of features learned by neural networks

01:10:51 - Black box attacks paper

01:15:39 - Work on data collection and bias

01:25:48 - Future research plans and closing thoughts

References:

Adversarial Examples Are Not Bugs, They Are Features

https://arxiv.org/pdf/1905.02175

TRAK: Attributing Model Behavior at Scale

https://arxiv.org/pdf/2303.14186

Datamodels: Predicting Predictions from Training Data

https://arxiv.org/pdf/2202.00622

IMAGENET-TRAINED CNNS

https://arxiv.org/pdf/1811.12231

ZOO: Zeroth Order Optimization Based Black-box

https://arxiv.org/pdf/1708.03999

A Spline Theory of Deep Networks

https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

Scaling Monosemanticity

https://transformer-circuits.pub/2024/scaling-monosemanticity/

Adversarial Examples Are Not Bugs, They Are Features

https://gradientscience.org/adv/

Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies

https://proceedings.mlr.press/v235/bartoldson24a.html

Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors

https://arxiv.org/abs/1807.07978

Estimation of Standard Auction Models

https://arxiv.org/abs/2205.02060

From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

https://arxiv.org/abs/2005.11295

What Makes A Good Fisherman? Linear Regression under Self-Selection Bias

https://arxiv.org/abs/2205.03246

Towards Tracing Factual Knowledge in Language Models Back to the Training Data [Akyürek]

https://arxiv.org/pdf/2205.11482

2024-08-22

Joscha Bach - AGI24 Keynote (Cyberanimism)

Dr. Joscha Bach introduces a surprising idea called "cyber animism" in his AGI-24 talk - the notion that nature might be full of self-organizing software agents, similar to the spirits in ancient belief systems. Bach suggests that consciousness could be a kind of software running on our brains, and wonders if similar "programs" might exist in plants or even entire ecosystems.

Joscha takes us on a tour de force through history, philosophy, and cutting-edge computer science, teasing us to rethink what we know about minds, machines, and the world around us. Joscha believes we should blur the lines between human, artificial, and natural intelligence, and argues that consciousness might be more widespread and interconnected than we ever thought possible.

Dr. Joscha Bach

https://x.com/Plinz

This is video 2/9 from our coverage of AGI-24 in Seattle https://agi-conf.org/2024/

The official MLST interview with Joscha, recorded right after this talk, is available now in early access on our Patreon: https://www.patreon.com/posts/joscha-bach-110199676 (you also get access to our private Discord and biweekly calls)

TOC:

00:00:00 Introduction: AGI and Cyberanimism

00:03:57 The Nature of Consciousness

00:08:46 Aristotle's Concepts of Mind and Consciousness

00:13:23 The Hard Problem of Consciousness

00:16:17 Functional Definition of Consciousness

00:20:24 Comparing LLMs and Human Consciousness

00:26:52 Testing for Consciousness in AI Systems

00:30:00 Animism and Software Agents in Nature

00:37:02 Plant Consciousness and Ecosystem Intelligence

00:40:36 The California Institute for Machine Consciousness

00:44:52 Ethics of Conscious AI and Suffering

00:46:29 Philosophical Perspectives on Consciousness

00:49:55 Q&A: Formalisms for Conscious Systems

00:53:27 Coherence, Self-Organization, and Compute Resources

YT version (very high quality, filmed by us live)

https://youtu.be/34VOI_oo-qM

Refs:

Aristotle's work on the soul and consciousness

Richard Dawkins' work on genes and evolution

Gerald Edelman's concept of Neural Darwinism

Thomas Metzinger's book "Being No One"

Yoshua Bengio's concept of the "consciousness prior"

Stuart Hameroff's theories on microtubules and consciousness

Christof Koch's work on consciousness

Daniel Dennett's "Cartesian Theater" concept

Giulio Tononi's Integrated Information Theory

Mike Levin's work on organismal intelligence

The concept of animism in various cultures

Freud's model of the mind

Buddhist perspectives on consciousness and meditation

The Genesis creation narrative (for its metaphorical interpretation)

California Institute for Machine Consciousness

2024-08-21
