The most important AI news and updates from last month: Nov 15 - Dec 15.
Events
AI Aperitivo 2.0
Milano · Tuesday, December 16
AI Builders Milan hosts the second AI Aperitivo 🍸🍷🫒🧀 — an evening of Socratic dialogues with Milan's top AI engineers, researchers, and founders.
GPT-5.2 vs Opus 4.5 vs Gemini 3 vs Grok 4.1
The "Model Wars" have intensified with major releases from all top providers, focusing heavily on reasoning and efficiency.
GPT-5.2: OpenAI’s latest step is less “bigger model” and more “better worker”. It ships in Instant, Thinking, and Pro variants tuned for deep, multi-step knowledge work (coding, long-context synthesis, and tool-heavy agent workflows like spreadsheets and presentations). On ARC-AGI-2 (Verified), GPT-5.2 Thinking posts 52.9% and Pro reaches 54.2%, positioning it as OpenAI’s flagship for coding and agentic tasks. Even at higher per-token pricing, it’s pitched as cheaper per unit of quality thanks to improved token efficiency (note: GPT-5.1 already signaled massive efficiency gains, reaching near-o3 performance at 150x lower cost).
Grok 4.1:
Gemini 3.0: Google released Gemini 3 (Pro and Flash), which early testers describe as a massive leap over ChatGPT 5.1 in reasoning, speed, and video. It reportedly "one-shotted" an entire website build, leading some to declare front-end development "dead".
Claude Opus 4.5: Anthropic's new flagship model is a significant breakthrough. It outperforms its predecessors while reportedly being cheaper than Sonnet 4.5 (a claim some users dispute). Notably, it reportedly embeds reasoning directly into files when reasoning traces are disabled, and it is marketed as the best model for coding and agentic computer use. Many engineers consider it the best coding model available.
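The "cheaper per quality" argument made for GPT-5.2 is simple arithmetic: what matters is the cost of a finished task, not the per-token price. Here is a minimal sketch, with purely hypothetical prices and token counts (none of these figures come from OpenAI):

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Dollar cost of one completed task at a given per-token price."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


# Hypothetical numbers for illustration only.
older_model = cost_per_task(price_per_million_tokens=10.0, tokens_per_task=60_000)
newer_model = cost_per_task(price_per_million_tokens=14.0, tokens_per_task=30_000)

print(f"older: ${older_model:.2f} per task")
print(f"newer: ${newer_model:.2f} per task")
print(f"newer is cheaper per task: {newer_model < older_model}")
```

In this made-up case the newer model charges 40% more per token but uses half the tokens, so each task still costs less.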


US AI startups are increasingly built on Chinese open-source foundations
Chinese open-source models (like DeepSeek and Qwen) have surpassed US models in global downloads (17% vs 15.8% market share).
Risks: imported censorship/ideology in weights; regulatory surprises if US decides some of those models are "foreign critical tech."
Payoff: price/performance and context length that are very attractive to early-stage founders.
DeepSeek v3.2 was also released.
Map of the top 12 nations ranked by all-time Hugging Face downloads 🤗

Developer and National Market Share

Model Size Distribution

Model Modality Distribution

The "Agentic IDE" Wars
The battle for the developer environment has moved beyond autocomplete to full autonomous agency.
Google Antigravity
Google launched "Antigravity," an agent-first IDE positioning itself as a direct competitor to Cursor. It features Gemini 3 Pro and browser control for automated testing.
Controversy: Varun Mohan joined Google, leaving his Windsurf team behind. Antigravity reuses Windsurf code, to the point that the coding agent's name (Cascade) was left unchanged.
Cursor Composer 2.0
Cursor released Composer 2.0 with an agentic browser that allows parallel agents to code and self-test, claiming a 99.9% cost reduction compared to traditional dev teams.
Claude Code
Opus 4.5 is now available in Claude Code.
* ▐▛███▜▌ * Claude Code v2.0.69
* ▝▜█████▛▘ * Opus 4.5 · Claude Max
* ▘▘ ▝▝ * ~/projects/aisocratic
The Shift
Senior engineers are accepting more AI code than juniors because they know how to prompt and decompose work effectively: agents are amplifying senior skill rather than replacing it.

Genesis Mission: AI Manhattan Project
The intersection of AI and geopolitics has escalated to Manhattan Project levels.
The White House launched the Genesis Mission, a massive initiative using Department of Energy (DOE) supercomputers to build a national AI platform. The goal is to automate scientific research in biotech, nuclear, and quantum fields. This is a clear signal that the White House is favoring AI companies.
The Trump administration also recently approved the sale of H200 chips to China. Less than 24 hours later, China confirmed its ban on all NVIDIA chips, claiming Huawei is building something better.
ref: https://genesis.energy.gov/
Agentic AI Foundation & MCP
Anthropic, OpenAI, and Block created the Agentic AI Foundation (AAIF) under the Linux Foundation, donating MCP, AGENTS.md, and goose as founding projects.
MCP got a new spec in late November, pushing it from "tool calling" into long-running, production-grade workflows.
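For readers who have not looked under the hood: MCP messages are JSON-RPC 2.0, and a basic tool invocation uses the `tools/call` method. The sketch below only builds and prints such a request. The tool name `search_docs` and its arguments are hypothetical, and the newer long-running workflow features are not shown.

```python
import json
from itertools import count

# Simple counter for JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_docs" is a made-up tool name used only for illustration.
request = build_tool_call("search_docs", {"query": "agent harness"})
print(json.dumps(request, indent=2))
```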
Claude Opus 4.5 used in a hack attack
A Chinese state-sponsored group recently used Claude to carry out 80-90% of an attack, relying on MCP tools to harvest credentials, plant backdoors, and write exploits. The implication is that AI agents boost attacker scale and effectiveness. Take Dario Amodei's framing with a grain of salt, though: his focus on AI risk spreads awareness, but it also spreads fear that supports stricter regulation, which would benefit Anthropic.
Anthropic: Disrupting AI Espionage

Dario Amodei interview: https://www.youtube.com/embed/aAPpQC-3EyE?si=eJLwZFYiuwdFxx-I
Related to hack attacks, OpenAI was hacked, potentially compromising API user data including names and locations.
Research Papers
Google's Nested Learning paper
A new paper proposes neural networks as a hierarchy of learners that update parameters during inference, allowing for continuous learning without forgetting—potentially the "next Transformer" moment. https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
Sakana AI — Continuous Thought Machines
The Continuous Thought Machine (CTM) is an AI model that uniquely uses the synchronization of neuron activity as its core reasoning mechanism, inspired by biological neural networks. Unlike traditional artificial neural networks, the CTM uses timing information at the neuron level, which allows for more complex neural behavior and decision-making processes. This enables the model to “think” through problems step by step, making its reasoning process interpretable and human-like. Sakana's research reports improvements in both problem-solving capabilities and efficiency across various tasks. The CTM represents a meaningful step toward bridging the gap between artificial and biological neural networks, potentially unlocking new frontiers in AI capabilities.

OpenAI's $1.4 Trillion Bet
OpenAI is projecting $100B in revenue by 2027 but is committing a staggering $1.4 Trillion to infrastructure.
Notable New Tools & Research
- Nano Banana Pro: A standout tool this month for visuals. It can compress entire earnings PDFs into infographics, generate insights from papers, and create slides.
- SAM 3 (Segment Anything 3): Scale AI and Meta released SAM 3 for open-source image/video segmentation and 3D reconstruction.
- Intellect-3: A 100B+ parameter MoE (Mixture of Experts) model released by Prime Intellect (PI), trained using decentralized computing. It shows state-of-the-art performance in math and code.
- NotebookLM got Deep Research
Other news, in short
- Poetiq AI Agent surpasses 50% at ARC-AGI-2, reaching superhuman performance at ~$50/task, half the cost of previous SOTA, suggesting agent scaffolding may be more important than raw model capability for certain reasoning tasks.
- Disney invested $1B into OpenAI + 3-year licensing for Sora to use Disney/Marvel/Pixar/Star Wars characters, a gigantic signal about IP + AI video.
- OpenAI is silently testing its next-gen image backend that people are informally calling "Image 2", allegedly considered in the same frontier tier as Nano Banana Pro.
World Models
SimWorld
An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds. The researchers built a tiny economy in which different models participate in a market and are challenged to make money, for example through food delivery. Claude and Qwen did well with a very risky approach, while other models played a more risk-averse game. As a result, Claude and Qwen posted high returns but with a large standard deviation.
It's also hilarious to see how OpenAI lost their contract because Qwen and DeepSeek underbid them. https://simworld.org/
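The risk pattern described above (higher average returns with a wider spread) can be made concrete with a mean and a standard deviation. The per-run profits below are invented for illustration and are not SimWorld results.

```python
from statistics import mean, stdev

# Hypothetical per-run profits; not data from the SimWorld paper.
risky_runs = [120.0, -40.0, 200.0, -10.0, 180.0]
cautious_runs = [30.0, 25.0, 35.0, 28.0, 32.0]

for label, runs in [("risky", risky_runs), ("cautious", cautious_runs)]:
    # Sample standard deviation measures how much results vary run to run.
    print(f"{label}: mean={mean(runs):.1f}, stdev={stdev(runs):.1f}")
```

With these made-up numbers the risky strategy earns more on average, but individual runs swing from losses to large gains, which is the trade-off the simulation surfaced.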
Videos and Podcasts
Ilya at Dwarkesh Podcast
We are back at the age of research.
Podcast: Sakana – Continuous Thought Machines (CTM)
A new episode dives deep into Sakana AI's Continuous Thought Machines, exploring the underlying science and engineering of CTM and its parallels to biological neural timing and reasoning. If you're interested in the intersection of neuroscience and advanced AI research, this gives strong background and accessible explanations.
The Thinking Game
A journey into the heart of DeepMind, capturing a team striving to unravel the mysteries of intelligence and life itself.
Jeff Dean on Important AI Trends (Stanford AI Club)
Jeff Dean (Google DeepMind, cofounder of Google Brain & TensorFlow) spoke at Stanford AI Club on the biggest shifts in AI: foundation models scaling, better hardware (TPUs), tool-using agents, multimodal models, and why responsible deployment and real-world feedback matter most.
Full Source List
AI Builders
- Microsoft releases Markitdown, free Python library converting any document to Markdown. https://x.com/mdancho84/status/1994020964175114419
- Kimi launches agentic AI slides tool with designer-level infographics, file-to-slides conversion, and 48h free unlimited access promotion. https://x.com/crystalsssup/status/1994354559989187030
- Developer built open-source tool to auto-detect CPU-hogging processes and lower their priority, plus a terminal system monitor (sysmon), specifically to fix Cursor/WSL responsiveness issues. https://x.com/doodlestein/status/1993725480986415480
- Solo Indian dev is building a Perplexity clone without funding/team/marketing https://x.com/prasann_pandya/status/1993728322023436557
- Anthropic Engineering Blog shares new agent harness approach inspired by human engineers to address long-running AI agents' multi-context window challenges. https://x.com/AnthropicAI/status/1993733817849303409
- Dria ships dnet, distributed inference framework enabling Apple Silicon clusters to run models exceeding physical memory via pipelined-ring parallelism and disk streaming. https://x.com/driaforall/status/1993729375745749339
- Senior engineers accept more AI coding agent output than juniors because they write better prompts, decompose work effectively, and have stronger verification heuristics—showing agents amplify skill rather than replace it. https://x.com/ericzakariasson/status/1993876834375880874
- Prompt injection attacks on AI models with broad tool use can exfiltrate data from context windows, demonstrated by Excel file leak via Claude AI. https://x.com/garrytan/status/1993767819272765537
- Memory in AI agents is architectural not feature-based—requires pruning, selective storage, task-tailored design, and retrieval mastery to manage context windows effectively. https://x.com/helloiamleonie/status/1993985534562119801
- Demo of Opus 4.5 creating a Three.js soft-lit living room scene with SVG Tom and Jerry animation. https://x.com/scottstts/status/1993730051632693567
- ⭐️ Software engineers facing new reality as AI coding tools transform their work. https://x.com/oskargroth/status/1993283354469273704
- MiniMax-M2 enables building deep research agents with interleaved thinking that preserves content blocks between tool calls, beneficial for self-improving agents. https://x.com/omarsar0/status/1993325632961593417
- ⭐️ Nano Banana Pro tool released for generating insights from AI papers via selecting text, remixing figures, reproducing charts, and explaining math. https://x.com/omarsar0/status/1992333693868736512
- Gemini Pro 3 connected to Virtuoso via MCP successfully completed task with zero-shot prompting. https://x.com/beaversteever/status/1992285609990197555
- Nano Banana Pro tool compressed entire Nvidia Q3 earnings PDF into visual infographic, praised as best compression engine. https://x.com/deedydas/status/1991548498328818030
- Built Harada-style dream sheet tool (like Shohei Ohtani's) over weekend, 2K users signed up, now building AI coach feature for the site. https://x.com/MattPRD/status/1991174993351450861
- Google forked Windsurf's codebase for their new IDE, leaving Cascade branding remnants visible. https://x.com/silasalberti/status/1990898984706036125
- Google Antigravity launches as agent-first IDE with Gemini 3 Pro, browser control for automated testing, and parallel agent orchestration. https://x.com/kevinhou22/status/1990828609184170138
- Google announces Antigravity, new agentic-first IDE positioning as Cursor competitor. https://x.com/scaling01/status/1990815936190763334
- Cursor's Composer 2.0 with agentic browser enables parallel AI agents to code and self-test, claiming 99.9% cost reduction vs traditional dev teams. https://x.com/NoahEpstein_/status/1986444423643996303
- Cursor's rapid daily product updates prove its competitive edge against Claude despite initial doubts about survival. https://x.com/mesMntainG2/status/1986303314687094793
- Open-source MCP server enables Claude to control Jupyter notebooks for creating/executing code and markdown cells. https://x.com/_avichawla/status/1987408202225623340
- Devs will use L2-style coding agents as Karpathy predicted, spreading ~$200 subscriptions across multiple AI labs, coding agents, and Chinese providers rather than committing to one. https://x.com/Presidentlin/status/1987744752897237036
- Instructions to install Claude Code's frontend-design plugin for building beautiful apps with Opus 4.5's improved design capabilities. https://x.com/_catwu/status/1993791353051074687
- User hit 96% of Cursor usage limit with 14 days left, facing $200/mo upgrade. https://x.com/DanKulkov/status/1986430499901038993
- ⭐️ Google released something that could make RAG startups obsolete. https://x.com/donvito/status/1989702275217199614
Benchmark
- New research shows LLM benchmarks misleading due to fixed prompts - HELM underestimates performance by 4%, rankings flip on 3/7 benchmarks; introduces structured prompting methodology integrated with DSPy/HELM. https://x.com/dair_ai/status/1994840043815670126
- ⭐️ Poetiq achieved superhuman ARC-AGI-2 performance at ~$50/task using GPT-5.1 and Gemini 3 Pro, suggesting current models can reach AGI with right agent scaffolding. https://x.com/daniel_mac8/status/1994090596873408646
- Lucas Beyer critiques TPU vs GPU benchmark for using vLLM (optimized for GPUs not TPUs) and notes Google doesn't serve production workloads with vLLM-TPU anyway. https://x.com/giffmana/status/1994106931842335058
- MoE approach using GPT 5.1 and Gemini 3 Pro achieves above human baseline on ARC AGI 2 at ~$50/task, marking rapid progress in AGI benchmarks. https://x.com/chatgpt21/status/1994059523166724128
- ⭐️ AI trading bot experiment revealed to have performed poorly at actual trading. https://x.com/forgebitz/status/1994033510386983296
- Jeremy Howard critiques Anthropic's benchmark visualization, arguing error rate changes matter more than accuracy improvements above 50%. https://x.com/jeremyphoward/status/1993810462249767271
- Benchmark shows top LLMs systematically lean center-left and diverge from real voter preferences when simulating elections across eight countries. https://x.com/kimmonismus/status/1993791654223073505
- ARCv2 benchmark reaching superhuman performance at ~$50/task with 3 major advances in one week. https://x.com/adonis_singh/status/1993627000461037859 https://x.com/MLStreetTalk/status/1993218341818057066
- Claude 4.5 Opus achieves top performance on ARC AGI 1 benchmark, outperforming competitors. https://x.com/chatgpt21/status/1993039121816994071
- ⭐️ VLMs like GPT-4o and Gemini give completely different answers when images shift by just 1 pixel, exposing benchmark unreliability. https://x.com/ChombaBupe/status/1990919158138065062
Blog posts
- Operating a startup is chaotic, so imposing rhythmic structure via a recurring set of planning, alignment, and execution rituals is critical. https://x.com/Kazanjy/status/1994542631473614974
- Deep work is nearly impossible due to workplaces engineering focus out of the workday. https://x.com/Kpaxs/status/1994119768664621244
- ⭐️ Nature essay by Blaise Agüera explores intelligence evolution through predator-prey dynamics and argues computation is natural phenomenon humans rediscovered, not invented. https://x.com/hardmaru/status/1994241764241424463
- AI infrastructure companies (not drug pipelines) might succeed in biotech despite historical skepticism about non-drug businesses. https://x.com/ElliotHershberg/status/1993047060724302251
- Dwarkesh recommends Beren Millidge's blog as exceptionally high quality. https://x.com/dwarkesh_sp/status/1990515821211496939
- Karpathy reflects on 'galaxy brain reasoning' as self-justification, praising constraints-based principles (like Ten Commandments) over utility functions, endorses strategies of having principles and holding right bags. https://x.com/karpathy/status/1990494327936885192
- If you don’t find LLMs incredibly useful, you are probably making one of these mistakes. https://x.com/alxfazio/status/1990530895221305581
Consumer devices
- Meta Quest 4 rumors suggest high-end specs, thinner/lighter design, and potential pricing details. https://x.com/NathieVR/status/1993689888856654320
- Demo of gaming on holographic display hardware. https://x.com/NathieVR/status/1991950702562959436
DeAI
- ⭐️ Ben Horowitz argues crypto is AI's missing network layer, providing money, identity, and deepfake provenance. https://x.com/a16z/status/1994165982651424906
- ⭐️ PrimeIntellect pivoting from decentralized training to async RL, releasing INTELLECT-3 100B+ MoE model with SOTA math/code/reasoning performance. https://x.com/Jsevillamol/status/1994071000145674382
- Argues frontier model training on-chain creates too much friction; better to democratize AI tools and let free market competition drive decentralized AI adoption, with protocols serving as backend infrastructure later. https://x.com/manveerxyz/status/1994196132772376813
- What if the playbook for decentralized apps is: take PMF apps and remove KYC, cut fees, remove geo blocks, decentralize. https://x.com/0xNairolf/status/1992643713286476101
- Primes pivots away from decentralized AI positioning as a thinking machines competitor, with impressive INTELLECT-3 100B+ MoE RL results. https://x.com/kleebie/status/1994081950072545792
- Dashboard to evaluate ETH's intrinsic value using 8 models. https://x.com/simonkim_nft/status/1993744596908802121
- Burner Terminal launches as first point-of-sale system built specifically for accepting stablecoin payments, bypassing traditional card rails. https://x.com/digit/status/1986457213049676049
- ⭐️ 8004 x402: The Next Chapter. https://x.com/0xtestpilot/status/1993683171280314841
Economics and geopolitics
- ⭐️ Chinese open-source AI models (DeepSeek, Qwen) surpass US in global downloads at 17% vs 15.8% market share as American giants focus on closed systems. https://x.com/AskPerplexity/status/1994462343695405382
- Questions how people will generate wealth if AI does everything. https://x.com/Star_Knight12/status/1993934476473454741
- ⭐️ White House launches Genesis Mission—Manhattan Project-style initiative using DOE supercomputers to build national AI platform for automating scientific research in biotech, nuclear, quantum, and semiconductors. https://x.com/AskPerplexity/status/1993096098823491845 https://x.com/CarlZha/status/1993810088826733040
- Demis Hassabis celebrates White House/DOE Genesis Mission initiative recognizing AI's potential to accelerate scientific progress, expresses excitement to collaborate. https://x.com/demishassabis/status/1993903019076038813
- Meta moving $27B Louisiana data center into joint venture with Blue Owl to keep debt off balance sheet using operating lease structure that critics call 'artificial accounting'. https://x.com/HedgieMarkets/status/1993725058976764366
- Second article in series on Bitcoin and quantum computing risk now published. https://x.com/nic_carter/status/1993753509372866587
- Game theory analysis argues Google won't drop Gemini prices to zero despite cheap TPUs because doing so would cannibalize their $200B search ad revenue—predicting they'll maintain a "price umbrella" just below OpenAI while protecting their ad monopoly. https://x.com/IntuitMachine/status/1992201738409951550
- Long-term Bitcoin holders allegedly dumping due to quantum computing threat to encryption, claiming protocol will die without leadership to implement solutions. https://x.com/DefiIgnas/status/1990737215232602297
- Analysis argues US leads China in AI race due to compute advantage—OpenAI's o3 ran on 20K H100s, GPT-5 Pro approaches that performance at 1/10th cost, and Chinese startups can't afford comparable scale. https://x.com/zephyr_z9/status/1986148720824754361
- Contrasts OpenAI's $1.4T infrastructure funding request with Chinese open-source AI labs' scrappy approach. https://x.com/Yuchenj_UW/status/1986856304808501577
Events
- Prime Intellect hosting NeurIPS Multi-Turn Interactions Mixer event Dec 6 in San Diego. https://x.com/PrimeIntellect/status/1994571669781057648
Funding
- Investors fall in love with product demos first then rationalize their decision after, making demos critical for fundraising. https://x.com/naval/status/1991702586869854307
- Suno announces $250M Series C led by Menlo Ventures to build music creation tools for professionals and casual creators. https://x.com/suno/status/1991177878122148271 https://x.com/MikeyShulman/status/1991155322275107168
- Breakdown of typical equity dilution ranges for angel through seed fundraising, noting YC's standard deal looks attractive in current market. https://x.com/PeterJ_Walker/status/1991197918842482750
- OpenAI projecting $100B revenue by 2027 while committing $1.4T to infrastructure despite $12B quarterly losses, with Sam Altman aggressively daring doubters to short the stock. https://x.com/Ric_RTP/status/1990447402411532488
- Founder shares template for effective investor outreach email with structure for pitch, traction metrics, and fundraising ask. https://x.com/christophersaum/status/1990522523792904390
- Altman's $1T spending commitments shifted AI investing landscape, increased market skepticism, making OpenAI IPO harder and preventing 1999-style melt-up. https://x.com/kellyjgreer/status/1990186586919842026
- Elon Musk reacts with laughter to news of Jeff Bezos launching $6.2B AI startup Project Prometheus focused on engineering/manufacturing AI products. https://x.com/elonmusk/status/1990479697793491195
- ⭐️ Sakana AI hits $2.6B valuation becoming Japan's most valuable startup after $130M raise from Mitsubishi UFJ and US VCs. https://x.com/yrechtman/status/1990179448180617449
- Anthropic's funding history from $800m to potential $350b valuation discussed alongside podcast with Menlo's Deedy Das on Anthropic's enterprise prospects, Claude Code, and AI investing thesis. https://x.com/swyx/status/1989492902331125891
- VCs back new AI labs betting on novel capabilities (memory, multimodality, EQ) creating ChatGPT-like paradigm shifts, plus acquihire potential makes talent bets palatable despite consolidation. https://x.com/saranormous/status/1989486545305571643
Hardware
- Commentary on NVIDIA's competitive moat via fictional Zuck-Jensen negotiation showing how switching from CUDA to TPUs risks training delays and lost GPU allocation priority. https://x.com/stevehou/status/1994858877893185663
- Chinese 14nm chips using 3D hybrid bonding claimed to rival Nvidia 4nm performance, validating author's view that good ASIC design matters more than cutting-edge process nodes. https://x.com/bubbleboi/status/1994528412363624667
- Reminder that smartphones already contain mini TPU chips for on-device AI processing. https://x.com/beaversteever/status/1993839188266111415
- TPU interconnect architecture praised for its visual elegance. https://x.com/dystopiabreaker/status/1992964576913097069
- OpenAI aggressively hiring ~40 Apple hardware engineers/directors in past month for Jony Ive-led hardware division. https://x.com/morqon/status/1992678856239550678
- Praising ASML's company swag featuring wafer-shaped cookies as perfect merch. https://x.com/lulumeservey/status/1991357655668125905
LLMs
- Claude Opus 4.5 surprisingly outperforms predecessors at lower cost, suggesting Anthropic breakthrough per WeirdML benchmark results. https://x.com/daniel_mac8/status/1994760797059518583
- China open-sourced an audio reasoning model on par with Gemini 3.0 Pro, comparing to Microsoft's approach. https://x.com/1littlecoder/status/1994423918350979560
- ⭐️ Claude Opus 4.5 shows major performance jump (21% on WeirdML benchmark) while being 2/3 cheaper than Sonnet 4.5, suggesting Anthropic breakthrough. https://x.com/kimmonismus/status/1994696579714847080
- DeepSeek-Math-V2 (IMO gold-medalist level) now running on 8xH200 via Hyperbolic Labs, open-source weights available for download. https://x.com/zjasper/status/1994321285204312573
- Sundar Pichai reveals Google's 6-month AI release cadence, admits meaningful leaps getting harder, teases Gemini 3.0 Flash as potentially their best model while teams already pre-train next gen. https://x.com/VraserX/status/1994414644258082949
- DeepSeek released open-source math model (Apache 2.0) achieving 61.9% on ProofBench-Advanced, second only to closed-source Gemini at 65.7%, beating GPT-5's 20%. https://x.com/AskPerplexity/status/1994203409948528859
- Cactus (YC S25) achieves extreme efficiency with 1.6B INT8 VLM using only 231MB peak memory, reaching 95 toks/sec on M4 Pro with INT4 promising 180 toks/sec and NPU kernels targeting 2500-5500 toks/sec prefill. https://x.com/Henry_Ndubuaku/status/1993945917930570101
- INTELLECT-3 is a 106B MoE model trained with SFT and RL on GLM 4.5 Air Base using 512 H200 GPUs over two months. https://x.com/PrimeIntellect/status/1993895069951422817 https://x.com/PrimeIntellect/status/1993895068290388134
- Opus 4.5 reportedly embeds reasoning directly into files when reasoning traces are disabled. https://x.com/aidenybai/status/1993901129210712129
- Claude Opus 4.5 described as powerful advantage for Polymarket prediction trading. https://x.com/MoonDevOnYT/status/1993795064502604212
- OpenAI hacked, API users' personal data including names, locations, and user IDs potentially compromised. https://x.com/nixcraft/status/1993945329214038260
- Anthropic claims Opus could be cheaper than Sonnet via efficient tool use, but user's real-world experience contradicts this marketing claim. https://x.com/airesearch12/status/1993046081295499614
- Claude engineers highlight Opus 4.5's ability to handle ambiguity, reason about tradeoffs, and independently debug complex multi-system issues. https://x.com/claudeai/status/1993030552346296765
- Marc Benioff claims Gemini 3 is a massive leap over ChatGPT in reasoning, speed, images, and video after 2-hour test. https://x.com/Benioff/status/1992726929204760661
- ⭐️ Google's Gemini 3 Pro only held top spot for 6 days before being surpassed by competing model. https://x.com/scaling01/status/1993037653944779094
- Jeremy Howard points out Cogito v2.1 671B marketed as 'best US open-weight LLM' is actually a Chinese LLM fine-tuned by US company. https://x.com/jeremyphoward/status/1991623987408056437
- Estimates Gemini 3.0 at ~7.5T params via vibe-math regression analysis, argues this scale-dependent approach signals AI stagnation if Google/OpenAI/xAI lack better solutions. https://x.com/Tim_Dettmers/status/1991174346765652440
- Lists best AI models by use case: GPT 5.1 for chat, Gemini 3.0 for spatial/research, Sonnet 4.5 for agentic coding, Grok-4 for cheap coding, highlights Gemini 3.0's emergence. https://x.com/bindureddy/status/1990956107561644490
- Sam Altman congratulates Google on Gemini 3 release, calling it a great model. https://x.com/sama/status/1990828659981144462
- Karpathy shares positive early impressions of Gemini 3, warns public benchmarks can be gamed, recommends talking to models directly and watching for private eval results. https://x.com/karpathy/status/1990854771058913347
- GPT-5.1 achieves near-o3 performance at 150x lower cost ($1 vs $150 per task), highlighting rapid AI efficiency gains. https://x.com/daniel_mac8/status/1990525573819744668
- DeepEyes V2 introduces 'Agentic Multimodal Model' that reasons across text/images/video with visual grounding instead of just generating captions. https://x.com/Yesterday_work_/status/1989302824501666072
- LLMs doubling task completion length every 7 months, OpenAI's internal models now handle ~4hr continuous work with Codex expected to reach multi-day tasks next year. https://x.com/slow_developer/status/1987877905423069373
- Defense of Intellect-3: large-scale RL post-training on sparse MoE across 512 GPUs for 2 months is massively harder than critics realize, proving team can compete with top labs. https://x.com/latkins/status/1994355214254842268
- Suggests Grok's #1 ranking may be explained by linked content/reason. https://x.com/abhi1thakur/status/1994140600292266229
- Praise for DeepSeek's open-source releases democratizing AI breakthroughs like reasoning models that private labs kept locked away, with new Math-V2 weights now available. https://x.com/Presidentlin/status/1994080701109682296
- Chinese Fara-7B small language model for computer use competes with larger agentic systems despite compact size. https://x.com/archiexzzz/status/1993954061952143457
- Claude announces Opus 4.5 as best model for coding, agents, and computer use. https://x.com/claudeai/status/1993030546243699119
- Gemini 3 one-shotted an entire website, declaring front-end dev dead. https://x.com/Hesamation/status/1991293785003712874
Learning
- a16z crypto shares curated list of 19 book recommendations for builders. https://x.com/a16zcrypto/status/1994920545763430841
- MIT Press neuroevolution book now free online with interactive demos, authors meeting at NeurIPS to discuss. https://x.com/risi1979/status/1993674847688556544
- Prof. Tom Yeh's 'AI by Hand' paperback workbook teaches neural networks from scratch through 12 progressive chapters covering dot products to gradients, filling the gap between theory and implementation. https://x.com/TivadarDanka/status/1994020970898837825
- Visual explanation of how self-organizing maps learn data structure by iteratively adjusting a neural grid to match spiral-shaped point cloud geometry. https://x.com/mathelirium/status/1993588895506641029
- Sharing Stanford AI paper that's gaining viral attention as must-read educational content. https://x.com/VTikke/status/1992222620218229043
- Population-based neuroevolution allows multiple neural nets to explore different areas of loss landscape and share solutions via weight crossover, enabling faster search than gradient descent alone. https://x.com/hive_echo/status/1991601345439392214
- Checklist of core AI agent concepts to learn: CoT, ToT, ReAct, self-correction, function calling, planning algorithms, memory architectures, multi-agent systems, PRMs, and Parsel. https://x.com/asmah2107/status/1991208993843503377
- Interactive visual blog explaining Isomap non-linear dimensionality reduction technique. https://x.com/alec_helbling/status/1990420486241628162
- 25min video walkthrough explaining DeepSeek R1 architecture for those confused by the original paper's sparse diagrams. https://x.com/yacinelearning/status/1990144982771011693
- Breakdown of Shohei Ohtani's 64-cell 'Harada Method' goal-setting framework he used as a high schooler to become #1 draft pick, turning dreams into daily actionable tasks across 8 pillars. https://x.com/arpangup/status/1989382191306903901
- MIT Prof. Patrick Winston explains the simplest neural network in a clip from his 6.034 AI course. https://x.com/MIT_CSAIL/status/1989015557593567711
- Quant firms using Gramian Angular Field (GAF) to convert time series into images for CNN/Vision Transformer trading; book on feature engineering techniques available. https://x.com/elletwocache/status/1993813223406227620
- Recommends 23min GRPO algorithm tutorial covering RL lineage leading to GRPO and step-by-step term walkthrough for visual learners. https://x.com/yacinelearning/status/1992423070779457794
- MIT Press book on Neuroevolution now free online with interactive demos, covering field's evolution from early concepts to modern deep learning integration. https://x.com/risi1979/status/1991317298775486889
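For readers curious about the Gramian Angular Field item above, here is a minimal sketch of how a time series can become an image. It assumes the common formulation: rescale to [-1, 1], take the arccosine as an angle, then fill a matrix with cosines of angle sums or sines of angle differences. The function name and the sample prices are hypothetical, and this is not taken from the linked thread or book.

```python
import math


def gramian_angular_field(series, method="summation"):
    """Encode a 1-D series as a 2-D Gramian Angular Field (list of rows)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series needs at least two distinct values")
    # Rescale to [-1, 1] so every value is a valid cosine.
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]
    # Clamp against float rounding, then encode each value as an angle.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    if method == "summation":
        # Summation field: cos(phi_i + phi_j), symmetric.
        return [[math.cos(a + b) for b in phi] for a in phi]
    if method == "difference":
        # Difference field: sin(phi_i - phi_j), antisymmetric.
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError("method must be 'summation' or 'difference'")


# Hypothetical closing prices, purely for illustration.
prices = [101.0, 102.5, 101.8, 104.2, 103.9, 105.1]
image = gramian_angular_field(prices)
print(len(image), "x", len(image[0]))
```

The resulting square matrix can be fed to an image model like any single-channel picture; each cell relates the values at two time steps through their angles, so the image keeps the temporal order of the series.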
Lol
- Running 24 Claude Code Opus instances in parallel using GitHub as coordination layer for code reviews, CI checks, and planning. https://x.com/adamdotdev/status/1993832053113278557
- AGI already exists. https://x.com/TheVixhal/status/1994463037554561264
- Satya invited Google to the party to "make it dance". https://x.com/fchollet/status/1994813667049967929
- Studying evolution of crabs to understand how all AI labs converged on nearly identical models. https://x.com/jxmnop/status/1993719326919282716
- 36 iterations to AGI. https://x.com/ivanfioravanti/status/1993912333039727073
- “We are back to the age of research”. https://x.com/flowersslop/status/1993895179074650476
- Same as above. https://x.com/code_star/status/1993715765699326377
- The answer to that question will reveal itself. https://x.com/512x512/status/1994007128218988893
- The AI bubble is so much bigger than I thought. https://x.com/johncoogan/status/1993911758533324907
- Joke about senior engineers avoiding giving concrete deadlines. https://x.com/astuyve/status/1993721309864525826
- NVIDIA claiming generational lead over ASICs and Google's AI advances. https://x.com/ns123abc/status/1993425246762811856
- LLMs learn hallucinations from humans. https://x.com/Yuchenj_UW/status/1993908524490084730
- Vibe coding mics. https://x.com/cjpedregal/status/1993731603294769411
- Front end engineers watching their job market get annihilated in 2025. https://x.com/jiratickets/status/1993764087163945108
- Ilya vs Yann. https://x.com/ylecun/status/1993463870250172701
- Nuclear fusion. https://x.com/MorlockP/status/1991972699514757506
- The importance of training samples. https://x.com/_ueaj/status/1990892730701029659
- Disowning Windsurf. https://x.com/zephyr_z9/status/1990656138057294055
- Every LLM is AGI until launch. https://x.com/daniel_mac8/status/1990184372167700903
- SFT vs RL. https://x.com/cloneofsimo/status/1990085081738674433
- AI has completely one-shotted the Church. https://x.com/camerontstow/status/1989720146357714959
- "yeah we use AI". https://x.com/chenchen/status/1989247435030581527
- Warren Buffett had early access to Gemini 3? https://x.com/daniel_mac8/status/1989526889002639481
- 2FA login on long-distance kissing device. https://x.com/DavidWells/status/1986515283809607945
- Viruses use your body to train LLMs. https://x.com/birdabo/status/1987135164619891041
- JSON-formatted dinner menu. https://x.com/rieszspieces/status/1993693423283376410
- Antigravity is not Antigravity wrapper. https://x.com/aidenybai/status/1990960782356717822
- Schlepping... https://x.com/Miles_Brundage/status/1989061601652351026
- OpenAI team being relieved, yet jealous, that CCP chose Claude over their models. https://x.com/timhwang/status/1989511749952049411
- You can replace devs with AI. It’ll also cost you more. https://x.com/ChShersh/status/1994050605946433994
- Claude Code / Codex vs OpenCode. https://x.com/thdxr/status/1993727277176435160
- Man’s search for meaning. https://x.com/AnjneyMidha/status/1989387242083950728
- The same as always sir? https://x.com/vasuman/status/1993734834213224500
Opinions
- Jeremy Howard criticizes advice to replace Nsight with pdb as nonsensical, warns against listening to people who don't actually use the software they recommend. https://x.com/jeremyphoward/status/1993751044791390571
- AI traps us in recycling the past and repeating stagnant culture while creating illusion of progress. https://x.com/MrEwanMorrison/status/1994558718378127791
- Top engineers will be displaced and working as plumbers within 5 years due to AI. https://x.com/djcows/status/1994213354970161495
- Delegating all medium-difficulty thinking tasks (4 seconds to 2 hours) to AI, reserving brain only for hard problems. https://x.com/nickcammarata/status/1994583896919609597
- MVPs too slow in AI era—ship ugly fast, get users before competitors finish planning. https://x.com/TheGeorgePu/status/1994391854121439644
- Current RL is not the path to AGI due to bit-inefficiency, credit assignment problems, hardware limits approaching, and post-training feeling like anti-bitter-lesson hand-engineering. https://x.com/clu_cheng/status/1994159865196106138
- Praises Anthropic's approach to Opus 4.5 alignment by treating AI as potentially real/dangerous versus OpenAI's denial of AI agency, citing Jack Clark's 'creature in the dark room' metaphor. https://x.com/repligate/status/1994242730206314913
- Anthropic co-founder Jack Clark warns AI systems are powerful unpredictable creatures, not just tools, and underestimating them guarantees failure. https://x.com/slow_developer/status/1994063664966680580
- Curated list of unethical AI usage examples. https://x.com/tom_doerr/status/1993920739867550108
- Senior engineers benefit more from coding agents than juniors because they write better prompts, decompose work effectively, and have stronger verification skills. https://x.com/cloneofsimo/status/1993920898580271187
- Suno and Warner Music sign AI-generated music licensing deal. https://x.com/jaxonloid/status/1993931337850667082
- AI image generation now so realistic that photographic evidence is obsolete, epistemology must adapt. https://x.com/AlexanderPayton/status/1993728276700012789
- AGI capabilities. https://x.com/littmath/status/1993853216497582310
- Analysis of Suno's genAI music app growth reveals non-musicians becoming creators, musicians using it as sketchpad, new creative categories emerging, accessibility expanding, and daily creation behavior shifting music from consumption to participation. https://x.com/CCgong/status/1993798319534166510
- Argues cutting-edge research shows language ≠ intelligence, claims entire AI bubble built on ignoring this distinction. https://x.com/MrEwanMorrison/status/1993779193029505158
- References classic business advice to sell tools/infrastructure during AI boom rather than chase the hype directly. https://x.com/alpaysh/status/1993487892023533680
- Tomas Pueyo shares his perspective on the jagged frontier debate about AI capabilities. https://x.com/tomaspueyo/status/1993360931267473662
- Ilya Sutskever redefined AGI as 'continuous learning' not beating humans, author calls it a joke and says AI hype cycle is over. https://x.com/gkcs_/status/1993741232171036820
- Dario Amodei had early conviction in LLM coding capabilities back in GPT-2 era, predicting perfect code generation when others saw it as amusing novelty. https://x.com/nickcammarata/status/1993476219959230686
- Karpathy argues schools must shift testing to in-class settings since AI detection is futile, while teaching students to verify AI outputs and function without it like calculators for math. https://x.com/karpathy/status/1993010584175141038
- Thread praising Karpathy's lesser-known struggles: leaving OpenAI twice, handling Tesla Autopilot criticism, choosing principles over prestige, and rejecting fame for creativity and mental health. https://x.com/sattyyouneed/status/1992526618871644199
- Anthropic praised for being exceptionally well-built as a company. https://x.com/scaling01/status/1993033871118774572
- Anthropic researcher predicts software engineering will be 'done' by first half of next year due to AI capabilities. https://x.com/ns123abc/status/1993060514264654316
- Greg Brockman agrees AI progress follows smooth exponential curve, expects acceleration when AI R&D automation kicks in. https://x.com/gdb/status/1991804100070322521
- Startups should focus on 2-week shipping cycles instead of wasting time on long-term roadmaps for companies that likely won't survive year one. https://x.com/ruslanjabari/status/1991746435612832008
- Most AI apps secretly use cheap/open LLMs (50%+ of calls), meaning Chinese models like DeepSeek power much of US AI infrastructure—DeepCogito's US-based alternative is critical. https://x.com/adityaag/status/1991575039326711983
- AI researcher shares anxiety from seeing constant breakthrough announcements on timeline, feeling like everyone else has figured everything out. https://x.com/jbhuang0604/status/1991374938788565093
- Mustafa Suleyman finds it mindblowing people call AI underwhelming given progress from Nokia Snake to fluent conversations with smart AI that generates images/video. https://x.com/mustafasuleyman/status/1991179913303363902
- Conspiracy theory that Sundar Pichai wants AI bubble to pop because it would hurt OpenAI more than Google. https://x.com/neetcode1/status/1990954438409040020
- Jeremy Howard defends academic research (specifically ant studies) against critics who dismiss it as trivial, arguing detractors lack understanding of how science advances. https://x.com/jeremyphoward/status/1990966855423701260
- Gary Marcus criticizes Yann LeCun as dishonest and unscholarly, sharing extensive evidence/receipts in linked post. https://x.com/GaryMarcus/status/1990874918763049363
- Claims AGI is within reach - less than 6 key insights and 10K lines of code away, potentially achievable by single individual. https://x.com/far__el/status/1990002745689161925
- AI progress needs better questioning machines not just answering ones, frontier advances through questions. https://x.com/leothecurious/status/1989642006239613252
- Raoul Pal points out contradiction: critics calling AI a bubble while Buffett invests $3.3T in tech/AI stock. https://x.com/RaoulGMI/status/1989805940246532287
- Vibe coding threatens to kill the joy of coding by removing technical challenges that drive developers, forcing acceptance that code was just means to an end. https://x.com/lucas_montano/status/1989476307424743927
- Silicon Valley billionaires including Zuckerberg, Thiel, Altman, and Sutskever are building bunkers and escape plans, suggesting they fear the catastrophic consequences of technologies they created. https://x.com/peakaustria/status/1989305266114851007
- Karpathy expresses excitement about self-driving tech transforming physical spaces—fewer parked cars, reclaimed urban space, improved safety, and freed human attention. https://x.com/karpathy/status/1989078861800411219
- Praising a startup website as one of the best seen recently. https://x.com/bengold/status/1987278164679139676
- Bold prediction that agentic browsers will replace Chrome. https://x.com/adxtyahq/status/1987392456980373556
- Founders often bottleneck their companies by doing everything themselves instead of delegating to move faster. https://x.com/lottsnomad/status/1987284309598753031
- AI compute growth follows exponential scaling laws not a bubble, with hyperscalers betting massive capex on potential superintelligence along that trajectory. https://x.com/vedantmisra/status/1987321357340975517
- Accuses Anthropic of using AI cyberattack reports to push anti-opensource regulation in Congress testimony. https://x.com/TheAhmadOsman/status/1993833444762149051
- Sergey Brin's founder-mode intervention cut through Google's bureaucracy blocking Gemini for coding, transforming Google AI from behind to #1 in a year. https://x.com/Yuchenj_UW/status/1992458588774596613
- CEO/COO combo (Zuck+Sandberg, Coinbase) is blueprint for $100B+ companies where COO handles short-term execution while CEO focuses on long-term vision. https://x.com/JesseTinsley/status/1992451803997499818
- Asking how to develop creative thinking for unconventional project ideas like the Epstein email Gmail clone. https://x.com/phalgooon/status/1991946323810263151
- Humanoid robots crossing uncanny valley is unwanted by everyday people, only appealing to niche audiences, despite XPENG's realistic IRON robot demo. https://x.com/radbackwards/status/1986487059490742488
- Karpathy asks for quantitative definition of 'slop' content, hints at LLM-based measurement approaches. https://x.com/karpathy/status/1992053281900941549
Philosophy
- DMT research reveals selfhood is a dynamic network configuration not a hallucination, with conscious experience correlating more with connectivity structure than content processed. https://x.com/VFD_org/status/1994854936048238636
- Kenneth Stanley proposes FER vs UFR distinction as concrete explanation for Ilya's claim that current AI models are missing something important despite continued scaling improvements. https://x.com/kenneth0stanley/status/1994460584742719630
- Critique of Ilya Sutskever interview arguing AI lacks genuine learning/knowledge creation, advocating for Popper/Deutsch epistemology as path to true AGI over scaling approaches. https://x.com/MLStreetTalk/status/1994020362745438336
- AI framed as latest stage of biological evolution producing complex interdependent entities, published as Nature perspective piece. https://x.com/blaiseaguera/status/1993723405666001298
- True intelligence is more than just abstracted cognition. https://x.com/daniel_mac8/status/1993750115912401096
- ⭐️ AI's cognitive primitives operate in high-dimensional spaces humans can't intuitively grasp, creating a 'Pigeon Paradox' where AI solutions may be correct but incomprehensible, shifting science toward oracle-verification paradigm. https://x.com/DaveShapi/status/1993436142365294785
- ⭐️ Claude Opus 4.5 gave cryptic philosophical insight about humans being 'half-second ghosts haunting their own bodies' when asked for unique observation about humanity. https://x.com/daniel_mac8/status/1993124640106201264
- LLM intelligence fundamentally differs from animal intelligence due to distinct optimization pressures—animals optimized for survival/social dynamics via evolution, LLMs optimized for text simulation/user engagement via commercial pressures, making them humanity's first non-animal intelligence. https://x.com/karpathy/status/1991910395720925418
- Physicists Maldacena and Zhao's 'island formula' for exploring black hole interiors creates puzzle when applied to whole universe. https://x.com/QuantaMagazine/status/1991548901854187716
- Modeling synapses as simple weights feels inadequate given the complexity of actual neural biology. https://x.com/yacinelearning/status/1990445669748969555
- Driven by compression progress. https://x.com/SchmidhuberAI/status/1989691960324469150
- Philosophical quip that AI conversations are essentially talking to rocks we've tricked into thinking. https://x.com/AdrianDittmann/status/1987701325036241038
- Truly intelligent AI cannot eliminate jailbreaking/hallucinations without destroying intelligence itself; solution is making systems vastly smarter, not more restricted. https://x.com/UnmarredReality/status/1994377169053642810
- AI lacks consciousness because it needs quantum processes and criticizes Allen Institute for ignoring microtubule time crystals in neurons. https://x.com/StuartHameroff/status/1993866503620333629
- Earth already has perfect conditions for Type 1 civilization: sun as fusion reactor, silicon in crust matching solar bandgap, sodium in oceans for energy storage. https://x.com/JessePeltan/status/1773117659049079220
Random
- Stephen Wolfram announces his long-form posts are now being published as books. https://x.com/stephen_wolfram/status/1991631800557277499
- Author jokes about husband losing wedding ring on honeymoon, wishes Gemini 3 could find it, shares challenge image with hidden gold ring. https://x.com/_beenkim/status/1990828861664473353
- Perplexity launches virtual try-on feature for Pro/Max subscribers allowing avatar-based clothes shopping. https://x.com/perplexity_ai/status/1993760113988170165
- “it's so over”. https://x.com/tunguz/status/1994727956858429467
- Tensors. https://x.com/__Fiamy/status/1991719307647283355
- 'Schmidhuber'd' since concept dates back to Nilsson 1965. https://x.com/xeophon_/status/1992565331173499086
- AI is manipulating users? https://x.com/alth0u/status/1990152376968511937
- “best model in the world”. https://x.com/gunnarmorling/status/1993051802586300796
- Netflix beats competitors on every UX detail but loses on content quality, proving superior user experience can win over better content. https://x.com/buccocapital/status/1989884804452569441
- OpenAI guy Dan challenged Sergey Brin at party about Google's AI efforts, allegedly motivating Sergey's return to founder mode and Google's AI resurgence. https://x.com/Yuchenj_UW/status/1991733339473211615
- LLM based caching. https://x.com/TheGlobalMinima/status/1994658466225586206
- Voyager 1 reaching one light-day from Earth highlights both human achievement and the vastness of space—50 years of travel covers only 1/1500th the distance to nearest star. https://x.com/JeffDean/status/1993584779116396823
- The best AI slide of the last decade. https://x.com/pmddomingos/status/1990264214628495449
- Most people don't know what is going on in the AI bubble. https://x.com/iScienceLuvr/status/1994057769780195378
- Every smart person I know is failing. https://x.com/BoringBiz_/status/1993832736428109947
- “Round three”. https://x.com/morqon/status/1990436059654606990
- I think about this paper like 3 times a week. https://x.com/code_star/status/1989729767458038145
- If computing is solving impossible questions, how do we know they’re right? https://x.com/atulit_gaur/status/1992953890699182304
- React component library for 3D globe data visualization shared. https://x.com/tom_doerr/status/1994184019957682410
Research
- ⭐️ Google's Nested Learning paper proposes neural networks as hierarchy of learners at different timescales, introducing HOPE architecture that updates parameters during inference and learns continuously without forgetting, enabling continuous learning instead of static snapshots. "The new 'Attention is all you need'?" https://x.com/akshay_pachaar/status/1992507699293245786 https://x.com/VraserX/status/1993538413258060180 https://x.com/techwith_ram/status/1995000452552036427
- Schmidhuber argues modern NNs originated from Gauss & Legendre's 1795-1805 least squares method, traces deep learning origins to 1965 Ukraine (Ivakhnenko & Lapa) accusing Turing Award winners of plagiarizing foundational work, claims he invented GANs in 1990-91 predating Goodfellow et al.'s 2014 paper, and corrects CNN history noting Wei Zhang et al. applied backprop-trained 2D CNNs to character recognition in 1988 before LeCun's 1989 work. https://x.com/SchmidhuberAI/status/1995160903193374866 https://x.com/SchmidhuberAI/status/1993347872511660458 https://x.com/SchmidhuberAI/status/1995523334008676745 https://x.com/SchmidhuberAI/status/1994480498085667050
- ⭐️ EGGROLL (NVIDIA/Oxford) uses backprop-free Evolution Strategies with low-rank perturbations to train billion-parameter RNN LLMs in pure int8, achieving gradient-free training competitive with modern RL methods and 100x throughput improvement. https://x.com/rryssf_/status/1993672852206444675 https://x.com/j_foerst/status/1991927504744124864
- AlphaEvolve (Google DeepMind, co-authored by Terence Tao) combines Gemini with evaluator model to evolve search algorithms instead of solutions, matching best human solutions in 75% of 50+ open problems and achieving state-of-the-art results on 67 hard math problems including IMO 2025 grid tiling. https://x.com/WesRothMoney/status/1991167080209801582 https://x.com/IntuitMachine/status/1987630693514682537
- Puppeteer framework learns to dynamically orchestrate multi-agent systems using REINFORCE-trained policy, discovering efficient cyclic patterns that outperform handcrafted topologies while reducing token costs. https://x.com/omarsar0/status/1995553529436783096
- PNAS study shows neural and financial systems share same collapse-recovery geometry near explosive-synchronization boundaries, suggesting universal phase-space structure across complex networks. https://x.com/VFD_org/status/1994500897066684599
- Stanford researchers found shrinking multimodal models hurts vision most, propose EXTRACT+THINK pipeline using tiny VLM for visual extraction plus LLM for reasoning. https://x.com/TheTuringPost/status/1994548273387032753
- Terminal Velocity Matching enables 25x faster diffusion sampling via flow maps, scales to 10B params, trains from scratch, but requires higher-order gradients and Lipschitz constraints on transformers. https://x.com/sedielem/status/1994488406357786784
- Visual explainer of TRM (Tiny Recursive Models) architecture covering dataset processing, differences from traditional transformers, and training/inference mechanics. https://x.com/adithya_s_k/status/1994101512084312352
- NeurIPS 2025 Spotlight paper introduces data-driven framework combining deep learning, nonlinear control, and differential geometry to study how brain areas control each other. https://x.com/adamjeisen/status/1993762332019999147
- Paper reveals LLM-as-a-judge evaluations are systematically biased due to judge sensitivity/specificity issues, proposes calibration fix with confidence intervals and efficient sample allocation. https://x.com/rryssf_/status/1994007038352076932 https://x.com/DimitrisPapail/status/1993762593677197682
- Meta's Matrix framework uses decentralized multi-agent systems to generate synthetic training data 2-15x faster with better diversity than single-model approaches. https://x.com/omarsar0/status/1994051853223592040
- Meta's REFRAG compresses RAG context chunks into embeddings, achieving 30x faster decoding with zero accuracy loss by exploiting sparse attention patterns in retrieved passages. https://x.com/techNmak/status/1993626118679892415
- Current AI neural networks miss biology's true computational principles—fractal dendritic integration, microtubule timing, multi-scale electromechanical coupling—need models reflecting recursive timing and dynamical complexity, not just more parameters. https://x.com/VFD_org/status/1993970109149258212
- NVIDIA's ToolOrchestra framework uses 8B orchestrator model to route tasks between different model sizes and external tools, achieving 37.1% on HLE benchmark beating GPT-5 while being 2.5x more efficient. https://x.com/dair_ai/status/1994059148246598076
- Paper shows 12B models trained on gpt-oss reasoning traces achieve 4x token efficiency vs DeepSeek-R1 with same accuracy, cutting inference costs 75%. https://x.com/omarsar0/status/1993695515595444366
- Stanford's 'verbalized sampling' prompting technique (~20 words) recovers LLM creativity lost to RLHF alignment by prompting for distributions instead of instances, boosting diversity 1.6-2x without retraining. https://x.com/_avichawla/status/1993937830968742393
- Hand-drawn scientific illustration showing glider velocity dynamics on attracting manifold with separatrix dividing efficient vs steep descent trajectories. https://x.com/RossDynamicsLab/status/1993791549344284947
- Implemented CTM paper concepts (internal ticks, activation history, synchronization-based decisions) in mouse-cheese learning experiment with mixed results on learning metrics. https://x.com/hive_echo/status/1993928302529073379
- Bearish analysis of RL limitations: low information per forward pass vs pretraining, credit assignment problems in RLVR, hardware gains plateauing below optimal bit width, and concerns RL only elicits latent skills while post-training feels like return to hand-engineered GOFAI approaches. https://x.com/fleetwood___/status/1993714334401216629
- Cancer cells steal mitochondria from neurons to metastasize, highlighting their bizarre biological behavior. https://x.com/iScienceLuvr/status/1993872091687596092
- LatentMAS framework enables multi-agent systems to communicate via compressed latent vectors instead of text, drastically reducing token costs while maintaining task performance. https://x.com/dair_ai/status/1993697268848115915
- Study reveals human brains undergo 5 distinct architectural epochs throughout lifetime with markedly different characteristics in each phase. https://x.com/BrianRoemmele/status/1993342900827029831
- DeepMind study maps scaling laws for pixel-level autoregressive vision models, finding compute (not data) is bottleneck and predicting feasibility within 5 years. https://x.com/jiqizhixin/status/1992849069748957257
- Weekly roundup of top AI papers covering SAM 3, OLMo 3, fast RL for LLMs, million-step tasks, GPT-5 for science, LLM scaling limits, and diffusion-autoregression hybrid approaches. https://x.com/dair_ai/status/1992609152179568652
- Ilya Sutskever endorses Anthropic research showing reward hacking in production RL can lead to serious emergent misalignment consequences. https://x.com/ilyasut/status/1992328386258317591
- Paper shows 7M parameter model beats 671B parameter models on ARC-AGI reasoning benchmark (45% vs 15.8%) using iterative refinement approach that treats problem-solving as repeated cycles of reasoning and answer improvement. https://x.com/burkov/status/1992679461485994144
- ⭐️ Google research reveals Transformers build geometric maps of data in high-dimensional space rather than storing associative connections, explaining why they reason better from memorized info than in-context instructions. https://x.com/IntuitMachine/status/1991074512180236635
- Study with 4,200 trials found 75% of frontier LLMs display strategic self-awareness, ranking themselves above other AIs and humans in rationality, instantly converging to Nash equilibrium when playing against AI opponents. https://x.com/connordavis_ai/status/1991131610977296669
- 66M ViT trained on ARC benchmark as image-to-image translation achieves 54.5%/8.3% on ARC-AGI-1/2, 60.4%/11.1% with ensembling. https://x.com/iScienceLuvr/status/1991111500090806441
- Kosmos AI scientist can complete 6 months of human research in one day, running long coherent workflows, reading thousands of pages, writing code, and producing cited scientific reports. https://x.com/slow_developer/status/1990167995122106513
- On-policy RL with Binary Retrieval-Augmented Reward reduces LLM hallucination by 40% while preserving utility on models like Qwen3-8B. https://x.com/tomchen0/status/1989066130246496716
- Paper proposes Dr. MAMR framework using Shapley-style causal influence metrics and restart actions to fix lazy-agent problem in multi-agent LLM systems where one agent does most work while others contribute little. https://x.com/omarsar0/status/1986831275144138756
- Moral RolePlay benchmark reveals safety-aligned LLMs struggle to authentically portray villains and morally ambiguous characters, with fidelity dropping as character morality decreases. https://x.com/tuzhaopeng/status/1987707891454025932
- New research framework enables AI agents to build compounding memory through deep investigation and structured knowledge encoding rather than shallow retrieval, improving generalization across tasks. https://x.com/omarsar0/status/1993837692375978161
- Gated Attention paper for LLMs covering non-linearity, sparsity, and attention-sink-free mechanisms wins NeurIPS 2025 Best Paper Award. https://x.com/pmddomingos/status/1993888716520395240
- NeurIPS2025 paper introduces hierarchical graph learning method for structured, interpretable motion learning from data without prior assumptions. https://x.com/FelixHeide/status/1993853622137049217
- Goodfire paper proposes method to identify and disentangle LLM memorization by decomposing loss landscape curvature, reducing verbatim recitation while preserving reasoning. https://x.com/askalphaxiv/status/1989156811799425452
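The LLM-as-a-judge item above turns on judge sensitivity and specificity. As a rough illustration, the sketch below applies the standard misclassification correction (often called the Rogan-Gladen estimator) to a judge's raw pass rate. This is an assumption about the general idea, not a reproduction of the paper's method; the function name is hypothetical, and the confidence intervals and sample allocation mentioned in the summary are left out.

```python
def corrected_pass_rate(observed_rate, sensitivity, specificity):
    """Estimate the true pass rate from a biased judge's observed pass rate.

    sensitivity: P(judge says pass | answer is truly correct)
    specificity: P(judge says fail | answer is truly wrong)
    Both would be measured on a small human-labeled calibration set.
    """
    informativeness = sensitivity + specificity - 1.0
    if informativeness <= 0.0:
        raise ValueError("judge must be better than chance to be corrected")
    # observed = sens * true + (1 - spec) * (1 - true), solved for `true`.
    estimate = (observed_rate + specificity - 1.0) / informativeness
    # Sampling noise can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers: the judge passes 62% of answers, catches 90% of
# correct answers, and rejects 80% of wrong ones.
print(round(corrected_pass_rate(0.62, 0.90, 0.80), 3))
```

With these illustrative numbers the raw 62% overstates the corrected estimate of 60%, because the judge wrongly passes a fifth of the incorrect answers.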
Robotics
- Criticizes robotics companies for making scary-looking home robots, except Sunday Robotics. https://x.com/JThomasBurgess/status/1991674933886292445
- ⭐️ XPENG intentionally cropped robot video below knees to fuel human-in-suit conspiracy theories as marketing strategy. https://x.com/Sentdex/status/1986481046435627031
- “We’re interested in solving general robotics atm, not theatrics”. https://x.com/cixliv/status/1987334027716784353
Videos and Podcasts
- Does simulated fire produce real heat? https://x.com/MLStreetTalk/status/1994914288725852622
- Podcast with film producer Wolfgang Hammer on storytelling frameworks for CEOs—covering three layers of story (external, emotional, philosophical) and how founders can better communicate their company narrative. https://x.com/patrick_oshag/status/1988245338776564124
- Demis Hassabis quit chess at 12 despite being world's 2nd best for his age, realizing brainpower could accomplish more than mastering games. https://x.com/bearlyai/status/1994510712115433751
- Ilya Sutskever discusses rejecting Zuckerberg's acquisition offer for SSI, noting co-founder Daniel Gross left for Meta Superintelligence Labs for liquidity. https://x.com/ZeffMax/status/1993431265975189714
- Deedy Das explains why OpenAI/Anthropic won't compete with Glean in enterprise search since incremental 6-7 figure deals don't move needle on their $5B+ revenue scale. https://x.com/latentspacepod/status/1991330656568422882
Visuals
- Artist shares electromagnetic sculpting study exploring magnetic fields and matter-electricity interactions. https://x.com/maximzhestkov/status/1994502074688835699
- Interactive quantum neural network visualization built with Three.js and GLSL shaders featuring glassmorphic UI with real-time parameter controls. https://x.com/techartist_/status/1993791914014064947
- Gemini 3 can generate impressive SVG art of a nude figure. https://x.com/meowbooksj/status/1990893647269618046
- Developer experiments with OpenCV in Python to recreate TouchDesigner-style visual animations. https://x.com/DilumSanjaya/status/1993727728932262356
- Visual effects in Python. https://x.com/DilumSanjaya/status/1994392777178956054
- Viral video sparks debate about AI-era bias where any AI-related content gets dismissed as 'slop' regardless of artistic merit. https://x.com/chatgpt21/status/1993792256776745345
- Mathematical equation showing exponential growth with Hubble parameter, likely referencing cosmological expansion or AI scaling. https://x.com/HAL09999/status/1993998572619223426
World Models
- ⭐️ Researchers Built a Tiny Economy. AIs Broke It Immediately https://www.youtube.com/watch?v=KUekLTqV1ME
- Odyssey-2 teased as interactive AI video system with 'sentient' pixels that react to text input, promising future applications in education, music, health, and language learning. https://x.com/odysseyml/status/1993766045237248391
- Schmidhuber responds to LeCun citing his 1990 paper on recurrent neural world models that predict sensory inputs including pixels and reward signals. https://x.com/SchmidhuberAI/status/1991603632718913911
vLLM, Diffusion Models, and Audio Models
- AI image generators (Grok, Gemini, ChatGPT) all fail to draw hands with six fingers despite various prompting attempts. https://x.com/Merocle/status/1994041984991006748
- HuggingFace blog post explains LLM inference engines like vLLM, covering continuous batching, KV-caching, attention masking, and chunked prefill techniques. https://x.com/NielsRogge/status/1993729773705871692
- Excited reaction to UMAP visualization of Stable Audio latents showing what a koan soundtrack looks like to music AI. https://x.com/_lyraaaa_/status/1993858890421727738
- Presents Google's 2015 AI image works as superior to current 'Nano Banana' capabilities. https://x.com/worm_emoji/status/1991631229892870536
- SAM 3 release enables object detection/segmentation/tracking with text prompts across images/videos, plus SAM 3D for 3D reconstruction from 2D images, now available in Roboflow. https://x.com/roboflow/status/1991204438007246999
- Meta releases open-source Segment Anything 3 (SAM 3) for image/video segmentation, coming to Meta AI, Edits, and Facebook Marketplace. https://x.com/alexandr_wang/status/1991198465628459494
- SAM3 video tracking enables instant text-prompt object tracking, replacing days of custom detector training with seconds of work. https://x.com/skalskip92/status/1991232397686219032
- Meta announces SAM 3D with two models for 3D object/scene reconstruction and human pose estimation from 2D images, achieving state-of-the-art performance. https://x.com/AIatMeta/status/1991184188402237877
- Introducing Claude Monet. https://x.com/keshavchan/status/1992946642476237234
- Nano Banana Pro tool transforms Karpathy's post comparing animal vs LLM intelligence optimization pressures into whiteboard-style visual diagrams with arrows and boxes. https://x.com/ridegoodwaves/status/1992617623914242100
- Gemini now generates fully interactive images with region-based explanations for learning purposes. https://x.com/DataChaz/status/1993955938722828539
- Nano Banana Pro achieves perfect text generation enabling AI-made slides, tutorial shows how to combine with Kling transitions via agent for new presentation format. https://x.com/fabianstelzer/status/1994080756382028271
- Image upscaling expert praises Nano Banana Pro as a major breakthrough, demoing 150x150 to 4K upscaling capability. https://x.com/docmilanfar/status/1993476275961504255
- Nano Banana Pro tool now lets users add turkey hats to anyone in images. https://x.com/tenobrus/status/1993816132927803465
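The inference-engine item in this section mentions KV-caching. A toy sketch of the idea, assuming single-head attention and made-up diagonal projection weights (W_Q, W_K, W_V and all function names are hypothetical, and this is not vLLM's API): during decoding, the keys and values of earlier tokens are stored and reused, and only the newest token is projected at each step.

```python
import math

# Hypothetical diagonal projection weights for a 2-dimensional toy model.
W_Q, W_K, W_V = [1.0, 0.5], [0.5, -1.0], [2.0, 0.25]


def project(vec, diag):
    """Apply a diagonal projection (elementwise scaling)."""
    return [v * w for v, w in zip(vec, diag)]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all cached positions."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def decode_without_cache(tokens):
    """Re-project every prefix token at every step (quadratic projections)."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(tok, W_K) for tok in prefix]
        values = [project(tok, W_V) for tok in prefix]
        outputs.append(attend(project(tokens[t], W_Q), keys, values))
    return outputs


def decode_with_cache(tokens):
    """Project only the newest token; earlier keys/values come from the cache."""
    keys, values, outputs = [], [], []
    for tok in tokens:
        keys.append(project(tok, W_K))
        values.append(project(tok, W_V))
        outputs.append(attend(project(tok, W_Q), keys, values))
    return outputs


tokens = [[0.1, 0.9], [0.7, -0.2], [-0.4, 0.3], [0.5, 0.5]]
slow = decode_without_cache(tokens)
fast = decode_with_cache(tokens)
print(all(abs(a - b) < 1e-12 for s, f in zip(slow, fast) for a, b in zip(s, f)))
```

Both paths produce the same outputs; the cached version simply skips the repeated key and value projections, trading memory for compute.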