Why SWE-bench Verified no longer measures frontier coding capabilities
SWE-bench Verified was the gold standard for AI coding benchmarks. It isn't anymore — and that should change how you read every model leaderboard out there.
The US is weighing massive sanctions over alleged state-sponsored AI theft from American companies — right as a Trump-Xi summit looms. China isn't having it.
Anthropic just turned Claude into an AI that actually does things — not just talks about them. Spotify, Uber, and Resy are already on board.
OpenAI just dropped GPT-5.5 with agentic coding and research upgrades. Here's what actually changed and what still needs proving.
A man could go to prison for generating fake AI images of an escaped wolf whose story gripped the entire nation. This case is setting a serious legal precedent for AI misuse.
NotebookLM quietly turned Gemini into something genuinely useful. This isn't just another AI chatbot story — it's about a smarter approach to working with information.
A viral red carpet moment blew the cover on a group of AI-generated hunks with massive Instagram followings. Turns out their fans already knew — and didn't care.
Anthropic ran an experiment where AI agents acted as buyers and sellers in a classified marketplace — real goods, real money, no humans in the loop.