Show HN: Time travel debugging AI for more reliable vibe coding
Hi HN, I'm the CEO at https://replay.io. We've been building a time travel debugger for web apps for several years now (previous HN post: https://news.ycombinator.com/item?id=28539247) and are combining our tech with AI to automate the debugging process.
AIs are really good at writing code but really bad at debugging -- it's amazing to use Claude to prompt an app into existence, and pretty frustrating when that app doesn't work right and Claude is all thumbs fixing the problem.
The basic reason for this is a lack of context. People can use devtools to understand what's going on in the app, but AIs struggle here. With a recording of the app its behavior becomes a giant database for querying using RAG. We've been giving Claude tools to explore and understand what happens in a Replay recording, from basic stuff like seeing console messages to more advanced analysis of React, control dependencies, and dataflow. For now this is behind a chat API (https://blog.replay.io/the-nut-api).
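To give a rough sense of the shape of this, here's a minimal sketch of what asking a question about a recording could look like. The endpoint and payload fields below are illustrative placeholders rather than the exact API (the blog post has the real interface):

    // Illustrative sketch only: the endpoint and field names are placeholders,
    // not the exact Nut API. The idea is to point the AI at a recording and
    // ask a question about the app's behavior.
    interface DebugQuery {
      recordingId: string; // ID of a Replay recording of the buggy session
      prompt: string;      // natural-language description of the bug
    }

    async function askAboutRecording(query: DebugQuery): Promise<string> {
      const response = await fetch("https://example.invalid/v1/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(query),
      });
      if (!response.ok) {
        throw new Error(`Chat request failed: ${response.status}`);
      }
      const { answer } = await response.json();
      return answer; // e.g. an explanation of the failure plus a suggested fix
    }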
We recently launched Nut (https://nut.new) as an open source project which uses this tech for building apps through prompting (vibe coding), similar to e.g. https://bolt.new and https://v0.dev. We want Nut to fix bugs effectively (cracking nuts, so to speak) and are working to make it a reliable tool for building complete production grade apps.
It's been pretty neat to see Nut fixing bugs that totally stump the AI otherwise. Each of the problems below has a short video but you can also load the associated project and try it yourself.
- Exception thrown from a catch block unmounts the entire app: https://nut.new/problem/57a0b3d7-42ed-4db0-bc7d-9dfec8e3b3a5
- A settings button doesn't work because its modal component isn't always created: https://nut.new/problem/bae8c208-31a1-4ec1-960f-3afa18514674
- An icon is really tiny due to sizing constraints imposed by other elements: https://nut.new/problem/9bb4e5f6-ea21-4b4c-b969-9e7ff4f00f5b
- Loading doesn't finish due to a problem initializing responsive UI state: https://nut.new/problem/486bc534-0c0e-4b2a-bb64-bfe985e623f4
- Infinite rendering loop caused by a missing useCallback: https://nut.new/problem/496f6944-419d-4f38-91b4-20d2aa698a5e
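For that last one, a minimal repro of the pattern (not the actual project code) looks roughly like this: a callback recreated on every render keeps re-triggering a child effect, which sets state and forces another render.

    // Minimal illustrative repro of a missing-useCallback render loop
    // (not the actual project code from the example above).
    import { useCallback, useEffect, useState } from "react";

    function Child({ onReady }: { onReady: () => void }) {
      useEffect(() => {
        onReady(); // re-runs whenever onReady changes identity
      }, [onReady]);
      return null;
    }

    function Parent() {
      const [readyCount, setReadyCount] = useState(0);

      // Buggy version: a brand new function on every render, so Child's
      // effect fires again, sets state, and triggers another render forever.
      // const onReady = () => setReadyCount((n) => n + 1);

      // Fix: useCallback gives the callback a stable identity.
      const onReady = useCallback(() => setReadyCount((n) => n + 1), []);

      return (
        <div>
          Ready {readyCount} times
          <Child onReady={onReady} />
        </div>
      );
    }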
Nut is completely free. You get some free uses or can add an API key, and we're also offering unlimited free access for folks who can give us feedback we'll use to improve Nut. Email me at hi@replay.io if you're interested.
For now Nut is best suited for building frontends but we'll be rolling out more full stack features in the next few weeks. I'd love to know what you think!
> it's amazing to use Claude to prompt an app into existence, and pretty frustrating when that app doesn't work right and Claude is all thumbs fixing the problem.
Such an interesting sentence. An app that doesn't work doesn't seem like it has yet come into existence.
This has been my (limited) experience so far. I haven't been able to get an AI/LLM to help me build an app; it even fails at React apps. I have been able to get an LLM to help with coding questions similar to Stack Overflow questions, though not always.
You can replace "Claude" with any WYSIWYG no-code solution from back in the day like Dreamweaver or whatever and it's basically the same.
I know it's much more powerful than that tool was, but the experience described is similar between both
lol agreed, also I’m of the opinion that if you want a working app it’s much more frustrating to debug a heap of code generated by an AI than it is to build it yourself (maybe with the help of AI, if you really need it). At least with the latter, if you really built it yourself, you understand all the components (to a certain abstraction point at least).
I've made a lot of apps with Claude. E.g. I made a pretty complex SwiftUI app recently even though I don't know Swift. Usually you have to help Claude debug them and sometimes point it in the right direction.
Since the vast majority of Swift and SwiftUI documentation online is outdated, I've found that concatenating the best of "what's new in Swift 5.x / 6.x" blogs then asking it to organize that into a prompt for itself, then adding that to the system prompt, helps the LLM produce idiomatic and current code.
While these changes may require "new ways of thinking" in humans, the LLM seems to have these conceptual approaches embedded already thanks to other languages that did these things earlier. The what's new just shows it the syntax for these concepts in Swift.
The first pass often executes but the "thumbs" come in when you fix corner cases or iterate on it.
> AIs are really good at writing code but really bad at debugging -- it's amazing to use Claude to prompt an app into existence, and pretty frustrating when that app doesn't work right and Claude is all thumbs fixing the problem.
LLMs are not "really good at writing code". They generate statistically relevant text based on their training data set.
Expecting people who do not understand code to use LLMs for making solutions is like thinking “non-pilots” can successfully fly a 747.
This is the worst kind of pedantry. There is now code in a text file that wasn't there before. It doesn't really matter to me if the code came from human fingers, an auto generator tool, an LLM or a ouija board. It looks like written code and it compiles like written code - it's written code. You can rightly criticise the use of LLMs in many ways but it is more useful to focus on the actual reasons they cause harm than to set red lines over language.
> It looks like written code and it compiles like written code - it's written code.
And more often than not it's just plain wrong code. LLMs aren't actually writing code, they are guessing at what might satisfy an input. Writing code is more than guessing, it's about assembling instructions with intent, for a purpose. LLMs lack that intent and purpose.
> You can rightly criticise the use of LLMs in many ways but it is more useful to focus on the actual reasons they cause harm than to set red lines over language.
The "actual reasons" they cause harm are inherent to how LLMs work. The real problem is people not understanding how they work and placing too much trust and in them and believing the hype. They are not some miracle, they aren't even all that helpful, and in my experience they are more of a waste of my time.
Exactly my annoying experience as well. LLMs are good at debugging, not writing code.
It’s just a statistical rubber duck, “what other obvious (common) things haven’t I thought about yet Mr duck?”
My counter:
My 5 year old recreated Duck Hunt with almost zero assistance in an hour.
I helped with the copy paste.
If anyone is wondering why it looks like Bolt, it’s because it’s using Bolt.DIY, an open-source fork of Bolt (https://github.com/stackblitz-labs/bolt.diy). The catch is that it's still using WebContainers from StackBlitz, so it's not really possible to run it commercially. You need to get rid of WebContainers and find something different.
Thanks, yeah we're really thankful to StackBlitz for open sourcing the early version of Bolt.new and to the Bolt.diy community for continuing to develop it.
We don't have a commercial offering yet and are planning to migrate off WebContainers for the upcoming full stack features -- WebContainers show their limits pretty quickly in a full stack context (e.g. CORS issues) and we need observability into the server side of the app for full stack debugging.
Regardless, our interests here are only lightly commercial. We're not really developing Nut to drive revenue but to help us develop the debugging API and push forward the SOTA for AI development as effectively as we can. That API is what we want to sell.
It does seem like the same type of UI as v0.dev ("What do you want to do?") is a strange UI/UX for fixing a bug, although I see there is a way to import code. The ideal UI/UX seems to be a CLI tool or VS code extension where I give it access to the directory and describe the bug and it just figures out what to do
Yeah, we allow importing projects so you can fix bugs you've encountered elsewhere, but we want to streamline the app building process, and the UX used in v0/Bolt etc. is already really well done. We want this app building to be more reliable though; it's easy to hit bugs that prevent progression.
We're also interested in using our API with MCP so that e.g. Cursor could be used to fix bugs you're seeing locally, and plan to explore that angle before long.
What is vibe coding?
The way I understood it: Cobbling a program together by simply prompting AI assistants over and over, blindly using the generated code, and repeating until it barely approaches satisfying the requirements. Not worrying about things like correctness, proper design, code cleanliness, understandability, performance, code size, security, data protection, maintainability, or even bugs unless they catastrophically stop the user from running the program.
I really hope this doesn't actually catch on in "real" engineering, beside as a meme joke.
Yes, that's all true. Even so, vibe coding empowers anyone who can write clear instructions to build software, but the limits of the technology get hit pretty quickly by non-developers and they have little recourse. This blog post https://addyo.substack.com/p/the-70-problem-hard-truths-abou... is a great overview.
The tech will get better and better (I couldn't imagine we'd be doing this a year ago) but to be truly useful it has to reliably produce reasonably well engineered code, and effective debugging is a key piece of that.
I’m sure it will. My pessimistic take is that the worst case is that thousands of bozos create crappy little apps that only cause minimal harm. And people just endure it instead of pushing for better guard rails.
Best case is some high profile shit show caused by software made mostly or entirely by AI that hopefully is bad enough that legislators wake up and realize that in the modern world software is essential enough that you can’t let just anyone sell it or services based on it. Just like you can’t allow just anybody to design/build bridges or hardware or whatever.
But I’m sure thats wishful thinking. Hacks and buggy software causing consumers harm is just accepted and software industry folk all hope to be billionaires so nobody cares.
It's the Infinite Improbability Drive from Hitchhiker's Guide.
Would be funny though: who produces the result faster, a Fiverr gig or an AI in a loop for a day?
It spits out URLs to sites and sends 'em to Fiverr QA people; take a shot every time the app doesn't work.
Wonder about the cost effectiveness: have a randomizer start producing/hosting code and auto-submit it to Product Hunt.
Judging by how many people blindly posted Stackoverflow answers, there will be a significant amount of code ‘written’ this way.
Here's where it all started: https://x.com/karpathy/status/1886192184808149383
An even lazier form of LLM assisted coding where you blindly spam the tab key without even taking the bare minimum amount of time reviewing the garbage that it's busy outputting.
Karpathy "coined" the term and I absolutely hate it. It's up there with "asshat" and "awesome sauce" for profoundly stupid terms.
I find it more akin to prompt engineering; something else that is nothing more than 'typing some shit until it does something useful to someone' and then acting like it's actually a skill.
But we are very good in our profession to make up garbage terms to do anything but describe garbage.
It's truthiness for software. It doesn't matter if the code is correct, as long as it "feels" correct.
It's when an AI writes the code for you.
You say it's called Nut because it's about cracking nuts. Wouldn't a truer reason be it's a play on bolt? (Nuts and bolts)
I'd be interested to read a blog post or technical write up. I think conceptually it's an interesting idea
Well, it can be both. This post https://blog.replay.io/the-nut-api is the best technical overview of the API and discusses several examples.
Just letting you know the about page has black text on a black background
Can’t expect too much reliability from the result of vibe coding.
These are the results you can expect from someone who says they are 'vibe coding'.
Thanks for the report! The about page is fixed now when looking at it in dark mode.
Black text on black background is also used on the problems page, and the background only extends downwards one page length.
Thanks! These are both fixed now. Clearly we need to do some more dark mode testing...
I now see only a giant nut when viewing the main landing page.
Edit: Now everything is made of buttons.
Hmm, strange, it's loading alright for me but we've had a couple reports of rendering problems. If you have a chance to file an issue here https://github.com/replayio/nut.new/issues I'd appreciate it, thanks!
Working fine now, but I'm on a different device. Apologies, I didn't get a chance to troubleshoot.
There's 0 chance this response was not written by an LLM. I don’t know what it is, but it screams genAI.
Nut looks like a fork of Bolt, how does Nut differ?
Yes, Nut is a fork of https://bolt.diy and like bolt.diy you can add your own API key and use it as much as you want (Nut is hosted though so you don't have to set anything else up).
The improvements we're making are under the hood. When you ask Nut to fix a bug it should do a much better job -- we record the app's behavior and analyze it so the AI has context for the changes it needs to make.
We've also added some UI to approve or reject the changes the AI makes. For now we're using this to gather feedback so we can improve Nut, but down the line we'll also refund the user any credits when they reject changes -- you shouldn't have to pay when the AI screws up, a big issue with these tools (and vibe coding in general).
Interesting, I'll check it out. Any plans to open source Nut like Bolt is?
Yeah, the source is here: https://github.com/replayio/nut.new
We'll continue to keep it open source as we develop it.
Looks fun and it does create something - https://nut.new/chat/prince-of-persia-platform-game
I couldn't start the game, but it seems runnable given some debugging. Great work!
Thanks! Unfortunately the chat links aren't shareable yet. We're planning on adding this within the next couple weeks along with the other full stack features (database integration and easy deployment).
Can you share the prompt you used to generate this game? (given chats aren't sharable)
Hm, the link isn't working for me
LLMs at best are a good auto-complete. I have been trying to use LLMs to code since GPT-3 came out. While Claude 3.7 has made some progress, it still can't generate an app into existence, although it's great for boilerplate or hinting at which direction to look in the documentation.
In the past week I created two apps within 3-5 iterations. No manual edits (other than .env).
Another one I thought was pretty cool was handing it some API docs for something, then had it build a UI and admin interface from scratch.
The dev agent tools these days aren't half bad, and they're getting better.
That hasn't been true since 3.5; it can and does generate full working code and sometimes with only a few attempts. Sometimes it cannot figure it out and then you need to fix it yourself as it will just loop forever. These things can be absolutely trivial (and thus frustrating) at times, but it's also fairly magical when it just does work.
Have you tried this tool?
I just generated two Tetris games, one with ASCII art and one with WebGL, and I found it quite impressive. Maybe simplistic apps, but I was still impressed by its ability to create functional games and fix bugs with minor prodding.
A great auto-complete, I'd say. But that's as much as I've gotten out of it too. Still a huge productivity booster
Lots of cynical comments but the examples look great. I'll be taking this for a spin when I get home :)