How well can Qwen3.5 models debug code? I built BugFind-15 — 15 buggy snippets across Python, JS, Rust, and Go. Docker sandbox compiles and validates every fix. Two trap scenarios where the code is correct and the model must resist "fixing" it. Tested every Qwen3.5 size from 0.8B to...
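One way to picture how a trap scenario could be scored alongside ordinary bug-fix cases is the minimal sketch below. It is not the BugFind-15 implementation: the `Case` type, the `normalize` and `grade` helpers, and the `sandbox_passed` flag (a stand-in for whatever the Docker sandbox reports) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical benchmark case; field names are assumptions."""
    name: str
    original: str   # the snippet shown to the model
    is_trap: bool   # True when the original code is already correct


def normalize(src: str) -> str:
    # Drop blank lines and trailing whitespace so cosmetic edits
    # are not counted as a code change.
    return "\n".join(
        line.rstrip() for line in src.strip().splitlines() if line.strip()
    )


def grade(case: Case, submitted: str, sandbox_passed: bool) -> bool:
    # sandbox_passed stands in for the sandbox's compile/validate result.
    changed = normalize(submitted) != normalize(case.original)
    if case.is_trap:
        # Trap: the model must leave correct code alone.
        return sandbox_passed and not changed
    # Ordinary case: the model must change the code and the fix must validate.
    return sandbox_passed and changed
```

Under these assumptions, a trap case passes only when the model returns the snippet unchanged, while a regular case passes only when the code was modified and the sandbox accepts the result.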

35,006 views • 2 months ago •via X (Twitter)
