I have run 50 real client tasks through four AI coding agents this quarter. Here is the honest comparison.

Pass rates

[Chart: Task completion rate, 50 real client tasks. Source: my own tracking, Q1 2026]

Claude Code (the CLI tool) won on first-try pass rate (62 percent). Cursor was a close second (56 percent). Manus and Devin were lower.

Those figures are for tasks under 30 minutes of human work. For tasks that take a human three or four hours, the numbers shift; Manus catches up because it can actually research.

Strengths

When to use what

Spec	Manus	Cursor	Devin	Claude Code
Best for	Multi-step research	IDE coding flow	Async background tasks	Terminal-led editing
Failure mode	Wanders	Quick refusal	Slow loops	Over-edits
Speed (avg task)	12 min	4 min	38 min	6 min
Cost (per real task)	~$0.80	~$0.20	~$2.50	~$0.30
My pick for	Spec exploration	Daily coding	Overnight chores	Surgical edits

Manus is best when I do not know what the answer is. "Research the cheapest way to host a Postgres for an EU SaaS, write me a one-pager and a deploy script" is a Manus task^[1].

Cursor is best when I know what I want and want fast iteration in the IDE. Composer mode with Claude Sonnet 4.6 handles most of my daily code work^[2].

Devin is best for async background work. "Take this list of 30 small bug reports and try to fix each one overnight." It is slower per task, but you give it 8 hours and come back to a list of PRs^[3].

Claude Code is best for surgical editing in a single repo. The terminal flow is faster than mouse-driven IDE work for many tasks.
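The routing described in this section can be sketched as a small decision function. This is only an illustration: the `Task` fields, the `pick_agent` name, and the order of the checks are assumptions made for the example, not features of any of these tools.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a task, used only for this sketch."""
    answer_known: bool        # do I already know what the solution looks like?
    can_wait_overnight: bool  # is it fine if the result lands tomorrow morning?
    ide_led: bool             # is it easier to drive from the IDE than the terminal?


def pick_agent(task: Task) -> str:
    """Return the agent this post would reach for first.

    The order of the checks is an assumption: open-ended research first,
    then async work, then IDE versus terminal.
    """
    if not task.answer_known:
        return "Manus"        # multi-step research
    if task.can_wait_overnight:
        return "Devin"        # async background tasks
    if task.ide_led:
        return "Cursor"       # IDE coding flow
    return "Claude Code"      # terminal-led, surgical edits


if __name__ == "__main__":
    bug_fix = Task(answer_known=True, can_wait_overnight=False, ide_led=False)
    print(pick_agent(bug_fix))
```

Real tasks rarely reduce to three booleans, so treat this as a way to make the table explicit rather than as a rule to automate.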

Cost

A typical "fix a bug, add a test" task:

Cursor: $0.20 in tokens
Claude Code: $0.30
Manus: $0.80 (does more, costs more)
Devin: $2.50 (slow loops)

For a freelancer billing £80/hour these are rounding errors. The constraint is correctness and time, not token cost.
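To put the "rounding error" claim in numbers, here is a rough sketch that converts each token cost into the equivalent seconds of billed time. The token costs and the £80/hour rate come from this post; the exchange rate of 1.25 USD per GBP is an assumed placeholder, and `billable_seconds` is a name invented for the example.

```python
# Per-task token costs in USD, taken from the list above.
TOKEN_COST_USD = {
    "Cursor": 0.20,
    "Claude Code": 0.30,
    "Manus": 0.80,
    "Devin": 2.50,
}


def billable_seconds(
    cost_usd: float,
    rate_gbp_per_hour: float = 80.0,
    usd_per_gbp: float = 1.25,  # ASSUMPTION: placeholder exchange rate, not a quoted figure
) -> float:
    """Seconds of billed time worth the same as the given token cost."""
    rate_usd_per_second = rate_gbp_per_hour * usd_per_gbp / 3600
    return cost_usd / rate_usd_per_second


if __name__ == "__main__":
    for agent, cost in TOKEN_COST_USD.items():
        seconds = billable_seconds(cost)
        print(f"{agent}: ${cost:.2f} is about {seconds:.0f} s of billed time")
```

Under that assumed exchange rate, even the most expensive run ($2.50) is worth about 90 seconds of billed time, so a few minutes of extra review easily outweighs the token bill.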

Where they all fail

Cross-cutting refactors that touch 20+ files. All four lose context and make inconsistent choices across files. Humans still do these better.

Anything that requires understanding business logic that is not in the codebase. The agents readily fall back to "well, I will guess," which becomes a critical problem if you do not review carefully.

My setup

For active client work: Cursor Composer + Claude Code in a tmux split. Cursor for IDE-led tasks, Claude Code for terminal-led tasks.

For overnight: Devin handles a queue of "boring but real" tickets, and I review the PRs in the morning.

For exploration: Manus when I do not know the answer.

There is no single tool. The 2026 AI coding setup is a toolkit, not a single product.

About the data

A note on what the numbers in this post represent so you can read them with the right confidence:

"My own bench" rows are personal measurements on my own hardware. They are honest about my setup and reproducible there, but they should not be treated as universal benchmark scores.
Benchmark numbers attributed to public sources (Geekbench Browser, DXOMARK, NotebookCheck, FIA timing) are illustrative; the trend is what matters, not the third decimal place. Cross-check against the source for anything you would act on financially.
Client outcomes and ROI percentages in business-focused posts are anonymised composites drawn from my own consulting work. Real numbers, real direction, sanitised so individual clients are not identifiable.
Foldable crease-depth and similar engineering measurements are estimates pulled from teardown reports and reviewer claims; manufacturers do not publish these directly.
Forecasts and "what I bet" lines are exactly that: opinions, not predictions with a track record yet.

If you spot a number that contradicts a source you trust, tell me. I would rather correct it than be the chart that was off by 6 percent and pretended otherwise.

References

[1] Manus capabilities. https://manus.im
[2] Cursor pricing and models. https://www.cursor.com/pricing
[3] Devin AI overview. https://devin.ai


Sarma

Independent software engineer, AI systems, automation platforms, and modern infrastructure.
