Image-to-Code: Designing Frontends Before You Write Them

Contents

Why prompt-to-code keeps failing
The workflow, in six steps
Step 1 — Generate the site section by section
Step 2 — Use style words that actually mean something
Step 3 — Generate multiple directions, pick the best parts
Step 4 — Extract the assets
Step 5 — Turn images into code
Step 6 — Refine with screenshots
What to pay attention to
Why this unlocks creativity, not just speed
Image first. Code second.

Most AI-generated websites still look the same: centered hero, gradient blob, identical card grids, average spacing. The models aren't the problem. The workflow is. Asking a coding agent to invent visual taste while it also writes the implementation, manages assets, handles responsiveness, and gets accessibility right is asking it to optimize across too many axes at once.

Leon Lin (@LexnLin) recently wrote a short manifesto on this, "Image-to-Code," and his framing clarified something I'd been doing piecemeal. The core idea: separate the visual design step from the coding step. Generate high-quality website images first. Then, once the look is decided, turn those images into code section by section. The visuals in this post come from Leon's original thread; the synthesis is my own.

Why prompt-to-code keeps failing

Frontend quality is visual before it is technical. A good website depends on spacing, hierarchy, composition, typography, color, and a hundred small decisions about negative space. A coding agent prompted with "make it premium" has to invent all of that while it writes the code, and the easiest path to a passing answer is the average of every premium site it has ever seen. That's why AI landing pages converge: the agent is solving a different problem ("make plausible code") from the one you're actually posing ("make it beautiful").

Image generation removes most of that burden. The model gets a much narrower brief ("this composition, this typographic weight, this restraint"), and its only output is pixels. Steering taste in an image model is much faster than steering it through code review.

Comparison of prompt-to-code vs image-to-code outputs

The workflow, in six steps

Step 1 — Generate the site section by section

Don't ask for one full-page screenshot. You lose detail, the model underspecifies the layout, and you end up with something that's hard to recreate faithfully. Generate one image per section: hero, about, product grid, services, testimonials, CTA, footer. Make each image high-resolution and focused on one composition problem at a time.

Give the model a real brief, not vague adjectives. "Make a cool website" produces nothing useful. "Minimal but creative landing page for an AI note-taking app for students — calm, slightly playful, not generic startup" gives the model something to work against.
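If you script your image requests, the section-by-section split is easy to make mechanical. The sketch below is a hypothetical helper and not part of any image tool's API. It takes one brief and produces one prompt per section, so every image shares the same brief but solves a single composition problem. The section list and prompt wording are my own assumptions.

```python
# Hypothetical helper: one brief in, one image prompt per section out.
SECTIONS = ("hero", "about", "product grid", "services", "testimonials", "CTA", "footer")


def section_prompts(brief: str, sections: tuple[str, ...] = SECTIONS) -> dict[str, str]:
    """Return a prompt for each section so every image solves one composition problem."""
    return {
        name: (
            f"High-resolution website section image showing the {name} section only. "
            f"Brief: {brief}. Keep colors, typography, and image style consistent "
            "with the other sections of the same site."
        )
        for name in sections
    }


brief = (
    "Minimal but creative landing page for an AI note-taking app for students, "
    "calm, slightly playful, not generic startup"
)
for name, prompt in section_prompts(brief).items():
    print(f"[{name}] {prompt}")
```

Send each prompt as its own request. Keeping the brief identical across sections is what lets the images read as one site later.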

Section-by-section image generation example

Step 2 — Use style words that actually mean something

"Modern" is dead. Every model has averaged a thousand definitions of it, so it no longer means anything. Replace it with style directions that constrain the layout: "editorial," "warm," "Swiss-grid," "industrial brutalist," "magazine spread," "Bauhaus." Combine them with composition cues such as "asymmetric hero," "full-bleed photography," or "two-column with vertical rail."

Counterintuitively, "be creative" works in image models in a way it doesn't in code models. In code it produces messy state and weird abstractions. In image generation it nudges the model away from default templates. Pair it with constraints: "be creative, but keep the layout realistic and implementation-friendly."
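One way to enforce this is to lint your own prompts before sending them. This is a minimal sketch under my own assumptions. The vague-word list contains only the examples from this post ("modern," "cool," "premium"), and the prompt template is illustrative rather than a format any model vendor recommends.

```python
import re

# Illustrative list: the vague adjectives called out in this post, not an exhaustive set.
VAGUE_WORDS = {"modern", "cool", "premium"}


def vague_terms(prompt: str) -> list[str]:
    """Return the vague adjectives found in a prompt, in order of appearance."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return [word for word in words if word in VAGUE_WORDS]


def style_prompt(subject: str, styles: list[str], composition: list[str]) -> str:
    """Combine concrete style directions, composition cues, and the realism constraint."""
    prompt = (
        f"{subject}. Style: {', '.join(styles)}. "
        f"Composition: {', '.join(composition)}. "
        "Be creative, but keep the layout realistic and implementation-friendly."
    )
    found = vague_terms(prompt)
    if found:
        raise ValueError(f"Replace vague style words with concrete directions: {found}")
    return prompt


print(style_prompt(
    "Hero section for an AI note-taking app",
    ["editorial", "warm"],
    ["asymmetric hero", "two-column with vertical rail"],
))
```

The check matches whole words, so "modernist" passes and "modern" is rejected. That is deliberate: the first is a real style direction, and the second is the averaged-out default.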

Step 3 — Generate multiple directions, pick the best parts

Run the same prompt three times in separate chats. One run might produce the best hero, another a better feature block, and a third a stronger CTA. Mix the best ideas across runs. The cost is a few slots of image quota. The upside is that you stop settling for the first decent thing the model gives you.
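Mixing runs is easier if you score each section as you review it. Here is a small sketch. It assumes you rate every section of every run yourself on a 1 to 5 scale, and the scores shown are placeholders rather than real results. For each section it picks the run you rated highest, and a tie goes to the earlier run.

```python
# Hypothetical: your own 1-5 ratings for each section of each generation run.
Ratings = dict[str, dict[str, int]]  # run name -> section name -> score


def best_per_section(ratings: Ratings) -> dict[str, str]:
    """For each section, return the run whose version you rated highest (ties keep the earlier run)."""
    best: dict[str, tuple[int, str]] = {}
    for run, sections in ratings.items():
        for section, score in sections.items():
            if section not in best or score > best[section][0]:
                best[section] = (score, run)
    return {section: run for section, (_, run) in best.items()}


ratings = {  # placeholder scores for illustration
    "run-1": {"hero": 5, "features": 3, "cta": 2},
    "run-2": {"hero": 3, "features": 5, "cta": 3},
    "run-3": {"hero": 4, "features": 2, "cta": 5},
}
print(best_per_section(ratings))  # {'hero': 'run-1', 'features': 'run-2', 'cta': 'run-3'}
```

The output works as a shopping list for the composite: the hero from one run, the features from another, the CTA from a third.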

Comparing multiple generation directions

Step 4 — Extract the assets

The generated section images contain real assets: product photos, 3D shapes, illustrations, textures, device mockups, decorative objects. Don't crop them out. Instead, regenerate them at higher resolution by asking the model to "generate and extract" each asset against a plain background. The phrasing matters: "extract" alone gets a low-res crop, while "generate and extract" produces a clean, isolated render.

Strip backgrounds offline with a U²-Net- or ISNet-based tool (the chatgpt-image-gen skill on this site uses imgnobg locally), then convert the results to WebP before shipping.
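If you process many assets, it helps to standardize both the request wording and the output filenames. The sketch below covers only that bookkeeping. Background removal and the WebP encode still happen in your tool of choice, such as imgnobg or an image library like Pillow. `extract_prompt` and `webp_target` are hypothetical helpers, and the `public/assets` output folder is just one common convention.

```python
from pathlib import Path


def extract_prompt(asset: str) -> str:
    """Request wording for one asset: "generate and extract" rather than just "extract"."""
    return (
        f"Generate and extract the {asset} from this image as a standalone, "
        "high-resolution render on a plain background."
    )


def webp_target(src: Path, out_dir: Path) -> Path:
    """Output path for an asset once its background is stripped and it is converted to WebP."""
    return out_dir / src.with_suffix(".webp").name


print(extract_prompt("floating 3D notebook"))
print(webp_target(Path("raw/hero-notebook.png"), Path("public/assets")))
```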

Step 5 — Turn images into code

Now the coding agent enters — Cursor, Codex, Claude Code, whatever you use. Resist the urge to dump the whole site into one prompt. "Copy all of this exactly" produces mid results. Build section by section: hand the agent the hero image, the extracted assets, named file paths, the tech stack, and a tight instruction to recreate that one composition. Review, refine, move to the next section.
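You can template the per-section handoff so every request carries the same ingredients: one reference image, the extracted assets with exact paths, the stack, and a scope limit. This is a hypothetical sketch. The field names, file paths, and stack string are placeholders, and none of the agents named above require this exact format.

```python
from dataclasses import dataclass, field


@dataclass
class SectionBrief:
    """One coding-agent request: a single section, its reference image, and its assets."""
    section: str
    reference_image: str
    stack: str
    assets: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        asset_lines = "\n".join(f"- {path}" for path in self.assets) or "- (none)"
        return (
            f"Recreate only the {self.section} section shown in {self.reference_image}.\n"
            f"Tech stack: {self.stack}.\n"
            f"Use these extracted assets at these exact paths:\n{asset_lines}\n"
            "Match the reference's layout, spacing, and typography. "
            "Do not build any other section yet."
        )


brief = SectionBrief(
    section="hero",
    reference_image="design/hero.png",
    stack="Next.js + Tailwind CSS",
    assets=["public/assets/hero-notebook.webp"],
)
print(brief.to_prompt())
```

The "do not build any other section yet" line is the important one, because it keeps each review small.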

Building section by section in the coding agent

Step 6 — Refine with screenshots

After each section ships, screenshot the rendered result and compare it side by side with the reference image. Annotate the screenshot: circle the misaligned button, draw an arrow showing where the hero should sit, mark the spacing that's too tight. Hand both images back to the agent with specific feedback, for example: "This is close, but the hero image is too small. Move it 20% up and increase its size by 20%." Repeat until the screenshot matches the reference.
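Percentages like these are easier to state precisely if you measure instead of eyeballing. This sketch assumes you have read bounding boxes for the same element off both the reference and the screenshot, for example from your browser's dev tools. It treats "move up" as a fraction of the rendered element's height and "size" as the width ratio. Both conventions are my assumptions; the agent only needs them to stay consistent.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """An element's bounding box in CSS pixels, measured from the top-left of the page."""
    x: float
    y: float
    width: float
    height: float


def placement_feedback(reference: Box, rendered: Box, tolerance: float = 0.05) -> list[str]:
    """Convert measured differences into instructions, ignoring changes under `tolerance`."""
    notes = []
    # Assumption: vertical shift is expressed as a fraction of the rendered element's height.
    shift = (rendered.y - reference.y) / rendered.height
    if abs(shift) > tolerance:
        notes.append(f"move it {abs(shift):.0%} {'up' if shift > 0 else 'down'}")
    # Assumption: "size" means the width scale factor needed to match the reference.
    scale = reference.width / rendered.width - 1
    if abs(scale) > tolerance:
        notes.append(f"{'increase' if scale > 0 else 'decrease'} its size by {abs(scale):.0%}")
    return notes


def feedback_sentence(notes: list[str]) -> str:
    """Join the notes into one instruction for the coding agent."""
    if not notes:
        return "This matches the reference."
    text = " and ".join(notes)
    return text[0].upper() + text[1:] + "."


reference = Box(x=100, y=80, width=600, height=480)
rendered = Box(x=100, y=160, width=500, height=400)
print(feedback_sentence(placement_feedback(reference, rendered)))
# Move it 20% up and increase its size by 20%.
```

Pair the generated sentence with the annotated screenshot. The numbers remove ambiguity, and the annotation shows the agent which element you mean.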

What to pay attention to

Alignment — text, cards, images, and grids on a consistent baseline. Misalignment instantly reads as cheap.
Spacing — generous, intentional, and consistent. The difference between premium and messy.
Responsiveness — check tablet and mobile before you celebrate the desktop view.
Brand consistency — every section should feel like the same product. Colors, typography, image style, button shapes.
Assets — strong assets carry a frontend. Weak ones drag it down regardless of layout.
Smoothness — transitions between sections should feel composed, not stitched.
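You can partly check the first two items by measurement. This minimal sketch assumes an 8 px layout grid and a fixed spacing scale, which are common conventions but not requirements of this workflow. It also assumes you have already measured element positions and section gaps from the rendered page.

```python
# Assumptions: an 8 px layout grid and this spacing scale; substitute your design system's values.
GRID = 8.0
SPACING_SCALE = (8, 16, 24, 32, 48, 64, 96)


def misaligned(left_edges: dict[str, float], grid: float = GRID, tolerance: float = 1.0) -> list[str]:
    """Names of elements whose left edge sits more than `tolerance` px off the grid."""
    off_grid = []
    for name, x in left_edges.items():
        offset = x % grid
        if min(offset, grid - offset) > tolerance:
            off_grid.append(name)
    return off_grid


def off_scale_gaps(gaps: list[float], scale: tuple[float, ...] = SPACING_SCALE) -> list[float]:
    """Gaps between sections that are not values from the spacing scale."""
    return [gap for gap in gaps if gap not in scale]


# Hypothetical measurements from a rendered page.
print(misaligned({"headline": 64, "card": 70, "button": 64.5}))  # ['card']
print(off_scale_gaps([96, 96, 40, 64]))  # [40]
```

A check like this catches only the measurable part. Whether every section feels like the same product is still a judgment call.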

Why this unlocks creativity, not just speed

The bigger win is what happens in the visual design phase. You can explore six directions in fifteen minutes (Swiss-grid editorial, industrial brutalist, soft Y2K, sumi-ink minimal, and so on) before committing to the one you build. That kind of exploration was never realistic when each direction cost an evening of frontend work. The workflow gives you a real design phase back.

You still need taste. The model doesn't know which direction is right — you do. But the cost of trying directions has collapsed, which changes how generously you explore before locking in.

Image first. Code second.

That's the whole thesis. Generate the design. Extract the assets. Code section by section against the reference. Refine with annotated screenshots. None of the steps is novel; each one uses a tool we already have. What changes is the ordering: visual decisions before structural ones.

Practical caveat: this is more work than "one prompt does everything." That's the point. The reason AI-built frontends feel same-y is that the easy path collapses all the decisions into a single token stream. Putting visual decisions on a different surface lets each tool work at what it's actually good at.

Original thread by Leon Lin (@LexnLin): https://x.com/LexnLin/status/2048791596137632126. This synthesis is mine, the visuals are his. Worth following if you spend any time building with AI tools.