Designing with Codex: A 6-Day AI Workflow Experiment

01.

Why this exists

A six-day experiment in AI-assisted product design. I wanted to test how far one designer can get from concept to working prototype using Codex, MCP servers, and modern AI tooling as design and engineering partners. Lanify, a personalised onboarding lanyard creator, was the vehicle. The product is the means. The workflow is the point.

This article documents what worked, what did not, and what I would change next time. It is not a product case study. There were no users, no stakeholders, no measurable outcomes. The signal here is about the design and development workflow, not the product itself.

03.

The self-directed brief

I gave myself a constrained brief to test the AI workflow against a realistic onboarding product:

A tool that lets new employees personalise their lanyard during onboarding within 2 minutes, while keeping output on-brand.

The brief was a vehicle, not a validated problem. I did not interview users, audit existing onboarding tools, or test against real adoption. Anyone reading this article should treat the product framing as a creative constraint rather than evidence of market need.

What the brief gave me was useful scope:

A clear user task (upload, personalise, submit)
A natural tension (creative freedom versus brand consistency)
A measurable surface goal (under 2 minutes, fewer than 10 clicks)
Enough complexity to stress-test the AI workflow (image processing, state management, theming, responsive design)

04.

My role

Solo. I owned product framing, UI design, visual direction, prompt engineering, and code review. Codex generated implementation code under my direction. I treated it as a junior engineer who needs clear constraints, not as a magic box.

The interesting part of the role was the division of labour between me and the AI tooling. I'll come back to this in the workflow section.

05.

Process

Defining the experience

I framed the product around the onboarding journey rather than the design tool itself. The user task was simple: add a profile picture, choose a card direction, customise within brand rules, submit for approval.

This framing helped me avoid building a generic editor. Each screen needed to move the user closer to a finished lanyard, not towards a more powerful tool.

Reducing the flow

The first iteration asked users to make too many choices before seeing value. The flow required more than 10 clicks, which was too slow for the 2-minute target.

I changed the structure from manual creation to guided personalisation. Instead of asking users to build a card from scratch, Lanify shows generated card variations first. Users select one and customise it.

This was the strongest design decision in the build. Starting from a complete design rather than a blank state cuts decision fatigue and gets the user to a finished output faster.

Designing brand-safe customisation

Full colour pickers and unrestricted pattern controls would create inconsistent results. I limited customisation to company-approved options:

Brand colour swatches
Supplied SVG pattern options
Controlled pattern colours
Adjustable motif size, opacity, rotation, and grid
Approved lanyard and holder colour combinations

The constraint was the feature. Personalisation works because it is bounded.
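
A minimal TypeScript sketch of how bounded customisation could be modelled. The swatch values, pattern names, numeric ranges, and the `sanitiseCardConfig` helper are illustrative assumptions, not taken from the Lanify codebase. The idea it shows: colours and patterns can only come from approved lists, and numeric controls stay adjustable inside fixed ranges.

```typescript
// Placeholder approved options (assumptions); real values would come from brand guidelines.
const BRAND_SWATCHES = ["#1A1A2E", "#E94560", "#F5F5F5"] as const;
const PATTERN_IDS = ["dots", "waves", "chevrons"] as const;

type Swatch = (typeof BRAND_SWATCHES)[number];
type PatternId = (typeof PATTERN_IDS)[number];

interface CardConfig {
  background: Swatch;
  pattern: PatternId;
  patternColour: Swatch;
  motifSize: number; // px
  opacity: number;   // 0..1
  rotation: number;  // degrees
}

// Hypothetical ranges for the adjustable numeric controls.
const LIMITS = {
  motifSize: { min: 8, max: 64 },
  opacity: { min: 0.1, max: 1 },
  rotation: { min: 0, max: 360 },
} as const;

function clampToRange(value: number, range: { min: number; max: number }): number {
  return Math.min(range.max, Math.max(range.min, value));
}

function isSwatch(value: string): value is Swatch {
  return (BRAND_SWATCHES as readonly string[]).indexOf(value) !== -1;
}

function isPattern(value: string): value is PatternId {
  return (PATTERN_IDS as readonly string[]).indexOf(value) !== -1;
}

// Turns untrusted input (for example form state) into a config that is always on-brand.
// Unknown colours or patterns fall back to the first approved option.
function sanitiseCardConfig(input: {
  background: string;
  pattern: string;
  patternColour: string;
  motifSize: number;
  opacity: number;
  rotation: number;
}): CardConfig {
  return {
    background: isSwatch(input.background) ? input.background : BRAND_SWATCHES[0],
    pattern: isPattern(input.pattern) ? input.pattern : PATTERN_IDS[0],
    patternColour: isSwatch(input.patternColour) ? input.patternColour : BRAND_SWATCHES[0],
    motifSize: clampToRange(input.motifSize, LIMITS.motifSize),
    opacity: clampToRange(input.opacity, LIMITS.opacity),
    rotation: clampToRange(input.rotation, LIMITS.rotation),
  };
}
```

Under these assumptions an off-brand colour can never reach the preview, because every config passes through the approved lists before it is rendered.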

Improving the portrait

The portrait drives the emotional value of the card. I added background removal so the person stands out clearly against the card design, then added an editing modal with crop, rotation, undo/redo, and visual filters.
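
Undo/redo in an editing modal like this is commonly built as a history of immutable states. The sketch below is one way to do it in TypeScript. The `PortraitEdit` fields and function names are assumptions made for illustration, not Lanify's actual edit model.

```typescript
// Illustrative edit state; the field names are assumptions.
interface PortraitEdit {
  rotation: number; // degrees
  zoom: number;     // 1 = original size
  filter: string;   // preset name, e.g. "none" or "mono"
}

interface EditHistory {
  past: PortraitEdit[];
  present: PortraitEdit;
  future: PortraitEdit[];
}

function startHistory(initial: PortraitEdit): EditHistory {
  return { past: [], present: initial, future: [] };
}

// Applying a new edit records the current state and clears the redo stack.
function applyEdit(history: EditHistory, change: Partial<PortraitEdit>): EditHistory {
  return {
    past: [...history.past, history.present],
    present: { ...history.present, ...change },
    future: [],
  };
}

// Undo moves the present onto the redo stack and restores the previous state.
function undo(history: EditHistory): EditHistory {
  const previous = history.past[history.past.length - 1];
  if (previous === undefined) return history; // nothing to undo
  return {
    past: history.past.slice(0, -1),
    present: previous,
    future: [history.present, ...history.future],
  };
}

// Redo is the mirror image of undo.
function redo(history: EditHistory): EditHistory {
  const next = history.future[0];
  if (next === undefined) return history; // nothing to redo
  return {
    past: [...history.past, history.present],
    present: next,
    future: history.future.slice(1),
  };
}
```

Because every state is kept as a plain object, a reset is just a return to the first entry, and the modal can discard the whole history when the user cancels.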

06.

Key Design Decisions

1. Guided flow instead of a full editor

Tension: The early flow had too many choices and more than 10 clicks.
Decision: I split the product into focused stages: upload, variation selection, card customisation, lanyard customisation, submission.
Why: Users should move quickly through onboarding rather than learn a design tool.
Rejected: A single advanced editor with all controls visible. More flexibility, higher cognitive load.
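
The staged flow can be sketched as a small state machine. This is an illustrative TypeScript model, assuming the five stage names above and hypothetical gating rules (a portrait before variations, a chosen variation before customisation). It is not the Lanify implementation.

```typescript
// Stage names follow the decision above; the gating rules are assumptions.
const STAGES = ["upload", "variation", "card", "lanyard", "submission"] as const;
type Stage = (typeof STAGES)[number];

interface FlowState {
  stage: Stage;
  hasPortrait: boolean;
  selectedVariation: string | null;
}

// Each stage declares what must be true before the user can move on.
function canAdvance(state: FlowState): boolean {
  switch (state.stage) {
    case "upload":
      return state.hasPortrait;
    case "variation":
      return state.selectedVariation !== null;
    case "card":
    case "lanyard":
      return true; // the preset is already valid, so customising is optional
    case "submission":
      return false; // last stage
  }
}

// Returns the next state, or the same state when the current stage is not complete.
function advance(state: FlowState): FlowState {
  if (!canAdvance(state)) return state;
  const nextStage = STAGES[STAGES.indexOf(state.stage) + 1];
  return nextStage === undefined ? state : { ...state, stage: nextStage };
}
```

Keeping the rules in one function makes the click budget easy to reason about: a user who accepts a preset needs one forward action per stage.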

2. Preset variations before advanced controls

Tension: Starting from a blank card made the experience slow and intimidating.
Decision: Show ready-made card variations first, then let users customise one.
Why: Users see value early and only refine what matters to them.
Rejected: Asking users to pick every colour, pattern, and layout before previewing a result.

3. Brand-safe choices instead of unrestricted customisation

Tension: Personalisation risks off-brand results.
Decision: Constrain colours, patterns, and lanyard styles to approved options.
Why: Employees still personalise the card, but the company keeps design consistency.
Rejected: Free-form colour selection. More control, weaker brand governance.

4. Background removal as a visual quality layer

Tension: Uploaded photos with messy backgrounds made the card feel unpolished.
Decision: Use background removal so the portrait sits cleanly on the card design.
Why: The person becomes the focus and the pattern system works better behind them.
Rejected: Cropping the original photo into a rectangle. Faster to implement, weaker visual results.

07.

Solution

The final Lanify experience includes five main stages.

Profile Picture Upload

Users start by uploading a profile picture or taking a photo. The screen explains the requirement clearly: use a simple, uniform background.

Once the image is ready, the state changes to show edit and remove actions.

Portrait Editing

Users edit their current picture before saving it back into the profile. They adjust crop, rotation, and filters inside a focused modal.

Card Variation Selection

After the portrait is ready, Lanify shows multiple card designs using different colours and patterns.

This step helps users move fast by selecting a strong starting point.

“Variation selection shifts the task from creating from scratch to choosing and refining.”

Card Customisation

Users customise the selected card with brand-approved colours, SVG patterns, pattern colours, and advanced controls.

The live preview makes each decision visible immediately.

“The interface supports quick choices first, with advanced controls available for detailed tuning.”

Lanyard and Holder Customisation

The final step lets users choose the lanyard colour and card holder finish.

This closes the loop between the digital card design and the physical object employees receive.

08.

The AI workflow

Why Codex

Four reasons:

First, access. Codex is provided to employees at CGI, where I work, so the tooling cost was zero and the integration with my existing workflow was already in place.

Second, no token limits. Codex did not throttle on long sessions, which mattered for a 6-day build where I was prompting heavily across multiple stages. Cursor and Claude Code both impose usage caps that would have forced me to ration prompts.

Third, familiarity. I had been using it on day-job work for several months, which meant I was not learning the tool and the project at the same time.

Fourth, MCP support. The Chakra UI, Next.js, ESLint, ARIA, and Motion MCP servers I planned to use were all stable in Codex.

The trade-off was output quality. Codex code was good enough to ship a prototype but not consistently excellent on first pass. I accepted that because iteration speed mattered more than first-draft quality for an experiment of this scope. On a longer build with production stakes, I would re-evaluate.

Prompt patterns that worked

The most useful prompt pattern I developed was the "restart-safe plan":

Plan a way to [task] in a restart-safe way

This forced Codex to write a plan to a file before executing. Two benefits: I could review the plan before it generated code, and if the session crashed or I needed to restart, the work-to-date was logged. This pattern alone saved me roughly half a day of redone work across the project.
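
The format of the plan file Codex wrote is not recorded here, so the following is only a sketch of what a restart-safe plan could look like as data. The `Plan` and `PlanStep` shapes and the helper names are hypothetical. The point is that progress lives in a serialisable structure, so a new session can find the first unfinished step.

```typescript
// Hypothetical plan shape; the file Codex actually produced may look different.
type StepStatus = "pending" | "done";

interface PlanStep {
  id: number;
  description: string;
  status: StepStatus;
}

interface Plan {
  task: string;
  steps: PlanStep[];
}

// Serialise after every completed step so a crashed session loses nothing.
function savePlan(plan: Plan): string {
  return JSON.stringify(plan, null, 2);
}

function loadPlan(json: string): Plan {
  return JSON.parse(json) as Plan;
}

// Returns a new plan with the given step marked as done.
function markDone(plan: Plan, id: number): Plan {
  return {
    ...plan,
    steps: plan.steps.map((step): PlanStep =>
      step.id === id ? { ...step, status: "done" } : step
    ),
  };
}

// After a restart, work resumes from the first step that is still pending.
function nextPending(plan: Plan): PlanStep | undefined {
  for (const step of plan.steps) {
    if (step.status === "pending") return step;
  }
  return undefined;
}
```

In practice the string returned by `savePlan` would be written to a file in the repository, which is what makes the plan reviewable before any code is generated.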

MCP servers I used and what each contributed

Chakra UI MCP: Component library access and design token consistency. Without this, Codex would invent component APIs that did not exist.
Next.js DevTools MCP: Application runtime access. Useful for debugging without leaving the AI loop.
ESLint MCP: Lint feedback inside the AI session. Caught issues before I had to.
ARIA MCP: Accessibility specification access. Made it possible to enforce ARIA correctness without me being the bottleneck.
Motion MCP: Animation patterns and best practice. Reduced trial-and-error on transitions.

The pattern across all five: MCP servers turn AI tooling from a generic code generator into a context-aware collaborator. Without them, you spend most of your prompts correcting the AI's assumptions. With them, you spend prompts directing intent.

Where Codex genuinely accelerated me

Codex implemented the SVG pattern overlay system in roughly 30 minutes. This is the system that lets users adjust pattern size, opacity, rotation, and grid alignment on the lanyard card. Manually, I estimate this would have taken me one or two days, mostly on the maths for grid alignment and the rotation transforms.

The acceleration came from a specific prompt pattern. Instead of describing the system in abstract terms, I gave Codex one example SVG and asked it to generalise the structure into a configurable component. Codex picked up the pattern from the single reference and built the controls around it. This was faster than describing the requirements in prose and faster than starting from a blank component.

The lesson: AI-assisted development works best when you give it concrete examples to generalise from, not abstract specifications to interpret. One good reference is worth several paragraphs of prompt.
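
To make "generalise one SVG into a configurable component" concrete, here is a minimal TypeScript sketch that wraps a single motif in an SVG `<pattern>` and exposes size, opacity, rotation, and grid offset as options. The function name, option names, and the use of `patternTransform` for rotation and alignment are assumptions for illustration. The component Codex generated is not reproduced here.

```typescript
interface PatternOptions {
  motif: string;    // inner SVG markup for one tile, taken from the reference SVG
  motifBox: number; // side length of the square box the motif was drawn in
  size: number;     // rendered tile size in px
  opacity: number;  // 0..1
  rotation: number; // degrees
  offsetX: number;  // grid alignment offsets in px
  offsetY: number;
}

// Builds a card-sized SVG filled with a repeating, transformable pattern.
function buildPatternSvg(width: number, height: number, o: PatternOptions): string {
  // patternTransform shifts and rotates the whole tile grid at once.
  const transform = `translate(${o.offsetX} ${o.offsetY}) rotate(${o.rotation})`;
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`,
    `<defs>`,
    // The viewBox scales the motif to whatever tile size is chosen.
    `<pattern id="motif" viewBox="0 0 ${o.motifBox} ${o.motifBox}" ` +
      `width="${o.size}" height="${o.size}" patternUnits="userSpaceOnUse" ` +
      `patternTransform="${transform}">`,
    o.motif,
    `</pattern>`,
    `</defs>`,
    `<rect width="100%" height="100%" fill="url(#motif)" opacity="${o.opacity}"/>`,
    `</svg>`,
  ].join("");
}
```

Swapping the `motif` string is all it takes to support another supplied pattern, which is why one good reference SVG is enough to define the whole system.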

Where Codex made a wrong call and I overrode it

The clearest override happened on the image editing surface. Codex's first implementation made three UX mistakes, each rooted in the same default: expose every control as its own visible surface.

Mistake 1: Tabs for adjustment controls. Codex split brightness, contrast, and filter controls across separate tabs. This forced the user to navigate between tabs to make a single adjustment, then guess whether they had finished before moving to the next step in the flow. The structure prioritised feature organisation over user task.

Mistake 2: A long form for granular controls. Undo, reset, zoom in, zoom out, rotate, and flip were rendered as a vertical list of form fields below the image. The most-used controls (zoom and rotate) were buried below the least-used (reset). The form pattern was wrong for direct-manipulation actions on an image.

Mistake 3: All controls treated as equal weight. Filters and adjustment values were given the same visual prominence as crop and rotation. There was no hierarchy between "primary task" controls and "fine-tuning" controls.

I overrode all three with one principle: match the control to the user task, then apply progressive disclosure.

Concretely, I directed Codex to:

Move undo, reset, zoom, rotate, and flip into the image preview itself, attached to the artefact they manipulate
Replace the tabs with a preset filter row (mono, silverstone, noir, dramatic, vivid, cool, warm) that lets the user move forward with a single click
Group brightness, contrast, and granular filter values into an advanced accordion, collapsed by default, available when needed
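
The regrouping above can be expressed as data: each control declares where it lives, and the advanced group stays hidden until opened. This is a hedged TypeScript sketch. The `Placement` names, control ids, and the `filterStrength` control are assumptions, while the preset names are the ones listed above.

```typescript
type Placement = "preview" | "presetRow" | "advanced";

interface EditorControl {
  id: string;
  placement: Placement;
}

const PRESET_FILTERS = ["mono", "silverstone", "noir", "dramatic", "vivid", "cool", "warm"];

const EDITOR_CONTROLS: EditorControl[] = [
  // Direct-manipulation actions sit on the image preview itself.
  ...["undo", "reset", "zoom", "rotate", "flip"].map(
    (id): EditorControl => ({ id, placement: "preview" })
  ),
  // One-click presets replace the tabs.
  ...PRESET_FILTERS.map(
    (name): EditorControl => ({ id: `filter-${name}`, placement: "presetRow" })
  ),
  // Fine-tuning lives in an accordion that is collapsed by default.
  ...["brightness", "contrast", "filterStrength"].map(
    (id): EditorControl => ({ id, placement: "advanced" })
  ),
];

// Progressive disclosure: advanced controls only render when the accordion is open.
function visibleControls(controls: EditorControl[], advancedOpen: boolean): EditorControl[] {
  return controls.filter((c) => c.placement !== "advanced" || advancedOpen);
}
```

Declaring placement as data keeps the hierarchy explicit, so a newly generated control has to be assigned a tier instead of landing in a default form.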

The lesson generalises beyond this build. AI tooling defaults to the most common pattern in its training data, which for "image editor with controls" is "tabbed UI with form-like controls". That pattern fits Photoshop. It does not fit a 2-minute onboarding flow where the user is not a designer and the goal is "good enough quickly", not "complete control". Catching this required design judgement that the AI did not contribute to. The override matters more than the override rate. AI-assisted development needs a designer who knows when the default is wrong.

Where AI tooling fell short for design work

The clearest limitation was visual identity. Codex's default output was not broken. The components worked, the layout was sound, the spacing was reasonable. The problem was that it looked generic in a specific way: the same way Bootstrap sites looked the same in 2014 and Material Design sites looked the same in 2018. In the AI era, the equivalent is "looks like it was generated by an AI", and that signal is now legible to users.

Two corrections fixed this.

First, I built a theme system in Chakra UI using CGI brand tokens (colours, typography, radii, spacing) as the base layer. Codex was no longer making visual decisions from its training-data defaults. It was rendering against a constrained system that I controlled. Same component library, distinct visual outcome.

Second, I rejected the default form patterns Codex reached for. Long vertical forms with labels above inputs were the AI default for "let the user configure something". I replaced those with preset buttons, accordions, and direct-manipulation controls wherever the task did not genuinely require a form. The UX got simpler. The product stopped looking like every other AI-generated tool.
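
A token layer of this kind can be sketched without any framework. The TypeScript below uses placeholder values, since the actual CGI brand tokens and the Chakra UI theme configuration are not given in this article. The `brandTokens` object, `token` helper, and `cardStyle` function are all hypothetical.

```typescript
// Placeholder values (assumptions); the real brand tokens are not reproduced here.
const brandTokens = {
  colors: { primary: "#222222", accent: "#CC3355", surface: "#FAFAFA" },
  radii: { card: "12px", control: "6px" },
  spacing: { sm: "8px", md: "16px" },
  fonts: { body: "system-ui, sans-serif" },
} as const;

type BrandTokens = typeof brandTokens;

// Components ask for a token by category and name, so raw hex values or pixel
// sizes never appear in component code. Unknown names fail at compile time.
function token<C extends keyof BrandTokens, N extends keyof BrandTokens[C]>(
  category: C,
  name: N
): BrandTokens[C][N] {
  return brandTokens[category][name];
}

interface CardStyle {
  background: string;
  borderRadius: string;
  padding: string;
  fontFamily: string;
}

// Example consumer: every visual value is resolved through the token layer.
function cardStyle(): CardStyle {
  return {
    background: token("colors", "surface"),
    borderRadius: token("radii", "card"),
    padding: token("spacing", "md"),
    fontFamily: token("fonts", "body"),
  };
}
```

With a layer like this in place, generated components can only pick from named tokens, which is what shifts visual decisions away from the model's defaults.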

The gap here is taste, not capability. Codex could render any visual system I defined. It could not tell me, on its own, that the default would make the product look like every other AI-generated app on the internet. That diagnosis required a designer who has watched this pattern repeat across two tooling generations and recognised the third. AI-assisted development accelerates execution. It does not replace the design eye that knows when "works correctly" is producing "looks like everything else".

What I would never delegate to AI

Product framing. The "guided flow over full editor" decision was the strongest design call in this project, and it came from product judgement, not implementation skill. Asking AI to make that call would have produced a generic editor with every control exposed, because that is the most frequent pattern in its training data. The decision to constrain the surface to a 5-stage flow required understanding what a new employee was actually trying to do, not what a designer building a tool would build by default.

Modelling the user. AI cannot think as the user. It can simulate plausible user behaviour from training data, but it cannot sit with the actual tension a real person feels when faced with a blank state, an unfamiliar tool, or a 2-minute time pressure during their first day at a company. Every decision in this build (preset variations first, brand-safe constraints, background removal, controls attached to the artefact they manipulate) came from imagining a specific user in a specific context. AI can implement those decisions once made. It cannot make them.

Judging what is worth building. AI compresses the cost of building anything. That makes the question "should this exist" more important, not less. The default mode of AI-assisted development is to build more, faster. The override is to build less, with intent. The strongest design moves in this project were the things I did not include: no full colour picker, no free-form pattern editor, no advanced typography controls. Each was implementable in minutes with Codex. Each would have made the product worse.

The pattern across all three: AI accelerates execution. Execution is no longer the differentiator. Judgement is. The designers who matter in the AI era are the ones who can answer "why this, not that" in a way that holds up to scrutiny. AI cannot answer that question. It can only execute the answer once you have it.

What I would always delegate to AI

Three categories, each with the same cost-benefit shape: high cost to write manually, low cost when the AI gets it slightly wrong, because I review every change before it ships.

Boilerplate scaffolding. Form validation, error states, loading states, empty states, accessibility attributes on standard components. None of these decisions are interesting once the design is settled. They are pattern-matching tasks where AI tooling outperforms my willingness to stay focused. On this build, Codex handled the full set of states for the upload, editing, and submission stages without me writing any of it manually.

Implementation of a defined system. When I knew what I wanted, Codex was reliably faster at building it than I was. The clearest example was the SVG pattern overlay system. I gave it one reference SVG and a description of the controls (size, opacity, rotation, grid alignment). It generated a configurable component in roughly 30 minutes that I estimate would have taken me one or two days by hand. The same pattern held for repeating-component work: variant cards, swatch grids, accordion sections. Once the design decision was made, implementation was the cheap part.

Specification compliance. This is where MCP servers earned their place. Chakra UI component APIs, ESLint rules, ARIA roles, Motion easing curves. These are all rule-based domains where the correct answer is documented somewhere and AI tooling can apply the rules consistently. I am not faster than a tool that has the specification loaded into context. Trying to be is a waste of time.

The pattern across all three: AI is good at what is already decided. It compresses the cost of executing decisions, applying rules, and rendering patterns. The review burden is real but small, because the artefacts being generated are constrained enough that wrong outputs are visually obvious. I check every change. I do not write every change.

09.

What this experiment did not test

I want to be explicit about the limits of what this experiment proves:

No real user task completion. The 2-minute target is unverified. Real users may take longer or get stuck at points I have not anticipated.
No drop-off measurement. I do not know which of the 5 stages users would abandon.
No validation of the underlying problem. I did not interview HR teams, new joiners, or workplace teams to confirm that personalised lanyards are an unmet need.
No adoption signal. No company has tried this. The brand-safety system is theoretical.
No measurement of design impact on onboarding sentiment. Whether a personalised lanyard actually improves the onboarding experience is an unanswered question.

If I continued this project, the next move would be to find one company willing to pilot it during onboarding and measure the things above.

10.

Reflection

The strongest learning was about division of labour between designer and AI. Codex accelerated execution but did not improve product judgement. The decisions that made Lanify work (guided flow, preset variations first, brand-safe constraints, background removal) all came from design thinking that the AI did not contribute to. The AI made those decisions cheaper to implement, not better.

The second learning was about MCP servers. They are the difference between AI tooling that wastes your time and AI tooling that compresses it. Picking the right MCP stack for the project deserves as much thought as picking the framework.

The third learning was about scope. Six days for a working prototype with this surface area was tight but achievable, and it was only achievable because I time-boxed ruthlessly and did not chase polish. A side project with no user pressure is the wrong place to perfect anything. The right move is to ship something interrogable and write down what you learned.

If I ran this experiment again, I would start with the workflow questions, not the product idea. I picked the lanyard concept first and then learned about Codex through it. Reversing that order would have produced sharper workflow learnings and a worse product, which for the goal of this experiment would have been the right trade.

11.

GitHub Code

https://github.com/gil00pita/lanify

TL;DR