1. Transcripts
We tested manual transcription against AI transcription on eight one-hour interviews. Manual transcription took roughly two hours per interview. With AI tools like Zoom AI, that time dropped to nearly zero: 16 hours saved across the eight interviews, which extrapolates to more than 100 hours per quarter for a moderately active research team.
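To make those numbers concrete, here is a minimal back-of-the-envelope sketch in Python. The per-interview figures come from our test; the quarterly figure assumes, purely for illustration, a team running about 50 one-hour interviews per quarter.

```python
# Back-of-the-envelope savings from replacing manual transcription with AI transcription.
# Assumptions (illustrative): ~2 hours of manual work per one-hour interview,
# AI transcription time treated as effectively zero, and a "moderately active"
# team running ~50 interviews per quarter.

MANUAL_HOURS_PER_INTERVIEW = 2.0
AI_HOURS_PER_INTERVIEW = 0.0  # "nearly zero" in practice

def hours_saved(interview_count: int) -> float:
    """Hours saved by switching from manual to AI transcription."""
    return interview_count * (MANUAL_HOURS_PER_INTERVIEW - AI_HOURS_PER_INTERVIEW)

print(hours_saved(8))   # our eight-interview test: 16.0 hours
print(hours_saved(50))  # assumed quarterly volume: 100.0 hours
```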
Where we see real value:
- Fast, searchable records of conversations
- Instant recall of stakeholder discussions
- The ability to focus on listening during interviews instead of note-taking
Where we draw the line:
- AI transcripts are raw material, not analysis
- We always review the recordings alongside transcripts to capture tone, pauses, and non-verbal cues that AI misses
Key takeaway: AI captures words. It does not capture meaning. Even in this obvious use case, human oversight is essential to preserve the richness of qualitative research.
2. Catching gaps in planning: AI as a strategic sparring partner
Using AI to challenge research plans delivered tangible benefits. Claude, in particular, excelled at prompting critical questions like:
- "What are your success criteria?"
- "How does this tie back to business objectives?"
Even senior researchers can overlook these under tight deadlines, making AI a valuable safety net.
We tested Claude Sonnet 4 and ChatGPT‑5 as simulated “experienced UX researchers” to:
- Question our methods
- Identify logical gaps
- Flag missing success criteria
Crucial point: AI was used as a supplement, not a replacement. Insights were valuable only when critically reviewed by human researchers who understood the context and nuance.
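For teams that want to reproduce this setup programmatically, here is a minimal sketch using the Anthropic Python SDK. The system prompt wording, the helper name, and the model identifier are illustrative assumptions rather than our exact configuration; the same role framing works just as well pasted into a chat interface.

```python
# Minimal sketch: framing the model as a critical reviewer of a research plan.
# The prompt wording, helper name, and model ID below are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

REVIEWER_PROMPT = (
    "You are an experienced UX researcher reviewing a colleague's research plan. "
    "Do not agree by default. Question the methods, identify logical gaps, "
    "flag missing success criteria, and tie every critique back to business objectives."
)

def review_plan(plan_text: str) -> str:
    """Ask the model to critique a research plan and return its written review."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: use whichever Claude Sonnet 4 ID you have access to
        max_tokens=1024,
        system=REVIEWER_PROMPT,
        messages=[{"role": "user", "content": plan_text}],
    )
    return response.content[0].text
```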
3. Challenging methodology choices
What we asked AI to do:
We positioned the model not as a passive assistant, but as a critical thinking partner. Its role was to:
- Analyse our plans and assumptions with rigour, not agree by default
- Identify gaps, risks, and flawed reasoning we may have overlooked
- Offer alternative perspectives and correct us clearly—with evidence-backed rationale
- Help us reach sharper clarity, stronger logic, and higher-quality decision-making aligned to product outcomes
This framing was designed to support three key business goals:
- Reduce churn and increase customer retention, driving revenue growth
- Resolve high-impact experience friction, improving product usability and adoption
- Strengthen brand positioning within the monday.com ecosystem, reinforcing value and differentiation
Keeping business context front and centre:
AI often reminded us of the information gaps we might be missing and tied its methodology suggestions back to those gaps.
Why Claude stood out:
- Structured, scannable outputs with clear step-by-step reasoning
- Easy to extract actionable insights quickly
- Less verbose than ChatGPT‑5, which tended to produce long, generalised responses that were harder to parse
The critical catch: In testing, roughly 55% of Claude's recommendations required correction. It forgot context, invented plausible-sounding KPIs, or suggested unrealistic timelines. This is not a rounding error. Experienced researchers must stay in the loop at all times.
In practice, we now treat these tools as:
- A thinking partner for experienced researchers
- A way to improve quality, not to save time (planning with AI still takes roughly as long as planning on your own, because of the iterative back-and-forth)
4. Methodology reminders and theory refreshers
Another genuinely valuable use case: using AI as an on-demand research textbook.
Tools like Google AI Studio, Gemini, and ChatGPT‑5 helped us:
- Remind ourselves of less commonly used methods, helping to achieve triangulation
- Refresh knowledge on when particular methods are appropriate
- Compare approaches at a high level
This is especially useful because researchers tend to fall into methodological ruts, defaulting to familiar approaches (e.g., in-depth interviews) even when alternatives might be better. AI prompted us to reconsider methods we already knew but weren't actively utilising.
Rule of thumb:
- Use AI for "what exists?" and "what's this method for?"
- Ignore AI's suggestions on timelines, sample sizes, and budgets—they were consistently disconnected from reality (e.g., recommending 30–35 participants over three months when we had two weeks and a budget for five participants).