CVS annotation app

CVS Annotation App is a cross-platform tool for labeling Critical View of Safety (CVS) criteria in laparoscopic cholecystectomy frames to generate training data for an ML model. As a Product Designer, I led the end-to-end design from MVP to iteration. My role was to translate clinical labeling needs and technical constraints into a simple, reliable workflow for surgeons. Key responsibilities included defining the annotation flow, designing Flutter-friendly UI patterns and states, running surgeon interviews, synthesizing feedback, and iterating from video-based tasks to an image-first experience with per-task comments and visual hint references for each criterion.

Role: Product Designer (Solo)
Work type: New product (0→1)
Team: Product Manager, Flutter and backend developers, QA
Contribution: Product discovery, user experience, mobile app design, user interviews

Project overview

The CVS Annotation App is an internal labeling tool used to annotate whether the Critical View of Safety criteria are achieved in laparoscopic cholecystectomy frames.

While building an AI video analytics platform, our team discovered a key blocker: we didn’t have enough labeled examples of CVS to train and validate the model. Buying data or outsourcing labeling wasn’t an option, so we decided to generate high-quality labels in-house. That required a simple, reliable tool that surgeons could use for repetitive annotation work—delivered quickly and cheaply.

Problems

Users: Monotonous, high-volume labeling needs to be fast and easy
Business: Lack of labeled CVS data

Challenges

MVP and tight timeline: Build and ship quickly with minimal scope
Low cost: No separate native apps
Technical constraint: Flutter for one shared UI across iOS and Android
No market analogs: No ready reference patterns for this exact clinical labeling task

Work process

Initial hypothesis

Based on the SAGES CVS Challenge dataset format, I hypothesized that each labeling task should include a short video clip, and the labeler would annotate the last frame. I tested this approach in an interview with a surgeon, who initially agreed that video context could help identify the surgery phase and understand the surgeon's actions leading up to the frame.

MVP v1: Video-based tasks

In the first MVP, each task presented a thirty-second segment split into three-second clips, asked the labeler to decide whether the CVS criteria were achieved on the final frame, and included a text-based description of the annotation rules and criteria.
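
To make the task structure concrete, here is a minimal sketch of how a v1 task could be modeled on the Flutter (Dart) side. All type and field names (AnnotationTaskV1, clipUrls, finalFrameUrl, and so on) are illustrative assumptions, not the actual data model.

```dart
// Illustrative sketch only: type and field names are assumptions,
// not the production data model.

/// The labeler's overall verdict for a task.
enum CvsLabel { achieved, notAchieved }

/// A v1 task: a 30-second segment split into 3-second clips,
/// judged on its final frame.
class AnnotationTaskV1 {
  final String id;
  final List<Uri> clipUrls;         // the 3-second clips, in order
  final Uri finalFrameUrl;          // the frame the labeler is asked to judge
  final String criteriaDescription; // text description of the CVS criteria and rules

  const AnnotationTaskV1({
    required this.id,
    required this.clipUrls,
    required this.finalFrameUrl,
    required this.criteriaDescription,
  });
}

/// The labeler's decision for one task.
class CvsAnnotation {
  final String taskId;
  final CvsLabel label;

  const CvsAnnotation({required this.taskId, required this.label});
}
```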

User feedback: “The video creates more problems than it solves”

After launch, I collected feedback from labelers and followed up with the original surgeon and other participants. Two key issues emerged.

Mismatch between clip and final frame
Labelers often saw CVS achieved during the clip, but the final frame itself was sometimes uninformative: occlusion, camera angle, tissue position, or instruments could block the anatomy. Even when users understood the rule of labeling only the last frame, the decision felt unreliable.

High cognitive load
Labeling is repetitive, and users frequently got interrupted. Re-watching the same clip to regain context was frustrating, increased fatigue, and raised the risk of inconsistent labels.

MVP v2: Image-first workflow

Based on these findings, we shifted to an image-first approach and added features to support uncertainty. Video clips were replaced with a set of selected frames, giving clearer decision moments and removing the need to rewatch. Per-task comments let labelers flag wrong phase, poor quality, or uninformative cases. Criteria descriptions were enhanced with reference images, because interviews showed that even experienced surgeons can hesitate when deciding whether CVS is achieved.
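
As a companion to the v1 sketch above, here is how the v2 changes might look in the task and result models. Again, the names (AnnotationTaskV2, CvsCriterion, referenceImages, comment) are assumptions used for illustration, not the production code.

```dart
// Illustrative sketch only: names are assumptions, not the production code.

/// Same overall verdict as in the v1 sketch, repeated to keep this self-contained.
enum CvsLabel { achieved, notAchieved }

/// One CVS criterion, now accompanied by reference images (visual hints).
class CvsCriterion {
  final String title;
  final String description;
  final List<Uri> referenceImages; // example frames showing the criterion

  const CvsCriterion({
    required this.title,
    required this.description,
    required this.referenceImages,
  });
}

/// A v2 task: a small set of selected frames instead of a video clip.
class AnnotationTaskV2 {
  final String id;
  final List<Uri> frameUrls;         // selected frames replace the 3-second clips
  final List<CvsCriterion> criteria; // criteria with reference images

  const AnnotationTaskV2({
    required this.id,
    required this.frameUrls,
    required this.criteria,
  });
}

/// The labeler's answer, extended with an optional per-task comment
/// for wrong-phase, poor-quality, or uninformative cases.
class CvsAnnotationV2 {
  final String taskId;
  final CvsLabel label;
  final String? comment;

  const CvsAnnotationV2({
    required this.taskId,
    required this.label,
    this.comment,
  });
}
```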
