UX to Entice Users to Train SaaS Machine Learning Model - Case Study

I’ve been responsible for the UX and overall Product Design of several web applications that all faced the same issue: In order to improve our machine learning model, we need our users to label information and correct the ML’s output. How do we entice our users to label information in order to train our machine learning models within the user flow without making it feel like work? How do we balance the ML’s initially minimal output with getting users to manually label data? 

I’ve seen this design problem in use cases ranging from a property management analytics dashboard, to a chatbot analytics dashboard, and most recently a resume formatting tool for recruiters. This blog post is a case study of the full design process for the resume formatting tool, with a focus on the ML labeling design later in the process.

Background

This client came to me with an existing web application targeted at agency recruiters and their support teams. The problem the web app solves is that recruiters need to reformat candidates’ resumes into the same agency-branded template, and sometimes remove the candidate’s identifiable information. The status quo is manual: a lot of copy-paste from the candidate’s original resume over to the agency’s branded resume template and front-sheet template. Turns out that’s why recruiters who message you on LinkedIn also ask you to type up your resume in docx format (am I the only one who thought that was weird and annoying?)

Through the questions I asked the client, I discovered that their fully functional existing web application only had users who were friends and family, none of whom were active users. Their attempts at marketing had not been fruitful. As such, I identified the core problems as a lack of user retention and engagement, and therefore a low percentage of users training the ML models.

Creative Process

I communicated to the client that my approach involves first doing a redesign without any development constraints, followed by reeling the designs back in to iterate on their existing product, focusing on the dev work with the highest impact. As part of this process, I also emphasized that I wouldn’t look at their existing designs until a later point, so that they wouldn’t limit or bias my creative thinking.

As this was a specialized B2B use case that I didn’t fully understand, having never worked as a recruiter, I started with User Interviews as a means to gain a deep understanding of the problem the web app is trying to solve and of the motivations and goals of the target users.

User Interviews

The client, who was previously a recruiter and understood the market, naturally seemed confident that most recruiters think as he does and format resumes as he does. I wanted to see the status quo of how users solve the problem the client was trying to automate. I’ve found that with User Discovery Interviews, there are diminishing returns in new information gained after about the 5th interview from the same user segment. Below are the user interview questions I asked on the Zoom calls.

User Interview Questions and Why

I’d like to understand your workflow as a recruiter editing resumes. I will be asking you some questions and watching you perform some work over screen-share. For the questions I ask, I’m looking to understand your perspectives based on your personal experiences in the industry.

  1. Can you show me a typical end-to-end workflow of what you do with one job candidate, from receiving the email up until you submit it to a client? While you do this, please think out loud and narrate what you’re doing; I might interrupt with some questions. [Why? To observe the tasks that may be overlooked or under-reported by the recruiter because they’re so used to them. I watch for key points of frustration or boredom communicated by human error or changes in tone, rather than words, and dig deeper with questions about why they did certain things the way they did, and questions that lead to what’s important to them.]

  2. What is the most frustrating part of this process? [Why? To see whether what they verbalize as the most frustrating part matches my observation from the previous step, as it will reveal the deepest pain point.]

  3. Are there some parts of this process that you made up yourself, perhaps shortcuts, or something to be more efficient? [Why? If there are any such “hacks”, they usually make for a good product feature or selling point.]

  4. What do you enjoy most about this process? [Why? This is generally what motivates them to do their job and reveals their motivation, further enabling design-centered thinking.]

  5. What is your biggest hurdle in achieving success in this process? [Why? This is a higher-level question than #2 and allows for further design-centered thinking, like #4. Success in this context is important to discuss, since their status quo process is just a means to that end.]

  6. What are your title and responsibilities? [Why? To see if there are any correlations between such demographic data and the insights unique to each interview.]

Initial Sketches

Based on watching these users show me their process and express their motivations, I started sketching all of my ideas on how the process could be done better within the confines of a web application. It was particularly important to get these ideas out of my mind before moving on to the next step, where for the first time I would look at the client’s current design and user flow. These designs were never sent to the client; they were for me to reference at a later point (hence the rough presentation).

MLUXSketch.jpg

UX Audit

I have written extensively about my UX Audit process on this third party blog, so I won’t go into too many details in this post.

Since the problem was identified as user retention and engagement, without which the ML will be completely useless, I focused on a new user’s experience uploading their first resume, making corrections and downloading the ML output in the form of a reformatted resume (the happy path).

I approached it as if I was doing a user test on myself and annotated all of the parts of the process, considering important UX heuristics.

Here is a small set of the issues I pointed out in the UX audit:

SaaSUXAudit.png

Redesign

By going through the process as it was originally designed, I found myself constantly running into the question “What do I do next?”. To fix this, part of the redesign involved improving the user onboarding and navigation structure by fixing the visual hierarchy and limiting the choices the user has. A great analogy for this part is TurboTax: step-by-step, drawing attention to things only when the user needs them. As part of this, I proposed a very simple 3-step process: Upload > Edit Labels > Preview and Download.

MLUXRedesign.png

Machine Learning UX

The important part of designing a way for users to correct ML output, and to label data from scratch, is designing for the edge cases and finding a happy medium. After speaking with the team’s deep learning AI specialist (the ML guy), I identified the following cases to design for:

-ML model labels nothing and the user needs to label everything

-ML model labels a few things and the user needs to label everything else

-ML model labels everything and the user needs to check them and make corrections

Within my newly proposed 3-step navigation structure, this would all live in the Edit Labels section. For this section, a great analogy to communicate my thinking was Duolingo’s interface: after a user creates their first label, gamification and positive reinforcement would be extremely important to encourage subsequent labels.

Duolingo.jpg

As part of this, a user should not immediately see a rough, bare-bones template output (as was the case in the designs at the time); instead, they would need to label a certain amount before ‘unlocking’ a more satisfying, complete-looking output. Of course, as time went on and the ML model got better, they would automatically see good outputs. But to get there, the designs must first incentivize users to make labels.

I identified that the mechanism of the ML labeling interface would need to communicate the following:

-what things were labelled by ML 

-what to label (e.g. first name)

-how to label something

-how to correct a label (e.g. last name was labelled as first name)

-how to correct the highlighted text (e.g. “Arvand Alviri” is labelled as first name, but the user should be able to change the highlight to cover only “Arvand”, instead of removing the label and re-highlighting it from scratch)

-how to remove a label

UXmachinelearning.png
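To make the correction cases concrete, here’s a minimal sketch of how a label could be represented so that each operation above maps to a small, reversible edit. The representation and names (character offsets, a `source` field) are my own assumptions for illustration, not the client’s actual data model.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Label:
    field: str   # what this span is, e.g. "first_name"
    start: int   # character offset into the resume text
    end: int
    source: str  # "ml" or "user": who created or last touched the label

def correct_field(label: Label, new_field: str) -> Label:
    """Fix a mislabeled span (e.g. a last name tagged as first name)."""
    return replace(label, field=new_field, source="user")

def adjust_span(label: Label, start: int, end: int) -> Label:
    """Shrink or extend the highlighted text without re-creating the label."""
    return replace(label, start=start, end=end, source="user")

text = "Arvand Alviri"
ml_label = Label("first_name", 0, len(text), source="ml")  # ML over-highlighted
fixed = adjust_span(ml_label, 0, 6)                        # keep only "Arvand"
print(text[fixed.start:fixed.end])  # Arvand
```

Keeping the user’s corrections as new `source="user"` records is also exactly the training signal the ML model needs later.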

Final Steps: User Analytics to Correct Machine Learning Models

For the final steps, I gave the client a user analytics tracking plan that would help them make future product decisions, as well as gather extra data for the ML model. Things I suggested they track in their user analytics tool included:

Screen Shot 2020-09-16 at 2.43.32 PM.png

As an example, two of these events would help identify which labels are most commonly missed by the ML, and which are most commonly mislabeled by it.

Other user analytics events focused on user retention and engagement across the overall product as well.
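As a sketch of what those two events could look like in practice (the event and property names here are hypothetical, standing in for whatever the client’s analytics tool actually uses):

```python
from collections import Counter

events = []

def track(event: str, properties: dict) -> None:
    """Stand-in for an analytics SDK's track() call."""
    events.append((event, properties))

# Fired when the user creates a label the ML missed entirely:
track("label_added_manually", {"field": "first_name", "resume_id": "r_123"})

# Fired when the user changes the field type of an ML-created label:
track("label_corrected", {"from_field": "first_name",
                          "to_field": "last_name",
                          "resume_id": "r_123"})

# Aggregating per field surfaces which labels the model misses most often:
missed = Counter(p["field"] for e, p in events if e == "label_added_manually")
print(missed.most_common(1))  # [('first_name', 1)]
```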

Client LinkedIn recommendation


Why User Analytics Are Underutilized

11 years ago, when I was doing my first ever tech project (the one that turned into my startup, taught me everything, and started my tech career), I remember finding out about analytics tools and being blown away. “You mean you can find out exactly where users are dropping off, which features they use the most, and essentially gauge how happy they are with specific features?”

As I went on in my career, working at various startups as a UX Designer with Product Management duties, I was surprised to see how underutilized these analytics tools were. I thought, “oh, it’s because they’re just a scrappy startup”. The tools seemed like something every startup had installed because they’re supposed to, but never actually used. When I started working as a Product Manager with a Y Combinator team that had just been acquired by a bigger corporation, ‘scrappy startup’ was no longer an excuse in my mind for why user analytics were underutilized. In this case I realized it’s because, as one of a few Product Managers for a banking iPhone app, Android app, mobile web app, and web app, there just wasn’t enough time to balance all the agile meetings and feature requests from C-level executives, developers, the marketing team, and app store user feedback, and also make data-based product decisions.

For the last 5 years I’ve been working with various Fortune 500 companies and startups, and I’ve noticed that at best the story is “yes, we track analytics and everything is installed; yes, we want to be data driven; but no, we haven’t actually looked at our data in a way that guides product decisions”. Maybe I just haven’t come across a company with a culture of being truly data driven, but I have some theories as to why user analytics data is underutilized in product decision making.

  • Not enough bandwidth for product managers to analyze and make sense of data

  • Analytics tracking gets outdated with every new iteration and feature

  • Data doesn’t feel statistically significant

  • Too many questions on “how” to interpret the data

As part of my UX consulting, I have been conducting usability tests, and have found that a simple Net Promoter Score survey on each user test is an easy way to gauge improvement across iterations of the clickable prototype or product shown in the test. For example, when there is a 2-point jump with every 5 user tests, the latest iteration is almost certainly better.

So I came up with a hypothesis: what if products ran NPS surveys within various sections of their product, and even compared the NPS scores against each other? Since it’s one data point, not much bandwidth is needed for interpretation. Since NPS is closer to a qualitative signal than a precise metric, you don’t need much traffic before the score is directionally meaningful. Since it’s simple, “how” to interpret the data doesn’t become a point of argument.
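For reference, the NPS arithmetic itself is simple: the percentage of promoters (scores 9–10) minus the percentage of detractors (scores 0–6). A quick sketch comparing two in-product surveys (the section names and scores below are made up for illustration):

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

upload_scores = [10, 9, 8, 6, 10]  # survey shown in the Upload step
labels_scores = [7, 5, 9, 4, 6]    # survey shown in the Edit Labels step
print(nps(upload_scores), nps(labels_scores))  # 40 -40
```

With one number per section, the comparison is immediate: here the (made-up) Edit Labels step would be the obvious place to dig in.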

I looked online and found some solutions that provide this functionality, but found them too complex in terms of user experience or pricing, or they lacked the capability to measure NPS scores contextually.

So I built a basic minimum viable product to test this hypothesis: Userglee.com, for actionable Net Promoter Scores. There is a lot I can add to it and a lot of directions I can take it, but I’m currently looking for feedback on the bare-bones MVP to see if there is anything worth pursuing further.

How to Manage a Freelance User Experience Designer or Researcher

Having worked remotely as a User Experience Researcher and Designer for the last 5 years, I’ve had clients who have been able to utilize me to my best abilities, push me to improve, and even raise my own standards. I’ve also had clients who have barely given me the opportunity to show what I’m capable of. This post talks about how not to be the latter client.

Be strict with your business goals, but flexible in your design approach

Design is a tool to reach a business goal. Some clients want the designer to literally be anyone who can use Sketch or Adobe XD or Figma or whatever other tool, because the client just doesn’t have time to download it and click around. I avoid these clients at all costs. Figma, InVision, and Adobe XD are not skills; they’re tools, like a pencil, designed to communicate ideas. The ideas being communicated are what’s important. A UX/UI Designer or UX Researcher is someone who solves business problems by solving design problems, making software easier to use and more intuitive. It is their job to say yes to your business goal and to how your product vision will help you achieve it. It is also their job to question how you want to arrive there in the design implementation, and to present alternative, more intuitive solutions: solutions they’ve seen result in happier user experiences dozens of times before in their other work.

Set aside time to regularly communicate with the consultant

A great freelancer is mindful of their client’s limited time and aggregates all of their questions, concerns, and latest designs into a 1-hour meeting where everything can be addressed rapid-fire, which might lead to requests for side meetings about specific things that went unanswered and require other team members. But sometimes it’s really hard for the freelancer to get that 1-hour meeting in the first place. The frequency obviously varies by project, but in my experience, 1 to 2 hours per week for a 20-40 hour engagement with the right stakeholders in the client company is all it takes once a project or design sprint has kicked off. The design process is very iterative and requires client feedback from the right stakeholders. The right stakeholders vary with each client and project, but typically consist of industry experts within the company (for B2B or enterprise design), the CTO, or Product Owners: anyone who can say things such as:

-“How will this design address this other use case that we now realize is super important?”

-“We previously had a similar design for this specific piece, and found that it decreased user engagement because of x, y and z.”

-“It’s really important to also do user interviews with this other demographic.”

After the meetings, the designer should have enough direction to do a lot of work until questions and the need for feedback on new designs or new research pile up to the point where they’re blocked and literally cannot continue until they get the answers they need from the client.

At the beginning of a project or sprint, this meeting time requirement is much higher, because at the discovery stage the scope is being set for the design. There should be discussion about what problems are being solved, what the design goals are, why the designer was even hired, and how all of this achieves the business goals. A good designer will always push back on requirements and challenge the client’s ideas, which also adds to the time requirement.

Top Mistake of Designing a Minimum Viable Product

This bothers me a lot as a UX Designer; I see it from Fortune 500 companies, to venture-funded Silicon Valley startups, to some-guy-in-a-basement startups. The issue has to do with how new features and new products are defined. Regardless of the client’s experience building digital products (SaaS, mobile apps, web apps), the majority generally tend to screw this up. As a Product Design Consultant, I haven’t worked on one feature where I didn’t have to provide clarity to strip down and simplify the scope. The real issue is that the client tends to give little thought to the product scope (the amount and complexity of features) for the first version, how it feeds into the second version through user feedback and analytics data, and how that feeds into the third version. Whatever you think your Minimum Viable Product is, it can be stripped down much further.

I have a few theories as to why clients make this mistake. 

  • They’re too passionate and excited about the new feature or product, so there is a lot of attachment to the final product vision, which makes for emotional decision making

  • They have a really grand vision and they *think* they’ve stripped down the scope to the core, when they haven’t

  • They think they have to build everything because of a bureaucratic reason

  • They think the user won’t like a stripped-down feature or product, because they believe the real value is in the combination of all of those features being in one place

This mistake matters because building too much, too fast, creates a larger time frame between the starting point and the point of receiving positive feedback from users (qualitative feedback, product-market fit, user analytics data, or some KPI). That larger gap leads to running out of resources faster, whether that’s funding or the psychological motivation to continue.

I will use a couple of examples based on real clients I’ve worked with to explain my theory and how to go about the thinking. 

1- The client was building a SaaS analytics dashboard for an enterprise use case. As with any data visualization dashboard, the primary goal is giving the user an overview snapshot of the data that answers their underlying business question; the secondary goal is letting them dig deeper into specific data points to further uncover information that guides their business decisions. The first step is to build the primary overview dashboard with the snapshot 30,000 ft view of the data, optimize it in the first version, and then focus on iterations. The second step is to build the deep dive views. If there is no interaction with the overview because it’s deemed not interesting enough (because it answers the wrong business questions, ones the user doesn’t actually care about), then the deep dive views don’t matter; the user won’t dive deep into something irrelevant. The way to measure interaction with the snapshot view without building the deep dive views was to allow click actions on the overview data, but provide static deep dives rather than fancy interactive deep dives that lead to more deep dives. In short, the secondary view, which in this case was the analytics deep dives, had to be stripped down a lot.

2- The client was building a social application. “We need the ability for users to comment, and for other users to respond to these comments via text, voice, and video.” If the initial user doesn’t comment, the corresponding text, voice, and video response features are utterly useless and literally unusable in the UI. So we built the initial comments section and measured the interaction using an analytics tool, or sometimes manually tracked what percentage of users left the initial comment. The idea was that if that percentage was high enough, we would continue with the rest of the product hypothesis; if it was low, we would rethink the initial comment.
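The gating decision in this example boils down to one ratio against a threshold. A tiny sketch (the 10% target is a made-up number for illustration; the real threshold depends on the product):

```python
def should_build_replies(users_seen: int, users_commented: int,
                         target: float = 0.10) -> tuple[bool, float]:
    """Gate the reply features on whether the core interaction lands.

    Returns (go/no-go, observed initial-comment rate). The default
    target of 10% is a hypothetical threshold, not a benchmark.
    """
    rate = users_commented / users_seen
    return rate >= target, rate

go, rate = should_build_replies(400, 36)
print(go, f"{rate:.0%}")  # False 9%
```

If the answer is no-go, the work is to rethink the initial comment, not to build text, voice, and video replies on top of it.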

In short, design the core, the core-core, like the actual core. Then strip it down further, and build it. And measure it, optimize it. And once it reaches a certain level, then you can start to build the rest; otherwise “the rest” will never be used, and won’t be given a fair chance to be measured.