Teachable Machine Outlier Filter
User Research
PROJECT SOURCE
05-685 Prototype Algorithmic Experience
A prototyping exercise exploring how to teach Teachable Machine users to identify and clean outliers in a training dataset by introducing a human-in-the-loop method
COLLABORATORS
Individual
PROJECT YEAR
2022
Problem Statement
OUTLIER PROBLEM
TEACHABLE MACHINE DESIGN ARGUMENT
DESIGN IMPLICATION
Large datasets with unexpected outliers: hard to find, but they may have a large effect on prediction results.
Webcam intrusions during the collection process: consecutive capture makes removal difficult.
Human-introduced diversity: humans can't precisely control how many outliers they insert, which may reduce prediction accuracy.
Teachable Machine takes all input as training data but has no ability to recognize outliers in the dataset.
Prediction details are buried too deep for users without an ML background to understand easily.
Users can recognize outliers if we introduce a human-in-the-loop methodology.
Simple prediction details with proper explanation can help users see a potential outlier's influence.
RESEARCH GOALS
How might we give feedback that lets users learn how outliers affect the accuracy of the trained model, so that they can provide higher-quality training samples?
- (RQ1): How could the interface alert people that they might have accidentally introduced outliers into the training dataset?
- (RQ2): How could the interface guide users to effectively filter the outlier suggestions from the Teachable Machine algorithm?
PROTOTYPE DESIGN
Initial Solution 01: Human supervised classification
SOLUTION 1
Initial Solution 02: Passive outlier identification and correction
SOLUTION 2
Final solution: "Alert" Interface and "Outlier filter" Interface.
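To make the "Outlier filter" suggestions concrete, the sketch below shows one possible heuristic: flag images whose feature vectors sit unusually far from their class centroid. Teachable Machine does not expose such a step, so the feature extraction, the `suggest_outliers` helper, and the z-score threshold are illustrative assumptions rather than the tool's actual method.

```python
# Minimal sketch of an outlier-suggestion heuristic (not Teachable Machine's
# actual implementation). Assumes each training image has already been turned
# into a feature vector, e.g. by a pretrained image encoder.
import numpy as np

def suggest_outliers(features_by_class, z_threshold=2.5):
    """Flag samples whose distance to their class centroid is unusually large.

    features_by_class: dict mapping class name -> array of shape (n, d)
    z_threshold: hypothetical cutoff in standard deviations (assumption)
    Returns: dict mapping class name -> list of flagged sample indices
    """
    suggestions = {}
    for label, feats in features_by_class.items():
        feats = np.asarray(feats, dtype=float)
        centroid = feats.mean(axis=0)
        dists = np.linalg.norm(feats - centroid, axis=1)
        # A sample is suggested as an outlier if its distance is far above
        # the class average (z-score heuristic).
        z = (dists - dists.mean()) / (dists.std() + 1e-8)
        suggestions[label] = np.where(z > z_threshold)[0].tolist()
    return suggestions

# Example with random stand-in features for two classes.
rng = np.random.default_rng(0)
demo = {
    "class_a": rng.normal(0, 1, size=(50, 16)),
    "class_b": rng.normal(5, 1, size=(50, 16)),
}
demo["class_a"][0] += 10  # simulate a webcam-intruder style outlier
print(suggest_outliers(demo))  # e.g. {'class_a': [0], 'class_b': []}
```

The flagged indices would then populate the outlier checklist, leaving the final keep-or-remove decision to the user.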
PROTOTYPING SESSION
I tested the prototype with 5 participants: 1 from a CS background, 2 from a design background (architecture), and 2 from interdisciplinary backgrounds (HCI & cognitive science/management). Most of them had basic knowledge of machine learning or statistics.
Round 1: Tutorial (Control Group)
Round 2: Train new model with human-filtered training set
Round 3:  Apply new model with new training set
Let participants run TM and observe how the current dataset performs when classifying the designated samples.
Have participants choose their preferred alert system from the two alternatives; the chosen one serves as the test prototype of the outlier filter interface in the following rounds.
Let participants optimize the training set manually by removing the suggested outliers in the checklist.
Run TM again on the new dataset and observe how the accuracy changes while keeping the outlier pattern in mind.
Let participants manually filter a new dataset based on the learned outlier pattern from Round 2, run TM again and observe how the accuracy changes.
After-session interview and questionnaire
paper prototype for prototyping
checklist for human-in-the-loop
Paper prototype for alert system interface
Outlier checklist
(Red: suggested, Cross: participants' decision)
Quantitative Analysis
The chart below visualizes the prediction accuracy on the sample images across the 3 rounds for the 5 participants. The suggested (model-filtered) round has the highest accuracy, but compared with the initial round, the manually filtered round also improved significantly, likely because participants learned the outlier pattern from the suggestion function.
accuracy result in 3 round prototyping
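For reference, a grouped bar chart like this one can be produced with a short script such as the sketch below; the accuracy values in it are placeholders for illustration, not the measured results.

```python
# Sketch of how the per-round accuracy comparison could be plotted.
# The values below are placeholders, not the actual study data.
import matplotlib.pyplot as plt
import numpy as np

participants = ["P1", "P2", "P3", "P4", "P5"]
rounds = {
    "Round 1: initial": [0.60, 0.55, 0.65, 0.50, 0.60],
    "Round 2: model-filtered (suggested)": [0.90, 0.85, 0.90, 0.80, 0.88],
    "Round 3: manually filtered": [0.80, 0.78, 0.85, 0.75, 0.82],
}

x = np.arange(len(participants))
width = 0.25
fig, ax = plt.subplots()
for i, (label, acc) in enumerate(rounds.items()):
    # Offset each round's bars so the three rounds sit side by side.
    ax.bar(x + (i - 1) * width, acc, width, label=label)
ax.set_xticks(x)
ax.set_xticklabels(participants)
ax.set_ylabel("Prediction accuracy")
ax.legend()
plt.show()
```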
Qualitative Analysis
The survey and interview after the prototyping session gave me information beyond prediction accuracy: participants expressed their findings, feelings, and suggestions about the process, which made it possible to synthesize design implications.
Interface Design
For the alert interface, participants preferred the second alternative because:

- They can see the accuracy from it and understand the intention of deleting outliers.
- It encourages them to think about further optimization for better performance.

For the outlier filter interface, participants attributed its better performance to the simple and clear design.
Other Findings
Participants expressed stronger curiosity about Teachable Machine's mechanism than expected. While they were willing to learn more about it, they also tried to keep a certain distance from the algorithm. The Round 3 results disappointed participants with professional backgrounds, even though they had been confident about their judgments in Round 2.
Insights & Implications
The “alert” interface stimulates users' desire to learn what affects their prediction performance.

The “outlier filter” interface provides a possible paradigm for users to learn about how to distinguish outliers and what effect they have on the model.
NEXT STEP
More Research Questions
Participants' feedback, especially their points of confusion, also raised some interesting questions for further speculation, including:

        - Why does Google put “under the hood” into the advanced functions?

        - How to balance human bias (users’ choice) and ML model bias (algorithm bias)?

        - How well could the filter system perform on unpredictable new datasets (generalization)?
Interface Iteration Suggestion
They also provided valuable suggestions for further interface iteration, including:
How to view the dataset:

        - Row layout (integrated into the current TM upload section)

        - Page view (added behind the upload section)

How to identify the marked outliers:

        - In-site marking

        - Outliers section for all groups

        - Outliers section for each group

How to backtrack:

        - Archive section (backup the best-performance filtered dataset)

The Challenge
Bias brings users convenience through accurate recommendations while potentially harming them in other ways. How do harmful biases in TikTok's algorithmic systems affect user experience?
Problem Statement - Usability test on TikTok's ads
We tested the usability of current TikTok functionalities to identify bias issues among users.
1
Racial Bias
2
Gender Bias
3
Brand Bias
4
Need Bias
5
Negative Action to Biased Ads
TikTok only recommended posts about users' own race to experienced participants but recommended posts about other races to naive participants.

Some participants swiped away posts from other races because they felt those posts didn't fit their lifestyle.
The male participants got advertisements about sports shoes, games, and sexual content, while the female participants got advertisements about cosmetics and shopping malls.
TikTok only recommended certain brands of products to participants, which in our test included electronic devices and food.
Participants kept receiving ads for products they didn't want or need to buy.

Even though participants pressed "Not interested" on these items, TikTok still recommended them.
Most participants only stayed on biased advertisements for a short time before swiping them away.

A few of them tried marking the first few biased advertisements they saw as "Not interested" but gave up after a while.
RESEARCH GOALS
How might we support users by personalizing their advertisement settings while facing potential algorithm bias?
Qualitative Analysis
We conducted 3 rounds of interviews with 3 participants: a think-aloud protocol, a pilot testing interview, and a semi-structured interview with a direct-storytelling method.

We collected interpretation notes from those interviews, analyzed them through an affinity diagram and a user journey map, and generated higher-level insights. Here are a few questions that we paid the most attention to:
"How do you feel about the advertisement in TikTok regarding algorithm bias?"
"How do you feel about the current advertisement setting user flow? You can share both the positive and negative views with us."
"Are there more functionalities about advertisement you wish TikTok to add in the future? Why? How do you think this can contribute to deal with the algorithm bias and make the algorithm suit your needs?"
"Is there any concern for the possible functionality you just mentioned?"
"What improvement would you like us to do to the advertisement setting interface/user flow?"
Data analysis and synthesis - Interpretation Session + Affinity Diagram
Through the interpretation sessions, we analyzed users' needs, motivations, and behaviors, then grouped and labeled them in an affinity diagram, synthesizing the insights from a first-person perspective. Users described their preferences, activities, and findings in TikTok, and shared their needs, suggestions, complaints, and concerns.
affinity diagram
Models - User Journey Map
We built a model, a user journey map, to help us better summarize and understand users' stances from the interviews. It bridges users' opinions and the design implications explored in the following speed dating session.
Quantitative Analysis
We also conducted a Google Forms survey with 13 questions in 3 categories (app features, advertisements, and purchase behaviors) to verify our preliminary insights from the qualitative research. We received 32 responses, and these results helped us iterate on our insights and the following speed dating session.
Most of our participants are between 22 and 25 years old and use TikTok less than an hour every day.
The survey confirms our interview findings about the current issues with ad settings and people's strong need for a tutorial: the ad settings should be easier to find, and the interactive hints are popular among respondents.
The survey verifies that people encounter biased ads on TikTok, sometimes without realizing it, ranging from buying products from ads (good bias) to receiving inappropriate or fake ads (harmful bias).
The survey reveals respondents' interest in recommended ads when they use TikTok to purchase items.
Insights
1
Good and bad bias
2
Desire to mitigate bias
3
Diverse ad preference setting
4
Need for improved navigation
5
Seamless function integration
Users enjoy advertisements with good bias and acknowledge bias-related issues, even though they sometimes do not notice them.
Personalizing categories of interest reflects users' desire to mitigate bias by controlling the types of advertisements they encounter.
Users welcome diverse advertisement preference-setting mechanisms but prefer simple and intuitive ones.
Deep nested ad-related operations and settings create a cumbersome personalization experience for users, leading to a need for improved navigation.
Seamless experience when using TikTok requires the integration of personalization features and existing functionalities.
Low-Fidelity Prototype
Speed Dating
We ran speed dating sessions on our insights to explore possible futures, validate needs, and identify risk factors.
The first concept resonated with most participants.
speed dating 01
- Most participants express interest in real-time feedback from TikTok.
- Users prefer simple and intuitive functionalities.
- Users prefer simple interactions on their TikTok “for you” page.
speed dating 02
- We expected users to consider adding a plugin to eliminate ad biases, but most regarded it as too complex and unnecessary.
speed dating 03
- Although the preference-sharing idea surprised users, most of them felt uneasy about it.
- It triggered their unease about privacy and their daily social interaction patterns.
LOW-FIDELITY PROTOTYPE
Building on the initial speed dating concept, we conducted a contextual prototyping session using our lo-fi prototype with 7 participants. During these sessions, we identified three critical moments in their user experience: the moment they open TikTok, when they scroll consecutively, and when they watch an ad to the end multiple times. To further explore these moments, we created a physical overlay for our interactive hints and tested them in real-life environments where people typically use TikTok, directly on participants’ phones.
physical prototype
scenario 01
scenario 02
scenario 03
Evaluation
The overall feedback from the prototyping session was positive; our solution noticeably increased users' satisfaction. However, there were some flaws in the design of the physical prototyping process, which affected workflow consistency. Considering the final delivery method, the positive feedback weighed more heavily in our final decision.
Final Design Prototype
Here we present our final design solution for biased ad personalization. Since our solution takes the form of an extension, we reuse components and styles from the current TikTok UI and integrate the extension into the existing user flow.
Scenario 1: Tutorial
We’ve moved the deeply nested "Not Interested" button to the right-hand column, positioned next to the "Like" button. When users open TikTok for the first time, the extension will display an overlay tutorial explaining its functionality.
Scenario 2: Interactive Hints
As users continue scrolling through videos, interactive hints will appear. Depending on their choice, the extension will respond differently by either adjusting recommendations (showing fewer or more of similar content) or navigating to the report page.
Scenario 3: Personalization
If users continue watching an ad without taking any action several times, the extension will prompt them to specify their preferences and guide them to the ad personalization page. Here, users can customize their preferences by selecting personalized tags, giving them more control over the ads they see.
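As a rough illustration of the trigger logic behind these three scenarios, the sketch below shows one way the extension could decide when to show the tutorial, an interactive hint, or the personalization prompt. The event names and thresholds are illustrative assumptions; the shipped extension would run in the browser rather than in Python.

```python
# Illustrative sketch of the extension's trigger logic. The thresholds and
# event names are assumptions, mapped to the three critical moments found in
# the lo-fi prototyping sessions.

class HintController:
    def __init__(self, scroll_threshold=10, rewatch_threshold=3):
        self.first_open = True
        self.consecutive_scrolls = 0
        self.ad_full_watches = 0
        self.scroll_threshold = scroll_threshold    # hypothetical value
        self.rewatch_threshold = rewatch_threshold  # hypothetical value

    def on_app_open(self):
        # Scenario 1: show the overlay tutorial on first launch.
        if self.first_open:
            self.first_open = False
            return "show_tutorial"
        return None

    def on_scroll(self):
        # Scenario 2: after many consecutive scrolls, surface an interactive hint.
        self.consecutive_scrolls += 1
        if self.consecutive_scrolls >= self.scroll_threshold:
            self.consecutive_scrolls = 0
            return "show_interactive_hint"
        return None

    def on_ad_watched_to_end(self):
        # Scenario 3: repeated full ad views trigger the personalization prompt.
        self.ad_full_watches += 1
        if self.ad_full_watches >= self.rewatch_threshold:
            self.ad_full_watches = 0
            return "open_ad_personalization"
        return None

controller = HintController()
print(controller.on_app_open())                         # show_tutorial
print([controller.on_scroll() for _ in range(10)][-1])  # show_interactive_hint
```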
Ideation
The dining experience is shaped not only by the food or the rituals around it, but also by the personal memories it evokes. Even an ordinary meal with family, friends, or loved ones holds deeper meaning. We wonder: What if we could capture these moments in a tangible form and share them with others?

Dining connects us to memories and sparks ideas of food, culture, and the stories that unfold around the table. Inspired by the art pieces shown, a tablecloth could serve as a canvas for documenting our meals: where we sit, what we eat, what we do, and who we share these moments with. This medium, simple to fold, pack, and gift, becomes a meaningful invitation. Our friends who receive it are invited to join us as guests to celebrate these intimate and cherished moments of our lives.
case study
PROCESS & DEVELOPMENT
We began by narrowing down to three ideas from our precedent study.

- On-the-go fine dining experience
- Using projection to enhance dining experience
- Designing utensils to simulate conditions that hinder eating to build empathy.

The final idea ended up being a combination of aspects from the first two: designing and creating a tablecloth that would transform to fit different table sizes and arrangements. We wanted to bring in the elements of cultural exchange we explored in the second idea by drawing out how we eat back home with our families.
ideation 01
ideation 02
Our first prototype was a pill-shaped table linen that would cover the entire surface of the dining table. Placements for plates were marked on the linen to dictate the position of the plates on the mat. We sewed on buttons and drew fold lines, allowing the user to fold the linen to serve a variety of arrangements and numbers of guests. The paper prototype aimed to achieve a similar result, but in a rectangular form.
prototype 1 01
prototype 1 02
We discovered, however, that in the process of creating these foldable linens, we lost our original vision for the project: to share my own personal dining experience with other people. The demarcations of plate placements made it sterile and generic. We explored using different types of fabric to allow varied arrangements, but we ended up backtracking to our very first idea. Instead of creating a table mat that could conform to any table size, we decided to simply create one table mat, designed around one person and one particular dining experience.
There are areas demarcated by different colors of cloth representing family members' characteristics, with names attached to each plate. We also added placements for phones and tablets, with QR codes linking to the show or app they would often watch or use while eating. The compass in the corner is less about the exact position of the table mat relative to the Earth's magnetic field and more about indicating where the TV, the center of the family dinner ritual, should sit relative to the table. There is a flexible area for shared dishes defined by cute illustrations.
pattern design 01
The final product is a true collaboration, with every team member contributing to each stage—from sewing and pattern design to painting. Together, we have worked to preserve these memories, as if we ourselves were part of the family.
fabrication process 01
fabrication process 02
FINAL PRODUCT
The final tablecloth mat has different colored fabric stitched on, representing different family members. This particular mat represents Brie's family. The southeast corner is reserved for her brother, who often eats away from the family, in his own room; the corner is therefore detachable and can be reattached via buttons. Those receiving the gift can also scan the QR codes on the iPad area and the cloth edge to see what the gift senders see during the meal and to get a digital version of the manual.
final product
Additionally, we made a manual to illustrate the dining experience on the tablecloth, including an introduction to the family, the legend for dishes, utensils, and accessories, and storyboards of the everyday dining routine. We encourage those who receive this gift to experience life on the cloth immersively by imitating the depicted dining settings.
Coming soon :)
THE PROJECT
An AI-driven web-based plugin that provides better communication between users and generative AI tools
DISCOVER & UNDERSTAND
Our journey starts from the questions below: As a designer, have you ever used generative AI tools in your design process? And have you ever felt misunderstood by them?

We collected opinions from designers across different disciplines and integrated them into an affinity diagram. The takeaways from this snapshot of the status quo guided us to our design space: human-AI communication.
surveyinsights
DEFINE QUESTION
How might we ease the process of communication between designers and generative AI tools by fostering mutual understanding?
IMAGINE
Our proposal is a web-based plugin with language-processing ability. Users label the results after comparing the current output with what they had in mind, and iterate on the suggestions in the plugin. This human-in-the-loop method gives users their preferred results by reaching consensus between the user and the model.
workflow
RAPID PROTOTYPING
init
Feature 1: User Domain Settings (web-based plugin)

Once the plugin is installed and the user is logged in, the interface prompts the user to select their design domain; the corresponding parameters are then applied to the plugin's image generation and suggestion-making processes.
suggestion
Feature 2: User-oriented Image Classification / Generation (web-based plugin)

Once the user begins a text-based image generation process, the plugin provides tailored, user-friendly image labels, which the user can revise based on their own interpretation of the content.
iteration
Feature 2: User-oriented Image Classification / Generation (web-based plugin)

Image regeneration based on the revised classification labels helps users acquire updated content more precisely, while helping the algorithm learn the user's domain knowledge and patterns, providing data for the algorithm to improve its performance over time.
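A simplified sketch of this feedback loop appears below. The `generate_images` and `extract_labels` functions are hypothetical stand-ins, since the real plugin would depend on the specific generative AI tool it wraps.

```python
# Simplified sketch of the human-in-the-loop label revision cycle.
# `generate_images` and `extract_labels` are hypothetical placeholders for the
# wrapped text-to-image tool and the plugin's language-processing step.

def generate_images(prompt):
    """Placeholder for a call to the underlying generative AI tool."""
    return [f"image generated from: {prompt}"]

def extract_labels(prompt, domain):
    """Placeholder: derive user-friendly labels from the prompt and domain."""
    return {"style": "minimal", "subject": prompt, "domain": domain}

def regenerate_with_revisions(prompt, domain, user_revisions):
    """One iteration of the loop: user-revised labels update the prompt."""
    labels = extract_labels(prompt, domain)
    labels.update(user_revisions)  # the user corrects the plugin's labels
    revised_prompt = ", ".join(f"{k}: {v}" for k, v in labels.items())
    return generate_images(revised_prompt), labels

# Example: an architecture designer revises the inferred "style" label.
images, labels = regenerate_with_revisions(
    prompt="a community library facade",
    domain="architecture",
    user_revisions={"style": "brutalist"},
)
print(labels)
print(images)
```

In the envisioned plugin, the accumulated revisions would also feed the user profile described in Feature 3.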
personalization
Feature 3: User Profile Settings (dedicated web interface)

The web-based plugin also provides a standalone website that allows the user to retrospectively review domains and labels associated with their profile, recycling previously removed parameters while gaining more insights about their interaction patterns.
PRECEDENT STUDY
We wanted to build an object and experience that would embody what it is like to be underground. We explored dioramas as a possible final product, since we wanted to design an experience but also build a physical prototype. We took heavy inspiration from underground ant colony dioramas.
precedents
PROCESS & DEVELOPMENT
Our initial sketch consisted of three levels: underground, ground, and canopy. Both the canopy and the ground level would have holes cut through the base, which would allow us to drip water through to simulate rainfall. The trees we built would play a vital part, not just in terms of aesthetics but also in terms of the experience itself. We wanted to design it so that the water falling from the canopy would drip down and saturate the tree and its roots, carrying the water all the way down to the underground level. We also designed a staircase inviting guests to imagine what it would be like to go down into a cave, covered with roots, soil, and the countless creatures hidden underneath.
sketch
For the underground level, we built the ‘cave’ with black aluminum foil, poured soil around the edges, and punched holes in the ceiling through which we snaked tube cleaners. These would carry the water from above ground to below ground once saturated. We also cut a large hole in the front panel, inviting guests to reach in and touch the soil as the water drops from the cave’s ceiling. We wrapped the black foil over and around the opening in the center to help keep the soil contained and sealed. The cotton was threaded through thin copper wires hung over the two side walls.
interaction
FINAL PRODUCT
The final model is completed with organic soil and man-made trees and clouds. The environment comes to life in our film as water is sprinkled over the clouds and seeps into the soil through the tube cleaners. A play on both the Wildcard and Synthesis, this experience allows one to feel the senses of the Earth 6 ft under.
final 01
final 02
final 04
final 03
IDEATION
For this workshop, I chose my conversation history with Alexa as my small dataset. As a graduate student living on a tight budget, I shared an old house with three roommates, with no smart home devices installed. My primary interactions with Alexa were limited to playing music and setting alarms to wake me up in the morning. Despite living with the device for nearly a year, the data it contained was surprisingly sparse.

This workshop provided an opportunity to reflect on my relationship with Alexa, a companion that silently shared countless days and nights with me. The day before the workshop, I downloaded my interaction data from Amazon and formatted it into three columns: content, date, and time.

During the session, I started by reviewing the most recent conversations. To my surprise, I discovered several interesting, and at times, puzzling, exchanges that had slipped past my attention. As a non-native English speaker, I noticed how miscommunications occurred between Alexa and me. Alexa, being far less sophisticated than today’s advanced language models, added another layer of complexity. These misunderstandings and the relationship we built inspired me as I worked on this zine.

I categorized the conversations into three themes:

Type 1: Correct recognition: lucky me :)

Type 2: Alexa's fault: Due to its limited speech recognition ability, Alexa can't read ambiguous commands beyond its skill set and learning algorithm. In this case, it can't parse the command and shows:

        "Audio could not be understood."

But what happens more frequently is that it does parse what I say, just incorrectly.

Type 3: My fault: Voice recording only works while I press the recording button, but my English skills can't always help me figure out what to say in such a short window, so it often captures nothing, which is shown as:

        "No text stored."

Sometimes I would also overestimate its ability, so when I said something it apparently couldn't recognize, this is what it thought:

        "Audio was not intended for this device."


By examining the data within these categories, I naturally became curious about when these conversations occurred, how their frequency evolved over time, and how both Alexa's and my attitudes and impressions shifted after each interaction. This curiosity led me to focus on two key aspects for my data visualization:

1. A timeline of conversations within each category.
2. A trend map illustrating patterns across these categories.

However, diving into quantitative analysis too early felt restrictive to my creative process. Instead, I took a step back and imagined these trends in my mind, carefully reviewing the dataset and jotting down notes. Through this process, I found myself working as a detective: each conversation’s content and timestamp brought up vivid memories of what I was doing at that moment. These emerging stories felt so powerful and alive that I decided to represent this experience as an interaction within the zine itself.
data before
data after
PROGRESS
With a clear purpose in mind, I began prototyping. To best represent a timeline, I decided on a long diagram, which naturally lent itself to a physical form: a multi-folded map. A set of memory cards served as keys, guiding viewers to pinpoint when the stories on the cards occurred. For the comparison between myself and Alexa, I designed a two-direction slider—an intuitive medium for examining the relationship between two entities. This slider also works as a protective cover for the delicate map. With these goals in mind, the physical form took shape smoothly and naturally.

Next, I moved on to visualizing the data using Python. To enhance readability, I focused on refining the design of the legends, selecting thoughtful color schemes, and optimizing the overall layout.
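As an illustration of this step, a script along the lines of the sketch below could draw the per-category timeline from the three-column export. The file name, column names, and category keywords follow the description above, but the exact values are assumptions rather than my final script.

```python
# Sketch of the per-category conversation timeline. The CSV name, the columns
# (content, date, time), and the category rules follow the description above;
# the exact keywords are assumptions for illustration.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("alexa_history.csv")  # assumed export: content, date, time
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])

def categorize(content):
    # Map each record to one of the three themes described above.
    if "Audio could not be understood" in content:
        return "Alexa's fault"
    if "No text stored" in content or "Audio was not intended for this device" in content:
        return "My fault"
    return "Correct recognition"

df["category"] = df["content"].apply(categorize)

# One horizontal strip per category, with a marker per conversation.
fig, ax = plt.subplots(figsize=(10, 3))
categories = ["Correct recognition", "Alexa's fault", "My fault"]
for y, cat in enumerate(categories):
    subset = df[df["category"] == cat]
    ax.scatter(subset["timestamp"], [y] * len(subset), s=10, label=cat)
ax.set_yticks(range(len(categories)))
ax.set_yticklabels(categories)
ax.set_xlabel("Date")
plt.tight_layout()
plt.show()
```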
proto 02 - 1
proto 02 -2
First Functional Prototype
python
ai
Data Visualization and Processing
proto 03 - 1
proto 03 -2
Second Prototype: Test for Iteration
FINAL PRODUCT
The final product is a high-fidelity prototype of the zine described above. The colors represent the two entities, me and Alexa, and also subtly reflect the passage of time throughout the day (day and night). The cover is designed as a ruler, with key attitude shifts marked along it. Readers are invited to explore these shifts by locating them on the map, creating an engaging, self-guided discovery experience.
final overview
final 01
final 02
takeout
map
me
front
back
alexa
REFLECTION
Due to time constraints, I didn’t have the opportunity to iterate further on the prototype, which is a bit disappointing. Additionally, I didn’t receive much feedback from the audience during the showcase. Reflecting on my work, I realized that trying to find a cohesive story in the chart might have been somewhat futile, as the chart itself didn’t provide any additional insights. Similarly, the trend map didn’t reveal the changes I had anticipated before visualizing the data.

It seems that my relationship with Alexa hasn’t evolved in the way I had imagined. Alexa hasn’t grown better at understanding me, nor have I become more adept at interacting with it. Moreover, the dataset from Alexa doesn’t include its own feedback, so the conversation is ultimately one-sided: focused solely on my questions. Although I recalled some amusing responses from Alexa, I couldn’t represent them due to the lack of corresponding data.

Overall, the workshop was a valuable opportunity for me to reflect on my relationship with personal artifacts and to approach them with an interactive mindset. Despite the limitations, it allowed me to think critically about these interactions in a new and engaging way.