Over 600 subscribers
Building With Public: This is part five of a five-part series of how I've been incorporating student voices and perspectives into the early product conceptualization for my new learning app, and why the age of AI demands that we not only build in public, but build with the public.
Series Intro - Building WITH the Public
Case study 1 - Co-designing a meerkat with high school artists
Case study 2 - Battle testing startup economics with grad students
Case study 3 - Getting elementary-aged learners to evaluate AI
In this final case study, you'll see what happens when you build with an inclusion-first mindset, inviting neurodiverse young adults to share what doesn’t work about a piece of technology.
A few months ago, I started to notice that one of the benefits (and perils) of sharing half-finished software with a broader audience is that it can sometimes encourage bad actors to intentionally sabotage it. At the time, I was in the midst of converting a web app into a mobile app with the help of a developer friend, but I knew my prompt engineering had a long way to go before I could definitively hone in on a particular ideal customer profile (ICP) or niche.
I started to run small-batch cohorts tests with parents and their kids at museums and parks on weekends, but the sessions were too few and far between to extract meaningful data and trends. I also started to hear about new audience groups, such as parents of neurodivergent learners, whose kids might interpret the world a little bit differently. But I didn’t have a lot of exposure or access to that community on my own.
When I got the app live in TestFlight in early April, one thing that happened was that parents would install the app on their phones, then use it right away, whether or not their kid was present. They’d take a picture of something random (maybe, a photo on their desk, or a coffee mug on the kitchen table), listen to Miko’s story, and then share feedback about how it failed to generate meaningful learning data for their child.
While certainly useful feedback (particularly about how I need to optimize the overall user onboarding flow), I noticed that a lot of these parents never invited their child into the conversation at all. In other words: My user tests never made it to the end user.
This was a pretty big problem. I knew I needed a lot more user testing in order to smooth out some of the rough edges. And I knew I needed to engage with younger audiences. But I didn’t have a good way of reaching those groups on my own. That’s when Tech Kids Unlimited entered the picture.
Tech Kids Unlimited is a NYC-based nonprofit that empowers neurodiverse students through tech-based, social-emotional learning programs and work-readiness skills. The TKU Digital Agency is an initiative that invites neurodiverse young adults to partner directly with technology companies on services like web design, graphic design, content creation, and product feedback.
As soon as I learned about this program, I applied to participate with a cohort of students to help me run quality control on my app, MuseKat.
Over the course of 8 weeks, we set out to accomplish three main objectives:
User Feedback: Capture real-time, unfiltered feedback, from neurodiverse learners about the problem I was potentially solving for a younger version of themselves.
QA Testing: Learn more about the parameters where MuseKat performed better and worse, in real-world contexts (including what breaks the app).
Unmet Needs: Understand more about the point of view of something who learns a little bit differently and start to understand the viability of building something more robust for this audience in the future.
That first week, I gave everyone in the program early access to a specialized group in TestFlight just for beta testers and asked them to try MuseKat. I noticed some students scanned artifacts of cultural or scientific interest, whereas others took pictures of Pokemon or other well-known pop culture characters.
For our next session, I asked the TKU Agency collective to critique Miko’s readouts against several parameters to help me come up with a more fully formed hypothesis about where it excels vs. where it falls short.
Specifically, I wanted to know:
How does it perform across multiple child ages (e.g., 3, 5, 8, 10, 12?)
What bugs do you encounter? When do these happen?
Does Miko ever say anything confusing or unclear?
I received quite a bit of qualitative feedback from the students, which is what helped me hone in on an age range target of 5-10 year old learners (and their parents), at least for this app’s current iteration. But the students also noted things like misidentified characters, hallucinations, one unexpected JSON errors in the text readouts.
One day when reading the real-time logs of Miko’s responses, I noticed a couple of surprising responses.
“Chào bạn! Miko đây - chào mừng bạn đến với bảo tàng lịch sử tự nhiên! Hãy cùng khám phá nào! Wow, bức tranh này trông giống như một cánh đồng nắng rực rỡ mà bạn có thể chạy nhảy trong đó! Nó đầy màu sắc và sống động, như thể sự vui tươi đang nhảy múa trên bề mặt! Thực ra, tranh này giúp chúng ta thấy vẻ đẹp của thiên nhiên trong quá khứ, khi mọi thứ đều tươi mới và hoang dã. Chú Miko có một người bạn tên là Flaco, một con cú rất dũng cảm, nó sẽ thích ngắm nhìn cảnh vật từ trên cao như thế này! Nếu bạn có thể bước vào bức tranh này, bạn sẽ khám phá được những điều thú vị nào?”
As it turned out, one of the students figured out how to get Miko to speak Vietnamese. (Apparently, the audio worked too!) The next day, I saw something equally concerning: A check-in at a location that was decidedly not G-rated. (A good reminder to put some parameters on the app for kid-appropriate content.)
In the end, this student agency collective approached the evaluation process in a totally different way than I would have thought to do on my own. They scanned stuff in their own houses, in Vietnamese grocery stores, and pop culture characters. They figured out how to “prompt inject” Miko to do things that a curious little meerkat definitely should not be doing, and they shared ideas for how to make the experience better and more useful for themselves today, and a younger version of themselves.
During the final week of the program, the TKU agency interns shared several pages of qualitative feedback and bugs. Ultimately, that inspired me to work with AI to categorize the major themes from their work and come up with an evaluation framework to critique the Miko responses in real time.
The weekend after the internship concluded, I made an AI evaluation dashboard that compares how well the photos and age parameters of any individual “Miko scan” compares to the generated description. Based on the student feedback, I created a list of parameters that mattered to me, such as how age-appropriate the content was, and how well the description matched the scanned images.
On our final day of the TKU Agency, both the students and the teacher reflected back on how much fun everyone had in trying to bust through the boundaries of my app over the past 6 weeks. (Turns out, it’s fun to break things...)
This isn’t unique to this group. Every time I’ve invited a new student group into the development process, the exchange has been mutual. Students learn more about how AI products are built, and also, start to form critical opinions about “good” vs. “bad” AI-generated outputs. Meanwhile, as a creator, I get real-time feedback from people who are much more representative of my ideal end user.
This experience didn’t just help improve MuseKat. It changed how I think about product development in the age of AI. To me, this means structuring build cycles to not just invite user feedback but co-creation and partnership. To avoid building in silos of hiding data decisions behind black boxes or complicated algorithms, and instead offer transparency into the generative process of AI outputs. Of being well aware of the ethical, safety, and privacy concerns, and really listening to those concerns with empathy and integrity.
There are still too many more questions than answers about where all of this will go. But all of these experiences have crystallized this thesis for me:
Interested in Being a Design Partner for Phase 2?
Getting from 0 to 1 with MuseKat has taught me a lot about what students expect to hear from AI-powered solutions, and now I've got a few better ideas for what's possible in future iterations. But I need your help. If you work with elementary-aged learners and are interested in being a design partner for phase two, let's talk.
Bethany Crystal
Wrapping up this week's mini-series, "Building WITH the Public" with one final case study This spring, I partnered with Tech Kids Unlimited, a NYC based nonprofit that works with neurodiverse young adults, to help test the viability of the AI-generated outputs from my learning app, MuseKat I ended up receiving a ton of real feedback from students ranging from 16-22, which I eventually codified into creating a more automated "evals agent" This is the final example of 4 student-led activations I ran all spring. It's been really fun to have design partners at this early stage. Hope this mini-series inspires a few more creative approaching to building alongside your users too! https://hardmodefirst.xyz/building-with-the-public-breaking-my-own-app-with-neurodiverse-student-interns