jimmie@portfolio:~/work/ember$ cat README.md

Designing an AI reading companion for struggling readers

2x practice time per student without additional tutor hours. Voice-first AI reading companion built on Science of Reading methodology. Validated by Johns Hopkins (n=1,872) as part of the full Ignite platform.

role: Staff Product Designer (first design hire)

client: Ignite Reading

dates: 2024 to 2025

team: Staff Product Designer, 1 PM, 3 engineers, 1 ML engineer

scope: [0 to 1] [ai-native] [voice-first]

Ember in action: voice-first AI reading practice between tutoring sessions

## Frame

Ignite Reading's core product was working: 15-minute daily 1-on-1 tutoring sessions, delivered by trained tutors and grounded in Science of Reading methodology. The Johns Hopkins study (n=1,872) had already shown 5.4 months of additional reading growth per student with zero achievement gaps across demographic subgroups. The model was validated. The constraint was the model itself.

A child gets 15 minutes a day with a tutor. That leaves 23 hours and 45 minutes where the tutoring stops and the forgetting starts. The gap between sessions was not a gap in the product roadmap. It was a gap in the child's learning. Ember was the attempt to close it: an AI-powered reading companion that extended the tutoring into the hours when no tutor was present, without requiring any additional tutor time to operate.

## Diagnosis

I spent the first two weeks in classrooms, watching what happened before and after the 15-minute tutoring window. Three things became clear.

The forgetting curve is real, and it is fast. Students who nailed a phonics pattern in their Tuesday session could not recall it on Wednesday morning. The research on the spacing effect points to the remedy: practice distributed across the interval between lessons is retained far better than the same amount of practice massed into a single sitting. The 15-minute session was building something. The 23 hours after it were eroding it.

Existing practice tools were not designed for this population. The ed-tech market is full of reading apps. Almost none of them are built for students who are reading below grade level, which is the only population Ignite serves. The apps assumed reading fluency that these students did not have. Asking a struggling reader to type answers into a reading comprehension app is like asking someone who cannot swim to practice diving form. The modality was wrong.

The tutor relationship was the trust layer. Students showed up to sessions because they trusted their tutor. They did not trust apps. Any practice tool that did not carry some version of the tutor's warmth and encouragement into the independent practice window would get opened once and abandoned. The design problem was not "build a reading app." The design problem was "build something that feels like a continuation of the session, not a replacement for it."

## Exploration

Before committing to a voice-first AI companion, we explored three directions over three weeks of rapid prototyping and classroom observation.

Direction A: Gamified reading app. Flashcard-style exercises with points, streaks, and badges. The ed-tech default. We built a prototype and tested it with eight students in the target population. The engagement mechanics worked for the first two sessions. By session three, students were gaming the system, tapping through cards to earn points without reading. The incentive structure rewarded speed, not decoding. A struggling reader who taps fast looks identical to a fluent reader in the analytics. The product could not tell the difference, which meant it could not teach.

Direction A: Gamified reading app with flashcards, point scoring, streaks, badges, leaderboards, and session completion celebrations

Direction B: Text-based practice with typed responses. A simpler approach: show a word, student types it, system evaluates. We rejected this before testing. The target population reads below grade level. Asking a student who struggles to decode "bright" to also type "b-r-i-g-h-t" adds a motor skill barrier on top of the reading barrier. The input modality was wrong for the user.

Direction B: Text-based typing practice with phonics breakdown, keyboard input, error correction, and session results tracking

Direction C: Voice-first AI companion with session-aware exercises. The student speaks. The AI listens, evaluates, and responds. Exercises are pulled from the tutor's most recent session data. A warm character guides the interaction. This was the hardest to build and the only one that matched all three constraints: the student's reading level (no typing), the tutor's time (no additional work), and the pedagogical requirement (session-specific practice, not generic content).

Direction C: Voice-first AI companion with Ember character, microphone input, and celebration feedback for correct answers

We chose Direction C knowing the technical risk was high. Children's speech recognition was unreliable, but we committed to it anyway because the alternatives all failed on the fundamentals.

## Decision

Three calls shaped the product.

First, voice-first, not text-first. The students Ignite serves are struggling readers. Asking them to read instructions and type responses defeats the purpose. Ember was designed around voice interaction from the first sketch. The student speaks. The AI listens, evaluates, and responds in real time, with speech recognition tuned specifically for children's voices and reading patterns. This was a hard technical bet. Children's speech recognition was (and still is) meaningfully worse than adult speech recognition. We shipped it anyway because the modality was non-negotiable. A text-based reading practice tool for students who cannot yet read fluently is not a product. It is an insult.

The voice interaction loop: the student speaks, the AI listens and evaluates, the system decides whether a miss was a decoding error or a recognition failure, the AI responds accordingly, and the loop continues. The critical design decision was distinguishing decoding errors from recognition failures, because each gets a different response.

Second, adaptive practice tied to the tutor's session data. Ember does not generate generic reading exercises. It pulls from the specific phonics patterns, sight words, and decoding strategies the tutor covered in the most recent session and builds practice around those concepts. If the tutor spent Tuesday's session on consonant blends, Ember's Wednesday practice reinforces consonant blends. The AI is not teaching. The tutor is teaching. The AI is extending the teaching into the gap.
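The selection rule is simple to state: practice what the tutor covered most recently. Here is a minimal Python sketch of that rule. The case study does not describe the real session-note schema or pipeline, so `SessionNote`, `Exercise`, `select_exercises`, and the concept labels are all hypothetical. The fallback to the bank's own order stands in for the general phonics progression that the first version used.

```python
from dataclasses import dataclass

# Hypothetical data shapes; the real session-note schema is not described here.
@dataclass(frozen=True)
class Exercise:
    word: str
    concept: str  # e.g. "consonant_blend", "sight_word"

@dataclass(frozen=True)
class SessionNote:
    date: str                  # ISO date, so string order matches calendar order
    concepts: tuple[str, ...]  # concepts the tutor covered in that session

def select_exercises(notes: list[SessionNote], bank: list[Exercise], limit: int = 5) -> list[Exercise]:
    """Return practice items that reinforce the most recent session's concepts.

    With no session notes, fall back to the bank's own order (a general progression).
    """
    if not notes:
        return bank[:limit]
    latest = max(notes, key=lambda n: n.date)
    return [ex for ex in bank if ex.concept in latest.concepts][:limit]

bank = [
    Exercise("stop", "consonant_blend"),
    Exercise("the", "sight_word"),
    Exercise("frog", "consonant_blend"),
    Exercise("cat", "short_a"),
]
notes = [
    SessionNote("2024-10-07", ("short_a",)),
    SessionNote("2024-10-08", ("consonant_blend",)),
]
print([ex.word for ex in select_exercises(notes, bank)])  # ['stop', 'frog']
```

The point of the sketch is the direction of dependency: the tutor's record drives the exercise list, and the AI never picks content on its own.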

Third, the character is the interface. Ember is a warm flame character that guides the student through practice. This was not a branding decision. It was a trust decision. The character gives the AI a face, a voice, and a personality that students can build a relationship with. The character celebrates effort, not just correctness. It says "I heard you working hard on that one" when a student struggles through a word, because the tutor would say the same thing. The encouragement system was designed by studying what Ignite's best tutors actually said in sessions, then encoding those patterns into the AI's response framework.

## Work

Voice input prompt: "Read the word above" with a Click to Speak button

The product shipped as a web application with three core surfaces: guided reading practice with real-time voice recognition, adaptive exercises that target the specific concepts from recent tutoring sessions, and a progress view that shows students (and their tutors) what was practiced between sessions.

The voice interaction was the hardest surface to get right. Children's speech recognition fails in predictable ways: background classroom noise, inconsistent volume, pronunciation patterns that are not errors but developmental stages. The feedback system had to distinguish between "the student made a decoding error" and "the student pronounced the word correctly but the speech model did not recognize it." Getting this wrong in either direction was damaging. False negatives (marking a correct reading as wrong) destroyed confidence. False positives (marking an error as correct) undermined the learning.
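Here is one way to encode the asymmetry described above. This is a hedged Python sketch, not the production logic. It assumes a hypothetical 0 to 1 `match_score` for how well the audio matches the target word, plus an `audio_usable` flag for noise or volume problems. The two cutoffs are illustrative. Unusable audio and ambiguous scores count as recognition failures, so the child is asked to try again and is never told they were wrong.

```python
from enum import Enum

class Outcome(Enum):
    CORRECT = "correct"
    DECODING_ERROR = "decoding_error"            # the child misread the word
    RECOGNITION_FAILURE = "recognition_failure"  # the system could not tell

# Hypothetical cutoffs on a 0-1 "audio matches target word" score.
ACCEPT_AT = 0.70
REJECT_BELOW = 0.40

def classify_attempt(match_score: float, audio_usable: bool) -> Outcome:
    """Sort a spoken attempt into one of three outcomes."""
    if not audio_usable:  # classroom noise, too quiet, nothing heard
        return Outcome.RECOGNITION_FAILURE
    if match_score >= ACCEPT_AT:
        return Outcome.CORRECT
    if match_score < REJECT_BELOW:
        return Outcome.DECODING_ERROR
    # Ambiguous middle band: re-prompt rather than risk a false negative.
    return Outcome.RECOGNITION_FAILURE

RESPONSES = {
    Outcome.CORRECT: "celebrate and move on",
    Outcome.DECODING_ERROR: "encourage, give a phonetic hint, retry the word",
    Outcome.RECOGNITION_FAILURE: "ask the child to say it again; do not mark wrong",
}

print(RESPONSES[classify_attempt(0.55, audio_usable=True)])
```

The middle band is the key design choice in this sketch. It spends an extra prompt to avoid the worst outcome, which is telling a child who read correctly that they got it wrong.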

The prototyping process used AI-accelerated design: Figma for interaction design, rapid prototyping for voice flows, and in-classroom testing cycles that were measured in days, not sprints. We tested with students in the target population every week for eight weeks. The character's personality, the feedback language, the pacing of exercises, and the difficulty progression were all tuned against real student behavior, not assumptions about student behavior.

The Science of Reading methodology was the constraint, not the inspiration. Every exercise type, every feedback pattern, and every progression rule was reviewed against the methodology's evidence base. The product was not a reading app that happened to use Science of Reading. It was Science of Reading methodology delivered through an AI companion. The distinction matters because it meant we said no to features that would have been engaging but were not evidence-based, and yes to features that were less flashy but pedagogically sound.

## Iteration

Eight weeks of in-classroom testing with students in the target population. Every week changed something.

Weeks 1–2: The false negative problem. The speech recognition model marked 34% of correct readings as incorrect. Students who read a word correctly and were told they got it wrong shut down. One student closed the app and said "it doesn't listen." For the initial release we lowered the confidence threshold from 0.85 to 0.70. That deliberately accepted more false positives, letting borderline pronunciations pass, in order to protect against the confidence-destroying false negatives. The share of correct readings marked wrong dropped from 34% to 8%. We continued tuning the model against children's voice data over the following weeks.
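The tradeoff behind the threshold change can be measured directly from attempts that a human has labeled correct or incorrect. Below is a minimal Python sketch of that measurement. The scores are made up to show the shape of the tradeoff. They are not data from the study, and `error_rates` is a hypothetical helper.

```python
def error_rates(attempts: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate) at a given threshold.

    Each attempt is (model_score, reading_was_correct), where the second value is a
    human label. A reading is accepted when model_score >= threshold.
    Assumes the data contains both correct and incorrect readings.
    """
    correct = [s for s, ok in attempts if ok]
    incorrect = [s for s, ok in attempts if not ok]
    fn = sum(s < threshold for s in correct) / len(correct)       # correct but rejected
    fp = sum(s >= threshold for s in incorrect) / len(incorrect)  # wrong but accepted
    return fn, fp

# Made-up illustrative scores, not data from the study.
attempts = [(0.95, True), (0.88, True), (0.80, True), (0.74, True), (0.72, True),
            (0.30, False), (0.55, False), (0.72, False), (0.86, False)]

for t in (0.85, 0.70):
    fn, fp = error_rates(attempts, t)
    print(f"threshold={t:.2f}  false negatives={fn:.0%}  false positives={fp:.0%}")
# threshold=0.85  false negatives=60%  false positives=25%
# threshold=0.70  false negatives=0%  false positives=50%
```

Lowering the threshold moves errors from one column to the other. For this population, a wrongly rejected correct reading costs more than a wrongly accepted misreading.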

Weeks 3–4: Pacing and fatigue. The first version presented exercises at a constant pace, one word every four seconds. Classroom observation showed students fatiguing after 6–8 minutes. We added adaptive pacing: after three correct answers in a row, the pace quickened slightly. After an incorrect answer, the system slowed down, added encouragement, and offered the word again with a phonetic hint. Average session length increased from 7 minutes to 12 minutes without any increase in reported fatigue.
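The pacing rule reads naturally as a small state machine. This Python sketch starts from the 4-second pace described above. The speed-up and slow-down factors and the floor and ceiling are hypothetical values chosen for illustration, and `Pacer` is not the shipped component.

```python
from dataclasses import dataclass

BASE_INTERVAL_S = 4.0   # from the text: the first version's constant pace
SPEEDUP = 0.9           # hypothetical: "quickened slightly"
SLOWDOWN = 1.25         # hypothetical
MIN_INTERVAL_S = 3.0    # hypothetical floor
MAX_INTERVAL_S = 6.0    # hypothetical ceiling

@dataclass
class Pacer:
    interval_s: float = BASE_INTERVAL_S
    streak: int = 0  # consecutive correct answers

    def record(self, correct: bool) -> dict:
        """Update pacing after an answer and report what to do next."""
        if correct:
            self.streak += 1
            if self.streak % 3 == 0:  # every third consecutive correct answer
                self.interval_s = max(MIN_INTERVAL_S, self.interval_s * SPEEDUP)
            return {"next_interval_s": self.interval_s, "encourage": False, "retry_with_hint": False}
        # Incorrect: slow down, encourage, and offer the same word again with a phonetic hint.
        self.streak = 0
        self.interval_s = min(MAX_INTERVAL_S, self.interval_s * SLOWDOWN)
        return {"next_interval_s": self.interval_s, "encourage": True, "retry_with_hint": True}

pacer = Pacer()
for answer in (True, True, True, False):
    print(pacer.record(answer))
```

Clamping the interval keeps a long streak from running away, and a single miss from stalling the session.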

Weeks 5–6: The character's voice. The first iteration of Ember's encouragement language was written by the design team. It was fine. It was not right. We recorded four tutoring sessions (with permission) and transcribed the specific phrases tutors used when students struggled. "I heard you working hard on that one." "You almost had it. Try the first sound again." "That was a tricky one and you stuck with it." We replaced the designed encouragement with the transcribed encouragement. Students responded to Ember's feedback the way they responded to their tutor's feedback, with effort, not avoidance. The language did not come from a copywriter. It came from the tutors.
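One way to encode transcribed tutor language is as a phrase bank keyed by situation. In the Python sketch below, the phrases are the ones quoted above. The mapping from situation to phrase is an illustrative assumption, and so is the `pick_encouragement` helper. Neither describes how Ember's response framework actually works.

```python
from typing import Optional

# Phrases transcribed from tutoring sessions (quoted in the text above).
# Which situation triggers which phrase is an illustrative assumption.
TUTOR_PHRASES = {
    "near_miss": "You almost had it. Try the first sound again.",
    "worked_hard": "I heard you working hard on that one.",
    "persisted": "That was a tricky one and you stuck with it.",
}

def pick_encouragement(correct: bool, attempts: int, near_miss: bool) -> Optional[str]:
    """Choose tutor-style encouragement that rewards effort, not just correctness.

    attempts counts tries on the current word, including this one.
    """
    if not correct:
        return TUTOR_PHRASES["near_miss"] if near_miss else TUTOR_PHRASES["worked_hard"]
    if attempts >= 3:
        return TUTOR_PHRASES["persisted"]
    if attempts == 2:
        return TUTOR_PHRASES["worked_hard"]
    return None  # first-try success: the normal celebration is enough

print(pick_encouragement(correct=True, attempts=3, near_miss=False))
```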

Weeks 7–8: Session-to-practice alignment. The initial version pulled exercises from a general phonics progression. We built the pipeline that connected Ember's exercise selection to the tutor's actual session notes. In the first week with real session data, tutor satisfaction scores jumped. Tutors could see that students were practicing exactly what they had taught. One tutor said "it's like the app was in my session." That was the design intent. The data confirmed it worked.

## Outcome

2x practice time per student without adding a single tutor hour. Students who used Ember between sessions doubled their independent reading practice compared to the control group, and the practice was targeted at exactly the concepts their tutor had identified. 100% alignment with Science of Reading methodology, verified by the curriculum team and external reviewers.

The downstream signal that mattered more than the practice metric: Ember contributed to the evidence base that supported Ignite Reading's continued growth from Series A ($10M) to Series B ($36.75M). The platform was no longer just a tutoring product. It was a tutoring product with an AI-powered practice layer that extended the value of every session into the hours between sessions. The unit economics changed. The cost per student-hour of effective reading practice dropped because the AI was multiplying the tutor's impact without multiplying the tutor's time.

The Johns Hopkins validation (5.4 months of additional growth, zero achievement gaps) covered the full Ignite platform including Ember. The practice layer was not studied in isolation, which is the right way to evaluate it. Ember was not a standalone product. It was an extension of the tutoring relationship, and the outcomes reflect the system, not the feature.

## Reflection

The thing I learned building Ember that I keep coming back to is that AI products for vulnerable populations require a fundamentally different design posture than AI products for general consumers. When the user is a struggling seven-year-old, every false signal, whether a wrong correction, a missed encouragement, or a difficulty spike that was too steep, has a cost that is not measured in churn metrics. It is measured in a child's relationship with reading.

The design decision I am most proud of is the one that is least visible: the decision to make Ember feel like a continuation of the tutoring session rather than a separate product. Students do not think "now I am using the app." They think "Ember is helping me practice what my tutor taught me." That continuity was the entire design problem, and it required the AI, the character, the exercise design, and the voice interaction to all point in the same direction. The hardest part of AI product design is not the AI. It is making sure the AI serves the relationship the user already trusts.