Note · the 2026 reality
If you sell SAT prep, you have spent 2026 watching Google give it away free inside Gemini, with every headline calling it the end of the tutoring business. Then you read the fine print: the questions in Google's free tool are not the AI's, they are licensed from a specialist prep company, because not even Google trusts a general model to write a calibrated SAT question.

Look at how students actually use what's out there and you see a market scattered across tools, each one used for a single job. Khan Academy to learn a topic. Bluebook to take a full-length practice test. UWorld to grind questions harder than the real exam. Google's free Gemini tests for cheap volume. None of them is a full prep platform, and students know it, so they bounce between four apps. The AI tools widen that gap instead of closing it. They are confidently wrong: tutors who tried Gemini's free prep hit real mistakes within minutes, and a 2025 benchmark found that nearly a third of AI-written SAT questions carried errors or bad difficulty. The opening here is not another question bank. It is the one tool that teaches, gets it right, and earns enough trust to replace the other four.

Note · the reliability bar
In March 2025, one bad setting in the Bluebook app auto-submitted the tests of more than 10,000 students before they had finished. The College Board issued refunds, a public apology, and a makeup exam. That is the bar you build against: the official app, made by the people who own the test, still went down on test day. Yours has to survive what theirs did not.

The easy half, the content and the basic app, is free now. The hard half is what's left exposed: a scoring engine that matches the real exam, a timer and a session that survive a dropped connection without opening a cheating window, math that renders the same on a screen and in a PDF without taking your server down. None of that is hard to describe. All of it is easy to get wrong, and a junior or an AI gets it wrong in a hundred quiet ways that only surface in production, or worse, on a student's test day. That gap, between describable and reliably built, is the whole business, and it is the one thing nobody hands you free.

And the stakes make the gap unforgiving. For a lot of the students on the other side of the screen, this one test is the thing their family reorganized its life around, and when your platform tells a kid she's scoring 1300 and the real SAT hands her an 1150, you didn't ship a bug, you sold a family a number that wasn't real. We've built one of these. Here is the hard half, section by section, in the order it will hurt you if you get it wrong, and which parts of it are still standing in 2028 once AI has finished eating the easy half.

Here is the shape of that hard half, and why it lands on a web team specifically. The real SAT runs in Bluebook, a downloadable, locked-down app on the student's own machine, so every job below runs there, on the device, for free. Build prep as a web platform and you inherit all of them on your server.

[Interactive figure: where the work actually runs · Bluebook (native app, on the device) vs your web platform]

The scoring engine you don't own

The digital SAT is not a simple quiz. Since March 2024 it runs inside the College Board's Bluebook app, and it adapts to the student. The first part mixes easy and hard questions. How the student does on that first part decides the second part. Do well, and the second part is the hard one that can reach 800. Do poorly, and you get an easier part that quietly caps the score near 600. Copy this routing wrong, and you have not built a practice test. You have built a liar.
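The routing described above can be sketched in a few lines. This is a minimal illustration, not the College Board's rule, which is not public: the 60% threshold and the names `route_second_module` and `cap_score` are assumptions made for the example, and the two ceilings simply restate the 800 and roughly 600 figures from the paragraph above.

```python
from dataclasses import dataclass

# Hypothetical values: the real routing rule is not published.
ROUTING_THRESHOLD = 0.6  # share of first-module answers correct needed for the hard module
HARD_CEILING = 800       # the harder second module can reach the top section score
EASY_CEILING = 600       # the easier second module caps the section score near here


@dataclass(frozen=True)
class Route:
    module: str   # "hard" or "easy"
    ceiling: int  # highest section score reachable on this route


def route_second_module(correct: int, total: int) -> Route:
    """Pick the second module from first-module performance."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    if correct / total >= ROUTING_THRESHOLD:
        return Route("hard", HARD_CEILING)
    return Route("easy", EASY_CEILING)


def cap_score(raw_estimate: int, route: Route) -> int:
    """Never report a section score the chosen route could not produce."""
    return min(raw_estimate, route.ceiling)
```

The point of `cap_score` is the "liar" failure mode: a practice test that routes a student to the easier module but still reports a 720 is promising a score the real exam would not give on that path.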

Note · what changed with the digital SAT
In 2024 the SAT moved from paper to a digital app called Bluebook. It is now shorter, about two hours. It adapts to the student, so the second part gets harder or easier based on the first. And a calculator is allowed the whole time. Any prep platform built before 2024 had to be rebuilt for this format, not just patched.

That is the worst mistake you can make here. A score is a promise. The whole value of your platform is one question: does a 1300 on your screen mean a 1300 in June? Getting there is not about having more questions. It is about scoring them right. The College Board uses a method called Item Response Theory. Every question has a known difficulty level. Your job is to match that difficulty to the real exam, without ever seeing the real exam's data.
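To make "every question has a known difficulty level" concrete, here is a sketch of the three-parameter logistic (3PL) model, one common Item Response Theory form. Which IRT model the real exam uses is an assumption here, and the parameter values in the usage note are illustrative only.

```python
import math


def p_correct(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """3PL probability that a student of ability theta answers an item correctly.

    a: discrimination (how sharply the item separates strong from weak students)
    b: difficulty (the ability at which the item becomes answerable)
    c: guessing floor (c = 0 reduces this to the 2PL model)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

For a four-choice question with a guessing floor of 0.25, a student whose ability equals the item's difficulty (`theta == b`) lands halfway between the floor and certainty, at 0.625. Calibration is the work of finding `a`, `b`, and `c` for your own items so that those probabilities line up with how students actually perform.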

That is why a big question bank is no longer your edge. AI can now write SAT-style questions that hold up well against human-written ones. So by 2026 a large bank is just the price of entry, and by 2028 it is worth almost nothing. The hard part is calibration: knowing how hard each question really is. The College Board does this by slipping a few unscored questions into every section to collect real data. You should do the same. Ship your new questions as unscored practice first, watch real students answer them, and learn their true difficulty before they ever count toward a real score.
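A first pass at "learn their true difficulty before they ever count" might look like the sketch below. It is a crude estimate, not the College Board's procedure: it assumes the students answering the pretest item average ability 0 on the logit scale, and `MIN_RESPONSES` is a hypothetical sample-size gate.

```python
import math
from typing import Optional

MIN_RESPONSES = 200  # hypothetical: responses required before an item may count


def estimate_difficulty(correct: int, attempts: int) -> Optional[float]:
    """Rough logit difficulty from unscored pretest data.

    Returns None while the data is too thin, which keeps the item unscored.
    Positive values mean harder than average, negative values mean easier.
    """
    if attempts < MIN_RESPONSES:
        return None
    # Clamp so an item everyone got right (or wrong) does not produce infinity.
    p = min(max(correct / attempts, 0.01), 0.99)
    return math.log((1.0 - p) / p)
```

An item answered correctly by half the students comes out at 0.0, and one answered correctly by a quarter comes out near 1.1. A production pipeline would fit a full IRT model against each student's estimated ability, but the gate is the important part: no data, no score.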

raw AI-written SAT questions · what a 2025 benchmark found
69% production-ready. Usable with little or no human editing.
14.5% math scope errors. Out-of-scope topics, typos that make a question unsolvable.
10% verbal build errors. Scrambled question order, missing or wrong question types.
6.5% bad difficulty. The item is not as hard or easy as it claims.
AI drafts most of a question bank well. The last 31%, the errors and the bad difficulty, is exactly the human-review work that separates a real platform from a quiz app.
Source: Pursu benchmark, Sept 2025 (100 GPT-4o items vs 100 official SAT items, scored with Item Response Theory) plus 2026 expert reviews.

Math that has to be exactly right

Here is the wall almost every team hits in the first week of real content. SAT math is full of equations, roots, fractions, and geometry. It has to look perfect in three places at once: on the web, in a mobile app, and in PDFs, because parents and tutors still want printable tests and score reports. The common tools each have gaps. KaTeX is fast in a browser. MathJax handles more, and can output a screen-reader-friendly format. But neither one was built for a mobile app screen, or for a server making five thousand custom PDFs on report night.

The PDF path is where servers go to die. Making math-heavy PDFs on the server means running hidden browsers that eat a lot of memory. So when many parents download reports at 9pm, the whole app goes down for everyone. The fixes are boring, and they work. Draw each formula once, save it, and reuse it instead of drawing it again. Build the common full tests ahead of time and serve them as ready-made files. And run the PDF jobs on their own separate machines, not on the main app servers.
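"Draw each formula once, save it, and reuse it" is a content-addressed cache. The sketch below assumes the expensive renderer is passed in as a plain function (a stand-in for whatever headless-browser or typesetting call you use) and keeps results in memory; a real deployment would more likely back this with disk or object storage.

```python
import hashlib
from typing import Callable, Dict


class FormulaCache:
    """Render each distinct formula once and reuse the stored result."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render  # the expensive step, supplied by the caller
        self._store: Dict[str, bytes] = {}
        self.render_calls = 0  # how many times the expensive step actually ran

    def get(self, latex: str) -> bytes:
        source = latex.strip()
        # Key on the formula text itself, so identical formulas share one entry.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.render_calls += 1
            self._store[key] = self._render(source)
        return self._store[key]
```

Because the key is a hash of the source, the same fraction appearing in five thousand reports is rendered once, and the counter gives you a cheap way to check the hit rate before report night rather than during it.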

One part of this is not a nice-to-have. It is the law. US accessibility rules require that a screen reader can read the math out loud. A formula saved as a flat image cannot be read, so you have to output the math in a format screen readers understand, such as MathML. Get this wrong, and an accessibility complaint can turn into a legal one.

Three ways to lose a student's trust in one afternoon

The first is the AI tutor that lies. If you bolt a raw AI model onto your app as a math tutor, it will teach the wrong method with total confidence. AI models predict the next word; they do not actually do math, so they fail more than half of multi-step problems. The fix is in the design. Catch anything that needs real math, send the calculation to a normal, reliable engine, and let the AI only explain the answer it was given. It is the same idea we use in production RAG.
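One way to sketch that split: compute the answer deterministically first, then hand the model a prompt that contains the verified result. The evaluator below covers only plain arithmetic and is an illustration; `build_explain_prompt` is a hypothetical helper, the model call itself is left out, and a real tutor would hand algebra to a proper computer algebra system.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expr: str) -> float:
    """Evaluate plain arithmetic without eval(): no names, no calls, no attributes."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
            raise ValueError("only numeric literals are allowed")
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval").body)


def build_explain_prompt(question: str, expr: str) -> str:
    """Compute the answer first, then ask the model only to explain it."""
    answer = evaluate(expr)
    return (
        f"Question: {question}\n"
        f"Verified answer: {answer}\n"
        "Explain step by step how to reach this answer. Do not change the answer."
    )
```

The model never gets the chance to do the arithmetic, only to narrate it, and anything the evaluator cannot parse is rejected instead of guessed at.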

The second is cheating, and AI cuts both ways. The same models that tutor your students also make cheating easy. That is why honest, watched practice is worth more by 2028, not less. The cat-and-mouse game is real and a little absurd. Students will even smear lotion on a webcam to blind the camera. So the goal is not scary surveillance. It is simple, honest signals, with a real person to review the odd cases. We have built this kind of proctoring as a browser extension.

The third can end your company. The moment you use a webcam, a face scan, or a child's data, you are in the strictest part of privacy law. Pearson paid $18.2 million to settle one biometric case in Illinois. European law says you cannot flag a student by software alone; a human has to review it. And US child-privacy law covers every piece of data you collect from a minor. In 2026, an edtech platform has to be a privacy fortress first and a learning tool second. Most build guides never say this. Get this corner right before you write a single line of proctoring code.

The moat was never the content

Take away the parts AI made cheap, and what is left is the thing that keeps customers: knowing why a student missed a question. A report that just says "640 Math" is useless. A report that can tell the difference between a careless mistake, a real gap in understanding, and simply running out of time, then drills the right one, is the real product. There is data behind this. Score gains flatten out after about seven practice tests, unless you turn those mistakes into focused practice.
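The three-way distinction can start as a simple rule set before it becomes a model. The sketch below is a heuristic under stated assumptions: every threshold is hypothetical, and the `Miss` fields are the signals this example assumes your platform already logs for each wrong answer.

```python
from dataclasses import dataclass

# Hypothetical thresholds, to be tuned against real outcome data.
RUSHED_SECONDS = 20       # answered faster than this counts as rushed
LOW_TIME_LEFT = 30        # section seconds remaining that counts as "out of time"
MASTERY_ACCURACY = 0.8    # past accuracy on the skill that counts as "knows it"


@dataclass(frozen=True)
class Miss:
    seconds_spent: float   # time the student spent on this question
    seconds_left: float    # section time remaining when it was answered
    skill_accuracy: float  # student's past accuracy on this skill, 0.0 to 1.0


def classify_miss(miss: Miss) -> str:
    """Label a wrong answer so the right kind of drill can follow it."""
    if miss.seconds_left <= LOW_TIME_LEFT and miss.seconds_spent < RUSHED_SECONDS:
        return "ran_out_of_time"
    if miss.skill_accuracy >= MASTERY_ACCURACY:
        return "careless"
    return "knowledge_gap"
```

Each label maps to a different follow-up: pacing practice for time, a slow re-attempt for careless slips, and actual teaching for gaps.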

This is the part that grows stronger over time. Every student who uses your platform shows you which study paths actually raise scores. By 2028 that private data, not your question bank and not your design, is the edge no one can copy.

Build it or buy it

You do not have to build all of this yourself. There is a real market of ready-made platforms you can put your own brand on. If you run a tutoring business and want a working platform tomorrow, building from scratch is the wrong call. It costs roughly three times as much as buying one. Buy when the platform is just a way to deliver your tutoring. Build when the platform is the product, and your real advantage is the scoring engine, the data, or the anti-cheating. If you are a non-technical founder making this call, the hard part is knowing what you do not know. That is the whole point of the skills a non-technical co-founder needs.

Note · building for the SHSAT or another admissions test?
The hard parts are the same: a test engine that feels real, calibrated questions, and honest anti-cheating. But the test itself is different. The SHSAT, for New York City's specialized high schools, is not adaptive, it uses its own scoring, and it has its own question types. So the same playbook applies, except you build a simpler linear engine instead of the SAT's adaptive one.

Make it feel like the real thing

Last, the part that decides whether a student stays. The closer your app feels to Bluebook on test day, the less stress the student carries into the real exam. That means the same calculator, the same way to cross out wrong answers, the same flag-for-review button, and the same rhythm of one short passage and one question, instead of dense walls of text that make students quit. The app also has to teach pacing without nagging, since the real test gives about seventy seconds per question on Reading and Writing, and ninety-five on Math. Get this wrong, and the student feels every bit of that stress on test day.
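The pacing figures fall straight out of the section lengths. The sketch below uses the published digital SAT format (54 Reading and Writing questions in 64 minutes, 44 Math questions in 70 minutes); the function names and the idea of reporting seconds ahead or behind are illustrative choices, not a Bluebook feature.

```python
# (questions, minutes) per section of the digital SAT
SECTIONS = {
    "reading_writing": (54, 64),
    "math": (44, 70),
}


def seconds_per_question(section: str) -> float:
    """Average time budget per question if the section is paced evenly."""
    questions, minutes = SECTIONS[section]
    return minutes * 60 / questions


def pace_offset(section: str, answered: int, elapsed_seconds: float) -> float:
    """Seconds ahead of (positive) or behind (negative) an even pace."""
    return answered * seconds_per_question(section) - elapsed_seconds
```

That works out to about 71 seconds per question on Reading and Writing and about 95 on Math. A quiet indicator driven by `pace_offset` can show a student she is a minute behind without interrupting the question in front of her.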

[Interactive figure: the SAT-prep value chain · defensibility by layer, 2024 to 2028. Layers shown: Content & question bank; App & UX; Adaptive engine (calibration); Diagnostics & outcome data; Integrity & compliance.]

What's still standing in 2028

Put it all together, and the next two years are a sorting. The early panic over the digital SAT is wearing off. Parents are noticing that a generic AI question bank, with no real calibration, does not actually raise scores. The market splits into two piles. On one side, thin AI wrappers, which lose all their value along with the cheap content they make. On the other, serious platforms that own the hard parts: accurate adaptive scoring, a tutor that does not lie, deep diagnostics, and solid privacy and compliance.

The hard truth is what this asks of you. Test prep is quietly turning from a content business into a heavy engineering and data business. You do not get to be the team that just ships fast and chases engagement. The edtech graveyard is full of well-funded teams that learned this too late. AltSchool burned through about two hundred million dollars. Knewton raised about a hundred and eighty million and sold for parts. Byju's was worth twenty-two billion before it fell apart. None of them shipped bad software. They chased the wrong number and mistook activity for learning. The teams still standing in 2028 treated the real stakes, the families, the scores, and the law, as the actual job.

What 2muchcoffee covers

We build real edtech and AI systems, and we have shipped in this exact space: an adaptive digital-SAT platform, AI study tools, and proctoring, all on real client deadlines. Maybe you are a prep company or a founder, and you can see the gap between a demo that wows a room and a platform a family will trust with a college decision. That is the conversation we like to have early, before any code locks you into the wrong design. The simple way in is the AI work we do.

One concrete action

Before you plan a single feature, do one thing. Write down the score a student will see on your platform, and the sentence you will stand behind when the real SAT comes back different. If you cannot defend that number, you do not have a content problem or a design problem. You have a calibration problem. Solve that one first, because everything else in this article is built on top of it.

Dmitriy Melnichenko
Founder and engineer at 2muchcoffee
Builds production AI and edtech systems, and the architecture that keeps them honest under real stakes.