The Introspection of Illusions
How the algorithms created to recommend movies came to know us better than we can know ourselves
The algorithms we know today, the ones we continually curate and carefully consider before clicking on a YouTube video about canned hamburgers because it may corrupt our recommendations for a month, those algorithms were still in their infancy back in 2006 when Netflix faced what seemed like, at the time, an insurmountable problem.
Cinematch, their all-powerful recommendation engine, had met its match in a 2004 indie comedy whose plot exposed the many levels of the human condition through, among other things, the heartwarming exchange of tater tots.
Back then, Napoleon Dynamite was wildly popular, a film so quirky and ironic that it looped back around into demonstrating a true, wholesome reverence for its characters and their lives, but recommending such an experience to the uninitiated carried a lot of risk. It had earned two million ratings by the time it became a problem for Netflix. That’s a lot of valuable data, but most ratings were either one or five stars. Depending on who you asked, it was either among the funniest, most uplifting movies ever made or the cinematic equivalent of eating freshly baked medical waste.
Here’s how this became a problem for Netflix. Imagine a user who loves Die Hard but has yet to see Napoleon Dynamite. Should Cinematch serve it up? Recommend it to the wrong users and they may start to wonder if the monthly subscription is worth it. Risky. Well, maybe there are clues in other movies they like. About half of users who love both Die Hard and Steel Magnolias seem to like Napoleon Dynamite, but the rest hate it. Risky. For people who might like Napoleon Dynamite, what is it about those two movies that they loved? For those who may hate it, what is it about those movies that could predict their future distaste?
So, Netflix offered a $1 million prize to anyone who could figure out how to improve their algorithm by making its recommendations 10 percent less risky. By 2008, more than 30,000 people were vying for the prize using a dataset of about 100 million ratings from half a million users. Netflix even created a website with a leaderboard, updated live. And though many teams made a lot of progress early on, most approaches fell short once Napoleon Dynamite was added to the mix.
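A quick aside for the technically curious: “less risky” is shorthand. The contest’s actual yardstick was root-mean-square error, the square root of the average squared gap between the stars a model predicted and the stars people actually gave. A minimal sketch of that scoring, in Python and with invented numbers, looks something like this:

    # Toy illustration of how Netflix Prize submissions were scored: RMSE,
    # the root of the mean squared gap between predicted and actual stars.
    # These six ratings are invented; the real quiz set held millions.
    import math

    actual    = [5, 1, 4, 2, 5, 3]
    predicted = [4.2, 2.1, 3.8, 2.5, 4.6, 3.3]

    rmse = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
    print(f"RMSE: {rmse:.4f}")  # lower is better; the prize demanded a 10% drop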
One of the major issues with figuring out why people either loved or hated a movie about awkward high schoolers bonding over cafeteria food and Jamiroquai was that the people who loved or hated it didn’t know why. Worse yet, they didn’t know they didn’t know. And it wasn’t the only movie that escaped explanation in this way. I Heart Huckabees, Sideways, and anything by Wes Anderson routinely generated the same love/hate responses that couldn’t quite be articulated, no matter how strongly one felt.
Psychologically speaking, users found it easy to access the feelings that prompted them to give those films one star or five. Explaining why the films made them feel that way would require the kind of guided metacognition that the Netflix interface simply couldn’t offer.
Even when you stepped away from the code and the spreadsheets and asked people in person, they might not be able to tell you. They could make a guess. They could attempt to explain, justify, and rationalize their feelings, reactions, and star ratings, but without a conversational tool, a back-and-forth to get past all that to something honest and perhaps previously unexplored, you ran the risk of precipitating a psychological phenomenon known as the introspection illusion, which would likely result in yet another phenomenon known as confabulation.
THE ILLUSION OF INTROSPECTION
There’s an entire literature of books and papers and lectures and courses devoted to this side of psychology. To put it very simply, we are unaware of how unaware we are, which makes us unreliable narrators in the stories of ourselves.
Put another way, you often don’t know the true sources of your thoughts, feelings, choices, decisions, actions, and so on – the actual motivations, the underlying drives, the skulking triggers. You are, however, amazing at constructing stories as if you did know the antecedents of those things when explaining yourself to yourself or to others.
This is thanks to a module that lives rent-free in your left hemisphere called the “left-brain interpreter,” which serves as a spokesperson for your entire human organism. There’s plenty of speculation as to why this module would have evolved. I favor the explanations that presume it functions to maintain your social identity as a trustworthy individual. Its preoccupation, it seems, is to provide good reasons to others for whatever it is you are thinking, feeling, doing, or planning to do.
Whatever its evolutionary origins, your left-brain interpreter sometimes serves as a press secretary, sometimes an internal narrator, sometimes a lawyer arguing for your most idealized self. But at all times, this portion of you is primarily concerned with generating justifications, explanations, and rationalizations for whatever is salient from one moment to the next, depending on the audience.
There’s a word for when the output of the left-brain interpreter is erroneous – confabulation – and there’s no better example of this than the research conducted on split-brain patients.
Certain conditions like epilepsy were once treated by partially severing the tissue connecting the two hemispheres of the brain. Patients who underwent such a procedure would effectively be of two minds, one generated by the left hemisphere, another by the right. Starting in the 1960s and continuing through the 2010s, cognitive neuroscientists like Michael Gazzaniga studied the effects of these corpus callosotomies and found that the research laid bare just how much confabulation we are all doing on a minute-by-minute basis.
Since the left hemisphere (mostly) controls the right side of the body and receives inputs from the sense organs of the right side (mostly), and vice versa, scientists could design experiments in which only one side at a time received information from the outside world. Show a word or an image to only the left visual field, and the left-brain interpreter wouldn’t see it. Psychologists could then ask it questions for which it didn’t know the answers.
For example, when scientists showed only the right hemisphere a terrible image, like a car crash with mangled corpses, a split-brained person would often recoil and feel uneasy. But when asked why they felt that way, they might say they had eaten something that didn’t agree with them or that the room seemed creepy. The left-brain interpreter never “saw” the image, but it did feel the feelings, or at least noticed that feelings were being felt by the rest of the body. It didn’t know why, but had no qualms about dutifully spinning a false narrative as an explanation.
In another experiment, researchers simultaneously showed a person’s left hemisphere the word “music” and the right hemisphere the word “bell,” making sure neither side saw what the other side was seeing. Thus the portion of the brain that could speak for the rest of the body only had the word “music” to go on. Researchers then produced a card with four illustrations – a bell, a drummer, an organ, and a trumpet – and asked subjects to point at what they had seen with the hand controlled by the right hemisphere. When the left-brain interpreter saw the hand controlled by the other hemisphere point to the bell, researchers asked why. One split-brain patient said it was because the last music they had heard came from the college’s bell towers. Others had other explanations, but none had anything to do with actually seeing the word “bell” before pointing to an example of “music.”
One major takeaway from this research is hard, empirical evidence for the fact that when we can’t pinpoint the source of our own thoughts, feelings, and behaviors, or the motivations and drives generating our judgments and decisions, we have no problem explaining, justifying, and rationalizing as if we can. Often, we do this without a moment of doubt or any tangible sense we might be confabulating. This is why, psychologically speaking, reasoning is usually just coming up with reasons for what we think, feel, and do. And since we are social primates, most of the time those reasons take the form of that which we intuit will seem reason-able to others.
To get past this during a session of guided metacognition, practitioners of conversational and therapeutic techniques like deep canvassing, street epistemology, and motivational interviewing allow people to produce confabulations without calling those knee-jerk justifications into question. They then ask questions that dive deeper and deeper until people begin to notice some conflict in their own responses. Once a person feels a little cognitive dissonance about their own explanations, the urge to resolve that dissonance often leads to more honest answers, appraisals, and sometimes epiphanies and reassessments that foster new opinions.
But Netflix couldn’t offer this sort of thing to people who loved Die Hard and Steel Magnolias for reasons they, themselves, may have never considered, and this brings us back to Netflix’s problem with Napoleon Dynamite and how that led to today’s algorithms that know us better than we know ourselves.
THE INTROSPECTION OF ILLUSIONS
To spoil the ending of this little drama, the solution was to hand all this over to a computer that could notice the patterns in a person’s tastes that they may not be aware of, patterns that no engineer could determine without some algorithmic heavy lifting. As one analyst told the New York Times Magazine, “They’re able to tease out all of these things that we would never, ever think of ourselves.”
If enough movies shared similarities when it came to evoking strong emotional reactions, the algorithm could group people based on how they responded to dozens of individual factors that humans would otherwise have to take the time to label. But thanks to a mathematical tool called singular value decomposition, no one needed to articulate anything at all. Plug lots of movies along one axis and lots of users with lots of ratings along another, and you get a nice, complex matrix of correlations that knows the users better than they know themselves.
Netflix adopted the term “factors.” Neither the algorithm nor the user needed to know what those factors were exactly, just that viewing histories and ratings were surfacing them. Add some machine learning and soon the algorithm could surmise that you love movies with car chases, but only if those car chases happen at night, and only in movies made in the 1990s, and only if someone eats a hot dog on screen within 30 minutes of that car chase. You may have no idea that these are your cinematic dealbreakers, but rate enough stuff, or just watch it to completion or bail after 15 minutes, and the information is there in the data.
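For readers who want to see the shape of the trick, here is a bare-bones sketch of a latent-factor model in the spirit of the matrix-factorization methods the prize popularized. It is not Netflix’s code; the users, the ratings, and the number of factors are all invented. The idea is simply to learn a small taste vector for every user and a factor vector for every movie, then predict an unseen rating from their dot product:

    # A toy latent-factor model in the spirit of Netflix Prize-era
    # matrix factorization. All users, ratings, and numbers are invented.
    import numpy as np

    rng = np.random.default_rng(0)
    movies = ["Die Hard", "Steel Magnolias", "Napoleon Dynamite"]

    # (user, movie index, stars) -- the sparse ratings we actually observe
    ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 5),
               (1, 0, 5), (1, 1, 4), (1, 2, 1),
               (2, 0, 4), (2, 2, 5)]

    n_users, n_movies, k = 3, 3, 2           # k = number of hidden "factors"
    U = rng.normal(0, 0.1, (n_users, k))     # each user's taste vector
    M = rng.normal(0, 0.1, (n_movies, k))    # each movie's factor vector

    lr, reg = 0.05, 0.02
    for _ in range(2000):                    # stochastic gradient descent
        for u, m, r in ratings:
            err = r - U[u] @ M[m]            # how wrong is the current guess?
            u_row = U[u].copy()
            U[u] += lr * (err * M[m] - reg * U[u])
            M[m] += lr * (err * u_row - reg * M[m])

    # User 2 never rated Steel Magnolias; the learned factors guess for them.
    print(f"Predicted stars for {movies[1]}: {U[2] @ M[1]:.2f}")

Nothing in those little vectors is ever labeled “quirky” or “irreverent.” The factors stay nameless, which is precisely the point.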
The recommendation engine could then use all those many algorithmically disclosed qualities of its catalog to crunch a lot more data about what people tended to like and dislike about the movies they loved and hated. Napoleon Dynamite wasn’t just a comedy, Crank wasn’t just an action movie, 3:10 to Yuma wasn’t just a Western. A movie like The Prestige was a thriller, a mystery, and a drama – but it was also a mix of science fiction and fantasy. Deeper still, it featured old-school stage magic, Victorian electrical engineering, and David Bowie wearing an exquisitely tailored suit.
The algorithm didn’t know any of this; it was just spitting out correlations. And freed from the constraints of broadly descriptive but narrowly predictive genre categories like action, drama, and comedy, Netflix started plugging more data into their recommendation engine: the cast, the release date, the running time. So far so good. Now add the exact time and date you watched a movie. Was it near a holiday? Near your birthday? Oh yeah, add your birthday, gender, and region. Add the devices you use. Note when you pause and rewind and fast-forward. Add all your searches. If you watch while traveling, note where you travel and how often. The correlations mounted.
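To picture what plugging all of that in might look like, here is a hypothetical record of a single viewing session. Every field name and value is invented for illustration; it only mirrors the kinds of signals listed above, not Netflix’s actual schema:

    # A hypothetical implicit-feedback record for one viewing session.
    # Field names and values are invented; they only mirror the signals
    # described above (timing, device, pauses, searches, location).
    viewing_event = {
        "user_id": 48151623,
        "title": "Die Hard",
        "started_at": "2006-12-24T21:15:00",  # near a holiday? a birthday?
        "device": "living-room set-top box",
        "watched_to_completion": True,
        "paused_at_minutes": [42],
        "rewound_at_minutes": [97],
        "abandoned_after_minutes": None,      # would be ~15 for a quick bail
        "recent_searches": ["heist movies", "christmas action"],
        "watching_while_traveling": False,
    }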
The algorithm could now tell that among the people who loved both Die Hard and Steel Magnolias, some loved Die Hard for the irreverence of its protagonist and Steel Magnolias for the quirkiness of the family dynamic. They’d probably love Napoleon Dynamite. Others loved Die Hard for the bank heist elements of the early action scenes and connected with Steel Magnolias because of the strong male head of a household taking charge of a difficult situation. They’d probably hate Napoleon Dynamite. With a system for surfacing all of that without understanding any of it, the algorithm could recommend Napoleon Dynamite to some, but not all, of the people who loved Die Hard and to some, but not all, of the people who loved Steel Magnolias. And vice versa. And for everything. Everywhere. All at once.
THE WHITE NOISE OF MORAL DUMBFOUNDING
The algorithms have since become far more advanced and far more of a feature of our everyday lives. They can now tell us what we might like to buy, to eat, places we might like to visit, people we might like to meet. We’ve learned to tread lightly through TikTok and YouTube, lest we misrepresent ourselves or, shudder to think, reveal ourselves as something we’d rather not be.
Maybe there’s a future in which the kind of algorithmic knowledge offered by streaming movie services could collude with the kind derived from cognitive neuroscience: a cyberpunk form of therapy, a Douglas Adams supercomputer sort of self-knowledge we could never gain on our own, gleaned from answers to questions we cannot ask ourselves alone.
The prospect of all this seemed ever closer to me when Netflix recently suggested I’d probably like the movie White Noise, which, at the time of this writing, had just been released to extremely mixed reviews.
Based on the recommendation, I read the book and liked it. Then I thought, ‘Well, sure, I’ll give it a shot.’ In the end, I too gave the movie version a mixed review. Why? Well, that’s the thing. The algorithm didn’t know why I would watch the entire movie, but watch it I did. I was glad it showed up in my recommendations, and that’s all that matters in the end. For Netflix, that was a success. The algorithm worked. But I was left perplexed, which reminded me of another finding in the study of the introspection illusion, one that could serve as some kind of conclusion to all this.
Long before you get to the hormones and neurons and molecules and atoms that are truly responsible for the feelings that inform our behaviors, there’s a psychological territory whose borders we simply cannot cross, and it is in that space that some of our strongest motivations lie.
Psychologist Jonathan Haidt demonstrated this in a series of experiments in which he and his colleagues asked participants if they’d be willing to wear a sweater once worn by a serial killer, or if they’d drink from a glass of juice after a sterilized cockroach had been dipped into it, or if they would sign a contract to sell their souls after death for two dollars.
In each experiment, the researchers carefully prepared to meet every possible argument against these acts with a logical refutation. Who knows the history of the average thrift store sweater? The roach was completely sterile, and there are bug parts in lots of the food you eat. The contract was completely make-believe and in no way legally binding. The idea was to head off every fact, every possible justification, every reason a subject might produce to defend their position.
With every avenue of escape blocked, most people grew frustrated. Haidt called this “moral dumbfounding.” The true source of the person’s attitudes was hidden from them, inaccessible, something deep, evolutionary, primal – wholly ineffable. Yet the left-brain interpreter dutifully produced weaker and weaker explanations until it argued itself into a corner. At a loss, bewildered, most said something like, “I don’t know why I think it is wrong; I can’t give a reason, it just is.”
There are parts of us we can’t access, sources of our emotional states we can’t divine, and I find some strange poetry in the fact that, like us, the algorithms can’t always articulate the why of what we do and do not like. Yet, through millions of A/B tests slowly zeroing in on more and more successful correlations, the Netflix Recommendation Engine can produce a glimpse of something a bit like the sort of profound, soul-exposing knowledge earned via an intense introspection that we could never achieve. Something a few fathoms deeper than “I don’t know, it just wasn’t for me.”
My latest book, which explores the power of guided metacognition: How Minds Change
SOURCES
The Full Guide on Netflix Recommendation Algorithm: How does it work?
Moral Dumbfounding: When Intuition Finds No Reason