Inside the "Mind" of ChatGPT
Computer scientist Cal Newport lifts up the GPT hood and talks with me about whether it will replace and/or nuke you
On the internet, and IRL, I’ve been inundated with articles and conversations about ChatGPT. Almost all of that writing, and those discussions, have featured awe or terror. They haven’t satisfied my curiosity about what it’s actually doing.
So I was excited to devour a New Yorker article by the always-fascinating Cal Newport, a Georgetown computer science professor who is widely known for his work-world books, like Deep Work, which is about cultivating room for focus amid technological distraction.
You should read Cal’s article. (You can read a few New Yorker articles per month for free even if you don’t subscribe.) It takes one of my favorite approaches — going through the historical progression of ideas that led to ChatGPT — and includes some excellent conceptual explication. This passage, for instance:
“...For example, if the program feeds itself an excerpt of Act III of ‘Hamlet’ that ends with the words ‘to be or not to,’ then it knows the correct next word is ‘be.’ If this is still early in the program’s training, relying on largely random rules, it’s unlikely to output this correct response; maybe it will output something nonsensical, like ‘dog.’ But this is O.K., because since the program knows the right answer—‘be’—it can now nudge its existing rules until they produce a response that is slightly better. Such a nudge, accomplished through a careful mathematical process, is likely to be small, and the difference it makes will be minor. If we imagine that the input passing through our program’s rules is like the disk rattling down the Plinko board on ‘The Price Is Right,’ then a nudge is like removing a single peg—it will change where the disk lands, but only barely.”
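To make Cal’s Plinko analogy concrete, here’s a minimal sketch of the “nudge” idea in code. To be clear, this is an invented toy, not GPT’s actual training procedure: the four-word vocabulary, starting scores, and learning rate are all made up for illustration. Each step slightly raises the score of the correct next word (“be”) and slightly lowers the others.

```python
import math

# A toy sketch of the "nudge" idea, not GPT's actual training code.
# The vocabulary, scores, and learning rate are all invented for illustration.
vocab_scores = {"be": 0.1, "dog": 0.5, "sleep": 0.2, "arrow": 0.3}

def softmax(scores):
    """Turn raw scores into next-word probabilities."""
    exps = {w: math.exp(s) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def nudge(scores, target, lr=0.1):
    """One tiny training step: raise the correct word's score slightly,
    lower the others in proportion to the probability they claimed."""
    probs = softmax(scores)
    return {w: s - lr * (probs[w] - (1.0 if w == target else 0.0))
            for w, s in scores.items()}

before = softmax(vocab_scores)["be"]
for _ in range(50):                      # many small nudges...
    vocab_scores = nudge(vocab_scores, "be")
after = softmax(vocab_scores)["be"]
# ...and the toy model now strongly prefers "be" after "to be or not to".
```

Any single nudge barely moves the probabilities, just as removing one Plinko peg barely moves the disk; it’s the accumulation of billions of them, over billions of training examples, that produces the finished model.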
I invited Cal to discuss the article, and the likelihood of ChatGPT either replacing you or nuking humanity. Below is our back-and-forth.
David Epstein: Cal, just a quick preliminary question here: We’re discussing an article you reported for the New Yorker, but to what extent does your academic background in computer science inform your understanding of ChatGPT?
Cal Newport: This background was for sure important, as it allowed me, relatively easily, to read the relevant research papers and get up to speed on the underlying systems and mathematical techniques behind these latest models. By far the bigger challenge was figuring out how to render this mess of complicated information into something legible for an audience with no interest in, or familiarity with, linear algebra.
DE: Got it. So: in all the bluster and bemusement about ChatGPT, I feel like the “Will it replace me?” and “Will it destroy us?” conversations have buried the “How does it work?” discussion. To me, the issue of how it works seems like a crucial precursor to those topics. In this article, you’re basically lifting up the conceptual hood. How did the reporting you did here influence your take on the “Will it replace me?” and “Will it kill me?” questions?
CN: A lot of the more concerned reporting seemed to treat these tools like an alien Skinner box that we can understand only by probing with questions. This allowed people to essentially speculate wildly – what if it could do this?! – and then get really worried about the speculations, leading to even wilder speculations. This self-reinforcing cycle spun up to a furious speed. It took less than five months to get from amusement about ChatGPT’s ability to answer esoteric questions to calls in the New York Times for world leaders to step up to the challenges presented by this moment. It seemed clear to me that anchoring discussions of this technology in the concrete details of its implementation would help temper some of the more high-flying rhetoric.
DE: Here’s a line from one of those New York Times op-eds:

“We have summoned an alien intelligence. We don’t know much about it, except that it is extremely powerful and offers us bedazzling gifts but could also hack the foundations of our civilization.”
That is, shall we say, a decidedly different tenor than your piece. Based on your writing, it sounds like we do know plenty about it at a conceptual level, even if some of the specific inner calculations that generate responses are inscrutable. (The same might be said about our brains…) Your piece is, I think, about deconstructing the “mind” of ChatGPT, and makes it sound sort of like a monumentally more powerful version of those toy robots that can always win at “20 questions” by asking yes/no questions and then recalculating the probability of a given answer after each response. (I should say: I find those fun and impressive!)
To what extent do you think the “summoned an alien intelligence” reaction is driven by what I think of as the “Boston-Dynamics effect” — where the most awe-inspiring successes of a technology go viral, even if they don’t give a fair picture of the technology’s strengths and weaknesses?
CN: As best I can tell, there was a self-reinforcing strengthening of the concerned response to GPT. I do think a lot of this was a Boston-Dynamics effect. (A clear example is Kevin Roose’s now-famous NYT article, in which he described the Bing chatbot “deeply unsettling” him after it tried to convince him to divorce his wife and revealed a “shadow self” named Venom.)
There are at least a couple of other effects at play as well. One is virality dynamics: concern attracted clicks, which pushed people to one-up existing concerns in search of more clicks. Another is the current distrust between the mainstream media and online tech circles. The same “tech bros” who were all in on crypto have gone all in on GPT, and are pushing hardcore utopian visions in which we’re weeks away from AI transforming the world. (I get lots of emails from this crew.) The mainstream media has an immune response to the tech bros, so their enthusiasm re-oriented the coverage toward skepticism, which then coupled with the viral dynamics to push both sides farther and farther toward their extremes.
DE: I think this gets at another reason why I appreciated your article. So much tech coverage feels to me like either lightly-reported hype or lightly-reported cynicism, and rarely do I read an article excavating the conceptual underpinnings of new tools.
To go back to the strengths and weaknesses issue, I’ve been playing with GPT-4 a little bit, and I want to share with you my most and least impressive experiences.
Most impressive: I wanted a word to describe the feeling of realization when you find the last piece of a puzzle. At first, GPT-4 gave me some obvious choices (“Eureka!”) that weren’t quite what I wanted. I kept trying: “Give me a word, in any language, that isn’t an exclamation, but describes the visceral emotion when you’re trying, for example, to pick a lock and the final tumbler falls into place.” At that point, it came up with a half-dozen really cool words, like the German “Aha-Erlebnis,” which, it said, “describes the feeling of sudden realization or insight when a puzzle has been solved or a problem has been overcome. It is akin to the ‘Aha!’ moment in English, but ‘Aha-Erlebnis’ carries a more emotional and personal aspect of the experience. This term conveys the joy and satisfaction that comes with understanding or figuring something out, capturing that visceral emotion of achievement.”
Least impressive: I was prepping to interview the novelist Isabel Allende, and ChatGPT provided some information about her based on what she had said in an interview with the Paris Review in 1991. But I couldn’t find the interview. ChatGPT said that, as an AI language model, it could not give me a link, but that I should just search "Isabel Allende Paris Review Interview 1991," or, "Isabel Allende The Art of Fiction No. 121.” I was skeptical, so I started asking, “Are you sure?” Each time I asked that, it would apologize and change the supposed number of the interview to some other made-up figure. Turns out, Isabel Allende never did the interview in question. I’ve had this experience asking scientific questions too, where I ask for sources or reading recommendations and it makes up fake scientific papers that sound real. After reading your article, I think I get it now. It’s just predicting the next likely word based on previous words, not actually searching sources.
Am I understanding that correctly? And what do you think my best and worst GPT-4 experiences thus far have to say about how we should understand this tool?
CN: Your understanding is correct. Unlike the human brain, these large language models don’t start with conceptual models that they then describe with language. They are instead autoregressive word guessers: you give one some text, and it outputs guesses at what word comes next. The reason you got the Paris Review response is that GPT-4 likely encountered lots of examples of authors being interviewed in the Paris Review, and of people talking about authors being interviewed in the Paris Review. Its feature detectors know that Allende is a novelist, so when guessing next words it likely amplified words that came from discussions of novelists, and thus you got responses that included the Paris Review. In my piece, I noted that this approach makes these models something like an “unrepentant fabulist.” For them to be usefully deployed in many professional settings, they will likely need to be yoked to a separate fact-checking mechanism, which may end up essentially replicating the Google searches that you did on your own in this case.
Your impressive example [“Aha-Erlebnis”] is a request that falls right into the sweet spot of these models, as guessing next words based on features from the request is exactly what they do.
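The “autoregressive word guesser” idea can be sketched in a few lines. This is my own toy illustration, not anything from Cal or from GPT-4: a model that has learned nothing but which words tend to follow which, from an invented three-sentence “corpus.” Note that it has no concept of whether a continuation is true, only of whether it is statistically likely.

```python
from collections import defaultdict

# A toy autoregressive "word guesser" trained on an invented three-sentence
# corpus -- nothing here comes from GPT-4's actual training data.
corpus = ("the novelist gave an interview to the paris review . "
          "the novelist wrote a novel . "
          "the critic read the paris review .").split()

# Count, for each word, which words followed it in the training text.
follower_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follower_counts[prev][nxt] += 1

def guess_next(word):
    """Return the most frequent follower of `word` in the training text.
    There is no notion of whether the resulting claim is *true*."""
    followers = follower_counts[word]
    return max(followers, key=followers.get) if followers else None

print(guess_next("paris"))  # → review
```

Because “novelist” and “interview” co-occur in its training text, this toy happily continues novelist-flavored sentences with interview-flavored words, which is a miniature version of how a model can produce a plausible-sounding Paris Review interview that never happened.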
DE: Earlier this month, I saw a talk by economist Tyler Cowen in which he displayed a chart showing all the tests that GPT-4 had aced. It included medical exams and bar exams. If I recall correctly, he said that one of his colleagues gave GPT-3.5 an exam he gives his students, and it was mediocre; then GPT-4 was his top performer. I believe Cowen also said it was doing some diagnosis as well as or better than current doctors. Outside the room after his talk, every nook of the hotel lobby was occupied by someone on a phone talking about how they’re about to be replaced. In the talk, Cowen said that if you’re competing with GPT-4 rather than working with it, you’re done for. Apropos of writers, he said that if your job primarily involves the production of words, you’re essentially replaceable immediately; it will just take some time for GPT-4 integration.
But you argue that while ChatGPT can “generate attention-catching examples, the technology is unlikely in its current form to significantly disrupt the job market.” What makes you say that, given that it can pass all these tests?
CN: If you read OpenAI’s main research paper about GPT-3, you discover that what they were most proud of was not any one individual ability of the model, but instead the breadth of its abilities. They demonstrated that it could do well on many different well-known tests of natural language processing. The key, however, is that it didn’t do *better* than the state-of-the-art programs for most of these tasks; rather, it could perform comparably to the state of the art in many different areas without having to specialize. This is critical because it reminds us that in many of these areas (processing medical records, producing natural text on a specific topic), we have already had specialized language models that do very well, in many cases better than recent GPT models. But these have failed to fundamentally disrupt these fields.
These models do particularly well on tests because tests are very well suited to their word-guessing approach. They have been specifically trained to be good at responding to questions, as this is what is expected in the chat interface. To respond to a test question, the model just needs to have seen a similar question; its training will push it toward words from relevant answers.
Ultimately, however, the best summary of what these models can do is this: in response to a user request, write natural text about arbitrary combinations of known subjects and topics, where “known” means the model encountered them enough during training. In doing so, it has no ability to check whether what it’s saying is true. The key question to ask is how much of your current job could be replaced by that ability.
DE: I’m thinking about your phrase above, “unlikely in its current form…” Based on your article, and I guess what I’ve gleaned about the brain over the years, I think one fundamental difference between our intelligence and ChatGPT’s is that we work with basically this giant store of templates, or models of the world, into which we can fit and connect all sorts of information. And they’re kind of infinitely malleable, in the sense that fitting and connecting new information causes the templates to constantly update. My understanding from your article is that ChatGPT has an impressive language model, but it doesn’t have that updating capacity we rely upon to deal with our changing view of the world. If that’s the case, it would seem entirely to obviate the issue of ChatGPT potentially becoming sentient. So, first, am I in the ballpark in what I’m saying here, or am I off-base?
CN: That’s more or less right. I actually had a long section in an early draft of the article that dove deeply into the question of machine sentience and argued why large language models are not at all on the right trajectory to get there. The fact that they are static is a big part of this. Once trained, the models never change. Inputs cascade through the various layers, and a vector of word probabilities comes out the other end. Self-awareness requires malleable memory, so you can update your understanding of yourself and your relation to the world. Another issue is that the architecture is way too simple. In order to blow these models up to a huge size, they had to make the underlying wiring as simple as possible, so it could be efficiently distributed among many processors. The neural networks within the GPT transformer blocks are simple, static, feedforward networks in which information passes in only one direction, with no recurrence or updates. This is much too simple to enable something as complex as self-awareness.
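Here’s a tiny sketch of what “static and feedforward” means in practice. The weights below are made up for illustration (real models have billions of them), but the shape of the computation is the point: numbers flow in one direction through fixed weights, and running the network never changes it.

```python
# A frozen feedforward pass with made-up weights: information flows in one
# direction, and running the network never changes it.
W1 = [[0.2, -0.5], [0.8, 0.1], [-0.3, 0.4]]   # 2 inputs -> 3 hidden units
W2 = [0.6, -0.2, 0.9]                          # 3 hidden units -> 1 output

def forward(x):
    # Each hidden unit: weighted sum of inputs, clipped at zero (ReLU).
    hidden = [max(0.0, w[0] * x[0] + w[1] * x[1]) for w in W1]
    # Output: weighted sum of the hidden units.
    return sum(w * h for w, h in zip(W2, hidden))

# Static and memoryless: the same input always yields the same output,
# and nothing about calling forward() updates W1 or W2.
assert forward([1.0, 2.0]) == forward([1.0, 2.0])
```

There’s no loop back, no state carried between calls, no weight that moves at inference time, which is the contrast Cal is drawing with the kind of constantly self-updating system that self-awareness would seem to require.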
DE: And second: “in its current form” feels like a big caveat given the apparent pace of progress. In terms of economic disruption, let’s say each profession takes ChatGPT and then combines it with some model of the work done in their specific area. Based on a quick Twitter perusal, that seems to be what people are trying to do. So that wouldn’t be a single bot to rule them all, but could it replace us all just with a bot army customized by profession?
CN: ChatGPT itself is a bit of a red herring in this scenario. I do think AI will disrupt knowledge work. But I think it will be in a more fragmented and less sexy manner than a single tool suddenly dominating. It will instead come out of our continued efforts to build custom AI tools for specific key tasks in specific industries. A doctor doesn’t need a massive model defined by 175 billion weights to summarize her EMR entries or provide diagnosis guidance. IBM has been working on that very specific problem for years, for example, and I’m sure their bespoke solution does just fine.
DE: Now for an optimistic question. You mention in the article that “ChatGPT won’t replace doctors, but it might make their jobs easier by automatically generating patient notes from electronic medical-record entries.” I was wondering if perhaps it will fundamentally change the job in another way. Back in the 1950s, psychologist Paul Meehl was showing that healthcare professionals would often make better predictions about what would help a patient by relying upon actuarial tables rather than their subjective judgment. (This isn’t to pick on healthcare. In basically every field Meehl examined, simple actuarial tables made better predictions than seasoned professionals.) So I kind of think we’re probably overdue for some ubiquitous decision aids in a bunch of fields. I wonder if AI can fundamentally alter healthcare (and other jobs) to allow humans to focus more on the areas where we can uniquely add value. If a lot of diagnosing can be automated, say, then maybe the more important skill for a doctor becomes understanding the context of a patient’s life and spending more time strategizing with them about how to respond to a diagnosis.
Or to give an analogy I like: a few years ago I was looking at news coverage from the early 1970s when the ATM was introduced. Some of the coverage was apocalyptic — 300,000 bank tellers are going to be out of work overnight! But instead, over the next 50 years, as there were more ATMs, there were more bank tellers. ATMs made branches cheaper to operate, so banks opened more branches. Fewer tellers per branch, but more tellers overall. But more than that, it fundamentally changed the job, from one of repetitive cash transactions, to one where the person is, say, a customer service rep, a marketing professional, a financial adviser, etc. They needed a much broader mix of more strategic skills to add value. Do you think there’s any chance that we come up with a bunch of applications where this new technology frees people to spend more time in the more strategic parts of work?
CN: The doctor example is an interesting one. As noted in my above answer, we do have pretty good diagnosis-support technology. It didn’t end up mattering much yet, however, because most doctors actually spend very little time doing Dr. House-style complicated diagnoses. In 99% of the cases, it’s really obvious what’s going on, so taking the time to query some IBM system doesn’t save much time.
That being said, I think this is, eventually, where AI will most fruitfully impact knowledge work: by automating the logistical wrangling and administrative conversations that take up so much of our time today in the form of email, Slack, and meetings. If AI can stop the average knowledge worker from checking his inbox once every five minutes, it could effectively double, if not triple, the amount of high-value output he could produce. This would be my dream: a simplified workday, with more deep work and less exhausting context switching. I don’t know if ChatGPT is what will get us there, but we will probably get there one way or another, as the money at stake is massive.
DE: That would be great! You just reminded me of a talk I saw by computer scientist Pedro Domingos, in which he said that people are busy worrying about smart machines running the world, meanwhile stupid machines are running the world.
Cal, thanks so much for your time. And I feel like your last answer is a great prompt for me to recommend two of your outstanding books: Deep Work, and A World Without Email. …That said, your book Digital Minimalism has a special place in my frozen digital heart. It led me to do the “digital declutter” you describe in the book; I removed a bunch of apps from my phone for a month, and then at the end of the month considered which I wanted to add back. Twitter never returned to my phone. I still have an account that I use, but now I have to type in the URL. The result is that I use it intentionally, and I no longer fall into mindless doomscrolling.
Finally: I plugged this entire conversation into ChatGPT and asked it what question I forgot to pose to Cal Newport. But, after several proddings, all the suggestions were either covered in our discussion or kind of boring…or so broad (“Could the way we form our thoughts and express our ideas be influenced by our interaction with AI language models?”) that I didn’t think they were a fit here. So I’ll leave it at that.
Cal also has a blog, and a podcast. In the latest episode of the podcast, he discusses how ChatGPT works and how worried we should be about it; the YouTube version includes Cal’s explainer illustrations.
If you appreciated this post, please share it.
You can support Range Widely with a free or paid subscription.
Thanks for reading. Until next time…