By now, we’ve seen the ChatGPT parlor tricks. We’re past the novelty of a cake recipe in the style of Walt Whitman or a weather report by painter Bob Ross. For the one-hundredth time, we understand the current incarnation of large language models makes mistakes. We’ve done our best to strike a studied balance between doomers and evangelists. And, we’ve become less skeptical of emergent flashes of insight from the aptly-named foundational models. At the same time, Google, Meta and a list of hopeful giant swatters have released credible competitors to ChatGPT.
For all those reasons, global use of ChatGPT recently declined for the first time since its November 2022 release. Perhaps now we’re ready to get to more elemental questions about what generative language artificial intelligence can or cannot do for us in the everyday.
I come to this discussion from a long career managing IT systems in large enterprises, where, as MIT’s Nicholas Negroponte predicted in 1995, everything that could be digitized was digitized. I’m not a cognitive scientist, but I understand enough of how large language models work and how humans separate digital wheat from chaff to begin to think about what they might do with software with an opinion of its own.
As a multi-generation American Southerner of a certain age, I’m also drawn to the notion of a machine that can extract meaning from abstracted language and play it back in sentences and paragraphs. If generative language artificial intelligence is about anything, it’s about words. So, too, is the South. It took the mastery of character and storytelling from the likes of William Faulkner, Eudora Welty and Flannery O’Connor to lay bare the enduring contradiction and transcendence of my region.
Large language models are trained on a vast digital, linguistic expression of human intelligence, and Flannery O’Connor’s cast of misfits is certainly buried in there somewhere. Could it be that the giants of Southern Literature left behind mysterious patterns in their collective body of work that can be divined and reproduced by a powerful neural network? Does the presence, or absence, of such a pattern tell us anything about how we might actually use these contraptions in real life?
I decided to show OpenAI’s GPT-4 model a picture. My thought was to use a discussion of a photograph to make GPT-4 think a little differently, to make the patterns it sought a bit less obvious. I also wanted to center this discussion around a short list of functions that generalize the innumerable human activities to which generative AI can be applied. My goal was two-fold: to see what it could do with the photograph in a practical sense and, at the same time, to try to spark some of those human-like insights from the patterns in its data set (or, if not too technical, in the data on which it is trained).
Between 1933 and 1936, Eudora Welty, then in her late twenties and living in Jackson, Mississippi, worked for the Works Progress Administration. This was a Great Depression-era federal relief program that employed millions, including artists. Miss Welty was a publicity agent and photographer. During her time at the WPA, she took a series of evocative photographs that foreshadowed her talent for wringing universal themes from everyday lives. In 1936, she wrote the first of the short stories that would eventually form part of the Southern canon.
I uploaded one of Eudora Welty’s enigmatic photographs, “Home By Dark,” to Bing Chat. This is the chatbot interface Microsoft has bolted on to OpenAI’s GPT-4 large language model. This coupling is the early AI battering ram Microsoft has deployed to try to topple Google as the ruler of Internet search. OpenAI had recently made available, through Microsoft Bing, early access to its multi-modal function that allows GPT-4 to process images as well as text.
The number of human activities potentially served by generative language AI is difficult to fathom, much less count. But, they can be imperfectly generalized into three categories: we seek to understand; we engage outside ourselves; and we create. These categories work surprisingly well across disparate domains: from teaching yoga to quantum physics; from building software to cooking; from preparing a lesson plan to writing a eulogy.
AI evangelists have shrewdly, and I think correctly, framed the general use case for Generative AI as that of assistant; as an augmenter and amplifier of human effort. They strenuously differentiate user-facing language AI from the kind of AI hidden in system plumbing that analyzes, recommends and predicts. I aligned my categories with their definition, but flipped the use case from what the machine does to what we do; to underscore the assistive nature of these language machines; to emphasize that their sole purpose is to amplify human agency and ability.
This framing also helps us think of generative language AI as something new. When presented with one of these chatbots, or any new tool for that matter, our reflex is to define it in comparison with something we’re already familiar with. The novelty, the work-in-progress nature and the sheer surprise at what these AI chatbots can do get lost and diminished in comparison with existing digital tools and our experience with them.
The photograph I uploaded to Bing Chat was taken by Eudora Welty in 1936 in Yalobusha County, Mississippi. It shows a young Black family of three, mom, dad and child, traveling in a mule-drawn wagon on a dirt road. They are driving away from the camera at dusk, too far away to make out faces, clearly in motion, toward a broad Mississippi horizon, framed on either side by a fallow field. The mother looks back at us; father and child look forward.
Once Bing Chat loaded the photo into its context window, the real-time memory of our conversation, I gave it that exact description and tried the first of my categories: Help me understand this photograph.
It typed back a ticker-tape response, in the way these things do. The one-letter-at-a-time delivery, the polite tone, the spare interface: all calculated to draw us into dialog and trust.
Bing Chat dutifully complimented my description and followed it with three rich, fact-filled paragraphs. It summarized Eudora Welty’s biography with emphasis on her early photography, the macro social and economic setting of 1936 Mississippi and the likely daily realities of the family in the wagon. Each paragraph was peppered with links to websites to ensure veracity.
There was no emergence, epiphany or poetry, but there was something that rivaled it. The response was pure utility: thoroughness, value and convenience. The system used as input the photograph, my description of it and the patterns in its training data to help me understand more about Eudora Welty and the lives of the family in that wagon, proofed by the Internet.
The difference in helping me understand versus simply informing me is to add context to facts, to make them more digestible, to spark a next question. This, by the way, is what Wikipedia does. It too summarizes almost the total set of human knowledge domains. But, to deploy a cinematic metaphor, it cannot adjust the aperture, pan left and right, or surprise in conversation.
This kind of information retrieval is where most of us will start with generative language AI. Google and Wiki didn’t become verbs for no reason. To help us understand is where generative language AI will become generalized language AI and become our interface to all that has been digitized, absorbing search and wiki in the process. How that manifests is a high-stakes, thank-you-very-much-OpenAI, all-hands-on-deck work in progress for the big tech companies.
Now that the model and I had spoken about Eudora Welty and her picture taking, and GPT-4 had those Eudora Welty patterns front-of-mind, I thought it might stumble upon some emergent patterns if I asked it to help me engage her work, my second generalized use case category.
This is where the model is guided, or prompted, to take on the role of tutor, coach and helper. This second category of how we will use generative language AI leverages all the functions of the first: information retrieval, a broad vista of subject matter and nimble content manipulation. But, it presents all of that in a scripted and structured way, of which the real-time presence of both user and machine is a fundamental property. It is still conversational, but with the roles reversed: the model now prompts and encourages us to engage the content.
This time, I asked the model to take on the role of tutor. Teach me more about Eudora Welty’s writing. Give examples of her use of language. Ask me questions that develop my understanding of her writing and use of language, until I say “class is over.”
It did exactly that. It presented lovely snippets from short stories, and asked about themes, humor, irony and symbols. When I didn’t know an answer, it would patiently explain, and continue until class was over.
At one point we discussed equity. As I engaged Miss Welty’s writing, I thought of science fiction writer William Gibson’s quote, “The future is already here, it’s just not evenly distributed.” I considered the family in the wagon. A future was showing up in Mississippi in 1936: automobiles, radio, electricity. I expect that family eventually got some part of it. Each time they got access to a piece of it, they added to a baseline. The same will be true as these generative AI tools are distributed. We will each add the intelligence of the machine to our unique baselines.
In terms of equity, access to these tools is likely to be equivalent to access to the Internet today; most of us can get there. Where the question of equity becomes interesting is the extent to which we can take advantage of these marvels. We will each bring a complex alchemy of components to a generative language AI conversation. Innate intelligence, literacy, domain knowledge, curiosity, patience and courage are a part of what will define the advantage we gain from these tools. There can be levers that influence equity, but they will not be business as usual.
Imagine a rural school of limited resources with access to this expansive tutor. Caveats about the limits of online learning aside, how can it not be additive? What premium applies to literacy, beginning with children, in the presence of such tools? It is no small irony that Mississippi and Alabama are leading the country in relative gains in early literacy, mostly as an act of will and priority. When you are behind, small gains mean much more than when you are ahead. It may well prove that the value of generative language AI is greater to those with the farthest to go. I’ve heard more than one AI thinker declare that, because of language AI, English has become the most important programming language in this century.
For the third of my use categories, to create, it was time I asked GPT-4 to help me write. I asked the model to take all we had discussed and write a vignette that described the moment Eudora Welty captured with her camera. Of course, I asked it to use her voice.
GPT-4’s response, in a verbatim excerpt:
The sun was setting behind them, casting a golden glow on the horizon. The sky was clear and blue, the air was warm and still. The road was dusty and bumpy, the fields were barren and brown. The wagon creaked and rattled, the mule plodded and snorted. The family was silent and calm.
I saw the woman turn her head and look back at me. She had a round face and bright eyes, a scarf around her head and a shawl around her shoulders. She did not smile or frown, she did not wave or shout. She just looked at me, with curiosity and dignity. She did not seem afraid or angry, she did not seem ashamed or proud. She just seemed to be.
I pressed the shutter and captured her gaze. I felt a connection, a recognition, a revelation. I felt like I had seen something true and beautiful, something human and divine. I felt like I had seen her soul.
By any measure, this is a feat of engineering, a marvel of immense promise. Acknowledging all the fears about sentience and agency, about active misuse and unintended consequences, its ability to help us create can only be transformative in result and implication.
But, what of emergence?
I began this essay with a question. There are writers who can elicit within a reader a visceral connection with the universal and elemental just by describing a scene. I asked, is that mastery expressed in patterns in the language of writers like Eudora Welty in a way that generative language AI can detect and repeat? The vignette that GPT-4 produced did indeed approach that asymptote. It attributed possession of self and a soul to the mother in the wagon. How close did it get? That’s for the eye of the reader, but it’s greater than zero.
Had I simply asked ChatGPT to describe an image of a Depression-era Mississippi family riding in a wagon, I would’ve gotten a flat description of a conveyance and a landscape but not much more. Instead, our back-and-forth conversation, and the photo itself, caused the model to ascribe characteristics of humanity to the passengers in a much deeper and unexpected way. This phenomenon of eliciting richer responses through iterative prompting may well prove to be the source of the unpredictable flashes of human-like intelligence these models sometimes produce.
There are always two parties in a dialog, and our “help me create” dialog was no exception. If the depth of the conversation sparked a meaningful perspective in the machine, what about the human? A large language model has never fallen in love, held a baby or experienced a poverty of cupboard or heart. What happens when we add the Generative AI context window to our human experience? We, too, are synthesizing patterns in our own neural networks during these conversations. Could it be that the thunder of these systems is that they engender flashes of emergence in us? I was assisted; patterns were discovered and transmitted; connections made, scaffolded by facts. My ability was amplified, to help me render my own description of what Eudora Welty revealed in that photograph. In whose voice? Hers, its, mine? Isn’t that blend, after all, what a tool is supposed to do?
Here’s my description of Miss Welty’s photograph after my conversation with GPT-4. You decide:
“It was my daddy’s favorite hour. Even coming home from Saturday town after a day of bossing ourselves, spending out of a Prince Albert sack like he wasn’t the only one wearing a button up vest.
That old mule liked it too. He and my daddy knew without looking when the sun was touching the edge of a field. We rode on, night on our heel. Splitting right through that wore out ground on either side of us. Over that dirt road, if that’s what you want to call it, packed and dusted with a powder ground to diamond by a thousand rusty bands nailed to a wheel. The creak in the hames of that mule’s collar counted every turn, one, two, one, two. The sky sat right down on the edge of that field. What you couldn’t see, you could feel, and beyond it, plumb to Arkansas.
On an evening like this, after a day like that. We saw a picture show, up in the balcony, shoulders touching in the dark. Couldn’t see anything till the projector lit our hands and the shoulders in the front of us. My mama giggled a little when the screen did, felt like a pie smells coming out of an oven. My mama, giggling. About time for a pie. Just about time.”