listen to the voices
we are all talking and hearing so much + I interview Evan Ratliff, journalist and co-host of the (retired) Longform Podcast
I finally had the chance to listen to Shell Game a little while ago, when I was sick and couldn’t do much of anything. The context: Evan Ratliff, journalist and co-host of my favorite podcast, Longform (which ended this year, causing who knows how many tears to fall worldwide), created a voice clone of himself, hooked it up to an AI chatbot, and set it loose. This is a Shell Game-inspired essay about voices, but there’s a bonus: this also made a great excuse for me to reach out to Evan, so we spoke about Shell Game and also, of course, about Longform.1 There’s a bit of irony: this essay is largely about voice, and my voice was still recovering when Evan and I spoke.2 I’ve posted the “actual” podcast version on the Gradient feed.3

I’ve really only noticed my voice — the kind of noticing that remains in your mind as a discrete, spiky moment in time, as opposed to the noticing that turns into amorphous sludge once you’ve done enough of it — about four times in my life.
The first time was during one of those summer camps you send your kids to in hopes that they’ll get socialized and have something to do — we visited a radio station and recorded ourselves talking. I didn’t hear my voice so much as feel it inflicted on me. It sounded so awful I left the room. I would later learn that we hear our own voices differently, because of something having to do with bone conduction. Bone conduction or not, I was sure I sounded like a cave troll.
The second time was when my cousin told me my voice had a nasal quality; I immediately changed something about it, and he noticed. Third: a few years later in my sophomore year of college, as I was walking back to campus in the evening and talking with a friend, a freshman I had met a few days before turned around and said she recognized me by my voice. Something about it being distinctive. I couldn’t and still can’t tell you what she meant, but I found it very funny because I had a crush on M at the time and, among other things, I was enamored with her voice — it was kind of low and resonant and not too serious about itself.
Maybe it was just three times.
In 1976, Don Ihde wrote a book about sounds and voices: Listening and Voice: A Phenomenology of Sound adopts a methodology based on Husserl and Heidegger to find a new language for the auditory experience. Ihde is not concerned with “voice” in the sense of your or my particular voices, but rather with the perception of and attention to the voice of all things that produce sound.
Listening to the voices of the World, listening to the “inner” sounds of the imaginative mode, spans a wide range of auditory phenomena. Yet all sounds are in a broad sense “voices” of things, of others, of the gods, and of my self … A phenomenology of sound moves … toward full significance, toward a listening to the voiced character of the sounds of the World (Ihde 147).
His “auditory turn,” an emphasis on auditory experience over the visual metaphors and frameworks that have shaped Western philosophy, has the kinds of commitments you’d expect: how do we experience being-in-the-world? Through sound. What reflects and shapes our moods? Sound. That “lifeworld,” the pre-reflective ground of all experience, is constituted by our everyday auditory experience. The world discloses itself to us through sound — yet another medium that exceeds text in its communicative capacities.
I thought of Ihde when I listened to Shell Game — Evan Ratliff’s new podcast, where he makes an AI voice clone of himself and sets it loose — because I wondered how he would feel about the acoustic ecology we’ve made for ourselves. Cities are a constant polyphony of sounds; you can shut everything out with your headphones so you don’t have to hear anyone; you can clone your voice so you don’t have to talk to anyone.
Evan begins by setting his voice agent on customer service representatives, then scammers, and eventually his own friends and family. Some figure out what’s going on quickly, laugh it off, say something like “So we’re really doing this?” — who do they think they’re talking to, since it’s clearly not Evan? — and try to have the conversation they were about to have with Evan himself. Others curse it out. One, believing the ChatGPT-powered bot, encumbered with high latency and hallucinations, is really Evan, becomes worried that Evan might be having a mental break.
It turns out that some people actually want to listen to AI-generated voices. A few years ago, I came across a Twitter user who was working on a series of scripts that would, given a research paper, turn it into a short “podcast.” He was prescient: Google has now made their own tool called NotebookLM that can do the same thing. A friend forwarded me this awful AI “conversation” generated from a book, and I only forced myself to tolerate the entire ten minutes so I could write about it. Much of it is really a monologue: a voice with almost no distinctive qualities whatsoever explains the book, as the other voice — which also has few distinctive qualities — adds incredible value to the conversation with a loop of “ok,” “mhm,” and similar mouth noises. They briefly exchange roles in a few spots, but it sounds like the caricature of a Socratic dialogue you’ll get from someone who hasn’t read much Plato.
They sound “better” in the formal sense than Evan’s voice clone, in part because there isn’t so much latency, probably because the system isn’t processing and generating responses to text in real time. They’re just emotive enough to convince you that you’re listening to actual podcast hosts.4 You can probably imagine an AI Evan Ratliff voice that sounds as natural as they do.
If the bots get better, what does Evan lose? After 12 years of Longform, he’s more of a podcaster than most can claim to be, with a distinct “podcasting voice” to back it up. In Episode 3, he asks himself: “to what extent was my AI’s voice… authentically me? Was it still the most important element of my personal brand?”
Does anyone have a voice? In the early innings of the Longform Podcast, when Max Linsky interviewed Ira Glass, Glass was well aware of the “Ira Glass Voice” phenomenon. That so many in radio and podcasting took inspiration from Ira Glass’s voice is both a metaphorical and literal point: the struggle to find creative freedom is universal; “NPR voice” permeated the airwaves and Glass became a sort of figurehead for a certain flavor of the elite podcaster.
New technology, in the various ways it brought us together, created a simulation of our most intimate interactions: we can all whisper directly into each other’s ears, all the time. You can hear the podcasting elite and teach yourself to sound like the podcasting elite, consciously or unconsciously: Mikhail Bakhtin thinks our language is not our own, fundamentally dialogic in nature, a social construction; perhaps the same is true of our voices. Ira Glass Voice has been joined by Ezra Klein Voice and Michael Barbaro Voice. People begin to sound the same because they know what listeners like. Deep and/or smooth voices. A “unique” articulation and pace. Guest hosts on “The Daily” often adopt Barbaro’s uneven staccato in the show’s final segment: here’s what else. you need… toknowtoday.5
Evan reports to us that, as he spent more time listening to his voice agent, he began to sound more like it — enough so that his wife asked during one conversation, “are you trying to be the AI?” If the most insistent, penetrating voices among us dominate our auditory commons, they become its center of gravity. As language models swell the world’s constant deluge of text, voice agents may add yet more dissonance to the soundscapes we enter and exit at every moment. Disembodied voices might come to sound just like us, but, to state the obvious, they’re not us. Our voices bear something of our relationships, our expressive intentions, our histories and presence.
Late in his book, Ihde writes about the shift from Bach to rock and the changing musical sensibilities of the young. One of his points concerns dominance — rock music is loud, punchy, insistent — it imposes itself on its listeners. His core argument concerns a deeper shift in sensibilities of which the shift in musical technology is only a symptom: we mold our concepts of ourselves on our concepts of the world, operate with myths “that contain this self-world interpretation function in terms of key symbols or metaphors” — the computer and the camera (and, of course, the language model) become our points of reference.
In Ihde’s words, a shift from the mechanical world to the electric world — with its images and metaphors that evoke flow, transformation, melting together — reflects the shift in sensibilities that leads us from the “mechanical” sounds of Bach and Mozart and Vivaldi to the rock music of his time. I can only imagine what he’d think of EDC.
Ihde doesn’t think he’s witnessing the decline of classical music. He’s right — so many classical musicians are now content creators and enough people listen to classical music that they have an audience. But he does argue that our hearing and our perception are situated within an emerging metaphor. We can understand this — that we are attuned to “electric flow” and its perceptual manifestations — and try to maintain our grip on the full range of possibilities in sensory experience, to understand that our metaphors are just metaphors, and to grasp the genuine potentials with us and ahead of us, even as dominant metaphors and voices shift under our feet.
Evan seems optimistic, too. He told me about one tentative conclusion from Shell Game: that these new technologies could drive us back to human conversations. Maybe we’ll come full circle, and the gradual disembodiment of voices — through technologies that allow us to speak at range, technologies that allow us to not speak at all — leads us to the recognition that sound-as-world-disclosure feels unbearably thin when we leave the rest of the world out of the picture.
I hope he’s right. I want to know and remember what my friends and family sound like, and what it’s like when we’re together.
Something I appreciate a lot about Evan’s perspective on his own work is that he doesn’t want to invent grandiose reasons for it that aren’t there. Not everything in the world has a theory of change. I’m thankful to have had another conversation with someone thinking deeply about and throwing himself into living with the technologies we’re developing. I’ve done a couple of these: most recently with Clive Thompson, L.M. Sacasas, Joss Fong, and Nicholas Thompson (who, if you’ve followed Evan’s other experiments, siphoned out information to the Evan Ratliff Hunters when Evan tried to disappear for a month).
I mentioned worrying about sounding like a Max impersonator at least once — here’s when I think that happened, and I had been listening to a lot of Max’s interviews in particular around that time!
On that note, if you write — especially about any kind of art — you should send me an email / a message, because I probably want to interview you!
not good ones
I’ve only recently begun to hear Sabrina Tavernise trying something different: here’s what else you should know today. No awkward spacing.