Language and Evolution
Date Posted:
June 30, 2017
Date Recorded:
May 30, 2017
Speaker(s):
Noam Chomsky
Description:
[audio only, no slides or other presentation media were used]
On May 30th, 2017, during a CBMM Research Talk, Noam Chomsky shared insight into his latest book, “Why Only Us: Language and Evolution” (co-authored with Robert C. Berwick), in which they discuss the biolinguistic perspective on language, which views language as a particular object of the biological world; the computational efficiency of language as a system of thought and understanding; the tension between Darwin’s idea of gradual change and our contemporary understanding about evolutionary change and language; and evidence from nonhuman animals, in particular vocal learning in songbirds.
NOAM CHOMSKY: Well, I'd like to start 500 years ago with the origins of the modern scientific revolution, when insights were produced that were not understood and have lain dormant ever since, almost. But I think they are the most significant things that have ever been said about the nature of language and mind and the kind of challenges that we face in trying to address them. So this is Galileo and his contemporaries, and I'll quote their own words; they express their amazement and awe and wonder at the fact that "language permits us to construct from 25 or 30 sounds an infinite variety of expressions, which, although not having any resemblance in themselves to that which passes through our minds, nevertheless reveal all of the secrets of the mind and make intelligible to others who cannot penetrate into the mind all that we conceive and all of the diverse movements of our souls."
Galileo himself described the alphabet, which has these properties, as surpassing all stupendous inventions. The force and significance of this were never recognized; in fact, the passages themselves were-- I can't find any reference to them in the entire literature up until about 50 years ago. I noticed them and quoted them, but I don't absolve myself; I didn't give them the significance they should have had. This really, I think, tells us what the study of language and mind is about, and of course it's something unique to humans. There's no hint of anything comparable to this anywhere else in the organic world. I'll come back to that.
Well, in the mid 20th century, it finally became possible to do something with those notions. That's the result of the great mathematicians, Gödel, Turing, and others, who established the theory of computability on firm grounds. And in particular, they made it clear how a finite object in the brain can have the capacity-- not to do what is described here, that remains a mystery, but at least the capacity to yield an infinite array of expressions which articulate, to ourselves and sometimes outside, our own thoughts; so some kind of language of thought. Actually there are a couple of important flaws in this classical exposition of the central questions of language and mind. One is that it's limited to speech. They talk about 25 or 30 sounds, and in the past generation, roughly, we've come to understand that the articulatory system is basically irrelevant to language. The language system is modality independent.
So it's now known, in fact, not only that sign language has the same properties as spoken language, the same mode of acquisition, even the same neural representations, [INAUDIBLE] is visual, but also that sign language is just spontaneously invented by deaf people. There are now cases from Nicaragua, Yemen, and elsewhere where small groups of deaf people have just created their own language with no linguistic input. In fact, there are even spectacular cases where groups of deaf children with no linguistic input have just invented their own language. Actually this was discovered 65 years ago by Eric Lenneberg, the guy who really created the modern field of biology of language. We were friends, graduate students at the same time at Harvard, and he was then investigating different kinds of language pathology, acquisition, and other things.
Among the things he did was go visit the school for the deaf in Boston. At that time, there was what was called an oralist tradition. Children who were deaf were not allowed to learn sign; it was considered necessary that they learn lip reading. So teachers were instructed not even to gesture. Eric noticed that in the class, when the teacher turned to the blackboard, the kids started gesticulating to each other. They had obviously invented their own language. But this was considered so exotic at the time that he never even published it. It just looked like a curiosity, but now it's understood. And this is of some significance because it tells you that the aspects of language that are related to the sensorimotor modality, usually speech, are really not part of language. They are part of some system that long antedates language in the evolutionary record, by hundreds of thousands, or sometimes millions, of years. And when language came along and was externalized-- which it sometimes, though rarely, is-- it picked up one or another sensory modality.
But the properties of what we call language that relate to the sensory modality are essentially extrinsic to language. I'll come back to that; it has lots of consequences. The other flaw in the way it was described in the quotation I gave, which is standard not only for then but for what [INAUDIBLE] followed up until recently, is that it is formulated in terms of production, the production of speech. But language is not just production. It's also perception and acquisition and so on. And that tells us that there's some internal system that is accessed for production-- usually just internally, we think in language, but sometimes externally in one or another sensory modality-- accessed for perception, acquired. And that's quite a different thing. Language is, in this respect, like our knowledge of arithmetic. When you want to understand how our knowledge of arithmetic, which is apparently a universal human characteristic, is accessed, you look at things like how you multiply numbers.
But a lot of factors enter into that, like short-term memory and other things. So the actual process of, say, multiplication is what you're using the system for, but the internal system is there independently of how it's accessed. And there's pretty good reason to believe it's a uniquely human possession, rather like language, possibly related to or even derivative from it in some ways. When we overcome these flaws, we forget about the particular mode of articulation or externalization and pay attention to the fact that we're talking about an internal system which can be accessed for production or perception. Then, when we've overcome that, we can recognize that language has a kind of basic property. The basic property is that our knowledge of language yields an infinite array of structured expressions, each of which expresses a thought, and which can sometimes be externalized in one or another sensory modality. That's the basic nature of language.
So basically it's a system that creates what's sometimes called a language, an infinite system that, as those early observations pointed out 500 years ago, expresses, captures-- in fact, constitutes-- a large part of our thinking. That part of thinking which is related to language; maybe there are other parts. That's the basic property. It's a species property; it's uniform among humans as far as we know. There are no group differences in language capacity; there may be some marginal individual differences. But you take a child, an infant from a tribe in Papua New Guinea which hasn't had other human contact for 40,000 years, and bring them up in Boston, and they'll be like everybody else: go to MIT, study quantum physics, and so on. And conversely. There seem to be no differences.
So it's uniform among humans, which tells us a lot. I'll come back to it. And it's unique to humans. There's nothing remotely similar anywhere in the animal world. As you may know, there have been elaborate efforts to try to teach chimpanzees, our nearest relatives, something language-like, and they completely failed. It looked for a while as if they were getting somewhere, but by now it's understood that nothing came of all the attempts. Recently there was one interesting study by Charles Yang, who has done important work on these [INAUDIBLE]. He went through the productions of Nim, the ape that was brought up with great effort to try to duplicate a human upbringing. And Nim did make lots of sequences of signs. But it turns out that there's no productivity-- he has a very precise measure of productivity-- there's nothing there. Nim is just randomly making signals, hoping presumably that these guys out there will give him a banana or something like that. But there's nothing; no other organism has anything remotely like the basic property.
So that raises a lot of serious questions for biology. It also turns out that not only is it distinctively human, but it's dissociated from other cognitive capacities. And this is again something that Eric Lenneberg's pioneering work began to establish, but now there's a lot more known. There are double dissociations, each way, between the language capacity and just about any other cognitive capacity you can think of. There are a lot of consequences to this, but whatever it is, it's something unique in the organic world, different from other human cognitive capacities. It's probably the source of what makes human beings so distinct in many ways from anything else in the organic world.
Well, if you try to address the basic property, there are essentially two questions for any computational procedure, and this one in particular. There are going to be certain atoms of computation. They may be analyzable in other terms, but from the point of view of the computation that yields the language of thought, they're atomic; you don't go into them. And then there's a computational procedure which constructs more complex objects, infinitely many of them, from these atoms. Those are the two things to look at. So let's start with the atoms. One important thing about them is that they are, again, unique in the organic world. There is nothing comparable among animals. This is commonly misunderstood-- it's a problem that comes up all the time-- but it's totally different from animal systems. Animal systems have a property: they satisfy what's sometimes called the referential doctrine, the doctrine that says that symbols pick out something in the external world. It's what Frege would have called Bedeutung. It's what contemporary formal semantics is based on, model-theoretic semantics and so on.
It's all through psychology, and it's part of machine learning systems as well. A lot of machine learning work, as all of you know, is working on what they call concept formation, trying to identify human concepts. And we know in advance that it's all going to fail. And the reason is that there are no referring expressions in language. So I'll just quote a recent article, a current one. It comes out of Google, in this case, and it defines the referring expression. I'm quoting it: "Referring expressions are defined in terms of the attributes that people use to describe visual objects." Attributes means things like color, size, shape, and so on.
But there's a problem. There are no such referring expressions in language or in thought. That's just not the way human expressions work-- there are expressions that we use to refer, but that's quite different from their having reference. Reference is a relation between a symbol and an object-- Frege again, Bedeutung, referent, and on into modern logic and philosophy and psychology. Referring is an action; you use [INAUDIBLE]-- it involves actions, involves all sorts of things. And this much was known to the Greeks, classical Greece. It's been forgotten, but it goes way back. So the first serious question in philosophy and science was actually raised by Heraclitus.
Hard question, not answered yet: how can you cross the same river twice? So I crossed the Charles River on my way to work; I'll cross the Charles River on the way home. But what makes it the same river? It's physically totally different. All the molecules are different, there's a different shape, and so on. But what [INAUDIBLE] river? Some modern philosophers, Quine and others, have suggested we can solve the problem by just taking a river to be a four-dimensional object, but that just avoids the problem. The question is, which four-dimensional object? Why this one and not some other one? And when you think about it, the problems are pretty severe. You can make radical changes in the Charles River and it'll still be the Charles River. If you reverse the direction so it goes the other way, it's still the Charles River. If you replace whatever is in it by 95% arsenic from an upstream plant or something, still the Charles River. If it goes in a different direction and ends up at a different place, still the Charles River. In fact, massive changes, and it'll still be called the Charles River.
On the other hand, trivial changes, some of them essentially undetectable, will prevent it from being a river at all. If you put up boards on the side and start using it for oil tankers, it's a canal, not a river. If you harden the surface-- a phase change, virtually undetectable-- and you paint a line down the middle, and people start using it to commute to Boston, it's a highway, not a river. And if you play around, you can easily convince yourself that [INAUDIBLE] changes will keep it being the Charles River, and trivial, almost undetectable changes will make it not a river at all. And the reason was understood by the classical Greeks. We individuate objects not by their physical properties but by their use, their design, the way people deal with them, and so on.
So for example, Aristotle asks, what's a house? And he says partly it's the material-- it consists of boards and bricks, whatever-- and partly it's the design of the person who created it. It was built to house people. If it looks exactly the same but it was built in order to house books, it's a library, not a house. But what's in the mind of the architect is nothing a physicist can detect, or a machine learning system. You can have as much success as you can imagine in machine learning, machine object recognition, and so on. It will never get you to human concepts and the atoms of language, because they're just different.
If you play around, this holds of [INAUDIBLE] any word you can think of. Human language just does not meet the referential criteria. There are no referential terms in human language-- which is different from saying that we use terms to refer; of course we do, we carry out the action of referring. But there are no terms that refer. That means all of formal semantics is syntax; it's not semantics. Semantics is based on reference, and there is no reference in natural language. Machine learning, you know, whatever value it has [INAUDIBLE], is not going to get to language and thought, because they just don't work that way.
This also raises a serious question about evolution. How this could have evolved is a total mystery. There's not even a hint of how anything like this could evolve. There's nothing like it in the animal world; it's unique to humans and it's universal in language and thought. So it's just one of those mysteries which we'll probably never solve, because there's no imaginable way in which we can get relevant evidence. Maybe someday something will come along, but not now. Anyway, those are the atoms. We can learn a lot about them, but where they came from and anything like that remains, for the moment, an impenetrable mystery.
So let's turn to the computational procedure. There we can make some progress. First of all, we'll obviously seek the simplest computational procedure. That's just normal rational inquiry: simplicity and explanatory depth are essentially the same thing said differently. So we want to find the simplest computational procedure that will in fact accurately characterize the interpretation of the expressions of language. And there's an extra reason in the case of language, beyond general methodological reasons, and that has to do with the little that's known about the evolution of language. There isn't much known, but there is something, and what's known is rather telling. Roughly it looks like this. Bob and I have a book about it that reviews much of what there is, and a good deal has been discovered since that book came out. [INAUDIBLE]
What seems to be the case is this. Modern humans, anatomically modern humans, show up roughly 200,000 years ago. That's when you start-- very small groups; there were not many of them for a long, long time. But there were anatomically modern humans, detectable roughly 200,000 years ago. Prior to this, there is no indication at all anywhere in the archaeological record of any symbolic activities of pre-human, non-human hominins. So those could be anything. So it's a fair guess that language didn't exist before modern humans came along.
It's now known that humans began to separate into different groups not very long after that. Genomic analysis has shown that some of the early separations-- a particular group in Africa, the [INAUDIBLE] group-- separated from other humans maybe 150,000 years ago, something roughly like that. They have the same language capacity that everyone else does.
Which means it's a common human faculty that never changed. That gives you a very small window, on an evolutionary time scale, within which the basic property must have emerged. Incidentally, the fact that modern humans can and do invent their own languages with no linguistic input at all, which I mentioned earlier, indicates that they could have done it 200,000 years ago when modern humans developed. What this tells you is that very likely the basic property of language emerged along with modern humans, or maybe very shortly afterwards in evolutionary time. Remember, very small groups here, thousands of people at the most.
Now, there's further work. There's a very interesting paper that just came out by Riny Huybregts, a Dutch linguist, who discovered that the groups that separated, say, 150,000 years ago have a unique form of externalization. These are all and only the languages that use what are called clicks-- some of them have dozens of clicks-- consonants that are extremely difficult for us to articulate. But they have lots of them. And apart from irrelevant exceptions, they all have these phonetic systems and no other language does. That's something. They also apparently have some slight articulatory adaptations-- a change in the alveolar ridge at the top of the mouth-- which seems to be adapted, maybe, to the pronunciation of clicks. All of this suggests that the externalization of language began after the separation. They have the same capacity for language, the same internal language as us, but a different externalization.
Well, in a sense it's kind of a matter of logic that externalization came after the internal [INAUDIBLE]. This suggests there may have been a time gap. And putting that together, it tells us that there's good reason to believe that the internal system should be extremely simple. It emerged very suddenly, with no external pressures, no selectional or other pressures. So it presumably just emerged in the simplest possible way. That's what we'd anticipate. So a good guess, in investigating the basic property, is that it [INAUDIBLE] something like a snowflake.
Its form comes just from natural law. Natural law in this case would presumably include-- and maybe just be-- considerations of computational complexity; it's a computational system. So we expect to find a very simple system, computationally as simple as possible, which yields the internal language, the language of thought. And it can then be externalized in one or another way, different ways in different groups, even [INAUDIBLE] modalities-- it can even be done by touch, which is harder, but it's done, the Helen Keller-type case.
Well, that's a start. And it tells us the kind of null hypothesis, a place to start: consider the simplest possible computational operation and see how far you can go in accounting for properties of language on that basis. And it turns out you can go pretty far. We don't know how far, but quite a lot. So the simplest computational operation, which is found somewhere in every computational procedure, is just an operation which takes two objects already constructed-- beginning with the atoms, then anything that's already been constructed, because it's a recursive procedure-- so two things that have already been constructed, and forms a new thing from them. That's somewhere in every recursive procedure. And in the simplest case, it won't change them. So if x and y are put together to form z, x and y aren't changed. And no further structure is assigned to x and y-- crucially, no linear order; that would be an extra complexity. So in the simplest case, it will take x and y and form, in effect, the set containing x and y and nothing else. That's what's been called Merge in the recent literature.
If you take a look at the operation Merge, just as a matter of logic, it has two aspects. When you take x and y and form the set {x, y}, either x and y can be distinct from one another, or one of them can be part of the other-- say y can be part of x. Those are the two possibilities. Actually it takes more work to prove this, but that's the essential fact. Well, that suggests [INAUDIBLE] there should be two ways of constructing expressions. One: take something like, say, read-- not the word read but the abstract entity underlying it-- and books, and put them together, and you get the set containing read and books. Or take the sentence John read what. You can take what, which is part of John read what, and merge it with John read what, and you get what John read what.
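To make the two cases concrete, here is a minimal sketch of Merge as unordered set formation; the encoding (Python frozensets, the helper contains) is my own illustration and is not from the talk.

```python
# A minimal sketch of Merge as unordered set formation.
# The terms (Merge, external/internal) follow the talk; the frozenset
# encoding and the helper functions are illustrative assumptions.

def merge(x, y):
    """Combine two already-constructed objects into the set {x, y}.
    No linear order is imposed, and x and y are left unchanged."""
    return frozenset({x, y})

def contains(x, y):
    """True if y is x itself or occurs somewhere inside x."""
    if x == y:
        return True
    return isinstance(x, frozenset) and any(contains(part, y) for part in x)

# External Merge: the two objects are distinct.
vp = merge("read", "books")                      # {read, books}

# Internal Merge: the second object is already part of the first,
# which is what yields displacement.
clause = merge("John", merge("read", "what"))    # {John, {read, what}}
question = merge("what", clause)                 # {what, {John, {read, what}}}

assert contains(clause, "what")                  # "what" was merged from inside
```

In the internal-Merge case the displaced element occurs twice in the structure, once where it is pronounced and once where it is interpreted, which is the point taken up next.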
Well, it turns out those are the two basic operations in language. The first one, sometimes called external Merge, constructs an unbounded hierarchy of expressions. The second one, internal Merge, yields what's called displacement. You don't pronounce what John read what, but that's what you interpret. You hear the word what in one position, but you're interpreting it in two positions. It means something like: the what that you hear also happens to be the object of read; you're reading that thing. So the sentence means something like for which x, John read x-- which book did John read? For which x, x a book, John read the book x. And that's displacement. In the mind, both copies are there. In the externalization, there's only one. But that just follows from considerations of computational efficiency: in the externalization, you're going to make it as simple as possible.
So you have to produce at least one, or there's no indication that the operation ever took place; so you just delete all the others. And notice that that causes very serious problems for communication. Those of you who've worked on parsing systems know that one of the main problems is what are called filler-gap problems. You hear the word what at the beginning of a sentence, and you somehow have to find out where it's interpreted. I took a simple sentence; take a more complicated one and it can be quite a problem. So filler-gap problems are a major, maybe the major, problem for automatic parsing and for perception.
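As a rough illustration of the filler-gap problem (my own toy sketch, not any actual parsing system): the hearer receives only the externalized string, with the lower copy deleted, and has to posit the gap position where the fronted wh-word is interpreted. The tiny verb lexicon and heuristic below are invented for the example.

```python
# Toy illustration of the filler-gap problem: only the externalized
# string is available (the lower copy has been deleted), and the hearer
# must recover where the displaced wh-word is interpreted.
# The heuristic and the small verb lexicon are invented for illustration.

TRANSITIVE_VERBS = {"read", "see", "admire"}
WH_WORDS = {"what", "who", "which"}

def recover_gap(words):
    """Reinsert an interpretive copy of a fronted wh-word after the first
    transitive verb that lacks an overt object."""
    if not words or words[0] not in WH_WORDS:
        return list(words)
    filler = words[0]
    result = list(words)
    for i, w in enumerate(words):
        next_is_object = i + 1 < len(words) and words[i + 1] not in {".", "?"}
        if w in TRANSITIVE_VERBS and not next_is_object:
            result.insert(i + 1, f"<gap:{filler}>")   # interpret the filler here
            break
    return result

print(recover_gap(["what", "did", "John", "read", "?"]))
# ['what', 'did', 'John', 'read', '<gap:what>', '?']
```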
They could easily be resolved just by pronouncing both of them. But we don't, presumably for reasons of computational efficiency in the externalization system. But we interpret both of them. And you get pretty complicated cases, so I'll give you a couple of examples. Take the sentence it's the mother of his children that every man admires. Every quantifies over his, right? His is a variable bound by the quantifier every. Switch it around a little: it's the mother of his children that admires every man. Now every man has nothing to do with his. The reason is that what you're hearing in the mind is it's the mother of his children that every man admires the mother of his children. There, every and his are in the right structural positions for quantification. In the other case, what you're hearing is it's the mother of his children that the mother of his children admires every man. And that's not the right structure for quantificational relations. I won't go into it here, but there are much more complex examples showing the same thing.
And what they show is that-- it had always been thought, in fact, that the displacement property of natural language, which is ubiquitous, was some kind of strange imperfection. You'd never construct a formal system that has that property-- for example, in metamathematics, where you make up such systems, or in computer languages, or whatever. But it's ubiquitous in natural language, and it follows from the null hypothesis, including the interpretations that come with it, which is quite a remarkable fact. And that's one of the things that suggests that the null hypothesis, which is sometimes called the strong minimalist thesis, probably is correct.
Well, let's proceed. There's another major property, a very puzzling property, which was noticed for the first time when the first efforts were made to construct generative grammars about 60 years ago-- grammars that actually try to satisfy the Galilean challenge that I quoted, the first ones to deal with the basic property. One very curious fact about natural language is that the rules ignore linear order. They simply ignore linear order. They look only at structure-- the property sometimes called structure dependence. So for example, take a trivial sentence, a conjunction: John and Mary are in the room, not John and Mary is in the room. If you look at the Google natural language program, it can't distinguish these two. And in fact, John and Mary is in the room should, on statistical big-data grounds, be much more probable, because the bigram frequency of Mary is is way beyond Mary are. And more complicated cases like this turn out to be irresolvable if you keep to the insistence of the Silicon Valley analysis that you only look at words and never at phrases. That's one of the reasons we know it's going to fail totally, no matter how useful it may be as an engineering achievement. So it's going to fail for elementary reasons, because languages just don't pay attention to linear order; they pay attention to structure. There's a very common phenomenon in language called verb second: the verb comes in second position. You see a little of it in English and all over the place in German [INAUDIBLE]. So in which book will you read, the will is in second position, but it's not the second word. It follows the first phrase; it's in the position after the first phrase. So you're looking at structure, not the linear arrangement.
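A small sketch of the contrast (my own illustration; the bigram counts are invented and this is not Google's system): a word-by-word statistical rule prefers the frequent bigram, while a rule stated over the structure of the subject phrase gets the agreement right.

```python
# Toy contrast between choosing "is"/"are" from an adjacent-word bigram
# and choosing it from the structure of the subject phrase.
# The counts are invented; only the structural rule gives the right form
# for "John and Mary are in the room".

BIGRAM_COUNT = {("Mary", "is"): 9000, ("Mary", "are"): 50}   # made-up corpus counts

def agree_by_bigram(previous_word):
    """Pick whichever verb form is more frequent after the adjacent word."""
    return max(["is", "are"], key=lambda v: BIGRAM_COUNT.get((previous_word, v), 0))

def agree_by_structure(subject):
    """Pick the verb form from the subject phrase as a whole:
    a coordination of noun phrases is plural."""
    head, *_ = subject
    return "are" if head == "and" else "is"

subject = ("and", "John", "Mary")        # [John and Mary]
print(agree_by_bigram("Mary"))           # is   -- the statistically favored bigram
print(agree_by_structure(subject))       # are  -- what the grammar requires
```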
And take a last example-- there are thousands of them. Take a sentence like eagles that fly instinctively swim. That's ambiguous: it could be fly instinctively or instinctively swim. Put instinctively in front-- instinctively eagles that fly swim-- and that resolves the ambiguity, but in a strange way. Instead of the adverb going to the linearly closest verb, it goes to the linearly more remote one, the structurally closest one. This is ubiquitous all over natural language. Every construction that's known works like this, in every language. And the reason it's puzzling is that computations over linear order are far simpler than computations over structures; that's transparent. But language universally picks the computation over structures and ignores the one over linear order.
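Here is a toy rendering of that last example (my own sketch; the bracketing is the standard analysis of the sentence, but the two attachment functions are invented for illustration): a rule over the string would attach the fronted adverb to the linearly closest verb, fly, while the rule speakers actually use attaches it to the structurally closest verb, swim.

```python
# "Instinctively, eagles that fly swim."
# Linear attachment would pick the closest verb in the string ("fly");
# structural attachment picks the main-clause verb ("swim").
# The bracketing is the standard analysis; the functions are illustrative.

WORDS = ["instinctively", "eagles", "that", "fly", "swim"]

# Structure: [S [NP eagles [RC that [V fly]]] [V swim]]
TREE = ("S",
        ("NP", "eagles", ("RC", "that", ("V", "fly"))),
        ("V", "swim"))

def linear_attachment(words):
    """Attach the fronted adverb to the linearly closest verb (wrong for language)."""
    verbs = [w for w in words if w in {"fly", "swim"}]
    return verbs[0]                      # "fly" comes first in the string

def structural_attachment(tree):
    """Attach the adverb to the structurally closest verb: the verb of the
    clause the adverb is merged with, ignoring anything inside the subject."""
    _, _subject, predicate = tree
    return predicate[1]                  # "swim"

print(linear_attachment(WORDS))          # fly   -- what a string-based rule gives
print(structural_attachment(TREE))       # swim  -- what speakers actually compute
```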
Actually, there's even neurolinguistic support for this. Andrea Moro, who is here-- in work that he initiated with his group-- was able to show, with invented languages, language-like invented languages, that if the invented language kept to the linguistic principle of ignoring linear order, subjects showed activation in the normal language areas of the brain.
If the invented language used linear order, what you got was diffuse neural activity, meaning people were just treating it as a puzzle, not as part of language. There's actually psycholinguistic evidence showing that if you present things like this-- the ones that use linear order-- to people and tell them it's a puzzle, then they can solve the problem. If you tell them it's a language, they're inhibited from solving it. [INAUDIBLE] And the striking fact is that the simple computation over linear order is never used, only the computation over structure. Which, incidentally, follows from the null hypothesis, because remember that Merge doesn't introduce order.
So why do we have order in language? Well, if you try to externalize it, the sensorimotor system requires order. You can't talk structures. The sensorimotor system says you've got to put the words one after another. In fact, if you're using sign language, there are somewhat different structural arrangements: in sign, you have visual space, so you can pick out points and use them for reference and so on. But whatever sensorimotor modality you're using is going to be a kind of filter. Some things are going to have to go through it, and when you study what comes out, you're really not studying language. You're studying the effect of some sensorimotor system on the internal language. Sensorimotor systems were around way before language emerged-- in the case of sign, millions of years; in the case of articulation, hundreds of thousands of years. So they definitely have nothing to do with language; they were just sitting there.
And when you study the relation between the internal language and the sensorimotor system, you're actually not studying language; you're studying an interaction of two systems which have nothing to do with each other. And paradoxically, that's about 100% of the investigation of language. Take a look at the whole study of language for the last 2,500 years, and today: it's almost always studying externalization. It's not studying the internal system. Which means that all of the rich, exciting, informative work about language isn't about language. It's about the interaction of two independent systems, one of which sort of filters what's inside.
Well, we can proceed-- I won't go on with this. It turns out that the internal system, the one that yields the language of thought, could very well be uniform among humans. And that accords with what little we know about the evolutionary history. So there's kind of a good guess at this point, kind of a challenge to work on: just to see if you can show that the null hypothesis, the strong minimalist thesis, actually accounts for the internal system, which may be uniform, or virtually uniform, among humans and is the actual species property. Then there is the system of externalizing it, which is done in various ways.
And when you look at languages, what we call languages, they look very complex, they differ from one another in all sorts of ways, they change very rapidly-- every generation they can change. But it seems that all of that complexity and variability and mutability may be just part of the externalizations. Which would not be surprising, because you're trying to relate two systems that have nothing to do with each other, so of course it's going to be complex. And it could turn out that the actual system of language, extricated from these other extraneous elements like the sensorimotor modalities, could be kind of like a snowflake-- just something that arises because that's what natural law tells us. So from an evolutionary point of view, we would be looking for some change, maybe a small rewiring of the brain, that took place roughly 200,000 years ago, roughly along with modern humans, which yielded the basic property. And then later, one or another form of externalization is tried. They could all be different; they could vary and so on.
I think that's a fair guess as to what might turn out. There's a huge challenge in showing this, because remember, by now we have a ton of descriptive material from every imaginable language-- almost all of it learned in the last few years, incidentally. There's been an explosion of understanding and investigation of typologically varied languages at a kind of depth that was never possible before. But almost all of it is the study of externalization. So what you have to show somehow-- a real challenge-- is that if you [INAUDIBLE] the effect of sensorimotor systems, you get rid of the apparent complexity and variety. Of course, you then have to account for how the externalization yields that external variety, but that's a study of the relation between two different systems, language and something else.
Well, there are some steps towards finding out something about what the neural basis of this might be, but it's a tricky problem. And notice that all of this work leaves us with two fundamental mysteries which we don't know how to solve. One of them is the nature of the atoms. The other is the original challenge from hundreds of years ago: how do we use this richness of internal structures to construct [INAUDIBLE] internally in an appropriate way? We don't just talk randomly or think randomly. When we're thinking in language, it's appropriate to circumstances and situations. That was the original problem-- the creative aspect of language use and thought-- and that remains totally mysterious. I should say that it was this property that led Descartes to make a distinction between body and mind. He thought he could show that every aspect of the world, including every aspect of humans-- [INAUDIBLE], perception, emotion, and so on-- could be accounted for in mechanical terms, except for this property, which he identified, the same one as in the comments I quoted: being able to construct your thoughts internally in a way that's appropriate to situations and, if you want to convey them to others, to give them access to your mind this way. That was the reason he postulated a second substance, and it remains as much of a mystery as it was hundreds of years ago. But that's, I think, roughly where we stand today [INAUDIBLE].
[APPLAUSE]