Do LLMs Have Language?
Exploring language as a biological object of the human mind/brain
My First Encounters with Generative Grammar
When I first encountered universal grammar, I was resistant to its core claims. Specifically, I was skeptical of the claim that language was not really ‘learned’ in the traditional sense (from listening to our parents, being told what words mean, etc). I thought, how is this possible? Surely, we ‘learn’ language. How wrong I was.
It took just a few weeks of technical exercises illustrating the shared underlying structures of different languages, and the evidence coming out of labs showing how rapidly infants acquire language, for my skepticism to dissolve. I was completely amazed at how intricate language as a biological system really was.
At the same time, I was always a ‘big picture’ thinker, so becoming a linguist made no sense to me. So I thought I should look into the philosophy of mind.
To this day, the connections between philosophy of mind and the study of language as a biological object of the human mind/brain are pretty tenuous. But I don’t think they should be. Human language is one of the most well understood systems of the human brain, and it reflects very deeply some of the most fundamental and shared properties of human beings. In a word, language tells us a ton about the mind.
During my PhD I studied language from this angle anyway. I did this mostly through Noam Chomsky’s work on the issue, which was highly accessible and (to me) deeply interesting (for context, he basically founded this field, a field that is now studied all over the world and deeply prolific in its insights into syntax, semantics, phonetics, and more). I gradually found the work of adjacent syntacticians, as well as work from neuroscientists and psychologists that further cemented the idea that language is a species-property and biological object of the human brain.1 I loved this work so much that it became a core part of my PhD thesis.
Here are some examples of general things I learned that fascinated me.
(1) What we call different languages, e.g., English, French, Japanese, Swahili, all share the same fundamental underlying structures and properties. There is now tons of research on this stuff, showing how surface-level differences between languages are just that — surface level — and how different languages have at least as much in common as they do not.
At the same time, ‘surface-level’ differences are by no means unimportant, and you can learn all sorts of cool stuff about your own language in studying this stuff too. (The first time I learned something about Irish from this work, I felt so seen. And yes, Irish is a language that is not English!)
Chomsky suspects (though this is not a universal view in the field) that there is really only one biological language, which humans innately acquire, and that this language gets expressed in each of us in different ways depending on what languages we hear in our environment as a child.
As someone who learned multiple languages in the first five years of life, this made so much sense to me. My brain in its early years always felt like a hodgepodge of different languages. It also made sense of my need to speak of ‘Irish-English’ in contrast to ‘British-English’, for example, when explaining certain cultural turns of phrase.
The phrase ‘giving out’ in Irish-English, for example, is a direct translation of the Irish ‘tabhairt amach’. The term ‘giving out’ does not mean to distribute things, but ‘to complain’ or ‘to scold’, i.e., it means what it means in Irish.
So this to me was one fascinating property about brains that I figured would be of use to me in the philosophy of mind, namely, that what we call ‘different languages’ are in some deep sense expressions of the same underlying shared language, which we all express in our own idiosyncratic ways.
(2) I also learned that this was true of sign languages too. There are many studies, for example, showing that spoken language and sign languages share the same fundamental syntactic properties — it is just the ‘modality’, i.e., the way the language is expressed, that is different in each case. In the case of sign languages, we use a visual-manual modality to express language, whereas in the case of spoken language we use an auditory-vocal modality.2
To me, this is another excellent illustration of how deep the nature of human language goes. It’s not just about the words we speak or sign. There are deeper underlying cognitive structures that constitute human language.
(3) A third thing I learned that I absolutely loved? Prescriptive grammar has basically nothing to do with language. This made SO much sense to me. My brain has never really cared about whether I should use the term ‘which’ or ‘that’ in a sentence. It is good to understand why my brain doesn’t care about this: it is a social convention, and my brain doesn’t appear to love those.
So What is Language?
Chomsky (and the generative grammar tradition more broadly) views language as an innate mechanism of the human mind/brain. In other words, ‘language’ is not the words we utter or the words we see written on a sheet of paper. Language is an ‘implementable procedure’ (a ‘computation’) that constructs an infinite array of hierarchically formulated expressions in the mind of a human being. Now, I’m pretty sure there is disagreement on what exactly this computation is or looks like (it’s worth remembering that syntax is an entire field of study). Much work is being done — in the field usually known as ‘syntax’ — to figure this out, but I’ve followed the work of Chomsky and adjacent syntacticians most closely, so that’s what I’ll discuss.
The name for the most pared-down understanding of this computation in the literature is Merge. This computation is posited to construct linguistically formulated expressions from a set of ‘primitive atoms’ or ‘syntactic objects’, generating linguistically formulated thoughts.3
Here is a description of the computation Merge. Merge is an operation that takes two syntactic objects, call them X and Y, and forms a new object, call it Z, defined to be the (unordered) set {X, Y}.
Two cases are distinguished when Merging X and Y. Either X and Y are distinct syntactic objects, where neither X nor Y is a term of the other (this operation is called ‘External Merge’), or one of X or Y is a term of the other (this operation is called ‘Internal Merge’).
These different forms of Merge can yield different kinds of sentences. E.g., if X is the syntactic object ‘read’, and Y is the syntactic object ‘that book’, then using External Merge the brain forms the expression {read that book}.4
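The description above can be sketched in code. This is a purely illustrative toy, assuming syntactic objects are represented as strings (atoms) or as unordered sets built by earlier Merges; it makes no claim about how the brain actually implements Merge.

```python
def terms(obj):
    """All terms of a syntactic object: the object itself plus,
    recursively, the terms of its members (if it is a merged set)."""
    result = {obj}
    if isinstance(obj, frozenset):
        for member in obj:
            result |= terms(member)
    return result

def merge(x, y):
    """Merge forms the unordered set {X, Y}."""
    return frozenset({x, y})

def is_internal_merge(x, y):
    """Internal Merge: one of X, Y is already a term of the other."""
    return y in terms(x) or x in terms(y)

# External Merge: 'read' and the set {'that', 'book'} are distinct objects.
that_book = merge("that", "book")            # {that, book}
read_that_book = merge("read", that_book)    # {read, {that, book}}

print(is_internal_merge("read", that_book))          # False -> External Merge
print(is_internal_merge(that_book, read_that_book))  # True  -> Internal Merge
```

Note how the output of Merge can itself be an input to Merge: that recursion is what lets a finite operation generate an unbounded array of hierarchical expressions.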
Ok, that was a lot. But bear with me!! The basic point is that there is a simple computational system in the human brain (which the above description of Merge describes) that can generate infinitely long sentences. This computation forms the basis of all of our linguistic expressions, with more complex rules being required to explain how we get the many varied kinds of sentences we are familiar with producing on a daily basis. This computational system is language, and language, so defined, is a species property of human beings.
Linguist and neuroscientist Andrea Moro illustrates the nature of Merge with simple and intriguing examples. He takes the example of how ‘‘John runs’’ and ‘‘this fact surprises me’’ can be combined to produce the sentence ‘‘This fact that John runs surprises me’’, which can be further merged with the phrase ‘‘Mary loves’’ to produce the sentence ‘‘This fact that John whom Mary loves runs surprises me’’, and so on ad infinitum.5 The sentence ‘‘This fact that John whom Mary loves very dearly runs surprises me’’ also makes sense, etc.
You can keep going. Try it! Your brain will keep on giving (though eventually you’ll probably confuse yourself; at least that’s what happens for me).
Notice how these sentences are kind of hard to keep saying but are perfectly grammatical. Notice also how the structure of these sentences is nothing like what an LLM produces. Chomsky’s view is that these are core examples of how our brain ‘thinks’ in language, where ‘language’ refers to the computational system of the human mind/brain.
Here is just one fascinating neuroscience result that illustrates the nature of this system: it’s been found that ‘computer languages’, as they are called, do not activate the same brain areas as natural languages do (nor, for that matter, does math or logic). Reading code activates a general-purpose brain network instead. In other words, your brain interprets reading code as something like solving a puzzle, not as processing a language.6 Your brain is wired to identify language as a unique biological thing, and ‘language’, according to the research, is a distinct system.
Language vs Language Use
Now you might be asking: if language is a biological computational system of the human mind/brain, what the hell is ‘language’ in the sense that we ordinarily use this term? Well, that is a big question. There are all sorts of political and social factors, for example, that shouldn’t be overlooked in understanding what gets considered a ‘language’. But very roughly, we can draw a distinction between language in the biological sense, and language use, which of course is also a deeply important part of language that can be studied on its own terms.
Now, nobody who studies language as a biological system is actually suggesting that in making this distinction we need to change our ordinary use of the term ‘language’ to ‘language use’, nor is this distinction intended to denigrate the very important work that is done on the study of language use. It is just to say that this conceptual distinction exists, and that there is a wealth of evidence to suggest that language proper is a biological property of human brains, which can complement the study of language in a number of other domains.
Language and LLMs
It’s interesting to set these feats of studying human language as a biological object since the 1950s against the onset of large language models. The name ‘large language models’ suggests that LLMs have language, but do they? Well, that will depend on how we think about ‘language’.
If we understand language to be a computational system of the human mind/brain, LLMs certainly don’t have it.
Let’s look at a few reasons why. First, human beings acquire language virtually automatically from a very young age, something I spoke about here, whereas LLMs do not acquire language at all. LLMs are trained on linguistic artefacts, i.e., products of human language use, be they texts, reports, etc.
Second, even supposing that LLMs ‘acquire’ language in some sense, if you study the structure of human languages you see that their core properties are completely lacking in LLMs. LLMs, for example, do not produce sentences using Merge.
On this view, LLMs ‘acquire’ language by learning probabilistic patterns from text, whereas humans recursively apply a computational system to syntactic objects, generating, as we saw in the examples above, very different kinds of sentences from the manicured sentences produced by LLMs.
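To make the contrast concrete, here is a deliberately crude sketch of prediction from surface statistics: a bigram counter over a tiny made-up corpus. Real LLMs are vastly more sophisticated than this, but the structural point stands: the model picks likely continuations from co-occurrence counts over word sequences; at no point does it apply a recursive, structure-building operation like Merge.

```python
from collections import defaultdict

# Toy corpus (invented for illustration), split into word tokens.
corpus = "john runs . this fact surprises me . mary loves john .".split()

# Count how often each word follows each other word.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def most_likely_next(word):
    """Return the most frequently observed continuation of `word`."""
    followers = counts[word]
    return max(followers, key=followers.get) if followers else None

print(most_likely_next("fact"))  # 'surprises'
```

The model only ever sees flat sequences of tokens; the hierarchical sets that Merge builds ({read, {that, book}}) have no counterpart anywhere in its state.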
Now, there is all sorts of research trying to discredit a broadly Chomskyan approach to language. Frankly, I think this research is pretty groundless.7
But if we do resist the idea that language is a biological object of the human mind/brain, it is a lot easier to claim that LLMs ‘have language’.
Real question, though: why do this? Why resist a whole discipline of accumulated scientific knowledge on this topic, and continue stating falsities like ‘LLMs have language’? This will have to be a topic for another day, as will comparing language to what LLMs do. But for now, I want to leave you with some suggestions for where on Substack you can find linguists’ work.
The Strategic Linguist has recently done an amazingly accessible and insightful post detailing some of the wide variety of issues that arise when studying language in a scientific way. You can find that post here!
This post also offers a list of other Substackers who do linguistic work here on Substack. Because if you take anything from this post, I’d love it to be this: there is so much more work to be done raising awareness around understanding ourselves and the human brain, and actually appreciating how amazing the human brain is. Understanding language is a big part of that.
My broad thesis, discussed here, is basically that in the age of AI, this work is becoming more important than ever.
Language is one of the most well understood biological organs of the human brain and the field is INCREDIBLY underrated and neglected. I’m not a linguist, just a linguist enthusiast, so I urge you to look into The Strategic Linguist’s work and those she recommends. Let’s start to better understand ourselves and our brains!
Suggested Readings (for those interested in learning more!)
Accessible Intro to Language
Chomsky, Noam. What Kinds of Creatures Are We? New York: Columbia University Press, 2015. See Chapter 1, ‘What Is Language?’.
Moro, Andrea, and Noam Chomsky. The Secrets of Words. Cambridge, Massachusetts: MIT Press, 2022.
Neurolinguistic Work on Merge
Zaccarella, Emiliano, Lars Meyer, Michiru Makuuchi, and Angela D. Friederici. ‘‘Building by Syntax: The Neural Basis of Minimal Linguistic Structures’’. Cerebral Cortex 27, no. 1 (January 2017): 411-421.
Zaccarella, E., & Friederici, A. D. (2017). The neurobiological nature of syntactic hierarchies. Neuroscience & Biobehavioral Reviews, 81(Pt B), 205–212. https://doi.org/10.1016/j.neubiorev.2016.07.038
Zaccarella, E., Schell, M., & Friederici, A. D. (2017). Reviewing the functional basis of the syntactic Merge mechanism for language: A coordinate-based activation likelihood estimation meta-analysis. Neuroscience & Biobehavioral Reviews, 80, 646–656. https://doi.org/10.1016/j.neubiorev.2017.06.011
Language Acquisition in Humans
Babineau, M., Barbir, M., de Carvalho, A., et al. (2024). Syntactic bootstrapping as a mechanism for language learning. Nature Reviews Psychology, 3, 463–474.
Yang, C., Crain, S., Berwick, R. C., Chomsky, N., & Bolhuis, J. (2017). The growth of language: Universal Grammar, experience, and principles of computation. Neuroscience & Biobehavioral Reviews, 81, 103-119.
Work with Contrasts between Human Language and Machine Learning (though in a sense, this contrast can be gleaned from most of the stuff cited here)
Yang, C. (2016). The Price of Linguistic Productivity: How Children Learn to Break the Rules of Language. MIT Press.
Yang, C. (2018). A formalist perspective on language acquisition. Linguistic Approaches to Bilingualism, 8(6), 665–706.
Moro, A. (2014). The Boundaries of Babel: The Brain and the Enigma of Impossible Languages.
Cuskley, C., Woods, R., & Flaherty, M. (2024). The limitations of large language models for understanding human language and cognition. Open Mind, 8, 1058–1083. https://doi.org/10.1162/opmi_a_00160
Language and Evolution
Berwick, R. C., & Chomsky, N. (2016). Why Only Us: Language and Evolution.
Bolhuis, J. J., Tattersall, I., Chomsky, N., & Berwick, R. C. (2014). How could language have evolved? Trends in Cognitive Sciences, 18(9), 441–449.
The Semantics of Words (technical, but just to give an indication of how rich in work the field of syntax alone is)
Borer, H. (2005). Structuring Sense: Volume I: In Name Only. Oxford, UK: Oxford University Press.
Marantz, A. (1997). No escape from syntax: Don’t try morphological analysis in the privacy of your own lexicon. In A. Dimitriadis, L. Siegel, C. Surek-Clark, & A. Williams (Eds.),
Thank you for reading AI Without Minds! If this essay was useful or interesting, you’re very welcome to subscribe for future posts. You can also support my work by buying a paid subscription. Currently, I have no added benefits for a paid subscription and will keep my weekly posts free, as I value the accessibility of my work. Buying a paid subscription allows me to keep producing informed, rigorous, and accessible writing for all. I post weekly on Wednesdays on topics in AI, the philosophy of mind, and their intersection. See this post for a starter guide to my work, if you are new here or subscriber-curious.
Photo Credit
Photo by Shawn Day on Unsplash
See, for example, Zaccarella, Emiliano, Lars Meyer, Michiru Makuuchi, and Angela D. Friederici. ‘‘Building by Syntax: The Neural Basis of Minimal Linguistic Structures’’. Cerebral Cortex 27, no. 1 (January 2017): 411-421.
See, for example, Blanco-Elorrieta, E., Kastner, I., Emmorey, K., & Pylkkänen, L. (2018). Shared neural correlates for building phrases in signed and spoken language. Scientific Reports, 8, 5492. https://doi.org/10.1038/s41598-018-23915-0; Lillo-Martin, D., & Gajewski, J. (2014). One grammar or two? Sign languages and the nature of human language. Wiley Interdisciplinary Reviews: Cognitive Science, 5(4), 387–401. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4084854/
Here, a ‘syntactic object’ does not mean the same thing as it does in computer programming (a data structure). My understanding is that currently in the research, it’s considered a fundamental unit of some kind, but it’s an open question what it really is.
Chomsky explains these ideas in very simple terms in talks he gives to general audiences, which I draw on here to explain his work. The examples are his. See Noam Chomsky, On Generation of Thought, posted February 17th 2021 by Association of Indian Research Scholars, YouTube, 1:42:25 total; the relevant passage runs from 47:21 to 48:18.
See, Andrea Moro, Impossible Languages (Cambridge Massachusetts: The MIT Press, 2016), 28.
Trafton, A. (2020, December 15). To the brain, reading computer code is not the same as reading language. MIT News, Massachusetts Institute of Technology. https://news.mit.edu/2020/brain-reading-computer-code-1215
My sense is that, regardless of whether you agree with the details of Chomsky’s view on language, denying the existence of theoretical linguistics/universal grammar/generative grammar is like denying the existence of neuroscience. It’s not a serious claim.

