
What is language really for?

Rethinking the purpose of language

Language is often considered one of the distinctive cognitive abilities of complex organisms, particularly humans, who seem to exhibit unique forms of communication through it. With growing interest in both machine learning and neuroscience, language has become an especially fascinating subject, as large language models (LLMs) appear to demonstrate certain intellectual abilities. As someone who explores both fields, I find myself drawn to a fundamental question: what is language, and more specifically, what is its purpose? My recent engagement with Wittgenstein’s philosophy has offered perspectives that challenge conventional views of language.

That said, before addressing this topic in greater detail, I want to make it clear that language cannot be reduced to a single function. Language, like other cognitive abilities, did not emerge with a specific objective in mind; it simply evolved. This essay, then, does not aim to pinpoint a single purpose of language, but rather to explore how we can understand recent discoveries about language and intelligence through the lens of Wittgenstein’s philosophy.

Abstraction-first vs. Communication-first theory

The primary purpose of language is a topic of ongoing debate among scholars. Some argue that its core function is abstraction (abstraction-first), while others claim that its essence lies in its social role—communication (communication-first). To me, these perspectives seem to reflect Wittgenstein’s early and later philosophies on language, respectively.

The abstraction-first view finds support in theories such as Noam Chomsky’s Universal Grammar and Jerry Fodor’s Language of Thought hypothesis. These theories focus on the innate structure of language, emphasizing its role in organizing and representing the complexities of the world. In Wittgenstein’s early work, particularly his picture theory, language was viewed as having a one-to-one correspondence with the facts of reality. This resonates with the abstraction-first approach, where language is not just a tool for communication, but a system for representing the logical structure of the world.

On the other hand, Robin Dunbar’s social brain hypothesis supports the communication-first theory. This view sees language primarily as a tool for social coordination and group survival, essential for human interaction and collaboration. This aligns closely with Wittgenstein’s later philosophy, where he shifted his focus from representation to the idea that language is fundamentally social—a practice embedded in context and usage. Language here is not a mirror of reality but an activity, shaped by its role in human life and the interactions it enables.

Given that the primary purpose of language remains a subject of ongoing debate, it’s no surprise that Wittgenstein’s perspectives diverged, with his early and later works emphasizing different aspects of language’s nature and function.

The emergence and convergence of language representations

In the machine learning community, the term representation is often used to describe what some might call the “logical structure of the world.” Interestingly, many recent studies point to representation convergence across different models and even across different modalities. A well-known paper by Bansal et al. highlights the plausibility of the “Anna Karenina” scenario, in which all successful models seem to converge on the same internal representations. Dravid et al. introduced the concept of “Rosetta neurons”: units that encode shared concepts across various models and training regimes. Their presence in vision models with diverse architectures suggests that certain visual concepts and structures are inherently present in the natural world.

Figure: Rosetta neurons

Expanding on this idea, Huh et al. demonstrated that representation similarities not only span across different model architectures, but also across different data modalities. With evidence that the representations learned by language models align with those in vision models, the hypothesis of representation convergence is gaining traction.
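To make "representation similarity" concrete, here is a minimal sketch of one way to score it: check how often the same item has the same nearest neighbors in two different representation spaces. Everything in it is an assumption for illustration. The "models" are synthetic point clouds built from one shared latent structure, the function names are hypothetical, and the score is a simple neighbor-overlap measure, not the exact procedure of any cited paper.

```python
import math
import random


def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append({j for _, j in ranked[:k]})
    return result


def mutual_knn_alignment(reps_a, reps_b, k=5):
    """Mean fraction of k nearest neighbors the same item shares in both spaces."""
    nn_a, nn_b = knn_sets(reps_a, k), knn_sets(reps_b, k)
    return sum(len(a & b) / k for a, b in zip(nn_a, nn_b)) / len(nn_a)


random.seed(0)
n_items = 60

# Hypothetical shared "world" structure behind every item.
world = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_items)]

# Toy "model A": a rotated, rescaled 2-D view of the world.
t = 0.7
model_a = [
    (2 * (math.cos(t) * x - math.sin(t) * y), 2 * (math.sin(t) * x + math.cos(t) * y))
    for x, y in world
]

# Toy "model B": a noisy 3-D embedding of the same world (a different "modality").
model_b = [
    (x + random.gauss(0, 0.05), y + random.gauss(0, 0.05), random.gauss(0, 0.05))
    for x, y in world
]

# Control: a representation that is unrelated to the world.
unrelated = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_items)]

aligned_score = mutual_knn_alignment(model_a, model_b)
baseline_score = mutual_knn_alignment(model_a, unrelated)
print(f"A vs B: {aligned_score:.2f}, A vs unrelated: {baseline_score:.2f}")
```

The two toy models have different coordinates and even different dimensionality, yet they score far above the unrelated control because they preserve the same neighborhood structure. That is the sense of "convergence" at stake here: agreement in structure, not in raw numbers.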

Before the machine learning community’s attention shifted toward understanding LLMs, a topic that had drawn interest from many researchers was how language emerges. Simulations of multi-agent learning environments have provided intriguing insights here. Several studies, including those by Mordatch et al. and Vithanage et al., demonstrated that compositional language can emerge when linguistic and physical behaviors are treated together, particularly when the vocabulary size is explicitly constrained to be small. This suggests not only that language (or something similar) can emerge even when it is not explicitly targeted, but also that some conditions may be necessary (though not sufficient) for language to arise.

As expressive language becomes a powerful tool for communication, what we abstract inevitably reflects what humans can mostly agree upon and care about in the world. In this way, communication shapes our language representations to align with the objective properties of reality. Each individual has unique experiences and distinct roles, but for language to carry meaning, those experiences must be abstracted into units that other beings can understand. This process naturally shifts the focus toward the more objective aspects of world experience, embedding generalizable concepts.

We build new systems by drawing on the compositional nature of language. Humans have distilled the common rules of the world we inhabit and agreed upon certain fundamental truths—our “always true” propositions, which form the axioms of our thought systems. These axioms laid the foundation for mathematics and logic, which in turn became the building blocks for other fields, including science, engineering, medicine, social sciences, and even music.

In short, the feature that language strives to capture is the logical structure inherently embedded in the ‘veridical world’, while its communicative role determines which aspects of that structure are most emphasized.

When words fall short: the struggle to share our inner worlds

Subjective sensations, on the other hand, may emerge due to the biological constraints shared by human beings. We, as humans, possess the same sensory organs and have similar body shapes and scales. As a result, our sensory experiences are tightly constrained, allowing for less variance. In my view, Wittgenstein’s concept of pain behavior corresponds to this idea. This commonality in our sensory experience—rooted in biological similarities—creates a shared framework that allows us to understand and communicate subjective experiences. This, in turn, is reflected and reinforced through our shared language, which acts as a systematic homomorphism for these sensations.

However, since each individual has their own unique experiences, their conceptual frameworks are also unique. To clarify, I imagine the conceptual framework as a continuous n-dimensional representation space that maps various entities and aspects of the world (inspired by Paul Churchland’s work). While some parts of these frameworks exhibit systematic homomorphism, language functions as an alignment tool, helping to converge individual conceptual spaces toward a common ground. In other words, the discrete points we verbalize through language serve as reference markers, enabling different individuals’ conceptual frameworks to align as closely as possible. By sharing parts of these maps, we can communicate and ‘point’ to the same entities. Without these reference points, meaningful communication would be impossible, as Churchland puts it:

Fingertips without maps are empty; maps without fingertips are blind.
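The idea of words as reference markers that align two private maps can be sketched in a few lines. This is a toy under strong assumptions: each conceptual space is a 2-D plane, speaker B's space is exactly speaker A's space rotated and shifted, and all names (`fit_alignment`, `as_seen_by_b`, the coordinates) are hypothetical. The fit is an ordinary least-squares rotation plus translation estimated only from the shared words.

```python
import math


def centroid(points):
    """Mean position of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def fit_alignment(anchors_a, anchors_b):
    """Least-squares rotation plus translation taking anchors_a onto anchors_b."""
    ca, cb = centroid(anchors_a), centroid(anchors_b)
    cross = dot = 0.0
    for (ax, ay), (bx, by) in zip(anchors_a, anchors_b):
        ax, ay, bx, by = ax - ca[0], ay - ca[1], bx - cb[0], by - cb[1]
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    angle = math.atan2(cross, dot)  # optimal rotation angle in 2-D

    def translate(point):
        x, y = point[0] - ca[0], point[1] - ca[1]
        return (
            math.cos(angle) * x - math.sin(angle) * y + cb[0],
            math.sin(angle) * x + math.cos(angle) * y + cb[1],
        )

    return translate


def as_seen_by_b(point, angle=math.radians(40), shift=(3.0, -1.0)):
    """Hypothetical ground truth: B's space is A's space rotated and shifted."""
    x, y = point
    return (
        math.cos(angle) * x - math.sin(angle) * y + shift[0],
        math.sin(angle) * x + math.cos(angle) * y + shift[1],
    )


# Concepts both speakers have words for (the "fingertips").
space_a = {"peach": (1.0, 2.0), "lemon": (-1.5, 0.5), "cocoa": (0.5, -2.0)}
space_b = {word: as_seen_by_b(p) for word, p in space_a.items()}

words = sorted(space_a)
a_to_b = fit_alignment([space_a[w] for w in words], [space_b[w] for w in words])

# A concept that A has no word for, located only by its position on A's map.
unnamed_in_a = (0.2, 0.9)
predicted_in_b = a_to_b(unnamed_in_a)
error = math.dist(predicted_in_b, as_seen_by_b(unnamed_in_a))
print(f"alignment error for the unnamed concept: {error:.2e}")
```

Three shared words are enough here to locate an unnamed concept on the other speaker's map, but only because the two maps were assumed to have the same shape. Without the shared words there is nothing to fit, and without similar maps the fit would not transfer.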

From this perspective, we cannot discuss things that cannot be mapped into discrete words, nor those that fall outside the shared conceptual framework. Yet, when we communicate, the “fingertips” of our understanding are expressed through language, which is inherently discrete. This limitation makes it impossible to fully capture the continuous nature of our conceptual spaces—a phenomenon I refer to as the curse of abstraction.
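The gap between a continuous conceptual space and a discrete vocabulary can be illustrated with a small numerical sketch. It assumes a one-dimensional space on the interval [0, 1] and a hypothetical vocabulary of evenly spaced reference points. Both are simplifications chosen for illustration, not a model of real language.

```python
def evenly_spaced_words(n):
    """A hypothetical vocabulary of n reference points spread over [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def verbalize(value, vocabulary):
    """Pick the word (reference point) closest to a continuous inner value."""
    return min(vocabulary, key=lambda word: abs(word - value))


def mean_abstraction_error(values, vocabulary):
    """Average gap between what was meant and what the chosen word points to."""
    return sum(abs(v - verbalize(v, vocabulary)) for v in values) / len(values)


# Continuous "inner" values: positions along one dimension of a conceptual space.
inner_values = [i / 999 for i in range(1000)]

errors = {
    n: mean_abstraction_error(inner_values, evenly_spaced_words(n))
    for n in (2, 8, 32)
}
for n, err in errors.items():
    print(f"{n:>2} words -> mean error {err:.4f}")
```

In this toy setting a richer vocabulary shrinks the average error, but any finite vocabulary leaves some of it behind.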

One intriguing example of the limitations and evolution of sensory language is the recent development of coffee’s sensory evaluation system. Until the late 2010s, as far as I know, sensory descriptions of coffee relied mostly on tasting notes. However, tasting notes have significant limitations as a means of communication. For instance, when we say a coffee has a “peach” flavor, we don’t mean it contains peach, but rather that it has a nuance reminiscent of peach. This “language game,” used among baristas and coffee enthusiasts, works well within the community but is hard for newcomers to grasp.

Recently, however, coffee sensory evaluation has evolved with the introduction of a color-coded system modeled after the color wheel: the coffee taster’s flavor wheel. I find this to be a brilliant innovation. While tasting notes based on familiar flavors are intuitive, they often invite literal interpretations that confuse people unfamiliar with the nuanced language of coffee. By borrowing from a different sensory modality, color, which is more continuous in nature, the new system allows for a finer mapping of coffee flavors. Although each color still corresponds to some tasting notes, these serve merely as reference points that help map flavors onto the color wheel. This shift allows for a more continuous representation of taste, moving beyond the limitations of discrete flavor descriptors.

Figure: grid points (points C and D fall outside the grid)

Wittgenstein used the analogy that nonsense propositions—such as those in ethics—are like points that fall outside the grid, such as points C and D. This corresponds to the “no fingertip” scenario in Churchland’s argument, where we have no way to point to or reference these propositions. However, I suspect this isn’t the only case. Some nonsense propositions may not just be outside the grid but could also be “outside” the commonly shared conceptual framework—where there is no shared map for others to follow.

Wittgenstein famously argued that we should remain silent about things we cannot speak of. At the same time, he distinguished between science, which he saw as the study of phenomena, and philosophy, which he regarded as the exploration of the possibilities of phenomena. Inspired by his view, I would suggest that even nonsense propositions deserve some consideration. Specifically, we should seek to understand whether such propositions fall between the discrete points of our conceptual framework or lie entirely outside of it. In the former case, there remains the potential to invent new linguistic compositions, refining our ability to describe these propositions with greater clarity and precision. However, in the latter case—where the proposition is entirely beyond the shared framework—perhaps silence is the only appropriate response, as Wittgenstein advised.

References

  1. 박병철 (2014). 비트겐슈타인 철학으로의 초대 [An Invitation to Wittgenstein’s Philosophy]. 필로소픽.
  2. Bansal, Y., Nakkiran, P., & Barak, B. (2021). Revisiting model stitching to compare neural representations. Advances in Neural Information Processing Systems, 34, 225-236.
  3. Dravid, A., Gandelsman, Y., Efros, A. A., & Shocher, A. (2023). Rosetta neurons: Mining the common units in a model zoo. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1934-1943).
  4. Huh, M., Cheung, B., Wang, T., & Isola, P. (2024). The platonic representation hypothesis. arXiv preprint arXiv:2405.07987.
  5. Mordatch, I., & Abbeel, P. (2018, April). Emergence of grounded compositional language in multi-agent populations. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32, No. 1).
  6. Vithanage, K., Wijesinghe, R., Xavier, A., Tissera, D., Jayasena, S., & Fernando, S. (2023). Accelerating language emergence by functional pressures. PLOS ONE, 18(12), e0295748.
  7. Churchland, P. M. (2012). Plato’s camera: How the physical brain captures a landscape of abstract universals. MIT Press.
This post is licensed under CC BY 4.0 by the author.