Perhaps coming up with a theory of information and its processing is a bit like building a transcontinental railway. You can start in the east, trying to understand how agents can process anything, and head west. Or you can start in the west, with trying to understand what information is, and then head east. One hopes that these tracks will meet.

– Jon Barwise

You have to start somewhere. I’m going to start in the west.

Any attempt to engineer more powerful human-information interfaces must first grapple with the nature of information itself: its structural and semantic characteristics and how it comes to mean what it does to a user. I don’t claim any unique insights here, but the topic bears discussion because there is, of course, a lot of fuzziness and confusion surrounding terms like “data” and “information” (take a look at the Wikipedia definitions of Information and Data if you have any doubt). Before proceeding further I want to establish a clear, workable definition of the term “information” to both guide my discussion and clarify my reasoning. Fortunately, we already have Top Men working on this. And by Top Men I mean, of course, philosophers.

The Philosophy of Information (PI) is an emerging philosophical discipline concerned with the conceptual nature of information and the principles that govern its composition and use. In contrast with Information Theory, which addresses the representation and communication of patterns in a generic sense, PI attempts to address the nature of the semantic content of information.

One useful notion that has emerged from PI is a generally accepted definition of “information” as data + meaning. This characterization is referred to as the General Definition of Information (GDI) and is formally expressed as follows (Floridi, 2010):


GDI) σ is an instance of information, understood as semantic content, if and only if:
GDI.1) σ consists of n data, for n ≥ 1;
GDI.2) the data are well formed;
GDI.3) the well-formed data are meaningful.

GDI, then, is a data-based definition of information, where a datum is defined (after MacKay (1969), later Bateson) as “a distinction that makes a difference.” In other words, an instance of data captures some lack of uniformity in the world: a temperature measurement, a date, maybe a proper name that serves to distinguish one person from another. From these examples it can be seen that a datum has no inherent value of its own. Rather, a datum is a relational concept, a characteristic captured in the slogan “data are relata.”

GDI.1 states that an instance of information, σ (sometimes referred to as an infon), is defined as a “bundle” of one or more data instances. GDI.2 says that these data bundles are organized according to the rules of a particular system: they are syntactically structured, in a broad (not strictly linguistic) sense. And GDI.3 states that the well-formed data also comply with the meanings of the chosen system. Without getting too caught up in the symbol grounding problem, GDI.3 says that the data have associated, preestablished semantics. When data are well-formed and meaningful, they are said to have semantic content.

In practical terms, data semantics are commonly established via the notion of data types, with data instances typically represented (either implicitly or explicitly) as type-value pairs. Note that data types and data values must both have preestablished meanings to qualify as semantic content. The notion of type also extends to the infon level, an idea captured in the concepts of information schemas and doctypes. GDI doesn’t apply only to lexical (text-based) information, however. Graphical information representations (for example maps and statistical graphics) also have associated structure and semantics, and qualify as information under this definition.
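To make the type-value idea concrete, here is a minimal sketch (my own illustration, not drawn from Floridi): a datum represented as an explicit type-value pair, and an infon as a bundle of data checked against a hypothetical schema. The schema names and the `is_information` helper are invented for this example; the schema stands in for both the well-formedness rules (GDI.2) and the preestablished type semantics (GDI.3).

```python
from dataclasses import dataclass
from datetime import date

# A datum as an explicit type-value pair. The type name carries the
# preestablished meaning; the value records the distinction itself.
@dataclass(frozen=True)
class Datum:
    dtype: str      # e.g. "temperature_c", "observation_date"
    value: object

# A hypothetical schema: maps each permitted dtype to the Python type
# its values must have. This plays the role of the "chosen system,"
# supplying GDI.2 (well-formedness rules) and GDI.3 (agreed meanings).
WEATHER_SCHEMA = {
    "station_name": str,
    "observation_date": date,
    "temperature_c": float,
}

def is_information(data: list, schema: dict) -> bool:
    """Check a candidate infon against the three GDI conditions."""
    if len(data) < 1:                                   # GDI.1: n >= 1 data
        return False
    for d in data:
        if d.dtype not in schema:                       # GDI.3: no preestablished meaning
            return False
        if not isinstance(d.value, schema[d.dtype]):    # GDI.2: value not well formed
            return False
    return True

# A well-formed, meaningful bundle: qualifies as information under GDI.
reading = [
    Datum("station_name", "KPDX"),
    Datum("observation_date", date(2024, 3, 1)),
    Datum("temperature_c", 7.5),
]
print(is_information(reading, WEATHER_SCHEMA))                    # True
# A datum with no agreed semantics in this system fails GDI.3.
print(is_information([Datum("mood", "gloomy")], WEATHER_SCHEMA))  # False
```

The point of the sketch is only that both the type names and the value constraints must be fixed in advance: strip out the schema and the same bundles become mere uninterpreted patterns.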

It is this predefined syntax and semantics that gives information its power and utility. Without structural regularity (well-formedness) we reduce or eliminate our ability to derive value by relating information instances to one another. Without well-defined semantics we introduce ambiguity, reducing our ability to develop shared understandings and to link what data tells us back to features of the real world. It is certainly possible to computationally represent data in formless and arbitrary ways, but doing so limits its ability to usefully capture the phenomena it purports to represent.

To summarize, the term “data” refers to the atomic components of information, the individual distinctions in (actual or simulated) reality we choose to pay attention to. “Information” consists of packages of multiple distinctions or dimensions that collectively capture enough about a phenomenon to be useful. Further, information is structured and defined for some purpose or purposes. Its value to a user is some function of how well the data capture key aspects of the phenomenon, how uniformly structured (i.e., “clean”) they are, how clearly defined their semantics are, and how well they match the user’s specific needs.

One last comment: information (per the GDI) comes in two main varieties, factual and instructional. Factual information describes or models a certain state of affairs. Instructional information describes the steps needed to bring about a state of affairs. Factual (but not instructional) information can be either true or false. For the most part, when I talk about information in the context of human-information interfaces I mean factual information that truthfully represents actual or possible states of affairs in reality. The ultimate goal is to develop new tools for understanding and acting on information about the real world.

[zotpress items=”3GMPDBZB,35GETMC6,38JFCA8H” style=”association-for-computing-machinery” sort=”ASC”]