It’s Always Been About Big Data

In 1668 the British Royal Society undertook its first official publication, An Essay Towards a Real Character, and a Philosophical Language, penned by the Society’s first secretary, John Wilkins. Wilkins’ ambitious work proposed to create a universal language, intelligible to all nations and peoples, as a means of facilitating and accelerating the production of knowledge. The bulk of the multi-volume work consists of an extended taxonomy of the world, in which Wilkins attempts to understand and categorize everything, literally, as a means of creating a universally navigable system of linguistic representation.

[Figure: Wilkins’ taxonomy]

Wilkins’ universal taxonomy may seem to have little connection to the problem of big data in the digital humanities, but Wilkins stood at the same representational event horizon as does the digital humanist trying to structure, in one form or another, a world of endless, semantically related data. There has, in fact, never been anything other than big data. The invention of “Data” as a form of binary stored information represents an ontological lie that pretends that some set of information can have meaning outside of its connection with the universe of signification.

Because in computing’s infancy we could only physically store a limited number of ones and zeros in either physical or working memory, we began to think of each cluster of ones and zeros, each file, as having a discrete existence of its own that was somehow separate and different from the rest of our discursive universe. We allowed the limitation of the machine to lull us into the belief that the “black box” held some magical power to create independent, contained realities.

The ubiquity of the network revealed the fallacy of this dream, both functionally and theoretically. The moment most of us started carrying the network around in our pockets, we became immediately dissatisfied with stand-alone applications, data-stores, and personalities. The desire to connect my map, my journal, and my address book not only to each other but also to your map, your journal, and your address book demanded we give up the illusion of a stand-alone discursive universe and recognize that these ones and zeros are simply one of many languages we use to write ourselves.

And so, we find ourselves yet again on the brink of Wilkins’ dilemma: that of defining an Ur language capable of semantically unifying the complete discursive universe. A common problem lies at the heart of all engagement with data. All data manipulations short of a dadaist artificial intelligence rest on the edge of an ontological razor. You cannot visualize, link, search, or browse any set of data without first somehow structuring said data according to some discriminating system that says, at a minimum, “This is like that, and that is like this!” And before we can say this, we must first agree on the very boundaries of the this.

In order to link, order, or display data based upon dates, for example, we must first have an idea of date. In order to link, order, or display data by place, we must first have an idea of place. In order to link, order, or display data based upon anything other than a totally random, meaningless presentation of data, we must first have ordered our universe such that we recognize the existence of meaningful categories of existence based upon which we can discriminate and, hence, understand the data.

We thus find ourselves standing at the same ontological brink as did John Wilkins in 1668. The volume of our digital discourse is such that we can no longer pretend that data exists or has meaning that is somehow separate from the entirety of our discursive realities. And we can no longer pretend that the basic problem we face, that of structuring this universe, is fundamentally different from that of our predecessors. Digital tools certainly increase the speed with which we can test new discursive formulations and map links between those composed in different languages. But, in the final estimation, the very discrimination that constitutes the boundary between individual data points is already pre-determined by our ontological history.

It is, of course, the dream of “big data” that the very presence of numerically staggering data pools will spontaneously offer the very solution to this problem, as we can rely on the data itself to reveal its own taxonomies. But we find ourselves here on the equally slippery grounds of apophenia and religiosity: of patterns that have no meaning or, alternatively, that have one and only one meaning. We find ourselves, as it were, once again on the doorstep of the Enlightenment.
