Valid or invalid? Of Internet “science,” standards and browsers.

Good programs do not contain spelling errors or have grammatical mistakes. I think this is probably a result of fractal attention to detail; in great programs things are correct at all levels, down to the periods at the ends of sentences in comments (Dyer 2009).

In January 2008, the Metadata Analysis and Mining Application, or MAMA, an online non-profit long involved with creating reportable and scalable data about the Internet, produced a study detailing the results of a large-scale automated test to find out how much of the World Wide Web (from here on, the Web) adhered to “correct” and “proper” coding of web pages. The author of the study, a Web developer himself, noted that although internet languages have logical and scientifically exact syntax, contrary to the rational ideas one expects from such structures, “authors do the darnedest things. Real-world Web pages are often convoluted, complex, and messy, not insular or sanitary” (Wilson 2008).

It was with bittersweet astonishment, then, that the study found that only 4% of the Web conformed to its own scientific standards of intelligibility – or “validated,” in the vernacular – even as all of it ran and daily supported billions of users. The website detailing the findings wrestled with these puzzling numbers, offering explanations I find worth quoting in their entirety (Wilson 2008):
• Many sites are built upon CMSes [Content Management Systems] that do not spit out standards-compliant markup on the front end—it is nigh-on impossible to get these sites to validate.
• Many sites are put up on the Web by hobbyists, who do not care about Web standards—they just want to get their “look at my kittens” site on the Web by any means necessary.
• Many sites these days feature user-generated content (think of any blog and social networking site); even if you make your blog validate, it can still easily be invalidated by a site visitor submitting a comment featuring bad markup.
• A lot of developers don’t care about validation—their site works for the target audience they are aiming it at, and they get paid regardless of standards compliance

Although this particular study is by now a bit outdated, and the languages it surveyed have since been superseded by newer ones, this finding has intrigued me ever since I encountered it, and I intend in this paper to examine it in the spirit of anthropological intervention: if we grant anthropology’s ability to problematize the endeavors of so-called foundational sciences like medicine and statistics, then we must point this spirit too toward an investigation of computer science, and especially the internet. What at first appears to be scientific – the development and acceptance of formalized languages to convey information and aesthetics – becomes a site of contention, where individual authors coalesce around local knowledges and certain practices become more effective than others.

In doing this, I take inspiration, though not necessarily the same conclusions, from studies of the development of standards in the laboratory, such as Joseph Dumit’s Picturing Personhood, Rayna Rapp’s Testing Women, Testing the Fetus, and especially Ludwik Fleck’s cogent early analyses in Genesis and Development of a Scientific Fact. Fleck outlines the evolution of the testing of – and thus the underlying meaning of – syphilis, from its religious origins to its cemented scientific basis elaborated by the Wassermann test. For Fleck, making something scientific involves much more than the foundational objectivity its practitioners claim. For him, rather, the formation of “thought collectives,” wherein “cognition modifies the knower so as to adapt him harmoniously … to his acquired knowledge” (Fleck 1979: 86), is integral to the development of scientific fact. “The [scientific] phenomenon is obviously social in character, and has important theoretical consequences,” writes Fleck, and it is only when the vademecum, or impersonal, knowledge is confirmed in everyday popular knowledge that “the fact becomes incarnated as an immediately perceptible object of reality” (Fleck 1979: 124).

Many years before Thomas Kuhn re-realized it in The Structure of Scientific Revolutions, Fleck understood that the construction of syphilis as a fact was actually a spontaneous confluence of many factors that combined to make it appear as if it were a fact. Had scientists “observed” syphilis differently, had the journals evoked it differently, and had access to its testability been distributed differently, syphilis would have been understood as a totally different and perhaps nonscientific phenomenon. Though observations of this type, I suggest, undergird the rich anthropology of science that inspires this paper, they are sorely lacking in official computer science literature.

Interestingly, but not directly related to my project, formal computer science literature does anticipate a rather different branch of the anthropology of science, invoked in Haraway’s Cyborg Manifesto and other less canonical works like Tom Boellstorff’s ethnography of Second Life or Stefan Helmreich’s work on hypertext kinship. Many computer science texts, like these anthropologies, suggest reconceptualizations of not only computer-human interaction, but more fully the binary of the human and non-human. Alan Turing’s original “thinking machine” is posited alongside artificial intelligence programs in a curiously reversed evolutionary teleology of modernity and progress; where the Cartesian division constructed these divisions as steps toward rational society, the computer worlds that are related to it attempt to rebridge the divide, arguing through metaphors of atoms and neurons that humans differ from computers only on orders of magnitude.

The actual topic of this paper, however, is a more prosaic one: What makes a scientific form – in this case, the internet and its languages – appear real and yet irreal? And how does the web’s messiness and inherently collaborative epistemology allow a working, changing problematization of the coalescence of scientific standards? How, in essence, could the web produce an “indigenous” critique of the creation of scientific language?

In this paper I use the word language somewhat delicately, acknowledging that my usage is grounded in a somewhat archaic definition of language as a signifier/signified system. A Peircean reframing of language introduces an interpretant, or third component, and may indeed be a better way to think about computer languages. Through a Peircean lens one notes the location of language as a product of the social structures that configure it and produce the limits of its representability, allowing us to speak fruitfully about computer language without reifying it as a language with the same rules and forms. Nevertheless, I stick in this paper to a more traditional definition of language for two key reasons:
1) Ethnographically, the word is quite often used to communicate between informants and participants, and as such has value in the description of social systems without trying to interpret or translate them. A person will speak about markup languages and the English language, as well as “body language,” using the same referents. I felt a critique of this particular usage was outside the scope of this paper and more appropriate for a linguistic, rather than cursory, examination of the world of Web languages.
2) Using the word language allows us to think about its function as a vibrant form of communication that, importantly, is forgiving. My paper asks why the development of a language was so tolerant of mistakes, ruptures and failures, and why a body tasked with developing standards assumed everyone would be uninterested in using them, or would misuse them. My ethnography shows the variety of opinions about this, and points to questions of expertise, local usage, pidgin, and diglossia. Insofar as linguistic ideas like these are utilizable in the analysis of markup languages, this paper follows those trajectories, acknowledging the insufficiency of this traditional usage of language.

The more I began to investigate this project, the more I realized how large a scope it entailed, and felt lost in the barrage. Thus, below I locate the MAMA Web compliance findings as the latest in a long line of self-reflexivity about standards on the Web, place it in the context of anthropologies of science, and offer some brief comments on the phenomena of programming for the Web, as opposed to other applications. I introduce a necessarily cursory overview of the internet to situate the reader and choose an entry point into questions of standards and knowledge through a meditation on the expertise of the W3C, a standards and compliance body, and the people on the Web who criticize and work with the intractable problems of representation and language. I conclude with some research offerings for the future, and reflections on genealogies of Web use.

Priming the System
At their base, computers work through a series of logic gates corresponding to off and on. A problem is passed to the code interpreter; if it resolves one way (true, for example), the computer acts one way. If it resolves the opposite way, naturally something else occurs. At this level, computers communicate in base 2 – a binary code consisting of 0s and 1s that correspond to the “on” and “off” states. If a more complicated operation is needed, a sophisticated layering of this basic on/off state is applied to the computer, articulating steps that can number in the trillions. It is obviously outside the scope of this paper to explain these machinations, but everything relating to computer use, from complicated space launches to internet banking, is based, through many levels of abstraction, on these very basic on/off switches – “rational” and essentially irreducible scientific facts about representation.

The internet, parasitic on these abstracted layers of logic gates, is a changing network that offers data stored on a series of connected computers through various protocols. These protocols format information differently depending on the needs of the service. For instance, a weather station may need information delivered about time, date, and radar coordinates, whereas a news service demands a completely different set of variables. To offer these over one protocol would not only be time-consuming, but could often lead to interpretative problems. Hence, various protocols were invented to serve these different needs. Although these protocols deliver information differently, they are, at a base level, still operating in binary – the only ultimate deconstruction computers accept.

For the purposes of our study, the only protocol that is important is HTTP, or hypertext transfer protocol. Invented specifically to deliver interconnected web pages and their “links” (hence the prominence of hypertext), it sends data encoded in a number of languages that are in turn decoded by the browser and displayed on a computer screen. These languages are HTML (hypertext markup language) and CSS (cascading style sheets). Together, these two languages give both the semantic content necessary to view a web page and its accompanying style. For instance, the HTML, or the semantic markup, may indicate that a page contains headers and footers, whereas the CSS will tell the browser to display the header as a gray bar with white text, and the footer as an orange bar with loud, blinking blue text. Hence structure, display, and meaning are theoretically delinked.
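This delinking can be sketched in a minimal, hypothetical page; the element names, colors, and text here are my own illustration, not drawn from any real site:

```html
<!-- Hypothetical sketch: the markup below names the parts of the page,
     while the style rules, kept separate, say how each part should look. -->
<html>
  <head>
    <style>
      /* Presentation lives here, apart from the structure below */
      #header { background: gray; color: white; }
      #footer { background: orange; color: blue; font-weight: bold; }
    </style>
  </head>
  <body>
    <div id="header">Page title</div>
    <p>The content of the page.</p>
    <div id="footer">Footer text</div>
  </body>
</html>
```

In principle, the style block could be swapped out wholesale – gray bars for green, loud footers for quiet ones – without touching the structure or meaning of the document at all.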

The above is admittedly an absurd simplification; I have provided it only to ground the reader in the abstract and cascading levels of interpretation that occur simultaneously to produce what is colloquially understood as the web-viewing experience. If we are to look at the disparity between scientific reality and the practical realities of internet language, then, we must begin with an analysis of the point of contact with the user. This in turn is only useful, however, if we can freeze the experience of all these levels and networks in one place. How would an anthropologist approach this?

Here, Bruno Latour’s and Marilyn Strathern’s configurations of hybrids become very useful: abstract disjunctures that are, following Strathern’s work, places well situated for prying open networks. In a prescient piece detailing her struggle to find ways of speaking about the seemingly infinite networks of which we are all a part, she writes that to “cut the network” in order to submit it to ethnographic intervention, one must follow the stops as well as the circuits of the network. That is, her ethnography of the A’rare Melanesian concepts of ancestor money does the work of “addressing the problem of potentially endless networks” (Strathern 1996: 529) that would trace out forever. Strathern is grappling with a question not distinct to the anthropology of technology, but at least intrinsic to it: If all social lives are part human and part otherwise, how does one point the ethnographic flashlight on science, which fashions itself as all-encompassing of the human experience? To put it another way, I suggest, detailing where flows of knowledge-production ultimately fail, rather than succeed, and are stamped firmly by human action, is where the anthropologist makes her intervention.

These are the methods Bruno Latour follows in his many investigations of science, from his cultural history of the pasteurization of France to his larger analyses of Western modernity in We Have Never Been Modern. Indeed, these are methods that many anthropologists of science have used in order to turn the gaze back on reified sciences of production, including Theodore Porter on accounting practices and Steven Shapin on the laboratory. I underscore: where things are actually practiced is where science convinces itself of, following Fleck, its objective reality. But in order to become objective reality, its practitioners must also believe in it and must continue to contribute to, and to take from, its vademecum knowledge.

The organizational body that invented the CSS and HTML languages (and continues to improve them) is the W3C. Composed of many interlocking working groups, this peri-official, but governmentally recognized, body of experts promotes standards for the languages of the Web, each iteration following many angles of deliberation and democratic voting. Practically, the W3C is not an “enforcing” body; it simply proposes the standards and urges Web developers, and Web browser programmers, to follow them. The W3C has on its own website an area for “validation,” or checking that the language in which a site is programmed conforms to the W3C’s own standards. Web site developers submit their site for an instantaneous outline of its conformity to standards.

The standards take the form of many vernaculars and genres. For instance, a tag to display images on the Web might take a form like the following (the file name is illustrative):

<img src="kittens.jpg">
Although all browsers, from the original Mosaic to the latest Google Chrome, Internet Explorer 8, and other emergent browsers, would display the image, this tag – or language statement – fails compliance in at least two ways. First, it displays an image but omits the mandatory “alt-text,” a semantic marker that tells browsers with images disabled (or users of disability technologies) what that picture means. Second, the image tag does not have a closing character to indicate technologically that the command to display the image is now over. Hence, the proper image tag would be:

<img src="kittens.jpg" alt="A photograph of kittens" />
In defining these kinds of standards, then, the developers of the markup language for the Web are arguing for the combination of very different, yet practically related, facets. First, a language has a proper and an improper usage in relation to the computer and software using it (the closing character is vital). Second, it has a very important function in meaning as well: one must communicate to a blind person what an image should convey to a sighted user. Meaning and practice are combined with structure in what defines a language on the Web.

In doing so, the developers of the language highlight an interesting conundrum in defining a particular type of representation as an objective scientific language: they realize that language is a situated knowledge, open to interpretation based on the receiver (the Web browser, such as Firefox or Internet Explorer) and the sender (the Web developer, or the standards developer). This is a difficult concept to wrestle with, but I believe it is because of this that Web languages present such a poignant critique: that a scientific standard is anything but. As I mentioned above, in every browser one could think of, the first example of an image tag above would display perfectly correctly. Future browsers, necessarily encumbered by websites written for older browsers, would in turn support these tags. Here, language is encrusted with old colloquialisms, mannerisms and personalities.

Thus, I could write an entire website relying on bad tags and improper syntax, completely stripped of its semantic context (meaning somebody without the latest technologies would see a completely different site), and have it still adopted and viewed by millions of people. This site would fail every kind of validation, would not be considered a scientifically “correct” site, and yet it would be just as usable and perhaps even more popular than one coded according to standards.
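A fragment of such a site might look like the following – a deliberately invalid, hypothetical sketch of my own (the tags, attributes, and file name are illustrative):

```html
<!-- Deliberately invalid markup: unclosed paragraphs, a made-up
     attribute, no alt-text, deprecated styling tags. Every major
     browser will nonetheless render something very close to the
     page its author intended. -->
<HTML>
<BODY>
<P>Welcome to my kittens!
<IMG SRC="kitten1.jpg" kittenrank="best">
<P><FONT COLOR="red">More photos coming soon
</BODY>
```

Every line here would fail validation, and yet a visitor using any major browser would see the page more or less as intended; a visitor using a screen reader, however, would learn nothing about the image at all.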

How do we account for these radical ruptures? One must be grateful for the anthropology of science, because earlier anthropological accounts of such aporia would be wholly incomplete. Imagine explaining these differences in terms of Geertzian “cultural texts,” or even earlier models of bounded networks where structure is endemic and passed along to others without caring for meaning. The anthropology of science can point us to this fundamental contradiction that is actually an example of a noncontradiction: there is a scientific way of writing this language and nobody follows it. And yet everyone is aware of it. Lévi-Strauss’s sorcerers faintly scent the air…

In order to ask why standards are never followed, I decided to perform some ethnography on online boards that dealt with standards. One message board, Metafilter, chronicled web developers’ reactions to the MAMA news of validation fail. Below I quote some responses at length from the board (Anonymous 2008); analysis follows.

Either the whole point of having a “standard” is so that there is a baseline of functionality across the browsers or the whole point is to follow the rules Because They Are The Rules. Most of the people who build websites professionally can’t pitch the second option to clients. There has to be a benefit to choosing the option that requires more work and has more restrictions on the final product … they care that their customers can all see and use the site; they care about Google ranking; they care about functionality. If validation takes more time and effort and restricts what can be done design-or-function-wise but offers no benefit to the client’s bottom line, then the client won’t pay for it.

My web 2.1 … site is so terribly invalidated, I’m surprised that it works. I use lots of custom attributes to make the lightbox take in all sorts of metadata. Plus, I’m lazy. If it works in FF3 and IE7, why even bother trying to get validated? I have to do some amazing hacks for transparent PNG’s in IE, and I’m completely lost trying to get z-index to work in IE to make an pop up menu without more slow javascript. Front end web coding is a losing battle, I don’t know how people do that as full time work. It PAINS me whenever I am forced to do it.

It’s very tough to make a site that works on Internet Explorer, Firefox, and Safari. Validation is like ensuring your site works on a 4th browser that no one uses because it doesn’t exist.

I’m surprised that the percentage of valid websites is as high as 5% – as mentioned, it could be even higher, were it not for generated content breaking a number of pages…. But really, where’s the shock? Most people are barely competent, in almost every field [sic]. This is especially true in web development, where everyone considers themselves a creator, or at least a contributor. It’s made somewhat better, at least in theory, by content management tools and design, if most of those tools weren’t half-assed to begin with… everyone believes they can make a web page. Don’t get me wrong: I’m all for open access. But standards have a purpose. They might be arcane, they might be fussy, they might be a pain to implement, but they serve a very logical and ultimately very useful purpose.

One common theme that emerges from these (and other) responses is a rejection of the expertise of the W3C: its particular access and ability to speak about what constitutes a good coding language is problematized by users. In general, ideas of expertise are completely deformalized in discussions of standards. Actors who are competent in programming, able to translate W3C standards, and able to communicate these practices to paying clients are of great concern to these informants. More generally, operators see the coding practices as rigid, constrictive, and unforgiving of the real-world incompatibilities that emerge in the movement from code to visual aesthetic.

Compellingly, though, the languages themselves are forgiving! As one of my impromptu informants mentioned above, developers are often amazed that their site “just works.” One wonders about the self-conscious development of a language so forgiving that it was designed to tolerate failure. One must here reconsider the position of the W3C group and their initial design of Web languages. What kinds of discussions prompted them to allow breakage as valid and part of the system – and simultaneously a rupture with the system?

Also interestingly, political economic considerations of browser dominance run through conversations on code. Programmers seem to have an almost uniform hatred of Microsoft’s Internet Explorer, a browser included with every version of Windows that was once subject to a monopoly lawsuit (many European countries now prohibit its bundling with the default installation of the newest Windows, Windows 7). Developers complain that the browser does not render their code correctly, necessitating hours more work and fixes (“hacks”) to make it behave as they intended. One commenter, describing the notion of browser-specific coding languages – separate codes for different platforms and browsers – as one “of the biggest favors Microsoft ever gave web developers,” remarked that Microsoft makes his “head explode.” Indeed, this intrepid anthropologist could not find one instance, in many online ethnographic searches, of a favorable comment about Microsoft.

Recently, Microsoft itself went so far as to suggest that Internet Explorer 6, the most hated of all online web browsers, should be retired, sparking a round of applause from boards as diverse as Lifehacker, Reddit, StackOverflow and CNET. I will quote a few of the responses here:

Notice the problems you speak of mostly center around Internet Explorer and THAT is the problem, not anything to do with HTML5 or any other browser. This is the reason IE, in all its forms, must die (drhowarddrfine 2009, from Lifehacker)

That only highlights half of the problems developing for IE6. The other half is that it has never even supported basic web standards and is riddled with bugs. In my own business, this half of the problem alone, doubles the cost and time invested in developing even the most basic web sites. This is a cost that has to be passed on to the customers…In the end, everybody loses. (Bluecommons 2009, from CNET)

Questions of how to circumvent Microsoft’s domination, then, and how to code for many browsers at once, permeate all discussion. Everyone is seeking a “solution” to the problems they encounter, and hierarchies of solutions appear, with the figures best able to carry solutions through successfully gaining in status. As new iterations of CSS/HTML standards and Web browsers emerge, then, these figures are well positioned to offer new critiques and opinions about the standards.

Here, too, we may turn to Strathern’s idea of the network stop to investigate how something seemingly innocuous like a computer language, officially scientific, becomes a matter of many networks, including Microsoft’s desire for a monopoly over Web users. For, as these complaining developers grudgingly note, one is often forced to actually write bad code just to make a site function. Thus, good code becomes a function of many other interests and actors, and not just of objective science. Strathern notes that “interests, social or personal, that invite extension also truncate it, and hybrids that appear able to mix anything can serve as boundaries to claims” (Strathern 1996: 531).

When a review of web standards declares them to be the “Three Circles of Hell” (Holzschlag 2008), too, one is compelled to investigate how it is that a standard can be so evasive and yet so psychologically and discursively dominating. Again, I highlight this area as fertile ground for an anthropological intervention, calling on diverse but related anthropology of science works on nuclear testing such as Joe Masco’s Nuclear Borderlands, Stefan Helmreich’s Alien Ocean, and related ideas to understand that scientific discourse has traction outside of its domain, and to follow on Dumit’s questions of a scientist’s responsibility outside the realm of the laboratory.

The level of debate and doubt over these standards, however, fascinates me to no end. In “Three Circles of Hell,” designer Molly Holzschlag reviews many of the critiques I have touched on above. At length, she repeats the “problem of human actors” who “create ‘open standards’ by ideal, not necessarily fact,” lamenting that while the Web standards enacted by the W3C “[are] sophisticated and interesting sometimes, it goes against the heart of what we came here to build in the first place: an accessible, interoperable web for all.” The comments on the article, all available on the same page, reiterated the observations in the Metafilter post I quoted above. Many lamented the advent of proprietary applications “shoved down our throats” by companies like Microsoft and Adobe, whose popular Flash application creates animations and easy user interfaces without any thought or regard for other Web standards – often making other pages on the site fail review.

Overall, reading this material prompts new understandings of the combinations of meaning and form as they intersect with the foundational ideas of science. I have tried in this very brief paper to theorize a new approach to understanding meaning and language as they relate to scientific discourse, utilizing Marilyn Strathern’s science studies approach to the network stop.

Science and Technology Studies has long engaged the discourses of foundational sciences in order to uncover this discursive bias and power structures that inhere within. Lacking, however, has been insight into new systems where science and meaning, language and power manifest themselves reflexively. Computer science and the even newer Web programming systems that separate content, semantic meaning and aesthetic/representation have, I suggest, erupted at a post-modern moment where the critique of science has been internalized into the languages themselves.

In this new scientific world turned inside-out, bodies of knowledge production like the W3C may incorporate questions of meaning and political economy into their design. Multiple ways to represent, to ask questions, and to communicate with the user rely on very specific components and on corporate and other interventions. One wonders if a feature of all systems of technology developed today is their impermeability against critiques of science – or, put differently, a tacit acknowledgment of their social construction and structures, and an invitation for improvement. Thus we ask, in rejoinder: What is it about the Web’s science that allows it to be so forgiving even though it is so overwhelmingly popular, figuring in every endeavor from bringing the so-called “$100 laptop” “to Africa” to jamming a web browser onto every mobile phone? Why is it that aesthetics, visual representation and the coding of artistry become a space where different opinions are fought out, where market leaders may impose their own way, and where small open-source fighters can retaliate by presenting “solutions” to problems?
Works Cited

Anonymous
2008 “Standards Fail,” in MetaFilter. Accessed Dec 1, 2009.

Bernal, Victoria
2005 Eritrea On-Line: Diaspora, Cyberspace, and the Public Sphere. American Ethnologist 32(4):660-675.

Bluecommons
2009 Comment on “Microsoft actively urges IE6 users to upgrade,” on CNET. Accessed Dec 1, 2009.

Boellstorff, Tom
2008 Coming of Age in Second Life. Princeton University Press.

Boyd, Danah
2008 “None of This is Real: Identity and Participation in Friendster,” in Structures of Participation in Digital Culture.

drhowarddrfine
2009 Comment on “How HTML5 Will Change the Way You Use the Web,” on Lifehacker. Accessed Dec 3, 2009.

Dumit, Joseph
2004 Picturing Personhood. Princeton University Press.

Dyer, Landon
2009 “30 Years of C,” in DadHacker. Accessed Nov 28, 2009.

Fleck, Ludwik, Thaddeus J. Trenn, and Robert K. Merton
1979 Genesis and Development of a Scientific Fact. University of Chicago Press.

Helmreich, Stefan
2001 Kinship in Hypertext: Transubstantiating Fatherhood and Information Flow in Artificial Life. In Relative Values: Reconfiguring Kinship Studies. Sarah Franklin and Susan McKinnon, eds. Duke University Press.

Holzschlag, Molly
2008 “Standards Fail,” on A List Apart. Accessed Dec 1, 2009.

Rapp, Rayna
1999 Testing Women, Testing the Fetus. Routledge.

Reed, Adam
2008 ‘Blog this’: Surfing the Metropolis and the Method of London. Journal of the Royal Anthropological Institute 14(2):391-406.

Strathern, Marilyn
1996 Cutting the Network. The Journal of the Royal Anthropological Institute 2(3):517-535.

Wilson, Brian
2008 “MAMA: Markup validation report.” Accessed Dec 3, 2009.
2008 “MAMA.” Accessed Dec 3, 2009.