Wednesday, April 08, 2009

The Age of Digital Citation

Peer-to-peer technologies are working to unlock one of the most secretly-guarded rituals of academic citizenship, one that in former times was the most expensive to procure and the most costly to transfer: that is to say, knowledge of the canon itself.

The canon has to be mastered in a process of slow reading and even slower surfing of footnotes that occupies the first three to five years of graduate study leading up to qualifying exams and a dissertation prospectus. Even finding out what the canon is remains part of the work, eased in certain places by official departmental reading lists and historiographical classes, but finally a matter of reading and mastering the minutiae of the scholarly apparatus.

Finding the canon in history, for instance, means careful reading of acknowledgements sections and footnotes correlated with CVs, finding out who worked with whom, which texts appear with frequency, and which are dismissed. Finding the canon in comparative literature is frequently a matter of reading notes from Lacanian seminars in 1960s Paris, deducing from reported conversations the subtext that actually mattered to scholars.

All of these processes depend upon having the time to follow professors, to track them down in office hours, to pay attention to which conversations they listened to, to abstract one’s own canon from the masses. It was never enough to self-train; it was hardly enough merely to read, and visiting bookshops was a way to error rather than fruition. The canon has, until now, been secret, and it has been a matter of personal socialization to even find out what the important names were. And all of this suddenly promises to fold. Google Scholar counts citations and delivers the one true text on the transport revolution cited by scholar after scholar, or the new groundbreaking text that rocketed to a favorite within the last ten years.

The citation databases create new canons, established by numbers. Numbers have power. It is now nearly inevitable that, sooner or later, those numbers will begin to influence hiring decisions. Woe betide the uncited book or ignored article: relevance to disciplinary discourse can be counted and numbered. Scholarship has entered the age of the citation database.

Such highways create residual suburbs on the periphery of common activity. Journals too exclusive or small or specialist to go online, such as Cabinet, which depends on orders of back editions for part of its revenue, upload none of their articles by humanities rockstars, be they ever so bright as Wolfgang Schivelbusch or Marina Warner. Blog entries from para-academic scholars such as Geoff Manaugh of BLDGBLOG, or podcasts by David Harvey, despite their circulation, will show up in Google stats but never Scholar; they will show up on Zotero if users put them there, but part of their cachet is being known to only a small group of thinkers. One finds these scholarly suburbs by knowing the right people, by following the right idea. Knowing about Cabinet means dedication to a discourse.

There are two levels of citation, two ways of knowing then: the official, the common highway, the established canon of knowledge, now finally unlocked for all. The eighteen-year-old high-school drop-out in Cleveland can learn about the industrial revolution on his own, navigating straight to the top texts if he chooses. Alongside that canon, another and more mysterious one is forming: the secret canon of para-academic, interdisciplinary know-how. The former uses Scholar and Zotero, the tools of the trade. The latter leaves traces on Delicious and Twitter, the tools of public intellectuals. Scholars following the breath of the new will want to have exposure to both.

Interdisciplinary Canons, New Fields

We can look around the curve of time to further consequences of this unlocking of canons. The proximate arenas of affect are in interdisciplinarity and the establishment of new fields.

The age of digital citation also makes possible a new age of rampant interdisciplinarity: searching for the origins of urban prisons in the nineteenth century launches the historian into the abundant writing from literature scholars on the same subject. One no longer has to visit the art history department to develop a second field in art history; the list of innovative new texts is in easy grasp. While traditional scholars ignore other fields for the sake of expediency, the easy grasp of interdisciplinary knowledge makes ignoring it merely irresponsible.

Open canons also imply the more rapid establishment of new fields: though scholars had been writing serious studies of the city since Henri Pirenne, it took an operator like Arthur Schlesinger to establish urban history as a field. A Harvard professor could produce a generation of graduate students, a flood of scholarship, a conference, and finally a journal. For most of the twentieth century, such were the criteria necessary to generate a legitimate subfield where most departments hire and teach today.

Navigating the Information Glut

Such interdisciplinary plenitude foreshadows an age of information glut. Even as I write, I too cringe, already worn out from a morning preparing a nineteenth-century cities lecture, lured outside my historical canon by the ready availability of literature scholars’ studies of early detective fiction.

The temptation to meander is a serious one. I could waste hours there, thinking about the difficulty of finding information for the urban police, and the way those searches find their way into the middle-class fascination with Sherlock Holmes; a historical problematic of information glut not unlike my own. The task looks impossible, though, and for the moment I’ve simply avoided the other canon. I’ll concentrate on historians’ accounts of police, and leave their literary imaginal for another day. Here’s the gist of the problem: much like those urban subjects, today’s researchers have the problem of knowing which categories of information are relevant.

A second temptation is to decline responsibility altogether. Clumsy navigation of information results in a glut of citations that don’t actually reflect their user’s experience. That happens in the least efficient way now whenever a scholar cites relevant articles in a footnote without reading them, guessing from title or first page alone their content. Scholars show their finesse at navigating digital canons by citing only the works essential to their argument. A smaller list of citations frequently demonstrates real mastery.

Reducing the canon effectively is always a matter of outsourcing responsibility. Traditionally, the scholar relies upon the advisor for setting the bounds of discovery, the questions of debate, even the nature of inquiry. Derrida gives us the image of Socrates prodding Plato with his stylus to get him going; Socrates comes up with the questions and Plato does all the work. In traditional graduate departments, the student is somewhat relieved of this relationship by the possibility of multiple members of a committee. Barbara Johnson, Derrida’s feminist pupil, gives another image: Moliere’s Agnes in The School for Wives, who gains her freedom by having two teachers and choosing her own path between them.

The age of digital citation raises the possibility of hacking through the canon with other prosthetics than the human teacher. Each of them has their own limits and rewards.

Crowdsourced citation, in its most blunt form, creates simple accounts of which texts are most read. In the world of tagging, however, readers assign labels to a text or passage. Tagclouds rank the labels used by a particular group of users by frequency. A set of crowdsourced labels produces a folksonomy, or the set of terms of greatest interest to that particular set of users. Individual folk publics can emerge, each of them generating their own set of terms. Each advisor and her graduate students can communicate, it seems, in a common language of labels applied to the texts they commonly read. One could sort the entire western canon for texts labeled “governmentality” by students of Patrick Joyce. The crowdsourcing is hypothetically open: our drop-out in Cleveland can theoretically acquaint himself with the canon interpreted by Patrick Joyce and followers by searching Zotero for “governmentality.” He can theoretically contribute his own readings from Mayhew.
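
The mechanics described above can be sketched in a few lines. This is only an illustration under stated assumptions: the users, titles, and tags are invented, the function names tag_cloud and texts_tagged are hypothetical, and nothing here reflects how Zotero or any real tagging service stores its data.

```python
from collections import Counter

# Hypothetical bookmarks as (user, text, tags). Every name here is invented.
bookmarks = [
    ("student_a", "Text 1", ["governmentality", "liberalism"]),
    ("student_b", "Text 1", ["governmentality", "city"]),
    ("student_b", "Text 2", ["city", "police"]),
    ("outsider", "Text 3", ["landscape"]),
]


def tag_cloud(bookmarks, group):
    """Rank the tags used by the users in `group`, most frequent first."""
    counts = Counter()
    for user, _text, tags in bookmarks:
        if user in group:
            counts.update(tags)
    return counts.most_common()


def texts_tagged(bookmarks, tag, group):
    """Titles that at least one member of `group` labeled with `tag`."""
    return sorted(
        {text for user, text, tags in bookmarks if user in group and tag in tags}
    )


# One advisor's circle of students, treated as a single "folk public".
circle = {"student_a", "student_b"}
cloud = tag_cloud(bookmarks, circle)
governmentality_texts = texts_tagged(bookmarks, "governmentality", circle)
```

Changing the membership of circle changes the folksonomy: the same pile of bookmarks yields a different ranking of terms for each group of readers.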

The generation of new terms in a folksonomy is organic, as well. For another scholar to add a term to the tagcloud, they need only begin tagging abundantly with it themselves. A body of sympathetic users who adopt a new term can grow and find each other. In the age of digital citation, subfields have the chance to emerge in a new way. They emerge with less certainty and coherence, to be sure, than those directed by graduate advisors, but they emerge nonetheless. Landscape Studies, so long on the periphery of a dozen canons, perhaps only has a chance in the age of digital citation.

With whom does innovation lie in such a setting? With the advisor, to be sure, who launches a generation of students tagging the world through a new taxonomy; but also with the innovator, who dives into the established canon, passionately splicing the world according to a new set of values: leaving behind a trail of texts for a Marxist reading of the eighteenth century or a landscapey reading of the nineteenth.

And here the problem of originality reemerges. For these alternative taxonomies to be persuasive, they must seem relevant to other taxonomists. They must not seem redundant, the mirror of so many Patrick Joyces or T. S. Ashtons who have looked at the literature beforehand.

Another means of sorting through the noise of the digital canon is to outsource the reading to artificial intelligence. A program such as Devonthink, taught by a user to group together the readings and excerpts for a single undergraduate survey, can learn that texts that mention Britain, Adam Smith, and the 1750s belong together. It can even browse JSTOR for new passages of immediate relevance to the topic, excerpt them, and highlight some of the most important words that seem to appear with frequency. The scholar still has to read: but the machine performs the work of the research assistant, diving into the archives and coming out with particular passages neatly marked.

Working with such an apparatus creates the problem of an echo chamber. If you liked this, you will also like something like it. How does the scholar find an alternative telling? Where lies innovation? The answer to this question is probably the same as it has always been in academia: one does something innovative by mastering the canon and looking outside of it. In the age of digital citation, the canon is easier to find than ever, which means that the economy of time can spare more room for reading beyond the canon in search of fresher ideas. The healthy scholar will employ digital prosthetics towards mastering established canons, leaving more energy to spare for creative praxis.

It is an age for the flourishing of scholars who have the time to read deeply and the energy to think outside of the canon. This is what’s scary about it: to keep up in the age of digital citation, scholars will have to master a series of intellectual prostheses – tagging circles, artificial reading bots, quick skimming – that will help them navigate through the masses of texts. The age of digital citation will punish scholars who merely reduplicate the canons of their mentors. This is what’s exciting about it: you no longer have to go to a university to find out what books are on the canon.

The age of digital citation is almost guaranteed to produce a phalanx of interdisciplinary thinkers, skilled synthesists, capable of putting together the big picture from a variety of micro-fields and offering new perspectives on the whole. It will be to the credit of the rest of us if we can accept them.

Labels: academia, academics, citations, crowdsourcing, folksonomy, googlescholar, innovation, scholarship, tagclouds, zotero

Saturday, February 07, 2009

Reinventing the Academic Journal: First, Take Down Your Website

The web is thirsty for efficient, effective ways of retrieving useful information about the state of the field. This pressure creates an enormous market for those instruments that help individuals locate authoritative discourses and situated scholarship, and this, of course, is one of the traditional roles of the academic journal.

Academic journals are in the course of rethinking their management, methods, and publication standards. This year saw major panels at the AHA (American Historical Association) and MLA (Modern Language Association), largely through the leadership of the Council of Editors of Learned Journals.

If they face this transition with courage and ingenuity, journals have the opportunity to plant themselves firmly as pillars of professional utility, scholarly collaboration, and authoritative knowledge as a public utility. Much of it may require thinking in terms of shifting communities and the life of information, and shifting sharply away from current journals' dependence on issue-by-issue websites and pdf-servers like jstor. If you're a journal editor, the first step in a shift away may indeed be so radical as taking down your website, sharing information in new ways even more deeply integrated with the flow of information on web 2.0.

I list here four major headings for the consideration of those trying to adapt academic publication to a web 2.0 world.

1) Journals must pursue interoperability with the other online tools that are shaping the techne of scholarly practice.

Web 2.0 requires public visibility and interoperability with other web tools, in order that a searching aid should be found, adopted, and rendered relevant to the new research paradigms being adopted by scholars and members of the public alike. The more journals fit themselves into this paradigm, the better they'll thrive in the new order, finding readers both academic and para-academic as allies. They will function usefully as finding-aids for the most relevant, expert material in their disciplines.

In going web 2.0, journals have the ability to mesh their publications with tools that will allow readers to better integrate journal essays with the rest of their research. A scholar using zotero and jstor can download the article pdf and the citation, ready for use in footnote. Web 2.0 journals will go further into this zone: a scholar using zotero, jstor, google scholar, and delicious can instantaneously find other scholars' opinions of a particular article, the names of the disciplines and sub-disciplines they think it applies to best, and other articles of similar note to that particular scholar.

1.a) With these tools, every published article becomes easily interfaced with the tools new scholars are using to sort their data.

For example, if you look at http://delicious.com/bibliparis4/revues you will find some sources of reviews recommended by the French librarian who holds that account. When I'm signed into delicious as joguldi, I have the option to save any of these citations from the list into my own account. Each visitor can refashion their own micro-reading-list from their colleagues' reading-lists, cutting and pasting collective knowledge into an individual canon suited to their own project.

1.b) The promise of resilience: continued relevance to changing research patterns.

The web 2.0 journal will encourage this kind of interface, working within technologies for co-tagging, sharing lists, and making-one's-own-list. In so doing, the web 2.0 journal will become intimately interfaced with scholars' processes of research, reading, and writing, remaining an indispensable part of scholarship in the next era of research. They will avoid the possible irrelevance to reading processes, subdisciplinary conversations on mailing lists/delicious/twitter, and other forms of scholarly information-sharing that are coming to predominate in the life of the digital scholar.

1.c) The need for permanence.

Web 2.0 journals must ensure that some copy of whatever material they publish is backed up for posterity. They may rely upon a public, collaborative site such as archive.org for those purposes.

1.d) Real interoperability.

It is strongly desirable to use a public, widely-adopted instrument such as delicious or librarything, already equipped with full tagging, user interoperability, and visibility before the public, rather than one of the new, unstable, invite-only micro-communities for information sharing like academiacommons or scribd.

2) Journals have the opportunity to reframe their role in the academy as curators of the noise of the web.

Dream Scenario: The Web 2.0 Journal as a web bastion of curatorial authority.

The web suffers from a crisis of authority which is being met on the individual, rather than the collective and disciplinary level. For questions of disciplinary fields, for example, wikipedia is likely to be irrelevant and useless. Far more useful, from my point of view, have been peer-to-peer exchanges on delicious.com, librarything, and twitter, where colleagues in proximate fields have openly shared their course reading material, current research, and private canons.

In these sharing sites, individuals tag interesting citations with a series of terms most useful to their own practice. Users are less concerned with the interoperability of those selected terms than with the project of generating as many accurate, natural-language keywords as possible (see "folksonomy" entry in Wikipedia). The collected mass of these tags becomes an ultimate subject catalog to all the possible subject headings that might apply to any given website. Particular individual users become peculiar sources of authority for a given subject heading (for example, http://delicious.com/bibliparis4/, an expert archivist at the Université Paris-Sorbonne, is an authority on the best online archives, especially in the Francophone world).

Journals have the opportunity to weave themselves as crucial threads in the fabric of online conversations if they begin tagging, becoming collective repositories of the best, collectively-ratified articles and citations available for download on the web.

In a world where the primary tools for finding new scholarship are tagged, social databases like delicious and librarything, the most efficient form of journal interface with the world might be for journals to scrap their websites and become collective, tagging entities.

2.a) The advantage of having an official canon of online material ratified by editors.

In the world of the traditional print journal, scholars vied to get a Journal of Modern History citation on their vita because it stood for something. What if there were a http://delicious.com/victorianstudies and a http://delicious.com/journalofmodernhistory?

Such a stream of official citations could come to stand in for the private account of a collective recognized for setting a standard in the field, providing much the same function as the old print citation in terms of scholarly participation and professional standing. Being collected in those entries could still stand for the product of collective vetting among recognized scholars, standing out in the same way that my more famous colleague Danah Boyd's collection, http://delicious.com/zephoria, is better-read than my own (http://delicious.com/joguldi).

2.b) The editorial voice.

It might seem that if the Journal of Modern History disbanded its website in favor of a delicious stream, much would be lost: for instance, the editor's voice. Not necessarily. Perhaps invited keynote editorials might deserve a special tag, setting them apart from other tags; perhaps certain articles in the JMH tagging stream would also be tagged "featured article" or "special edition."

Consider: the editor-in-chief of The Journal of British Studies for 2009-2011 has a blog, which she has maintained since 2007 and keeps writing through 2020. For the years 2009-2011, the blog entries which she writes that pertain to the field of British Studies and are ratified by the rest of the Board become tagged "editor-in-chief" on http://delicious.com/journalofbritishstudies. The researcher who searches "2009" and "editor-in-chief" under that stream will find that subset of her articles, or they can search "editor-in-chief" for the full download of editorials for JBS.
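
The search in that scenario amounts to a filter on tags. A minimal sketch, assuming the stream is just a list of saved links with free-form tags: the titles and the function name search_stream are invented for illustration, and the code does not call any real delicious interface.

```python
# Hypothetical journal stream: each saved link carries a set of free-form tags.
# Titles are placeholders, not real editorials.
stream = [
    {"title": "Editorial one", "tags": {"editor-in-chief", "2009"}},
    {"title": "Editorial two", "tags": {"editor-in-chief", "2010"}},
    {"title": "Featured essay", "tags": {"featured article", "2009"}},
]


def search_stream(stream, *required_tags):
    """Return the titles of items that carry every one of the required tags."""
    wanted = set(required_tags)
    return [item["title"] for item in stream if wanted <= item["tags"]]


# The researcher's two searches from the scenario above.
editorials_2009 = search_stream(stream, "editor-in-chief", "2009")
all_editorials = search_stream(stream, "editor-in-chief")
```

Because the editorial voice is only another tag, the Board can ratify or withdraw an entry by adding or removing a label, with no change to the underlying blog post.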

2.c) The freedoms of web 2.0 journal operation.

Web 2.0 journals that take their primary responsibility as curatorial have no need for official publication from the university press system. They are not dependent on the income model of the university press, and they have no reason to collect subscriptions: their purpose is disciplinary service and public access. There is no reason for the articles published in this format to be made private, or to require elaborate fee-charging mechanisms.

3) Electronic journals will have the opportunity to expand their curatorial mandate to include different forms of publication.

3a) Past the essay model.

The traditional journal collects and publishes only three sorts of essays: the editorial, the peer-reviewed essay of new research in 15-50 pages, and the book review. There is nothing platonic about these forms: they evolved from the culture of eighteenth-century coffee-house journals, reviewing the books in circulation, and the canonization of eighteenth-century essayists like Addison and Steele in the English curriculum of higher education at the end of the nineteenth century. They are considered the template for developing a reasoned, supported argument, and so the metric for measuring the ability to research, argue, and write.

3b) Broader forms of inclusion.

The traditional canon of essays, editorials, and book reviews has excluded many other forms of scholarship whose best models are worth circulating in the scholarly community, including: syllabi, subject division lists for qualifying exams, lectures, paragraph-sized notes/queries, lists of relevant new electronic tools, reviews of electronic tools, reports on best methods in the archives, and blog-sized opinions about exciting new directions for the field. An electronic journal has no reason to exclude a twenty-minute audio segment, a selection of maps shared on Slideshare.net, or a video segment of a conference paper shared on Youtube. Properly curated, any of these categories would be of immense disciplinary interest, worthy of collection in a journal stream.

3c) Against exclusive publication.

It is contrary to utility, in the world of web 2.0, to maintain exclusive publication rights on an article. Exclusivity of publication places a text in only one domain. Yet non-exclusive text gets reproduced and recopied, circulated around the internet, and rapidly floats onward to mimetic influence in other cultures, excerpted and referenced. For every web 2.0 author, non-exclusivity and easy republication is ideal. For every would-be-idea-of-influence in the age of web 2.0, easy reduplication is crucial.

Exclusivity has been the format followed by most online journals, which seek to mimic in form the traditional journal: one essay, neatly formatted, looking as professional as possible. Exclusive re-publication suggests the old model of authority, and is superficially reassuring to editors without actually promoting the real functions of the journal: disseminating ideas and establishing the authority of the journal-as-canon and disciplinary metric.

Significantly more desirable would be setting a different precedent: for all disseminated forms of the text to advertise the article's accreditation as having been curated by inclusion in the journal-as-stream. (the text might end with, for instance, "please recirculate with this citation: by-Professor-Bonnie-Wheeler, SMU, 2009; officially tagged in 'Arthuriana,' [link] May 2010") Advertising the link between article and journal in many reproduced/cross-referenced copies would function both to the benefit of the article and the prestige of the journal.

Again, if the dissemination model is followed, the journal homepage need not include reprints of the articles themselves: merely links to the original blogspace or university-housed-pdf or slideshow where the material was originally posted, with all of its links, illustrations, video, and wallpaper as the author originally presented it. The journal's role is reduced to curation, not to presentation. Not having a use for a graphic designer, typesetter, or illustrations layout person, the journal's workflow will be considerably reduced.

4) Broadening the criteria for participation.

Another major question opened by the age of the electronic journal is the issue of expertise. Like the essay, the journal peer-review process is the relic of another age: an age of abundant, unbegrudging emeriti with plentiful leisure to foster the development of younger peers who had, on average, three years of training by way of a PhD. The limited number of peer-reviewers and editors responsible for the operation of the journal at any given time is the relic of a system limited by the expense of the US Post Office, the limited social networks of the people who invented the system, and an era of fewer PhDs on the world scene. In a new era, many of the burdens of editing and curation can be more broadly distributed to both the aid of the editors and the thriving of the discipline itself.

4a) Benefiting from a wider array of input.

In the age of web 2.0, journals have the opportunity to reconsider the distribution of time and responsibility. Is peer review a top-down mentoring process for scaling up the academic ladder, or will it be reconceived as an open playing-field (a sort of open seminar for peer review rather than a two-vetted-readers-read-you)? With the aid of wikis, it becomes possible for a single text to be usefully edited by hundreds of individuals, vetting their understanding of significance, authentic fact, and argument flow. For young scholars, accreted small suggestions of other citations, references, examples, and counterexamples, from a wider array of supporters, could conceivably enhance an article on multiple levels.

4b) The opportunity to expand disciplinary boundaries.

In web 2.0 collaboration, the thinking of interdisciplinary members of the broader academy might be usefully invited. The pressure of other ideas could hypothetically encourage the discipline to take account of the findings of related sub-disciplines (invited participation from scholars in postcolonial studies for Victorian Studies issues on empire), the concerns of related fields (are economists convinced by new findings in economic history?), and the legibility of argument to the public (does this ground-breaking, relevant article on tyranny and empire actually parse to the average reader of the NYT?)

4c) The reconsideration of timelines.

In the age of web 2.0, it is also possible for a writer to continuously revise an argument over an extended period of time, even indefinitely. For the sake of scholars' multiple projects, an indefinitely revised work is probably not ideal, but extended revisions, over the course of a year, become possible and useful for the author and the discipline. An article could be published as "officially under review" in a sub-category of the journal stream, subjected to gradual wiki conversation for a year, and remain available to a reading public for the entirety of that time.

The product that would emerge at the end of a year of wiki-ratification would be very different than that at the beginning. If the author failed, in the course of wiki revision, to produce a stronger article than at the beginning, the article could be removed from the journal stream at the end of the year.

4d) Indefinite projects.

An exception to the rule against indefinite revisions might be the case of a collectively-authored, introductory textbook (editions #33-150 of Arnstein's Introduction to British History could easily be collectively rewritten over the course of 20 years by a team of collaborators). Similarly, the journal might include a wiki article on "the state of the discipline" that was collectively revised by the journal's readership, year after year, to consider the best collective knowledge of subjects of inquiry.

(I've had the honor of being in conversation with Bonnie Wheeler of CELJ, and I want to express my gratitude here for being invited into the conversation. Editors of academic journals have been the heroes of professional support processes like peer review for a long time, and they have a brave future ahead of them, whatever course they take.)

Labels: academia, aha, authority, celj, david weinberger, delicious, experts, information, journals, mla, peer review, publishing, search, web 2.0, wikipedia

Tuesday, March 27, 2007

New Tools in Old Disciplines: Working Magic with Google Books, cont'd

The earlier post about Google Books on this site is creating quite a buzz among librarian and historian communities online -- partly because famed tech blogger Tim O'Reilly reported my having "dissed" the experience of libraries for virtual research, partly because Google Books is so hot, and partly because the image of libraries disappearing for computers raises hackles among academics everywhere.

The fears start flying. Will historians neglect the skills of traditional research because they've discovered the internet? Probably not. We spend years training in arcane research methods, and we make our names by doing something new, which even today generally means finding some measure of unknown documents in the archive. Will the material archives disappear? That's a fear, because any time a university can cut funds, it will, and then, as Rick Prelinger can testify, entire corpuses of periodicals and log books from the eighteenth century are jettisoned into the dumpster. Are internet archives going to be exhaustive? Definitely not, and in no case is every last spare bit of paper -- the forms, the doodles, the enormous maps -- getting scanned. Some of the fears are legitimate, and some of the fears are false. All give evidence of a rapidly changing world.

The real excitement around tools like Google Books is the possibility of applying new tools that are now simply not available with the other kind of text. The word-count and documentation databases I mentioned are now only a dream -- Google's caution with copyright laws puts them out of the realm of possibility for the moment. But should those become possible, they will open up a realm of research possibilities that are now only experimental in the humanities.

To give but one example, it is now possible in the text-searchable, online Oxford English Dictionary to find all words with "road" or "walking" in the definition that had their origin between 1810 and 1840. I discovered a variety of pieces of slang pertaining specifically to the way people walk down the new streets -- suggesting that they were parading, performing, acting in some way so new to the culture that an entire vocabulary had to be invented to explain what they were doing. By traditional methods, most of these would never have turned up; they're too far apart in occurrence, we tend to focus on polemic rather than slang texts, and the shift would have escaped me. This data from the OED is now a major piece of evidence in one of my chapters, allowing me to advance conclusions I would not have been able to make before.
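
The shape of that dictionary query can be sketched as follows. This is an illustration only: the Entry fields, the placeholder headwords, and the function coined_between are assumptions made for the example, not the OED's actual data model or search interface.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    first_attested: int  # year of the earliest recorded use
    definition: str


# Placeholder entries invented for illustration; not real OED data.
entries = [
    Entry("word_a", 1815, "To stroll along a road for display."),
    Entry("word_b", 1790, "A style of walking."),
    Entry("word_c", 1832, "A broad hat."),
    Entry("word_d", 1838, "One given to walking in public."),
]


def coined_between(entries, keywords, start, end):
    """Headwords first attested in [start, end] whose definition
    contains any of the keywords as a whole word."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b",
        re.IGNORECASE,
    )
    return [
        e.headword
        for e in entries
        if start <= e.first_attested <= end and pattern.search(e.definition)
    ]


hits = coined_between(entries, ["road", "walking"], 1810, 1840)
```

Matching on whole words keeps "broad" from counting as "road", the sort of small decision that determines whether a search like this returns evidence or noise.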

Similar searches on the Dictionary of National Biography have allowed me to perform acrobatics with the networks of different professionals in the 1780s, people like artisans and innkeepers who rarely turn up in traditional historiography, about whom the data is scarce. These professions make brief appearances in the DNB, and by tracing the lives of a hundred innkeepers in the 1780s, patterns of politics, religious belief, and marriage emerge that suggest that innkeepers, with their access to horses and carriages and strangers, were among the best-connected and most political people in the nation. We are only beginning to see what this kind of research can do.

Doing this sort of number crunching on texts yields amazing results. In the future, historians will demand access to the full text of Google Books for exactly this reason. If Google doesn't provide it, many of its competitors -- including the Internet Archive -- may. So a fertile world of sorting searches is ahead of us.

The rosiest scenario includes tech geeks and academic researchers teaming up to talk about framing the search queries. The raw text in the Dictionary of National Biography, for example, has no fields except the entry for "name" and "years." I have to sort through myself to find the number of children, the profession, the religion, the political beliefs, and the books he wrote. But sorting this kind of material against each other in searches is immensely powerful. Did Quakers have more children than Catholics? I don't know, but the archive does. And if the DNB has too few variables to be the right resource for this sort of search, a variety of local archives and court records around Britain are now going online, with exactly that potential. These include the entire proceedings of the Old Bailey court in London, 1674-1834; the census, 1801-1964; the British Parliamentary Papers, 1688-1905; and the LSE Booth Archive (maps of London poverty from the 1880s and 1890s). More often than not, historians like myself with no technical background are in charge of creating the data fields and search algorithms. We rarely find what we want, because we don't know how to use the technology to get what we want. The marriage of technocrats and historians could be a happy one.
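Once the fields exist, a question like the Quaker one becomes a short grouping exercise. The sketch below assumes biography entries have already been hand-coded into records with hypothetical `religion` and `children` fields; the rows are made up for illustration and say nothing about real Quakers or Catholics.

```python
from collections import defaultdict

# Invented rows standing in for hand-coded biography entries.
RECORDS = [
    {"name": "Person A", "religion": "Quaker", "children": 6},
    {"name": "Person B", "religion": "Quaker", "children": 4},
    {"name": "Person C", "religion": "Catholic", "children": 3},
    {"name": "Person D", "religion": "Catholic", "children": 5},
    {"name": "Person E", "religion": "Catholic", "children": None},  # unknown
]


def mean_children_by(records, field):
    """Average number of children per value of `field`, skipping unknowns."""
    totals = defaultdict(lambda: [0, 0])  # value -> [sum, count]
    for record in records:
        if record["children"] is None:
            continue
        bucket = totals[record[field]]
        bucket[0] += record["children"]
        bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}


averages = mean_children_by(RECORDS, "religion")
print(averages)
```

Skipping the unknown rows rather than counting them as zero matters a great deal with sources this patchy.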

Right now, none of these archives talk to each other, none allow tagging or comments from researchers, and those that have tried to provide fields or tags have done so by hand, over years, at immense expense with little to show. To this sort of labor, some open databases provide a vista of solutions. Google Base and Freebase are the two important ones now. In open databases, even small archives can contribute the raw data from their holdings, and anyone -- from the genealogist to the professional historian to the computer scientist looking for a master's thesis project -- can start putting together the evidence into interesting patterns, and then sharing those tools with others. Analytic tools like Swivel, Pipes, and DabbleDB can start finding patterns immediately. The miracles to come will happen when data starts talking to data, bubbling into new patterns yet undiscovered -- when we start getting entire life histories of shoemakers and Quaker populations out of the traces they left across a dozen government and local databases, when we start discovering shoemakers across vast swathes of England who knew each other and were talking, and when we start following the spread of religious or political ideas across those networks. We must believe that there are patterns locked in the data that are burning to get out, and we must apply all the tools we have to release them.
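What "data talking to data" might look like at its very simplest: pool the records from two sources under a shared key and read off each person's traces. Everything below is an assumption made for illustration, including the source names, the fields, and the sample rows. Real record linkage has to cope with variant spellings and missing dates, which this sketch ignores.

```python
from collections import defaultdict

# Invented sample rows from two hypothetical sources.
COURT = [
    {"name": "John  Smith", "born": 1750, "event": "witness, 1781"},
]
PARISH = [
    {"name": "john smith", "born": 1750, "event": "married, 1775"},
    {"name": "John Smith", "born": 1802, "event": "baptised, 1802"},
]


def link_key(record):
    """Normalise case and spacing in the name, then pair it with birth year."""
    name = " ".join(record["name"].lower().split())
    return (name, record["born"])


def link(*sources):
    """Group events from every source under a shared (name, birth year) key."""
    lives = defaultdict(list)
    for source in sources:
        for record in source:
            lives[link_key(record)].append(record["event"])
    return dict(lives)


lives = link(COURT, PARISH)
for person, events in lives.items():
    print(person, events)
```

Using the birth year as part of the key keeps the two John Smiths apart, which is exactly the sort of decision a historian and a programmer would need to argue over together.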

Google Books blew my socks off because it was able to contribute something new to my research after I had already circled the world for this information, pillaging a variety of specialty libraries, among them Harvard's Dumbarton Oaks landscape collection, the Maps Collection and Center for British Art at Yale, the Royal Institute of British Architects Collection, the Victoria and Albert, the British Museum, the Cambridge libraries, and the Public Record Office. I was also going through whatever ILL could bring me through the well-organized mechanisms of the University of California. I've seen ephemera and political documents pertaining to the road that were never looked at by any of the thirty major historians who wrote about the road in the course of the twentieth century. It is an utter delight, then, to encounter other books that did not turn up in my exhaustive ramble through the traditional methods. New tools in old disciplines can do us a world of magic.

Labels: academia, archive, books, google books, history, libraries, research

Thursday, March 22, 2007

How Delicious is Changing Academic Research

Since a recent post on Google Books and the research of History, our quiet little blog here on academic history, activism, and spirituality has suddenly gotten more notoriety than it's accustomed to. Hi world! Thanks for stopping by. To carry on with the thread of how information travels for academics, and what the 'net is doing, let's talk about another of my favorite sites for research, del.icio.us.

Delicious is the Rome, Jerusalem, and Paris of my existence as an academic these days. It's where I make my friends, how I get the news, and where I go to trade. All this from a little server that does nothing but share bookmarks in public.

Why? Two reasons it's cool. 1) It sorts things. 2) It makes them public.

1) it sorts things.

For two years I've been using Delicious as an information organizer. It's produced an impressive encyclopedia of the most interesting information, images, articles, citations, books, and subjects on the internet to which I might want to refer. Consider my dissertation tag, under which are a wide variety of online images and Google Books that I'll be using for my research. Not only can I come back to them, but I can also find related subjects -- dissertation material related to walking -- navigating seamlessly from one to another. Compared with the index card system, or my own terrifying piles of articles (even now ornamenting my bookshelf), or even the folders within folders within folders of Word documents, this represents a definite improvement.

I've been building a taxonomy -- the way some people use wikis, the way my boyfriend uses that utterly cool personal software, "the brain;" the way my father uses his vertical file, the way my DC friends use their rolodexes -- so I sort out all the information I take in, annexing technology to memory, sorting factoids and spare threads and notable evidence in neat, interlocking piles where I can find information again, draw connections, and create new connections.

The result is a navigable taxonomy of my thoughts. If I want to find my stuff on the history of "walking," the taxonomy already knows that my material on walking is associated with other categories of knowledge which I've tagged nearby.
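How does a pile of tags "know" what sits near what? One plausible mechanism, sketched here as an assumption and not as a description of how Delicious itself works, is simply to count which tags share bookmarks. The bookmarks below are invented placeholders.

```python
from collections import Counter

# Invented bookmarks; each carries a set of tags.
BOOKMARKS = [
    {"title": "Item 1", "tags": {"walking", "dissertation", "c19"}},
    {"title": "Item 2", "tags": {"walking", "london"}},
    {"title": "Item 3", "tags": {"dissertation", "maps"}},
    {"title": "Item 4", "tags": {"walking", "dissertation"}},
]


def related_tags(bookmarks, tag):
    """Count how often every other tag appears alongside `tag`."""
    counts = Counter()
    for bookmark in bookmarks:
        if tag in bookmark["tags"]:
            counts.update(bookmark["tags"] - {tag})
    return counts


related = related_tags(BOOKMARKS, "walking")
print(related.most_common())
```

The more bookmarks two tags share, the closer they sit in the taxonomy, and no one ever has to draw the map by hand.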

After a year of using delicious for my own bookmarks, helping other people find things becomes remarkably easy. Many of the link lists below are simply cut and pasted over from delicious. Lists of citations for colleagues are cut and pasted from delicious into email. The forty American history students I teach are instructed to go to my delicious page for writing help, research help, maps, and images relating to the class.

Second reason delicious is cool:

2) it makes things public.

Not only can you look at your own bookmarks, but you can also look at others'. When you find something noted to be queer and interesting, you can find out what other topics that same person thinks to be queer and interesting.

What's rapidly happening with these shared tags is that academics are finding each other in growing numbers. I have some twenty people in my network, at least half of whom I've never met in real life. They include:

* Javier Arbona, a graduate student in Geography who's also at the University of California, Berkeley
* Travis Brown, a graduate student in literature
* LeahB, an editor at Cabinet Magazine, my favorite periodical
* bibliparis4, a librarian at one of the public universities in Paris

Each of these is another intellectual putting together rarified connections about strange pieces of thought somehow related to my world.

I found them because they were, like me, publicly tagging with some arcane tag that I also use. c19 -- the nineteenth century tag. vernacular -- a tag used by other people who work with ephemera.

Every morning, I log into my delicious network and read the links that my small army of admired, clever, canny, eccentric brains has put together for me.

What's more, I'm developing what I'd consider an actual working relationship with these other scholars. A few of them have added me to their own networks. Day to day, I watch their reactions to Bush, I get a sense of where their research is going, and they get a sense of mine. It's low-level, low-commitment hanging out with high levels of information exchange.

And this is something different than the social activity I know anywhere else on the internet.

Normally, if you want to meet people on the internet, the connections are typically time-limited and action-specific. You want a date, you want sex, you need a friend of a friend for networking in Argentina. You meet up online and then you meet in real life. Or you meet online at Myspace and then, unless you have a crush on the person, forget to ever go back again. But my scholars are folks I'm seeing on a regular basis in the course of my regular research. This is the nearest thing to running into someone else at the card catalog yet.

I don't check in with them. I don't have, nor do I really need, the capacity to send email to them. Some of them I may actually encounter at academic conferences later, and we'll share more of a bond, through our years of doing collaborative research, than many scholars who have labored through the years in adjoining offices.

As Hannah Arendt understood, the modern democratic state happened when people in public spaces began interacting, and thus began taking action together. For this reason, she identified the medieval carnivals and fair days of Europe as the seat of literature, culture, debate, and politics. The rule goes like this: make a public, get action. Today, Delicious does for the internet what open-air markets did for medieval society. Low key, high-information, continuous-formation community building.

All hail the bookmark market.

Labels: academia, academics, bookmarking, delicious, dissertation, information, politics, research, social networking

Wednesday, March 14, 2007

How Google Books is Changing Academic History

Google Book Search is a relatively recent phenomenon... six months ago, right? About six months ago I was pottering around there, finding a few illustrated nineteenth-century texts, a lot of contemporary books for sale, and not much of too much interest.

Six months turns out to be a long time in book land. In that period of time, Book Search has accomplished enough to transform the academic profession.

I was idly trying a search on "roads" to see what sort of a literature would turn up for the period of my dissertation research, 1740-1850. I didn't expect much. I've spent the last two years wandering through the Yale, Harvard, and California libraries, the British Library, Britain's National Archives, and the immense reserves of North American Inter Library Loan reading every book on London, pavement, or travel I could get my hands on.

Surprise. In a single idle search I just added twenty extra full-text books to my list.

Which are, by the way, full-text searchable --

-- and subject to word-count analysis --

-- and replete with full illustrations --

-- and instantly digestible into visuals for powerpoint presentations.

Hallelujah, GoogleBooks. And holy mackerel! Good work.

By now, the first half of the nineteenth century exists in a very complete form on Google Books. In the last six months, while academic history has meandered in its habituated paths of grinding research, the possibilities of scholarship have been utterly transformed.

To give just one example, this little puppy -- Henry Parnell's A Treatise on Roads (1833), one of the key texts for my dissertation -- exists on our campus in Berkeley's transport library, a quaint but understaffed spare room hidden on the third floor of the engineering building, far, far away from where historians ever go. It wasn't actually on the shelf when I got there, so it took some patient emailing with the transport library librarians before the book was found, returned to the correct place, and held at the desk for me, to be picked up during the library hours specific to that particular institution (10am-4pm, M-Fr). Wild with enthusiasm at having at last obtained it, I held the volume prisoner at my desk in San Francisco for six straight months, unruffled by overdue notices, until at last the plaintive emails from the circulation desk were too much for me to bear. Research in my world is very often a personal matter of haggling for more time with the particular librarian in question. They're used to us, and I figure they need a good struggle to keep them alert. But thanks to Google Book Search, these days of scavenger-hunt and tug-of-war are drawing to an end.

Time for a professional dialogue about the new kinds of research these texts have opened up. For a very vast vista has erupted before us, and with it, a more serious set of comparative questions as a standard for social history, and new levels of rigor to be expected from the individual researcher. No longer can historians afford to stay in the empty, lonely world of the weary scholar, poring over close readings of dialogue. Time for all those structural analysis skills to come back in full force. Quantitative and open databases of word-count and thematic analyses. Open databases of pictures, tagged by keywords and available for classroom use.

What this signals, by the way, is the opportunity for a new age of scholarship. Cultural and image analysis used to be painfully time-consuming, heavy lifting, involving rare kinds of access, full fellowships, immense travel, and long waits for delicate books. Comparison between different cultural sources was even harder, placing absurd demands on the cultural historian's personal memory and note-taking skills. Cultural historians, despite their many skills, stood second in depth of research on any particular topic to political historians, for whom one visit to a Parliamentary archive and one visit to a personal residence outfitted them with every last detail of historical change. Now all that is changing. Comparing a hundred images is no longer a problem for a year's labor in an out-of-the-way museum reading room. Comparing a hundred personal accounts from working men is no longer a task to eat up a social historian's entire year.

I'm looking forward to seeing what the future holds. Any reports of historians currently putting together databases? Please post them here. In the meantime, check out this afternoon's dissertation links...