The Reluctant Digital Historian: February 2015

Monday, February 23, 2015

Week 5: Digital Archives and Full-Text Databases

General questions for consideration:

What is a database? In what ways is a full-text database parallel to an archive? In what ways is it different?
Are we talking about data or evidence or something else?
Generate a list of questions you automatically ask yourself when picking up a book or looking at an archival collection. What is a comparable list for using a full-text database for research?

Kathryn Kish Sklar and Thomas Dublin, “Creating Meaning in a Sea of Information: The Women and Social Movements Web Site,” in Writing History in the Digital Age

What do they mean by “document project”? What is involved in producing one? How is it different from the kinds of research projects that historians usually conduct?
In what sense is it a database? In what sense is their site a journal?
Why did they enter into a contract with Alexander Street Press? What are the advantages and disadvantages of such a commercial arrangement? What provisions have they made against ASP’s disappearance?
Why do they combine primary sources and interpretive texts?
What form do the primary sources take in their database? Why don’t the documents appear in their original form?
What advantages are there to having so many primary sources digitized on a single site?
How did this project come to take on preservation as a mission?
Is this project a realization of the “recombinant documents” that Mills Kelly wrote about?
What happens to your interpretation of a document when it is extracted from its archival context?

(D2L) Nancy Chaffin Hunter, Kathleen Legg, and Beth Oehlerts, “Two Librarians, an Archivist, and 13,000 Images: Collaborating to Build a Digital Collection,” Library Quarterly 80(1) (2010): 81-109.

What is the University Historic Photograph Collection at Colorado State University?
Who created it, and how?
In what ways is digital browse better than file-cabinet browse? Are there disadvantages?
Can you make sense of the work flow visualization?
What do you learn about librarians and archivists from this article?
How does a library-science literature review differ from a historiography?
What is a metadata librarian?
Would you consider this project a digital history project? Why or why not?
In what sense is it a database? In what sense is it interpretive?

Charles Upchurch, “Full-Text Databases and Historical Research: Cautionary Results from a Ten-Year Study,” Journal of Social History 46 (1) (Fall 2012): 89-105.

What are the advantages of full-text databases? What are the disadvantages?
What do you need to know about the databases you are working with before you start to seriously analyze your data? Develop a list of questions. How would you find answers to these questions? How useful did Upchurch find it to ask the database publishers?
What do you learn from this article about how OCR works? What is “article zoning”? What is “fuzzy searching”?
What does this article teach us about research design?
What does it teach us about how to keep track of our own research processes?
What do you learn about use of keyword search from this article?
Under what kinds of research plans would you want to keep track of all the searches you conducted? What would be a good method for keeping track?

Cohen and Rosenzweig, chapters 3-4

Chapter 3, “Becoming Digital”

How do you know that it is worth it to conduct a digitization project? Should we just be digitizing everything? How can we set priorities?
What losses should you be cognizant of when you think about digitized sources?
What are the possible options for digitizing text that they describe?
How did the authors of the other articles we read for today go about answering the kinds of questions Cohen and Rosenzweig raise about what is worth doing and what is not worth doing?
Do you think it is better for scholars to annotate (mark up) documents for other people to use, or to work with full-text search? Are there reasons you might choose one rather than the other for one project, and then use the other for another project?
Laying OCR underneath a scanned image.
Why is typing sometimes better? I wonder if this is still true now that a decade has elapsed since this book’s publication.
When thinking about digitizing images, audio, or video, what qualities do you need to consider that you would not bother with for text?
What considerations should you keep in mind about whether to contract out the digitization?
If you “do the work yourself” is it really free? How could you account for the cost of doing it yourself?
How should you find out what standards to use now?

Chapter 4, “Designing for the History Web”

What elements of website design do you consider essential?
How important is visual appeal for the project your group is developing?
To what extent is it important to make design choices for your grant application project?
Will you choose a URL?

Monday, February 16, 2015

Week 4: There is a lot of information out there for historians to work with

Roy Rosenzweig, “Scarcity or Abundance: Preserving the Past in a Digital Era.” American Historical Review 108 (3) (2003): 735-762.

What digital aspects of your life alone could disappear?
What are the difficulties of preserving digital primary sources?
How do you know when you have done enough research? Have you ever faced a situation where you thought you had found and examined all the relevant primary sources?
Rosenzweig argues that we face a future task of writing history in a world in which there are too many records for us to cope with, disappearing evidence, and a broadened audience.
Are these technical problems, or should we historians truly be concerned?
Why are digital documents vulnerable?
Blurring and merging of professional responsibilities. Historians, archives, and museums. Who should be responsible for keeping the machines needed to read old digital primary sources?
Is it important to read (or at least store) digital primary sources in their original format, or would physical copies suffice?
How do copyright and ownership issues enter the picture of preserving digital sources?
Why have historians been ignoring the problems of preserving digital sources?
What are the advantages and disadvantages of letting commercial enterprises have control of archiving?
How might the challenges outlined in this article shape the kind of historical writing that we will see over the next decades?

David Armitage and Jo Guldi, The History Manifesto, chapter 4:

Did you read this online or download it? What is Open Access?
What can you infer from the website hosting The History Manifesto about the authors’ goals and relationship to digital history? How successful do they seem at achieving those goals, based on their site? What opportunities for interacting with The History Manifesto do they make available?
How does it affect your reading experience to have the outline of the book always available in the left-hand margin of the screen? How about the footnote functionality? What was your experience of “turning pages” and “turning sections”? Why isn’t there a “next section” button at the bottom of the page?
What do they mean by “machine-read”? Should we think of this activity as reading?
How do their inquiries fit particularly with their other scholarly focus on the longue durée?
What kinds of unfamiliar historical research approaches do they discuss? Can you imagine yourself needing to use any of them? Wanting to?
What is Paper Machines?
Panama Zotero group: again the blurring of historians, archivists, and librarians.
What is involved in visualizing text-based “data”? Is the visualization enough?
What does this sentence mean?: “Traditional research, limited by the sheer breadth of the non-digitised archive and the time necessary to sort through it, becomes easily shackled to histories of institutions and actors in power, for instance characterising universal trends in the American empire from the Ford and Rockefeller Foundations’ investments in pesticides, as some historians have done.”
Examples of the “untapped sources of historical data”? Are those sources digitally available? How much work is involved in digitizing them so that historians can work with them in the ways Guldi and Armitage envision?
How are we ever going to keep track of this hyperabundance of information—and scholarly discussions of it—so that we can know what to go look at? How would you know to go looking for the Declassification Engine, for example? How can you prevent yourself from going out and laboriously duplicating someone else’s tech work?
“This enterprise points to the hunger in the private sector for experts who understand time – on either the short durée or the long.”
Does this chapter represent a demonstrated claim or an extended assertion (or polemic)?
“In a world of mobility, the university’s long sense of historical traditions substitute for the long-term thinking that was the preserve of shamans, priests, and elders in another community.”
What special skills do historians bring to the discussion of big data? “The reading of temporally generated sequences of heterogeneous data is a historian’s speciality.”
If you “read” big data broadly, can you still know something deeply?
Are you persuaded by this: “Their training should evolve to entertain conversations about what makes a good longue durée narrative, about how the archival skills of the micro-historian can be combined with the overarching suggestions offered by the macroscope. In the era of longue-durée tools, when experimenting across centuries becomes part of the toolkit of every graduate student, conversations about the appropriate audience and application of large-scale examinations of history may become part of the fabric of every History department.”
Do you see yourself in this?: “Historians may become tool-builders and tool-reviewers as well as tool-consumers and tool-teachers.”

Saturday, February 14, 2015

Prepping for class

One of my unsolved problems as an instructor of digital history is managing the readings that I assign. Not, of course (!) in the sense of getting the reading done. But because I have assigned almost all digital materials, I am doing my reading on the computer. Which, in this case, is the laptop I work with at home. I have not figured out how I as the instructor should manage the readings when we talk about them in class. I do make a practice of pulling them up on the class computer and monitor so that we can all turn to the same "page" if necessary. But I can't idly and subtly look through an article for a particular passage without making it patently obvious to the students that I am not attending wholly to what they are saying. I could possibly have the readings on a second laptop in class, but then I as an instructor will be sitting behind a wall of screens, dividing myself from my class.

I wonder how other professors deal with this problem.

Monday, February 9, 2015

Week 3

This is a key week in the class: we will divide into groups to work on semester-long projects. I have never put students into long-term groups before, so I am a little apprehensive about how to do it. Right now my intent is to let some (controlled) chaos reign and see if the students sort themselves into groups without help. If they need help, my plan is to have them engage in "speed dating," each talking to everyone else in the class for a fixed amount of time (e.g. two minutes) to test whether they will get along well enough to work together. Following the speed-rounds, students could submit to me a list of people they want to work with, and I could take a couple of minutes to organize the groups on that basis.

No one signed up to give a presentation this week, which might be just as well given the need to break into groups. Depending on how long it takes to organize into groups, I intend to take the students on a guided tour of the Encyclopedia of Milwaukee's successful grant application to the National Endowment for the Humanities.

Reading questions:

Kelly, Teaching History in the Digital Age

Chapter 3

What does he mean by books “reading each other”?
What does he mean by “recombinant documents”? Why might you want such things?
What kind of metadata do you take in about primary sources, almost unconsciously, when you start to work with them in traditional formats? How do you get this metadata from digital sources?
What kinds of new questions are made possible by the availability and searchability of digital primary sources? How would this availability change your plans for your own research projects? Would you still travel to archives to conduct research?
What is “text mining”? Why would you want to do this? Given the amount of information produced, how would you make sense of it? Is close reading obsolete?

Chapter 4

What history-writing skills do you think that you (and students) need in the 21st century? Does the standard college essay format teach you those skills?

How do you feel about requiring students to write in public instead of just handing in their work to the professor to read? How should we take into consideration student privacy issues when putting their work in the public realm?

What do you think about the comparison to basketball: that we teach basketball by having students handle the ball from the outset; and we should have students “make” history right from the outset as well?

Are you familiar with the idea of the Dublin Core standards?

What do you think about designing (undergraduate) courses around skills and understanding rather than around content?

Cohen and Rosenzweig, chapters 1-2

Chapter 1:

19: why wouldn’t a history website be accepted as “academic venture”?

22: how Yahoo’s organization (“librarian’s touch of classification”) helped; how is history presently organized/accessible on the internet?

What does he mean by “deep web”? Why don’t searches pull up materials behind paywalls? What would you do if they did?

25: list of 5 main types of history websites: archives, secondary sources, teaching, discussion, organizational. Have these five categories blurred more or separated more since this book was published in 2005?

29: it is easy to see why amateur enthusiasts don’t care about provenance of primary documents. But why does provenance matter to scholars?

Why isn’t there a convention to italicize (or put in quotation marks) the titles of websites in use in this book?

Do you think blogs have successfully challenged the journal article?

44: why have libraries and archives taken to the web to expand their mission to teaching?

50: Is their charge to become familiar with how history is done on the web before getting started with your own project still realistic?

Chapter 2:

53: example of 20,000 documents stored in a database and displayed on the web only when called up. Why is this a sound strategy for presenting information digitally? What does it imply about how the contents have to be organized and stored?

Do you need to know HTML to do digital history? What do you need to know about HTML?

Do you agree that HTML is basically readable?

How do you look at the source code for a web page, as they recommend?

What kind of planning process for your digital project do they recommend?

Generate a list of questions you should be asking yourself as you plan your group projects.

59: In discussing the question of whether you need to learn to code, they make a comparison to reading Dante’s Inferno in English translation. Does this comparison work for you? Would you venture to produce scholarship on Dante if you read it only in English?

Why do they so routinely include the price of software in their discussion?

Monday, February 2, 2015

Week 2

This week class is held in the library, where a reference librarian will introduce students to the use of RefWorks. In the second part of class, we will talk more about possible collaborative final projects and then discuss the following assigned readings:

Andrew Abbott, Digital Paper: A Manual for Research and Writing with Library and Internet Materials (Chicago: The University of Chicago Press, 2014), chapter 4.

Abbott advises in this book that it does not need to be read in a linear fashion, so I have assigned selections from Digital Paper in both this class and my undergraduate history research methods class this semester. Part of my goal is to buy myself the time to read it for my own purposes, for my intuition tells me that this is a really important book about how to do research—as crucial as Anne Lamott’s Bird by Bird is for writing. I am an admirer of other work by Abbott, who appears to me to be the smartest contemporary scholar in the world. This chapter is not precisely about digital history, but it points to issues that graduate students in general should be thinking about as they approach seminar papers and their theses. From what I have read so far, I have to agree with this blog post, which calls for pretty much everyone to read it.

Discussion strategy: start generally with what he is telling us, and then move to what implications it might have for digital history.
What advice does Abbott give about library research? What are his major points about organizing a research project?
How would you implement the suggestions practically, using digital tools (acknowledging Abbott’s own preference for paper)?
What is the difference between scanning and browsing?

T. Mills Kelly, Teaching History in the Digital Age, Preface through chapter 2.

Preface

What do you think of his point that if students are engaged with what they are doing, they are probably learning better—even if it’s technology and not history that focuses them?

Introduction

Why did Kelly’s student feel free to mash-up a primary source? In what sense was it “better” than the original? Why would historians object to this practice?
Where do you stand on the “authenticity” vs. “originality” dichotomy he sets up?
Do you agree that lecture is the worst possible way to teach anything?
What does he mean by “remix culture”?

Chapter 1: Thinking

What distinction does Kelly draw between considering how best to teach history and how students learn history best?
What skills, facts, and ideas about history should we be trying to inculcate in students?
What is Kelly’s attitude toward students? Is this an attitude many history professors share?
What does he mean by “do history” and “make history”?

Chapter 2: Finding

Kelly opens with a discussion of the difficulty of teaching Eastern European history to American students because of the language barrier: he was mostly limited to teaching primary sources that were originally in English or were translated into English. Does translation introduce barriers that student readers should be aware of? Are those barriers in any way comparable to those we should consider when consuming primary sources made available online?
Do students actually wander freely around the internet, finding all sorts of historical sources without professorial encouragement?
What is “disintermediation,” which Kelly defines as “the removal of hierarchical controls over information in the digital realm”?
Kelly suggests that it is a mistake to expect 21st century students to rely only on sources vetted by historians. Is it possible to reconcile this stance with Abbott’s clearly stated belief that not only should we rely on peer reviewed sources, but we should also depend on a hierarchy of prestige among university presses and academic journals? Are there other important differences in how Kelly and Abbott think about (student) research paths that we should be aware of?
What is the problem with the Adolf Hitler Historical Museum? (Is it still out there? I tried Kelly’s searches and did not get similar results.)
What “digital literacy” information skills should we be inculcating in students?
Why does Kelly provide the date of his Google searches? Should we be worried about the fact that Google searches are actually individualized by computer?

Daniel J. Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (Philadelphia: University of Pennsylvania Press, 2006), Introduction.

Note the publication date
What are the reasons they are optimistic about the utility of the web for history? Are there any new reasons for (or against) that have emerged since they published?
Qualities [capacity, accessibility, flexibility, diversity, manipulability, interactivity, and hypertextuality (non-linearity)] and dangers of networks (quality, durability, readability, passivity, and inaccessibility)
How is authority established on the web? How do you know what to trust, what to be skeptical of, and how to use it?
How do you know in what order to read hypertext historical materials?
Should we look for argument in digital historical scholarship? Should we try to embed argument in digital historical scholarship?
What do you think of academic publishers’ practice of charging (relatively high) prices for access to their digital databases of journals? Do you feel the same way about book publishers?
What does “open source” mean?