Data protection: More than saving data, it's saving culture
I made a plea to my Facebook friends a few weeks ago to donate to the Internet Archive. Realizing thousands of entities petition us for our money, I also told them I’d write an article about why the Internet Archive — a nonprofit organization dedicated to the preservation of humanity’s digital culture — and other entities like it are so important. After all, why should I expect people to blindly donate money to a cause if it’s not soon shown to be a relevant and functional cause?
But before I dig into the relevancy and functionality of preserving our digital culture, let me lay out the canvas, giving a little background to the topic. That background may not be as attractive as Bob Ross’ happy little trees, but it should do the trick.
A little history…
Subjective estimates have true writing systems coming into existence around the 4th millennium BC. As human cultures developed the need for documenting how much grain was traded for how many sheep, or disseminating a code of laws to the people, the need for these writing systems grew. By extension, the sharing and archiving of information through the written word became increasingly vital. Books in the form of wood blocks (note I’ve already found a functional use for the Internet Archive with the previously linked information, which seems to no longer exist at the original URL) began to appear as tools for the documentation of oral stories and religious teachings, and then eventually movable type came along, leading to the Printing Revolution in the mid- to late-fifteenth century. With it came the ability to circulate information and ideas in ways never before seen, a true “democratization of knowledge.”
Fast-forward to the information age, a time when the not-so-personal personal computer of the 1970s became much more personal, and the Internet began to gain critical mass. With widespread adoption of the Internet and the protocols it was built on came a profound change in how people communicated and collaborated. Corresponding technological advances in data processing and storage technologies brought a similarly dramatic change to how people used and retained the information arriving from their communications and collaborations.
It’s important to note at this point one of the keys to the Internet’s rise was “the free and open access to the basic [founding] documents, especially the specifications of the protocols” it was built on. This free and open foundation persists today, especially in the way we perceive how the Internet and its disparate components should ideally be treated by businesses, regulators, and governments. I note this because it parallels our discussion of the value of openly documenting our own digital legacy: just as the Internet has flourished and contributed to our culture from its open documentation, our growing digital culture has and will continue to flourish as we openly document it.
What of today?
So here we are in 2012, with Twitter users producing nearly 250 million tweets per day, e-book sales and distribution seeing huge gains, and what’s likely well over one trillion web pages floating out on the open Internet. A massive media explosion is occurring, not just of the written word but also of digital files of every imaginable type (audio, visual, database, program, archive, etc.), the likes of which humanity has never seen.
As computing costs continue to decline and free and open-source software alternatives become more numerous, humanity is increasingly creating and publishing in quantities that would have the most industrious Romantic raising an eyebrow. Yet as all this new material is being created and uploaded via the Internet for public consumption or being stored for later use, other material is being removed, getting replaced, or simply becoming lost. It’s rapidly becoming clear this abundant input and output of digital material both mundane and creative is forcing to light serious questions about if and how it should be preserved.
They say problems come in threes
In the first episode of the historic 1980 series Cosmos, renowned astronomer Carl Sagan talks about the ancient Library of Alexandria.
After wistfully talking about the lost knowledge of the Library, Sagan removes a fake placeholder papyrus scroll from a shelf and says:
How I’d love to be able to read this book. To know how Aristarchus figured it out. But it’s gone, utterly and forever. If we multiply our sense of loss for this work of Aristarchus by… a hundred thousand, we begin to appreciate the grandeur of the achievement of Classical civilization and the tragedy of its destruction.
Putting the scroll away, he continues:
We have far surpassed the science known to the ancient world, but there are irreparable gaps in our historical knowledge. Imagine what mysteries of the past could be solved with a borrower’s card to this library. For example we know of a three-volume history of the world, now lost, written by a Babylonian priest named Berossus. Volume one dealt with the interval from the creation of the world to the Great Flood, a period that he took to be 432,000 years, or about 100 times longer than the Old Testament chronology. What wonders were in the books of Berossus?
Here Sagan elegantly describes the “irreparable gaps” in humanity’s history due to what we may less eloquently refer to today, in technological terms, as “data loss.” Yet as striking and relevant as his story is to our current concerns about preserving our own digital culture, a few fundamental differences arise. Those differences (primarily in culture) will hopefully become more obvious during the process of looking at the similarities.
Like the keepers of the Library of Alexandria, today we face three very similar problems. Those problems are:
What should we keep?
Why should we archive it?
Where and how should we store it?
What should we keep?
When the great Library of Alexandria was being maintained, it’s fair to say it likely contained nowhere near the amount of collected information residing in the analog and digital world of today. At that time the works of Euclid, Dionysius Thrax (this link is a second fantastic example of the Archive, demonstrating relevance), and Archimedes were being collected for the benefit of the (rather small) known world. The sum of information contained in today’s Internet “library” surely must dwarf that of the old Library of Alexandria, not only in its quantity of information, but also in the sheer amount of information bordering on pedestrian at best.
“Pedestrian?” you ask. “Who are you to say what’s trivial and what’s meaningful?”
Now you see why the question “what should we keep?” is not a trivial one. Just as a library’s curator must make decisions on what material should be included in its stores, so too do we all play a similar role of curator with our own personal information. Shoebox of photos? Scan those and save them to external disk and Flickr. Source code from a game made in the late 1980s? Extract that information from an old floppy disk and upload it to SourceForge. The tweets from a Twitter account? Grab them for future reference with Backupify.
Yet with all this digital media and conversation enriching our lives, it’s quite possible our inability to prioritize it may lead us down the path to digital hoarding. This underlines the difficulty historians and citizens alike have in deciding what’s worth saving. Sometimes we just have to let go and prioritize. Easier said than done, right?
Why should we archive it?
Identifying the “why” from a personal basis is easy enough; when it’s your digital stuff, you likely care what happens to it. Sharing your photos and videos with grandchildren; compiling your online prose for publishing; and passing on your online accounts, passwords, and digital assets to one or more beneficiaries should you die are all very personal reasons for wanting to archive your online data. Other, more complex reasons exist for wanting to archive this digital heritage, however.
Earlier this year journalist and historian Benj Edwards discussed the cultural legacy of software and other digital media. While the bulk of his essay focused on how software piracy benefits the preservation of our digital past, he managed to sneak in several arguments for why we should preserve our digital legacy:
1. Just as today’s archaeologists have studied how humans used tools like the thresher to better harvest and process wheat, tomorrow’s archaeologists will want to better understand the tools we used to jump from photography to software-based digital image creation.
2. Future historians will better understand how cultural icons like Mario and Luigi wove their way into the fabric of our lives, both metaphorically and literally.
One additional “why” I’d like to point out: because if we don’t, we lose history and thus valuable citations which could be used to corroborate other interconnected history.
Granted, all of these points have involved some historical aspect of saving cultural information. As one who used to find history to be stale and tedious (now not so much), let me at least attempt to make this more relevant to you, the reader.
I’m going to guess you’ve used Wikipedia within the last year to learn a bit more about something. Whether or not you take a Wikipedia article at face value, know a properly written one should be based on multiple citations. While physical books and journals make excellent sources, an increasing amount of web content is viably being used for citations. But what happens when an excellent piece of historical online material simply vanishes? Goodbye citation; one less piece of corroborative history. And just like that a common online tool — one that often gets taken for granted — becomes a tiny bit less effective and relevant.
Where and how should we store it?
Earlier I referenced, as an example, the extracting and saving of software source code from a floppy disk. If you actually read that article, you gain a keen understanding of why it’s important we consider the location and format of our stored information. Here’s an excerpt from Gus Mastrapa’s article:
Diaz begins to set up his gear on a broad, wooden table. He begins conjoining the beige plastic cases of Apple II computers, disk drives and monitors into an impressive stack of near-ancient technology.
The geek squad’s goal is not simply to find out if Mechner’s old disks work. Today is about getting that old code out of its magnetic tomb and getting it onto the [I]nternet. That’s why they need Diaz’s Gordian knot of boxes and cables and not just any old Apple II — because Diaz has wired his computers into a network via Ethernet cable. Underneath, he plugs in a modern Dell laptop to serve as the receptacle for whatever treasures the crew manages to unearth.
The disks in question date back to the mid- to late-1980s. That storage technology is well over two decades old. Considering that information degrades on magnetic storage media, it’s surprising the group was able to retrieve all of the data they did. Yet what a poignant statement their efforts make in our discussion of where and how to store data!
We must consider what storage medium we should use and in what format we should archive the data. But that’s not all: in tandem, we must have a plan for moving that data to updated media as time progresses, and we must also be cognizant of retaining the ability to read the file format over time. If the software required to read a file only runs on Windows 98, then a plan must be made on how to emulate the environment on newer systems so it can be read, or the file must be converted to a more readable format.
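One small, practical piece of such a migration plan is confirming that nothing changed when the files moved from the old medium to the new one. What follows is only a minimal sketch in Python, under my own assumptions: the function names (sha256_of, build_manifest, verify_copy) are hypothetical, and SHA-256 checksums are simply one common way to detect silent corruption, not something the Prince of Persia crew is reported to have used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], new_root: Path) -> list[str]:
    """List the relative paths that are missing or altered under new_root."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = new_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

The idea would be to call build_manifest on the archive while it is still readable, keep the result alongside the data, and run verify_copy after every move to a new disk or service. An empty list means every file arrived intact. It says nothing about whether the file format can still be opened, which remains a separate problem.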
The relevancy and functionality of data archiving
What should we keep, why should we archive it, and where and how should we store it? The discussion of these three questions reveals the issue of preserving humanity’s digital culture to be complex, yet relevant and functional.
It’s a relevant topic because it affects us all: the single user uploading family holiday photos to the computer; the business maintaining a strong Internet presence; the research facility sharing terabytes of data with other research facilities; and the mass of the world learning the history of the Internet’s social media phenomena.
It’s a functional topic because data preservation “contribute[s] to the development or maintenance” of humanity as a whole. It’s difficult to say how far backward humanity may have stumbled with the unfortunate destruction of the Library of Alexandria. Perhaps our scientific and cultural evolution would be further along than it is now.
As such, we have a certain duty to continue humanity’s development by preserving our history. That in itself is nothing new; what is changing is the format we use to preserve it. The questions we ask are also still the same; what is changing is the cultural context in which we address them.
I close with my original plea: consider donating or contributing to the Internet Archive and other preservation groups like it, all while archiving your own cultural legacy to share with others. Cultures aren’t forever, but they are worth sharing.