News Photography

Finer Points: Web rot is erasing our images and videos

September 2, 2023

The Internet is burning away our photos, videos and older websites daily. At nearly 40 years old, the Internet has lost much of its early history to changing technology and corporate and user desires.

Image credit: Tim Macpherson/Getty Images

The Internet is in the midst of a midlife crisis. Its age is anywhere from 33 to 40 years old, the ripe age for contemplating past misses and successes, its legacy and an overbearing questioning of choices, behaviors and goals.

At nearly 40 years, the Internet has grown from a wacky idea of connecting the world’s knowledge centers to aid sharing of scientific research, and evolved into our modern omnipresent world of screens. Its presence is so great we now define generations generally by those who remember the birth of the Internet (millennials), those who grew up with the expansion into mobile Internet (Generation Z), and those who will not know a time before the Internet (Generation Alpha).

Finer Points: Ideas for issues big and small

About this series:
Innovation pushes our industry and the craft forward. This series aims to share solutions, rooted in research and sourcing, to some of our biggest and smallest problems in the world of photography and video.

Read the entire series here.

As a quadragenarian, the Internet is different than it was. Right before our eyes, the web is not only changing but rotting. Sometimes, the changes are small: Apple removes a photo product, Google removes unlimited photo storage, and Twitter breaks its legacy referral links. Other times, it is much larger, such as Flickr deleting thousands of user photos, Vimeo deleting user videos (they got my files with this one), book publishers revising books you thought you owned digitally, video digital media you purchased just vanishing one day, or entire websites disappearing when a company shuts down servers.

‘Sunsetting’ is Internet speak for deletion from the Internet. Some famous sunsets have included Friendfeed, MySpace, Vine, Google+, Delicious, Periscope and a slew of other platforms that have come and gone, taking all their users’ text, photos and videos with them. Thankfully, DPReview was spared this fate.

Image credit: Jeff Keller/DPReview

Personally, I’ve lost many photo and video files to the Internet. I lost some of my early work in photojournalism and documentary filmmaking to the death of Flash when the move to HTML5 killed multiple personal and professional projects, and organizations scrubbed the data from servers. I’ve also lost work created in SoundSlides and projects produced for publications across the US from coast to coast (from The Santa Cruz Sentinel, which updated its servers and lost some of my work, to The Daily Beast, which removed its photo blogs). I’ve also lost several short documentary films to a Vimeo update, including work featured in a segment on HBO’s Last Week Tonight. DPReview itself also just went through its own close call with the trash bin of internet history. It had been announced that our site was closing, and the fear of losing decades of how-tos, explainers, reviews, photos, videos and more kept us up at night. Truly, it did. Luckily, we’re still here, and everything was saved, but what happened to DPReview is more the exception than the rule.

Breaking the Internet

The exact day the Internet began can vary depending on what mile marker you feel is most relevant: Was it 1983, when the TCP/IP standard was established, allowing local networks across the globe to communicate using a shared interface standard? Was it 1989, when the first consumer ISPs arrived? Was it 1993, with the release of Mosaic, the first free consumer web browser? Or was it 1996 and the birth of one of the first viral memes?

Whenever you think it started, the Internet today is having a midlife crisis. Behind it are the founding goals of saving human knowledge and making it accessible to all. This is the early Internet, created as a repository for information, engineered to address a human need to preserve knowledge and the things we create and to finally make it accessible to others.

Today’s Internet, seemingly driven by economic pursuits, is a cornucopia of tech companies racing to leverage AI to make operations more efficient, always-on services with monthly subscriptions, social media clout chasing with revenue-sharing schemes or plain old quarterly growth financials. Ahead of it lies an undefined path for what the Internet wants to be and whether it can ever live up to its initial promise.

In this flux and midlife searching, the Internet is also forgetting its past, literally.

Past websites, scholarly papers, early streaming videos, photos and more are vanishing daily, lost to the ether. Web rot reminds us that the Internet is not permanent. The reminder comes every time a bookmarked link doesn’t work, an old social media post takes us to a 404 error, an elder millennial tells you about MySpace or their high school Geocities website you can no longer visit, or you remember a band that you swear was real and that you saw live (‘!!!’), but no one else can remember because their name is horrible for SEO (seriously ‘!!!,’ I know I didn’t make you up).

Web rot can be hard to spot until you’re searching for something you knew existed, often too late to save it from the deletion bin of history.

Image credit: Stephen Swintek/Getty Images

It’s not lost on our team that it’s late August 2023 as we publish this, and we weren’t supposed to be here today. When Amazon announced its decision to shut down DPReview, it seemed that the site would disappear from the Internet, despite being a publication of record for an entire industry, as well as a massive crowd-sourced knowledge base. For some of you, it was the end of a community resource or a daily ritual. For us, it was an erasure of decades of our blood, sweat and tears (like tears in the rain).

As the internet ages, web rot erases its past through broken links, deleted servers, bad backups, ‘sunsetted’ platforms, government actions, business decisions, user error and malicious actions.

Nothing lasts forever

We were reminded of web rot again last week when a user of X (formerly Twitter) posted that several images were missing from the platform, including the 2014 selfie posted by then Oscars host Ellen DeGeneres during the show’s TV broadcast. The photo, taken by actor Bradley Cooper’s outstretched arm, crashed Twitter upon its posting and became one of the most viewed images of the year. Setting aside whether you’re a fan of the celebrities or deem the photo of value or not, its place in social media history is notable and worthy of preservation.

It’s unclear how long DeGeneres’ 2014 selfie was offline before the social media platform formerly known as Twitter brought it back online. It’s also unclear how many other users’ images were affected and whether they are also back online.

Image credit: Photo by Bradley Cooper; Art Direction by Ellen DeGeneres; posted to Twitter.

The platform chalked it up to a software bug that only affected posts from 2011 to 2014. Users of the platform started to notice missing images and broken URL links over the weekend, and by Monday, engineers had posted that the glitch had been addressed.

The famous Oscar selfie was once again online. However, it’s still unclear when the glitch started, how long images were offline, if all affected images are back online or what would have happened if the deletion was never noticed.

For instance, this period falls smack dab in the middle of another piece of world and internet history, the Arab Spring, which began in Tunisia in December 2010 and spread across Africa and the Middle East as a series of anti-government protests and rebellions.

Arab Spring organizers used Twitter (and, to a lesser extent, Facebook) to spread information and organize quickly. Twitter and social media became a leading source for images, videos, text updates and other play-by-play documentation on what was happening on the streets in near real-time. The Tunisian protesters posted photos and videos of heavy-handed police marching on mostly peaceful protesters. Their messages of anti-corruption, the need for government accountability and how economic inequality was harming society resonated with neighboring Arab nations, sparking protests in Egypt, Libya, Yemen, Saudi Arabia, Jordan, Oman, Algeria, Lebanon and others.

A youth films Arab Spring protesters being tear-gassed by police near Tahrir Square on November 23, 2011, in Cairo, Egypt. Several Arab nations across Africa and the Middle East saw protests against their governments during this time, and protesters documented much of it firsthand over social media. Changes to social media platforms could wipe out this history overnight.

Image credit: Peter Macdiarmid/Getty Images

The governments of Tunisia, Libya, Egypt and Yemen were eventually overthrown, and the Arab Spring also played a key role in sparking the ongoing Syrian civil war.

All of this history has been captured on social media platforms by protesters, governments, bystanders and journalists. Today, it’s a treasure trove of information, captured on a verifiable timeline for scholars, historians and anyone else interested in learning what the Arab Spring was and the lasting impact it has had on the world.

But the bulk of these images and videos were made between 2010 and 2013, and if a glitch in code can erase it all, what a scary prospect for our future knowledge. Or, if not a glitch, perhaps the closure of a platform, a change in ownership, a change in law where a company resides, a change in a platform’s financials dictating which servers can be taken offline, or simply a redesigned file architecture and migration which breaks all existing links.

Web rot isn’t that hard to achieve, and once it takes hold, it can go unnoticed or, worse, be noticed only when it’s too late to undo it.

A mesh of knowledge

The early stages of the web were dubbed the ‘information age’, and the ‘information superhighway’, and its framers did not have to think about business plans, IPOs, branding or quarterly projections and valuations. Their goal was simple: to allow any computer anywhere in the world to communicate and access information from any other computer anywhere in the world. In these early days, they envisioned the Internet as a repository for the world’s knowledge, a place where things would be preserved forever and freely available.

It was the culmination of humanity’s need to document and save: an evolution that began with oral traditions and cave drawings, passed through clay tablets and papyrus, and continued with photography, book publishing, newspapers, magazines, CDs, VHS tapes and more. The Internet seemed like the culmination of it all. Finally, a global library to hold digital versions of all of human history, creation and knowledge, available to anyone with a computer and a modem.

That was the vision, but it hasn’t played out that way.

An identity crisis

Perhaps the only real truth of the Internet is that it’s always changing and is never any one thing. The Internet, as we thought we knew it, is revealing itself to be an illusion, the original ideals long consumed and scattered to the winds, and in their place are pseudo-walled gardens of big data and means of monetization.

It’s a far cry from the Internet of yore. The framers didn’t have anything in mind beyond needing a way for networks of computers across the globe to have a common and shared way to talk to each other. Beyond this, the World Wide Web allowed any device anywhere in the world to access and share information; it was a means to produce the world’s greatest repository of information, first and foremost – the information age. These framers did not have to think about money, quarterly projections, IPOs, programmatic advertising or commerce. Yet today, we do.

What will the Internet be, what does it want to be, what do we want it to be? These are the questions of this midlife crisis.

The Internet is in search of an identity.

One place to look may be in the 90s, when the Internet was young and full of promise, before Facebook, before Amazon, before Google, back when Apple was considered a joke and AOL the future.

Way back in the 1980s, government, university and scientific employees created the connection protocols that allowed for the disparate networks of computers to communicate. This gave way to the commercial Internet.

Image credit: Carol M. Highsmith/US Library of Congress

Let’s take it back to 1991, in a room at the Computer Laboratory of the University of Cambridge, England. Within the room, about 15 computer science researchers are tinkering away on what would eventually help create the Internet as we know it today. Several other team members are spread out across the building and several flights of stairs.

These future internet pioneers had one problem: they needed coffee, and between the team, they only had one coffee machine that lived in a hallway. Walking down multiple flights of stairs or across a room to find it empty was not ideal; even worse, not knowing it needed to be refilled for later just delayed the process.

Seeing a problem and the potential of their computer network to solve it, they got to work and set up a camera to monitor the coffee maker. Their team needed knowledge (is there coffee?) and used their network to make that information accessible to anyone. In 1993, with the commercial Internet in full swing, the camera would go online and become the world’s first webcam.

The xcoffee camera refreshed a few times a minute, giving researchers at the University of Cambridge, England, a 128×128 px heads up when a new pot was ready.

Image credit: xcoffee webcam/University of Cambridge, England

‘The image was only updated about three times a minute,’ recalled Quentin Stafford-Fraser, one of the researchers who helped set up the camera. ‘But that was fine because the pot filled rather slowly, and it was only greyscale, which was also fine because so was the coffee.’ The camera stayed online until August 2001.

You can read all about it here.

This story, however, has been lost. If you follow the link above, you’ll see the official page for the project is no more. I found this page referenced in multiple other articles about the official history. With the link broken, the official record is lost to history.

Only once I knew that the project was colloquially known as the ‘Trojan Room coffee pot’ did I start to find verified and reliable news media accounts from the time that fill in gaps in history.

Mo money mo problems

With a protocol for dispersed networks across the globe to have a common ‘language’ to interface, it wasn’t long before old companies started figuring out how to sell pizza over the Internet or for new ventures to start launching e-business and IPOs. These new ventures had different motivations and used slogans like ‘move fast and break things’ to justify their gold rush, and reminded us constantly how they were making the world a better place.

I had a front-row seat to it all. I grew up in San Jose, California. My parents both worked in printed circuit board design (the original reason it’s called Silicon Valley), and the promise of tech was all around us in the late ’80s and ’90s when the Internet came to homes. In these early days, I did see the well-intentioned university researchers, NASA scientists and educators trying to leverage the Internet as a learning tool and platform for storing human knowledge. But then the money came.

Beyond the framers of the Internet were people who saw the new roads and started to build their ‘cars’ to drive on them, but cars cost money, and to get the money, they needed business plans.

Soon, the information age gave way to the start-up age, peer-sharing platforms, broadband internet, social media, and the mobile Internet. The Internet’s identity and modus operandi also changed.

Along with that change comes web rot, where the shifting priorities and powers of the Internet fundamentally change and, in some ways, undermine the goals the Internet was created to achieve.

But perhaps, like all good things, they are only as good as the people behind them and the intentions around them. A good mid-life crisis requires us to be unsure of where we’ve been and what we want going forward.

Thank goodness this page didn’t become real.

We’ve gone from the early days of the Internet, driven by military infrastructure needs, scientific researchers and educators, individual enthusiasts seeking means of sharing information with others and the promise of the ‘information superhighway’ where the driving goal was to preserve human knowledge and make it accessible to all. From these lofty goals, we’ve traveled to the modern era of segmented echo chambers within an ever more commercialized commodification of the Internet, driven by VC funds, profits and big data. A new world where librarians, researchers and well-meaning technologists have been replaced by VC firms and Wall Street bros who realized there’s money in Silicon Valley, where inventors have given way to investors, where ‘move fast and break things’ has become ‘break things and move fast.’

In this new era, platforms change hands, tech debt leads to rebuilds, and the need to show profit means you pivot fast to outpace your burn rate. Is it any wonder that the old web is rotting right before our eyes in this jumble, but we’re too busy to notice? On a financial spreadsheet, preservation costs money, time and energy, and there’s no line for the social, historical or educational benefit it brings.

The overwhelming inevitability and danger of web rot is always present. This reality hits a little close to home, having just lived through it when DPReview was on the chopping block. Beyond web platforms going away and taking user data with them, there have also been many publications that have gone to 01110100 01101000 01100101 00100000 01100111 01110010 01100101 01100001 01110100 00100000 01100010 01100101 01111001 01101111 01101110 01100100

Save yourself, and save often

At its core, the Internet was created to store and make data accessible. Which begs the question, what ‘rules’ should apply to the commercial Internet? Is it ‘right’, ‘fair’ or ‘okay’ for private institutions to shut down public media companies and platforms, particularly when those platforms host user-created data like images and videos or those properties house historical or educational material?

When DPReview was going to shut down, the fact that Amazon owned the site led our users and journalists to raise an issue that probably wouldn’t have come up if it was a small private company: corporate responsibility. In other words, do large companies that own media companies have any moral obligation to maintain or find a home for a publication’s archives when shutting it down?

I am unsure if I can answer that larger question, but I can make some suggestions.

First, practice good data hygiene: log into all your accounts where you post photos and videos and ensure your email and other contact info are up to date. When a service (Flickr or Vimeo, for instance) changes its rules around how it treats your data, it’s required to notify you, and if your email is out of date, you can easily miss the news and lose your files.

Next, take stock of where you are posting and ask yourself if you trust the platform with your images and videos. If you do, continue posting, but consider downloading your data once a year to save files locally in a backup. Most social media sites have download tools to request your data; they’re often buried in account settings and can require waiting for an email with download instructions. You will likely receive multiple zip folders with a limited time to download them, but once you’ve unpacked and organized them manually, you’ll know that your files are safe should a company ‘pivot,’ ‘sunset,’ or change owners.
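For readers comfortable with a little scripting, the unpack-and-organize step can be automated. What follows is only a minimal Python sketch, not a tool any platform provides: the function name `archive_exports`, the dated folder layout and the `MANIFEST.txt` checksum file are all assumptions made for illustration. It unpacks every zip file in a downloads folder into a dated backup folder and records a SHA-256 checksum for each file, so a later copy can be checked against the original.

```python
import hashlib
import zipfile
from datetime import date
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_exports(download_dir, backup_root, stamp=None):
    """Unpack every .zip in download_dir into backup_root/<stamp>/<zip name>/.

    Hypothetical helper: returns a dict mapping each extracted file's path
    (relative to the dated folder) to its SHA-256 checksum, and writes the
    same list to MANIFEST.txt inside the dated folder.
    """
    stamp = stamp or date.today().isoformat()  # e.g. "2023-09-02"
    target = Path(backup_root) / stamp
    target.mkdir(parents=True, exist_ok=True)

    manifest = {}
    for archive in sorted(Path(download_dir).glob("*.zip")):
        dest = target / archive.stem  # one subfolder per downloaded zip
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        for path in sorted(p for p in dest.rglob("*") if p.is_file()):
            manifest[path.relative_to(target).as_posix()] = sha256_of(path)

    lines = [f"{checksum}  {name}" for name, checksum in sorted(manifest.items())]
    (target / "MANIFEST.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Calling `archive_exports("Downloads/exports", "Backups/social")` (both paths are placeholders) once a year would leave one dated folder per export. Only run it on archives you requested yourself from a service you trust, and keep the backup on a drive separate from the original downloads.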

Finally, back to the question of whether it’s okay for owners of media companies to delete at any time the accumulated history of everything they have ever published. As private companies, legally, of course, they can. But as a social responsibility, maybe they shouldn’t. These archives have historical value, whether you are DPReview, Buzzfeed News, The New York Times or Wiener Zeitung (one of the world’s oldest newspapers).

How do you save publications from web rot and changing business needs?

Here’s a suggestion: work with preservationists within public libraries, tech companies and universities to maintain an archive and continue to make it accessible to the Internet. One good model to follow is the Life magazine picture collection’s approach. When Life ceased publication, the magazine kept its copyright to the 10 million photographs created across 120,000 stories. It also worked with partners to preserve all the photos and stories, eventually making them available to the public. Today, every issue from 1936-1972 is available on Google Books for free (Life was a monthly from 1978 to 2000, and those issues require a fee to access).


The Google Books Life magazine archive includes this September 20, 1948 issue. The issue is famous for publishing one of the first examples of a photojournalistic photo essay.

Among the archived issues is the 1948 publication of ‘Country Doctor,’ an iconic photo essay by photojournalist W. Eugene Smith.

For 23 days, Smith traveled with Dr. Ernest Ceriani, the lone physician serving a 400-square-mile area of Colorado, capturing his travels, his house calls, his tending to patients from birth to near death and a moment of recreational fishing to rest.

Photo essays as a form were relatively new in 1948, and ‘Country Doctor’ helped establish the format as a means to tell a story through images. This body of work is still used as an example in journalism schools today, but it could have been just as easily lost to the ravages of time.

Life magazine avoided web rot (sort of: the digital stories Life.com published in the late ’90s that never appeared in print seem to have vanished) by seeing the social, historical and educational significance of works like ‘Country Doctor’ and practicing good corporate responsibility to preserve the work.

We can preserve our digital treasures with little effort, but we have to first recognize the risks of web rot and ask the platforms that hold our data and the companies that own our media to recognize their inherent value. Some things can’t be fully captured as a line item on a corporate spreadsheet. Some things require a heart.

Author:
Shaminder Dulai
Source: Dpreview
