
jberryhill

(62,444 posts)
Wed Apr 25, 2018, 08:44 PM

One Internet Archive Quirk Which May Not Be Relevant

If you remember, a lot of the early web, in which the IA was first launched, consisted of static HTML pages, making it relatively easy to store and compress content. However, if you think the IA is merely gorging down a "copy of everything on the internet" on their budget and without the storage space of the Almighty, then you may have a simplistic view of how the IA works, and how it has worked at different times during its development.

And, oddly, since I was looking for an old picture I tweeted from an IA server location several years back, I found that my photograph wasn't archived....




In any event, as things moved beyond static HTML and storage capacity varied, IA implemented, at various times, different sorts of tricks to deal with either problem.

Since I deal in IP disputes which often hinge on claims of "who was first", one of the tricks I noticed was that IA would skimp on storage space by sometimes making external calls to the existing site for images. If the image wasn't stored at IA, one of two things would happen: (a) you'd get a broken image icon, or (b) if the same filename still existed at the referenced URL with the same last-modified date (which is easy to change on some systems), then when you called up the "archived" version of the page, IA would simply inline the presumed-to-be-the-same image file from the referenced site.
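To make those two outcomes concrete, here is a minimal Python sketch of the decision as described in this post. It is not IA's actual code, and I have no idea how they implemented it; the names `LiveFile` and `resolve_image` and the string results are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LiveFile:
    """Hypothetical model of what the live site reports for a file today."""
    exists: bool
    last_modified: Optional[str] = None


def resolve_image(archived_copy: Optional[bytes],
                  recorded_last_modified: str,
                  live: LiveFile) -> str:
    """Return which of the described outcomes applies to one image."""
    # A stored copy is the simple case: the archive serves its own bytes.
    if archived_copy is not None:
        return "serve archived copy"
    # Case (b): no stored copy, but the live file has the same name and
    # the same last-modified date, so it is presumed to be the same file.
    if live.exists and live.last_modified == recorded_last_modified:
        return "inline live file"
    # Case (a): nothing stored and nothing matching on the live site.
    return "broken image"
```

Note that the only evidence in case (b) is the filename and the date, and the date is under the control of whoever runs the live server.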

There was a time, and I haven't checked this lately, where active content - i.e. content generated by scripts or served up from databases - would be handled in a similar way: if the relevant php file existed on the live and current server, then it would be invoked to serve up the content.

None of this may be even remotely relevant to the teacup tempest at hand. I am only saying that there are circumstances I have encountered in the course of my career where there had been issues involving "things in the Internet Archive not actually being what they seem to be". That's all. Whether it applies in this instance - I have NO IDEA.

But more importantly:

Joy Reid is a living breathing human being. IMHO if you want to know what sort of person she is, and what sorts of opinions she holds, you don't need to consult the Internet Archive. I would imagine the best way to know what sort of person she is and what sorts of opinions she holds, would be to converse with her.


GusBob

(7,286 posts)
1. Malcolm Nance, speaking on the Stephanie Miller show
Wed Apr 25, 2018, 08:54 PM

Described this as "black propaganda." He said it was pulled on him.

He said don't believe every word you read on the Internet

Me, I wouldn't put nothing past the Russians

unc70

(6,115 posts)
2. I know several ways to fool the archives
Thu Apr 26, 2018, 01:23 AM

Doing research a couple of years ago I discovered ways to "poison" the IA in ways similar to what can/could be done with archives like Google. It too depends on problems with dynamic content.

Lots of other issues can affect IA. I have seen it reported that changing the "robots.txt" file can make archived content disappear.
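For anyone unfamiliar with robots.txt, here is a rough Python sketch of how a rule change flips a crawler from allowed to blocked. It uses the standard library's `urllib.robotparser` and says nothing about how IA itself processes the file. The user-agent string `ia_archiver`, the example URL, and the helper name `is_allowed` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate a robots.txt body for one user agent and one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical "before" file: every crawler may fetch everything.
before = "User-agent: *\nDisallow:\n"

# Hypothetical "after" file: one named crawler is shut out of the whole site.
after = "User-agent: ia_archiver\nDisallow: /\n"

url = "http://example.com/blog/post.html"
print(is_allowed(before, "ia_archiver", url))
print(is_allowed(after, "ia_archiver", url))
```

The point is only that a two-line edit by whoever controls the site changes the answer. What an archive then does with pages it already holds is a policy decision on its end.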

These kinds of issues affect you when simply trying to "save" a web page on your desktop. In a simple example, when you try to redisplay an online news article, you might get one that was updated in a later edition long after you thought you had "saved" it. The time stamp of your save would not indicate the content had been modified. Something similar can be used to change the historic archive.
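One way to see the desktop-save problem for yourself is to record a digest of the bytes at the moment you save and compare it later. This is a minimal sketch using Python's `hashlib`; the function names are hypothetical, and it only detects that content changed, not who changed it or when.

```python
import hashlib


def content_digest(page_bytes: bytes) -> str:
    """SHA-256 hex digest of the exact bytes that were saved."""
    return hashlib.sha256(page_bytes).hexdigest()


def changed_since_save(saved_digest: str, current_bytes: bytes) -> bool:
    """True if the bytes seen now differ from what was saved."""
    return content_digest(current_bytes) != saved_digest
```

A time stamp tells you when you saved; a digest tells you what you saved.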

I have seen such techniques being used deliberately in the wild to re-write news articles. Very 1984. I posted about this at DU several years ago.

BTW these techniques require no hacking of the IA, only a source for the dynamic content. Lots of subtle ways it can be done. Too long and technical to describe at the moment while traveling.

struggle4progress

(118,309 posts)
3. It seems the disputed archives have disappeared from the Wayback Machine
Thu Apr 26, 2018, 01:27 AM

... Reid's claim that the posts were fraudulent and the result of a hack was met with immediate and widespread skepticism.

The scrutiny only intensified after a representative for the Internet Archive, a nonprofit dedicated to storing old digital content, said Tuesday that the organization could not verify the claim. Links to Reid's old blog were stored in the Wayback Machine, a service run by the nonprofit ...

... But at some point, unbeknownst to the people working at the Archive, the archives were removed from the Wayback Machine via an automated process ...

http://money.cnn.com/2018/04/26/media/joy-reid-hacking-fbi-investigation/index.html

Response to jberryhill (Original post)

Azathoth

(4,610 posts)
7. Where is the archive loading the pages from if not from an archived snapshot?
Thu Apr 26, 2018, 03:55 AM

Her actual blog was taken down a while ago. Any links to her server would presumably 404. Moreover, let's assume the archive was loading pages from her actual site -- then it would be loading pages from the latest version of her site. Which means the first page would have dates from, say, 2010 or whenever it was last updated, which would be immediately apparent to anyone who was loading a 2006 snapshot.

All of this hand-waving about the details of the Wayback Machine is starting to give me flashbacks of the Great Superscript Hunt when suddenly everyone was trying to concoct ways that a 60's typewriter could produce a document with Microsoft Word default settings.

 

jberryhill

(62,444 posts)
8. ....which may not be relevant
Thu Apr 26, 2018, 04:18 AM

Last edited Thu Apr 26, 2018, 05:07 AM - Edit history (1)

Do you see those words in my OP?

Yes or no?

I haven't drilled down through every detail in this fundamentally irrelevant controversy about a media figure for whom no one voted to obtain her position.

To the extent I've looked at it at all, my only comment is that I have encountered circumstances in which IA content is not what it seems to be. Whether there are other circumstances, I don't know.

You will also notice a further statement to that effect at the end of my post, which you also seem not to have noticed.

That picture, incidentally, is one I took of an IA server location in Redwood City circa 2011. As you might imagine, it was not at that time what one would consider to be a high-security facility.
