Usenet archive

From Computer History Wiki
Revision as of 15:25, 2 August 2020 by Neozeed (talk | contribs)
David Wiseman

There is an incredible archive here of the early Usenet posts from February 1981 until June 1991.

Magi's account

Magi's NetNews Archive Involvement

Well, the thank-you's have been rather ebullient all day long today and I feel somewhat embarrassed by the attention. Especially given how long it took us to get the archive on line and visible! It has to be close to 10 years now. Sigh. The story is more a story of fits and starts than of resolve. And our contribution accounts for some (most?) of the first 10 years of the Google archive.

If I recall correctly, the issue of Henry Spencer's (actually, the University of Toronto, Department of Zoology's) NetNews archive was raised at a Usenix conference in the early 90's. The question: can we get at them? Bruce Jones was especially interested in this. Henry's answer was that it really wasn't going to be easy because he had neither the disk space nor the tape drive to pull them all down to make them available.

I, it turned out, did. So one bright winter day I drove from London (Ontario, Canada) to Toronto (Ontario, Canada) -- a two-hour drive in my shiny new pickup truck -- and picked up 141 magtapes from the Zoology department at UofT and brought them back to the Department of Computer Science at the University of Western Ontario. (A not unimpressive bandwidth, by the way, of some 18Mb/sec :-) never underestimate the bandwidth of a pickup truck on the highway!)
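As a rough check on the bandwidth figure quoted above, the arithmetic can be sketched as follows. This is only an illustration: it assumes "Mb" means megabits, that the two-hour drive counts as the whole transfer time, and decimal units throughout. The function name is hypothetical and none of these assumptions come from the original account.

```python
def implied_megabytes_per_tape(rate_mbit_per_s: float, drive_hours: float, tapes: int) -> float:
    """Back out the average tape size implied by a quoted truck 'bandwidth'."""
    seconds = drive_hours * 3600            # assumption: transfer time = time on the road
    total_megabits = rate_mbit_per_s * seconds
    total_megabytes = total_megabits / 8    # 8 bits per byte, decimal megabytes
    return total_megabytes / tapes


if __name__ == "__main__":
    # Figures taken from the account: 18 Mb/sec, two hours, 141 tapes.
    per_tape = implied_megabytes_per_tape(18, 2, 141)
    print(f"about {per_tape:.0f} MB per tape, {per_tape * 141 / 1000:.1f} GB in total")
```

Under those assumptions the quoted figure implies an average of roughly 115 MB per tape, or about 16 GB for the whole truckload.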

Then with the help of several people (some of whom have not yet been credited) we started to pull the data off of the tapes and onto disks in both the Computer Science department and the Robarts Research Institute. Lance Bailey, then with the Robarts Research Institute, did the pulling there and I with assistance from Bob Webber did it at Computer Science. Bruce Jones from UCSD took some vacation time and came up here to help pull data down for a week or so as well.

But we quickly ran out of space and time: Lance left Robarts for UBC, Bruce's vacation ended, and Bob and I got busy doing other things (like our jobs). As a result, the archive project made very little progress over the next few years.

Then Brewster Kahle started pushing on us (thanks Brewster!) to get it done. He even bought us a large disk to hold the archive when we truly ran out of space. With the help of Sue Thielen, who was out of work and bored, we got all of the rest of the tapes read down onto that disk. Unfortunately, that disk was not "close enough" to either a tape drive or the ftp server to make the data available to anyone. And it wasn't organized in any useful way.

Brewster pushed very gently for a very long time but the new archive project was far from the top of the list of projects I was supposed to be working on and I just never got it going again.

Late this summer Michael Schmitt from Google started pushing as well. And as luck would have it, I was able to hire a student to do the final sorting of the archive as well. And, that luck still holding, I managed to "steal" enough space on the ftp server for the entire archive! But it still took months to get that figured out and the archive transferred to a machine from which they pull the archive. It was the middle of October before we were able to make the collection available to Google. And it is actually available, although totally unsorted, to anyone who wants it and can deal with pulling some 160 files ranging in size from 1.4Mb to 65Mb. Just drop me a line to say please and we'll arrange to make it visible to you.

I'd still like to impose a bit more order on the raw archives than we have but the time just hasn't allowed for that...

Original ftp site header

230-Yes, you have found the on-line copy of Henry Spencer's UTZOO NetNews
230-Archive.
230-
230-It is not in a very reasonable format but it is all here. There were
230-141 magtapes in the collection and, so far, we have them organized to
230-reflect these tapes. You will find two types of files in this
230-directory:
230-
230-    newsNNNfM.tgz    the archives, tar'd and compressed
230-and newsNNNfM.toc    the tables of contents (tar listings)
230-
230-where NNN is the tape number (from 001 to 141)
230-and M is the save set number (from 1 to 3 but usually 1).
230-
230-There is also a file called AllTOC.tgz which is a compressed tar archive
230-containing all of the .toc files.
230-
230-Please note that news001f1.tgz contains A news. All of the other tapes
230-contain B news. The news in this archive was collected between February
230-of 1981 and June of 1991. (The Scavenged.tgz file contains A news articles
230-which were "found" off the end of some of the earlier tapes.)
230-
230-Also below here is an 'info' directory which contains several useful
230-files with "overviews" of some of the archive information. Hopefully
230-the filenames will explain what they are. (I know, probably not.)
230-
230-The 'pc' directory contains Windows code which can unpack .tgz files. I
230-have also been told that the latest versions of WinZip can also read them.
230-
230--- magi
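
The naming scheme described in the banner (newsNNNfM.tgz and newsNNNfM.toc, with NNN from 001 to 141 and M from 1 to 3) could be handled with a small helper such as the one below. This is a minimal sketch based only on the banner text; the function names are hypothetical, and it does not account for the other files the banner mentions, such as AllTOC.tgz and Scavenged.tgz.

```python
import re
from typing import Optional, Tuple

# Matches names of the form newsNNNfM.tgz or newsNNNfM.toc, as described in the banner.
_NAME_RE = re.compile(r"news(\d{3})f(\d)\.(tgz|toc)")


def archive_name(tape: int, save_set: int = 1, ext: str = "tgz") -> str:
    """Build a file name for a given tape number and save set."""
    if not 1 <= tape <= 141:
        raise ValueError("tape number must be between 1 and 141")
    if not 1 <= save_set <= 3:
        raise ValueError("save set number must be between 1 and 3")
    if ext not in ("tgz", "toc"):
        raise ValueError("extension must be 'tgz' or 'toc'")
    return f"news{tape:03d}f{save_set}.{ext}"


def parse_archive_name(name: str) -> Optional[Tuple[int, int, str]]:
    """Return (tape, save_set, extension), or None if the name does not fit the scheme."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        return None
    tape, save_set = int(match.group(1)), int(match.group(2))
    if not (1 <= tape <= 141 and 1 <= save_set <= 3):
        return None
    return tape, save_set, match.group(3)
```

For example, archive_name(1) gives the name of the A news archive, news001f1.tgz, and parse_archive_name("AllTOC.tgz") returns None because that file sits outside the per-tape scheme.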

Controversy

I just found out that the UTZOO archive has been taken down. Only third-party mirrors exist. What an incredible loss of roughly a decade of written history (February 1981 to June 1991). Absolutely tragic.

From Archive.org:

This is not a collection of the UTZOO Wiseman Usenet Archive.

In 2020 after sustained legal demands requesting a set of messages within the Usenet Archive be redacted, and to avoid further costs and accusations of manipulation should those demands be met, the archive has been removed from this URL and is not currently accessible to the public.

Included in this item is a file listing and the md5 sums of the removed files, for the use of others in verifying they have original materials.
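
Anyone holding a mirror could check it against such a list of md5 sums along the following lines. This is a hedged sketch, not the Archive.org procedure: it assumes the sums are in the common md5sum layout (hex digest, whitespace, file name), which the notice does not state, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text: str) -> Dict[str, str]:
    """Parse 'hexdigest  filename' lines (md5sum style; an assumed format)."""
    sums: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # md5sum marks binary mode with a leading '*', which is not part of the name
        sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def verify_files(listing_text: str, directory: Path) -> Dict[str, bool]:
    """Map each listed file name to True if a local copy exists and its md5 matches."""
    results: Dict[str, bool] = {}
    for name, expected in parse_md5_listing(listing_text).items():
        candidate = directory / name
        results[name] = candidate.is_file() and md5_of_file(candidate) == expected
    return results
```

A file that is missing locally is reported as False, the same as a file whose checksum differs, so a caller wanting to tell the two cases apart would need to check for existence separately.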