this post was submitted on 14 Oct 2024

961 points (99.7% liked)

The Wayback Machine is back as a read-only service after cyberattacks (www.theverge.com)

submitted 1 week ago* (last edited 1 week ago) by misk@sopuli.xyz to c/technology@lemmy.world

63 comments

edit: adjusted title slightly

[–] Lojcs@lemm.ee 133 points 1 week ago (2 children)

...Google started adding links to archived websites in the Wayback Machine

They better be compensating it..

[–] SirEDCaLot 60 points 1 week ago (3 children)

I don't agree. Free linking has always been a vitally important part of the open internet. The principle that if I make something available on a specific URL, others can access it, and I don't get to charge others for linking to a public URL is one of the core concepts of the internet itself.

[–] AlligatorBlizzard@sh.itjust.works 152 points 1 week ago (2 children)

Google killed off their own cached pages last month and they're now using IA as a replacement. Free linking is definitely important, but this is Google we're talking about, and them using IA to save money - this feels a lot more exploitative if Google isn't funding them in some way.

[–] Crackhappy@lemmy.world 75 points 1 week ago (1 children)

I think you're both right. Anyone should be able to link to an IA page, but Google basically was doing the same thing as IA with their cached pages. Now they've gotten rid of that service and are simply relying on IA to take all of the load that they had. I think they should help fund IA to compensate for the extra load.

[–] beejjorgensen@lemmy.sdf.org 20 points 1 week ago (2 children)

I agree they should. But I also agree they shouldn't be required to. And if they don't, that we should just live with it as the lesser of two evils.

[–] sugar_in_your_tea@sh.itjust.works 21 points 1 week ago (1 children)

There's a difference between your average Joe linking something and a massive tech company linking something. The first should always be allowed, the second should have an expectation of some form of compensation. That's why there are differences in licensing terms for lots of services: if you're using something commercially, you pay a different rate than if you're using something privately.

That said, this is on IA to enforce, and I believe they should.

[–] SirEDCaLot 5 points 1 week ago (1 children)

Strong disagree. If I make a website people like, and Google links to it, should Google have to pay me? If so, Google basically can't exist. The record keeping of tracking every single little website that they owe money to or have to negotiate deals with would be untenable. And what happens if a large tech journal like CNET or ZDNet links to the website of a company they are writing an article about? Do they have to pay for that? Is the payment assumed by publicity? Is it different if they link to a deep page versus the front page?

What you are talking about opens up a gigantic can of worms that there is no easy solution to, if there is any solution at all.

I will absolutely give you that what Google is doing is shitty. If Google is basically outsourcing their cache to IA, they should be paying IA for the additional traffic and server load. But I think that 'should' falls in line with being a good internet citizen and treating a non-profit fairly, not part of any actual requirement.

[–] sugar_in_your_tea@sh.itjust.works 8 points 1 week ago (1 children)

What you are talking about opens up a gigantic can of worms that there is no easy solution to, if there is any solution at all.

It might if I was suggesting any kind of legislative solution here. I'm not. I'm merely saying that IA should be more selective about how it can be accessed.

For example, if a journalist is doing a piece about how websites secretly change content, I think it's entirely reasonable for them to pay for accessing IA for the purposes of that article, because it's directly related to a commercial endeavor. However, I don't expect random internet users to pay for access to that same information, because it's not related to a commercial endeavor.

In general, you should pay for content that you're going to use commercially.

If Google is basically outsourcing their cache to IA, they should be paying IA for the additional traffic and server load.

And that's precisely what I'm saying. I'm also taking it a step further and suggesting that IA should be on top of it so companies like Google (who are profiting from their service) pay, while regular internet users don't.

[–] ChairmanMeow@programming.dev 3 points 1 week ago (1 children)

In general, you should pay for content that you're going to use commercially

Sure, but merely linking to a page isn't reusing the content. If said content was being embedded, rehashed or otherwise shown then a compensation would be fair. But merely linking to a page should absolutely be free. That's a massively important cornerstone of the internet that shouldn't be compromised on.

Linking directs traffic which can be monetized by the website itself, it shouldn't require additional fees on top.

[–] avidamoeba@lemmy.ca 2 points 1 week ago* (last edited 1 week ago) (2 children)

This view is a bit naive in that it doesn't take into account a lot of variables. It favors established large actors in their ability to extract and accumulate ever more value from the ones they link.

[–] lud@lemm.ee 26 points 1 week ago

I don't know if there is compensation but the internet archive says it's a collaboration and they seem to be happy about it.

https://blog.archive.org/2024/09/11/new-feature-alert-access-archived-webpages-directly-through-google-search/

[–] abofim@discuss.tchncs.de 67 points 1 week ago

OP forgot to mention that it is back in a "provisional, read-only manner," according to founder Brewster Kahle.

[–] FlyingSquid@lemmy.world 51 points 1 week ago (2 children)

I really hope the rest of the archive comes back soon. I was in the middle of a book and it was a book I hadn't read since I was a kid.

Yeah, I could pay for it or wait for it to come via interlibrary loan (it's not exactly a well-known book), but I really didn't need a physical copy. And it isn't even all that long.

Sigh.

[–] empireOfLove2@lemmy.dbzer0.com 27 points 1 week ago* (last edited 1 week ago) (1 children)

Damn it'd be a shame if someone DM'ed me the name of the book and I had to go looking to see if there's an epub/pdf version available for download in certain places. A real shame indeed.

[–] FlyingSquid@lemmy.world 12 points 1 week ago (2 children)

I don't mind saying what book it is right here, because I've looked for both and came up wanting. It's not available normally as an ebook for purchase, so I have my doubts.

https://www.goodreads.com/book/show/997118.Doktor_Bey_s_handbooks_of_strange_sex

Basically, the IA had it because they scan in masses of texts without even caring what they are. As long as they get a copy and it isn't in the archive yet, they'll scan it in.

FWIW, it's pretty amusing.

[–] empireOfLove2@lemmy.dbzer0.com 16 points 1 week ago (1 children)

Oh, that's a super off-the-wall book. It barely exists anywhere, let alone as an ebook. I stand corrected and humbled.

[–] FlyingSquid@lemmy.world 11 points 1 week ago (1 children)

It was found for me by someone else! I am amazed.

[–] empireOfLove2@lemmy.dbzer0.com 13 points 1 week ago

Damn! And I thought I knew all the weird nooks to find books online.... I have much to learn

[–] fossilesque@mander.xyz 9 points 1 week ago (2 children)

I got you fam, dm you a link in 1 sec.

[–] FlyingSquid@lemmy.world 4 points 1 week ago (1 children)

Wow! Thanks! I looked and looked!

[–] fossilesque@mander.xyz 13 points 1 week ago (7 children)

Anna's Archive, just author's name search. :)

[–] Rai@lemmy.dbzer0.com 2 points 1 week ago

What a dope site!

[–] small44@lemmy.world 10 points 1 week ago (1 children)

That's why I download everything

[–] FlyingSquid@lemmy.world 4 points 1 week ago (1 children)

Downloading books you have to borrow from the IA is not easy these days.

[–] Appoxo@lemmy.dbzer0.com 3 points 1 week ago

Other sites

[–] Snapz@lemmy.world 44 points 1 week ago (1 children)

Capitalism hates a memory. Hates/fears anything it can't update, whitewash or otherwise directly control or obscure after the fact.

If humanity had any hope, we'd surround this thing with torches to defend it tooth and nail.

[–] BitsAndBites@lemmy.world 9 points 1 week ago (1 children)

Thanks, I just used their PayPal link to send my support and light my torch!

https://archive.org/donate/

[–] Snapz@lemmy.world 1 points 5 days ago

You give me hope, I've done the same.

[–] argh_another_username@lemmy.ca 26 points 1 week ago (2 children)

Ok, serious question. Why is it normally read/write? I’ve always treated it as being read only.

[–] TheLugal@lemmy.world 68 points 1 week ago

To you as a user it's read-only. To the thousands who submit URLs for archival it is read-write.

[–] antonim@lemmy.dbzer0.com 15 points 1 week ago (1 children)

You can (well, could) put in any live URL there and IA would take a snapshot of the current page on your request. They also actively crawl the web and take new snapshots on their own. All of that counts as 'writing' to the database.

[–] SkaveRat@discuss.tchncs.de 6 points 1 week ago (1 children)

Not just websites. Basically any digital media. From PDFs, book scans, manuals, floppy disks, CDs, basically anything even remotely worth archiving

[–] dread@lemmy.world 25 points 1 week ago* (last edited 6 days ago) (2 children)

What's frustrating is that the ones who claimed to have done this are self-proclaimed "hacktivists". You're stupid if you think the Internet Archive is the enemy in this day and age.

[–] Flax_vert@feddit.uk 5 points 1 week ago (1 children)

What were they hacktivising?

[–] misk@sopuli.xyz 19 points 1 week ago (1 children)

Some anonymous group claimed it was an attack on the USA for supporting ethnic cleansing in Palestine. This is why they did something that benefited Disney and Nintendo. Makes perfect sense!

[–] Flax_vert@feddit.uk 6 points 6 days ago

It's an internet archive. Not an American government site.

[–] leanleft@lemmy.ml 17 points 1 week ago

currently* back only as readonly

[–] Corno@lemm.ee 10 points 1 week ago

Glad to see it's recovering. I hope the whole archive can come back up soon!

[–] argh_another_username@lemmy.ca 6 points 1 week ago (2 children)

Ok, serious question. Why is it normally read/write? I’ve always treated it as being read only.

[–] altima_neo@lemmy.zip 21 points 1 week ago* (last edited 1 week ago) (2 children)

I mean how else would they archive web sites or content?

[–] BossDj@lemm.ee 8 points 1 week ago (1 children)

Web crawling?

[–] misk@sopuli.xyz 14 points 1 week ago* (last edited 1 week ago) (1 children)

IA hosts TONS of user uploaded content. They’re not uploading those Gameboy ROMs themselves.

[–] v_krishna@lemmy.ml 5 points 1 week ago

Live music archive is still down for example 😞

[–] argh_another_username@lemmy.ca 7 points 1 week ago (2 children)

I’ve always thought they were a crawler.

[–] kautau@lemmy.world 8 points 1 week ago

The Wayback Machine is a crawler, which is a big part of what they do but not everything. The Wayback Machine crawls pages on its own, but you can also submit URLs to be crawled.

The other part of what they do is hosting a significant number of digital archives of media that is no longer sold / in print / distributed. Much of that content is user uploaded. Like “oh hey I found this old clip art CD from the early 90s. I don’t really have a use for it, but if this doesn’t get uploaded somewhere it’s probably going to be lost to time. I’ll submit it to the Internet Archive.”

[–] pmc@lemmy.blahaj.zone 2 points 1 week ago

They do some crawling themselves, but Archive Team (a third party group) does a lot of web archiving as well.

[–] pmc@lemmy.blahaj.zone 2 points 1 week ago

My most frequent use case of the IA in general is the Cover Art Archive, and I frequently upload cover art for albums to the CAA via MusicBrainz. That's how I discovered the IA was down, when an upload failed.

[–] Sam_Bass@lemmy.world 3 points 1 week ago

great job, mr. peabody
