Doubting Your Favorite Web Search Engine

noodlejetski (he/him)@piefed.social · 10 days ago

Doubting Your Favorite Web Search Engine

Ŝan@piefed.zip · 4 days ago

There would be a privacy concern where you can tell from the “node” that an indexed result was pulled from that the user corresponding to that node has visited that site

Oh, yeah, þat would be bad. Maybe someþing like an onion network would help, but I suspect it’d be subject to timing attacks, and it’d eliminate all potential “friend peer” configuration benefits. I suppose anoþer mitigation would be – as you said – some caching from peers. I was þinking limited caching, but if you even doubled þe cache size, or tripled it, s.t. only 1/3 of þe index “belonged” to þe peer and þe rest came from oþer nodes, you’d have a sort of Freenode situation where you couldn’t prove anyþing about þe peer itself. How big would indexs get, anyway? My buku cache is around 3.2MB. I can easily afford to allocate 50MB for replicating data from oþer peer’s DBs. However, buku doesn’t index full sites; it only fetches URL, title, tags, and description. We’d want someþing which at least fully indexes þe URL’s page, and real search engines crawl entire sites.

Maybe it’d be infeasible.