All hosts added and sorted.
I’ll try to change the script to include those whitelists in the process, thanks for the idea.
I have now added a simple whitelist generator script that fetches the lists you guys mentioned. Our DDG list shrank from 24.6k hosts to 22.4k hosts, so I think we removed a lot of false positives.
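For anyone curious what such a step might look like, here is a minimal sketch, not the actual script. It assumes the blocklist is in hosts format (`0.0.0.0 example.com`) and the whitelists are plain one-host-per-line files that have already been downloaded. The function names `parse_hosts` and `apply_allowlist` are made up for illustration.

```python
def parse_hosts(text):
    """Extract hostnames from hosts-format or one-host-per-line text."""
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) > 1:
            fields = fields[1:]  # first field is the IP address in hosts format
        hosts.update(field.lower() for field in fields)
    return hosts


def apply_allowlist(blocklist_text, allowlist_texts):
    """Return the blocked hosts minus everything found in any allowlist."""
    blocked = parse_hosts(blocklist_text)
    allowed = set()
    for text in allowlist_texts:
        allowed |= parse_hosts(text)
    return sorted(blocked - allowed)


if __name__ == "__main__":
    blocklist = "0.0.0.0 tracker.example\n0.0.0.0 usher.ttvnw.net\n"
    allowlists = ["# needed for Twitch\nusher.ttvnw.net\n"]
    print(apply_allowlist(blocklist, allowlists))  # ['tracker.example']
```

Using sets keeps the subtraction cheap even with tens of thousands of hosts, and the result comes out sorted and free of duplicates as a side effect.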
static.twitchcdn.net
usher.ttvnw.net
are required for twitch streaming
cdn2.downdetector.com
is needed to check for offline websites
149366088.v2.pressablecdn.com
for OMGubuntu is still on the blacklist.
dl.gmx.com.edgekey.net
dl.gmx.at.edgekey.net
dl.gmx.fr.edgekey.net
dl.gmx.co.uk.edgekey.net
dl.gmx.ch.edgekey.net
are more GMX domains that need to be whitelisted
Updated.
assets-bwbx-io.shared.bloomberga.com
stores design elements for bloomberg.com. Without this it is barely readable.
www2.ati.com.edgekey.net
blocks download.amd.com
dl.flathub.org
breaks flatpak updates.
assets.jimstatic.com
image.jimcdn.com
u.jimcdn.com
are required by websites hosted on Jimdo
asset.saturn.de.cdn.cloudflare.net
breaks the design of the retailer saturn.de
@Karol, this is a good source to run through our allowlisting script regularly, to help cut down on the false positives: 1Hosts/exclude_for_all.txt at master · badmojr/1Hosts · GitHub.
Another one: src/exclude.txt · master · malware-filter / urlhaus-filter · GitLab
Also consider something like step 2 here: faq · Wiki · malware-filter / malware-filter · GitLab. But I would want to make sure it doesn’t exclude trackers.
base.maps.api.here.com.edgekey.net
breaks the amazon shipment tracking for example
I downloaded the latest list from https://blokada.org/blocklists/ddgtrackerradar/standard/hosts.txt, sorted it, and removed duplicates. The list shrank from 23,000+ entries to 13,000. That means your host list has over 10,000 duplicates, wasting bandwidth for both you and your clients, as well as CPU time for programs that process it and need to check for duplicates. You may want to sort and “unique” the lines to save everyone, including yourselves, bandwidth and CPU time.
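The sort and “unique” step could be sketched roughly like this. It is only an illustration, assuming the list is handled as plain text lines; the name `sort_and_dedupe` is hypothetical. Note that comment lines, if the file has any, would be sorted in with the hosts.

```python
def sort_and_dedupe(lines):
    """Return the non-empty lines, stripped, unique, in sorted order."""
    unique = {line.strip() for line in lines if line.strip()}
    return sorted(unique)


if __name__ == "__main__":
    sample = [
        "0.0.0.0 b.example\n",
        "0.0.0.0 a.example\n",
        "0.0.0.0 b.example\n",
    ]
    for line in sort_and_dedupe(sample):
        print(line)
```

On the command line, `sort -u hosts.txt` does much the same thing.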
cdn.akamai.steamstatic.com
breaks the Steam Store
All hosts posted up to this point have been whitelisted. I also added the false positive whitelists mentioned by @yokoffing and the sorting+dedupe step mentioned by @CleanupTheInternet. Thanks!
static.files.bbci.co.uk.edgekey.net
ichef.bbci.co.uk
break bbc.co.uk, as they host images and CSS
open.spotifycdn.com
breaks the open.spotify.com webplayer
pbcdn1.podbean.com
cuts off podcatchers from getting images to podcasts hosted by PodBean.
slot9428.ebay.com.edgekey.net
breaks the ebay layout.
cs2001.wpc.edgecastcdn.net
is a CNAME for
cdn.cms-twdigitalassets.com
which breaks the layout of developer.twitter.com
player.odycdn.com
breaks videos on Odysee.com
media.forgecdn.net
breaks curse.com & curseforge.com and the Overwolf CF App as their images are hosted on this domain.
Updated.
js.static-cache.de
breaks image loading via lazy loading on junghanswolle.de
dl.web.de.edgekey.net
breaks attachment downloads on web.de
via dl.web.de
img.medscapestatic.com
img.medscapestatic.com.edgekey.net
breaks https://www.medscape.com/
In fact, how do we feel about removing all the image subdomains, such as:
im
im0 im1 im2 etc.
image-
image
image1 image2 etc.
images
images1 images2 etc.
imagesd
imageservice-assets-ht
imagesjoins.
imagesrv.
imagesxf.
img
img-
img1 img2 etc.
imgb
imgbp
imgcdn
imgcover
imgcy
imgd
imgg
imgix
imgopt01
imgproxy
imgs
imgs10
We just need to add these subdomains on this line, correct? @Karol, I’m sure you can code it so that we don’t need to list all of these individually.
This alone will remove ~500 lines, almost all likely false positives.
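One way to avoid listing every variant might be a pattern on the first label of each hostname. This is only a sketch under my own assumptions: the regex below is a guess at what covers the prefixes listed above (`im` plus optional digits, or anything starting with `image` or `img`), and `is_image_host` and `drop_image_hosts` are hypothetical names, not part of the existing script.

```python
import re

# Assumed pattern: the leftmost DNS label is "im" plus optional digits,
# or anything starting with "image" or "img" (img1, imgcdn, images2, ...).
IMAGE_LABEL = re.compile(r"^(?:im\d*|(?:image|img)[a-z0-9-]*)$")


def is_image_host(hostname):
    """True if the leftmost label of hostname looks like an image subdomain."""
    first_label = hostname.lower().split(".", 1)[0]
    return IMAGE_LABEL.match(first_label) is not None


def drop_image_hosts(hostnames):
    """Keep only the hostnames whose first label is not image-like."""
    return [host for host in hostnames if not is_image_host(host)]


if __name__ == "__main__":
    hosts = ["img.medscapestatic.com", "image.jimcdn.com", "imap.example"]
    print(drop_image_hosts(hosts))  # ['imap.example']
```

A pattern like this would also let through any tracker that happens to sit on an image-style subdomain, so the removed hosts are probably worth a manual look before the rule goes live.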