No, I'm getting downvoted because Hacker News is a notoriously pro-corporate medium - of course people here don't care about public data and data freedom.
> It's super annoying that crawlers are so awkward to write these days, and I miss the days when they worked better.
It has never been easier to write crawlers, with the exception of purposely built-in barriers. Just look at youtube-dl.
> I don't disagree, but I also think we may be asking to keep model T's or gasoline driven 1-person bikes around. These technologies made more sense once, but make much less sense now.
What are you on about? To get around some crawler protections, for example, you need to execute JS with a specific stack of libraries. Distributing crawler.py is far easier than distributing a whole browser stack.
Your logic makes absolutely no sense. On the web there is no distinction between who is behind an IP address. It's a network of IP addresses and headers, right? If I'm asking for a resource that you choose to serve publicly, all I need to give you is my IP and some HTTP cruft, right? But now it turns out you don't want to serve _some_ IP addresses.
So you have to introduce an extra layer that is not part of the web - a layer that is incompatible with your goal. You need JavaScript to fingerprint your client - except, you know what? The client is the one executing your fingerprinting code, so they can send back whatever they want.
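A minimal sketch of the point above, using only the Python standard library (the header values are illustrative): everything a server uses to distinguish a "browser" from a "bot" at the HTTP level is freely chosen by the requester.

```python
import urllib.request

# A crawler can claim to be any browser it likes; the server only ever
# sees the headers the client decides to send.
SPOOFED_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml",
}

def fetch(url: str) -> bytes:
    # From the server's perspective this request is indistinguishable
    # from a real browser sending the same headers from the same IP.
    req = urllib.request.Request(url, headers=SPOOFED_HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

The same applies one layer up: whatever values a JS fingerprinting script computes, it is the client's runtime that computes and returns them, so a motivated client can replay or forge them just like these headers.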
I've never seen a more idiotic medium. On one hand I get job security; on the other, the web is absolutely broken by complete buffoons with zero logical capability.