I imagine you guys already know this, but considering we're up against the deadline, I'd use a CAPTCHA-solving service (easy to google yourself) and Luminati to spread the requests across many IP addresses, while swallowing my ethical qualms.
Thanks! I'd never heard of that before; it's like SETI@home, but for archival purposes.
What are the hardware requirements of that VM?
I'm trying to import it into the VirtualBox service on my NAS4Free home NAS, which is the only machine I keep running 24/7 at the moment, but the import takes forever. The hardware is very limited (an Atom D410 with a bit over 1 GB of RAM available), so I'm not sure it will succeed; so far it just keeps loading, with no errors. I'd like to run it for this project so I can start contributing before the deadline, even on limited hardware, and then find better iron later.
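Before sitting through a slow full import on a 1 GB box, it might be worth asking VirtualBox what the appliance wants. `VBoxManage import <file> --dry-run` parses the appliance and prints the settings it would apply without creating the VM. Here's a rough Python wrapper; the exact wording of the dry-run lines ("Number of CPUs:", "Guest memory: ... MB") is an assumption on my part, so tweak the patterns if your VirtualBox version prints something different.

```python
import re
import shutil
import subprocess
import sys


def dry_run_import(ova_path: str) -> str:
    """Run VBoxManage's dry-run import and return its text output.

    --dry-run makes VBoxManage interpret the appliance and print the
    settings it would use, without actually importing anything.
    """
    result = subprocess.run(
        ["VBoxManage", "import", ova_path, "--dry-run"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_requirements(dry_run_output: str) -> dict:
    """Extract CPU count and guest memory (MB) from the dry-run text.

    ASSUMPTION: lines look like 'Number of CPUs: 1' and
    'Guest memory: 400 MB'. Missing fields are simply left out.
    """
    reqs = {}
    cpus = re.search(r"Number of CPUs:\s*(\d+)", dry_run_output)
    mem = re.search(r"Guest memory:\s*(\d+)\s*MB", dry_run_output)
    if cpus:
        reqs["cpus"] = int(cpus.group(1))
    if mem:
        reqs["memory_mb"] = int(mem.group(1))
    return reqs


if __name__ == "__main__" and len(sys.argv) > 1 and shutil.which("VBoxManage"):
    # Usage: python check_ova.py path/to/appliance.ova
    print(parse_requirements(dry_run_import(sys.argv[1])))
```

If the reported guest memory is close to the ~1 GB you have free, you can lower it at import time rather than letting the host swap itself to death.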
I'm running the Docker image on the smallest Hetzner VMs, with 5 concurrent groups and 40 shared rsync threads per container, and 12 containers per server. Start one container and run docker top on it to make sure it's pulling, then start the others one by one, waiting a few seconds between each to avoid overwhelming the CPU. I've got 6 of those little VMs going, and together they've rolled up 4 GB, about 2800 groups' worth, in 6 hours.
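If you'd rather not babysit the one-by-one startup, a small Python loop can do the staggering. This is only a sketch: the image name and the two environment variable names below are HYPOTHETICAL placeholders, not the project's real settings, so swap in whatever the project's Docker instructions actually say. The `runner` and `sleep` parameters are there so you can dry-run it with fakes.

```python
import subprocess
import time

# HYPOTHETICAL image name and env var names: replace with the project's real ones.
IMAGE = "example/yahoo-groups-archiver"
CONCURRENT_GROUPS = 5   # per container, as in the setup described above
RSYNC_THREADS = 40      # shared rsync threads per container
CONTAINERS = 12         # per server


def run_command(index: int) -> list:
    """Build the `docker run` command for one detached, named container."""
    return [
        "docker", "run", "-d",
        "--name", f"archiver-{index}",
        "-e", f"CONCURRENT_GROUPS={CONCURRENT_GROUPS}",  # hypothetical variable
        "-e", f"RSYNC_THREADS={RSYNC_THREADS}",          # hypothetical variable
        IMAGE,
    ]


def launch_staggered(count=CONTAINERS, delay_s=5.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Start containers one at a time with a pause between each, so the
    startup work doesn't hit the CPU all at once."""
    started = []
    for i in range(count):
        runner(run_command(i), check=True)
        started.append(f"archiver-{i}")
        if i == 0:
            # Print the first container's processes so you can eyeball that it's pulling.
            runner(["docker", "top", "archiver-0"], check=True)
        if i < count - 1:
            sleep(delay_s)
    return started
```

Calling `launch_staggered()` on each VM would bring up the 12 containers; with 5 groups each, that's 60 groups in flight per server, or 360 across 6 VMs.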
Once they settle down, they're more memory-intensive than processor-intensive. I've considered tweaking the settings a bit, but thought it was more important to get a bunch of them running on a couple of different VMs at different sites.
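To see the memory-versus-CPU picture for yourself, `docker stats --no-stream` with a Go-template `--format` gives one snapshot per container. A quick Python sketch that parses it (keep in mind CPU% is relative to one core while MEM% is relative to available memory, so the comparison is rough):

```python
import shutil
import subprocess

# Tab-separated columns: container name, CPU %, memory %.
FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def read_stats() -> str:
    """Take a single snapshot of per-container resource usage."""
    return subprocess.run(
        ["docker", "stats", "--no-stream", "--format", FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_stats(text: str) -> list:
    """Turn 'name\\t1.50%\\t10.00%' lines into (name, cpu, mem) tuples."""
    rows = []
    for line in text.strip().splitlines():
        name, cpu, mem = line.split("\t")
        rows.append((name, float(cpu.rstrip("%")), float(mem.rstrip("%"))))
    return rows


def totals(rows) -> tuple:
    """Sum CPU % and memory % across all containers."""
    return (sum(r[1] for r in rows), sum(r[2] for r in rows))


if __name__ == "__main__" and shutil.which("docker"):
    rows = parse_stats(read_stats())
    for name, cpu, mem in rows:
        print(f"{name}: CPU {cpu:.1f}%  MEM {mem:.1f}%")
    print("total CPU %.1f%%, total MEM %.1f%%" % totals(rows))
```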
If I were really feeling fancy, I’d write a nice deployment definition for orchestrating this with microk8s...
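For anyone who does feel fancy: a Kubernetes Deployment is just a manifest, and kubectl accepts JSON as well as YAML, so Python can generate one. Minimal sketch below; the name, image and memory request are HYPOTHETICAL (the 200Mi figure is only a rough guess based on the ~180 MB one container reportedly uses on a Synology), and it doesn't pass any of the project's settings.

```python
import json


def deployment(name="yahoo-groups-archiver",
               image="example/yahoo-groups-archiver",  # hypothetical image
               replicas=12):
    """Build an apps/v1 Deployment running `replicas` copies of the container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Rough, hypothetical request; these are memory-heavy.
                        "resources": {"requests": {"memory": "200Mi"}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment(), indent=2))
```

Save the output as deployment.json and apply it with `microk8s kubectl apply -f deployment.json`.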
I'm running it on a Synology NAS (Celeron J3455), and the Docker package's UI claims it's using 180 MB of RAM and less than 1% CPU (and I just confirmed it's currently archiving Yahoo! Groups).
An OVA file is just a tarball containing an OVF file and a VMDK file. The OVF file is a text-based (XML) configuration format, so you can read it to get a basic idea of the config you'd need for QEMU. The VMDK can then be converted with qemu-img.
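Since it's a plain tarball, Python's tarfile can pull the OVF out without unpacking the disk, and the CPU/memory settings can be read from it. In the CIM RASD schema OVF uses, ResourceType 3 is a processor and 4 is memory, with the amount in VirtualQuantity (memory units come from AllocationUnits, usually "byte * 2^20", i.e. MiB). A sketch, assuming the OVA holds a single .ovf:

```python
import tarfile
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the XML namespace: '{ns}Item' -> 'Item'."""
    return tag.rsplit("}", 1)[-1]


def inspect_ova(ova_path: str):
    """Return the OVF text and the list of .vmdk member names in an OVA."""
    with tarfile.open(ova_path) as tar:
        names = tar.getnames()
        ovf_name = next(n for n in names if n.lower().endswith(".ovf"))
        ovf_text = tar.extractfile(ovf_name).read().decode("utf-8")
    disks = [n for n in names if n.lower().endswith(".vmdk")]
    return ovf_text, disks


def ovf_hardware(ovf_text: str) -> dict:
    """Read CPU count and memory from the OVF's hardware Items."""
    hw = {}
    for elem in ET.fromstring(ovf_text).iter():
        if local(elem.tag) != "Item":
            continue
        fields = {local(c.tag): (c.text or "").strip() for c in elem}
        rtype = fields.get("ResourceType")
        if rtype == "3":    # processor
            hw["cpus"] = int(fields["VirtualQuantity"])
        elif rtype == "4":  # memory
            hw["memory"] = int(fields["VirtualQuantity"])
            hw["memory_units"] = fields.get("AllocationUnits", "")
    return hw


def convert_command(vmdk: str, out: str = "disk.qcow2") -> list:
    """qemu-img invocation turning the extracted VMDK into a qcow2 image."""
    return ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2", vmdk, out]
```

After extracting the VMDK (e.g. `tar xf file.ova`), you'd run `subprocess.run(convert_command("disk.vmdk"), check=True)`, where "disk.vmdk" stands in for whatever name `inspect_ova` reports, and point QEMU at the qcow2 with the CPU/memory values from the OVF.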
I think they were doing some kind of port forwarding, but I didn't bother; I just access the web interface using the VM's IP (press Alt+Right Arrow to switch to a login prompt, log in as root, then run "ip a" to see the IP).
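If you'd rather not squint at the "ip a" output, here's a small Python helper that pulls the non-loopback IPv4 addresses out of it (paste the output in as a string) and checks from the host whether the web interface answers. The port 8001 default is an ASSUMPTION; check which port the VM's web interface actually listens on.

```python
import ipaddress
import re
import urllib.error
import urllib.request

# Matches lines like '    inet 10.0.2.15/24 brd ... scope global eth0' (not inet6).
INET = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)/\d+", re.MULTILINE)


def vm_ipv4_addresses(ip_a_output: str) -> list:
    """Return non-loopback IPv4 addresses found in `ip a` output."""
    addrs = []
    for match in INET.finditer(ip_a_output):
        addr = ipaddress.ip_address(match.group(1))
        if not addr.is_loopback:
            addrs.append(str(addr))
    return addrs


def web_ui_url(ip: str, port: int = 8001) -> str:
    # ASSUMPTION: port 8001 for the web interface; adjust if yours differs.
    return f"http://{ip}:{port}/"


def reachable(url: str, timeout: float = 3.0) -> bool:
    """True if something answers an HTTP GET at `url` within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```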