I imagine you guys already know this but considering we’re up against the timeli...

stavros · on Dec 8, 2019

I would donate my IP/bandwidth to archive.org if I could run a scraper easily.

ignoranceprior · on Dec 8, 2019

You can install ArchiveTeam Warrior:

https://www.archiveteam.org/index.php?title=ArchiveTeam_Warr...

squarefoot · on Dec 8, 2019

Thanks! I'd never heard of that before; it's like SETI@home, but for archival purposes.

What are the hardware requirements for that VM? I'm trying to import it into the VirtualBox service on my NAS4Free home NAS, which is the only machine I keep up 24/7 at the moment, but the import takes forever. The hardware is very limited (an Atom D410 and a bit over 1 GB of RAM available), so I'm not sure it will succeed; so far it just keeps loading, with no errors. I'd like to run it for this project so I can start contributing before the deadline, even on limited hardware, and then find better iron later.

MandieD · on Dec 9, 2019

I’m running the Docker image on the smallest Hetzner VMs, with 5 concurrent groups and 40 shared rsync threads per container, and 12 containers per server. Start one container, run docker top on it to make sure it’s pulling, then start the others one by one, waiting a few seconds between each to avoid overwhelming the CPU. I’ve got 6 of those little VMs going, and in 6 hours they’ve archived 4 GB across 2,800 groups.
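The staggered startup described above could look something like the following POSIX shell sketch. The image name comes from the Docker Hub link later in this thread; the function name start_warriors, the container names and the host ports are hypothetical, and it assumes (as with the VM) that the web interface listens on port 8001 inside each container. The per-container concurrency settings mentioned above are not set here.

```shell
# Hypothetical helper: start COUNT warrior containers one at a time,
# pausing DELAY seconds between them so startup doesn't swamp the CPU.
# Container names (warrior1, warrior2, ...) and host ports (8001, 8002, ...)
# are illustrative choices. Set DOCKER=echo to print the commands instead.
start_warriors() {
  count=$1
  delay=${2:-10}
  docker_cmd=${DOCKER:-docker}
  i=1
  while [ "$i" -le "$count" ]; do
    # Assumed: the web UI listens on 8001 inside each container, so each
    # container gets its own host port to avoid collisions.
    $docker_cmd run -d --name "warrior$i" -p "$((8000 + i)):8001" \
      archiveteam/warrior-dockerfile || return 1
    # List the new container's processes as a quick check that it's running.
    $docker_cmd top "warrior$i" || return 1
    sleep "$delay"
    i=$((i + 1))
  done
}
```

For example, start_warriors 12 10 would bring up 12 containers ten seconds apart; running DOCKER=echo start_warriors 12 10 first prints the commands without starting anything.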

After they settle down, they’re more memory than processor intensive. I’ve considered playing with the settings a bit, but thought it was more important to get a bunch of them running on a couple different VMs at different sites.

If I were really feeling fancy, I’d write a nice deployment definition for orchestrating this with microk8s...

kalleboo · on Dec 9, 2019

I'm running it on a Synology NAS (Celeron J3455), and the Docker manager UI claims it's using 180 MB of RAM and less than 1% CPU (and I just confirmed it's currently archiving Yahoo! Groups).

icebraining · on Dec 9, 2019

I don't find it processor or memory heavy, it's mostly doing a lot of IO (network and disk).

Avamander · on Dec 8, 2019

Unfortunately, it doesn't offer a qemu-compatible image, or one that works after conversion. It's a shame; the project is shooting itself in the foot.

scarejunba · on Dec 9, 2019

You should be able to trivially run the Dockerfile[0] on a standard Ubuntu image for qemu, should that be your only reason for desisting.

0: https://hub.docker.com/r/archiveteam/warrior-dockerfile/

BGZq7 · on Dec 10, 2019

An .ova file is just a tarball containing an .ovf file and a .vmdk file. The .ovf file is a text-based configuration format, so reading it gives you a basic idea of the config you'd need for qemu. The .vmdk disk image can then be converted with qemu-img.
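Since an .ova is a plain tar archive, the unpacking step can be done with tar alone. The sketch below (the function name unpack_ova is hypothetical) lists and extracts the archive, then prints the VirtualQuantity lines from the .ovf, which is where OVF records values such as the CPU count and memory size.

```shell
# Hypothetical helper: unpack an .ova (a plain tar archive) into DEST and
# print the lines of the .ovf descriptor that carry hardware sizes.
unpack_ova() {
  ova=$1
  dest=${2:-.}
  mkdir -p "$dest" || return 1
  # List the archive members, then extract them into $dest.
  tar -tf "$ova" || return 1
  tar -xf "$ova" -C "$dest" || return 1
  # OVF hardware items store their amounts in <rasd:VirtualQuantity>
  # (e.g. number of CPUs, memory size); show them as a rough guide for qemu.
  for ovf in "$dest"/*.ovf; do
    [ -f "$ovf" ] && grep -n 'VirtualQuantity' "$ovf"
  done
  return 0
}
```

For example, unpack_ova warrior.ova warrior-vm, where warrior.ova stands in for whatever the downloaded appliance file is called; the extracted .vmdk is what goes into qemu-img.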

I used the following qemu-img command:

    qemu-img convert -O qcow2 archiveteam-warrior-v3-20171013-disk001.vmdk archiveteam-warrior-v3-20171013-disk001.qcow2

I use the following to run the VM (I gave it some more memory because I have plenty to spare):

    qemu-system-x86_64 -m 1024 archiveteam-warrior-v3-20171013-disk001.qcow2

I think the original setup does some kind of port forwarding, but I didn't bother; I just access the web interface using the VM's IP (press Alt+Right Arrow to switch to a login prompt, log in as root, then run "ip a" to see the IP).
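If you'd rather not look up the guest's IP: with qemu's default user-mode networking, the guest sits on a private network (10.0.2.15 by default) that the host usually can't reach directly, so forwarding a port is the simpler route. A sketch, assuming the warrior's web interface listens on port 8001 inside the guest and a qemu new enough (2.12 or later) to accept the -nic option:

```shell
# Boot the converted disk with user-mode networking and forward
# host port 8001 (localhost only) to the guest's port 8001,
# assumed here to be the warrior web interface.
qemu-system-x86_64 -m 1024 \
  -drive file=archiveteam-warrior-v3-20171013-disk001.qcow2,format=qcow2 \
  -nic user,hostfwd=tcp:127.0.0.1:8001-:8001
```

The interface should then be reachable at http://localhost:8001 on the host.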

Avamander · on Dec 11, 2019

I know; I did that and it didn't boot. I couldn't be bothered to go further, and I'm not installing Docker on my system, since it's incompatible with my setup.