Hacker News

That's a separate project:

- https://github.com/fake-name/ExHentai-Archival

- https://github.com/fake-name/PatreonArchiver

- https://github.com/fake-name/xA-Scraper

- https://github.com/fake-name/DanbooruScraper

Or... well, 4 separate projects. Whoops?

At one point, a friend and I were looking at trying to basically replicate the Google DeepDream neural-net thing, only with a training set of porn. It turns out that getting a well-tagged dataset for training is somewhat challenging.

Well-tagged hentai is trivially accessible, though. I think there's probably a paper or two in there about the demographics of the two fan groups. People are fascinating.

Next up, automate the consumption too!



At least Ex supports torrents, and it also has some custom P2P software you can run (it serves content) from which data can be siphoned off.

And what is served through their website is resized, so web scraping is an inferior approach.


You seem to be assuming that:

1. I'm scraping the resized galleries.

2. I don't have the Hath perk that makes the galleries full sized.

3. I don't have a phash-based fuzzy image deduplication system on top of all this (see https://github.com/fake-name/IntraArchiveDeduplicator). Its main purpose is to deduplicate manga (https://github.com/fake-name/MangaCMS).
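The core idea of phash-based fuzzy deduplication is: compute a compact perceptual hash for each image, then treat two images as duplicates when their hashes differ by only a few bits (Hamming distance). Here's a minimal, hypothetical sketch of that idea using a simplified average-hash over raw pixel grids, rather than the DCT-based pHash a real system like IntraArchiveDeduplicator uses; the image names and the `dedupe` helper are invented for illustration:

```python
def average_hash(pixels):
    """Hash a grayscale pixel grid: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def dedupe(images, threshold=2):
    """Keep an image only if its hash differs from every kept hash by more
    than `threshold` bits; anything closer is treated as a fuzzy duplicate."""
    kept, hashes = [], []
    for name, pixels in images:
        h = average_hash(pixels)
        if all(hamming(h, k) > threshold for k in hashes):
            kept.append(name)
            hashes.append(h)
    return kept

# Two near-identical 4x4 "images" (one pixel of noise) and one distinct image:
img_a = ("a.png", [[10, 200, 10, 200]] * 4)
img_b = ("b.png", [[10, 200, 10, 201]] * 4)  # near-duplicate of a
img_c = ("c.png", [[200, 10, 200, 10]] * 4)  # inverted pattern, kept
print(dedupe([img_a, img_b, img_c]))  # b.png is dropped as a fuzzy dupe
```

The exact threshold is a tuning knob: too low and recompressed or resized copies slip through, too high and visually distinct images collapse together.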


Jesus, your projects are massive. Does your job involve working on these or are these just side things?


It's all entirely hobby things.


Oh my god. Can you share any results?


The project never went anywhere, unfortunately, and I haven't had time to look at it recently.

I have huge, uh, "datasets" around still, though.


You're doing god's work.



