It's a neat implementation for a very annoying (and common) scenario. I've used various font ID tools in the past and always found them rather limited. The vast majority are manual ones that ask you to identify particular characteristics of the font in question and then go from there. While that might be helpful, it's also pretty limited. Most of the time, I wound up posting an image to a typography forum to get an ID.
A few sites like My Fonts have an API that lets you generate font samples.[0] You could probably use that to automate generating your font database. Throw in a few other APIs (Google Fonts[1], Typekit[2], etc.) and some other type stores--along with some of the foundry websites that don't sell their fonts on other sites (Hoefler & Co., cough)--and you could build up a very rigorous database/recognition tool in short order.
My Fonts had a tool for doing this back in (at least) 2009... I remember because I tried to run the burning "NERDS" sign from "Revenge of the Nerds" through it. It didn't work very well, though: I had to resort to human intervention. http://www.mrspeaker.net/2009/01/21/revenge-of-the-font-nerd...
Keen for an online version of this project so I can see if it fares any better!
Please stop using "AI" to describe simple algorithms like this. This has nothing to do with the field of artificial intelligence. The code just extracts individual characters from the image and compares them visually to a prepared database of known fonts, using the Hamming distance. You wouldn't call OCR an "AI algorithm", would you?
Edit: "nothing to do" is not accurate, but AI is definitely not an appropriate term to use here
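For anyone wondering what that comparison step amounts to, here is a minimal sketch, not the project's actual code. It assumes each extracted character has already been reduced to a fixed-size black/white bitmap flattened into a bit string; the `FONT_DB` contents, the font names, and the `identify_font` helper are all made up for illustration.

```python
# Hypothetical database: font name -> {character: flattened 3x3 bitmap}.
# Real glyph bitmaps would be much larger; these are toy values.
FONT_DB = {
    "FontA": {"I": "010010010", "L": "100100111"},
    "FontB": {"I": "111010111", "L": "100100110"},
}


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same length")
    return sum(x != y for x, y in zip(a, b))


def identify_font(glyphs, db=FONT_DB):
    """Return the font whose stored glyphs are closest to the observed ones.

    `glyphs` is a list of (character, bitmap) pairs, e.g. the output of an
    OCR step followed by thresholding. A character missing from a font's
    table is charged the maximum possible distance for that glyph.
    """
    totals = {}
    for font, table in db.items():
        total = 0
        for char, bitmap in glyphs:
            if char in table:
                total += hamming(bitmap, table[char])
            else:
                total += len(bitmap)
        totals[font] = total
    # Smallest summed distance wins; nothing is learned or updated.
    return min(totals, key=totals.get)


observed = [("I", "010010010"), ("L", "100100101")]
print(identify_font(observed))
```

Note that the database is only ever read, never modified, so the whole thing is a nearest-neighbour lookup over a fixed table.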
Yes, I know it's a hard problem. Tesseract (the OCR library that is used in this project) has been in development since 1985, but even Google doesn't dare to call it an AI algorithm.
OCR is of course a part that is required to build an artificial intelligence, but the term "AI" is just too overused nowadays, and calling every single algorithm capable of recognizing something in some data "AI" is poison to the field itself. Solving AI is a huge and daring problem, and we should use the term appropriately.
This is a canonical AI problem. This is a task usually performed by human beings that was found to be excellently performed by computers. It falls within the grand field of computer vision. It is a solved and trivial problem nowadays, but it sure is AI.
This problem is nice because, unlike with other datasets, I think you could automate the creation of the data. I'm thinking you could generate a couple million photos with text captions, all with known fonts.
Yes. You wouldn't need to worry about breaking it down to individual letters, though. Not necessarily. You would need pages and pages of writing from the individuals.
The term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds. Understanding the shape in an image or reading the text from a photo can also be considered AI.
The key term here is "mimics": to be classified as intelligent, an algorithm should be able to significantly mimic human-level intelligence in its domain. For example, an algorithm that could read human handwriting and predict what font it was written in, or maybe what font it's most closely related to, would be closer to the AI bandwagon.
It uses OCR, then compares what it finds with everything in the database. It doesn’t “learn” anything from what it finds; in fact, you can’t even train it because it uses a static database.
Yes, it is a necessary part of an AI, but I still wouldn't dare to call it an "AI Algorithm". Overuse of the term poisons the actual AI research field; its only use is to aid clickbait.
0. https://dev.myfonts.com/#font_sample
1. https://developers.google.com/fonts/docs/developer_api
2. https://www.adobe.io/apis/creativecloud/typekit/docs/overvie...