O(n) algorithm to find longest palindromic substring w/o suffix trees

karamazov · on April 9, 2012

Does anyone know why the suffix tree algorithm is the one commonly shown as the solution to finding the longest palindrome? Manacher's algorithm is conceptually simpler, has better performance and memory demands, and isn't new (the original paper is from 1975).

mjn · on April 9, 2012

My guess is because suffix trees are covered in data-structures courses, and then finding palindromes comes up as a possible application. Whereas Manacher's algorithm is pretty specifically a palindrome-finding algorithm, so people only know about it if they're looking for palindrome-finding algorithms in particular (as opposed to example applications of data structures).

Although, it and variants have gotten some web press. A 2007 blog post was much passed around (http://johanjeuring.blogspot.com/2007/08/finding-palindromes...), which turned into some software (http://www.staff.science.uu.nl/~jeuri101/homepage/palindrome...). There's also a good exposition, and Python code, in another 2007 blog post (http://www.akalin.cx/longest-palindrome-linear-time).

I used the latter code to scan a local mirror of Project Gutenberg for word-unit palindromes, which are palindromes where words rather than letters are the units (http://www.kmjn.org/notes/word_unit_palindromes.html), which I why I know about it. It actually works for that with minimal modification, since it can operate on any Python sequence, not just strings.

bbi5291 · on April 9, 2012

I think the answer to this is that the suffix tree algorithm is conceptually simpler if you already know suffix trees. Personally, I find Manacher's algorithm very difficult to intuit (and I'm the one who wrote the article in this post in the first place). Its better performance is indisputable, though.

dhruvbird · on April 9, 2012

I should thank you for the brilliantly written article. All the claims have been mentioned clearly, and the reasoning strung together in a coherent manner.

I agree with the fact that getting the optimization isn't intuitive, but the basic O(n^2) on which the idea is based is almost trivial.

Sembiance · on April 9, 2012

Does finding substring palindromes have any practical purpose at all? Just curious.

jacquesm · on April 9, 2012

Yes, for instance in genetics:

http://johanjeuring.blogspot.com/2007/08/finding-palindromes...

rheide · on April 9, 2012

Such a long explanation for what could be explained in a couple lines of code.

jgrant27 · on April 9, 2012

Clojure vs. Common Lisp http://news.ycombinator.com/item?id=1016566

VDegesys · on April 9, 2012

I'm honestly just curious and I could be missing something trivial... but how does the "simple asymptotically optimal solution" take into account an even length palindrome? in the sense of "deed" where there are never matching letters to the left and right of the interesting position...i think it would still be O(n) but I just don't see the algorithm taking it into account...

bbi5291 · on April 9, 2012

Quoting the article: "Every palindromic substring has a centre. For an odd-length substring, this is its middle character; for an even-length substring, this is the imaginary space between its two middle characters. Call each character and each space in the original string a position. Then there are 2N+1 positions in any string of length N (to simplify things later on, we assume there is a position just before the first character and one just after the last), and any longest palindromic substring must have one of these as its centre."

VDegesys · on April 9, 2012

ah perfect. thanks... skimmed right over it... apparently easter wore me out...

EricDeb · on April 9, 2012

site is down? I hope this isn't just an O(N) algorithm using Suffix Arrays

mjn · on April 9, 2012

From Google Cache, this one exploits the theorem (which the page proves) that the longest palindromic substring of a string S is the longest common substring between S and its reversal S'. Since there are linear-time longest-common-substring algorithms, finding longest palindromes therefore is linear time. It doesn't actually say which LCS algorithm they'd use, since the page is more about the proof. They even admit that it probably isn't the best approach in practice.

shasta · on April 9, 2012

S = "abcdefdcba" S' = "abcdfedcba"

So I suppose substring in this context is defined to be a subsequence of the string (not necessarily consecutive characters)? Site is down...

EricDeb · on April 9, 2012

Ooh if it's a subsequence I am not aware of the solution. For a substring though, you can just concatenate S and its reverse , find the suffix array and corresponding height array in O(n), and parse through the suffix array looking for suffixes with high height array values where one suffix is from S and the next suffix is from its reverse.

Generally speaking anything you can do with a suffix tree you can do with suffix arrays... though oftentimes it is more complicated

dhruvbird · on April 9, 2012

substring is not subsequence in the context of this problem

dhruvbird · on April 9, 2012

Nope, see the section "A simple asymptotically optimal solution" - it doesn't use Suffix Arrays.