I've posted about this many times, but it's worth noting that the Machine Learning course on Coursera is CS229A(pplied), which is different from CS229, the one most Stanford students take. The Coursera one is more useful if you want to apply machine learning; the Stanford one linked by the poster is more useful if you want to enter the field. The Coursera one glosses over a lot of the mathematics behind the algorithms; the Stanford one delves into the mathematics supporting the algorithms.
i did take cs-229a when it was first offered on coursera, and your comment is exactly right. most of the mathematics was glossed over; the focus was not on "worrying" about it, but on implementing various algorithms in octave and observing the results. for neural nets that mostly boiled down to some slightly complicated matrix multiplication.
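To illustrate what that "slightly complicated matrix multiplication" looks like in practice, here's a minimal sketch of a forward pass through a tiny two-layer network. The layer sizes, random weights, and sigmoid activation are made up for illustration, not taken from the course:

```python
import math
import random

random.seed(0)

def matmul(A, B):
    """Plain triple-loop matrix multiply: (n x k) @ (k x m) -> (n x m)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
    return C

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up sizes: 4 examples, 3 inputs -> 5 hidden units -> 2 outputs.
X  = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
W1 = [[random.gauss(0, 1) for _ in range(5)] for _ in range(3)]
W2 = [[random.gauss(0, 1) for _ in range(2)] for _ in range(5)]

# Forward pass: each layer is just a matrix multiply
# followed by an elementwise nonlinearity.
hidden = [[sigmoid(z) for z in row] for row in matmul(X, W1)]
output = [[sigmoid(z) for z in row] for row in matmul(hidden, W2)]

print(len(output), len(output[0]))  # 4 examples, 2 outputs each
```

That's really all a feed-forward pass is, which is why the course could lean so heavily on Octave's matrix operations.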
having said that, it was an excellent overview of a broad spectrum of ml techniques, and i, for one, would heartily recommend it to anyone with a high-school background in maths and a deep interest in the field. supplementing the material with simon haykin's text or christopher bishop's (most excellent) book would make it slightly tougher.
unfortunately, due to some time constraints i could not partake of geoff hinton's wisdom on deep learning. would you happen to have material for that stashed somewhere?
afaik, octave uses blas, which in turn should be using strassen's for its sgemm, dgemm etc. computations. at least, it would be very surprising if it didn't...
Having done a very small amount of research into it, I actually think it would be surprising if it did use Strassen's. EDIT: at least not in general; I assume it would use Strassen's only for larger matrices, switching between algorithms by size much in the same way introsort does.
Apparently it's becoming irrelevant/unwieldy as computing power increases.
"Practical implementations of Strassen's algorithm switch to standard methods of matrix multiplication for small enough submatrices, for which those algorithms are more efficient. The particular crossover point for which Strassen's algorithm is more efficient depends on the specific implementation and hardware. Earlier authors had estimated that Strassen's algorithm is faster for matrices with widths from 32 to 128 for optimized implementations.[1] However, it has been observed that this crossover point has been increasing in recent years, and a 2010 study found that even a single step of Strassen's algorithm is often not beneficial on current architectures, compared to a highly optimized traditional multiplication, until matrix sizes exceed 1000 or more, and even for matrix sizes of several thousand the benefit is typically marginal at best (around 10% or less)." - http://en.wikipedia.org/wiki/Strassen_algorithm
However, I do strongly suspect that this is a galactic algorithm.
In fact, judging from this (http://www-cs.stanford.edu/~virgi/matrixmult-f.pdf), the exponent in the asymptotic running time of matrix multiplication has been decreasing steadily and without much fuss ever since Strassen.
Again, I suspect these are even more galactic algorithms than Strassen's is.
Right. Basically, Strassen's is not friendly to the cache - any improvement you get in the asymptotic behavior is usually swamped by the cost of cache misses.
In that vein, the naive O(n^3) implementation is also not cache-friendly - if you swap the two inner loops you get far better performance (in row-major languages).
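A sketch of the two loop orders, in pure Python for concreteness (so the timing gap is muted by interpreter overhead; in C or Fortran the difference is dramatic). Both compute the same product; only the memory access pattern differs:

```python
import random

def matmul_ijk(A, B, n):
    """Textbook order: the inner loop walks B[k][j] down a column,
    touching a different row of B on every iteration - poor locality
    when rows are stored contiguously (row-major)."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_ikj(A, B, n):
    """Swapped inner loops: the innermost loop now walks B[k] and
    C[i] left to right, so consecutive accesses hit consecutive
    memory - far fewer cache misses in a row-major layout."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = A[i][k]
            for j in range(n):
                C[i][j] += aik * B[k][j]
    return C

random.seed(0)
n = 16
A = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
B = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
same = matmul_ijk(A, B, n) == matmul_ikj(A, B, n)
print(same)  # True - identical results, different access patterns
```

Hoisting `A[i][k]` out of the inner loop is the other small win of the ikj order: it turns the innermost loop into a pure row sweep.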
But I mainly replied because I love the concept of "galactic algorithms" and Regan & Lipton (originators of the idea and the name).
EDIT: the BLAS routine in question is dgemm.f (double-precision general matrix-matrix multiply) and is easily googled, so I won't paste it here. No Strassen's in sight.
For a second, I misread this as "Machine Learns Course Materials". Must make this happen - wonder if I can use these machine learning course materials to help.
https://class.coursera.org/ml/lecture/preview