The Limits of Machine Learning (nautil.us)
129 points by dnetesn on Sept 25, 2016 | hide | past | favorite | 43 comments


Note to casual commenters: the precise real-world implications of the NFL theorems proved by Wolpert and collaborators have been difficult to appreciate, even for people well-versed in the computational learning world.

Starting point: http://www.santafe.edu/media/workingpapers/12-10-017.pdf

where we read: "However, arguably, much of that research has missed the most important implications of the theorems."


Well, the real-world implication (at least, as taught to me when I took my ML class) is that you have to make some assumption about P(f). If you assume that "reality" has to "spend energy" on building complicated f's ("reality functions" when we learned it), then you get usual probabilistic assumptions about function learning and everything goes back to normal (as we usually experience it in the real world). If you make other sorts of assumptions, you get many popular and useful ML algorithms.

It's all a matter of finding prior assumptions specific enough to bypass NFL, while still general enough to encompass useful real-world tasks.
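The flavour of NFL can be checked exhaustively on a toy domain. A minimal sketch (the 3-point domain and the learner names are my own illustration, not Wolpert's setup): averaged over every possible target function, off-training-set accuracy is 0.5 for any learner, whatever its inductive bias.

```python
from itertools import product

# Inputs {0, 1, 2}, boolean outputs. Train on inputs 0 and 1; score on the
# unseen input 2, averaged over every possible target function ("reality") f.
train_x, test_x = [0, 1], 2
targets = list(product([0, 1], repeat=3))  # all 8 boolean functions on 3 points

def always_zero(examples):
    return lambda x: 0                     # ignores the data entirely

def majority(examples):
    labels = [y for _, y in examples]
    return lambda x: int(2 * sum(labels) >= len(labels))

scores = {}
for learner in (always_zero, majority):
    hits = 0
    for f in targets:
        h = learner([(x, f[x]) for x in train_x])
        hits += h(test_x) == f[test_x]
    scores[learner.__name__] = hits / len(targets)

print(scores)  # both learners average exactly 0.5 off the training set
```

Assuming a prior P(f) that favours "cheap" functions breaks this symmetry, which is exactly the point above.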


Is that a corollary of the information-theory theorem that for any lossless compressed representation, there must be some data pattern for which the compressed representation is bigger?


Anything with more information/entropy requires more space to store.
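The question above has a one-line counting answer: a lossless code is injective, and there simply aren't enough shorter strings to go around.

```python
# Pigeonhole argument behind "some input must grow": there are 2**n bit
# strings of length n, but only 2**n - 1 strings strictly shorter than n,
# so no lossless (injective) code can shrink every n-bit input.
n = 8
inputs = 2 ** n
shorter_outputs = sum(2 ** k for k in range(n))  # lengths 0 .. n-1
print(inputs, shorter_outputs)  # 256 255: one input short of fitting
```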


Right, not too helpful.

Also, the machine shown in the picture isn't even a computer. It was a special-purpose machine used to read microfilms of mark-sense Census forms and write the results on tape. (I once had a summer job at Census HQ in Suitland MD, and saw the FOSDIC machine.)

There are fundamental limits to hill-climbing. So far, nobody has something that just keeps running and continues to get better. Hill-climbing maxes out after a while and stalls.
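The stalling is easy to reproduce. A minimal sketch (the bimodal landscape is made up for illustration): greedy hill-climbing halts at whichever local peak its starting point feeds into.

```python
def hill_climb(f, x0, step=1, iters=1000):
    """Greedy 1-D hill climbing: move to a neighbor only if it improves f."""
    x = x0
    for _ in range(iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:
            break          # local maximum: no neighbor is better, so we stall
        x = best
    return x

# bimodal landscape: local peak at x=2 (f=4), global peak at x=10 (f=9)
f = lambda x: 4 - (x - 2) ** 2 if x < 6 else 9 - (x - 10) ** 2
print(hill_climb(f, x0=0))  # -> 2, stuck on the lower hill forever
```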

We still need another big idea after deep learning and machine learning in its present form. No idea where that will come from. Anybody see anything on the horizon?


Yes.

The critiques of AI from Hubert Dreyfus have stood the test of time; those who want to understand or challenge them directly can read What Computers Can't Do (1972, 1979), or even better the updated reprint What Computers Still Can't Do (1992). He's a Heideggerian philosopher, but all you need to know is that modern AI is ignorant of vast swathes of 20th-century investigation into the human mind and state of being.

Hence, I think the big idea you're talking about is in AI that takes Hubert Dreyfus's critiques seriously.

Luckily for anyone reading about this for the first time, that process has already started. Dreyfus wrote a 2007 paper on the successes and failures of the first few steps of what he called Heideggerian AI, with the snappy name "Why Heideggerian AI failed and how fixing it would require making it more Heideggerian".

The "fixing" refers to the work of a neuroanatomist with a suitable philosophical background, Walter Freeman III, and is broadly described in the paper, but properly investigated in Freeman's (also eminently readable) book How Brains Make Up Their Minds (2000).

A note of caution: you'll be introduced to concepts that blur the line between body and environment, subject and object, intention and influence, and eventually things like relinquishing your belief in causality and an objective "out there" universe (or at least any value in such a belief), all whilst staying perfectly scientific and evidence based.

Finally, bear in mind that if we do create an intelligence worthy of the name, we have reason to believe it will take about 18 years of "raising" by two adult humans, after which it will want to do its own thing, and not your dumb image classification tasks.


If you're going to obsess over the nature/nurture thing, you have to consider the counterexample - animals that are born ready to go. These are called precocial species. Most of the large grazing mammals, including all the equines, are precocial.

You can see this by watching the first day of life of a horse. Within the first hour, the foal stands up by itself. This is a complex coordinated operation for an animal with such long legs. The sequence for doing this is dynamic (not statically stable and can't be done slowly), and clearly built-in, but requires tuning. The foal may fail the first few times, but eventually gets up on the long spindly legs and wobbles. The first few steps are tiny and cautious, but the nervous system calibrates rapidly. Stable walking is achieved quickly.

Trotting appears after a few more hours. Within a day or two, a newborn foal can run with the herd. At that point, all the locomotion functions are working - balance, coordination, visual foot placement, obstacle detection, and collision avoidance. That's a lot of capability. Having worked on both automatic driving and legged robot balance, I know how hard that is.

This demonstrates that mammal brains don't come up blank and learn. There's a lot of hard-wired capability.

(I sometimes comment on mobile robotics that the main thing is getting through the next 15 seconds of life without screwing up. If you can do that, you can then add task-oriented back-seat driving to get something done, and that's the easy part.)


I just read a review of "How Brains Make Up Their Minds". Thanks for mentioning that. It looks like a good book with mostly correct information (based on the review).

From the summary, I can tell you that researchers in fields such as AGI, deep learning, robotics, etc. have absolutely been working from many if not all of Freeman's assumptions for years and almost all (if not all) of that has been integrated into various research programs and systems. Freeman's pragmatist view is now the most popular.

Of course, Freeman's ideas aren't _usually_ all together in every one of these systems or the conceptualization of them, but there are at least a few that have most of them.

Certainly AGI researchers are aware of the concept of higher-level abstractions being formed at root on the basis of sensory input and action output. And most of the recent serious AI research such as pretty much any NN for example demonstrates the idea of meaning from global patterns.

These are some interesting AGI videos in case people haven't seen them. https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA/pla...


I strongly object to the ideas in that book being described with terms such as "higher-level abstractions", "sensory input and action output", and "meaning from global patterns".

As I read it, they were precisely the notions Freeman was arguing against.

There is no "input" when the history of a being, its current sensitivities, and its orientation in the environment are all inseparable and mutually defining.

There are no higher level abstractions, and none needed, when you're so poised in a situation with your whole brain and body that any appropriate stimulation is already meaningful.

Freeman, for me, turned thought upside down. I.e., we don't go, "I've detected that as being food, so I may eat it"; rather we go, "I've noticed I interact with all these things in a similar way (e.g. eating), and that's how I recognise them as related".


Your conclusion aligns with my statements. So I think you are interpreting what I wrote incorrectly.


"The critiques of AI from Hubert Dreyfus have stood the test of time..."

This is the guy who in 1965 said "no computer can play even amateur chess". He was right. It took a lot longer than expected for computers to get good at chess. But they did. Now they're better, a lot better. Chess.com now says "any decent chess program could easily beat the world's top humans". No need for a supercomputer; Komodo costs $59.96 and running on a laptop will trounce any human.

Dreyfus is too much into "humans are special snowflakes", and he writes book-length arguments which assume you agree with that.


I thought the point of the article wasn't that a particular approach to learning has intrinsic limits, but that underdetermination of the pattern to be learned by the available data poses fundamental limits to what any learner could hope to discover.

The example motivating the discussion is such: even a perfect learner would hardly have any criterion to prefer one solution over another. I guess simplicity arguments could come into play in this example, if they weren't so vague, but even that doesn't hold in the general case.

That has to be true, for machines and humans alike. So of course there's motivation to think about what comes after deep learning - but this isn't it, for that too, whatever it is, would be just as hopeless in examples such as these.


I think that the work on Leonardo at MIT showed how interaction could provide an alternative way of training an agent, and I think that the ongoing development of SOAR shows how learning can be done as a series of capability creations/integrations by an agent.

It's interesting to me how both efforts seem to be peripheral to the main focus of the Machine Learning and Autonomous Agents communities.


The author seems to say that the no free lunch theorem (NFL) indicates that creating a "Universal Learner" (or artificial general intelligence) is an impossible task.

I disagree based on the following:

1. I think NFL's definition of a universal learner is broader than the definition used by the average AGI researcher.

In practice, we are interested in algorithms producing behavior at near-human levels of intelligence, not absolutely universal learning algorithms.

2. NFL does not seem to directly address the overall effectiveness of applying a combination of algorithms based on real-world experience to everyday problems.


I was expecting a discussion of the limits of statistical inference, but instead read a hand-wavy bit that only briefly mentioned a priori knowledge and without using that term.


Somewhat off-topic, this article mentions a property of samples that has puzzled me for a long time. Maybe someone can shed light on that?

[...] Field A would receive Fertilizer 1, Field B would receive Fertilizer 2, and so on.

But as Fisher pointed out, this type of experimentation was doomed to produce meaningless results. If the crops in Field A grew better than those in Field B, was that because Fertilizer 1 was better than Fertilizer 2? Or did Field A just happen to have richer soil?

[...] The way around the problem, Fisher concluded, was to apply different fertilizers to different small plots *at random*. [...] On average, the soil under Fertilizer 1 ought to look exactly like the soil under Fertilizer 2.

The article talks at length about how randomisation was the revolutionary new thing that Fisher introduced. Yet, in the given example, it's the repetition of the experiment with different combinations of fields and fertilizers that does the trick, isn't it? It seems to me I should get the same results if I repeated the trial 50 times with Fertilizer 1 on field A and 50 times with 1 on field B.

So why is adding unpredictability (randomness) so important?
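One way to see it is by simulation. A Monte Carlo sketch (hypothetical numbers: field A has a soil advantage of 1.0, and the fertilizer truly does nothing): a fixed assignment stays confounded no matter how often you repeat it on the same fields, while per-plot randomisation averages the soil difference away.

```python
import random

random.seed(1)

def trial(randomize, n_plots=2000):
    """Estimated fertilizer effect when the true effect is zero."""
    # confounder: the first half of the plots (field A) have richer soil
    soil = [1.0] * (n_plots // 2) + [0.0] * (n_plots // 2)
    if randomize:
        assignment = [random.choice([1, 2]) for _ in soil]
    else:
        # fixed design: fertilizer 1 always on field A, 2 always on field B
        assignment = [1] * (n_plots // 2) + [2] * (n_plots // 2)
    yields = [s + random.gauss(0, 0.1) for s in soil]  # fertilizer does nothing
    y1 = [y for y, a in zip(yields, assignment) if a == 1]
    y2 = [y for y, a in zip(yields, assignment) if a == 2]
    return sum(y1) / len(y1) - sum(y2) / len(y2)

print(round(trial(randomize=False), 2))  # ~1.0: soil masquerades as an effect
print(round(trial(randomize=True), 2))   # ~0.0: randomisation removes the bias
```

Repeating the fixed design 50 times just re-measures the same soil bias 50 times; randomisation also guards against confounders nobody thought to swap.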


... I didn't mean it to be that off topic. This got posted on the wrong thread, I'm sorry. If any mod could remove this, I'd be grateful.


> The Fundamental Limits of Machine Learning

Depends on the largely mathematical assumptions one can bring to the data. What can be done with the variety of assumptions is illustrated with details beyond belief in the QA section of most research libraries.

For more, the OP has

> Almost all of the learning we expect our computers to do—and much of the learning we ourselves do —is about reducing information to underlying patterns, which can then be used to infer the unknown.

Ah, NOW I see! The OP has stated a relatively narrow problem.

E.g., consider arrivals at HN: Over each 30 minutes or so, they just about have to be a sample path of a Poisson process. Why? The renewal theorem, as in W. Feller's second volume. Can say that without looking at "patterns" in the data, indeed, without looking at any data at all.

Then from knowing that the arrivals are a Poisson process, there is a nice stream of results one can get right away, without the data and even more with the data. E.g., the sum of two independent Poisson processes is another Poisson process. Then more generally can have a continuous time, discrete state space Markov process subordinated to that or a related Poisson process. From that can have some, say, network queuing calculations good for capacity planning, optimization of capacity planning, stochastic optimal control, anomaly detection, etc. Have a good shot at using the strong law of large numbers and the martingale convergence theorem.

Can say nearly all of this, and more, without looking for "patterns" in the data or looking at the data at all. Again, looking at the data can say still more.
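The superposition claim is easy to sanity-check by simulation. A sketch in Python (the rates and horizon are arbitrary choices): merging two independent Poisson streams with rates lam1 and lam2 yields inter-arrival gaps with mean 1 / (lam1 + lam2), as a merged Poisson process of the summed rate should.

```python
import random

random.seed(0)

def arrivals(rate, horizon):
    """Arrival times of a Poisson process of the given rate, up to `horizon`."""
    t, times = 0.0, []
    while True:
        t += random.expovariate(rate)  # exponential inter-arrival gaps
        if t > horizon:
            return times
        times.append(t)

lam1, lam2, horizon = 2.0, 3.0, 10_000.0
merged = sorted(arrivals(lam1, horizon) + arrivals(lam2, horizon))
gaps = [b - a for a, b in zip(merged, merged[1:])]
mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap, 3))  # mean gap ~ 1 / (lam1 + lam2) = 0.2
```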

There's a lot in the QA section of the library!


I don't understand why it says "2 * (5 + 1) = 12" fits equally well as "x + 5 + 2 = 12" to the original pattern:

"5 + 2 = 12"

As it changes one of the terms of the original addition (2 to 1), the second solution fits better by my judgment. Changing the puzzle seems like a bit of cheating...


I think what he is saying is, if you remove preconceptions, e.g. that '+' means "addition" then you can infer that it means "multiply" and that the "+ 1" is implied, based on the answer.

As a human, using our prior experience we will most commonly say that + means addition and will make that an axiom of our solution, which involves (for me at least) adding the result of the previous line to the sum of each line.

    8 + 11 = 19 [ + 21 ] = 40
As a machine (or as a mathematician thinking with more flexible abstractions), the numbers are fixed but the symbols are not, and perhaps it is more "logical" to look at each line separately and change the meaning of the symbols.

    8 * (11 [ + 1 ]) = 96
The point is that humans have a different set of logical precepts to machines.
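The underdetermination being argued about fits in a few lines. Both rules below are my own guesses at formalising the thread's two readings; both reproduce the one given row, then diverge on any new row.

```python
# Two rules consistent with the single given row "5 + 2 = 12":
rule_hidden_add = lambda a, b: a + b + 5     # reading 1: '+' secretly adds 5
rule_reinterpret = lambda a, b: b * (a + 1)  # reading 2: '+' means b * (a + 1)

assert rule_hidden_add(5, 2) == 12
assert rule_reinterpret(5, 2) == 12
# ...yet they disagree on every unseen row, e.g. "8 + 11":
print(rule_hidden_add(8, 11), rule_reinterpret(8, 11))  # 24 99
```

With only one training row, no amount of cleverness can say which rule is "the" pattern; that is the article's NFL point in miniature.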


No it doesn't. It changes the + to * and adds +1.


"The universe is no narrow thing and the order within it is not constrained by any latitude in its conception to repeat what exists in one part in any other part. Even in this world more things exist without our knowledge than with it and the order in creation which you see is that which you have put there, like a string in a maze, so that you shall not lose your way. For existence has its own order and that no man's mind can compass, that mind itself being but a fact among others." - Cormac McCarthy


In response to some comments below, here's a good article on the differences between ML, Deep Learning, and AI:

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-ar...


How is this article on nautil.us? Did the author just read the Wikipedia page on Machine Learning?

There is an entire field in ML called unsupervised learning. Labelling data that do not have labels attached to them.

It's not a fundamental (ugh) limit. I am not sure the author even knows what fundamental means; it's not like the halting problem, or the heat death of the universe.

ML is also a very young field with poor mathematical understanding, and optimism is the best way forward. Christopher Columbus didn't discover an entire NEW WORLD because he had a pessimistic attitude towards his ideas.

It's going to take time, but historically, we are rapidly learning about how the human brain and intelligence work - similar to how we rapidly learnt a lot of physics in the 20th century.


In popular discussion, Deep Learning has unfortunately become synonymous with Machine Learning, which also, as of late, has itself become synonymous with AI.

Now, this might sound facetious, but to help avert another AI winter (maybe 10 years down the road), we need to be loud and vocal, educating at the very least the investing strata of society as to hyponymy/hypernymy relationships between these terms.


It is bound to happen when researchers use words like "Machine" and "Learning" to describe their field. Why not use words that actually describe it, like "function approximation"?


Because you could say learning is function approximation for the public.


1. Just because someone decided to use the words "intelligence" and "neural" when describing a class of statistical clustering algorithms often based on backpropagation of errors doesn't mean these algorithms have anything to do with the brain or intelligence, and if they do, the relationship is not necessarily direct and immediate. Speaking about the two as if the connection is clear only muddles our understanding. It's important to remember that these terms are meant to capture the imagination, not to describe scientific knowledge.

2. Even if there are proven limitations to those algorithms, and even if those algorithms are related to human intelligence, so what? We're not sure what intelligence is, and it is easy to show that it is not "a general ability to solve problems". Humans are great at solving some problems and pretty terrible at others. Obviously, the "intelligence algorithm" (whatever that means) has some serious limitations, and is not so good at some things. For example, human intelligence doesn't seem helpful in approximating solutions to computationally hard problems. It is obvious that intelligence (or any algorithm) has its limitations.

3. Understanding the limits of a field and having optimism are two separate things. A few good impossibility (or infeasibility) theorems help serve as a map, so you can be optimistic while knowing a bit more about your surroundings, rather than being optimistic while fumbling in the dark.


> We're not sure what intelligence is, and it is easy to show that it is not "a general ability to solve problems". Humans are great at solving some problems and pretty terrible at others.

Now wait a second. I would say humans are better at solving some problems directly and computers programmed by humans are better at solving other problems.

However, a computer with a single, fixed program alone will choke completely at some problems and humans are far more robust at finding a solution or at least "dealing" with any problem whatsoever one throws at them.

In the end, you're right that we don't know what intelligence is. And so just about any description is going to be somewhat tautological but "general problem solving ability" seems relatively less tautological than other concepts - ie, "general problem solving ability" seems about right for some value of "general" which we'll have to determine as we go along.


> a computer with a single, fixed program alone

I'm not sure what you mean by a "fixed program". Is a statistical clustering algorithm not a "fixed program"? Even the human brain is running some "fixed program", as we cannot reprogram our brains to efficiently run general sorting algorithms on "bare neurons".

> "general problem solving ability" seems about right for some value of "general"

It may be more useful than nothing at all, but I don't see how it conveys too much information. Clearly, a universal Turing machine has a "general problem solving ability" for much more general values of "general". So I'm not sure how much information "general problem solving" conveys. More importantly, why do we even need a definition? Statistical learning is very effective at solving some useful problems. Why do we need to philosophize about the relationship between those algorithms and human intelligence before we have more data?


Because it would be interesting to solve a wider class of problems. And having a deeper insight helps define which data to look at.

Having said that - I'm always bemused that AI is supposed to compete with a hypothetical single human brain, and that the Turing Test is supposed to prove we have AI when we can build a brain that appears human.

Learning is a social activity, not an individual one. Individual brains are useless without social training and access to culture.

And some humans are actually quite stupid.

A machine that passes the Turing Test for a hypothetical middle-of-the-bell-curve human isn't so interesting a thing.


I wonder if trying to emulate "human intelligence" is the way to go. What if we could develop a synthetic form of intelligence distinct from our ability to analyse and solve problems?

If I had to build an AI, I would like to "train it" by pitching it against other AIs. Imagine an open AI network where bots learn by challenging each other. Does such a thing exist?


> What if we could develop a synthetic form of intelligence distinct from our ability to analyse and solve problems?

Until we know what intelligence is, I'm not sure we could classify something as being distinct from human intelligence. If AI is any computational model that can learn (i.e., adapt its parameters as a response to inputs to solve various problems), then evolution would qualify as well (well, maybe it is).

But currently, "AI" is a marketing or science-fiction term, not anything we define even remotely rigorously. If we use the Turing test as a definition to AI (i.e., intelligence is the quality of an agent that is indistinguishable from a person through communication), we are not much closer today than we were forty years ago.

Science fiction can inspire science, but using sci-fi terms as if they were scientific terms is confusing, and I think we should similarly leave marketing terms to marketers (over the past decades, marketers have assigned the name AI to very different algorithms solving very different problems).


> where bots will learn by challenging

Many such things exist and have existed for years, just google it.

Also google AGI (artificial general intelligence) or see the videos I linked to on youtube in this thread.

Really the questions you are asking are sort of interesting but it indicates that you are completely unfamiliar with the decades of AI and AGI research that have been done.


Never claimed any expertise on AI research. Thanks for the suggestion nonetheless.


Actually it appears that they read that supervised machine learning has limitations according to the no free lunch theorem and applied it to all of machine learning. See http://www.no-free-lunch.org/


Christopher Columbus didn't discover "an entire new world", he was just one of the first Europeans to land on a continent that had already been there, with plenty of people, for thousands of years.


Well, it was a discovery, but just for Europe. American natives discovered Europe as well.


Unless American natives knew of Europe, it was a trivial discovery. When one continent discovers another, the relevance is the new link between them, and interactions between the two economies.

If we discover aliens on a new planet, the fact that the aliens knew about themselves before that doesn't change much - first contact may be when they discover us too.

But let's not pretend this isn't slightly about political correctness, and sensitivity over colonialism, which ruins the objectivity over the subject.


It wasn't even a discovery for Europe because Vikings had already been to the Americas in the 10th century.


Yeah, this article was not well-researched or useful at all. As mentioned upthread, it seems focused on a very specific type of supervised learning that has since gone through a major leap in usefulness in the last few years.

The problem posited at the beginning of the article is in fact one of the example applications in a couple of ML courses/textbooks.


AI/AGI is so interesting, everyone (including me) wants to have their own unique take on it. And comment. Even though we are not really that familiar with the field.

I think that the group of researchers who have been calling their work AGI should get a lot of credit, and their research should be a starting point for discussions. Rather than people just spouting off as I am about to do.

Here are some AGI videos https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA/pla...

I think that some type or combination of (deep or something) neural networks, _in_ a general intelligence framework with things like attention, (virtual?) embodiment, etc., may be enough for us to be able to more or less emulate (general) human capabilities and behavior. That's my guess. The biggest issue is slow learning or requiring quite a bit of data. If typical artificial neurons aren't enough, there are actually some promising advances in spiking neural networks some of which are actually able to perform quite accurately and learn much more quickly. So that is another possibility. Can't be sure. I think we should be optimistic that it doesn't require too many more major breakthroughs (if any).

The reason that I think some (many?) people are sooo skeptical still is that deep down they may believe that the cause or explanation for _animalis_ (animus?) is somehow uniquely human or magic. What I mean is the thing that makes animals and people seem (or be) alive and conscious. This is somewhat related to the concept of https://en.wikipedia.org/wiki/Panpsychism which, as I understand it, is somewhat more popular in Asia.

I think that with existing techniques, maybe some type of deep neural net, combined with more dexterous and anatomically correct, dynamic and sensory-integrating robots, we will shortly (if not already) have robots that do seem to be quite alive. Consider a lizard. How many robots do we have that can really emulate the dynamic and lithe behavior and interaction of a lizard? Perhaps none. Our robots are quite slow and generally have limited freedom of movement. I bet that if we did a good job of emulating most of the entire complex anatomy of a lizard with some type of robot (including the breathing) (maybe use some type of EAP muscles like https://www.seas.harvard.edu/news/2016/07/artificial-muscle-...) and then came up with a way to train its behavior generation, based on a deep neural network, from detailed videos of interactions with human handlers, people would say "that thing is alive" and change their minds about the reality of even human-like AIs in the next few decades.


The title of this article should really be "The fundamental limitation of 1950's Perceptron style ML".

These days (2016) there are 1,000s of algorithms, all tuned for a specific problem: image recognition, speech, music transcription from audio, text 'learning' (say, the Bible) to generate automatic text. Algos have names like CNN, RNN, ... again 1,000s.

All ML is a 'hack', every algorithm has to be tuned and dialed in to get the coveted 99% 'confirmation'.

It would be easy to fine tune a machine for both answers to the problem described, likewise a human expert might as well see the two ( or maybe more ) correct answers. Then another algorithm could be trained to choose which 'correct' answer is best.

A universal machine, that requires an infinite number of hacked machines is just another 'tower of babel', not unlike the WWW ( port 80/HTML ) of today.

The real problem with ML is the holy-grail of 99%, which means that 1 of 100 innocent people go to prison, or 1 in 100 children die from robot-cars.

A society that allows the technical ( rich elite ) to govern a society that accepts 99% or even 99.9% to live and to hell with 1% or 0.01% this is the real problem with Judges, Executioners, and cops that make decisions fed by Google, Facebook, and all our other favorite fronts controlled by the CIA/NSA.



