Multiple images can be used to compute a 3D point cloud. This is computer vision technology that's been around for many years. The challenge is that cameras are a passive sensor: they count on light illuminating the scene. So at night, in bright light (which blows out the image), in shadows, etc., you can have voids. If a person is in that void, bad things can happen.
But cameras now cost under $1 each in volume (thanks, smartphones!), so they're dirt cheap. The main components of an imaging-based point cloud extraction system are therefore cheap. Add a GPU-enabled system to process the images (it's quite compute heavy) and you are set. OpenCV has the algorithms needed.
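For a flavor of it, here's a minimal sketch using OpenCV's stereo matcher; the file names, matcher parameters, and the Q matrix below are placeholders standing in for a real calibrated, rectified rig:

```python
import cv2
import numpy as np

# Load a rectified stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; these parameters are illustrative, not tuned.
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,  # search range; must be divisible by 16
    blockSize=5,
)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Reproject to 3D. Q is the 4x4 reprojection matrix cv2.stereoRectify
# produces for a calibrated rig; identity here just keeps the sketch runnable.
Q = np.float32(np.eye(4))
points_3d = cv2.reprojectImageTo3D(disparity, Q)
```

The voids mentioned above show up in the disparity map as pixels where the matcher can't find a confident correspondence.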
LiDAR is an active sensor in that the laser "illuminates" the target area. This adds cost, but that cost is coming down quickly. Also, since the sensor delivers 3D points directly (not images), the computational cost of extracting depth from images is avoided, so less CPU/GPU is required.
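To see why the compute load is lighter, here's roughly the per-return arithmetic (function name and sample numbers are made up for illustration): each return is already a 3D point from time-of-flight and beam angles, with no cross-image matching needed.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def lidar_return_to_xyz(tof_s, azimuth_rad, elevation_rad):
    """Convert one time-of-flight return to a Cartesian point (meters)."""
    r = C * tof_s / 2.0  # round trip, so divide by two
    x = r * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = r * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = r * math.sin(elevation_rad)
    return (x, y, z)

# e.g. a ~66.7 ns round trip is a point roughly 10 m straight ahead
print(lidar_return_to_xyz(66.7e-9, 0.0, 0.0))
```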
Levandowski is a LiDAR guy. It's what he believes is the best solution for the problem.
Some feel that LiDAR is not a fit either, as it doesn't work well in rain/fog/sleet/snow. There was a YouTube video showing a self-driving car running a test course in clear weather and again in the rain. You would not want to be a pedestrian during the rain test.
In reality this is all engineering dick waving. Prices will come down and the sensor payload will converge.
For full autonomy it is likely that cameras, LiDAR, radar, and sonar will all be used. Each brings some advantage to the problem that addresses a weakness of one of the other sensor technologies.
Oh yeah, and Levandowski is a complete prick. Someone should teach him about IP theft and give him a prison life lesson. He's going to need it.
Incidentally, Musk's take: "The whole road system is meant to be navigated with passive optical or cameras and so once you solve camera vision then autonomy is solved if you don't solve vision it's not solved so that that's why our focus is so heavily on having a vision neural net that's very effective for road conditions." https://www.youtube.com/watch?v=gv7qL1mcxcw&feature=youtu.be...
Why would you want to limit yourself to passive cameras and make your life harder? This is like limiting yourself to flapping bird wings to make airplanes.
No, it's like limiting yourself to using skis to move down a ski slope. He's right: the roads are designed to be navigated using vision. Signage, regulations, paint, curbs, etc. There's no proof that you could safely navigate the roads with LiDAR, but we prove every time we drive that you can do it with vision.
And sure, there might be a better way to get down a ski slope, but skis would be a pretty good starting point. And they guarantee you don't end up in an impossible situation because you're doing things a fundamentally different way than the system expects.
They're designed to be navigated using human vision, which has very different characteristics than machine vision in terms of dynamic range, resolution, processing pipeline, inferring details about the scene from past experience, etc.
Because not everyone can afford to spend $20k on extra sensors that make the car 1% safer. And holding back autonomous cars until they're perfect can kill more people than near-perfect autonomous cars. It's an economic tradeoff like any other.
His thesis is that relying on cameras makes it easier, since the entire preexisting road network is literally designed around optical navigability.
Adding other sensors isn't free. Every minute you spend on developing techniques to process inputs from other sensors, not to mention integrating their conclusions with that of other sensors, is time, money, and energy you could have used to improve your optical system.
I'm not saying I necessarily agree (though I find his position intuitively compelling), but he clearly thinks that it's easier, faster, and cheaper to bring an optical-only system to a point of reliability than it is to bring a mixed-sensor system to the same point.
It's interesting if you consider that we have two eyes [cameras] and we drive under all conditions; under bad driving conditions, if you're sane, you slow down or even completely stop and pull over with your four-ways on. When I've been in very heavy rain downpours on the highway, it feels like I'm only driving because I'm able to follow the flow of lights in front of me as a bunch of guidance points. Autonomous vehicles could likely do a much safer job of this.
So, just from reading the pulled quote: is he saying current road users, i.e. humans, navigate using a passive optical system, i.e. our eyes take in photons, they don't emit lasers? But our eyes are also components of a general intelligence. Does "solving vision" entail the development of a general intelligence?
> Multiple images can be used to compute a 3D point cloud. [...] Add a GPU enabled system to process (it's quite compute heavy) and you are set.
It requires enough information ("features") in the images to match a stereo pair. Flat patches of color have no features and, as such, cannot be correlated between cameras. In such a case, you have a spot with no depth information.
This is exactly why you want to have other sensors, and saying that "oh humans have two eyes and they do fine" doesn't really cut it. Humans can say "hey this flat patch of color is a sign and signs are not dangerous" or "hey this flat patch of color is a really clean semi truck, and crashing into trucks is bad" but computers aren't that smart.
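You can see the failure directly with OpenCV's ORB detector; this toy comparison (synthetic images, invented sizes) finds nothing to latch onto in a uniform patch:

```python
import cv2
import numpy as np

flat = np.full((200, 200), 128, dtype=np.uint8)  # uniform gray patch
textured = np.random.randint(0, 256, (200, 200), dtype=np.uint8)  # noise

orb = cv2.ORB_create()
print(len(orb.detect(flat, None)))      # 0 keypoints: nothing to correlate
print(len(orb.detect(textured, None)))  # typically hundreds: easy to match
```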
> Flat patches of color have no features and as such, cannot be correlated between cameras.
Flat patches of color are also flat, which makes it possible to fill in the missing depth information.
I took a course in computer vision where one of the projects [1] involved monocular vision, and assuming that flat patches are flat, vertical lines are vertical, all others are horizontal and the background is flat, it was possible to get a pretty good reconstruction.
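As a toy sketch of that assumption (all names and data invented here): fit a plane to the valid disparities around a featureless hole, then fill the hole from the fitted plane.

```python
import numpy as np

def fill_hole_with_plane(disparity, valid_mask):
    """Least-squares plane fit d = a*x + b*y + c over valid pixels,
    then fill the invalid pixels from the fitted plane."""
    ys, xs = np.nonzero(valid_mask)
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coeffs, *_ = np.linalg.lstsq(A, disparity[ys, xs], rcond=None)
    hy, hx = np.nonzero(~valid_mask)
    disparity = disparity.copy()
    disparity[hy, hx] = np.column_stack([hx, hy, np.ones_like(hx)]) @ coeffs
    return disparity

# Example: a sloped "wall" with a featureless square punched out of it.
d = np.fromfunction(lambda y, x: 0.1 * x + 0.05 * y, (50, 50))
mask = np.ones_like(d, dtype=bool)
mask[20:30, 20:30] = False  # no stereo matches here
filled = fill_hole_with_plane(np.where(mask, d, 0.0), mask)
print(np.allclose(filled, d))  # True: the plane fill recovers the hole
```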
Everything has texture, even things that appear solid; this is how optical mice can work on glass. Throw in an NIR/UV camera, or maybe thermal for good measure, slap an LSTM on top, and you are covered for everything a human would spot.
There are a lot of techniques to avoid that kind of interference. I remember listening to an interview with Greg Charvat, who said that coding a polarised pulse train with some random phase distribution is one possibility for avoiding interference.
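A rough sketch of the pulse-coding idea (codes, lengths, and delays invented): give each emitter its own pseudorandom code and correlate received samples against your own code, so another emitter's pulses don't produce a matched peak.

```python
import numpy as np

rng = np.random.default_rng(0)
my_code = rng.choice([-1.0, 1.0], size=64)     # my pseudorandom code
other_code = rng.choice([-1.0, 1.0], size=64)  # an interfering sensor

received = np.zeros(256)
received[50:50 + 64] += my_code        # my echo, delayed 50 samples
received[120:120 + 64] += other_code   # interfering pulse at 120

corr = np.correlate(received, my_code, mode="valid")
print(int(np.argmax(corr)))  # 50: my echo stands out; the interferer doesn't
```

Charvat's random-phase suggestion is in the same family: a receiver keyed to its own code statistically rejects everyone else's pulses.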
If it's deliberate, I would imagine that there are plenty of techniques that would work with regular cameras, as well. Ultimately, you can always target the algorithm that works on the data, regardless of how the data is collected.