Multiple images can be used to compute a 3D point cloud. This is computer vision technology that's been around for many years. The challenge is that cameras are a passive sensor: they count on light illuminating the scene. So at night, in bright light (which blows out the image), in shadows, etc., you can have voids. If a person is in that void, bad things can happen.
But cameras now cost under $1 each in volume (thanks, smartphones!), so they're dirt cheap. The main components of an imaging-based point cloud extraction system are therefore cheap. Add a GPU-enabled system to process the images (it's quite compute heavy) and you are set. OpenCV has the algorithms needed.
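For a flavor of it, here's a minimal sketch using OpenCV's stereo matcher; the file names, matcher parameters, and the Q matrix below are placeholders standing in for a real calibrated, rectified rig:

```python
import cv2
import numpy as np

# Load a rectified stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; these parameters are illustrative, not tuned.
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,  # search range; must be divisible by 16
    blockSize=5,
)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Reproject to 3D. Q is the 4x4 reprojection matrix cv2.stereoRectify
# produces for a calibrated rig; identity here just keeps the sketch runnable.
Q = np.float32(np.eye(4))
points_3d = cv2.reprojectImageTo3D(disparity, Q)
```

The voids mentioned above show up in the disparity map as pixels where the matcher can't find a confident correspondence.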
LiDAR is an active sensor in that the laser "illuminates" the target area. This adds cost, but that cost is coming down quickly. Also, since the sensor delivers 3D points directly (not images), the computational cost of extracting depth from images is avoided, so less CPU/GPU is required.
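To see why the compute load is lighter, here's roughly the per-return arithmetic (function name and sample numbers are made up for illustration): each return is already a 3D point from time-of-flight and beam angles, with no cross-image matching needed.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def lidar_return_to_xyz(tof_s, azimuth_rad, elevation_rad):
    """Convert one time-of-flight return to a Cartesian point (meters)."""
    r = C * tof_s / 2.0  # round trip, so divide by two
    x = r * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = r * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = r * math.sin(elevation_rad)
    return (x, y, z)

# e.g. a ~66.7 ns round trip is a point roughly 10 m straight ahead
print(lidar_return_to_xyz(66.7e-9, 0.0, 0.0))
```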
Levandowski is a LiDAR guy. It's what he believes is the best solution for the problem.
Some feel that LiDAR is not a fit either, as it doesn't work well in rain/fog/sleet/snow. There was a YouTube video showing a self-driving car running a test course in clear weather and again in the rain. You would not want to be a pedestrian during the rain test.
In reality this is all engineering dick waving. Prices will come down and the sensor payload will converge.
For full autonomy it is likely that cameras, LiDAR, radar, and sonar will all be used. Each brings some advantage to the problem that addresses a weakness of one of the other sensor technologies.
Oh yeah, and Levandowski is a complete prick. Someone should teach him about IP theft and give him a prison life lesson. He's going to need it.
Incidentally, Musk's take: "The whole road system is meant to be navigated with passive optical or cameras and so once you solve camera vision then autonomy is solved if you don't solve vision it's not solved so that that's why our focus is so heavily on having a vision neural net that's very effective for road conditions." https://www.youtube.com/watch?v=gv7qL1mcxcw&feature=youtu.be...
Why would you want to limit yourself to passive cameras and make your life harder? This is like limiting yourself to flapping bird wings to make airplanes.
No, it's like limiting yourself to using skis to move down a ski slope. He's right: the roads are designed to be navigated using vision. Signage, regulations, paint, curbs, etc. There's no proof that you could safely navigate the roads with LiDAR, but we prove every time we drive that you can do it with vision.
And sure, there might be a better way to get down a ski slope, but skis would be a pretty good starting point. And they guarantee you don't end up in an impossible situation because you're doing things a fundamentally different way than the system expects.
They're designed to be navigated using human vision, which has very different characteristics than machine vision in terms of dynamic range, resolution, processing pipeline, inferring details about the scene from past experience, etc.
Because not everyone can afford to spend $20k on extra sensors that make the car 1% safer. And holding back autonomous cars until they're perfect can kill more people than near-perfect autonomous cars. It's an economic tradeoff like any other.
His thesis is that relying on cameras makes it easier, since the entire preexisting road network is literally designed around optical navigability.
Adding other sensors isn't free. Every minute you spend on developing techniques to process inputs from other sensors, not to mention integrating their conclusions with that of other sensors, is time, money, and energy you could have used to improve your optical system.
I'm not saying I necessarily agree (though I find his position intuitively compelling), but he clearly thinks that it's easier, faster, and cheaper to bring an optical-only system to a point of reliability than it is to bring a mixed-sensor system to the same point.
It's interesting if you consider that we have two eyes [cameras] and we drive under all conditions; under bad driving conditions, if you're sane, you slow down or even completely stop and pull over with your four-ways on. When I've been in very heavy rain downpours on the highway, it feels like I'm only driving because I'm able to follow the flow of lights in front of me as a bunch of guidance points. Autonomous vehicles could likely do a much safer job of this.
So, just from reading the pulled quote: is he saying current road users, i.e. humans, navigate using a passive optical system, i.e. our eyes take in photons, they don't emit lasers? But our eyes are also components of a general intelligence. Does "solving vision" entail the development of a general intelligence?
> Multiple images can be used to compute a 3D point cloud. [...] Add a GPU enabled system to process (it's quite compute heavy) and you are set.
It requires enough information ("features") in the images to match a stereo pair. Flat patches of color have no features and, as such, cannot be correlated between cameras. In such a case, you have a spot with no depth information.
This is exactly why you want to have other sensors, and saying that "oh humans have two eyes and they do fine" doesn't really cut it. Humans can say "hey this flat patch of color is a sign and signs are not dangerous" or "hey this flat patch of color is a really clean semi truck, and crashing into trucks is bad" but computers aren't that smart.
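You can see the failure directly with OpenCV's ORB detector; this toy comparison (synthetic images, invented sizes) finds nothing to latch onto in a uniform patch:

```python
import cv2
import numpy as np

flat = np.full((200, 200), 128, dtype=np.uint8)  # uniform gray patch
textured = np.random.randint(0, 256, (200, 200), dtype=np.uint8)  # noise

orb = cv2.ORB_create()
print(len(orb.detect(flat, None)))      # 0 keypoints: nothing to correlate
print(len(orb.detect(textured, None)))  # typically hundreds: easy to match
```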
> Flat patches of color have no features and as such, cannot be correlated between cameras.
Flat patches of color are also flat, which makes it possible to fill in the missing depth information.
I took a course in computer vision where one of the projects [1] involved monocular vision, and assuming that flat patches are flat, vertical lines are vertical, all others are horizontal and the background is flat, it was possible to get a pretty good reconstruction.
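As a toy sketch of that assumption (all names and data invented here): fit a plane to the valid disparities around a featureless hole, then fill the hole from the fitted plane.

```python
import numpy as np

def fill_hole_with_plane(disparity, valid_mask):
    """Least-squares plane fit d = a*x + b*y + c over valid pixels,
    then fill the invalid pixels from the fitted plane."""
    ys, xs = np.nonzero(valid_mask)
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coeffs, *_ = np.linalg.lstsq(A, disparity[ys, xs], rcond=None)
    hy, hx = np.nonzero(~valid_mask)
    disparity = disparity.copy()
    disparity[hy, hx] = np.column_stack([hx, hy, np.ones_like(hx)]) @ coeffs
    return disparity

# Example: a sloped "wall" with a featureless square punched out of it.
d = np.fromfunction(lambda y, x: 0.1 * x + 0.05 * y, (50, 50))
mask = np.ones_like(d, dtype=bool)
mask[20:30, 20:30] = False  # no stereo matches here
filled = fill_hole_with_plane(np.where(mask, d, 0.0), mask)
print(np.allclose(filled, d))  # True: the plane fill recovers the hole
```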
Everything has texture, even things that appear solid; this is how optical mice can work on glass. Throw in an NIR/UV camera, or maybe thermal for good measure, slap an LSTM on top, and you are covered for everything a human would spot.
There are a lot of techniques to avoid that kind of interference. I remember listening to an interview with Greg Charvat, who said that coding a polarised pulse train with some random phase distribution is one possibility for avoiding interference.
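A rough sketch of the pulse-coding idea (codes, lengths, and delays invented): give each emitter its own pseudorandom code and correlate received samples against your own code, so another emitter's pulses don't produce a matched peak.

```python
import numpy as np

rng = np.random.default_rng(0)
my_code = rng.choice([-1.0, 1.0], size=64)     # my pseudorandom code
other_code = rng.choice([-1.0, 1.0], size=64)  # an interfering sensor

received = np.zeros(256)
received[50:50 + 64] += my_code        # my echo, delayed 50 samples
received[120:120 + 64] += other_code   # interfering pulse at 120

corr = np.correlate(received, my_code, mode="valid")
print(int(np.argmax(corr)))  # 50: my echo stands out; the interferer doesn't
```

Charvat's random-phase suggestion is in the same family: a receiver keyed to its own code statistically rejects everyone else's pulses.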
If it's deliberate, I would imagine that there are plenty of techniques that would work with regular cameras, as well. Ultimately, you can always target the algorithm that works on the data, regardless of how the data is collected.