Technically there are two clients: the camera and whatever device is used to access the feed.
I can absolutely imagine an architecture where video can be streamed in an encrypted manner, or stored in encrypted time-stamped blobs, allowing the server to provide rough searching, and then the client can perform fine-grained scanning.
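To make that concrete, here's a rough sketch of what I mean (Python, purely illustrative; the Fernet encryption, the BlobStore class, the chunk granularity, and the out-of-band key provisioning are all my assumptions, not anything the vendor does): the camera encrypts each timestamped chunk before upload, the server can only filter by timestamp because that's the only metadata it can read, and the viewing client decrypts and scans a narrow window.

    import time
    from cryptography.fernet import Fernet

    # Shared secret lives only on the camera and the viewing device,
    # never on the server (assumption: keys are provisioned out of band).
    key = Fernet.generate_key()
    fernet = Fernet(key)

    class BlobStore:
        """Stands in for the server: it sees timestamps, never plaintext video."""
        def __init__(self):
            self.blobs = []  # list of (timestamp, encrypted_bytes)

        def upload(self, timestamp, ciphertext):
            self.blobs.append((timestamp, ciphertext))

        def rough_search(self, start, end):
            # "Rough" search: the server can only filter by timestamp,
            # since the video payload is opaque to it.
            return [c for (t, c) in self.blobs if start <= t <= end]

    def camera_upload(store, raw_chunk):
        # The sending device encrypts each chunk before it leaves the camera.
        store.upload(time.time(), fernet.encrypt(raw_chunk))

    def client_scan(store, start, end, predicate):
        # Fine-grained scanning happens on the receiving client, which holds the key.
        for ciphertext in store.rough_search(start, end):
            chunk = fernet.decrypt(ciphertext)
            if predicate(chunk):
                yield chunk

    # Usage: camera pushes a few chunks, client later pulls a time window and scans it.
    store = BlobStore()
    camera_upload(store, b"frame-data-1")
    camera_upload(store, b"frame-data-2")
    hits = list(client_scan(store, 0, time.time(), lambda chunk: b"frame" in chunk))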
This obviously doesn't enable any kind of processing of the video data on the server side, and doing it on the receiving client would require the feed to be active. This means that any kind of processing would almost necessarily have to happen on the sending device, which would probably increase its power and compute requirements by a lot.
Yeah, the entire point of this seems to be "we'll watch your baby monitor and provide alerts if something happens". That requires either processing on a server (as they do), processing on the uploading client (the camera), or having a receiving client which is constantly receiving that data and analyzing it to provide alerts.
The third option is unreliable because if that "client" (a desktop app, a phone app, etc.) dies, then the alerts stop working completely. The second option is a hard sell because putting the processing on the camera increases its cost, and most users will just buy the cheaper competing camera, since everyone is financially constrained these days.
That basically just leaves the first option as the only practical one at an appealing price point.