Hacker News

> Apple is actively achieving that goal, with their many year strategy of in house silicon/features

So are other companies, with their many-year strategy of actually building models that are accessible to the public.

Yet Apple is "actively" achieving the goal without any distinct models.



No. "On edge" is not a model existence limitation, it is a hardware capability/existence limitation, by definition, and by the fact that, as you point out, the models already exist.

You can already run those open-weight models on Apple devices, on edge, with huge improvements on the newer hardware. Why is a distinct model required? Do the rumors address that concern?

If others are making models with no way to actually run them, that's not a viable "on edge" strategy, since it involves waiting for someone else (in this case, Apple) to actually accomplish the goal first.


> "On edge" is not a model existence limitation

It absolutely is. Model distillation will still be pertinent, and so will parameter-efficient tuning for edge training. I cannot emphasize enough how important this is. You will need your own set of weights. If Apple wants to use open weights, then sure, ignore this; it doesn't seem like they want to long-term, though... And even if they use open weights, they will still be behind other companies that have done model distillation and federated learning for years.
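
To be concrete about what distillation asks of you: the student model is trained against the teacher's temperature-softened output distribution, not just hard labels, which is why a distinct (distilled) set of weights matters for edge deployment. A minimal sketch of the core loss term, in pure Python with illustrative logits (not Apple's or anyone's actual pipeline):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # the core term of Hinton-style knowledge distillation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# When the student matches the teacher exactly, the loss is zero.
loss = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
```

The temperature `T` is the knob that exposes the teacher's "dark knowledge" (relative probabilities of wrong answers) to the student; higher `T` flattens both distributions.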

> Why is a distinct model required?

Ask Apple's newly poached AI hires this question. It doesn't seem like you would take an answer from me.

> If others are making models, with no way to actually run them

Is this the case? People have been running distilled Llamas on Raspberry Pis with pretty good throughput.


> And even if they use open weights, they will still be behind other companies that have done model distillation and federated learning for years.

I'm sorry, but we're talking about "on edge" here. Those other companies have no flipping hardware to run it "on edge" in a "generic" way, which is the goal. Apple's strategy covers the generic case.

> If apple wants to use open weights

This doesn't make sense. Apple doesn't dictate the models you can use with their hardware. You can already accelerate Llama with the Neural Engine. You can download the app right now. You can already deploy your models on edge, on their hardware. That is the success they're achieving. You cannot effectively do this on competitor hardware, with good performance, from the "budget" to the "Pro" lineup, which is a requirement of the goal.

> they will still be behind other companies that have done model distillation and federated learning for years.

What hardware are they running it on? Are they taking advantage of Apple (or other) hardware in their strategy? Federated learning is an application of "on edge"; it doesn't *enable* on edge, which is part of Apple's strategy.
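
That distinction is easy to see mechanically: in federated learning, devices train locally and only averaged parameter updates leave them, so it presupposes capable edge hardware rather than providing it. A toy FedAvg sketch, with illustrative numbers (not any company's implementation):

```python
def fed_avg(client_weights, client_sizes):
    # FedAvg: weighted average of per-client parameter vectors,
    # weighted by each client's local dataset size.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with equal data sizes: the server just averages.
global_w = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 1])  # -> [2.0, 3.0]
```

The local training step that produces `client_weights` is exactly the part that needs "on edge" compute; the averaging itself is trivial.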

> Ask apple's newly poached AI hires this question. Doesn't seem like you would take an answer from me.

Integrating AI into their apps/experience is not the same as enabling a generic, default "on edge" capability in all Apple devices (which they have been working towards for years now). That is the end goal for "on edge". You seem to be talking about OS integration, or something else.

> People have been running distilled llamas on rPis with pretty good throughput.

Yes, the fundamental limitation there is hardware performance, not the model, with that "pretty good" throughput making for a "pretty terrible" user experience. But there's also nothing stopping anyone from running these distilled models (a requirement of limited hardware) on Apple hardware, taking advantage of Apple's fully defined "on edge" strategy. ;) Again, you can run Llamas on Apple silicon, accelerated, as I do.


> Those other companies have no flipping hardware to run it "on edge", in a "generic" way, which is the goal

Maybe? This is why I responded to:

> It's everyone's goal. Apple is actively achieving that goal

This is the issue I found disagreeable. Other organizations and individual people are achieving that goal too. Google says Gemini Nano is going on-device, and if the benchmarks are to be believed and it runs at that level, their work so far is also actively achieving that goal. Meta has released multiple distilled models that people have already shown can run inference at the device level, so it cannot be argued that Meta is not actively achieving that goal either. They don't have to release the hardware, because they went a different route. I applaud Apple for the M chips; they are super cool. People are still working on using them, so Apple can realize that goal too.

So when you go to the statement that started this

> Apple's AI strategy is to put inference (and longer term even learning) on edge devices

Multiple orgs also share this goal, and I can't say that any one org is far ahead of the others. I also can't elevate Apple in that race, because it is not clear that they are truly privacy-focused or that they will keep their APIs open.

> You cannot effectively do this on competitor hardware, with good performance, from "budget" to "Pro" lineup, which is a requirement of the goal

Why do you say you cannot do this with good performance? How many tokens per second do you want from a device? Is 30 T/s enough? You can do that on laptops running a small Mixtral.

> What hardware are they running it on? Are they taking advantage of Apple (or other) hardware in their strategy?

I don't know. I have nothing indicating Apple or Nvidia or anything else in particular. Do you?

> [Regarding the rest]

Sure, my point is that they clearly intend to build bespoke models, which is why I raised the point that not all computation will be feasible on edge for the time being. What prompted this particular line of inquiry is whether a pure edge experience truly enables the best user experience. It's also why I raised the point about Apple's track record with open APIs, which is why I cast doubt on "actively achieving", and on Apple being privacy-focused. Just to tie it back to the reason I commented in the first place.


A week after this comment, Google announced Gemini Nano running locally in Chrome.



