AV2 delivers 30% better compression efficiency than AV1, which already compresses 30% better than HEVC (H.265).
AV2 encoding demands 2-3 times more computational power than AV1, requiring advanced hardware like an RTX 5090 for practical use.
AV2 will officially release by end of 2025, with widespread hardware support expected around 2027 or later.
AV2 introduces advanced features like split-screen delivery, enhanced AR/VR support, and dynamic bitrate switching for adaptive streaming.
88% of AOMedia members plan to implement AV2 within two years, despite infrastructure and hardware compatibility challenges.
If there are any other differences, let me know too. Honestly I'm a bit curious, but it mentions that it requires an RTX 5090.
Wouldn't this be a little bad for the market too? Sure, it compresses 30% more, but not everybody has an RTX 5090.
Are we gonna see multi-codec setups in things like, say, Netflix, where devices that don't support AV2 are sent AV1, but AV2 is preferred when the hardware supports it?
Just in case you missed it, your quote was referring to encoding requirements. Decoding (e.g. Netflix users) will have a different set of requirements. The situation will also improve over time as dedicated hardware encoders and decoders become available.
For the moment, I don't really mind if it requires more GPU power to encode media, since it only needs to happen once. I expect it will still be possible on a weaker card, but it would just take longer.
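To make the multi-codec question above concrete, here's a rough Python sketch of how a streaming service might pick a rendition ladder per client. The rendition names and the capability set are made up for illustration; real services do this through codec strings in DASH/HLS manifests plus client-side capability detection, not this exact function.

```python
# Hypothetical sketch of server-side codec selection for adaptive streaming.
# Rendition names and the client capability set are invented for illustration.

# Renditions the packager has produced, keyed by codec.
AVAILABLE_LADDERS = {
    "av2": ["av2_2160p", "av2_1080p", "av2_720p"],
    "av1": ["av1_2160p", "av1_1080p", "av1_720p"],
    "hevc": ["hevc_1080p", "hevc_720p"],
    "avc": ["avc_1080p", "avc_720p"],
}

# Preference order: newer, more efficient codecs first, older ones as fallback.
CODEC_PREFERENCE = ["av2", "av1", "hevc", "avc"]


def pick_ladder(client_decoders: set[str]) -> list[str]:
    """Return the rendition ladder for the best codec the client can decode."""
    for codec in CODEC_PREFERENCE:
        if codec in client_decoders and codec in AVAILABLE_LADDERS:
            return AVAILABLE_LADDERS[codec]
    raise ValueError("no mutually supported codec")


if __name__ == "__main__":
    # A future device with an AV2 hardware decoder gets the AV2 ladder...
    print(pick_ladder({"av2", "av1", "hevc", "avc"}))
    # ...while today's hardware silently falls back to AV1.
    print(pick_ladder({"av1", "hevc", "avc"}))
```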
I highly recommend it. As a tip, you can quite easily get into a chat-like state by simply using in-context learning. Have a few turns of conversation pre-written and generate from that. It'll continue the conversation (for both parties), so you just stop it when it starts generating on your behalf.
That said, it's useful for so much more beyond that. Outline the premise of a book, then add "what follows is that book\n# Chapter 1:" and watch it rip. Base models are my preferred way of using LLMs by a long margin.
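A minimal sketch of that "outline a book, then let it rip" trick, assuming a local base (non-instruct) checkpoint via Hugging Face transformers. The model name and the premise text are placeholders, not anything from the comment above.

```python
# Minimal document-continuation sketch with a base model (no chat template,
# no system prompt): set up a document and let the model continue it.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # placeholder: any base checkpoint you can run

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

prompt = (
    "Premise: A lighthouse keeper on a remote island discovers the lamp is\n"
    "receiving messages from a ship that sank forty years ago.\n\n"
    "What follows is that book.\n\n"
    "# Chapter 1\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.8,
)
# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```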
I've done this out of curiosity with the Llama 3.1 405B base model. I vibe-coded a little chat harness where the system prompt was a few short conversations between "system" and "user", with "user:" as the stop word so I could enter my message. It worked surprisingly well, and I didn't get any sycophancy or clichéd AI responses.
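The comment above doesn't share its harness, but the idea looks roughly like this sketch: pre-written turns as the prompt, generate a completion, and cut it off when the model starts writing the next "user:" turn. The model name and the seed conversation are placeholders (the original used 405B via whatever setup they had).

```python
# Rough sketch of a base-model chat harness with "user:" as the stop word.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # placeholder base checkpoint
STOP = "\nuser:"                   # the model writing this means it's our turn

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

# A couple of pre-written turns to set the tone (the in-context learning part).
history = (
    "user: What's a good way to learn Rust?\n"
    "system: Work through the official book, then rewrite a small tool you already understand.\n"
    "user: Any pitfalls to watch for?\n"
    "system: Fighting the borrow checker early on; lean on clones at first and optimise later.\n"
)

def reply(history: str, user_msg: str) -> str:
    prompt = history + f"user: {user_msg}\nsystem:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Truncate as soon as the model starts generating on our behalf.
    return text.split(STOP)[0].strip()

while True:
    msg = input("you> ")
    answer = reply(history, msg)
    print("model>", answer)
    history += f"user: {msg}\nsystem: {answer}\n"
```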
For tiny models, the SFT data mixture is unbelievably critical to usability. They are unable to generalize in almost any way. If you don't have multi-turn conversations, they will not be able to do multi-turn conversations. If you have multi-turn conversations that are just chatting, and then single-turn conversations for math, they will be unable to do math in a multi-turn setting. This is much less true for bigger models.
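To illustrate the mixture point (the message format here is a generic chat layout, not any particular dataset): if math only ever appears as single-turn examples, a tiny model won't do math mid-conversation, so you also embed the skill inside multi-turn data.

```python
# Hypothetical SFT mixture fragments. The weak pattern: math only single-turn,
# chat only multi-turn.
single_turn_math = [
    {"messages": [
        {"role": "user", "content": "What is 17 * 24?"},
        {"role": "assistant", "content": "17 * 24 = 408."},
    ]},
]

multi_turn_chat = [
    {"messages": [
        {"role": "user", "content": "Hey, how's it going?"},
        {"role": "assistant", "content": "Pretty well! What are you working on?"},
        {"role": "user", "content": "Just planning a trip."},
        {"role": "assistant", "content": "Nice, where to?"},
    ]},
]

# The fix for a tiny model: the same skill shown in the setting it will be used in.
multi_turn_math = [
    {"messages": [
        {"role": "user", "content": "I'm budgeting a trip."},
        {"role": "assistant", "content": "Happy to help. What are the costs?"},
        {"role": "user", "content": "Flights are 320 and the hotel is 85 a night for 4 nights."},
        {"role": "assistant", "content": "That's 320 + 85 * 4 = 320 + 340 = 660 total."},
    ]},
]

sft_mixture = single_turn_math + multi_turn_chat + multi_turn_math
```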
I'm not convinced that LLM training uses so much energy that it really matters in the big picture. You can train a (terrible) LLM on a laptop [1], and frankly that's less energy-efficient than just training it on a rented cloud GPU.
Most of the innovation happening today is in post-training rather than pre-training, which is good for people concerned with energy use because post-training is relatively cheap (I was able to post-train a ~2B model in less than 6 hours on a rented cluster [2]).
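For a sense of what "relatively cheap" post-training looks like, here is a bare-bones supervised fine-tuning loop with PyTorch and transformers. The model name and the two toy examples are placeholders, not what the linked posts used; a real run would also mask padding tokens in the labels and use far more data.

```python
# Minimal supervised fine-tuning (post-training) sketch for a small causal LM.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B"  # placeholder: any small base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.train()

# Toy SFT corpus: prompt/response pairs concatenated into single sequences.
examples = [
    "user: Summarise photosynthesis in one sentence.\n"
    "assistant: Plants convert light, water, and CO2 into sugars and oxygen.",
    "user: What does HTTP 404 mean?\n"
    "assistant: The server couldn't find the requested resource.",
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=256)
    enc["labels"] = enc["input_ids"].clone()  # a real run would set pad positions to -100
    return enc

loader = DataLoader(examples, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss  # standard causal LM loss over the sequence
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```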