Having a large context window is very different from being able to effectively use a lot of context.
To get great results, it's still very important to manage context well. It doesn't matter that the model allows a very large context window; you can't just throw in the kitchen sink and expect good results.
Even with large contexts there are diminishing returns. Just having the ability to stuff more tokens into context doesn't mean the model can effectively use them. As far as I can tell, they always reach a point at which more information makes things worse.
The real question is its tendency toward context rot, not the size of its context :)
LLMs are supposedly able to load 3 Bibles into their context, but they forget what they were about to do after loading 600 LoC of locale files.
The website clearly lays them out as 400k input and 128k output [1]. I just updated my AI apps to support the new models. I routinely fill the entire context on large code calls. Input is not a "shared" context.
I found 100k was barely enough for a single project without spillover, so 4x allows for linking more adjacent codebases for large-scale analysis.
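To make the "fill the entire input" point concrete, here's a rough sketch of budgeting a prompt against a fixed input cap before a call. This is my own illustration, not anything from a specific API: the `len(text) // 4` token estimate is a common rule of thumb rather than a real tokenizer, and the 400k figure is just the input cap quoted above.

```python
# Sketch: keep only the newest chunks that fit an assumed input budget.
# approx_tokens uses the rough "~4 chars per token" heuristic, NOT a
# real tokenizer; swap in an actual tokenizer for production use.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_to_budget(chunks: list[str], budget: int = 400_000) -> list[str]:
    """Keep the most recent chunks whose combined estimate fits the budget."""
    kept, total = [], 0
    for chunk in reversed(chunks):       # walk newest-first
        cost = approx_tokens(chunk)
        if total + cost > budget:
            break                        # oldest chunks get dropped
        kept.append(chunk)
        total += cost
    return list(reversed(kept))          # restore original order
```

Note the budget here applies only to the input side; the output cap (128k above) is a separate limit, which is why input isn't a "shared" context.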