I mean, yeah. From Table 9: Hallucination evaluations in the GPT-OSS model card [1], GPT-OSS-20b/120b have accuracies of 0.067/0.168 and hallucination rates of 0.914/0.782 respectively, while o4-mini has an accuracy of 0.234 and a hallucination rate of 0.750. These numbers simply mean that the GPT-OSS models have little real-world knowledge, and they hallucinate hard. Note that little real-world knowledge has always been a "feature" of the Phi series of LLMs, because of the "safety" (from the large companies' perspective), or rather "censorship" (from the users'), requirements.
In addition, from Table 4: Hallucination evaluations in the OpenAI o3 and o4-mini System Card [2], o3/o4-mini have accuracies of 0.49/0.20 and hallucination rates of 0.51/0.79.
In summary, there is a significant real-world-knowledge gap between o3 and o4-mini, and another significant gap between o4-mini and GPT-OSS. Besides, the poor real-world knowledge exhibited by GPT-OSS is consistent with that known "feature" of the Phi series.
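To make the gaps concrete, here is a quick sketch collecting the numbers quoted above. Note the two cards use different eval tables, which is why o4-mini appears with slightly different figures in each; the values below are copied verbatim from the respective tables, not re-measured.

```python
# Hallucination-eval numbers as reported in the two system cards.
# Table 9, GPT-OSS model card [1]:
table9 = {
    "o4-mini":      {"accuracy": 0.234, "hallucination_rate": 0.750},
    "gpt-oss-120b": {"accuracy": 0.168, "hallucination_rate": 0.782},
    "gpt-oss-20b":  {"accuracy": 0.067, "hallucination_rate": 0.914},
}
# Table 4, o3 / o4-mini system card [2]:
table4 = {
    "o3":      {"accuracy": 0.49, "hallucination_rate": 0.51},
    "o4-mini": {"accuracy": 0.20, "hallucination_rate": 0.79},
}

# Accuracy gaps, each computed within a single table so the
# comparison stays apples-to-apples.
gap_o3_vs_o4mini = table4["o3"]["accuracy"] - table4["o4-mini"]["accuracy"]
gap_o4mini_vs_oss120b = table9["o4-mini"]["accuracy"] - table9["gpt-oss-120b"]["accuracy"]

print(f"o3 -> o4-mini accuracy gap (Table 4):       {gap_o3_vs_o4mini:.2f}")
print(f"o4-mini -> gpt-oss-120b gap (Table 9):      {gap_o4mini_vs_oss120b:.3f}")
```

Both gaps are computed within one table only, since the two cards' evals are not directly comparable.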
[1] https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...
[2] https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f372...