Groq and HUMAIN, a PIF company in Saudi Arabia, announced the immediate availability of OpenAI’s two open models on GroqCloud. The launch delivers gpt-oss-120B and gpt-oss-20B with full 128K context, real-time responses, and integrated server-side tools, live on Groq’s optimized inference platform from day zero.
Groq has long supported OpenAI’s open-source efforts, including large-scale deployment of Whisper. This launch builds on that foundation, bringing OpenAI’s newest models to production with global access and local support through HUMAIN.
“OpenAI is setting a new high performance standard in open source models,” said Jonathan Ross, CEO of Groq. “Groq was built to run models like this, fast and affordably, so developers everywhere can use them from day zero. Working with HUMAIN strengthens local access and support in the Kingdom of Saudi Arabia, empowering developers in the region to build smarter and faster.”
“Groq delivers the unmatched inference speed, scalability, and cost-efficiency we need to bring cutting-edge AI to the Kingdom,” said Tareq Amin, CEO at HUMAIN. “Together, we’re enabling a new wave of Saudi innovation—powered by the best open-source models and the infrastructure to scale them globally. We’re proud to support OpenAI’s leadership in open-source AI.”
To make the most of OpenAI’s new models, Groq delivers extended context and built-in tools like code execution and web search. Web search provides real-time, relevant information, while code execution enables reasoning and complex workflows. Groq’s platform delivers these capabilities from day zero with a full 128K-token context length.
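As a sketch of how a developer might call these models, GroqCloud exposes an OpenAI-compatible chat completions API. The endpoint path and the model identifier ("openai/gpt-oss-120b") below are assumptions based on Groq’s published API conventions; check GroqCloud’s documentation for the exact names.

```python
# Minimal sketch of a chat completion request against GroqCloud's
# OpenAI-compatible API. Endpoint path and model id are assumptions;
# consult Groq's docs for the exact values. Requires a GROQ_API_KEY
# environment variable for the live call.
import json
import os
import urllib.request

GROQ_ENDPOINT = "https://api.groq.com/openai/v1/chat/completions"  # assumed path


def build_request(prompt: str, model: str = "openai/gpt-oss-120b") -> dict:
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }


def send(payload: dict) -> dict:
    """POST the payload; needs network access and a valid API key."""
    req = urllib.request.Request(
        GROQ_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_request("Summarize the gpt-oss launch in one sentence.")
```

Because the endpoint is OpenAI-compatible, existing OpenAI SDK code should also work by pointing the client’s base URL at GroqCloud.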
Groq’s purpose-built stack delivers the lowest cost per token for OpenAI’s new models while maintaining speed and accuracy.
gpt-oss-120B is currently running at 500+ t/s and gpt-oss-20B at 1,000+ t/s on GroqCloud.
Groq is offering OpenAI’s latest open models at the following pricing:

- gpt-oss-120B: $0.15 / M input tokens and $0.75 / M output tokens
- gpt-oss-20B: $0.10 / M input tokens and $0.50 / M output tokens
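At those per-million-token rates, estimating the cost of a request is simple arithmetic. The helper below is purely illustrative and hardcodes the prices quoted above:

```python
# Illustrative cost estimate from the published per-million-token rates.
PRICING = {
    "gpt-oss-120B": {"input": 0.15, "output": 0.75},  # $ per 1M tokens
    "gpt-oss-20B": {"input": 0.10, "output": 0.50},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a request at the listed rates."""
    rates = PRICING[model]
    return (
        input_tokens * rates["input"] + output_tokens * rates["output"]
    ) / 1_000_000


# e.g. a call with 100K input tokens and 2K output tokens on gpt-oss-120B:
cost = estimate_cost("gpt-oss-120B", 100_000, 2_000)
```

For example, a workload of one million input tokens and one million output tokens on gpt-oss-120B would cost $0.15 + $0.75 = $0.90 at these rates.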