Meta has unveiled Llama 3.3 70B, a new open-source large language model (LLM) with significant improvements in efficiency and cost-effectiveness. The 70-billion-parameter model delivers output quality comparable to its much larger predecessor, Llama 3.1 405B, at a fraction of the infrastructure cost: Meta reports that Llama 3.3 70B is nearly five times more cost-efficient to serve, sharply reducing the expense of generating responses. Meta attributes the savings to an optimized Transformer architecture and improved attention mechanisms that lower inference costs. The model was trained on roughly 15 trillion tokens of public web data plus over 25 million synthetic examples, using Nvidia H100-80GB GPUs.
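To make the attention claim concrete, the sketch below shows grouped-query attention (GQA), the attention variant the Llama 3 model family is documented to use: several query heads share a smaller set of key/value heads, which shrinks the KV cache that dominates serving memory at long context lengths. The head counts and dimensions here are illustrative, not Llama 3.3's actual configuration.

```python
# Minimal sketch of grouped-query attention (GQA). Sizes are illustrative only.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_query_heads, n_kv_heads):
    """q: (batch, seq, n_query_heads*head_dim); k, v: (batch, seq, n_kv_heads*head_dim)."""
    b, s, _ = q.shape
    head_dim = q.shape[-1] // n_query_heads
    group = n_query_heads // n_kv_heads  # query heads sharing one KV head

    q = q.view(b, s, n_query_heads, head_dim).transpose(1, 2)  # (b, Hq, s, d)
    k = k.view(b, s, n_kv_heads, head_dim).transpose(1, 2)     # (b, Hkv, s, d)
    v = v.view(b, s, n_kv_heads, head_dim).transpose(1, 2)

    # Each KV head is reused by `group` query heads, so the KV cache stores
    # n_kv_heads entries per token instead of n_query_heads.
    k = k.repeat_interleave(group, dim=1)  # (b, Hq, s, d)
    v = v.repeat_interleave(group, dim=1)

    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, s, n_query_heads * head_dim)

# Toy shapes: 8 query heads share 2 KV heads -> KV cache is 4x smaller than full MHA.
q = torch.randn(1, 16, 8 * 64)
k = torch.randn(1, 16, 2 * 64)
v = torch.randn(1, 16, 2 * 64)
print(grouped_query_attention(q, k, v, n_query_heads=8, n_kv_heads=2).shape)  # (1, 16, 512)
```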
The model was further refined through supervised fine-tuning and reinforcement learning from human feedback (RLHF), improving its performance and alignment with user preferences. In Meta's benchmark comparisons, Llama 3.3 70B trails Llama 3.1 405B by less than 2% on several tests, surpasses it on others, and beats OpenAI's GPT-4o on a number of benchmarks. Meta highlights the cost savings: processing (input) and generating (output) a million tokens costs just 10 cents and 40 cents respectively, compared with $1 and $1.80 for Llama 3.1 405B. The model weights for Llama 3.3 70B are publicly available on Hugging Face, making advanced AI capabilities accessible to a broader range of developers and researchers.
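As a rough illustration of that accessibility, the snippet below loads the instruct variant with the Hugging Face transformers library. It assumes the meta-llama/Llama-3.3-70B-Instruct repository id, an accepted model license, and hardware with enough GPU memory to host a 70B model; in practice, quantized or multi-GPU setups are common.

```python
# Hypothetical quick-start: load Llama 3.3 70B Instruct from Hugging Face.
# Assumes the meta-llama/Llama-3.3-70B-Instruct repo id, an accepted license,
# and hardware able to hold a 70B model (device_map="auto" shards across GPUs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.3-70B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 weights; quantization can cut memory further
    device_map="auto",           # spread layers across available GPUs
)

messages = [{"role": "user", "content": "Summarize grouped-query attention in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```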