US to safety test new AI models from Google, Microsoft, xAI
- company Google
- company Microsoft
- company xAI
- lab Anthropic
- model Copilot
- model Gemini
- model Grok
- person Chris Fall
The U.S. Department of Commerce will now conduct safety evaluations of new artificial intelligence models from Google, Microsoft, and xAI before public release [1]. The companies have agreed to submit their models voluntarily through the Commerce Department's Center for AI Standards and Innovation (CAISI), expanding on earlier agreements with other AI companies [1]. The evaluations will cover "testing, collaborative research and best practice development related to commercial AI systems" [1]. CAISI's director, Chris Fall, stated, "These expanded industry collaborations help us scale our work in the public interest at a critical moment" [1].
Google's primary AI tool is the Gemini chatbot, developed by its DeepMind subsidiary and used by U.S. defense and military agencies [1]. Microsoft's primary AI tool is Copilot, while xAI's only product is the Grok chatbot, which has faced public scrutiny for generating images that undressed people [1]. Microsoft stated that while it already tests its AI models, "testing for national security and large-scale public safety risks necessarily must be a collaborative endeavour with governments" [1].
CAISI has conducted 40 previous evaluations of AI tools, including tests of certain "state-of-the-art models that remain unreleased," though it did not specify which ones [1].
The agreements mark a shift from the Trump administration's previous hands-off approach to AI oversight [1]. Last year, President Donald Trump signed executive orders for an "AI Action Plan" aimed at removing regulatory red tape to ensure U.S. advancement in the technology [1]. However, with military use of AI increasing and companies such as Anthropic claiming to have developed models too powerful for release, the administration's stance appears to be evolving [1].
Sources
- [1] feeds.bbci.co.uk: US to safety test new AI models from Google, Microsoft, xAI