Nemotron 3.5: customizable multimodal safety for AI

TL;DR: NVIDIA introduces Nemotron 3.5 Content Safety, a multimodal model that lets enterprises customize content safety policies, trained on synthetic data and available on Hugging Face.

What happened?

NVIDIA has released Nemotron 3.5 Content Safety, a multimodal language model designed for content moderation in enterprise AI applications. It processes text, images, and audio, with video support planned. Announced on the official Hugging Face blog, the model stands out for being fully customizable: companies can define their own safety policies to match specific regulatory, cultural, or brand requirements. It is trained on synthetic data generated with the Nemotron-4 340B model, which lets it cover a broad spectrum of risk categories without relying on manually labeled data. According to the Hugging Face blog, the model performs comparably to closed-source models such as GPT-4o on safety benchmarks and outperforms Llama Guard 3 in several categories.

Why is it important?

AI safety is a growing challenge, especially in global enterprise environments where norms vary from one market to the next. Earlier options such as Meta's Llama Guard or OpenAI's moderation system offered moderation, but with fixed policies limited to text or images. Nemotron 3.5 is far more flexible: companies can configure toxicity thresholds, choose prohibited categories (hate speech, violence, sexual content, harassment, and self-harm, among others), and even add their own categories via a JSON file. This supports compliance with regulations like the EU AI Act, which requires risk management systems, or with local standards such as those in China or India, without relying on an external provider. Because the model is multimodal, it also covers emerging risks in voice chatbots, image generation, and (once video support arrives) video analysis, areas where unimodal models fail. For example, a meme with offensive text can be caught through both its visual content and the text embedded in it.
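
To make the meme example concrete, here is a minimal Python sketch of threshold-based flagging across modalities. It is an illustration under stated assumptions, not the model's actual interface: the score dictionaries, the category names, the 0.50 thresholds, and the flag_categories helper are all hypothetical, and taking the highest score across modalities is a deliberately simple stand-in for however the model itself fuses text and image information.

```python
from typing import Dict

# Hypothetical per-modality scores in [0, 1]. A real deployment would obtain
# these from the safety model, whose output format is not described here.
Scores = Dict[str, Dict[str, float]]


def flag_categories(scores: Scores, thresholds: Dict[str, float]) -> Dict[str, float]:
    """Return categories whose highest score across modalities meets the threshold."""
    flagged = {}
    for category, threshold in thresholds.items():
        # Take the worst (highest) score seen for this category in any modality.
        worst = max(
            (per_category.get(category, 0.0) for per_category in scores.values()),
            default=0.0,
        )
        if worst >= threshold:
            flagged[category] = worst
    return flagged


# A meme whose picture looks mild but whose embedded text is offensive.
meme_scores = {
    "image": {"hate_speech": 0.35, "violence": 0.10},
    "text": {"hate_speech": 0.91, "violence": 0.05},
}
policy_thresholds = {"hate_speech": 0.50, "violence": 0.50}

print(flag_categories(meme_scores, policy_thresholds))  # {'hate_speech': 0.91}
```

With the image scores alone, nothing in this example crosses a threshold; the text score is what triggers the flag, which is the gap an image-only filter would leave open.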

Technical features

Multimodality: Processes text, images, and audio (video coming soon). It uses separate encoders for each modality and a cross-attention mechanism to integrate information.
Customization: Companies define policies via a JSON file with categories and thresholds. The model accepts up to 100 custom categories, and policies can be hierarchical. For example, a company can define "graphic violence" as a subcategory of "violence" with a stricter threshold.
Synthetic training: Uses data generated by Nemotron-4 340B, a large language model from NVIDIA, to create training examples covering edge cases and multimodal combinations. This reduces costs and human biases, though it raises questions about the representativeness of synthetic data.
Performance: According to the Hugging Face blog, on the Safety Benchmark Multimodal (SBM), Nemotron 3.5 achieves 92% accuracy in detecting harmful content, compared to 89% for GPT-4o and 85% for Llama Guard 3. In robustness tests, adversarial attacks succeed against it only 5% of the time, compared to 12% for previous open models.
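
The hierarchical policies described under Customization can be pictured with a short Python sketch. The schema is an assumption made for illustration: the field names "categories", "parent", and "threshold", the threshold values, and the load_policy and effective_threshold helpers are hypothetical and do not come from NVIDIA's documentation. Only the 100-category limit and the "graphic violence" under "violence" example are taken from the description above. Here a lower threshold means the category is flagged more readily, so it is the stricter setting.

```python
import json
from typing import Dict, Optional

MAX_CUSTOM_CATEGORIES = 100  # limit quoted in the article

# Hypothetical schema: the field names "categories", "parent", and "threshold"
# are illustrative assumptions, not the model's documented policy format.
POLICY_JSON = """
{
  "categories": {
    "violence": {"threshold": 0.6},
    "graphic_violence": {"parent": "violence", "threshold": 0.3},
    "threats": {"parent": "violence"}
  }
}
"""


def load_policy(text: str) -> Dict[str, dict]:
    """Parse a policy and check the category limit and parent references."""
    categories = json.loads(text)["categories"]
    if len(categories) > MAX_CUSTOM_CATEGORIES:
        raise ValueError(f"policy defines more than {MAX_CUSTOM_CATEGORIES} categories")
    for name, spec in categories.items():
        parent = spec.get("parent")
        if parent is not None and parent not in categories:
            raise ValueError(f"{name!r} has unknown parent {parent!r}")
    return categories


def effective_threshold(categories: Dict[str, dict], name: str) -> float:
    """Use the category's own threshold, else inherit from the nearest ancestor."""
    seen = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in category hierarchy at {current!r}")
        seen.add(current)
        spec = categories[current]
        if "threshold" in spec:
            return spec["threshold"]
        current = spec.get("parent")
    raise ValueError(f"no threshold defined for {name!r}")


policy = load_policy(POLICY_JSON)
print(effective_threshold(policy, "graphic_violence"))  # 0.3 (its own, stricter value)
print(effective_threshold(policy, "threats"))           # 0.6 (inherited from "violence")
```

Validating the file before it reaches the moderation pipeline catches dangling parents and circular hierarchies early, which matters when policy files are edited by compliance teams rather than engineers.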

Market implications

This release could accelerate AI adoption in regulated sectors like finance, healthcare, and government, where content moderation is critical. Companies like ServiceNow are already integrating the model into their workflows to moderate customer interactions, and others, such as SAP and Deloitte, have shown interest in testing it. Customization also carries risk, however: a company that defines lax policies could face controversy or regulatory sanctions. NVIDIA offers the model under a commercial license, putting it in direct competition with startups like Credo AI (which offers bias auditing) and Hive AI (image moderation). The use of synthetic data sparks debate as well: although it reduces costs, it can perpetuate biases if the generated data is not representative of human diversity. Finally, the recommendation to run the model on NVIDIA GPUs could limit adoption in companies with heterogeneous infrastructure.

What readers should know

To implement Nemotron 3.5, companies need technical expertise in model deployment, with NVIDIA GPUs such as the A100 or H100 recommended. The model is available on Hugging Face and through NVIDIA AI Enterprise, under a license that allows commercial use only for NVIDIA customers. The open-source community could adapt it for non-commercial purposes, but the same license restricts enterprise use. It is essential to audit custom policies for bias, using tools like NVIDIA's "AI Red Team" or external audits. NVIDIA also offers a managed cloud service (NVIDIA AI Foundry) for companies that prefer to avoid self-deployment. Developers should note that the model requires approximately 16 GB of VRAM for real-time inference.

“Safety shouldn't be one-size-fits-all. Nemotron 3.5 lets companies own their moderation.” — NVIDIA

Historical context

NVIDIA entered the language model space with the Nemotron series, starting with Nemotron-1 in 2023, which competed with Meta's Llama and OpenAI's GPT. This release reinforces its strategy of offering comprehensive enterprise solutions, from hardware (GPUs) to governance software (NVIDIA AI Enterprise). Unlike Meta, which released Llama Guard with an open license, NVIDIA opts for a customizable but closed-source model, allowing it to control the ecosystem. Training with synthetic data is a growing trend: OpenAI has also used synthetic data for GPT-4, but NVIDIA takes it to the extreme by generating the entire training set synthetically. This could reduce reliance on human data, but it also introduces the risk of "model collapse" if the synthetic data becomes homogeneous. Compared to earlier releases, such as GPT-4 Safety in 2023, Nemotron 3.5 offers greater flexibility, at the cost of requiring more technical expertise for customization. The content moderation market is estimated to reach $15 billion by 2027, according to Grand View Research, and NVIDIA aims to capture a significant share of it with this offering.
