Mixed precision training is a method of training deep learning models that combines 16‑bit (FP16 or bfloat16) and 32‑bit (FP32) floating‑point formats during the forward and backward passes. Instead of running the entire training pipeline in FP32, selected operations execute in lower precision to gain speed and save memory, while critical values such as the master copy of the weights remain in 32‑bit to preserve numerical stability and accuracy.
GaaS on Cyfuture Cloud supports mixed precision training through NVIDIA GPU instances with Tensor Cores (such as the V100, A100, and H100) and modern deep learning frameworks, such as PyTorch and TensorFlow, that offer Automatic Mixed Precision (AMP). By selecting a compatible GPU instance and enabling mixed precision in the framework, users can significantly accelerate training workloads with minimal code changes.
Mixed precision training is a deep learning optimization technique that uses both 16‑bit and 32‑bit floating‑point arithmetic in the same training run to reduce memory usage and increase training throughput while maintaining model accuracy. It typically relies on features such as autocasting (automatic selection of FP16 vs FP32 per operation) and loss scaling to keep gradients numerically stable.
Cyfuture Cloud’s GaaS supports mixed precision training as long as you use supported NVIDIA GPU instances and a framework that implements it (for example, the TensorFlow mixed_precision API or PyTorch AMP). Users enable mixed precision by configuring their training scripts or containers to use FP16/bfloat16 operations on Tensor Core‑enabled GPUs, often achieving severalfold training speedups compared with pure FP32.
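Autocasting can be seen directly in PyTorch. The sketch below is illustrative rather than Cyfuture‑specific: it assumes a recent PyTorch build with CPU bfloat16 autocast support so it runs without a GPU, and on CUDA instances the same pattern applies with device_type="cuda" and FP16.

```python
import torch

# Inputs are created in the default FP32 precision
a = torch.randn(4, 4)
b = torch.randn(4, 4)

# Under autocast, eligible ops (matmul, linear, conv) run in 16-bit;
# numerically sensitive ops stay in FP32 automatically.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    c = a @ b  # matmul is autocast-eligible and runs in bfloat16

print(c.dtype)  # torch.bfloat16 -- precision was chosen per operation
```

Note that the inputs themselves stay FP32; autocast decides the compute precision per operation, which is why existing models usually need no structural changes.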
Mixed precision training changes how numbers are stored and computed within the training loop to exploit GPU hardware capabilities.
- The framework keeps a master copy of the weights in FP32 but performs most matrix multiplications and convolutions in FP16 or bfloat16 to leverage the specialized Tensor Cores on NVIDIA GPUs. This keeps weight updates numerically stable while allowing the bulk of the math to run much faster on reduced‑precision hardware units.
- Frameworks add loss scaling: the loss is multiplied by a scaling factor before backpropagation to prevent very small gradients from underflowing to zero in FP16, and then gradients are scaled back down before the optimizer step. Modern APIs automate this process so users rarely need to manage scaling manually.
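Why loss scaling matters can be shown without any framework at all, using Python's standard struct module to round‑trip values through IEEE 754 half precision. The gradient value and the scale factor of 2**16 below are illustrative (2**16 is a common initial loss scale, not a Cyfuture‑specific setting):

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (FP16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

grad = 1e-8                      # a tiny gradient, common late in training
print(to_fp16(grad))             # 0.0 -- underflows to zero in FP16

scale = 2.0 ** 16                # a typical initial loss scale
scaled = to_fp16(grad * scale)   # now representable in FP16
print(scaled != 0.0)             # True

recovered = scaled / scale       # unscale in full precision before the optimizer step
print(abs(recovered - grad) / grad < 1e-3)  # True -- the value survives the trip
```

Scaling the loss scales every gradient by the same factor during backpropagation, lifting small values out of the FP16 underflow region; unscaling before the optimizer step restores the true magnitudes.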
Using mixed precision on GaaS brings both performance and efficiency gains for deep learning workloads.
- Training throughput can increase significantly because FP16 operations on Tensor Core GPUs execute faster than FP32, leading to overall speedups of up to around 3x on compute‑intensive models in some NVIDIA benchmarks. This allows data scientists to iterate more quickly on model architectures and hyperparameters.
- Memory savings from using 16‑bit activations and certain intermediate tensors enable larger batch sizes or bigger models on the same GPU, improving utilization of GaaS instances and reducing the need to scale out to additional GPUs. With higher effective batch sizes, users can also experience more stable training on some architectures.
To take advantage of mixed precision on Cyfuture Cloud GaaS, several practical steps are involved.
- Select an NVIDIA GPU instance type that exposes Tensor Cores and supports FP16/bfloat16 acceleration (for example, A100 or H100) when provisioning GPU resources on Cyfuture Cloud. These instances are optimized for mixed precision workloads and are recommended for large‑scale AI training.
- Configure the software stack using a supported deep learning framework such as TensorFlow 2.x or PyTorch 1.6+ and enable their mixed precision utilities (for example, tf.keras.mixed_precision.set_global_policy('mixed_float16') in TensorFlow or torch.cuda.amp.autocast() with GradScaler in PyTorch). Cyfuture Cloud also offers pre‑configured GPU images or containers that already bundle CUDA, cuDNN, and framework versions compatible with mixed precision.
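Putting both steps together, a minimal PyTorch AMP training loop looks like the sketch below. The model, data, and hyperparameters are illustrative placeholders; AMP is enabled only when a CUDA device is present, so the same script also runs (in plain FP32) on a CPU.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # autocast to FP16 only on CUDA GPUs

# Illustrative toy model and synthetic data
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x = torch.randn(64, 16, device=device)
y = torch.randn(64, 1, device=device)

# GradScaler applies loss scaling; with enabled=False it is a transparent no-op
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(10):
    optimizer.zero_grad()
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = loss_fn(model(x), y)  # forward pass, FP16 where eligible
    scaler.scale(loss).backward()    # scale the loss to avoid FP16 underflow
    scaler.step(optimizer)           # unscales gradients, then steps
    scaler.update()                  # adjusts the scale factor dynamically
```

Compared with a plain FP32 loop, the only additions are the autocast context and the three GradScaler calls, which is what makes adoption on existing training scripts straightforward.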
Mixed precision training is a specialized technique for combining FP16 and FP32 arithmetic to accelerate deep learning training while keeping model accuracy comparable to full‑precision runs. On Cyfuture Cloud, GaaS supports this capability through modern NVIDIA GPUs and mainstream frameworks that expose Automatic Mixed Precision, making it easy to adopt for both new and existing models.
By selecting appropriate GPU instances and enabling mixed precision APIs in code or containers, teams can reduce training time, lower memory footprints, and get more value from their GPU‑as‑a‑Service budgets without major changes to model design. For most deep learning workloads running on GaaS, mixed precision is a recommended default unless there is a strong reason to require strict FP32 everywhere.
In most modern frameworks, mixed precision works with existing architectures without requiring structural changes to the model. Users typically only add a few lines of configuration to enable autocasting and loss scaling, and the framework decides which operations should run in FP16 versus FP32.
When configured correctly with loss scaling and FP32 master weights, mixed precision training usually reaches comparable accuracy to full FP32 training on common vision and NLP models. However, a small number of numerically sensitive models may require tuning of loss scaling or disabling mixed precision for specific layers to match baseline accuracy.
Mainstream deep learning frameworks like TensorFlow (via the mixed_precision API), PyTorch (via AMP in torch.cuda.amp), and others such as MXNet provide built‑in mixed precision support that runs efficiently on NVIDIA GPUs. These frameworks are commonly available in Cyfuture Cloud GaaS images or can be installed in custom environments on GPU instances.
By increasing training speed and allowing larger effective batch sizes, mixed precision can reduce the amount of GPU time required to reach a target accuracy, directly lowering compute usage. Organizations can also fit larger models on a single GPU, potentially avoiding the need to scale to multiple GPUs or higher‑end instances for some workloads.
Mixed precision originated as a training optimization, but many models can also run inference with FP16 or INT8 for lower latency and higher throughput. On GaaS, users can deploy trained models using FP16 inference on compatible GPUs to accelerate production workloads while monitoring for any accuracy drift.
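As a sketch of the inference side, the snippet below casts a placeholder model to FP16 for serving. The FP16 path is taken only when a CUDA device is available, with an FP32 CPU fallback, so the shape of the deployment pattern is the point rather than the specific model.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a trained network
model = nn.Linear(16, 4).eval()

if torch.cuda.is_available():
    model = model.cuda().half()  # cast weights to FP16 for deployment
    x = torch.randn(1, 16, device="cuda", dtype=torch.float16)
else:
    x = torch.randn(1, 16)       # FP32 fallback on CPU

with torch.inference_mode():     # no autograd bookkeeping at serving time
    out = model(x)

print(out.shape)  # torch.Size([1, 4])
```

In production, outputs from the FP16 path should be compared against an FP32 baseline on a validation set to catch any accuracy drift before rollout.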

