Overall score: Summary of the individual benchmarks
Let’s start with the floating-point benchmarks. Here, NVIDIA is the measure of all things, and as soon as the tensor cores come into play, cards without this feature no longer stand a chance. Incidentally, you can also see that, relative to the AMD cards, the NVIDIA cards perform significantly better in FP16 than in FP32. The GeForce RTX 4090 usually wins easily against the more fully enabled chip on the RTX 6000 Ada, but the workstation card is far ahead in terms of efficiency, with a maximum of 300 watts compared to up to 450 watts for the consumer card. Cache and memory size and a few more processing units ensure that the lower-clocked card does not fall significantly behind.
Among AMD’s consumer cards, the RX 7900 XT and XTX are almost on a par overall. That is somewhat astonishing but reproducible, because some individual benchmarks evidently run slightly better on the XT, for whatever reason. AMD could certainly benefit from further optimized drivers.
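To make the efficiency argument a little more tangible, here is a minimal sketch of the arithmetic behind it. Only the two power limits (300 and 450 watts) come from the text. The two scores are made-up placeholders, not measured results, and the function names are chosen purely for illustration. The power limits are maximum values, so real efficiency depends on the actual draw during each benchmark.

```python
# Board power limits quoted in the text (maximum values, not measured draw).
RTX_6000_ADA_WATTS = 300
RTX_4090_WATTS = 450


def points_per_watt(score: float, watts: float) -> float:
    """Benchmark points delivered per watt of board power."""
    return score / watts


def break_even_ratio(watts_a: float, watts_b: float) -> float:
    """Fraction of card B's score that card A needs to match B's points per watt."""
    return watts_a / watts_b


ratio = break_even_ratio(RTX_6000_ADA_WATTS, RTX_4090_WATTS)
print(f"Break-even score ratio: {ratio:.2f}")  # prints 0.67

# Placeholder scores for illustration only, NOT taken from the benchmarks.
score_4090 = 1000.0
score_6000_ada = 900.0

eff_4090 = points_per_watt(score_4090, RTX_4090_WATTS)
eff_6000_ada = points_per_watt(score_6000_ada, RTX_6000_ADA_WATTS)
print(f"RTX 4090:     {eff_4090:.2f} points/W")
print(f"RTX 6000 Ada: {eff_6000_ada:.2f} points/W")
```

In other words, at these power limits the workstation card only has to reach about two thirds of the consumer card's score to draw level in points per watt.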
Integer score
The type of calculations in AI applications can vary greatly. Integer operations are often used in quantized neural networks (QNNs), while floating-point operations are more common in standard neural networks (NNs). These different workloads can lead to different performance requirements and benchmark results, even among graphics cards from the same manufacturer. Integer operations on narrow data types such as INT8 are often less compute-intensive and need less memory bandwidth than floating-point operations, which can also be reflected in the benchmark results.
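The bandwidth point can be illustrated with a small sketch that compares how many bytes the same number of values occupies in different data types. The element sizes are the standard ones for FP32, FP16 and INT8. The weight count is a hypothetical example value and has nothing to do with the models used in the benchmarks.

```python
# Standard storage size per element for each data type, in bytes.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}


def tensor_bytes(num_elements: int, dtype: str) -> int:
    """Bytes that must be stored (and moved) for a tensor of the given type."""
    return num_elements * BYTES_PER_ELEMENT[dtype]


# Hypothetical number of weights, chosen only for illustration.
NUM_WEIGHTS = 1_000_000

for dtype in BYTES_PER_ELEMENT:
    mib = tensor_bytes(NUM_WEIGHTS, dtype) / 2**20
    print(f"{dtype}: {mib:.2f} MiB")
```

The INT8 version needs a quarter of the FP32 traffic for the same number of values, which is where the lower bandwidth demand comes from.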
Caching strategies and cache sizes can affect integer and floating-point operations differently. Integer data stored in narrower types can fit better into the cache and be used more efficiently than floating-point data. In some AI models, floating-point values are reduced to integer values through quantization, which can lead to performance improvements. Some of the performance differences in the benchmarks may therefore come down to how efficiently the quantization works.
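What such a reduction from floating-point to integer values can look like is sketched below. This is a generic symmetric INT8 quantization written in plain Python, not necessarily the scheme used by the benchmark or by any particular framework. The function names and the example weights are made up for illustration, and the input list is assumed to be non-empty.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Made-up example weights, not taken from any real model.
weights = [0.5, -1.27, 0.003, 1.0]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", q)  # [50, -127, 0, 100]
print("scale:", scale)
print("max error:", max_error)
```

The small value 0.003 collapses to 0, which shows the trade-off: the data shrinks to one byte per value, but the rounding error per value can be as large as half the scale.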
Summary and conclusion
The bottom line is that all cards hit their performance limit often enough, and in most cases it is the power limit. All benchmarks that run without the tensor cores reflect the current status quo of architectures without AI accelerators. This is basically the AI equivalent of raster graphics without ray tracing, and even there NVIDIA still has a clear lead. Of course, I only had normal workstation and consumer cards at my disposal and no dedicated AI accelerators. But those would be something completely different anyway, and it would not work in this form under Windows either. A small side note: since I always run several iterations, I can manage two, at most three cards a day (and then only without TensorRT). That is exactly why the 12 cards have to be enough for now, because it took almost a week. But at least we can see that even consumer cards can keep up well, with a few minor exceptions.