Nvidia Announces Tesla T4 GPUs With Turing Architecture

Developing story...

Nvidia CEO Jensen Huang took to the stage at GTC Japan to announce the company's latest advancements in AI, including the new Tesla T4 GPU. This new GPU, which Nvidia designed for inference workloads in the data center, leverages the same Turing microarchitecture as Nvidia's forthcoming GeForce RTX 20-series gaming graphics cards.

But the Tesla T4 is a unique graphics card designed specifically for AI inference workloads, such as powering neural networks that process video, speech, search queries, and images. Nvidia's previous-gen Tesla P4 filled this role.


                     FP16 (TFLOPS)   INT8 (TOPS)   INT4 (TOPS)
Nvidia Tesla T4           65             130           260
Nvidia Tesla P4           5.5            22            -

The Tesla T4 GPU comes bristling with 16GB of GDDR6, 320 Turing Tensor Cores, and 2,560 CUDA cores. The GPU supports mixed-precision computation, including FP32, FP16, INT8, and INT4 (performance above). The low-profile 75W card slots into a standard PCIe slot in servers and doesn't require an external power connector, such as a 6-pin plug. Nvidia tells us that the die does feature RT Cores, just like the desktop models, but that they are only useful for ray tracing or VDI (Virtual Desktop Infrastructure), which implies they will sit unused during most inference workloads.
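For a sense of how software targets those precisions, here's a minimal sketch of FP16 inference in PyTorch (one of the frameworks the card supports, per below); the tiny model is a stand-in for illustration, not Nvidia's own tooling:

    import torch

    # Minimal sketch: run inference in FP16, the floating-point mode the
    # Turing Tensor Cores accelerate. The model is a stand-in network.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 10),
    ).cuda().half().eval()       # move weights to the GPU, cast to FP16

    batch = torch.randn(32, 512, device="cuda", dtype=torch.float16)

    with torch.no_grad():        # inference only; no gradients needed
        logits = model(batch)
    print(logits.dtype)          # torch.float16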

The Tesla T4 also features INT4 and (experimental) INT1 precision modes, a notable advancement over its predecessor.
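To make those integer modes concrete, the sketch below applies generic symmetric quantization to the 4-bit range [-8, 7]. This is illustrative arithmetic in NumPy, not Nvidia's actual INT4 implementation:

    import numpy as np

    def quantize_int4(x):
        """Symmetrically quantize floats into the INT4 range [-8, 7]."""
        scale = max(np.abs(x).max(), 1e-8) / 7.0   # largest magnitude maps to 7
        q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    weights = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int4(weights)
    print(np.abs(weights - dequantize(q, scale)).max())  # quantization error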

As expected, the card supports all the major deep learning frameworks, such as PyTorch, TensorFlow, MXNet, and Caffe2. Nvidia also offers TensorRT 5, a new version of its deep learning inference optimizer and runtime engine that supports Turing Tensor Cores and multi-precision workloads.
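As a rough illustration of that workflow, the sketch below builds an FP16 engine from an ONNX model using the TensorRT Python API as it stood around version 5; "model.onnx" is a placeholder, and exact calls may differ between releases:

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    # Build an inference engine from an ONNX model, letting TensorRT run
    # eligible layers in FP16 on the Tensor Cores.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30  # 1 GiB of scratch space
        builder.fp16_mode = True              # enable FP16 Tensor Core kernels
        with open("model.onnx", "rb") as f:
            parser.parse(f.read())
        engine = builder.build_cuda_engine(network)  # optimized runtime engine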

Developing story...

10 comments
  • mail.miftachul
    Why doesn't this type of card have a cooler?
  • PaulAlcorn
    mail.miftachul said:
    Why doesn't this type of card have a cooler?


    It's only 75W and will live in a server, so the linear airflow will keep it cool. Servers are like tornadoes inside, usually at least 200 LFM (linear feet per minute). I'll add something to the article to explain that.
  • bit_user
    PaulAlcorn said:
    This new GPU, which Nvidia designed for inference workloads in hyperscale data centers, leverages the same Turing microarchitecture as Nvidia's forthcoming GeForce RTX 20-series gaming graphics cards.

    Indeed, TU104 is the same silicon used in the RTX 2070 and RTX 2080. All they did was down-clock it and scale it back to fit a 75 W power envelope. It is then fitted with double the RAM (ECC, too), a passive heatsink, and a price tag of several thousand dollars.


    PaulAlcorn said:
    the Tesla T4 is a unique graphics card designed specifically for AI inference workloads

    It's a stretch to call it a graphics card. While it can do desktop virtualization, note the lack of any display outputs.

    PaulAlcorn said:
    Intel claims that most of the world's inference workloads run on Xeon processors

    This seems like wishful thinking.

    PaulAlcorn said:
    it will likely be several years before the clear winners become apparent.

    Well, Nvidia is clearly winning. The question is whether anyone building AI-specific chips can unseat them. We already know Vega 7 nm won't.