
GPU Acceleration in AI: How Graphics Processing Units Drive Deep Learning

  • By Gcore
  • September 19, 2023
  • 9 min read

This article discusses how GPUs are shaping a new reality in the hottest subset of AI training: deep learning. We’ll explain the GPU architecture and how it fits with AI workloads, why GPUs are better than CPUs for training deep learning models, and how to choose an optimal GPU configuration.

How GPUs Drive Deep Learning

The key GPU features that power deep learning are its parallel processing capability and, at the foundation of this capability, its core (processor) architecture.

Parallel Processing

Deep learning (DL) relies on matrix calculations, which GPUs perform efficiently thanks to their parallel computing capabilities. To understand this interrelationship better, let’s consider a simplified training process of a deep learning model. The model takes input data, such as images, and has to recognize a specific object in these images using a correlation matrix. The matrix summarizes a data set, identifies patterns, and returns results accordingly: if the object is recognized, the model labels it “true”; otherwise, it labels it “false.” Below is a simplified illustration of this process.

Figure 1. A simplified illustration of the DL training process

An average DL model has billions of parameters, each of which contributes to the size of the weight matrices used in these calculations. Every one of those parameters must be taken into account, which is why the true/false recognition process requires running billions of iterations of the same matrix calculations. The iterations are independent of each other, so they can be executed in parallel. GPUs are perfect for handling these operations because of their parallel processing capabilities, which come from devoting more transistors to data processing.
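To make this concrete, here is a minimal sketch of such a parallel matrix calculation. It assumes PyTorch and a CUDA-capable GPU are available (our assumption, not something the article prescribes); the point is simply that a single matrix multiplication launches a huge number of independent operations that the GPU spreads across its cores.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

weights = torch.randn(4096, 4096, device=device)  # matrix of model weights
inputs = torch.randn(4096, 4096, device=device)   # batch of input activations

# One call launches a huge number of independent multiply-add operations in parallel.
outputs = inputs @ weights

# A toy "true/false" decision, mirroring the recognition example above.
labels = outputs.mean(dim=1) > 0
print(labels.shape, device)
```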

Core Architecture: Tensor Cores

NVIDIA tensor cores are an example of how hardware architecture can effectively adapt to DL and AI. Tensor cores—special kinds of processors—were designed specifically for the mathematical calculations needed for deep learning, whereas earlier cores were also used for video rendering and 3D graphics. “Tensor” refers to tensor calculations, which are matrix calculations: a tensor is a mathematical object, and a tensor with two dimensions is a matrix. Below is a visualization of how a tensor core calculates matrices.

Figure 2. Volta Tensor Core matrix calculations. Source: NVIDIA

NVIDIA added tensor cores to its GPU chips in 2017 with the Volta architecture. Volta-based chips, like the Tesla V100 with its 640 tensor cores, became the first fully AI-focused GPUs, and they significantly influenced and accelerated the development of the DL industry.
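As an illustration, the sketch below runs a matrix multiplication in half precision, the format tensor cores are designed to accelerate. It assumes PyTorch and a tensor-core-capable NVIDIA GPU (Volta or newer); the exact speedup depends on the hardware.

```python
import torch

a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")

# Autocast executes eligible ops (like matmul) in FP16, the format tensor cores accelerate.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b

print(c.dtype)  # torch.float16 -- the multiply ran in a tensor-core-friendly format
```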

Multi-GPU Clusters

Another GPU feature that drives DL training is the ability to increase throughput by building multi-GPU clusters, where many GPUs work simultaneously. This is especially useful when training large, scalable DL models with billions and trillions of parameters. The most effective approach for such training is to scale GPUs horizontally using interfaces such as NVLink and InfiniBand. These high-speed interfaces allow GPUs to exchange data directly, bypassing CPU bottlenecks.

Figure 3. NVIDIA H100 with NVLink GPU-to-GPU connections. Source: NVIDIA

For example, with the NVLink switch system, you can connect 256 NVIDIA GPUs in a cluster and get 57.6 TB/s of bandwidth. A cluster of that size can significantly reduce the time needed to train large DL models. Though there are several AI-focused GPU vendors on the market, NVIDIA is the undisputed leader and makes the greatest contribution to DL. This is one of the reasons why Gcore uses NVIDIA chips for its AI GPU infrastructure.
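For illustration, here is a simplified sketch of data-parallel training across several GPUs with PyTorch’s DistributedDataParallel. It is a hedged example under stated assumptions (PyTorch with the NCCL backend, launched via `torchrun --nproc_per_node=<num_gpus> train.py`), not a description of any particular cluster; with NVLink or InfiniBand, the gradient exchange it performs runs over the high-speed GPU-to-GPU links.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each worker process (one process per GPU).
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 10).cuda(local_rank)
    # DDP synchronizes gradients between GPUs after each backward pass.
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    data = torch.randn(64, 1024).cuda(local_rank)
    target = torch.randint(0, 10, (64,)).cuda(local_rank)

    loss = torch.nn.functional.cross_entropy(model(data), target)
    loss.backward()   # gradients are averaged across all GPUs here
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```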

GPU vs. CPU Comparison

A CPU executes tasks serially. Instructions are completed on a first-in, first-out (FIFO) basis. CPUs are better suited for serial task processing because they can use a single core to execute one task after another. CPUs also have a wider range of possible instructions than GPUs and can perform more tasks. They interact with more computer components such as ROM, RAM, BIOS, and input/output ports.

A GPU performs parallel processing, which means it processes tasks by dividing them between multiple cores. The GPU is a kind of advanced calculator: it accepts only a limited set of instructions and executes only graphics- and AI-related tasks, such as matrix multiplication (which a CPU can also execute). GPUs only need to interact with the display and memory. In the context of parallel computing, this is actually a benefit, as it allows a greater number of cores to be devoted solely to these operations. This specialization enhances the GPU’s efficiency in parallel task execution.

An average consumer-grade GPU has hundreds of cores adapted to perform simple operations quickly and in parallel, while an average consumer-grade CPU has 2–16 cores adapted to complex sequential operations. Thus, the GPU is better suited for DL because it provides many more cores to perform the necessary computations faster than the CPU.

Figure 4. An average CPU has 2–16 cores, while an average GPU has hundreds

The parallel processing capabilities of the GPU are made possible by dedicating a larger share of transistors to data processing. Rather than relying on large data caches and complex flow control, GPUs hide memory access latency behind computation. This frees up more transistors for data processing instead of data caching and, ultimately, benefits highly parallel computations.

Figure 5. GPUs devote more transistors to data processing than CPUs. Source: NVIDIA

GPUs also use dedicated video DRAM, such as GDDR5 and GDDR6, which is much faster than the DDR3 and DDR4 DRAM typically paired with CPUs.

How GPU Outperforms CPU in DL Training

DL requires a lot of data to be transferred between memory and cores. To handle this, GPUs have a specially optimized memory architecture that delivers higher memory bandwidth than CPUs, even when a GPU technically has the same or less memory capacity. For example, a GPU with just 32 GB of HBM (high-bandwidth memory) can deliver up to 1.2 TB/s of memory bandwidth and 14 TFLOPS of compute. In contrast, a CPU can have hundreds of GB of RAM, yet deliver only around 100 GB/s of memory bandwidth and 1 TFLOPS of compute.
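A rough way to see the difference on your own hardware is to time the same matrix multiplication on both devices. The sketch below assumes PyTorch and a CUDA GPU; absolute numbers vary widely by hardware, so treat it as a sanity check rather than a benchmark.

```python
import time
import torch

def time_matmul(device, n=4096, repeats=10):
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # GPU kernels run asynchronously
    start = time.perf_counter()
    for _ in range(repeats):
        _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

print(f"CPU: {time_matmul('cpu'):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s per matmul")
```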

Since GPUs are faster in most DL cases, they can also work out cheaper to rent. If you know roughly how much time you spend on DL training, you can simply check cloud providers’ prices to estimate how much money you will save by using GPUs instead of CPUs.

Depending on the configuration, models, and frameworks, GPUs often provide better performance than CPUs in DL training. Here are some direct comparisons:

  • Azure tested various cloud CPU and GPU clusters using the TensorFlow and Keras frameworks for five DL models of different sizes. In all cases, GPU cluster throughput consistently outperformed CPU cluster throughput, with improvements ranging from 186% to 804%.
  • Deci compared the NVIDIA Tesla T4 GPU and the Intel Cascade Lake CPU using the EfficientNet-B2 model. They found that the GPU was 3 times faster than the CPU.
  • IEEE published the results of a study on running different types of neural networks on an Intel i5 9th-generation CPU and an NVIDIA GeForce GTX 1650 GPU. When testing CNNs (convolutional neural networks), which are well suited to parallel computation, the GPU was between 4.9 and 8.8 times faster than the CPU. But when testing ANNs (artificial neural networks), the CPU was 1.2 times faster than the GPU. However, the GPU outperformed the CPU as the data size increased, regardless of the NN architecture.

Using CPU for DL Training

The last comparison case shows that CPUs can sometimes be used for DL training. Here are a few more examples of this:

  • There are CPUs with 128 cores that can process some AI workloads faster than consumer GPUs.
  • Some algorithms make it possible to optimize DL models so that they train more efficiently on CPUs. For instance, Rice University’s Brown School of Engineering has introduced an algorithm that makes CPUs 15 times faster than GPUs for some AI tasks.
  • There are cases where the precision of a DL model is not critical, such as speech recognition under near-ideal conditions with no noise or interference. In such situations, you can train a DL model using floating-point weights (FP16, FP32) and then round them to integers. Because CPUs work better with integers than GPUs do, they can be faster here, although the results will be less accurate; a rough sketch of this approach follows this list.
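Here is a rough sketch of the “train in floating point, then round to integers” idea mentioned in the last item, using PyTorch’s dynamic quantization as one possible implementation (our assumption, not the only way to do it). It converts the weights of Linear layers to int8 after training.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(256, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
# ... train the model in FP32 as usual ...

# Convert Linear layer weights from floating point to int8 post-training.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
print(quantized)  # Linear layers are replaced by dynamically quantized int8 versions
```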

However, using CPUs for DL training is still an unusual practice. Most DL models are adapted for parallel computing, i.e., for GPU hardware. Thus, building a CPU-based DL platform is a task that may be both difficult and unnecessary. It can take an unpredictable amount of time to select a multi-core CPU instance and then configure a CPU-adapted algorithm to train your model. By selecting a GPU instance, you get a platform that’s ready to build, train, and run your DL model.

How to Choose an Optimal GPU Configuration for Deep Learning

Choosing the optimal GPU configuration is basically a two-step process:

  1. Determine the stage of deep learning you need to execute.
  2. Choose a GPU server specification to match.

Note: We’ll only consider specification criteria for DL training because, as you’ll see, DL inference (running a trained DL model) is far less demanding than training.

1. Determine Which Stage of Deep Learning You Need

To choose an optimal GPU configuration, you first need to understand which of the two main stages of DL you will execute on GPUs: DL training or DL inference. Training is the main challenge of DL because you have to adjust a huge number (up to trillions) of matrix coefficients, or weights. The process is close to a brute-force search for the combinations that give the best results (though some techniques, such as the stochastic gradient descent algorithm, help to reduce the number of computations). You therefore need maximum hardware performance for training, and vendors make GPUs specifically designed for it. For example, the NVIDIA A100 and H100 GPUs are positioned as devices for DL training, not for inference.

Once you have calculated all the necessary matrix coefficients, the model is trained and ready for inference. At this stage, a DL model only needs to multiply the input data and the matrix coefficients once to produce a single result—for example, when a text-to-image AI generator generates an image according to a user’s prompt. Therefore, inference is always simpler than training in terms of math computations and required computational resources. In some cases, DL inference can be run on desktop GPUs, CPUs, and smartphones. An example is an iPhone with face recognition: the relatively modest GPU with 4–5 cores is sufficient for DL inference.
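The contrast can be sketched in a few lines. The example below assumes PyTorch and a toy model (our assumptions for illustration): a training step repeats a forward pass, loss computation, backward pass, and weight update over many batches, while inference is a single forward pass with gradients disabled.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

# Training step: forward pass, loss, backward pass, weight update -- repeated over many batches.
optimizer = torch.optim.Adam(model.parameters())
x, y = torch.randn(32, 128), torch.randint(0, 2, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# Inference: one forward pass with gradients disabled -- far cheaper than training.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 128)).argmax(dim=1)
print(prediction)
```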

2. Choose the GPU Specification for DL Training

When choosing a GPU server or virtual GPU instance for DL training, it’s important to understand what training time is acceptable for you: hours, days, months, etc. To estimate this, you can count the operations in your model or use reported training times together with GPU performance figures. Then, decide on the resources you need:

  • Memory size is the key factor. You need at least as much GPU RAM as your DL model requires. That minimum is sufficient if you are not pressed for time to market; if you are under time pressure, it’s better to provision that memory plus extra in reserve.
  • The number of tensor cores is less critical than the size of the GPU memory, since it only affects the computation speed. However, if you need to train a model faster, then the more cores the better.
  • Interconnect bandwidth is critical if you need to scale GPUs horizontally, for example, when the training time is too long, the dataset is huge, or the model is highly complex. In such cases, check whether the GPU instances support high-speed interconnects such as NVLink or InfiniBand.

So, memory size is the most important thing when training a DL model: if you don’t have enough memory, you won’t be able to run the training. For example, to run the LLaMA model with 7 billion parameters at full precision, the Hugging Face technical team suggests using 28 GB of GPU RAM. This figure comes from multiplying 7×4, where 7 is the number of parameters in billions (7B) and 4 is the number of bytes each FP32 (full-precision) parameter occupies. For FP16 (half precision), 14 GB is enough (7×2, at two bytes per parameter). The full-precision format provides greater accuracy; the half-precision format is less accurate but makes training faster and more memory efficient.
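The same back-of-the-envelope arithmetic can be wrapped in a tiny helper. This is only an estimate of the memory needed to hold the weights themselves; it ignores optimizer states, activations, and framework overhead, which add considerably more memory during training.

```python
# Approximate bytes used per parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params: float, precision: str = "fp32") -> float:
    """Return the approximate GPU memory (GB) needed just to hold the model weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

print(weight_memory_gb(7e9, "fp32"))  # ~28 GB for a 7B-parameter model at full precision
print(weight_memory_gb(7e9, "fp16"))  # ~14 GB at half precision
```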

Kubernetes as a Tool for Improving DL Inference

To improve DL inference, you can containerize your model and use a managed Kubernetes service with GPU instances as worker nodes. This will help you achieve greater scalability, resiliency, and cost savings. With Kubernetes, you can automatically scale resources as needed. For example, if the number of user prompts to your model spikes, you will need more compute resources for inference. In that case, more GPUs are allocated for DL inference only when needed, meaning you have no idle resources and no monetary waste.
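As a hedged illustration of what such a setup can look like, the sketch below uses the official Kubernetes Python client to create a Deployment whose pods each request one GPU. It assumes the `kubernetes` package is installed, the cluster has GPU worker nodes running the NVIDIA device plugin, and `registry.example.com/dl-inference:latest` is a placeholder image name, not a real artifact; an autoscaler can then raise the replica count when traffic spikes.

```python
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig to reach the cluster

container = client.V1Container(
    name="dl-inference",
    image="registry.example.com/dl-inference:latest",  # hypothetical containerized model
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),  # one GPU per pod
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="dl-inference"),
    spec=client.V1DeploymentSpec(
        replicas=1,  # an autoscaler can raise this when prompt traffic spikes
        selector=client.V1LabelSelector(match_labels={"app": "dl-inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "dl-inference"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```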

Managed Kubernetes also reduces operational overhead and helps to automate cluster maintenance. The provider manages the master nodes (the control plane); you manage only the worker nodes on which you deploy your model, leaving you free to focus on its development.

AI Frameworks that Power Deep Learning on GPUs

Various free, open-source AI frameworks help to train deep neural networks and are specifically designed to run on GPU instances. All of the following frameworks also support NVIDIA’s Compute Unified Device Architecture (CUDA), a parallel computing platform and API that enables the development of GPU-accelerated applications, including DL models, and can significantly improve their performance.
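Before training, it’s worth confirming that your framework actually sees the GPU. The quick check below assumes both PyTorch and TensorFlow are installed on a CUDA-equipped machine (our assumption; only one framework is needed in practice).

```python
import torch
import tensorflow as tf

# Both frameworks expose whether they can see a CUDA-capable GPU.
print("PyTorch sees CUDA:", torch.cuda.is_available())
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))
```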

TensorFlow is a library for ML and AI focused on deep learning model training and inference. With TensorFlow, developers can create dataflow graphs: each graph node represents a matrix operation, and each connection between nodes is a matrix (tensor). TensorFlow can be used with several programming languages, including Python, C++, JavaScript, and Java.

PyTorch is a machine-learning framework based on the Torch library. It provides two high-level features: tensor computing with strong acceleration via GPUs, and deep neural networks built on a tape-based auto-differentiation system. PyTorch is considered more flexible than TensorFlow because it gives developers more control over the model architecture.

MXNet is a portable and lightweight DL framework that can be used for DL training and inference not only on GPUs, but also on CPUs and TPUs (Tensor Processing Units). MXNet supports Python, C++, Scala, R, and Julia.

PaddlePaddle is a powerful, scalable, and flexible framework that, like MXNet, can be used to train and deploy deep neural networks on a variety of devices. PaddlePaddle provides over 500 algorithms and pretrained models to facilitate rapid DL development.

Gcore’s Cloud GPU Infrastructure

As a cloud provider, Gcore offers AI GPU Infrastructure powered by NVIDIA chips:

  • Virtual machines and bare metal servers with consumer- and enterprise-grade GPUs
  • AI clusters based on servers with A100 and H100 GPUs
  • Managed Kubernetes with virtual and physical GPU instances that can be used as worker nodes

With Gcore’s GPU infrastructure, you can train and deploy DL models of any type and size. To learn more about our cloud services and how they can help in your AI journey, contact our team.

Conclusion

The unique design of GPUs, focused on parallelism and efficient matrix operations, makes them the perfect companion for the AI challenges of today and tomorrow, including deep learning. Their profound advantages over CPUs are underscored by their computational efficiency, memory bandwidth, and throughput capabilities.

When seeking a GPU, consider your specific deep learning goals, time, and budget. These help you to choose an optimal GPU configuration.

