Maximizing AI and HPC Workloads with NVIDIA H200 Tensor Core GPU

  • By Gcore
  • January 9, 2024
  • 6 min read

In the past decade, the fields of artificial intelligence (AI) in software and high-performance computing (HPC) in hardware have undergone a significant transformation. This revolution has led to breakthrough applications across diverse sectors, improving human life through advancements in fields such as genomic sequencing, fuel conservation, and earthquake prediction. Developing such applications requires researchers and developers to work with a gigantic volume of data using complex algorithms. Such demanding computing tasks require a new generation of data processors capable of tackling parallel and sophisticated workloads. In this article, we’ll explain how the upcoming NVIDIA H200 Tensor Core GPU, slated for release in Q2 of 2024, will help to optimize your AI and HPC workloads. We will also explore how this technology enables the efficient creation of data-driven applications and discuss the current alternatives available.

Why Use GPU for AI and HPC Workloads?

Let’s start by exploring why GPUs are the hardware of choice for running AI and HPC workloads, and what kinds of workloads we’re talking about.

Benefits of GPUs Compared to CPUs

To understand why the new NVIDIA H200 GPU (graphics processing unit) is so innovative compared to its predecessors, it’s important to understand how GPUs differ from CPUs (central processing units) in application development. GPUs offer distinct advantages in processing speed and efficiency, particularly for tasks that run the same calculation in parallel across massive amounts of data. For an in-depth look at the differences between GPUs and CPUs, particularly in the context of deep learning, see our detailed article on deep learning GPUs.

What Are AI Workloads and Where Do GPUs Fit In?

The development of AI applications encompasses a series of critical processes, each contributing to what we term “AI workloads.” These processes include:

  • Data collection
  • Data preprocessing
  • Model selection
  • Model training
  • Model testing
  • Model optimization
  • Deployment
  • Continuous learning

GPUs play a major role in accelerating these workloads, especially in tasks like deep learning, due to their ability to process thousands of threads simultaneously, making them excellent at handling the complex mathematical computations required in AI development. The H200 stands out as an apt choice for such tasks due to its impressive computational power and high memory bandwidth, which significantly reduce the time taken to train and run AI models, thereby boosting efficiency and productivity.
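To make the idea of many simultaneous threads concrete, here is a minimal sketch of data-parallel work in plain Python. Each output cell of a matrix product is an independent task, which is the kind of work a GPU spreads across thousands of threads. The function names (`dot`, `matmul_parallel`) are illustrative assumptions, and Python threads here only show the structure of the work; they do not reproduce GPU speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """One independent unit of work: a single output cell of a matrix product."""
    return sum(a * b for a, b in zip(row, col))


def matmul_parallel(a, b, workers=4):
    """Multiply a (n x k) by b (k x m), treating every output cell as its own task."""
    cols = list(zip(*b))  # columns of b
    tasks = [(row, col) for row in a for col in cols]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No task depends on another, so they can run in any order or all at once.
        flat = list(pool.map(lambda task: dot(*task), tasks))
    m = len(cols)
    return [flat[i * m:(i + 1) * m] for i in range(len(a))]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result = matmul_parallel(a, b)
print(result)
```

Because no cell depends on any other, the same decomposition scales from four worker threads to the thousands of hardware threads a GPU provides.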

Industry Use Cases for NVIDIA H200

Recent significant advances in high-performance computing and breakthroughs in large language models have expanded research and application development into previously challenging fields like climate science and genome sequencing. The NVIDIA H200, anticipated to be the highest-performance GPU on the market, will play a crucial role in this progress when it’s released in Q2 2024. To explore the advancements of the NVIDIA H200 over its predecessors like the A100 and H100, check out our comparative analysis.

Scientific Simulations

NVIDIA has a history of facilitating rapid scientific simulations in areas such as 3D workflows and catalyst modeling, particularly through products like NVIDIA Omniverse. Omniverse facilitates the development of 3D workflows and applications based on Universal Scene Description (USD) across various industries, including robotics, gaming, chemicals, and automotive. A notable example is the BMW Group, which used NVIDIA Omniverse to create its first entirely virtual factory.

The introduction of the H200 chip is expected to further accelerate and enhance these scientific simulations within the NVIDIA product range.

Genomic Software

In biomedical engineering, a field poised for rapid growth, NVIDIA Clara is a key player. This genomic software suite accelerates various genome sequencing analyses, supporting the development of a wide range of healthcare applications, including drug discovery, precision medicine, and clinical diagnostics.

With the introduction of the H200 chip, genomic software could yield faster, more affordable results. This could positively affect patient outcomes in areas like cancer treatment.

Financial Analysis

In the past, financial analysis used CPUs as the main calculation engine. With the evolution of AI and use of GPUs, faster and more precise financial analysis calculations are possible. Many cloud providers now offer AI GPU infrastructure, enabling financial firms to adopt a cost-effective, pay-as-you-go approach to their research and application development.

The new H200 chip is expected to further accelerate research outcomes and the development of new features, providing a competitive edge in the fast-paced and risk-oriented financial market.

Image and Video Processing

Industries like gaming and streaming require enormous computing power and memory for image and video processing. With this in mind, NVIDIA created NVIDIA RTX technology, which has revolutionized computer graphics with the latest enhancements in AI and simulations.

Wylie Co., a digital imagery studio known for its Oscar-winning work on Dune, has applied AI heavily in its rendering, wire removal, and visual effects work. After switching from CPUs to GPUs, Wylie achieved a 24-fold performance improvement and reduced energy consumption to one-fifth of its previous level.

With the introduction of the new H200 chip, companies working with image and video processing can raise their performance while simultaneously reducing energy consumption—a win for business revenue and for the environment.

How NVIDIA H200 Drives AI and HPC Workloads

The NVIDIA H200 is built around HBM3e memory technology, delivering 141 GB of memory at 4.8 TB per second of bandwidth. That is nearly double the capacity and 2.4 times the bandwidth of the NVIDIA A100 chip. The H200 also offers 1.4 times the bandwidth of the H100 GPU, marking a significant leap in performance and energy efficiency. Detailed specifications can be found on the NVIDIA H200 data sheet.
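The short sketch below works through what these figures imply. It derives the baseline bandwidths from the quoted ratios (these are derived values, not official specifications) and estimates a lower bound on the time needed to read a model’s weights from memory once. The 70B-parameter model at 2 bytes per parameter is an assumption chosen for illustration, as is the helper name `min_weight_read_ms`.

```python
# Figures quoted in this article.
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBPS = 4.8
BANDWIDTH_RATIO_VS_A100 = 2.4
BANDWIDTH_RATIO_VS_H100 = 1.4

# Baseline bandwidths implied by the quoted ratios (derived, not official specs).
implied_a100_tbps = H200_BANDWIDTH_TBPS / BANDWIDTH_RATIO_VS_A100
implied_h100_tbps = H200_BANDWIDTH_TBPS / BANDWIDTH_RATIO_VS_H100


def min_weight_read_ms(params_billions, bytes_per_param, bandwidth_tbps):
    """Lower bound, in milliseconds, on streaming all model weights from memory once."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return weight_bytes / (bandwidth_tbps * 1e12) * 1e3


# Assumption: a 70B-parameter model stored as 16-bit weights (2 bytes each).
weights_gb = 70 * 2
fits_in_memory = weights_gb <= H200_MEMORY_GB
read_ms = min_weight_read_ms(70, 2, H200_BANDWIDTH_TBPS)

print(f"Implied A100 bandwidth: {implied_a100_tbps:.2f} TB/s")
print(f"Implied H100 bandwidth: {implied_h100_tbps:.2f} TB/s")
print(f"Weights: {weights_gb} GB, fits on one GPU: {fits_in_memory}")
print(f"Minimum time to read weights once: {read_ms:.1f} ms")
```

The estimate ignores activations, the key-value cache, and compute time, so real workloads will need more memory and take longer. It does show why both capacity and bandwidth matter for large models.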

High-Performance AI Inference

The NVIDIA H200 will drive a new era in high-performance AI inference. For those new to the concept, AI inference is the process where a trained machine learning model applies its knowledge to previously unseen data, generating relevant outputs for specific inputs in a given context. For a comprehensive understanding of how it works, take a look at our detailed AI inference article. This process is integral to the functionality of large language models (LLMs) like OpenAI’s ChatGPT or Google’s Bard, which have seen dramatic growth over the last few years. Notable large language models include GPT-3 175B, Llama 2 70B, and Llama 2 13B.
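As a minimal illustration of inference, the sketch below applies fixed, already-learned parameters to an input the model has not seen before. The weights, bias, and input values are hypothetical stand-ins. A real LLM performs the same kind of step with billions of parameters, which is where GPU memory and bandwidth come in.

```python
import math

# Hypothetical parameters, standing in for values learned during training.
WEIGHTS = [0.8, -0.4]
BIAS = 0.1


def predict(features):
    """Inference: apply the trained parameters to a new input, with no further learning."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))  # logistic function, output between 0 and 1


unseen_input = [1.5, 2.0]
score = predict(unseen_input)
print(f"Predicted probability: {score:.3f}")
```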

As LLMs advance, so does the demand for more powerful hardware, such as the H200, to support their inference workloads. In a recent benchmark for GPT-3 175B inference, the NVIDIA H200 GPU delivered 18 times the performance of the A100 GPU. The NVIDIA H100 GPU, by comparison, delivered 11 times the performance of the A100.
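Both figures are normalized to the A100, so the gain of the H200 over the H100 is not stated directly. A small sketch, using only the numbers quoted above, derives it. The table and function names are illustrative.

```python
# GPT-3 175B inference speedups relative to the A100, as quoted in this article.
speedup_vs_a100 = {"A100": 1.0, "H100": 11.0, "H200": 18.0}


def relative_speedup(newer, older, table=speedup_vs_a100):
    """Speedup of one GPU over another when both are normalized to the same baseline."""
    return table[newer] / table[older]


h200_over_h100 = relative_speedup("H200", "H100")
print(f"H200 vs H100 on GPT-3 175B: {h200_over_h100:.2f}x")
```

Dividing the two normalized values cancels the shared baseline, which gives roughly a 1.6-fold gain from one generation to the next on this benchmark.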

Figure 1: Performance improvement for GPT-3 application using A100, H100, and H200 GPUs

The H200 is supported by the latest version of NVIDIA TensorRT-LLM, an open-source library (not itself an LLM) that provides optimized inference for large language models like GPT-3 and Llama 2. When running the Llama 2 70B model, the new NVIDIA H200 chip achieves a 1.9-fold improvement in inference throughput over the NVIDIA H100, which uses the previous version of TensorRT-LLM.

Figure 2: Throughput improvement that the H200 GPU chip offers when running on the Llama 70B model, compared with the previous H100 chip

In the case of the Llama 2 13B model, the NVIDIA H200 GPU chip achieves up to 1.4 times the throughput of its predecessor, the NVIDIA H100 GPU.
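A throughput multiplier is easy to misread as a percentage of time saved. The sketch below converts the quoted throughput gains into the reduction in wall-clock time for a fixed amount of work, assuming throughput is the only thing that changes. The function name `time_reduction` is illustrative.

```python
def time_reduction(throughput_gain):
    """Fraction of wall-clock time saved on a fixed workload when throughput rises by this factor."""
    if throughput_gain <= 0:
        raise ValueError("throughput_gain must be positive")
    return 1.0 - 1.0 / throughput_gain


# Throughput gains of the H200 over the H100, as quoted in this article.
gains = {"Llama 2 70B": 1.9, "Llama 2 13B": 1.4}
for model, gain in gains.items():
    print(f"{model}: {gain}x throughput -> {time_reduction(gain):.1%} less time")
```

Under that assumption, a 1.9-fold throughput gain cuts processing time for the same batch by just under half, and a 1.4-fold gain cuts it by a little over a quarter.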

Figure 3: Details of the TensorRT-LLM improvement for the H200 chip, compared to the H100 chip

Stability AI, one of the world’s leading open-source generative AI companies, significantly enhanced the speed of its text-to-image AI product by integrating NVIDIA TensorRT. This integration, along with the use of the converted ONNX (Open Neural Network Exchange) model on NVIDIA H100 chips, led to a notable performance gain. The product can now generate high-definition images in just 1.47 seconds, effectively doubling its previous performance.

Below is a summary of the enhanced image throughput performance achieved by Stable Diffusion XL 1.0. This model, designed to generate and modify images based on text prompts, was created by Stability AI in collaboration with NVIDIA.

Figure 4: Details in improvement for image throughput for 30 steps at 1024×1024 for Stable Diffusion XL 1.0

By running its models with the NVIDIA TensorRT library, Stability AI has significantly enhanced the image throughput of Stable Diffusion XL 1.0, particularly on NVIDIA H100 chips, where it achieved up to a 70% improvement.

With the latest, further optimized TensorRT version running on the NVIDIA H200 chip, companies like Stability AI that use open-source generative models such as Stable Diffusion XL 1.0 are poised to see even greater performance gains.

High-Performance Computing

In AI, handling complex calculations on large datasets requires robust high-performance computing (HPC) capabilities. The NVIDIA H200 chip significantly improves performance, showing a 110-fold improvement over a dual x86 CPU system in the MILC project, a collaboration that studies quantum chromodynamics (QCD), the theory of the strong interactions of subatomic physics.

Figure 5: H200 performance improvement for the MILC project, compared to the Dual x86 CPU

On average, the NVIDIA H200 chip doubles the performance of the NVIDIA A100 chip for HPC applications, compared to the 1.7-fold improvement shown by the NVIDIA H100.

Figure 6: Overall performance improvement of the H200 for HPC applications compared to the H100 and A100 GPUs

Energy Efficiency

In recent years, software companies have grown aware of the environmental impact of software development. NVIDIA employs sustainable computing best practices when creating its products.

With the H200 GPU chip, energy efficiency and TCO (total cost of ownership) reach new levels. TCO is an estimate of the expenses associated with purchasing, deploying, using, and retiring a piece of equipment. While the H200 offers enormous performance improvements, it consumes only as much power as its predecessor, the H100. Moreover, the H200 GPU is 50% more efficient than the H100 in both energy use and total cost of ownership.

Figure 7: The H200 chip’s reduction in LLM energy use compared to the H100

This significant improvement in energy efficiency for the H200 GPU chip is achieved by optimizing the NVIDIA Hopper architecture, enabling better performance from its GPUs.

Conclusion

The new NVIDIA H200 Tensor Core GPU is designed to maximize the performance of AI and HPC workloads so that you can create data-intensive applications more easily and effectively. However, NVIDIA H200 GPU chips are expected to be costly to run, especially for projects still in the MVP phase or for applications that only occasionally need to process large volumes of data. In such scenarios, it’s prudent to consider a cloud platform that offers a pay-per-use plan, rather than investing in high-performance GPU infrastructure yourself.
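One way to reason about this trade-off is a simple break-even calculation. The sketch below is a rough model and not a pricing guide: every price in it is a hypothetical placeholder, and the function name `break_even_hours` is illustrative. Substitute real quotes before drawing conclusions.

```python
def break_even_hours(purchase_cost, owned_hourly_cost, rental_hourly_rate):
    """Hours of GPU use at which owning becomes cheaper than renting.

    Returns None when renting is never more expensive per hour than owning.
    """
    saving_per_hour = rental_hourly_rate - owned_hourly_cost
    if saving_per_hour <= 0:
        return None
    return purchase_cost / saving_per_hour


# Hypothetical placeholder prices, not real quotes for any GPU or provider.
hours = break_even_hours(
    purchase_cost=30_000.0,   # upfront hardware cost
    owned_hourly_cost=0.5,    # power, cooling, and maintenance per hour of use
    rental_hourly_rate=3.0,   # pay-per-use price per GPU hour
)
print(f"Break-even after {hours:,.0f} GPU hours of actual use")
```

If a project is unlikely to accumulate that many GPU hours, as is often the case in an MVP phase, paying per use is the cheaper option under this model.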

With Gcore AI GPU infrastructure, you can choose the NVIDIA GPU solution that best fits your workload. You’re only charged for actual use, optimizing the cost of your AI infrastructure. We offer the latest and most efficient options, including A100s and H100s, with L40S coming soon. Explore the differences in our dedicated article.

Get Gcore AI GPU

Related articles

3 clicks, 10 seconds: what real serverless AI inference should look like

Deploying a trained AI model could be the easiest part of the AI lifecycle. After the heavy lifting of data collection, training, and optimization, pushing a model into production is where “the rubber hits the road”, meaning the business expects to see the benefits of invested time and resources. In reality, many AI projects fail in production because of poor performance stemming from suboptimal infrastructure conditions.There are broadly speaking 2 paths developers can take when deploying inference: DIY, which is time and resource-consuming and requires domain expertise from several teams within the business, or opt for the ever-so-popular “serverless inference” solution. The latter is supposed to simplify the task at hand and deliver productivity, cutting down effort to seconds, not hours. Yet most platforms offering “serverless” AI inference still feel anything but effortless. They require containers, configs, and custom scripts. They bury users in infrastructure decisions. And they often assume your data scientists are also DevOps engineers. It’s a far cry from what “serverless” was meant to be.At Gcore, we believe real serverless inference means this: three clicks and ten seconds to deploy a model. That’s not a tagline—it’s the experience we built. And it’s what infrastructure leaders like Mirantis are now enabling for enterprises through partnerships with Gcore.Why deployment UX matters more than you thinkServerless inference isn’t just a backend architecture choice. It’s a business enabler, a go-to-market accelerator, an ROI optimizer, a technology democratizer—or, if poorly executed, a blocker.The reality is that inference workloads are a key point of interface between your AI product or service and the customer. If deployment is clunky, you’re struggling to keep up with demand. If provisioning takes too long, latency spikes, performance is inconsistent, and ultimately your service doesn’t scale. 
And if the user experience is unclear or inconsistent, customers end up frustrated—or worse, they churn.Developers and data scientists don’t want to manage infrastructure. They want to bring a model and get results without becoming cloud operators in the process.Dom Wilde, SVP Marketing, MirantisThat’s why deployment UX is no longer a nice-to-have. It’s the core of your product.The benchmark: 3 clicks, 10 secondsWe built Gcore Everywhere Inference to remove every unnecessary step between uploading a model and running it in production. That includes GPU provisioning, routing, scaling, isolation, and endpoint generation, all handled behind the scenes.The result is what we believe should be the default:Upload a modelConfirm deployment parametersClick deployAnd within ten seconds, you’re serving live inference.For platform teams supporting AI workloads, this isn’t just a better workflow. It’s a transformation.With Gcore, our customers can deliver not just self-service infrastructure but also inference as a product. End users can deploy models in seconds, and customers don’t have to micromanage the backend to support that.Dom Wilde, MirantisSimple frontend, powerful backendIt’s worth saying: simplifying the frontend doesn’t mean weakening the backend. Gcore’s platform is built for scale and performance, offering the following:Multi-tenant GPU isolationSmart routing based on location and loadAuto-scaling based on demandA unified API and UI for both automation and accessibilityWhat makes this meaningful isn’t just the tech, it’s the way it vanishes behind the scenes. With Gcore, Mirantis customers can deliver low-latency inference, maximize GPU efficiency, and meet data privacy requirements without touching low-level infrastructure.Many enterprises and cloud customers worry about underutilized GPUs. Now, every cycle is optimized. 
The platform handles the complexity so our customers can focus on building value.Dom Wilde, MirantisIf it’s not 3 clicks and 10 seconds, it’s not really serverlessThere’s a growing gap between what serverless inference promises and what most platforms deliver. Many cloud providers are focused on raw compute or orchestration, but overlook the deployment layer. That’s a mistake. Because when it comes to customer experience, ease of deployment is the product.Mirantis saw that early on and partnered with Gcore to bring inference-as-a-service to CSP and enterprise customers, fast. Now, customers can launch new offerings more quickly, reduce operational overhead, and improve the user experience with a simple, elegant deployment path.Redefine serverless AI with GcoreIf it takes a config file, a container, and a support ticket to deploy a model, it’s not serverless—it’s server-less-ish. With Gcore Everywhere Inference, we’ve set a new benchmark: three clicks and ten seconds to deploy AI. And, our model catalog offers a variety of popular models so you can get started right away.Whether you’re frustrated with slow, inefficient model deployments or looking for the most effective way to start using AI for your company, you need Gcore Everywhere Inference. Give our experts a call to discover how we can simplify your AI so you can focus on scaling and business logic.Let’s talk about your AI project

Run AI inference faster, smarter, and at scale

Training your AI models is only the beginning. The real challenge lies in running them efficiently, securely, and at scale. AI and reality meet in inference—the continuous process of generating predictions in real time. It is the driving force behind virtual assistants, fraud detection, product recommendations, and everything in between. Unlike training, inference doesn’t happen once; it runs continuously. This means that inference is your operational engine rather than just technical infrastructure. And if you don’t manage it well, you’re looking at skyrocketing costs, compliance risks, and frustrating performance bottlenecks. That’s why it’s critical to rethink where and how inference runs in your infrastructure.The hidden cost of AI inferenceWhile training large models often dominates the AI conversation, it’s inference that carries the greatest operational burden. As more models move into production, teams are discovering that traditional, centralized infrastructure isn’t built to support inference at scale.This is particularly evident when:Real-time performance is critical to user experienceRegulatory frameworks require region-specific data processingCompute demand fluctuates unpredictably across time zones and applicationsIf you don’t have a clear plan to manage inference, the performance and impact of your AI initiatives could be undermined. You risk increasing cloud costs, adding latency, and falling out of compliance.The solution: optimize where and how you run inferenceOptimizing AI inference isn’t just about adding more infrastructure—it’s about running models smarter and more strategically. In our new white paper, “How to Optimize AI Inference for Cost, Speed, and Compliance”, we break it down into three key decisions:1. Choose the right stage of the AI lifecycleNot every workload needs a massive training run. Inference is where value is delivered, so focus your resources on where they matter most. 
Learn when to use pretrained models, when to fine-tune, and when simple inference will do the job.2. Decide where your inference should runFrom the public cloud to on-prem and edge locations, where your model runs, impacts everything from latency to compliance. We show why edge inference is critical for regulated, real-time use cases—and how to deploy it efficiently.3. Match your model and infrastructure to the taskBigger models aren’t always better. We cover how to choose the right model size and infrastructure setup to reduce costs, maintain performance, and meet privacy and security requirements.Who should read itIf you’re responsible for turning AI from proof of concept into production, this guide is for you.Inference is where your choices immediately impact performance, cost, and customer experience, whether you’re managing infrastructure, developing models, or building AI-powered solutions. This white paper will help you cut through complexity and focus on what matters most: running smarter, faster, and more scalable inference.It’s especially relevant if you’re:A machine learning engineer or AI architect deploying models across environmentsA product manager introducing real-time AI featuresA technical leader or decision-maker managing compute, cloud spend, or complianceOr simply trying to scale AI without sacrificing controlIf inference is the next big challenge on your roadmap, this white paper is where to start.Scale AI inference seamlessly with GcoreEfficient, scalable inference is critical to making AI work in production. Whether you’re optimizing for performance, cost, or compliance, you need infrastructure that adapts to real-world demand. Gcore Everywhere Inference brings your models closer to users and data sources—reducing latency, minimizing costs, and supporting region-specific deployments.Our latest white paper, “How to optimize AI inference for cost, speed, and compliance”, breaks down the strategies and technologies that make this possible. 
From smart model selection to edge deployment and dynamic scaling, you’ll learn how to build an inference pipeline that delivers at scale.Ready to make AI inference faster, smarter, and easier to manage?Download the white paper

Securing vibe coding: balancing speed with cybersecurity

Vibe coding has emerged as a cultural phenomenon in 2025 software development. It’s a style defined by coding on instinct and moving fast, often with the help of AI, rather than following rigid plans. It lets developers skip exhaustive design phases and dive straight into building, writing code (or prompting an AI to write it) in a rapid, conversational loop. It has caught on fast and boasts a dedicated following of developers hosting vibe coding game jams.So why all the buzz? For one, vibe coding delivers speed and spontaneity. Enthusiasts say it frees them to prototype at the speed of thought, without overthinking architecture. A working feature can be blinked into existence after a few AI-assisted prompts, which is intoxicating for startups chasing product-market fit. But as with any trend that favors speed over process, there’s a flip side.This article explores the benefits of vibe coding and the cybersecurity risks it introduces, examines real incidents where "just ship it" coding backfired, and outlines how security leaders can keep up without slowing innovation.The upside: innovation at breakneck speedVibe coding addresses real development needs and has major benefits:Allows lightning-fast prototyping with AI assistance. Speed is a major advantage, especially for startups, and allows faster validation of ideas and product-market fit.Prioritizes creativity over perfection, rewarding flow and iteration over perfection.Lowers barriers to entry for non-experts. AI tooling lowers the skill floor, letting more people code.Produces real success stories, like a game built via vibe coding hitting $1M ARR in 17 days.Vibe coding aligns well with lean, agile, and continuous delivery environments by removing overhead and empowering rapid iteration.When speed bites backVibe coding isn’t inherently insecure, but the culture of speed it promotes can lead to critical oversights, especially when paired with AI tooling and lax process discipline. 
The following real-world incidents aren’t all examples of vibe coding per se, but they illustrate the kinds of risks that arise when developers prioritize velocity over security, skip reviews, or lean too heavily on AI without safeguards. These three cases show how fast-moving or under-documented development practices can open serious vulnerabilities.xAI API key leak (2025)A developer at Elon Musk’s AI company, xAI, accidentally committed internal API keys to a public GitHub repo. These keys provided access to proprietary LLMs trained on Tesla and SpaceX data. The leak went undetected for two months, exposing critical intellectual property until a researcher reported it. The error likely stemmed from fast-moving development where secrets were hardcoded for convenience.Malicious NPM packages (2024)In January 2024, attackers uploaded npm packages like warbeast2000 and kodiak2k, which exfiltrated SSH keys from developer machines. These were downloaded over 1,600 times before detection. Developers, trusting AI suggestions or searching hastily for functionality, unknowingly included these malicious libraries.OpenAI API key abuse via Replit (2024)Hackers scraped thousands of OpenAI API keys from public Replit projects, which developers had left in plaintext. These keys were abused to access GPT-4 for free, racking up massive bills for unsuspecting users. This incident shows how projects with weak secret hygiene, which is a risk of vibe coding, become easy targets.Securing the vibe: smart risk mitigationCybersecurity teams can enable innovation without compromising safety by following a few simple cybersecurity best practices. While these don’t offer 100% security, they do mitigate many of the major vulnerabilities of vibe coding.Integrate scanning tools: Use SAST, SCA, and secret scanners in CI/CD. Supplement with AI-based code analyzers to assess LLM-generated code.Shift security left: Embed secure-by-default templates and dev-friendly checklists. 
Make secure SDKs and CLI wrappers easily available.Use guardrails, not gates: Enable runtime protections like WAF, bot filtering, DDoS defense, and rate limiting. Leverage progressive delivery to limit blast radius.Educate, don’t block: Provide lightweight, modular security learning paths for developers. Encourage experimentation in secure sandboxes with audit trails.Consult security experts: Consider outsourcing your cybersecurity to an expert like Gcore to keep your app or AI safe.Secure innovation sustainably with GcoreVibe coding is here to stay, and for good reason. It unlocks creativity and accelerates delivery. But it also invites mistakes that attackers can exploit. Rather than fight the vibe, cybersecurity leaders must adapt: automating protections, partnering with devs, and building a culture where shipping fast doesn't mean shipping insecure.Want to secure your edge-built AI or fast-moving app infrastructure? Gcore’s Edge Security platform offers robust, low-latency protection with next-gen WAAP and DDoS mitigation to help you innovate confidently, even at speed. As AI and security experts, we understand the risks and rewards of vibe coding, and we’re ideally positioned to help you secure your workloads without slowing down development.Into vibe coding? Talk to us about how to keep it secure.

Qwen3 models available now on Gcore Everywhere Inference

We’ve expanded our model library for Gcore Everywhere Inference with three powerful additions from the Qwen3 series. These new models bring advanced reasoning, faster response times, and even better multilingual support, helping you power everything from chatbots and coding tools to complex R&D workloads.With Gcore Everywhere Inference, you can deploy Qwen3 models in just three clicks. Read on to discover what makes Qwen3 special, which Qwen3 model best suits your needs, and how to deploy it with Gcore today.Introducing the new Qwen3 modelsQwen3 is the latest evolution of the Qwen series, featuring both dense and Mixture-of-Experts (MoE) architectures. It introduces dual-mode reasoning, letting you toggle between “thinking” and “non-thinking” modes to balance depth and speed:Thinking mode (enable_thinking=True): The model adds a <think>…</think> block to reason step-by-step before generating the final response. Ideal for tasks like code generation or math where accuracy and logic matter.Non-thinking mode (enable_thinking=False): Skips the reasoning phase to respond faster. Best for straightforward tasks where speed is a priority.Model sizes and use casesWith three new sizes available, you can choose the level of performance required for your use case:Qwen3-14B: A 14B parameter model tuned for responsive, multilingual chat and instruction-following. Fast, versatile, and ready for real-time applications with lightning-fast responses.Qwen3-30B-A3B: Built on the Arch-3 backbone, this 30B model offers advanced reasoning and coding capabilities. It’s ideal for applications that demand deeper understanding and precision while balancing performance. It provides high-quality output with faster inference and better efficiency.Qwen3-32B: The largest Qwen3 model yet, designed for complex, high-performance tasks across reasoning, generation, and multilingual domains. 
It sets a new standard for what’s achievable with Gcore Everywhere Inference, delivering exceptional results with maximum reasoning power. Ideal for complex computation and generation tasks where every detail matters.ModelArchitectureTotal parametersActive parametersContext lengthBest suited forQwen3-14BDense14B14B128KMultilingual chatbots, instruction-following tasks, and applications requiring strong reasoning capabilities with moderate resource consumption.Qwen3-30B-A3BMoE30B3B128KScenarios requiring advanced reasoning and coding capabilities with efficient resource usage; suitable for real-time applications due to faster inference times.Qwen3-32BDense32B32B128KHigh-performance tasks demanding maximum reasoning power and accuracy; ideal for complex R&D workloads and precision-critical applications.How to deploy Qwen3 models with Gcore in just a few clicksGetting started with Qwen3 on Gcore Everywhere Inference is fast and frictionless. Simply log in to the Gcore Portal, navigate to the AI Inference section, and select your desired Qwen3 model. From there, deployment takes just three clicks—no setup scripts, no GPU wrangling, no DevOps overhead. Check out our docs to discover how it works.Deploying Qwen3 via the Gcore Customer Portal takes just three clicksPrefer to deploy programmatically? Use the Gcore API with your project credentials. We offer quick-start examples in Python and cURL to get you up and running fast.Why choose Qwen3 + Gcore?Flexible performance: Choose from three models tailored to different workloads and cost-performance needs.Immediate availability: All models are live now and deployable via portal or API.Next-gen architecture: Dense and MoE options give you more control over reasoning, speed, and output quality.Scalable by design: Built for production-grade performance across industries and use cases.With the latest Qwen3 additions, Gcore Everywhere Inference continues to deliver on performance, scalability, and choice. Ready to get started? 
Get a free account today to explore Qwen3 and deploy with Gcore in just a few clicks.Sign up free to deploy Qwen3 today

Run AI workloads faster with our new cloud region in Southern Europe

Good news for businesses operating in Southern Europe! Our newest cloud region in Sines, Portugal, gives you faster, more local access to the infrastructure you need to run advanced AI, ML, and HPC workloads across the Iberian Peninsula and wider region. Sines-2 marks the first region launched in partnership with Northern Data Group, signaling a new chapter in delivering powerful, workload-optimized infrastructure across Europe.

Strategically positioned in Portugal, Sines-2 enhances coverage in Southern Europe, providing a lower-latency option for customers operating in or targeting this region. With the explosive growth of AI, machine learning, and compute-intensive workloads, this new region is designed to meet escalating demand with cutting-edge GPU and storage capabilities.

Built for AI, designed to scale

Sines-2 brings with it next-generation infrastructure features, purpose-built for today’s most demanding workloads:

  • NVIDIA H100 GPUs: Unlock the full potential of AI/ML training, high-performance computing (HPC), and rendering workloads with access to H100 GPUs.
  • VAST NFS (file sharing protocol) support: Benefit from scalable, high-throughput file storage ideal for data-intensive operations, research, and real-time AI workflows.
  • IaaS portfolio: Deploy Virtual Machines, manage storage, and scale infrastructure with the same consistency and reliability as in our flagship regions.

Organizations operating in Portugal, Spain, and nearby regions can now deploy workloads closer to end users, improving application performance. For finance, healthcare, public sector, and other organizations running sensitive workloads that must stay within a country or region, Sines-2 is an easy way to access state-of-the-art GPUs with simplified compliance. Whether you're building AI models, running simulations, or managing rendering pipelines, Sines-2 offers the performance and proximity you need.

And best of all, servers are available and ready to deploy today.

Run your AI workloads in Portugal today

With Sines-2 and our partnership with Northern Data Group, we’re making it easier than ever for you to run AI workloads at scale. If you need speed, flexibility, and global reach, we’re ready to power your next AI breakthrough.

Unlock the power of Sines-2 today

How AI is transforming gaming experiences

AI is reshaping how games are played, built, and experienced. Although we are in a period of flux where the optimal combination of human and artificial intelligence is still being ironed out, the potential for AI to greatly enhance both gameplay and development is clear.

PlayStation CEO Hermen Hulst recently emphasized the importance of striking the right balance between the handcrafted human touch and the revolutionary advances that AI brings. AI will not replace the decades of design, storytelling, and craft laid down by humans. Instead, it will build on that foundation to unlock entirely new possibilities. In addition to an enhanced playing experience, AI is shaking up gaming aspects such as real-time analytics, player interactions, content generation, and security.

In this article, we explore three specific developments that are enriching gaming storyworlds, as well as the technology that’s bringing them to life and what the future might hold.

#1 Responsive NPC behavior and smarter opponents

AI is evolving to create more realistic, adaptive, and intelligent non-player characters (NPCs) that can react to individual player choices with greater depth and reasoning. The algorithms allow NPCs to respond dynamically to players’ decisions so they can adjust their strategies and behaviors in real time. This provides a more immersive and dynamic gameplay environment and gives gamers endless opportunities to experience new adventures and write their own story every time.

A well-known example is Red Dead Redemption 2, which lets players interact with NPCs in the Wild West. Players were impressed by its complexity and by the ability to interact with fellow cowboys and bar patrons. Although this kind of interaction is limited for now, it could eventually become like a video game version of the TV series Westworld, in which visitors pay to interact with incredibly lifelike robots in a Wild West theme park.

AI also gives in-game opponents more “agency,” making them more reactive and challenging for players to defeat. This means smarter, more unpredictable enemies who provide a heightened level of achievement, novelty, and excitement for players.

For example, AI Limit, released in early 2025, is an action RPG incorporating AI-driven combat mechanics. While the game draws comparisons to Soulslike games, the developers emphasize its unique features, including the Sync Rate system, which adds depth to combat interactions.

#2 AI-assisted matchmaking and player behavior predictions

AI-powered analytics can identify and predict player skill levels and playing styles, leading to more balanced and competitive matchmaking. A notable example is the implementation of advanced matchmaking systems in competitive games such as Apex Legends and Call of Duty: Modern Warfare III. These titles use AI algorithms to analyze not just skill levels but also playstyle preferences, weapon selections, and playing patterns to create matches optimized for player retention and satisfaction. The systems continuously learn from match outcomes to predict player behavior and create more balanced team compositions across different skill levels.

By analyzing a player’s past performance, AI can also create smarter team formations. This makes for fairer and more rewarding multiplayer games, as players are matched with others who complement their skill and strategy.

AI can monitor in-game interactions to detect and mitigate toxic behavior. This helps encourage positive social dynamics and foster a more collaborative and friendly online environment.

#3 Personalized gaming experiences

Multiplayer games can use AI to analyze player behavior in real time, adjusting difficulty levels and suggesting tailored missions to provide rich experiences unique to each player. This creates personalized, player-centric gameplay that evolves as a player’s knowledge and ability improve.

Games like Minecraft and Skyrim already use AI to adjust difficulty and offer dynamic content, while Oasis represents a breakthrough as an AI-generated Minecraft-inspired world. The game uses generative AI to predict and render gameplay frames in real time, creating a uniquely responsive environment.

Beyond world generation, modern games are also incorporating AI chatbots that give players real-time coaching and personalized skill development tips.

How will AI continue to shape gaming?

In the future, AI will continue to impact not just the player experience but also the creation of games. We anticipate AI revolutionizing game development in the following areas:

  • Procedural content generation: AI will create vast, dynamic worlds or generate storylines, allowing for more expansive and diverse game worlds than are currently available.
  • Game testing: AI will simulate millions of player interactions to help developers find bugs and improve gameplay.
  • Art and sound design: AI tools will be used to a greater extent than at present to create game art, music, and voiceovers.

How Gcore technology is powering AI gaming innovation

In terms of the technology behind the scenes, Gcore Everywhere Inference brings AI models closer to players by deploying them at the edge, significantly reducing latency for training and inference. This powers dynamic features like adaptive NPC behavior, personalized gameplay, and predictive matchmaking without sacrificing performance.

Gcore technology differentiates itself with the following features:

  • Supports all major frameworks, including PyTorch, TensorFlow, ONNX, and Hugging Face Transformers, making it easy to deploy your preferred model architecture.
  • Offers multiple deployment modes, whether in the cloud, on-premise, or across our distributed edge network with 180+ global locations, allowing you to place inference wherever it delivers the best performance for your users.
  • Delivers sub-50ms latency for inference workloads in most regions, even during peak gaming hours, thanks to our ultra-low-latency CDN and proximity to players.
  • Scales horizontally, so studios can support millions of concurrent inferences for dynamic NPC behavior, matchmaking decisions, or in-game voice/chat moderation, without compromising gameplay speed.
  • Keeps your models and training data private through confidential computing and data sovereignty controls, helping you meet compliance requirements across regions including Europe, LATAM, and MENA.

With a low-latency infrastructure that supports popular AI frameworks, Gcore Everywhere Inference allows your studio to deploy custom models and deliver more immersive, responsive player experiences at scale. With our confidential computing solutions, you retain full control over your training assets: no data is shared, exposed, or compromised.

Deliver next-gen gaming with Gcore AI

AI continues to revolutionize industries, and gaming is no exception. Deploying artificial intelligence can make games even more exciting for players and enable developers to work smarter when creating new games.

At Gcore, AI is our core and gaming is our foundation. AI is seamlessly integrated into all our solutions with one goal in mind: to help grow your business. As AI continues to evolve rapidly, we're committed to staying at the cutting edge and evolving with it. Contact us today to discover how Everywhere Inference can enhance your gaming offerings.

Get a customized consultation about AI gaming deployment
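As a rough illustration of the balanced team formation described in the matchmaking section of this article, here is a minimal Python sketch. It is a simple greedy heuristic over a single skill rating, not the method used by any of the games named above or by Gcore; the `balance_teams` function, the player names, and the ratings are all hypothetical.

```python
from typing import Dict, List, Tuple


def balance_teams(ratings: Dict[str, int], team_size: int) -> Tuple[List[str], List[str]]:
    """Split players into two equal teams with similar total skill rating.

    Greedy heuristic: take players from strongest to weakest and give each
    one to the team with the lower running total that still has room.
    """
    if len(ratings) != 2 * team_size:
        raise ValueError("expected exactly 2 * team_size players")

    team_a: List[str] = []
    team_b: List[str] = []
    total_a = total_b = 0

    ranked = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
    for player, rating in ranked:
        a_has_room = len(team_a) < team_size
        b_has_room = len(team_b) < team_size
        if a_has_room and (total_a <= total_b or not b_has_room):
            team_a.append(player)
            total_a += rating
        else:
            team_b.append(player)
            total_b += rating
    return team_a, team_b


# Hypothetical skill ratings for six players.
lobby = {"ana": 1900, "ben": 1700, "cai": 1500, "dee": 1300, "eli": 1200, "fay": 1000}
team_a, team_b = balance_teams(lobby, team_size=3)
print("Team A:", team_a, sum(lobby[p] for p in team_a))
print("Team B:", team_b, sum(lobby[p] for p in team_b))
```

A production matchmaker would typically weigh more signals than one rating, such as the playstyle and behavior patterns mentioned earlier, but the same idea of minimizing the gap between team totals applies.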
