
The Three Major Telecom Operators Vie in the Global "Arms Race" to Build AI Computing Power Centers

Author: Semiconductor Industry Profile | Published: 2024-05-06

In recent years, the field of artificial intelligence has seen explosive development led by generative large models. On November 30, 2022, OpenAI launched ChatGPT, an AI chatbot whose outstanding natural language generation capabilities quickly attracted over 100 million users worldwide. This set off a wave of large model development both in China and abroad, with models such as Gemini, Wenxin Yiyan, Copilot, LLaMA, SAM, and Sora springing up like mushrooms after rain. 2023 has been hailed as the "Year of Large Models."

Governments around the world now regard artificial intelligence as a revolutionary technology with significant strategic implications. Data shows that the adoption rate of generative artificial intelligence in China has reached 15% this year, with a market size of approximately 14.4 trillion yuan. Adoption of generative AI in the manufacturing, retail, telecommunications, and healthcare industries has all grown rapidly.

As one of the three key elements driving the development of artificial intelligence (alongside data and algorithms), computing power is called the "engine" and core driving force of AI. Computing power refers to a device's ability to process data and produce specific results. According to Chen Yuanmou, a senior analyst at the China Academy of Information and Communications Technology, every one-point increase in the computing power index contributes approximately 0.36 percentage points to digital economy growth and 0.17 percentage points to GDP growth.
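Taken at face value, the analyst's figures imply a simple linear relationship; a minimal sketch (assuming the quoted per-point contributions scale linearly, which is an illustrative simplification, not CAICT's actual model):

```python
# Per-point contributions quoted above (an illustrative assumption,
# treating them as constant linear coefficients).
DIGITAL_ECONOMY_PP_PER_POINT = 0.36
GDP_PP_PER_POINT = 0.17

def growth_contribution(index_increase: float) -> dict:
    """Percentage-point growth contributions for a given rise in the index."""
    return {
        "digital_economy_pp": round(index_increase * DIGITAL_ECONOMY_PP_PER_POINT, 4),
        "gdp_pp": round(index_increase * GDP_PP_PER_POINT, 4),
    }

# A hypothetical 5-point rise in the index:
print(growth_contribution(5))  # {'digital_economy_pp': 1.8, 'gdp_pp': 0.85}
```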

The shortage of computing power has become a critical constraint on AI research and application. The United States has restricted sales of high-end computing products to China, and companies such as Huawei, Loongson, Cambricon, Sugon, and Hygon have been added to the Entity List. With their access to advanced chip manufacturing restricted, the process nodes China can mass-produce, and the performance of its core computing chips, lag the international state of the art by two to three generations.

01 Computing Power Shortage Drives the Emergence of Computing Power Centers

In the 21st century, mobile computing and cloud computing have flourished. Cloud computing allows computing power to "flow," like water and electricity, to wherever it is needed.

The rise of artificial intelligence has placed higher demands on computing power. The emergence of specialized hardware such as GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units) has greatly improved processing efficiency and provided strong support for the training and inference of machine learning models.

This, together with the shortage of computing power, has fueled the emergence of a huge market for computing power centers. A computing power center is a computing facility that combines high-performance computing, large-scale storage, and high-speed networking, aiming to provide large-scale, efficient, and low-cost computing power services.

Taking China as an example, many regions across the country are accelerating the layout of public computing power infrastructure. In Shanghai, the first national computing power trading platform and public computing power service platform have been established. In Guangzhou, the first domestic computing power resource release and sharing platform has been built. These public platforms have effectively connected supply and demand.

Currently, China is constructing national computing power hub nodes in 8 regions and planning 10 national data center clusters to build a national computing power network. By the end of 2023, there were 128 smart computing center projects in China; the 83 projects that disclosed their scale totaled over 77,000P (petaFLOPS) of computing power. A further 39 smart computing center projects were slated to enter operation in 2024.

02 Smart Computing Gap Persists, Three Major Telecom Operators Deploy Smart Computing Centers

In recent years, with the continuous emergence of AI large models, demand for smart computing has been rising rapidly. Market research firm IDC predicts that by 2026, China's intelligent computing power will reach the zettascale level of 10^21 floating-point operations per second (ZFLOPS), at 1,271.4 EFLOPS. The "High-Quality Development Action Plan for Computing Infrastructure," released by six government departments, sets out the pace of computing power construction over the next three years. It puts the smart computing construction gap for 2023-2024 at 23 EFLOPS. By 2025, the national computing power target is to exceed 300 EFLOPS, with smart computing accounting for 35%, i.e. a smart computing target of 105 EFLOPS.
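The unit relationships and targets in the plan are easy to sanity-check; a quick sketch using the standard SI prefixes (1 EFLOPS = 10^18 FLOPS, 1 ZFLOPS = 10^21 FLOPS):

```python
# Sanity-check the capacity figures quoted above using standard SI prefixes.
EFLOPS = 10**18
ZFLOPS = 10**21

# IDC's 2026 forecast of 1,271.4 EFLOPS sits just past the zettascale mark.
idc_2026 = 1271.4 * EFLOPS
print(round(idc_2026 / ZFLOPS, 4))  # 1.2714 (ZFLOPS)

# The 2025 plan: >300 EFLOPS nationwide, of which 35% is smart computing.
total_2025 = 300   # EFLOPS
smart_share = 0.35
print(round(total_2025 * smart_share, 2))  # 105.0 EFLOPS, matching the stated target
```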

In response, the three major telecom operators have been actively deploying smart computing centers and proposing related strategic deployments.

China Unicom has laid out a "1+N+X" smart computing capability system: one ultra-large-scale single smart computing center, N smart computing training and inference hub centers, and X smart computing inference nodes with localized capabilities.

China Mobile has strengthened its "4+N+31+X" data center layout, covering computing resources across hotspot regions, central nodes, and the edge, with over 1,000 edge nodes built. In this system, "4" refers to the four hotspot business regions of Beijing-Tianjin-Hebei, the Yangtze River Delta, the Guangdong-Hong Kong-Macao Greater Bay Area, and Chengdu-Chongqing; "N" refers to super-large data centers planned within the 10 national data center clusters; "31" refers to super-large data centers planned in each province; and "X" refers to city-level data centers and aggregation rooms.

China Telecom has proposed "cloud-network integration" and formed a "2+4+31+X+O" computing power layout: integrated resource pools at its two national cloud bases in Inner Mongolia and Guizhou; large-scale public clouds in four major regions, including Beijing-Tianjin-Hebei; localized exclusive clouds in 31 provincial capitals and key cities; differentiated edge clouds at X nodes; and, overseas, an extension of the computing power system to countries along the "Belt and Road."

03 Massive Investments by the US, Europe, and Japan Spark a Global AI Computing Power "Arms Race"

Currently, countries around the world are formulating their own artificial intelligence strategies and policies to promote the development of the AI industry.

The United States released its "National Artificial Intelligence Research and Development Strategic Plan" in 2016, which explicitly called for strengthening AI infrastructure. The European Union likewise set the goal of strengthening infrastructure in its AI strategy released in 2018. Such infrastructure mainly includes computing resources, data resources, and talent. Japan has followed in the footsteps of the United States, issuing three versions of its "AI Strategy" in 2019, 2021, and 2022. In April of last year, the Japanese government established an AI strategy team headed by Deputy Chief Cabinet Secretary Hideki Murai, with members including officials in charge of AI policy from the Cabinet Secretariat, the Ministry of Foreign Affairs, and the Digital Agency.

Under a series of strategic deployments, countries and regions such as the United States, Japan, and Europe are also competing to build computing power centers, sparking a global "arms race" in AI computing power.

Last November, U.S. national supercomputing centers and many leading AI companies jointly established the Trillion Parameter Consortium (TPC). The consortium, composed of scientists from around the world, aims to jointly advance AI models for scientific discovery, with a particular focus on giant models with a trillion or more parameters. TPC is currently developing scalable model architectures and training strategies, and organizing and curating scientific data for model training on current and future exascale computing platforms.

In addition, Oak Ridge National Laboratory and Lawrence Livermore National Laboratory under the U.S. Department of Energy, together with IBM and NVIDIA, established supercomputing centers of excellence to develop a new generation of high-performance computers using IBM POWER processors and NVIDIA GPU accelerators, with floating-point performance of at least 100 petaflops and up to 300 petaflops.

In December 2020, the European Union planned to allocate 7.5 billion euros for the "Digital Europe" program, with 2.2 billion euros for supercomputing and 2.1 billion euros for artificial intelligence. The specific plan includes: acquiring at least one exascale supercomputer by the end of 2021; establishing a pan-European data space and testing facilities for artificial intelligence in the fields of health, manufacturing, and energy; deploying a pan-European quantum communication infrastructure and supporting the establishment of a network security product certification program; and setting up master's programs in artificial intelligence, advanced computing, and network security.

In March of last year, the British government pledged to invest £1 billion (US$1.3 billion) in supercomputing and artificial intelligence research, aiming to make the UK a "technology superpower." As part of this strategy, the government said it hopes to spend approximately £900 million to build an "ultra-large-scale" computer on which it could develop its own "BritGPT," comparable to OpenAI's generative AI chatbot.

In April of this year, Japan's Ministry of Economy, Trade and Industry announced subsidies totaling 72.5 billion yen for five Japanese companies to build AI supercomputers, aiming to reduce technological dependence on the United States. The government will provide 50.1 billion, 10.2 billion, 1.9 billion, 2.5 billion, and 7.7 billion yen, respectively, to Sakura Internet, telecom giant KDDI, GMO Internet, Rutilea, and Highreso. Reports indicate that Japan's National Institute of Advanced Industrial Science and Technology will build a supercomputer as early as this year with roughly 2.5 times the computing capacity of its existing machines. Under the ministry's supervision, the institute will offer this supercomputer via cloud services to Japanese companies developing generative AI.

In addition to government-supported projects, global technology companies are also investing in building computing power. Amazon plans to invest $148 billion over the next 15 years to build data centers around the world to meet the demand for artificial intelligence. Google announced a $3 billion investment in building or expanding data center campuses in Virginia and Indiana. Microsoft and OpenAI are also conducting a five-stage supercomputer construction project, with planned investments exceeding $115 billion, most of which will be used to purchase computing power facilities needed for AI.

04 Operators Launch Large-Scale Procurement, AI Chip Market Erupts

The large-scale construction of computing power centers has also led to large-scale procurement of AI chips.

Recently, China Mobile launched a large-scale joint procurement of AI servers for its new smart computing centers, attracting widespread industry attention. The procurement covers 2024 to 2025, and the tender announcement puts its total scale at 8,054 units. Based on previous winning bid prices, some institutions estimate the procurement may exceed 15 billion yuan.
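Dividing the estimated total by the tendered unit count gives a rough implied average price; a back-of-the-envelope sketch (treating the 15-billion-yuan figure as an outside estimate, not an official price):

```python
# Implied average price per tendered unit, using the article's figures.
# The 15-billion-yuan total is an institutional estimate, so this is
# only a rough order-of-magnitude check.
units = 8054                  # total units in the tender
estimated_total_yuan = 15e9   # ~15 billion yuan (estimated)

implied_unit_price = estimated_total_yuan / units
print(f"~{implied_unit_price / 1e6:.2f} million yuan per unit")  # ~1.86 million
```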

A month ago, China Unicom also initiated the procurement of over 2,500 AI servers, and China Telecom had already taken action earlier. With the three major operators launching large-scale tenders, the domestic computing power deployment is seen as being on the "fast track" within the industry.

Just two months ago, China Mobile also released the 2023-2024 New Smart Computing Center (Trial Network) joint procurement project, with 12 packages corresponding to the procurement of a total of 2,454 AI training servers (packages 1-11 jointly procuring 1,204 units, package 12 jointly procuring 1,250 units).

At the end of March, China Unicom released the pre-qualification announcement for its 2024 Artificial Intelligence Server Centralized Procurement Project. The announcement shows that the project has been approved, with the tenderers being China United Network Communications Co., Ltd. and its provincial branches, Unicom Digital Technology Co., Ltd., and others. China Unicom will purchase a total of 2,503 artificial intelligence servers and 688 sets of RoCE network equipment, and the procurement will not be divided into packages.

In October of last year, China Telecom announced the evaluation results of its AI computing server (2023-2024) centralized procurement project, with xFusion, Inspur, New H3C, and other manufacturers shortlisted for a total purchase of 4,175 AI servers and 1,182 switches.

05 Computing Power Center Construction Benefits AI Chip Manufacturers

At present, the main enterprises constructing computing power centers include operators, large cloud service enterprises, and large Internet enterprises. These enterprises are well-funded, large in scale, and able to bear the huge costs of constructing computing power centers. At the same time, they have a huge demand for computing power and also have abundant downstream customers to sell computing power to.

On October 17, 2023, the U.S. Department of Commerce issued new export control rules under ECCNs 3A090 and 4A090, further restricting exports of high-performance AI chips and adding 13 Chinese companies to the Entity List. Products covered by the revised controls include, but are not limited to, NVIDIA's A100, A800, H100, H800, L40, L40S, and RTX 4090. Because of these U.S. restrictions on China's chip purchases, the markets for computing power centers and the AI chips behind them have split into two separate markets, domestic and international.

The huge domestic computing power market has benefited domestic chip manufacturers. Recently, China Mobile officially announced the completion and operation of the world's largest single-site intelligent computing center, the China Mobile Intelligent Computing Center (Hohhot). The project has deployed approximately 20,000 AI accelerator cards, with an AI chip localization rate of over 85%.

China Unicom has also recently established the first "government + operator" intelligent computing center in Beijing, which likewise runs on domestically produced Ascend AI software and hardware.

Earlier, China Telecom's Shanghai branch brought online its "Large-Scale Computing Power Cluster and Artificial Intelligence Public Computing Power Service Platform" in Shanghai, the largest operator-run intelligent computing center in the country, with a 15,000-card computing cluster built on domestically produced AI chips. The central China intelligent computing center that began operations at the start of the year likewise adopted an architecture based on a domestically produced AI software and hardware platform.

It is clear that domestic computing power centers mostly use domestically produced AI software and hardware. GPUs currently account for the largest share of the AI chip market, and the main beneficiaries of domestic AI chip procurement include representative Chinese companies such as Huawei, Cambricon, Horizon Robotics, and Bitmain. Last year, Baidu ordered 1,600 Ascend 910B AI chips for 200 servers.

According to industry estimates, thanks to the tightened restrictions on NVIDIA, the new market space for domestically produced AI chips in 2024 will exceed 700 billion yuan.

Major markets outside China face relatively few restrictions on chip procurement. The global AI chip market is currently dominated by Western giants led by NVIDIA; industry data shows NVIDIA all but monopolizes the market with roughly an 80% share. NVIDIA CEO Jensen Huang has previously announced plans to build an "AI factory" in Japan, which would prioritize supplying GPU demand there.

06 Competition Intensifies as Major Companies Develop Their Own AI Server Chips

It is widely held that the biggest beneficiaries of the artificial intelligence wave are the AI chip manufacturers "selling shovels." Data shows that chips account for about 32% of the total cost of a basic server, and as much as 50% to 83% in high-performance servers.

The high cost has led to an increasing number of Internet and IT equipment giants starting to develop their own AI server chips.

In 2016, Google launched its self-developed AI accelerator, the Tensor Processing Unit (TPU). Around 2022, Google began developing Arm-based server CPUs, and in April 2024 it released its self-developed Arm CPU, Axion, announcing that the chip is already in internal use.

In 2020, Microsoft began customizing chips for its Azure cloud service, and in November 2023 it launched two self-developed chips, Maia 100 and Cobalt 100. Maia 100 is designed specifically for training and inference of large language models and uses TSMC's 5nm process, while Cobalt 100 is a 128-core Arm-based server CPU.

In early April of this year, Meta released a new generation of its AI training and inference accelerator, MTIA, with more than twice the compute and memory bandwidth of the previous generation; it will help drive ranking and recommendation ad models on Facebook and Instagram.

Previously, reports said that the U.S. AI research company OpenAI is in talks with potential investors, including the government of the United Arab Emirates, on a project to expand global chip manufacturing capacity and reshape the semiconductor industry. One insider said the plan seeks to raise as much as $5 trillion to $7 trillion.

Domestic majors are not to be outdone and have also begun developing AI chips. Recently, at its 2024 Computing Power Network Conference, China Mobile officially released the Da Yun Panshi DPU, with 400 Gbps of bandwidth, a domestically leading level.

