
AI start-up DeepSeek’s ‘real’ costs and computing power debated as chip stocks reel

The amount of computing power DeepSeek used to train its models has become a subject of intense interest for artificial intelligence (AI) experts and investors over the past week, as the answer could have significant implications for the technology’s future development.

In a published paper on its DeepSeek-V3 large language model (LLM), which was launched in December, the Chinese start-up claimed that training took just 2.8 million “GPU hours” at a cost of US$5.6 million, a fraction of the time and money that US firms have been spending on their own models.
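Those two numbers imply a rate of exactly US$2 per GPU hour. A minimal back-of-the-envelope sketch, assuming the quoted cost covers rented GPU time only; the 2,048-GPU cluster size below is an illustrative assumption, not a figure from this article:

```python
# Back-of-the-envelope check of DeepSeek's headline training figures.
gpu_hours = 2.8e6        # reported training compute, in GPU hours
total_cost_usd = 5.6e6   # reported training cost, in US dollars

# Implied rental rate per GPU hour
rate = total_cost_usd / gpu_hours
print(f"Implied rate: ${rate:.2f} per GPU hour")    # $2.00

# Wall-clock time if that compute ran on an assumed 2,048-GPU cluster
cluster_size = 2048      # illustrative assumption, not from the article
days = gpu_hours / cluster_size / 24
print(f"~{days:.0f} days on {cluster_size} GPUs")   # ~57 days
```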

DeepSeek-R1, the company’s open-source reasoning model released on January 20, has demonstrated capabilities comparable to those of leading models from OpenAI, Anthropic and Google, at a significantly lower training cost. The paper on R1 did not disclose its development costs.

The low cost and strong performance of DeepSeek’s models have cast doubt on the need for the eye-watering capital expenditure of US tech giants, particularly on expensive AI chips. That doubt fuelled a sell-off of Nvidia shares last week that wiped nearly US$600 billion off the chip maker’s market value in a single day.

[Video, 05:10: Chinese AI disrupter DeepSeek claims top spot in US App Store, dethroning ChatGPT]

DeepSeek’s own records, and those of its affiliated hedge fund High-Flyer Quant, show that the company is one of the best-resourced entities for training AI. As early as 2019, Liang Wenfeng, the founder of High-Flyer and DeepSeek, spent 200 million yuan (US$27.8 million) on 1,100 graphics processing units (GPUs) to train algorithms for stock trading. According to company documents, High-Flyer’s computing centre at the time covered an area equivalent to a basketball court, around 436.6 square metres (4,700 sq ft).
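Those 2019 figures also imply a per-unit cost. A quick sketch using only the numbers quoted above; note the reported total may have covered supporting infrastructure as well as the cards themselves, so the per-GPU figure is at best an upper bound:

```python
# Implied per-GPU spend from High-Flyer's reported 2019 purchase.
total_yuan = 200e6   # reported outlay
total_usd = 27.8e6   # the article's own dollar conversion
gpu_count = 1100     # reported number of GPUs

print(f"~{total_yuan / gpu_count:,.0f} yuan per GPU")   # ~181,818 yuan
print(f"~US${total_usd / gpu_count:,.0f} per GPU")      # ~US$25,273
```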

In 2021, the fund spent 1 billion yuan developing its supercomputing cluster Fire-Flyer 2, which was expected to reach 1,550 petaflops, a measure of computing power, according to High-Flyer’s website. That would put its performance on a par with some of the world’s most powerful supercomputers.
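For scale, one petaflop is 10^15 floating-point operations per second, so the quoted figure works out to roughly 1.55 quintillion operations per second. A short conversion; note that vendor figures for AI clusters often quote reduced-precision throughput, which is not directly comparable to the double-precision benchmarks used to rank supercomputers:

```python
# Converting High-Flyer's quoted Fire-Flyer 2 figure to raw throughput.
petaflops = 1550
ops_per_second = petaflops * 1e15   # 1 petaflop/s = 10^15 ops/s
print(f"{ops_per_second:.2e} floating-point operations per second")
# 1.55e+18, i.e. 1.55 exaflops
```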

