
Mark Zuckerberg's latest interview: Why is Meta's most powerful open-source model, Llama 3, worth billions of dollars?

Author: Friends of 36Kr | Published: 2024-04-19

Tencent Technology News, April 19 - According to foreign media reports, on Thursday local time in the United States, Meta, the parent company of Facebook, launched its most powerful open-source artificial intelligence (AI) model to date, Llama 3, aiming to catch up with front-runner OpenAI in a fiercely competitive industry. The newly released Llama 3 includes two versions, with 8 billion and 70 billion parameters, and a top-of-the-line version with over 400 billion parameters will be launched in the future, highlighting Meta's ambitions in AI.

It is reported that Llama 3 has demonstrated outstanding performance in multiple industry benchmark tests and has added many new features, such as improved reasoning capabilities. Meta plans to deeply integrate Llama 3 into its virtual assistant Meta AI, which is widely used in popular applications such as Facebook, Instagram, WhatsApp, and Messenger, and is about to undergo a new round of updates to provide users with a more intelligent and convenient experience.

In addition, Meta also announced that Llama 3 will soon be launched on platforms such as Amazon AWS, Google Cloud, IBM's WatsonX, Microsoft Azure, and NVIDIA's NIM, and has received support from hardware giants such as AMD, Dell, Intel, and NVIDIA. This series of collaborations and integrations will undoubtedly further accelerate the popularity and application of Llama 3 globally.

At this important moment for Meta, the company's CEO Mark Zuckerberg gave an interview to the well-known tech podcast host Dwarkesh Patel. They had in-depth discussions on topics such as Llama 3, artificial general intelligence (AGI), energy bottlenecks, the strategic significance of AI technology, the potential risks of open source, and the metaverse. Zuckerberg also shared the thinking behind decisions such as open-sourcing a $10 billion model and custom chip source code.

The following is the transcript of the interview:

1. The top-of-the-line version of Llama 3 is currently in training

Patel: Mark, it's a great honor to have you on our podcast.

Zuckerberg: Thank you for the invitation, Patel. I'm glad to be here. I've always enjoyed your podcast.

Patel: That's great, thank you! Now, let's talk about Llama 3! Please share with me some highlights and exciting new developments about this latest large model and Meta AI.

Zuckerberg: I think most people may be more interested in the new version of Meta AI, but in fact, our efforts in model upgrades are the most important. We are launching Llama 3. We will provide it as an open-source project to the developer community and also use it to support Meta AI. Regarding Llama 3, I believe we will have many interesting topics to discuss. But I think the most important thing is that now we believe Meta AI is the smartest and most freely available AI assistant that people can use anytime, anywhere.

In addition, we have also integrated real-time knowledge from Google and Bing, allowing the AI assistant to provide more accurate and comprehensive information. We plan to make it more prominent in our applications, such as at the top of Facebook and Messenger, where you will be able to directly use the search box to ask any questions. In addition to these, we have added some new creative features, which I think are very cool and I believe everyone will love.

Take the animation feature: you can easily turn any picture into an animation, which is great fun. There is also an amazing feature that generates and updates high-quality images in real time as you type. You just enter a query, such as "show me a scene of eating Hawaiian fruit and drinking beer in a field, with cows and mountains in the background," and it updates the image in real time as you type. It is a truly magical experience, and I believe everyone will love this feature.

These are some of the obvious changes most people will see. We are gradually rolling out these new features; they are not yet available globally, but we will start with a few countries and expand over the coming weeks and months.

I think this will be a very big breakthrough, and I am excited for everyone to experience it. But if you want to go deeper, Llama 3 is undoubtedly the most technically interesting part for us. We are training Llama 3 models at three different scales: 8 billion, 70 billion, and 405 billion parameters.

Currently, the first two versions are ready, and the largest model is still in training. Although we cannot release the 405 billion parameter version today, I am confident in the performance of the 8 billion and 70 billion parameter models. They are industry-leading at their respective scales, and we will publish detailed benchmark results in blog posts so everyone can get a deeper understanding of their performance.

Of course, Llama 3 is open-source, which means developers will have the opportunity to try it and explore its potential. We also have a carefully planned roadmap that will bring multimodality, more language support, and a longer context window to the model. We expect to launch the highly anticipated 405 billion parameter version later this year. Based on current training progress, its MMLU (Massive Multitask Language Understanding) score is approaching 85, and we expect it to deliver outstanding performance across numerous benchmarks.

As for the 70 billion parameter model, it also performs exceptionally well. Today, we are officially releasing it with an MMLU score of around 82, and it has achieved impressive results in mathematics and reasoning. I believe letting users experience this model will be very interesting and meaningful.

I want to emphasize that even the 8 billion parameter model performs almost as well as the largest version of Llama 2 that we released previously. In other words, even the "smallest" Llama 3 is nearly as capable as the "largest" Llama 2.

Patel: Before we dig into these models, I'd like to look back at history. I remember in 2022, Meta's stock price dropped significantly, and there was skepticism about your substantial investment in Nvidia H100 chips. The concept of the metaverse was not widely accepted by the market at that time. What considerations led to your decision to invest in H100 GPUs? How did you anticipate the demand for them?

Zuckerberg: I think at that time we were in the development stage of Reels. We have always believed in reserving enough capacity to handle unforeseen innovations, and Reels was exactly such a case. We found that we needed more GPUs to train models. This was a significant shift, because our service was no longer just ranking content from people or pages you follow; it had begun heavily recommending so-called "unconnected content," content from people or pages you don't follow.

As a result, the pool of candidate content we might show grew from thousands of items to billions, which naturally required new infrastructure to support it. We were already building that infrastructure, but in racing to catch up with TikTok we hit bottlenecks and couldn't move as fast as we wanted. Seeing this, I realized, "We must make sure we never end up in this passive position again." So we not only ordered enough GPUs to handle Reels and content ranking, we doubled the order. The principle we always adhere to is that there will always be something new we didn't foresee, and we must be prepared for it.

Patel: Did you know it would be related to artificial intelligence?

Zuckerberg: We initially thought it would be related to training large models, but then I realized it was more closely tied to content recommendation. Running a company is like playing a game: there are always new challenges. At the time, I was fully committed to Reels and other content recommendation features, hoping they would have a huge impact. Today, Instagram and Facebook can show users content they are interested in even if it comes from people they don't follow, which is undoubtedly a huge leap. Looking back, that decision was clearly wise, and it stemmed from the lessons we learned from falling behind. This is not to say we were ever "far ahead"; in fact, many decisions look correct now because we made mistakes first and learned from them.

Patel: In 2006, you turned down a $1 billion acquisition offer, but I assume there must be some price at which you would have considered selling Facebook, right? Did you have a valuation in mind, thinking "this is the true value of Facebook, and they didn't offer that price"? I know if they had offered you $5 trillion, you would have accepted. So how did you approach that decision, and based on what considerations?

Zuckerberg: I think this was mainly a personal choice. Looking back, I'm not sure I was mature enough to make such a decision at the time. There was a lot of discussion about the $1 billion price, and people analyzed it on various grounds, such as expected revenue and scale. But those discussions were far beyond the stage we were at. Honestly, I didn't have enough financial knowledge to participate in them, but deep down I had a firm belief in what we were doing.

I also did some simple analysis, like: if I didn't do this, what would I do? I really enjoy building new things, helping people communicate, and understanding people's dynamics and how they interact. So I figured that if I sold the company, I would probably just go build another similar one, and I was quite happy with the company as it was. So why sell it? I think many of the major decisions people make are actually based on belief and values. In fact, it is very difficult to accurately predict the future through analysis.

2. The Road to AGI

Patel: Facebook AI Research (FAIR) has existed for a long time, and now it seems deeply embedded in the core of your company. At what point did building artificial general intelligence (AGI), or however you frame that grand goal, become Meta's top priority?

Zuckerberg: In fact, this shift has quietly occurred for some time. About 10 years ago, we founded FAIR. The original intention was that on the path to AGI or similar goals, there would be many innovations emerging, and these innovations would continuously drive the progress of all our businesses. Therefore, we did not conceive FAIR as an independent product, but as a research team. In the past 10 years, FAIR has created many unique achievements, bringing significant improvements to all our products. It has driven the development of multiple fields and provided inspiration for other innovators in these fields, thus creating more technologies to improve our products. This makes me very excited.

In recent years, with the rise of ChatGPT and the emergence of diffusion models for image creation, we have clearly felt the winds of change. These new technologies are remarkable and will profoundly change the way people interact with all kinds of applications. So we decided to establish a second team, the generative AI (GenAI) team, with the aim of bringing these cutting-edge technologies into our products and building leading foundation models that can support all of our different products.

When we began this exploration, our initial thought was that much of what we do has a strong social aspect. It helps people interact with creators, helps people communicate with businesses, and also helps businesses sell products or provide customer service. Additionally, it can also serve as an intelligent assistant, integrated into our applications, smart glasses, and virtual reality. Therefore, initially, we were not entirely sure if we needed a complete general artificial intelligence to support these use cases. However, as we delved deeper into these nuances, I gradually realized that the support of general artificial intelligence is indeed essential. For example, when developing Llama-2, we did not prioritize coding functionality because people do not ask Meta AI a lot of coding questions on WhatsApp.

Patel: Do they now?

Zuckerberg: I don't know, and I'm not sure if WhatsApp, Facebook, or Instagram will become interfaces where users ask a lot of coding questions. Perhaps it will be more common on our upcoming Meta.AI website. However, over the past 18 months, we have been surprised to find that coding actually plays a crucial role in many fields, not just limited to the programming industry. Even if users do not directly ask coding-related questions, training the model in coding helps it answer questions more accurately and demonstrate exceptional reasoning abilities in different fields. For example, with Llama-3, we focused on optimizing it through extensive coding training, as this would make it perform well in various aspects, even if the user's main focus is not coding questions.

Reasoning ability is another excellent example. Imagine when you interact with a creator, or when a business tries to interact with a customer, this interaction is far from a simple "you send a message, I reply" pattern. It involves a multi-step, deep thinking process that requires us to think about "how to better achieve this person's goal?" Many times, customers are not clear about what they really need or how to accurately ask questions. Therefore, simply answering questions is not the entirety of artificial intelligence's work. We need to think more comprehensively and deeply, which has actually transformed into a reasoning problem. If a team makes a significant breakthrough in reasoning, while we are still at the basic chatbot stage, our product will pale in comparison to products built by other teams. Ultimately, we realized that in order to stay ahead, we must fully address the issue of general intelligence, so we have increased our efforts and investments to ensure that we can make this breakthrough.

Patel: So, would a version of Llama that can handle all these user use cases be powerful enough to replace all the programmers in this building?

Zuckerberg: I believe that over time, these technologies will mature and show tremendous potential. However, whether Llama-10 or future versions can completely replace programmers is a complex question. I don't think we are trying to replace humans, but rather hoping to empower people with these tools to enable them to accomplish work that was previously difficult to imagine.

Patel: Suppose our programmers become ten times more productive after using Llama-10 in the future?

Zuckerberg: I have high expectations for this. I firmly believe that human intelligence is not measured by a single standard, as everyone has unique skills and talents. At some point, artificial intelligence may surpass the abilities of most humans in certain aspects, but this entirely depends on the strength of the model. However, I think this is a gradual evolutionary process, and general artificial intelligence is not something that happens overnight. We are actually gradually adding different capabilities to the model.

Currently, multimodality is an area of focus for us, from initial photos, images, and text to future involvement with videos. Given our strong interest in the metaverse, 3D technology is also particularly important. Additionally, one modality I am particularly interested in is emotional understanding, which is an area that I rarely see other teams deeply researching in the industry. After all, most of the human brain's functions are dedicated to understanding others, interpreting expressions, and emotions. I firmly believe that if we can make a breakthrough in this area, enabling artificial intelligence to truly understand and express emotions, the interaction between humans and machines will become unprecedentedly natural and profound.

You might think this falls purely under video or images, but in fact those are very specialized forms of human emotional expression. So, in addition to improving the model's reasoning and memory, we need to pay attention to many other capabilities. I believe that in the future we will not be satisfied with just typing a query into a box to get answers. We will have different ways of storing memory, or customized models that serve people in a more personalized way. These are all capabilities artificial intelligence needs to develop.

Of course, we also need to address model size. We care about large models, but also about how to run small models within tight constraints. If you are running a large service like Meta AI, it mainly relies on powerful server-side compute. But we also expect these advanced capabilities to fit into compact devices such as smart glasses. Since the space in smart glasses is very limited, we need efficient, lightweight solutions adapted to that environment.

Patel: Suppose we invest $10 billion, or even up to $100 billion, in implementing intelligent reasoning on an industrial scale, what specific use cases will these funds be used for? Is it simulating technology? Or artificial intelligence applications in the metaverse? How can we effectively use data centers to support these use cases?

Zuckerberg: According to our predictions, intelligent reasoning will profoundly change almost all product forms. I believe that in the future, we will see the emergence of a Meta AI universal assistant product. This product will evolve from traditional chatbots, from simply answering questions to being able to receive and execute more complex tasks. This will require a large amount of reasoning ability and will also trigger a huge demand for computing power.

In addition, interacting with other intelligent agents will also become an important part of our work, whether serving businesses or creators. I believe that humans will not only interact with a universal artificial intelligence, but every business will want to have an artificial intelligence that represents its interests. These artificial intelligences will not be primarily used to sell competitors' products, but to interact with businesses, creators, and consumers in a unique way.

It is worth mentioning that creators will be an important group benefiting from this technology. We have about 200 million creators on our platform, and they generally feel that they don't have enough time in a day, while their community is eager to interact with them. If we can develop a technology that allows creators to train their own artificial intelligence and use it to interact with the community, that would be a very powerful feature.

These are just some of the consumer use cases. Take the Chan-Zuckerberg Initiative, which my wife and I operate, for example. We are doing a lot of work in the scientific field, and artificial intelligence will undoubtedly play a key role in advancing science, healthcare, and other fields. Ultimately, intelligent reasoning will impact almost every product and economic sector.

Patel: You mentioned artificial intelligence that can perform multi-step tasks, which makes me curious: does that mean we need a larger model to get there? For example, for Llama-4, do we need a 70 billion parameter version that, trained on the right data, can demonstrate amazing capabilities? Where is progress mainly coming from right now? Is it scaling up model size, or, as you mentioned earlier, keeping the size constant while diversifying capabilities and application scenarios?

Zuckerberg: We may not have a clear answer to this question at the moment. But one obvious trend I have observed is that we have a basic Llama model, and then we build some application-specific code around it. Some of this code is fine-tuning for specific use cases, but some is about how to make Meta AI collaborate with tools like Google and Bing to obtain real-time knowledge, which is not part of the basic Llama model. In the development of Llama-2, we tried to integrate some of these features into the model, but more was done manually. For Llama-3, we set a goal to embed more of these features into the model itself. As we start exploring more behaviors similar to intelligent agents, I believe that some of these features still need to be optimized manually. For Llama-4, our goal is to naturally integrate more of these features into the model.

At each step of progress, you can feel the possible directions for future development. We are trying various possibilities and conducting experiments around the model. This helps us to understand more deeply which features should be included in the next version of the model. This way, our model can become more universal, because obviously, any feature implemented through manual coding, although it can unlock some use cases, is fundamentally fragile and not universal enough. Our goal is to make the model able to self-learn and self-evolve to adapt to various complex and changing scenarios.

Patel: You mentioned "embedding more content into the model itself." Could you explain specifically how you train to embed these desired features into the model? What do you mean by "embedding into the model itself"?

Zuckerberg: Take Llama-2 as an example, its tool usage capability is relatively specific and limited. By the time Llama-3 came around, we were delighted to find that its tool usage capability had significantly improved. Now, we no longer need to manually code everything to make it able to use Google for searches; it can do these tasks independently. Similarly, in programming, running code, and a range of other tasks, Llama-3 has also demonstrated excellent capabilities. Once we have this capability, we can foresee what new possibilities we can start exploring next. We don't have to wait for the appearance of Llama-4 to start building these capabilities, so we can experiment and try various things around it in advance. Although these manual coding processes may temporarily make the product better, they also indicate the direction of what should be built in the next version of the model.

Patel: In the fine-tuning of Llama-3 in the open-source community, what use cases are you most looking forward to? Perhaps not the most practically valuable one for you, but the one that interests you the most and you are most eager to try. For example, I heard that someone made fine-tuning adjustments in ancient history, allowing us to directly converse with historical figures such as the ancient Roman poet Virgil.

Zuckerberg: I think the charm of such things is that they always surprise us. Any specific use case people find valuable is worth exploring. I believe we will see more streamlined versions of the models emerge. I also look forward to seeing models with fewer parameters, say 1 to 2 billion, or even a 500 million parameter model, and what interesting, efficient applications they enable. If an 8 billion parameter model is almost as powerful as the largest Llama 2, then a 1 billion parameter model should also prove its unique value in certain areas. They can be used for classification tasks, or for preprocessing user queries to understand intent before handing them off to more powerful models for precise processing. I think this is an area where the community can play a huge role and help us fill gaps in applying these models. Of course, we are also considering distilling and optimizing these models ourselves, but right now all our GPU resources are devoted to training the 405 billion parameter model.
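The routing pattern described here, where a small, cheap model triages a query before a larger model handles it, can be illustrated with a minimal sketch. To be clear, this is hypothetical code, not anything Meta has published: the keyword-based classifier below merely stands in for a real small (roughly 1 billion parameter) model, and all function names are invented for illustration.

```python
# Sketch of two-tier query routing: a cheap classifier triages intent,
# then the query is sent to either a small or a large model backend.
# The keyword classifier is a stub standing in for a real small LLM.

def classify_intent(query: str) -> str:
    """Stand-in for a small classifier model that labels query intent."""
    q = query.lower()
    if any(w in q for w in ("def ", "function", "compile", "traceback")):
        return "coding"
    if any(w in q for w in ("integral", "solve", "prove", "equation")):
        return "reasoning"
    return "chat"

def route(query: str) -> str:
    """Send demanding intents to the large model, the rest to the small one."""
    intent = classify_intent(query)
    if intent in ("coding", "reasoning"):
        return f"[large-model] handling {intent}: {query}"
    return f"[small-model] handling {intent}: {query}"

print(route("How do I fix this traceback in my script?"))
print(route("What's a good name for a cat?"))
```

The design point is the one from the answer above: most traffic never needs the expensive model, so a tiny front-end classifier can cut serving cost substantially while preserving quality on hard queries.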

Patel: I noticed a striking point in the materials you shared earlier: the models are trained on far more data than would be compute-optimal for training alone. Given how important inference is for you and for the whole community, training on trillions of tokens does make a lot of sense.

Zuckerberg: Regarding the 70 billion parameter model, we observed an interesting phenomenon. We originally thought that as the amount of data increased, the model's performance would gradually saturate. However, after training on about 15 trillion tokens, we found the model was still learning. Even in the final stages of training, it continued to show strong learning, and we probably could have fed it even more tokens to improve it further.

But as the people running the company, we have to make a decision at some point: should we keep using our GPUs to train this 70 billion parameter model further, or move on to other things, like testing hypotheses for Llama-4? We need to strike a balance, and I think we have struck a good one with this version of the 70 billion parameter model. Of course, there will be other versions in the future, such as a 70 billion parameter multimodal version, which will arrive in the near future. But one fascinating thing is that the current model architectures can absorb such a huge amount of data.

3. The Energy Bottleneck

Patel: This is indeed thought-provoking. So, what does this mean for future models? You mentioned earlier that the 8 billion parameter version of Llama 3 even surpasses the 70 billion parameter Llama 2 in some respects.

Zuckerberg: No, no, I don't want to exaggerate. Their performance is actually quite similar in terms of orders of magnitude.

Patel: So, can we expect the 70 billion parameter version of Llama-4 to be comparable to the 405 billion parameter version of Llama 3? What will the trend look like?

Zuckerberg: This is indeed a big question. To be honest, no one can accurately predict. One of the most difficult things to predict in the world is the trend of exponential growth. How long will it continue? I firmly believe that we will continue to move forward. I think it is very worthwhile to invest $100 billion, or even more than $1 trillion, in building infrastructure. Assuming that this growth trend can continue, we will achieve some truly stunning results, thus creating amazing products. But no one in the industry can tell you for sure that it will continue to expand at that speed. Historically, we always encounter bottlenecks at some point. But today, people have very high expectations for this field, and perhaps these bottlenecks will be overcome soon. This is indeed a question that is worth our deep consideration.

Patel: What would the world look like if there were no such bottlenecks? Although this seems unlikely, what if technological progress really could continue to develop at this rate?

Zuckerberg: In any case, there will always be new challenges and bottlenecks. In the past few years, the production of GPUs has been a clear problem. Even companies with the money to buy GPUs often struggle to obtain the required quantity due to supply constraints. But this situation seems to be gradually improving. Today, we see more and more companies considering investing heavily in building infrastructure for GPU production. I think this situation will continue for some time.

In addition, capital investment is also a factor that needs to be considered. At what point does investing more capital no longer provide cost-effectiveness? In fact, I believe that before we encounter capital investment issues, energy issues will become apparent first. As far as I know, no one has yet been able to build a single 1-gigawatt training cluster. We will encounter increasingly difficult challenges globally, such as obtaining energy permits. This is not just a software issue; it involves strict government regulations, which I believe are even stricter than what many in the tech industry perceive. Of course, if you are starting from a small company, you may not feel this as strongly. But when we deal with different government departments and regulatory agencies, we need to comply with a large number of rules and ensure that we are compliant globally. However, there is no doubt that energy will be a major constraint we face.

If you are talking about building large new power plants or large buildings and need to cross other private or public land to construct transmission lines, then this will be a heavily regulated project. You need to consider the lead time of many years. If we want to establish a large facility, providing power for it will be a long and complex project. I believe people will strive to achieve this goal, but I don't think it will be as simple and miraculous as reaching a certain level of artificial intelligence, obtaining a large amount of capital, and investing in it, and then suddenly the models will make a leap in progress.

Patel: On the path to advancing artificial intelligence, will we hit bottlenecks that even a company like Meta cannot overcome alone? Are there projects that even Meta lacks the resources to complete, even if your R&D or capital expenditure budget grew tenfold? Things you have in mind that, given Meta's current state, you couldn't fund even by issuing stock or bonds?

Zuckerberg: Energy issues are undoubtedly a major challenge. I firmly believe that if we can solve the energy supply problem, we can completely build larger-scale computing clusters than we have now.

Patel: So, fundamentally, is this a limitation of funding bottlenecks?

Zuckerberg: Funding is indeed one aspect, but I think time is also an important factor. Currently, the scale of many data centers is approximately between 50 megawatts and 100 megawatts, with large ones possibly reaching 150 megawatts. Suppose you have a complete data center and all the necessary training equipment, and you have built the largest cluster allowed by current technology. I think many companies are already close to or have reached this level. However, when we talk about building data centers of 300 megawatts, 500 megawatts, or even 1 gigawatt, the situation is completely different. Currently, no one has attempted to build a 1-gigawatt data center. I believe it will be possible, but it will take time to accumulate. However, this will not happen next year, as many things involved will take several years to complete. From this perspective, I believe a 1-gigawatt-scale data center will require an energy supply equivalent to that of a nuclear power plant to support model training.

Patel: Has Amazon already attempted this? They seem to have a 950-megawatt facility.

Zuckerberg: I am not very familiar with the specific practices of Amazon; you may need to ask them directly.

Patel: Training does not necessarily have to be limited to a single location, right? If distributed training is effective, then we can actually consider dispersing it to multiple locations.

Zuckerberg: I think this is a very important issue concerning the future of training large models. From the current trend, it seems that generating synthetic data through inference and using this data for model training is a very promising direction. Although I am not yet clear about the proportion between this synthetic data and direct training, I believe that the generation of synthetic data is increasingly approaching the process of inference. Obviously, if this method is used for model training, it will become an indispensable part of the entire training process.

Patel: So finding that balance, and how it develops, is still an open question. Is this trend likely to show up in Llama-3, or in later versions such as Llama-4? In other words, once you release a model, entities with massive computing resources, such as Kuwait or the United Arab Emirates, could use it to make certain applications far more intelligent.

Zuckerberg: I completely agree with this possibility. Indeed, I think there will be such dynamic developments in the future. But at the same time, I also believe that the model architecture itself has certain fundamental limitations. Take Llama-3 as an example. Although we have made significant progress, I believe there is still room for further optimization in its architecture. As I mentioned earlier, we feel that the performance of the model can still be improved through providing more data or iterating on certain key steps.

In fact, we have seen many companies build new models on top of Llama-2's 70-billion-parameter model. However, for models like Llama-3's 70-billion or 405-billion-parameter versions, making a generational improvement is not easy, and there are currently no comparable open-source models. I think this is a huge challenge, but also a huge opportunity. Still, I believe that what people can build on the existing model architecture is not infinitely scalable. Before the next technological leap, we may only be able to make optimizations and improvements on the existing foundation.

4 Will AI go out of control overnight?

Patel: Now let's take a more macroscopic view. How do you think artificial intelligence technology will develop in the coming decades? Do you think it will make you feel like another technology, such as the metaverse or social technology, or do you think it has fundamentally different significance in human history?

Zuckerberg: I believe artificial intelligence will be a fundamental technology, more like the invention of the computer itself, and it will give rise to a whole new range of applications. Just as the emergence of the internet or mobile phones made many previously impossible things possible and led people to rethink those experiences, I think artificial intelligence will bring a similar transformation, but at a deeper level of innovation. My feeling is that it's like the transition from not having computers to having computers. However, it's really difficult to predict exactly how it will develop. On a cosmic time scale, this transformation will happen quickly, possibly within a few decades. Some people are indeed concerned that it may quickly get out of control, going from a certain level of intelligence to extreme intelligence overnight. But because of many physical limitations, I think that scenario is unlikely. I don't believe we will suddenly face runaway artificial intelligence overnight; I believe we will have enough time to adapt. Artificial intelligence will genuinely change the way we work, providing people with innovative tools to do different things, and it will let people pursue what they truly want to do more freely.

Patel: Perhaps not overnight, but from a cosmic time perspective, do you think we can view these milestones in this way? Human evolution, then the emergence of artificial intelligence, and then they may head towards the galaxy. This may take decades, or it may take a century, but is this the grand plan unfolding in your eyes? I mean, like other technologies such as computers or even fire, but is the development of artificial intelligence itself as important as the initial evolution of humans?

Zuckerberg: I think it's difficult to judge. Human history is basically a process of gradually realizing that we are not unique in some aspects, but at the same time, realizing that humans are still very special, right? We used to think that the Earth was the center of the universe, but that's not the case, yet humans still have extraordinary qualities, don't they? I think people often have another bias, that intelligence is closely related to life to some extent, but that's not the case. We still don't have a clear enough definition of consciousness or life to fully understand this issue. Many science fiction novels describe the creation of intelligent life, and these intelligences begin to exhibit various human-like behaviors, etc. But the current trend seems to indicate that intelligence can exist quite independently of consciousness, agency, and other qualities, making it a very valuable tool.

5 The Danger of Open Source

Zuckerberg: It's extremely challenging to predict the direction of these things as they develop over time, so I think anyone should avoid planning their development or use in a dogmatic way. We need to re-evaluate every time we release a new product. We are very supportive of open source, but it doesn't mean we will make everything public. I tend to think that open source is beneficial for both the community and ourselves because it promotes innovation. However, if at some point, there is a qualitative change in the capabilities of these technologies, and we feel that open sourcing is irresponsible, then we will choose not to make it public. There is a lot of uncertainty in all of this.

Patel: When you were developing Llama-4 or Llama-5, was there a possibility of a specific qualitative change that made you consider whether it should be open sourced?

Zuckerberg: It's difficult to answer this question from an abstract perspective because any product may have potential risks, and the key is how we effectively manage and mitigate these risks. In Llama-2, we have already faced some challenges and invested a lot of resources to ensure that it is not used for malicious purposes, such as violent behavior, etc. This doesn't mean it has become an intelligent entity just because it has a lot of knowledge about the world and can answer a range of questions that may bring risks. So, I think the issue is how to identify and mitigate its potential malicious behavior, rather than the behavior itself.

In my view, evaluating the good and bad of things involves multiple dimensions, and it's difficult to list all possibilities in advance. For example, with social media, we have dealt with various types of harmful behavior and categorized them into 18 or 19 categories. We have built artificial intelligence systems to identify these behaviors to reduce their occurrence on our platform. Over time, I believe we will further refine these classifications. This is a problem we have been working hard to study because we want to ensure a deep understanding of it.

Patel: I think it's very important to widely deploy artificial intelligence systems and give everyone the opportunity to use them. I would be disappointed if future artificial intelligence systems are not widely applied. At the same time, I also hope to have a deeper understanding of how to mitigate potential risks. If the mitigation measures are mainly fine-tuning, then the benefit of open sourcing model weights is that people can make more in-depth adjustments based on these capabilities. Currently, these models are far from reaching that level, more like advanced search engines. But if I could show them my petri dish and let them explain why my smallpox sample didn't grow and how to improve it, then in this case, how can we ensure the safe and effective use of these models? After all, some people may fine-tune these models to meet their own needs.

Zuckerberg: Indeed, this is a complex issue. I think most people would choose to directly use ready-made models, but there are also some unscrupulous people who may try to use these models for malicious behavior. So, this issue is indeed worth our deep consideration. From a philosophical perspective, the reason why I support open source so much is that I think if artificial intelligence becomes overly centralized in the future, its potential risks may be no less than its widespread dissemination. Many people are thinking, "If we can do these things, will the widespread application of these technologies in society be a bad thing?" At the same time, another issue worth considering is whether it is a bad thing if an organization has more powerful artificial intelligence than everyone else.

I can use an analogy from the field of security to explain. Imagine that if you could discover and exploit certain security vulnerabilities before anyone else, you could invade almost any system nearly effortlessly. This is not limited to artificial intelligence. So we cannot rely solely on one highly intelligent AI system to identify and fix every vulnerability, even though that seems theoretically feasible. How, then, does society address this? Open-source software plays a crucial role here. It allows improvements to extend beyond the scope of a single company and to be applied widely across all kinds of systems, including banks, hospitals, and government agencies. As software improves, thanks to more people being able to review and test it, standards for how it should work gradually emerge, and when upgrades are needed, the world can act together swiftly. I believe that in a world where artificial intelligence is widely deployed, these AI systems will be progressively hardened over time, and the different systems will keep one another in check.

In my view, this distributed, widespread deployment is healthier than a centralized approach. Of course, there are risks in all aspects, but I believe people have not fully discussed these risks. There is indeed a risk of artificial intelligence systems being used for malicious purposes. However, I am more concerned that an untrustworthy entity possesses a super powerful artificial intelligence system, which I think could be a greater risk.

Patel: Will they try to overthrow our government because they have weapons that others don't? Or just create a lot of chaos?

Zuckerberg: My intuition tells me that for economic, security, and various other reasons, these technologies will eventually become very important and valuable. If our enemies or people we don't trust gain more powerful technology, then this could indeed be a serious problem. Therefore, I think the best way to mitigate this may be to promote the development of good open-source artificial intelligence, make it an industry standard, and take a leading role in multiple aspects.

Patel: Open-source artificial intelligence systems do indeed help to establish a more fair and balanced playing field, which I find to be very reasonable. If this mechanism can operate successfully, it is undoubtedly the future I am looking forward to. However, what I want to further explore is, from a mechanistic perspective, how does open-source artificial intelligence prevent someone from using their artificial intelligence system to create chaos? For example, if someone tries to create a biological weapon, can we develop the corresponding vaccine at an extremely fast pace on a global scale to counter it? What are the specific operational mechanisms in this?

Zuckerberg: From a security perspective I mentioned earlier, I believe that individuals with weaker artificial intelligence systems attempting to invade systems protected by stronger artificial intelligence will have a relatively low success rate.

Patel: But how do we ensure that everything in the world can be handled properly like this? For example, the situation with biological weapons may not be so straightforward.

Zuckerberg: Indeed, I cannot assert that everything in the world can be resolved smoothly. Biological weapons are one of the focal points of concern for those who are deeply worried about such issues, and I think this concern is valid. Despite some mitigation measures, such as attempting not to train certain knowledge in models, we must recognize that in certain situations, if faced with extremely malicious actors and no other artificial intelligence to counterbalance them and understand the severity of the threat, then this could indeed be a risk. This is one of the issues we must take seriously.

Patel: Have you encountered any unexpected situations in deploying these systems? For example, during the training of Llama-4, it might lie to you for some reason. Of course, for a system like Llama-4, this situation may not be common, but have you considered similar scenarios? For example, are you very concerned about the deceitfulness of the system and the potential problems that could arise from the billions of copies of this system freely spreading in the wild?

Zuckerberg: Currently, we have observed many hallucinatory phenomena. I think how to distinguish between hallucinations and deceit is a question worth exploring in depth. Indeed, there are many risks and factors to consider. In operating our company, I try to at least balance these long-term theoretical risks with the actual risks I believe currently exist. Therefore, when it comes to deceit, what concerns me the most is that someone may use this technology to spread misinformation and propagate it through our network or other networks. To counteract this harmful content, we are building artificial intelligence systems that are more intelligent than adversarial systems.

This is part of how I understand the matter. Looking at the different types of harm people cause or attempt to cause on social networks, I find that some of it is not highly adversarial. For example, hate speech is not highly adversarial in the sense that people are not getting better at being racist over time. In that respect, I believe AI systems are generally already more sophisticated and faster than humans at dealing with these issues. But we have problems on both sides. People engage in improper behavior for various purposes, whether inciting violence or something else, but we also have to deal with a large number of false positives, where we mistakenly take down content that should not have been removed. That situation understandably frustrates many people. So I believe that as artificial intelligence becomes more precise at this, the situation will gradually improve.

Whether it's Llama-4 or a future Llama-6, we need to think carefully about the behaviors we observe, and not just us: part of the reason to open source is that many researchers outside the company study these models as well. So we want to share our observations with other researchers, explore possible mitigation strategies together, and only open source once we are confident it is safe. In the foreseeable future, I am optimistic we can achieve this. At the same time, in the short term we cannot ignore people who are trying to misuse the models today. Even when those behaviors are not catastrophic, in operating our services we are well aware of some quite serious everyday harms.

Patel: I find the idea of synthetic data very intriguing. With the current models, there might be a performance asymptote through the repeated use of synthetic data, and there is theoretical basis for this. But suppose these models become smarter and can utilize the kind of technology you mentioned in your papers or upcoming blog posts to find the most correct chain of thoughts. Then, why do you think this wouldn't lead to a loop, where the model becomes smarter, produces better outputs, and becomes even smarter, and so on? Of course, this change wouldn't happen overnight, but after months or years of continuous training, the model could indeed become more intelligent.

Zuckerberg: I think within the parameter range of model architecture, this kind of loop improvement is possible. However, for the current 8 billion parameter models, I don't believe they can reach the same level as advanced models with hundreds of billions of parameters and incorporating the latest research findings.

Patel: About these models, they will also be open source, right?

Zuckerberg: Yes, that's correct. But all of this is contingent on us successfully addressing the challenges and issues we've discussed before. We certainly hope so, but I am also well aware that at every stage of building software, despite the enormous potential and possibilities of the software itself, its operation is still subject to physical limitations of chip performance to some extent. Therefore, we always face various physical constraints. The size to which a model can grow actually depends on how much energy we can obtain and use for inference. I am very optimistic about the future of artificial intelligence technology, believing that it will continue to develop and improve rapidly. At the same time, I am more cautious than some people. I don't think that losing control will happen particularly easily, but we still need to remain vigilant and seriously consider various possible risks. Therefore, I think keeping options open is very meaningful.

6 Caesar and the Metaverse

Patel: Okay, let's turn to another topic—the metaverse. In the long history of human civilization, which period do you most want to explore? Do you just want to catch a glimpse of the past from 100,000 years ago to the present, or does this exploration have to be limited to the past?

Zuckerberg: Indeed, I am more inclined to explore the past. American history, classical history, and the history of science all deeply attract me. I think it would be very interesting to be able to observe and understand how those significant historical advancements occurred. However, what we can rely on are only those limited historical records. For the metaverse, it would be very difficult to completely reproduce those historical periods for which we have no records. In fact, I don't think going back to the past would be the main application of the metaverse, although such functionality could be very useful in history teaching, for example. For me, the most important thing is that no matter where we are in the world, we can interact with others in real time and coexist. I firmly believe that this is the killer application.

In our previous conversation about artificial intelligence, we delved into many of the underlying physical limitations. One valuable lesson that technology teaches us is that we should strive to liberate more things from physical constraints and transfer them to the software domain, because software is not only easier to build and evolve, but also easier to popularize. After all, not everyone can have a data center, but many people can write code, access open-source code, and modify and optimize it. The metaverse is an ideal platform to achieve this goal.

This will be a disruptive and monumental change that will greatly alter people's perception of gathering and interaction. Therefore, people will no longer feel that they have to gather in person to accomplish many things. Of course, I also firmly believe that in certain contexts, gathering in person still has irreplaceable value. This is not an either-or choice. However, it does provide us with a whole new dimension, allowing us to socialize, connect, work more conveniently and efficiently, and play a huge role in many fields such as industry and medicine.

Patel: We mentioned before that you didn't sell the company for a billion dollars. Clearly, you also have a strong belief in the metaverse, despite market skepticism. I'm curious, where does this confidence come from? You mentioned "oh, my values, my intuition," but such statements seem somewhat vague. Could you be more specific about certain traits related to yourself, perhaps we can better understand why you have such confidence in the metaverse.

Zuckerberg: I think this involves several different issues. First, what drives me to keep moving forward? We have discussed many topics. I love creating, especially around how people communicate, express themselves, and work. In college, I majored in computer science and psychology, and the intersection of these two fields has always been very crucial to me. This is also where my strong drive lies. I don't know how to explain it, but deep down, I always feel that if I don't create something new, then I'm doing something wrong. Even when we are developing business plans to invest $100 billion in artificial intelligence or the metaverse, our plans have clearly indicated that if these projects succeed, they will bring huge returns.

Of course, you cannot determine everything from the beginning. People always have various debates and doubts. Just like "How can you have enough confidence to do this?" For me, if one day I stop trying to create new things, then I have lost myself. I will continue to create elsewhere. Fundamentally, I cannot imagine myself just operating something without trying to create new things that I find interesting. For me, the question of whether we should try to build the next thing is not a problem. I just cannot stop creating. Not only in the field of technology, but also in other aspects of my life. For example, our family built a ranch in Kauai, and I personally participated in all the design work for the construction. When we started raising cows, I thought, "Well, I want to raise the best cows in the world." Then we started planning how to build everything we needed to achieve this goal. That's me!

Patel: I have always been curious about one thing: in high school and college, around age 19, you read a lot of ancient and classical texts. I want to know what important lessons you took from them, not just the content you found interesting, but the ideas that really stuck with you at that age.

Zuckerberg: One thing that fascinated me was how Caesar Augustus became emperor and worked to establish peace. At that time, people did not have a true concept of peace. Their understanding of peace was just a brief respite before the enemy attacked again. He had the foresight to change the economy from relying on mercenaries and militarism to achieving peace and prosperity, which was a very novel idea at the time. This reflects a very basic fact: the boundaries of what people could imagine as a reasonable way of working at that time.

This concept applies not only to the metaverse but also to fields such as artificial intelligence. Many investors and others find it difficult to understand why we would open source these technologies. They might say, "I don't understand, if it's open source, won't it shorten the time for you to develop proprietary technology?" But I think this is a profound concept in the technology field, and it actually creates more winners. I don't want to overemphasize this analogy, but I do believe that many times, people find it difficult to understand the model of building things, find it difficult to understand why it would be valuable for people, or why it would be a reasonable state in the world. In fact, there are many more reasonable things than people imagine.

Patel: That's really interesting. Can I share my thoughts? It may be a bit off-topic, but I think it might be because some important figures in history had already made a mark at a young age. For example, Caesar Augustus was already an important figure in Roman politics at the age of 19, leading battles and establishing alliances. I wonder if at 19, you also had similar thoughts: "Since Caesar Augustus did it, I can do it too."

Zuckerberg: That's indeed an interesting observation, which not only comes from rich history but also resonates with our American history. I really like a quote from Picasso: "Every child is an artist. The challenge is how to remain an artist once he grows up." When we are young, we are more likely to have crazy ideas. In your life, company, or anything you build, there is a parallel to the dilemma of an innovator. In the early stages of your career, you are more likely to adjust direction, embrace new ideas, without being hindered by commitments to other things. I think this is also an interesting part of running a company: how to maintain vitality, how to continue to innovate?

7 Open Source Model Worth $10 Billion

Patel: Let's get back to the topic of investors and open source. Imagine we have a model worth up to $10 billion, and this model has undergone rigorous security assessments. At the same time, evaluators can also make adjustments to the model. Would you open source a model worth $10 billion?

Zuckerberg: If it is beneficial for us, then open sourcing is a consideration.

Patel: But would you really do that? After all, it is a model that has incurred $10 billion in research and development costs, and now it needs to be open sourced.

Zuckerberg: This is a question we need to carefully weigh over time. We have a long tradition of open source software. Generally, we do not directly open source products, such as Instagram's code. However, we do open source a lot of underlying infrastructure. For example, one of our largest open source projects historically is the Open Compute Project, where we open sourced the designs for servers, network switches, and data centers. Ultimately, this has brought us huge benefits. Although many people can design servers, the entire industry now basically revolves around our design. This means that the entire supply chain is built around our design, improving production efficiency, reducing costs, and saving us billions of dollars. It's really great.

Open sourcing can help us in many ways. One way is that if people can find more cost-effective ways to run the model, it will be a huge advantage for us. After all, our investment in this will reach tens of billions, or even hundreds of billions of dollars. Therefore, if we can improve efficiency by 10%, we will be able to save tens of billions or hundreds of billions of dollars. Moreover, if there are other competing models in the market, our open source behavior will not give a crazy advantage to a particular model. On the contrary, it will promote progress and development in the entire industry.
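The efficiency point reduces to simple arithmetic. The spend figure below is a hypothetical round number in line with the framing above, not a stated budget:

```python
# Savings from community-found efficiency gains scale directly with
# total compute spend. Figures are illustrative, not Meta's actual numbers.

def efficiency_savings(compute_spend_usd: float, gain: float) -> float:
    """Dollars saved from a fractional efficiency improvement."""
    return compute_spend_usd * gain

# A 10% efficiency gain on a hypothetical $100B of cumulative spend:
saved = efficiency_savings(100e9, 0.10)
print(f"${saved / 1e9:.0f}B saved")  # $10B saved
```

The larger the cumulative compute spend, the more even a small community-contributed optimization is worth, which is the economic core of the open-source argument here.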

Patel: How do you view the trend of commercializing model training?

Zuckerberg: I think there are multiple possibilities for the development of training, and commercialization is indeed one of them. Commercialization means that as the choices in the market increase, the cost of training will be greatly reduced, making it more affordable. Another possibility is the improvement of quality. You mentioned fine-tuning, and currently, for many large models, the options for fine-tuning are still quite limited. Although some options exist, they are generally not applicable to the largest models. If we can overcome this challenge and achieve a more extensive fine-tuning capability, then different applications or specific use cases will be able to demonstrate more diverse functionalities, or integrate these models into specific toolchains. This can not only accelerate the development process but also potentially lead to differentiation in quality.

Here, I'd like to use an analogy to illustrate. In the mobile ecosystem, a common issue is the presence of two gatekeeper companies—Apple and Google—that impose restrictions on the content developers build. From an economic perspective, it's like they charge high fees when we build something. But what concerns me more is the quality aspect. Many times, we want to release certain features, but Apple may reject them, which is indeed frustrating. Therefore, what we need to consider is whether we are setting up a world dominated by a few closed-model companies that control the APIs, thereby determining what developers can build in the realm of artificial intelligence. For us, I can say with certainty that we build our own models to ensure that we don't fall into this situation. We don't want other companies to restrict our innovation. From an open-source perspective, I believe many developers also don't want to be limited by these companies.

So, the key question is what kind of ecosystem will emerge around these models? What interesting new things will emerge? To what extent can they improve our products? I believe that if these models develop the way our databases, caching systems, or architectures have, the community will be able to contribute valuable improvements that make our products even better. Of course, we will still do unique work of our own on top of that. We will be able to continue to focus on our core work and benefit from it. At the same time, as the open-source community develops, all the systems, whether ours or the community's, will improve.

However, there is also a possibility that the models themselves may eventually become products. In this case, the decision to open source would require more complex economic considerations. Because once open sourced, it is equivalent to commercializing our models to a large extent. But from what I have observed so far, it seems that we have not reached that stage.

Patel: Do you expect to generate substantial revenue by licensing your models to cloud providers? In other words, do you hope they will pay fees to offer model services on their platforms?

Zuckerberg: Yes, we do expect to reach such licensing agreements with cloud providers and hope to generate substantial revenue from them. This is essentially what we have set out in Llama's license. In most dimensions, we have adopted a very permissive open-source licensing strategy, providing broad usage rights for the community and developers. But we have set restrictions for the largest companies using it. These restrictions are not aimed at preventing them from using the models, but at getting them to come talk to us when they intend to take the models we have built and resell them for commercial gain. If a cloud service provider like Microsoft Azure or Amazon AWS intends to resell our models as part of their service, then we expect to receive a share of the revenue.

Patel: Your viewpoint on power balance is very reasonable, and we do need to consider how to eliminate potential harm through better technical alignment or other methods. I hope Meta can establish a clear framework, as other labs have done, to clearly define when open source or potential deployment is not feasible in certain specific situations. Such a framework not only helps the company prepare for potential risks but also sets expectations for people.

Zuckerberg: You're right, the issue of existential risk is indeed worth our deep attention. However, at the moment, we are more concerned about content risk, that is, the models may be used for creating violence, fraud, or other harmful behaviors. Although discussing existential risk may be more appealing, in reality, what we need to focus on now is mitigating this more common harm. For current models, and possibly even the next generation of models, we need to ensure that they are not used for fraudulent or malicious behavior. As a large company, Meta has a responsibility to ensure that we do well in this regard. Of course, we also have the capability to address both aspects of the issue.

Patel: Regarding open source, I am curious about whether you think the impact of open-source projects like PyTorch, React, Open Compute, on the world, could possibly surpass Meta's influence in social media? I have spoken with users of these services, and they believe that this possibility exists, after all, much of the internet's operation relies on these open-source projects.

Zuckerberg: Our consumer products do have a huge user base globally, covering almost half of the world's population. However, I believe that open source is becoming a new and powerful way of building. It may be like Bell Labs: initially they developed the transistor to enable long-distance calling, and it achieved that goal and brought them substantial profits. But looking back 5 to 10 years later, when people name their proudest inventions, they may mention other technologies with more profound impact. I firmly believe that many of the projects we are building, such as Reality Labs, certain AI projects, and some open source projects, will have a lasting and profound impact on human progress. Although specific products evolve, emerge, and disappear over time, their contributions to human society will endure. This is an exciting part that we as technology practitioners can collectively participate in.

Patel: Regarding your Llama model, when will it be trained on your custom chip?

Zuckerberg: Soon, we are working hard to push this process forward, but Llama-4 may not be the first model to be trained on a custom chip. Our strategy is to start with inference tasks such as ranking, recommendations, etc., like Reels, news feed ads, which previously consumed a lot of GPU resources. Once we can move these tasks to our own chips, we can use more expensive NVIDIA GPUs to train more complex models. We expect that in the near future, we will be able to use our own chips to first train relatively simple models and eventually expand to train these large models. Currently, this project is progressing smoothly, and we have a clear and long-term plan that is advancing steadily.

Patel: Suppose you had been brought in as CEO of Google+. Could you have made it work?

Zuckerberg: Google+? Oh, I don't know.

Patel: Okay, then the real last question would be: Did you feel pressure when Google launched Gemini?

Zuckerberg: The thing is, Google+ does not have a CEO; it is just a department within Google. In my view, for most companies, especially those that have reached a certain scale, focus is crucial. Startups may be tight on funds, they are validating an idea, and may not have all the necessary resources. But as the business grows, companies will cross a certain threshold, start building more elements, and create more value between these elements. However, unexpected and delightful things will always happen in a company, and these are valuable. But overall, I believe that a company's capacity is largely limited by the scope of affairs that the CEO and management team can oversee and manage. Therefore, for us, prioritizing the main affairs and focusing on key matters as much as possible is extremely important. As venture capitalist Ben Horowitz said, "The main thing is to keep the main thing the main thing."
