2023.05.22 CHEN, Wei and YAN, Binglin
On April 11, 2023, the Cyberspace Administration of China (“CAC”) released the “Measures for the Administration of Generative AI Services (Draft for Public Comments)” (“Draft”) to solicit public opinion.
The emergence of ChatGPT has triggered a wave of generative artificial intelligence (“Generative AI” or “AIGC”) products, and concerns associated with data abuse, invasion of privacy, false information and ethical issues have been raised. Europe and the United States have begun to regulate AIGC products, and the release of this Draft indicates that China’s AIGC regulatory framework will also be formed and implemented in the near future.
Containing 21 articles, the Draft touches on the regulatory principles and specific requirements regarding AIGC products and services. It is expected that the time-to-market of domestic AIGC products and services will be affected once the official regulations are implemented, and some products may even be put on hold for failing to meet the regulatory requirements.
The Draft has been widely discussed since its release. In this article, we share our observations and comments regarding its potential impact on the industry.
1. Scope of Application
Article 2 of the Draft provides that these measures shall apply to the research and development and use of AIGC products to provide services to the public in the People’s Republic of China.
Our Comment: According to the Draft, these measures shall apply to AIGC services provided to the public in the People’s Republic of China. The question is whether these measures also apply to foreign products which provide registration portals for domestic users. Under the current PRC legal and regulatory framework, unless explicitly provided by laws and regulations and approved by the relevant competent authorities, a legal entity shall be established in China in order to provide information services within China. Therefore, if an overseas AIGC product provider offers domestic users a registration portal and thereby in effect provides services to the public in China, the regulators may require the overseas AIGC product provider to establish an entity in China to provide such services, and these AIGC measures shall apply.
2. Service Providers
Article 5 of the Draft defines the “Provider” of AIGC services as organizations and individuals which provide chat and text, images, sound generation and other services by using AIGC products, including organizations and individuals which provide programmable interfaces and methods to support others to generate their own text, images, sound, etc. The Provider shall be subject to the regulatory requirements for service providers in the Draft and undertake the responsibilities as the producer of product-generated content and personal information protection obligations.
Our Comment: Currently there are three major categories of AIGC products. The first is the basic large language model, the second is the fine-tuning of big models for vertical industries, i.e. the vertical big model, and the third is applications built on the API of a big model. According to Article 5 of the Draft, it appears to us that the participants in all three types of products should be identified as “Providers” and therefore be subject to the regulatory requirements for service providers in the Draft. However, according to Article 2 of the Draft, these measures apply to service providers who provide services to the public in China. Therefore, if a basic large language model provider only provides its products to a vertical big model provider and not to the public, the Draft may not apply. We understand this might be clarified with the release and implementation of the official regulations.
3. Pre-Conditions for Services
Article 6 of the Draft sets forth two conditions precedent for providing services to the public through AIGC products: first, to submit a security assessment report to the CAC in accordance with the Provisions on Security Assessment of Internet Information Services with Public Opinion Attributes or Social Mobilization Capability; and second, to complete algorithm filing procedures in accordance with the Internet Information Service Algorithm Recommendation Management Regulations.
Security Assessment of Internet Information Services
According to the Provisions on Security Assessment of Internet Information Services with Public Opinion Attributes or Social Mobilization Capability, a service provider may conduct a security assessment itself or entrust a third party to do so, and shall submit the security assessment report to the local counterparts of the CAC and the Public Security Authority.
The security assessment focuses on the following aspects:
(i) determination of the person in charge of security management and the information auditors who should be competent to support the relevant services, or the status of the establishment of security management bodies;
(ii) true identity verification of users and registration information record-keeping measures;
(iii) record-keeping measures of users’ account number, operation time, operation type, network source address and destination address, network source portal, client-side hardware features and other log information, as well as information posted by users;
(iv) prevention and countermeasures against illegal and harmful information in users’ account and communication group name, nickname, profile, notes, logo, posted information, forwarded information, comments and communication groups and other service functions, and preservation measures of relevant records;
(v) technical measures to protect personal information, to prevent the spread of illegal and harmful information and to prevent losing control of the social mobilization functions;
(vi) the establishment of complaint and reporting systems, publication of complaint and reporting methods and other information, timely receipt and handling of complaints and reports;
(vii) the establishment of work mechanisms to provide technical, data support and assistance for the CAC in the supervision and management of Internet information services in accordance with the law;
(viii) the establishment of work mechanisms to provide technical, data support and assistance for public security authorities and national security authorities in the course of maintaining national security and investigating and handling illegal and criminal acts.
According to the Internet Information Service Algorithm Recommendation Management Regulations, the contents of algorithm filing include the name of the service provider, service form, application area, algorithm type, algorithm self-assessment report, the proposed publication content and other information. Though the Internet Information Service Algorithm Recommendation Management Regulations requires the algorithm filing to be completed within ten working days of the start of the service, the Draft requires the algorithm filing for AIGC products to be completed before the product’s launch.
Our Comment: We can understand why the CAC has classified AIGC as “Internet Information Services with Public Opinion Attributes or Social Mobilization Capability”, but compared to the existing regulations, the Draft strictly sets the security assessment and algorithm filing as pre-conditions for an AIGC product launch. This may prolong the time-to-market for domestic AIGC products and slow the current rush by many Chinese companies to launch their own AIGC products.
4. Training Data Requirements
Article 7 of the Draft states that the service provider shall be responsible for the legitimacy of the sources of pre-training data and optimized training data of AIGC products. Pre-training and optimized training data for AIGC products shall meet the following requirements:
(i) meet the requirements under Cybersecurity Law and other laws and regulations;
(ii) contain nothing in violation of intellectual property rights;
(iii) if personal information is contained, the relevant individual’s consent shall be obtained or otherwise be in accordance with other circumstances specified in the laws and administrative regulations;
(iv) can ensure the authenticity, accuracy, objectivity and diversity of the data;
(v) other regulatory requirements regarding AIGC services by the CAC.
Our Comment: Large amounts of data are involved in pre-training and optimization training in the course of AIGC research and development and services. In accordance with the requirements of Article 7 of the Draft, AIGC service providers shall ensure that the sources and use of data comply with the laws and regulations and do not infringe the intellectual property rights of others. Given the limited data sources publicly available in China, the current big model training process in China may use publicly available data sources in other countries. We understand that the complexity of obtaining data from such varied sources will make it difficult for service providers to meet the requirements under Article 7 of the Draft. Accordingly, the time, labor and costs needed to meet the compliance requirements for data sources will also increase.
5. Manual Data Labeling Requirements
Article 8 of the Draft states that, if manual labeling is used in the development of AIGC products, providers shall make clear, detailed and operational labeling rules in accordance with the Draft, train the labeling personnel as necessary, and take samples to verify the correctness of the labeled contents.
Our comment: It is widely acknowledged that the development of AIGC products requires the labeling of a large number of training data sets to mark the characteristics of the training objects so that they can be used as basic material for machine learning. High-quality data annotation is the key to model training. Although there are already programmatic data labeling products in the market, the training process of models like ChatGPT uses a huge amount of manual labeling in order to achieve accurate understanding of human instructions. Currently, the manual labeling required for the training of big models is performed by employees, through manpower outsourcing or through service outsourcing. It is understood that the proposed labeling rules and the necessary training requirements for labeling personnel in the Draft are consistent with current manual labeling practice in China. However, in the scenario of service outsourcing, the provider needs to be aware that it is the party that must abide by the legal requirements on manual labeling and therefore should request the manual labeling service company to assist in meeting such legal requirements in the process of service outsourcing.
6. Prevention of False Information
Article 4 paragraph 4 of the Draft provides that the content generated by AIGC should be true and accurate, and measures should be taken to prevent the generation of false information.
Article 15 of the Draft further provides that, for generated contents in violation of the Draft which are discovered during operation or reported by users, in addition to taking measures such as content filtering, the provider should prevent recurrence of such contents by model optimization training and other means within three months.
Our comment: In AI language model training, one of the biggest challenges is how to stop the model from making up nonsense. For example, users who tested the idiom mapping function of certain large domestic model products found that the generated images were mostly beside the point. Even with the more advanced GPT-4, the generated content is sometimes the opposite of the correct answer, and the AI is “talking nonsense”. Are these AI “fabricated” contents false information? If so, according to Article 15 of the Draft, the service provider should prevent recurrence of such contents by model optimization training and other means within three months. However, is three months of optimization training enough time, when the so-called “false information” is actually a manifestation of immature product performance in the process of large language model development? If it is not enough time, will the CAC order service providers to cease services and take the products off the market, and thus deprive the service providers of opportunities to continue optimization training through actual use?
7. International Cooperation
Article 3 provides that China supports the independent innovation, promotion, application, and international cooperation of artificial intelligence algorithms, frameworks and other basic technologies, and encourages prioritizing the use of secure and trusted software, tools, computing and data resources. “International cooperation” is specifically mentioned as one of the nationally supported directions for the development of AIGC.
Our comment: Apart from this opening provision, the Draft does not mention any feasible model or direction of exploration for international cooperation.
Around September 2022, the US Department of Commerce issued notices restricting NVIDIA and AMD from exporting NVIDIA’s A100 and H100, AMD’s MI250 series and future high-end GPU products to China. The export restrictions on such high-end GPU chips will greatly slow down AIGC big model training in China. To make AIGC productive in various industries in China as soon as possible and to avoid China falling behind in the AI era, it is necessary to explore paths of international cooperation by introducing GPT-4 or more advanced AIGC products.
Considering the overall regulatory requirements of the Draft, especially the market entry conditions and regulations for service providers, it seems that international cooperation should involve cooperation between a Chinese company and an overseas AIGC product provider, with the Chinese company acting as the access point for providing the service and as the service provider in order to satisfy the regulatory requirements under the Draft.
It is worth mentioning that Article 17 of the Draft provides that the service provider should, if requested by the CAC and the relevant authorities, provide the necessary information that can influence the users’ trust and choice, including the source, size, type, quality and other descriptions of the pre-training and optimization training data, manual labeling rules, the size and type of the manually labeled data, and the basic algorithms and technical systems. Such reporting obligation does not differentiate between international cooperation and domestic AIGC products. It is conceivable that such information reporting obligation will create barriers to international cooperation. Therefore, in the scenario of international cooperation, if the Chinese service provider meets the other regulatory requirements under the Draft, is it possible to waive such information reporting obligation to a certain extent for the international cooperation of AIGC?
The release of the Draft reflects the regulator’s positive attitude in supporting the development of Generative AI and draws a bottom line for its healthy development so as to avoid the wild growth of Generative AI. Providers of AIGC products and services should undertake responsibilities as producers of product-generated content, implement cybersecurity responsibilities, keep an eye on generated content from multiple perspectives such as ideology, privacy protection and intellectual property protection, and promote the healthy and legitimate development of AI technology within the regulatory framework.
Although the Draft covers most of the important aspects of AIGC, some of the principle-based provisions will need to be supplemented by more detailed regulations. We will keep abreast of the formal Measures for the Administration of Generative AI Services to be issued by the CAC in the future and other regulatory provisions on AI, and share our comments in due course.