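CHINA LEGAL SCIENCE, 2020, No. 4 | A Study of China’s Criminal Sanctions Regime for Data Scraping: Based on an Analysis of a Case of Illegally Obtaining Specific Data with Web Crawlers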

Date: 20-08-26  Author: zzs

RETHINKING CRIMINAL SANCTIONS ON DATA SCRAPING IN CHINA BASED ON A CASE STUDY OF ILLEGALLY OBTAINING SPECIFIC DATA BY CRAWLERS

Li Qian & Jiang Tao

I. INTRODUCTION

A company operated in the field of computer and network technology, offering technical development, technical services, e-commerce, electronics, and the like. Three people who worked for the company conspired to scrape the video data stored on the server of Beijing ByteDance Networking Technology Co., Ltd. One of them also directed another person to repeatedly bypass anti-scraping measures by means of files named ‘tt_spider’ in order to scrape data, causing a loss of 20,000 yuan in technical service fees. On November 24, 2017, the People’s Court of Haidian District of Beijing handed down its criminal decision. The decision pointed out that the three people had violated the relevant national regulations and used technical means to obtain the data, and that the circumstances were serious. Consequently, they were convicted of the crime of illegally obtaining computer information system data. This was the first case in China in which crawlers were used to illegally access a computer information system and obtain data, here data belonging to Beijing ByteDance Networking Technology Co., Ltd.

To better understand the case, we first offer a concise description of the technical terminology involved. A crawler is a tool that systematically traverses web pages, following links to reach even deeply nested pages. Data scraping usually refers to retrieving information from any source (not necessarily the web). So what are the differences between crawlers and data scraping? Generally speaking, there are at least two. First, crawling concerns how a tool is used to download pages from the web, whereas data scraping emphasizes the act of extracting data from various sources, including the web. Second, crawlers treat deduplication as an essential part of their work, while data scraping does not. Although crawlers differ from data scraping, the facts of the case combine the two. It is commonly acknowledged that the line between potentially malicious and legitimate data scraping is subtle, especially when crawlers are used for data scraping. Given this situation, we focus on data scraping by crawlers (hereinafter referred to as data scraping) when we analyze the case and conduct related research in this article.
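The distinction drawn above between crawling (downloading pages while deduplicating visited URLs) and scraping (extracting data from a source) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `PAGES` table, the URLs, and the function names `crawl` and `scrape` are hypothetical, no real website is contacted, and the code is not drawn from the ‘tt_spider’ files discussed in the case.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. No real site is contacted.
PAGES = {
    "/home": '<a href="/v/1">one</a> <a href="/v/2">two</a> <a href="/v/1">again</a>',
    "/v/1": '<h1>Video one</h1> <a href="/home">back</a>',
    "/v/2": '<h1>Video two</h1> <a href="/v/1">related</a>',
}


class PageParser(HTMLParser):
    """Collects outgoing links and <h1> titles from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.titles = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.titles.append(data.strip())


def parse(html):
    """Parse one HTML string and return the populated parser."""
    parser = PageParser()
    parser.feed(html)
    return parser


def crawl(start):
    """Crawling: follow links page by page, visiting each URL only once."""
    seen = {start}  # the deduplication set that crawlers depend on
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in parse(PAGES.get(url, "")).links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def scrape(html):
    """Scraping: extract the target data (here, <h1> titles) from one source."""
    return parse(html).titles


visited = crawl("/home")
titles = [title for url in visited for title in scrape(PAGES[url])]
```

In this sketch `crawl` never looks at page content beyond its links, and `scrape` never follows a link; ‘data scraping by crawlers’ in the sense used in this article is the combination of the two in the last two lines.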

In a narrow sense, the data scraping in this case presents a particular challenge for the protection of specific data, because data scraping typically harms a particular victim, such as this company. In a broader sense, as one of many ‘upstream’ cyber threats, illegal data scraping has a secondary impact on the subjects of the data through ‘downstream’ cyber threats such as fraud and extortion, which occur when the data is subsequently monetized. It can be predicted that illegal data scraping will become more prolific and aggressive in the foreseeable future if data scraping is not used appropriately. Only by paying more attention to the non-technical aspects of data scraping, such as its legality, can people benefit more from its widespread use. Otherwise, overemphasizing technology alone tends to create striking challenges for cyberspace, given the advent of the age of artificial intelligence (AI). As an embodiment of AI, data scraping will soon be prevalent, for good and for ill.

This seemingly paradoxical two-sided effect highlights how important it is to scrape data legitimately. The issue has received attention from academics in China. Two scholars focused on the legal boundaries of web crawler technology and argued that those boundaries are shaped by the right of information network transmission, the Anti-Unfair Competition Law and the Anti-Monopoly Law. Two other scholars analyzed the legal nature of data scraping based on case studies. Only one scholar discussed the case Beijing ByteDance Networking Technology Co., Ltd., stating that although the court had only one crime to choose, its decision remained deficient in that it failed to identify who could claim ownership of the video data.

From the above review, we think that prior discussions failed to provide a framework of criminal sanctions on data scraping in China, much less detailed interpretative strategies or legislative improvements, leaving many disputes unresolved. In the Criminal Law of the People’s Republic of China (hereinafter referred to as the Criminal Law), the crime of illegally obtaining computer information system data contains three key terms: ‘intrusions’, ‘obtain the data’ and ‘serious circumstances’. Can data scraping constitute this crime? More specifically, when the terms ‘obtain the data’ and ‘serious circumstances’ are interpreted, does data scraping match their meanings? What solutions should be considered to improve accuracy in handling data scraping cases? This article aims to rethink criminal sanctions on data scraping in China based on a case study of Beijing ByteDance Networking Technology Co., Ltd., as well as an analysis of several data scraping cases in the US. The article proceeds as follows.

Chapter II briefly comments on the case Beijing ByteDance Networking Technology Co., Ltd., illustrating how the court interpreted paragraph 2 of article 285 of the Criminal Law and relevant provisions. The court’s decision not only failed to distinguish the meaning of data from that of a computer information system but also failed to clarify why data scraping can be interpreted as falling under the term ‘serious circumstances’. Chapter III summarizes the US experience and its three measures. In the US, academics and practitioners emphasize the weighing of different values, striking a balance between use value and privacy protection, freedom of speech and informed consent, public access and technical support, etc. They then tend to adopt a contextual model to explore what is hidden inside the authorization provisions and the damage or loss provisions of the Computer Fraud and Abuse Act (hereinafter referred to as the CFAA). Chapter IV proposes contextualization-based, individualized criminal sanctions on data scraping in China. It also analyzes how interpretative strategies and legislative improvements would work in China when this model is used to impose criminal sanctions on data scraping, mainly from the perspective of paragraph 2 of article 285 of the Criminal Law and relevant provisions. Chapter V concludes and calls for interdisciplinary research on legal sanctions on data scraping.

II. THE BRIEF COMMENT ON THE COURT’S DECISION

This chapter includes two sections. The first section demonstrates how the court interpreted the crime of illegally obtaining computer information system data as well as other provisions. The second section explores some deficiencies in the court’s decision that deserve further research.

A. The Interpretations in the Court’s Decision

The crime of illegally obtaining computer information system data derives from paragraph 2 of article 285 of the Criminal Law. The court applied paragraph 2 because the scrapers violated related national regulations (regulations other than article 285 itself) and then used technical means to obtain the video data. The application of paragraph 2 differs from that of paragraph 3 of the same article: paragraph 2 concerns committing an act of intrusion, whereas paragraph 3 concerns providing the tools that allow someone else to commit an intrusion.

As such, it is easy to conclude that any illegal or improper act of data scraping is likely to violate the legal system. More seriously, a violation of the Criminal Law will be found if the court adopts an interpretation that imposes criminal sanctions. In the case Beijing ByteDance Networking Technology Co., Ltd., the court held that the misconduct (bypassing anti-scraping measures and obtaining the data) and the consequences (a loss of technical service fees of 20,000 yuan) caused by the scrapers can be interpreted as ‘serious circumstances’ under the Criminal Law. Nevertheless, the court failed to investigate the parameters of the term ‘data’ itself, even though data is what scrapers target, which left gaps in the court’s decision.

B. The Deficiencies of the Court’s Decision

Determining why the court applied paragraph 2 of article 285 of the Criminal Law requires investigating the parameters of the term ‘data’ itself and clarifying why data scraping should be interpreted as the term ‘serious circumstances’, rather than merely concentrating on the misconduct of intruding into the computer information system and the consequence of a loss of technical service fees of 20,000 yuan. With the advent of the age of AI, it is harder to thoroughly resolve disputes and effectively impose criminal sanctions on data scraping than ever before. It is also associated with the workings of complex mechanisms (dubbed the application of technology), as well as how the court tailors its work to impose criminal sanctions on subsequent data scraping cases (dubbed the interpretation of the law).

Given the inherent tension between the application of technology and the interpretation of the law, it is important to elaborate on what the deficiencies in the court’s decision are. As Derek Leben discussed in his new book, Ethics for Robots: How to Design a Moral Algorithm, ‘the moral challenge for police and military robots is: how to manage the correct response to a criminal or enemy who is actively threatening others?’ In one sense, we argue that this view will help make the court’s decision in Beijing ByteDance Networking Technology Co., Ltd. more reasonable. The deficiencies in the court’s decision are twofold.

On one side, the court’s decision failed to distinguish the meaning of data from that of a computer information system. In the criminal justice system, courts have not sufficiently analyzed the meaning of the term ‘data’ (there are only a few references to it), much less the differences among related concepts such as information and computer information system. The lack of a settled meaning for data may lead to error-prone judicial determinations. This has occurred because people did not attach importance to the status that data requires, which must be considered thoroughly and systematically.

In the case Beijing ByteDance Networking Technology Co., Ltd., the court’s decision only described a loss of technical service fees that the company had to pay to fix the problem. This is neither the real purpose that the scrapers achieved nor the true loss, let alone the inherent value of the court’s decision. In data scraping cases, the main parameters of the term ‘data’ should be taken into consideration, as they are increasingly prominent and reveal different legal interests.

At a superficial level, although data is explicitly mentioned in paragraph 2 of article 285 of the Criminal Law, the legislative goal of this crime is directly related not to the protection of data but to the safety of computer information systems. That is, the legislature tends to regulate crimes that affect the safety of computer information systems rather than crimes that affect the protection of data. As a result, little attention has been paid to interdisciplinary research on data. Such research can develop new methodologies for a synthetic approach to reinforcing the specialized protection of data. The specialized protection of data, on which there is wide-ranging consensus among cybersecurity scholars, is attributed to technical advantage and to the intergenerational shift in which network data transcends the ‘information network’. As Chinese scholars proposed in the article An Approach to Sanctioning Data Crimes in the Age of Big Data, ‘under the background of big data, data becomes the dominant target with the transition from information network to data focusing on data process.’

On the other side, the court’s decision also failed to clarify why data scraping can be interpreted as the term ‘serious circumstances’. The term ‘serious circumstances’ is commonplace in the Criminal Law. Scholars have long been concerned with its interpretation because it is one of the most important components of some specific crimes. In the theory of the Criminal Law, scholars generally agree that a comprehensive approach is conducive to a clear-cut perception of what circumstances are serious, although there have been controversies over what constitutes ‘serious circumstances’ and how to understand it. A comprehensive approach assesses the amount of harm reflected in typical objective factors, which is beneficial to the interpretation of the term ‘serious circumstances’.

As for the case Beijing ByteDance Networking Technology Co., Ltd., the court held that the three people violated related national regulations to commit an act of intrusion and obtained the video data, but it offered no concrete analysis of the data scraping itself, much less a reason why data scraping can be interpreted as ‘serious circumstances’. The analysis of data scraping is out of proportion to that of bypassing anti-scraping measures via the ‘tt_spider’ files. Such a disproportionate analysis is not sufficient to establish that Beijing ByteDance Networking Technology Co., Ltd. indeed suffered harm and would bear the costs of data scraping. The company undertook the painstaking work of protecting the video data and confirmed that it reserved the right to defend against data scraping. Inquiry into this reservation of right largely hinges on investigating whether the scrapers had complied with the terms of use (e.g., by clicking on a checkbox) posted on its website.

More significantly, article 1 of the Interpretation of the Supreme People’s Court and the Supreme People’s Procuratorate of Several Issues on the Application of Law in the Handling of Criminal Cases about Endangering the Security of Computer Information Systems, especially prescribing what circumstances are serious, is left out inadvertently in the court’s decision. This article contains three conditions regarding serious circumstances. The first condition refers to obtaining ten or more groups of identity verification information about online financial services; the second condition refers to obtaining five hundred or more groups of identity verification information other than online financial services; the third condition refers to other serious circumstances. Pursuant to the first condition and the second condition, video data cannot count as identity verification information. Therefore, the first two conditions are naturally excluded from the possible applied provisions.

The third condition, by contrast, is likely to be applicable, which suggests that data scraping could be interpreted as falling under the term ‘serious circumstances’. The rationale for this inference is that ‘other serious circumstances’ is a general term that needs to be understood by relying on the doctrine of ejusdem generis. However, the court became mired in analyzing whether the aftermath of data scraping falls under ‘(other) serious circumstances’ and whether the scrapers should be convicted of the crime of illegally obtaining computer information system data.
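The reasoning above, in which the first two conditions are excluded and only the open-ended third remains, can be laid out as a small decision rule. The sketch below is a simplification under stated assumptions: it models only the three conditions as this article describes them, not the full text of the Interpretation, and the names `Facts` and `serious_condition` are hypothetical. The third condition cannot be computed at all; it is represented by a flag standing in for the court’s own assessment.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as described in this article's summary of article 1.
FINANCIAL_ID_THRESHOLD = 10   # groups of identity verification information, online financial services
OTHER_ID_THRESHOLD = 500      # groups of other identity verification information


@dataclass
class Facts:
    """Hypothetical summary of the facts a court would have found."""
    financial_id_groups: int = 0
    other_id_groups: int = 0
    # Stand-in for a judicial assessment; not something code can decide.
    court_finds_other_serious: bool = False


def serious_condition(facts: Facts) -> Optional[str]:
    """Return which of the three conditions is met, or None if none is."""
    if facts.financial_id_groups >= FINANCIAL_ID_THRESHOLD:
        return "first"
    if facts.other_id_groups >= OTHER_ID_THRESHOLD:
        return "second"
    if facts.court_finds_other_serious:
        return "third"
    return None


# Video data is not identity verification information, so both counts are zero
# and the outcome depends entirely on the court's assessment.
video_case_without_reasoning = serious_condition(Facts())
video_case_with_reasoning = serious_condition(Facts(court_finds_other_serious=True))
```

The point of the sketch is that, for video data, the two numerical conditions do no work, so the result turns wholly on how the court reasons about ‘other serious circumstances’.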

The deficiencies of the court’s decision suggest that criminal sanctions on data scraping will require significant changes in both interpretation and legislation. Before discussing what changes are required, however, we turn in Chapter III to the US experience, which will be an asset when it comes to criminal sanctions on data scraping.

III. THE US CRIMINAL SANCTIONS ON DATA SCRAPING

This chapter mainly places data scraping under the CFAA, and analyzes typical cases, especially linked to criminal sanctions. The first section describes the CFAA-related provisions. The second section analyzes three typical data scraping cases. The third section clarifies three measures to resolve those ever-challenging criminal sanctions on data scraping in the US.

A. The CFAA-Related Provisions

In 1986, Congress enacted the CFAA. Three additional sorts of computer crimes were added under the CFAA: a computer fraud offense; an offense for the alteration, damage, or destruction of information; and an offense for the trafficking of unauthorized computer passwords in certain circumstances. In successive amendments, the CFAA underwent various degrees of change and revision, including the creation of new offenses and the expansion of its coverage. Because many of these adjustments have little to do with criminal sanctions on data scraping, we concentrate on those that do, including the key phrases of the CFAA (e.g., ‘without authorization’ and ‘exceeds authorized access’) and the elements of criminal sanctions.

The phrases ‘without authorization’ and ‘exceeds authorized access’ are described in section 1030 of title 18 in the United States Code. Under what is specifically prohibited, data scraping brought under the CFAA can be mainly regulated under sections 1030(a)(2), 1030(a)(4) and 1030(a)(5). Section 1030(a)(2) makes it a crime when a person intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains: (A) information contained in a financial record of a financial institution; (B) information from any department or agency of the US; or (C) information from any protected computer.

Under this section, Congress substituted the phrase ‘exceeds authorized access’ for the overly long phrase ‘having accessed a computer with authorization, uses the opportunity such access provides for purposes to which such authorization does not extend’. The Senate Judiciary Committee reported that the substitution was meant to simplify the language in section 1030(a)(2) of title 18 of the United States Code. Similarly, the House Judiciary Committee reported that its aim was merely to clarify the language in existing law. In addition to this substitution, another change altered the mens rea requirement in section 1030(a)(2) from ‘knowingly’ to ‘intentionally’. This substantially changed the elements of the offense, because prosecutors must prove that a person entered another’s computer system with clear intent and without appropriate authorization. As the Senate Judiciary Committee report concluded, this change is meant to focus federal criminal prosecutions on those whose conduct evinces a clear intent to enter, without proper authorization, computer files or data belonging to another.

The revision of section 1030(a)(4) was analogous to that of section 1030(a)(2). It added a key requirement that a person access a protected computer without authorization, or exceed authorized access, knowingly and with intent to defraud, in violation of the computer fraud provisions. The intent-to-defraud requirement makes section 1030(a)(4) a distinct type of crime. The section makes it a crime when a person knowingly and with intent to defraud accesses a protected computer without authorization, or exceeds authorized access, and by means of such conduct furthers the intended fraud and obtains anything of value, unless the object of the fraud and the thing obtained consists only of the use of the computer and the value of such use is not more than 5,000 US dollars in any 1-year period.

As for section 1030(a)(5), it prohibits the exfiltration of data induced by the conduct that is linked to accessing a protected computer without authorization, causing damage or loss. Section 1030(a)(5) punishes one who: (A) knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer; (B) intentionally accesses a protected computer without authorization, and as a result of such conduct, recklessly causes damage; or (C) intentionally accesses a protected computer without authorization, and as a result of such conduct, causes damage and loss.

Additionally, sections 1030(e)(8) and 1030(e)(11) respectively reveal the meaning of ‘damage’ and ‘loss’. All of these provisions will lay the groundwork for case studies below.

B. Typical Data Scraping Cases

As mentioned earlier, data scraping cases brought under the CFAA tend to be brought under sections 1030(a)(2), 1030(a)(4) and 1030(a)(5). These sections have in common that they all require courts to interpret the phrases ‘without authorization’ or ‘exceeds authorized access’. Interpreting them reasonably has become necessary for courts to resolve similar cases and reach correct decisions. In his 2018 article Twenty Years of Web Scraping and the Computer Fraud and Abuse Act, the US scholar Andrew Sellars roughly divided twenty years of data scraping cases into four phases of thinking around the critical question of when a scraper accesses a computer ‘without authorization’ or ‘exceeds authorized access’. We analyze three of those cases.

A good place to begin the analysis of scraping claims brought under the CFAA is the case EF Cultural Travel BV v. Zefer Corp. EF Cultural Travel BV (hereinafter referred to as EF) and Explorica, Inc. (hereinafter referred to as Explorica) were competitors in the travel business. Explorica, which employed several former EF employees, hired Zefer Corp. (hereinafter referred to as Zefer) to build a scraper tool. Zefer used the tool to glean two years of pricing data from EF’s website. After receiving the pricing data from Zefer, Explorica set its own prices for the public, undercutting EF’s prices by an average of five percent.

In the case of EF, the district court issued a preliminary injunction against Zefer based on the CFAA, holding that the use of the scraper tool was outside the scope of the reasonable expectations of ordinary users. The district court did not foreclose the link between the use of the scraper tool and exceeding authorized access. Rather, it interpreted the phrase ‘exceeds authorized access’ using the reasonable expectations test. The defendant appealed, and the court of appeals held that the reasonable expectations test was not the proper gloss for determining a lack of authorization under the CFAA provision on fraudulently accessing a protected computer without authorization.

The second case is US v. Nosal. David Nosal was an employee in a company. When he left the company, he asked several people who were still working for the company to help him start a competing business. These people used their log-in credentials to download source lists, names and contact information from a confidential database on the company’s computer, and then transfer this information to Nosal. These people were privileged to access the database, but the company had a policy that banned the disclosure of confidential information.

As a consequence, Nosal was charged with violations of section 1030(a)(4) of title 18 of the United States Code for aiding and abetting these people in exceeding their authorized access with intent to defraud. The court held that the phrase ‘exceeds authorized access’ does not extend to violations of use restrictions. In doing so, the court clarified that the phrase is limited to violations of restrictions on access to data.

The third case, Craigslist Inc. v. 3Taps Inc., needs to be examined. Craigslist operated a widely-used website that allows users to submit and browse classified advertisements. The defendant 3Taps aggregated and republished ads from Craigslist by copying content from its website. Craigslist adopted two remedial measures to block this conduct. The first was a cease and desist letter sent to 3Taps, showing that this conduct was unauthorized. The second was a configuration Craigslist built to obstruct deviant access from IP addresses.

In Craigslist Inc. v. 3Taps Inc., the parties agreed that 3Taps intentionally accessed a protected computer and copied data from it. The only controversy was whether 3Taps’ conduct was ‘without authorization’. Under the plain language of the statute, the conduct was without authorization once 3Taps continued to glean data from Craigslist’s website after Craigslist had revoked its authorization to access the website. The Ninth Circuit has interpreted ‘without authorization’ to mean that computer owners have the power to revoke the authorizations they grant. Therefore, once the permission given to 3Taps was validly rescinded, 3Taps’ further conduct was ‘without authorization’.

C. The US Measures to Criminal Sanctions on Data Scraping

The typical data scraping cases analyzed above are merely the tip of the iceberg; many more cases are not covered by the preceding analysis. From these cases, it can be concluded that the interpretation of the phrases ‘without authorization’, ‘exceeds authorized access’ and ‘damage or loss’ has received great attention from both academics and practitioners. In the US, academics and practitioners emphasize the weighing of different values, striking a balance between use value and privacy protection, freedom of speech and informed consent, public access and technical support, etc. They then tend to adopt a contextual model to explore what is hidden inside the authorization provisions and the damage or loss provisions of the CFAA. The contextual model mainly derives from privacy theory and practice in the US. Its hallmark is understanding privacy in specific contextual situations instead of seeking to illuminate an abstract conception of privacy. Furthermore, Professor Helen Nissenbaum proposed the well-known theory of contextual integrity, which further rationalizes the contextual model: information should be distributed and flow within a specific context, in keeping with the contextual norms of information flow. Although the contextual model has taken different versions over the past few years, its essence still highlights how important it is to consider the specific context when academics and practitioners face privacy issues. Building on this essence, the US has gradually outlined a contextual model for criminal sanctions on data scraping. For ease of discussion, we summarize the US experience with this model and expand on three measures (the first two are associated with interpretation and the last with legislation).

The interpretation of the CFAA arises in the context of a further point of academic interest in the US: whether the CFAA should apply, and how broad it should be, have already become subjects of heated discussion. However, it seems clear that a web crawler accessing a computer (information) system or a relatively private website and scraping it for data could violate the CFAA. Web crawlers can operate in violation of a terms-of-service agreement, and they scrape data from specific areas by using servers’ resources, leading to the exfiltration of data and the infringement of important interests.

1. The Concrete Interpretation of Phrases Regarding Authorization. — Phrases regarding authorization (e.g., without authorization) should be concretely interpreted depending on various situations. To determine what they mean, courts often seek to ascertain whether specific data breaches are committed by hackers. Sometimes this is not easy. In a computer (information) system or on a website, an authorization notice determines what a user can browse and obtain. This situation presents an accepting standard: without hacking, a user accesses an area within the scope of authority, his access to that area is legitimate.

Additionally, if a user can access an area by credentials (e.g., username and password) warranting legitimate access within the scope of authority, his access is also valid. In contrast, concerning data breaches, when the access is described as without authorization, it typically means that black-hat hacking has occurred, or that a user does not have credentials to access an area, such as a computer (information) system. Based on these situations, the contextual model could be taken seriously on the ground that a concrete interpretation clarifies the landscape of phrases regarding authorization.

For one thing, a workable mechanism for distinguishing the proper use of data from illicit access to data can be adopted to make the interpretation concrete. In US v. Nosal, the Ninth Circuit did not exclude the possibility that hackers could be inside employees. It did, however, interpret ‘without authorization’ as applying to outside hackers who have no access to a computer. On this reading, only a concrete interpretation of authorization conforms to the plain meaning of the statute. Meanwhile, as stated above, ‘exceeds authorized access’ is limited to access restrictions, not use restrictions. Accordingly, the court’s focus was on the technical means by which the data was obtained. Professor Orin S. Kerr explicitly pointed out that courts limit access without authorization to the circumvention of code-based restrictions rather than contract-based restrictions.

For another, the underlying principle of constitutionality can also play a vital role in concretely interpreting related phrases. When courts interpret the criminal law, the principle of constitutionality requires that citizens’ acts exercising fundamental rights stipulated in the Constitution not be construed as crimes. The principle helps delimit the scope of the CFAA provisions. If people do not adhere to the principle when concretely interpreting related phrases, the uncertainty surrounding the scope of these phrases may result in the arbitrary criminalization of ordinary behaviour (some of which is beneficial to data sharing).

This principle was deemed a tool to explain how to concretely interpret ‘without authorization’ in Craigslist Inc. v. 3Taps Inc. By using it in favor of defendants, this case evinced that when the court interpreted authorization concretely, the defendants’ acts not causing damage or loss could be excluded from criminal sanctions. Similarly, Professor Kerr proposed that it requires courts to adopt a narrow interpretation of unauthorized access. Given this view, clear notice of what would impose criminal sanctions and what norms need to be complied with should be given to citizens. In a sense, adopting a contextual model that emphasizes the values of different situations will facilitate the concrete interpretation of phrases regarding authorization, making its scope relatively limited.

2. The Concrete Analysis of Conditions Regarding Damage or Loss. — Analyzing whether there is damage or loss and if any, how to assess it will outweigh the concern about authorization and access, especially when scrapers are invading data hosts’ interests in unprecedented ways. The aftermath of data scraping lies in not only unauthorized access or exceeded access, but downstream illicit acts (e.g., data leaks, illicit data transfers, and unjustified monetization of data). Among them, disputes over damage or loss are becoming increasingly crucial when it comes to interpreting the CFAA.

In EF Cultural Travel BV v. Explorica, Inc., the court paid closer attention to whether the data scraping gave rise to damage or loss. The court noted that EF had suffered a loss because of Explorica’s scraping, reasoning that the purpose of the ‘damage or loss’ language is to reach remedial expenses borne by victims that could not properly be considered direct damage caused by a computer hacker. Since EF had been compelled to implement specialized measures to ‘evaluate whether the website had been jeopardized’, it had suffered a loss. Although Explorica alleged that its act neither caused any physical damage nor placed any stress on EF’s website, the court insisted that EF indisputably suffered a detriment and a disadvantage by expending substantial sums to estimate the extent, if any, of the physical damage to the website caused by Explorica’s invasion. On the foregoing grounds, expenses of at least 5,000 US dollars resulting from such an invasion are ‘losses’, and they qualify as damage or loss.

This case represents a tendency of courts to increasingly underscore the analysis of costs, which helps interpret damage or loss in actual situations, rather than purely determining whether scrapers’ acts violate the authorization provisions. That is to say, courts should carefully scrutinize the basis of any CFAA ‘loss’ in data scraping cases to determine what harm, if any, actually occurred. In these circumstances, the questions of what counts as costs, when costs are regarded as damage or loss, and how damage or loss is eventually calculated in criminal justice all form part of the interpretation of the CFAA.

Perhaps more significantly, US scholars have been cognizant of these questions and have shed light on them. For instance, the US scholars Zachary Gold and Mark Latonero proposed that when courts impose criminal sanctions on data scraping, there will be a policy dilemma involving the disclosure of a significant amount of metadata about users, threats to data security, invasion of the privacy of internet users, misuse of data, etc. Furthermore, another US scholar, Jeffrey Kenneth Hirschey, argued that several pragmatic factors might be taken into account before bringing suits, such as user demand for data, potential benefits of scraped data and public relations implications.

Given these views, it can be found that the CFAA has been elevated to data protection law going beyond the anti-hacking law, basically remaining consistent with one of the most influential views revealing that the CFAA is a major piece of federal sector-specific legislation. In addition to interpreting the CFAA, the US experience also encompasses the legislation. The legislation will be later discussed around the amendment of both the CFAA and other related regulations.

3. Creating Multifunctional Provisions in the CFAA. — The CFAA should be amended to be unambiguous through the addition of multifunctional provisions. The CFAA defines ‘exceeds authorized access’, but fails to define the words authorization and access. Though damage or loss provisions are stipulated in the CFAA and explained in various ways, they are still thought to be ambiguous because of the absence of relatively certain rules of calculation. As Professor Kerr noted, at the time Congress passed the CFAA, it probably did not realize how complex these concepts were (and would become) in the computer context, resulting in a startling level of ambiguity. With the widespread use of cloud computing and automated technology, the complexity of this problem will become more apparent.

As every principle has its exceptions, the CFAA should be amended to contain explicit exceptions, such as the copying of data, reverse engineering and security research. These exceptions can be carved out by modifying the CFAA provisions so that the CFAA applies appropriately to data scraping cases.

In addition to modifications to the phrases ‘exceeds authorized access’ and ‘without authorization’, inserting the phrase ‘and the fair market value of the information obtained exceeds 5,000 US dollars’ after ‘financial gain’ would carve out an exception to the damage or loss provision. This insertion in particular would greatly clarify the meaning of the damage or loss provision, because adopting the contextual model can shape what damage or loss means. In particular, the damage or loss provision can be modified by (i) analyzing the nature of the data and its value; (ii) distinguishing between access controls which by default deny all traffic and those that by default admit all traffic; and (iii) analyzing the reasonableness of incurring expenses under the circumstances. By doing so, courts not only avoid punishing innocuous activity but also incentivize proper web security.

IV. CHINA’S SOLUTIONS TO CRIMINAL SANCTIONS ON DATA SCRAPING

As discussed above, we have reviewed the US experience and identified what the US values most when it tackles issues over data scraping. In this sense, we propose that it is time to adopt the contextualization-based individualized criminal sanctions on data scraping in China. In this Part, we will offer domestic solutions to criminal sanctions on data scraping on that basis, involving interpretative strategies and legislative improvements.

A. Interpretative Strategies

In China’s criminal justice system, it is necessary to apply the contextualization-based individualized principle to reasonably interpret related provisions. Particularly when courts face the relationship between data scraping cases and the crime of illegally obtaining computer information system data, legal interests and the term ‘serious circumstances’ need to be considered to reinforce interpretative strategies. In this sense, we will also offer solutions to criminal sanctions that data scraping may embody.

1. The Interpretation of the Term ‘Intrusions’ in Different Situations. — It is readily understood that for someone to be convicted of the crime of illegally obtaining computer information system data, three key elements must be met: ‘intrusions’, ‘obtain the data’ and ‘serious circumstances’. Because loss is expressed as a specific quantity, once the requirement for this quantity is met under the above judicial interpretation, there is no further requirement as to what kind of data a scraper must obtain. Hence, the term ‘intrusions’ is a central focus of this crime.

On the one hand, not every act of ‘bypassing restriction on access’ amounts to ‘intrusions’. The legislative goal of this law can be understood as penalizing anybody who breaks through hosts’ safeguards by hacking. As the CFAA is an anti-hacking provision enacted to prohibit hacking to illicitly access data, the legislative goal of this law in China is analogous to that of the CFAA provisions in the US. It thus appears that an illegal intrusion refers to a situation where access without authorization bypasses an authentication gate. For access to be rendered ‘without authorization’, a website must be configured so that it does not respond to every request.

This requires not just any type of code-based limitation, but a code-based authentication gate that lets only authorized users in and keeps unwanted individuals out. In this instance, an IP address block or a restriction on browsing frequency is not a barrier to access. This is because an IP address block only restrains specific sorts of browsing without preventing specific individuals from browsing a computer information system. More significantly, the scraped data tends to be publicly available, and that availability is indispensable to how cyberspace works. Curbing it may stifle the free flow of data (which is encouraged and warranted by principles of constitutional law), and eventually lower users’ motivation to share data widely with others.
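The distinction drawn above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Restriction` names and the `is_authentication_gate` helper are hypothetical, and the password login is an assumed example of an authentication gate that does not appear in the text; the IP address block and the restriction on browsing frequency are the two limitations the text itself names.

```python
from enum import Enum


class Restriction(Enum):
    """Code-based limitations a data host might deploy (illustrative labels)."""
    PASSWORD_LOGIN = "password login"                  # assumed example of a gate
    IP_ADDRESS_BLOCK = "IP address block"              # named in the text
    RATE_LIMIT = "restriction on browsing frequency"   # named in the text


# Per the argument above, only a limitation that checks who the user is
# (letting authorized users in and keeping others out) is a barrier to access.
AUTHENTICATION_GATES = {Restriction.PASSWORD_LOGIN}


def is_authentication_gate(restriction: Restriction) -> bool:
    """Return True if bypassing this restriction could count as bypassing a gate."""
    return restriction in AUTHENTICATION_GATES


for r in Restriction:
    print(f"{r.value}: authentication gate = {is_authentication_gate(r)}")
```

On this sketch, circumventing an IP address block or a frequency limit would not by itself satisfy the ‘intrusions’ element, whereas circumventing a login would move the analysis on to the further questions of obtaining the data and serious circumstances.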

On the other hand, even if the access without authorization satisfies the requirement of bypassing an authentication gate, it should not be taken for granted that this access is bound to be an illegal intrusion in this crime. In most instances, access without authorization that bypasses an authentication gate does constitute ‘intrusions’ in this crime. However, there are a few exceptions, where bypassing an authentication gate does not reach the degree of serious circumstances.

The degree of serious circumstances, which will be discussed later, refers not only to the term ‘obtain the data’, but also to the term ‘intrusions’. It signifies that where a certain act appears to bypass an authentication gate but causes only remarkably minor harm and no serious harm, it is not deemed a crime. For example, imagine a scraper who inadvertently breaks into a website or a computer information system, bypassing an implied authentication gate, and then withdraws from the access as soon as possible without causing any physical damage. For the above reasons, we insist that this access should not be an illegal intrusion in this crime.

Where access is inadvertent, the contextualization-based individualized principle can be applied to exclude those situations in which scrapers do not cause any physical damage. One of the most visible merits of this principle is that it operates in favour of the scraper when the meaning of the term ‘intrusions’ is relatively broad. As in Craigslist Inc. v. 3Taps Inc., mentioned earlier, courts narrowly interpret authorization, which excludes from criminal liability defendants’ acts that cause no damage or loss.

2. The Specific Interpretation of the Aftermath of Data Scraping. — Beyond intrusions, data scraping results in data copying, data leaks and data transmission. Data copying is directly associated with data scraping. Data leaks and data transmission are often regarded as downstream data breaches. In China’s criminal justice system, these three types of acts are extremely common, having cascading effects on the interests of different data hosts. However, criminal justice tends to overlook this reality, leaving many questions unanswered. What is striking about these questions is not whether they need to be emphasized, but rather how they need to be addressed. Viewed in this light, the interpretation of the aftermath of data scraping is the core of these questions.

On one side, the fundamental parameters of the term ‘data’ should be underscored. The crucial lesson from the US experience is that analyzing data scraping cases entails determining whether affected details of scraped data trigger damage or loss provisions. To handle it, the US courts wield significant power in assessing how much work they must do to look back and forth between facts of cases and legal requirements. The experience motivates China’s courts to shed light on analyzing it by tailoring the intensity of the analysis of the aftermath of data scraping.

Taking a data scraping case as an example, the key reason why the court convicted three scrapers of the crime of illegally obtaining computer information system data is that its analysis was dominated by the term ‘intrusions’. The court only mentioned that the three scrapers obtained customer data and then made an illegal profit of 37,000 yuan. As a matter of fact, the court was particularly well placed to analyze whether fundamental parameters of the term ‘data’ were harmed and how the facts could be interpreted as ‘serious circumstances’. For instance, the volume of data was fairly large, and the sensitivity of the data to customers was incredibly high. Additionally, the period of data acquisition was particularly long and the scraped data was frequently used illicitly. All of these situations bear on whether the scrapers’ conduct amounted to ‘serious circumstances’ and whether the scrapers would be convicted of the crime. As a Chinese professor proposed, it is vital to analyze the unlawfulness of data scraping by crawlers and related criminal sanctions in a specific context. In a sense, this demonstrates that the contextualization-based individualized criminal sanctions will be conducive to the discovery of certain parameters of the term ‘data’.

In this regard, a Chinese scholar proposed that especially for the interpretation of the term ‘other serious circumstances’, volumes, times, means and latent risks of obtaining the data are at least involved to evaluate whether infringements of legal interests and accountability reach a degree required for criminal sanctions. We hold that this viewpoint offers a solution to interpret related phrases when it comes to data scraping cases. In a broader sense, other Chinese scholars claimed that, in an era of big data, it is essential to comprehend data itself to strengthen the understanding of the two terms ‘intrusions’ and ‘obtain the data’. In this way, in order to adequately protect the legal interests of this law, interpreters can embrace a notion of data, quite distinct from that of data only confined to computer (information) system. In a word, what needs to be implemented is that fundamental parameters of the term ‘data’ should be explicitly analyzed during the process of interpreting related provisions when it comes to data scraping cases.

On the other side, how these parameters of the term ‘data’ affect criminal sanctions on data scraping should also be underlined in the court’s decision. To some extent, criminal sanctions on data scraping rely on both static parameters of the term ‘data’ and dynamic combinations of these parameters. In this sense, how to combine them indeed plays an important role in criminal sanctions on data scraping. Now that data scraping may have some negative effects, such as easily acquiring large amounts of data, leading to data leakage, affecting the legal uses of data, etc., what will courts do to present these negative effects? What analysis method will be chosen to clarify them? For these questions, truly, courts need to pick out parameters of the term ‘data’ which may affect criminal sanctions, so that courts will have an understanding of what is expected of them and how they rightly apply the Criminal Law to data scraping cases. Based on the contextualization-based individualized criminal sanctions, we expand on how to specifically interpret serious circumstances in the crime of illegally obtaining computer information system data, involving a multiplicative method and a paratactic method.

The multiplicative method is used when scrapers repeatedly intrude into computer (information) systems and then obtain the data, or when they remain in the system for a long time. This method is conducive to comprehensively measuring the risk and degree of both intruding and obtaining the data. In the Internet era, we need to analyze those elements of the facts which, when multiplied, can generate new ones. For example, a scraper intruded into a computer (information) system and obtained specific data. The scraper repeated these acts five times afterwards, each time obtaining almost the same volume of data. Assuming that the volume of data obtained by the scraper each time did not constitute serious circumstances, did the total volume of data collected by the scraper in five iterations constitute (other) serious circumstances?

To answer this question, it is feasible to use the multiplicative method to measure degrees of risk in this act. That is, courts can determine whether this act constitutes (other) serious circumstances mainly by multiplying the number of times by the volume of data obtained each time. Certainly, each case has its peculiarities, and it is difficult to take a one-size-fits-all approach when measuring degrees of risk in related acts. This method may be adapted to specific cases because multi-factor situations, like frequent intrusions into computer (information) systems, may also give rise to (other) serious circumstances.
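The multiplication described above can be written out as a small worked sketch. The figures are hypothetical and chosen only for illustration: a threshold of 500 records and a volume of 120 records per intrusion are assumptions, not figures taken from the Criminal Law or any judicial interpretation, and the function names are likewise invented for this example.

```python
def aggregate_volume(times: int, volume_per_time: int) -> int:
    """Multiplicative method: total volume = number of times x volume each time."""
    return times * volume_per_time


def reaches_threshold(times: int, volume_per_time: int, threshold: int) -> bool:
    """Return True when the aggregated volume meets the (hypothetical) threshold."""
    return aggregate_volume(times, volume_per_time) >= threshold


# Hypothetical figures for illustration only.
HYPOTHETICAL_THRESHOLD = 500  # records; not a statutory figure
VOLUME_PER_TIME = 120         # records obtained in each intrusion
TIMES = 5                     # number of iterations in the example

single = reaches_threshold(1, VOLUME_PER_TIME, HYPOTHETICAL_THRESHOLD)
total = reaches_threshold(TIMES, VOLUME_PER_TIME, HYPOTHETICAL_THRESHOLD)
print(single, total)  # False True
```

On these assumed numbers, no single intrusion meets the threshold, but the five iterations taken together do, which is the situation the multiplicative method is meant to capture. A court would still need to weigh the peculiarities of the case rather than rely on the product alone.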

The paratactic method is used when a scraper intrudes into a computer (information) system and obtains data, and the elements of this act are in a parallel place. The parallel place refers to a condition where the elements of this act cannot be multiplied, but can be placed side by side and simultaneously assessed. By emphasizing contextualization, the paratactic method helps identify, analyze, evaluate, treat and review the risk triggered by data scraping.

We take the case involving Beijing ByteDance Networking Technology Co., Ltd. as an example. The court should have enumerated several elements of the act in a parallel place, such as illegally intruding into the computer information system via the ‘tt_spider’ files, obtaining a volume of data, a loss of technical service fees of 20,000 yuan, and so on. The court should have then simultaneously assessed them and analyzed whether integrating them would have constituted other serious circumstances. The two steps could not be omitted or reversed. Moreover, the court could have set out concretely the volume, kinds and value of the data in its decision. In short, focusing on contextualization, the paratactic method should not downplay the kinds of personal data involved when measuring how the sensitivity of personal data affects citizens’ fundamental rights.
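The two steps of the paratactic method (enumerate the elements side by side, then assess them together) can be sketched as follows. Only the loss of 20,000 yuan comes from the case as described; the data volume, the sensitivity score and every benchmark are hypothetical values introduced for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str         # element of the act, listed in a parallel place
    observed: float   # value found on the facts
    benchmark: float  # hypothetical benchmark, not a statutory figure


def enumerate_elements() -> list[Element]:
    """Step 1: list the elements side by side; none is multiplied by another."""
    return [
        Element("loss of technical service fees (yuan)", 20_000, 10_000),
        Element("volume of data (records)", 300, 500),        # assumed figures
        Element("sensitivity of data (0-1 scale)", 0.2, 0.5),  # assumed figures
    ]


def assess_together(elements: list[Element]) -> dict:
    """Step 2: assess all elements simultaneously and report each finding."""
    findings = {e.name: e.observed >= e.benchmark for e in elements}
    return {"findings": findings, "elements_met": sum(findings.values())}


result = assess_together(enumerate_elements())
print(result["elements_met"])  # 1
```

The sketch deliberately stops at reporting the findings for each element. Whether the elements, taken together, amount to other serious circumstances remains a contextual judgment for the court and is not reduced to a single score here.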

B. Legislative Improvements

The interpretation of the Criminal Law is merely a short-term strategy. In the long term, interpretation alone hardly paves the way for clearer criminal sanctions on data scraping, something that legislatures can further help clarify through legislation covering some provisions in China’s legal system. When legislation does not thoroughly resolve a potential overlap between cyber threats, criminal sanctions on data scraping suffer from many drawbacks. Therefore, we think that it is necessary to offer domestic solutions for China’s legislation, learning from the US experience, so that disputes over criminal sanctions on data scraping will be sufficiently addressed. Generally, these solutions can be split into two dimensions.

1. Amending the Term ‘Obtain the Data’. — We insist that the term ‘obtain the data’ should be amended adequately in the Criminal Law. In criminal legislation, the reason why criminal sanctions on data scraping raise greater controversies may derive from a lack of adequate and explicit explanations of data, let alone a concrete meaning for obtaining the data. According to paragraph 2 of article 285 of the Criminal Law, the term ‘intrusions’ essentially merges with the term ‘obtain the data’ in one article.

In an era of big data, it is imperative to separately regulate the unlawful acquisition of data. It is not only because interests in data should be emphasized, but because unlawfully obtaining the data represents a special nature of illegality distinct from that of intrusions. In this sense, there are roughly two ways to amend the term ‘obtain the data’, involving adding legal requirements to the original article in the Criminal Law and designing new articles in the Criminal Law.

Adding legal requirements to the original article is a relatively simple manner of amending the term ‘obtain the data’. To be specific, other types of acts that resemble obtaining the data can also be added to the original article. Although obtaining the data is one of the major components of acts in data scraping cases, courts are still unable to rule out the possibility of other similar acts emerging in data scraping cases. For instance, to some extent, possessing, copying and stealing the data are analogous to obtaining the data in that these acts involve controlling the data as well as infringing data hosts’ rights of disposition.

Generally speaking, when adding legal requirements to the original article in order to amend the term ‘obtain the data’, one fundamental principle should be adhered to: the unlawfulness of the newly added types of acts ought to be consistent with that of the existing misdeed (i.e., illegally obtaining). In light of this view, what can be added to paragraph 2 of article 285 of the Criminal Law will remain confined to the scope of undermining the safety of computer information systems rather than extend to the protection of data. This is the limitation of this approach.

Designing new articles is mainly applied to a condition where the protection of legal interests cannot be totally covered by the original article. Although little overlap exists between these articles, legislatures might well design new articles so that legislation can provide explicit and sufficient data protection. From the scope of the original article and the necessity of protecting the data itself, legislators will choose to design new articles. On the one hand, in most data scraping cases, obtaining the data is the main component of violations of legal interests. In other words, it is predominantly the acquisition of data that infringes data hosts’ rights, such as rights of possession, disposition and so on. On the other hand, legislatures must design new articles to combat data acquisition. As mentioned earlier, obtaining the data is apparently distinct from illegally intruding into computer information systems, leading to different degrees of downstream cyber threats like data leakage, data transmission, etc.

In light of these considerations, new articles may be designed along the following lines: whoever steals, possesses or uses other means to obtain the data should, if the circumstances are serious, be sentenced to fixed-term imprisonment of not more than three years or criminal detention, and/or be fined.

2. Creating References to Criminal Sanctions in Other Regulations. — Another way to criminalize data scraping is to create references to criminal sanctions in other regulations. Professor Ric Simmons from the US proposed that it is time to use administrative regulations to combat computer crimes because the CFAA has failed to effectively establish what constitutes ‘computer misuse’. Similarly, in China’s legal system, other regulations have many advantages over the Criminal Law because they can stipulate illicit acts, create separate rules in response to ever-changing data scraping, and pinpoint differences between administrative and criminal liability.

Other regulations have maintained similar articles in many other contexts. In some contexts (such as the Cybersecurity Law of the People’s Republic of China), an administrative article of illegally using information network is stipulated as article 46. In other contexts (such as the E-Commerce Law of the People’s Republic of China), an administrative article concerning infringing citizens’ personal information is stipulated as article 87.

In general, other regulations will be chosen when misconduct is complex or rapidly evolving, thus better equipping courts to handle data scraping cases. Particularly for administrative regulations, another distinct advantage of using them is that civil remedies and criminal sanctions can be easily distinguished, and the different situations calling for each kind of punishment can also be enumerated. On a technical level, other regulations are expected to summarize specific types of data acquisition and assign proper levels of criminal sanctions depending on each situation. Specifically, what can be explained is how the diversity of data can shape different levels of punishment.

Taking administrative regulations as an example, obtaining citizens’ sensitive data is more culpable than obtaining citizens’ insensitive data. Likewise, obtaining citizens’ private data is more culpable than obtaining citizens’ general data that does not cover privacy. The crime of illegally obtaining computer information system data does not distinguish between these different situations. Indeed, it is impossible to design any article in the Criminal Law that can make such distinctions, much less let these distinctions keep pace with constantly updated technology. Therefore, making such distinctions by designing articles in other regulations is a feasible solution to criminal sanctions on data scraping.

V. CONCLUSION

Concerning the application of criminal provisions (i.e., paragraph 2 of article 285 of the Criminal Law), criminal sanctions on data scraping face many challenges in China’s legal system. First, many intrusions into computer information systems can be regarded as violations of national regulations, thus leading to broad criminalization of ordinary behaviour. Second, the analysis of data acquisition tends to be overlooked in criminal decisions. And finally, interpretations of the term ‘(other) serious circumstances’ fail to indicate why some scrapers should be sanctioned by the Criminal Law.

To properly analyze criminal sanctions on data scraping, the US emphasizes the measure of different values and adopts a contextual model to effectively explore what is hidden inside authorization provisions and damage or loss provisions under the CFAA, mainly including the interpretation and legislation. For the interpretation, concrete interpretations of the terms ‘without authorization’ and ‘exceeds authorized access’ are adopted based on the circumvention of code-based restrictions. Besides, the analysis of damage or loss provisions receives greater attention from scholars and justices. For the legislation, multifunctional provisions should be added to the CFAA to make it unambiguous.

The domestic analysis of criminal sanctions on data scraping is still at a nascent stage. In these circumstances, it is necessary to learn from other countries, among which the US has relatively mature experience. As a result, it is time to adopt the contextualization-based individualized criminal sanctions on data scraping in China. We then offer solutions to criminal sanctions on data scraping, involving interpretative strategies and legislative improvements. Specifically, the interpretative strategies entail the interpretation of the term ‘intrusions’ in different situations, and the specific interpretation of the aftermath of data scraping. The legislative improvements include amending the term ‘obtain the data’ and creating references to criminal sanctions in other regulations.

The article merely analyzes criminal sanctions on data scraping in the context of the legal systems of China and the US. However, this does not mean that any other field of academic research should be excluded from analyses of this topic. On the contrary, especially with the advent of the AI age, it is interdisciplinary research, involving computer (data) science, ethics, philosophy, etc., that is advantageous to advanced analysis. All these fields are also worthy of wider attention going forward.
