RETHINKING CRIMINAL SANCTIONS ON DATA SCRAPING IN CHINA BASED ON A CASE STUDY OF ILLEGALLY OBTAINING SPECIFIC DATA BY CRAWLERS
Li Qian & Jiang Tao
This chapter examines data scraping under the CFAA and analyzes typical cases, especially those linked to criminal sanctions. The first section describes the CFAA-related provisions. The second section analyzes three typical data scraping cases. The third section sets out three measures for resolving the persistent difficulties of criminal sanctions on data scraping in the US.
In 1986, Congress enacted the CFAA. Three additional sorts of computer crimes were added under the CFAA: a computer fraud offense; an offense for the alteration, damage, or destruction of information; and an offense for the trafficking of unauthorized computer passwords in certain circumstances. In successive amendments, the CFAA underwent changes and revisions of varying degrees, including the creation of new offenses and the expansion of its coverage. Because many of these adjustments had little to do with criminal sanctions on data scraping, we concentrate on those that did, namely the key phrases of the CFAA (e.g., ‘without authorization’ and ‘exceeds authorized access’) and the elements of criminal sanctions.
The phrases ‘without authorization’ and ‘exceeds authorized access’ appear in section 1030 of title 18 of the United States Code. In terms of what is specifically prohibited, data scraping brought under the CFAA is mainly regulated by sections 1030(a)(2), 1030(a)(4) and 1030(a)(5). Section 1030(a)(2) makes it a crime when a person intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains: (A) information contained in a financial record of a financial institution; (B) information from any department or agency of the US; or (C) information from any protected computer.
Under this section, Congress substituted the phrase ‘exceeds authorized access’ for the overly long phrase ‘having accessed a computer with authorization, uses the opportunity such access provides for purposes to which such authorization does not extend’. The Senate Judiciary Committee reported that the substitution was meant to simplify the language in section 1030(a)(2) of title 18 of the United States Code. Similarly, the House Judiciary Committee reported that its aim was merely to clarify the language in existing law. In addition to this substitution, the mens rea requirement in section 1030(a)(2) was altered from ‘knowingly’ to ‘intentionally’. This substantially changed the elements of the offense, because prosecutors must prove that a person entered another’s computer system with clear intent and without appropriate authorization. As the Senate Judiciary Committee report concluded, this change was meant to focus federal criminal prosecutions on those whose conduct evinces a clear intent to enter, without proper authorization, computer files or data belonging to another.
The revision of section 1030(a)(4) was analogous to that of section 1030(a)(2). It added a key requirement that a person access a protected computer without authorization or exceed authorized access, knowingly and with intent to defraud, in order to violate the computer fraud provisions. Meanwhile, the defrauding requirement renders section 1030(a)(4) a new type of crime. It makes it a crime when a person knowingly and with intent to defraud, accesses a protected computer without authorization, or exceeds authorized access, and by means of such conduct furthers the intended fraud and obtains anything of value, unless the object of the fraud and the thing obtained consists only of the use of the computer and the value of such use is not more than 5,000 US dollars in any 1-year period.
As for section 1030(a)(5), it prohibits the exfiltration of data resulting from conduct that involves accessing a protected computer without authorization and causing damage or loss. Section 1030(a)(5) punishes one who: (A) knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer; (B) intentionally accesses a protected computer without authorization, and as a result of such conduct, recklessly causes damage; or (C) intentionally accesses a protected computer without authorization, and as a result of such conduct, causes damage and loss.
Additionally, sections 1030(e)(8) and 1030(e)(11) define ‘damage’ and ‘loss’ respectively. All of these provisions lay the groundwork for the case studies below.
As mentioned earlier, data scraping cases brought under the CFAA tend to rely on sections 1030(a)(2), 1030(a)(4) and 1030(a)(5). What these sections have in common is that they all require courts to interpret the phrases ‘without authorization’ or ‘exceeds authorized access’. A reasonable interpretation of these phrases has become necessary for courts to resolve similar cases and reach correct judicial decisions. In an iconic 2018 article, Twenty Years of Web Scraping and the Computer Fraud and Abuse Act, the US scholar Andrew Sellars roughly divided twenty years of data scraping cases into four phases of thinking around the critical question of when a scraper accesses a computer ‘without authorization’ or ‘exceeds authorized access’. We analyze three of those cases.
A good place to begin the analysis of scraping claims brought under the CFAA is the case EF Cultural Travel BV v. Zefer Corp. EF Cultural Travel BV (hereinafter referred to as EF) and Explorica, Inc. (hereinafter referred to as Explorica) were competitors in the travel business. Zefer was a technology company hired by Explorica, which had taken on several former EF employees. Zefer used a scraper tool to glean pricing data for two years from EF’s website. After receiving pricing data from Zefer, Explorica set its prices for the public, undercutting EF’s prices by an average of five percent.
In the case of EF, the district court issued a preliminary injunction against Zefer based on the CFAA, holding that the use of the scraper tool was outside the scope of reasonable expectations of ordinary users. The district court did not foreclose the link between the use of the scraper tool and exceeded authorized access. Rather, it interpreted the phrase ‘exceeded authorized access’, using the reasonable expectations test. The defendant appealed and the court of appeals held that the reasonable expectations test was not the proper gloss for determining a lack of authorization for the purposes of the CFAA provisions setting forth the offense of fraudulently accessing a protected computer without authorization.
The second case is US v. Nosal. David Nosal was an employee of a company. When he left the company, he asked several people who were still working there to help him start a competing business. These people used their log-in credentials to download source lists, names and contact information from a confidential database on the company’s computer, and then transferred this information to Nosal. These people were authorized to access the database, but the company had a policy that banned the disclosure of confidential information.
As a consequence, Nosal was charged with violations of section 1030(a)(4) of title 18 of the United States Code, for abetting these people in exceeding their authorized access with intent to defraud. The court held that the phrase ‘exceeds authorized access’ does not extend to violations of use restrictions. The court thereby clarified that the phrase ‘exceeds authorized access’ is limited to violations of restrictions on accessing data.
The third case, Craigslist Inc. v. 3Taps Inc., needs to be examined. Craigslist operated a widely-used website that allows users to submit and browse classified advertisements. The defendant 3Taps aggregated and republished ads from Craigslist by copying content from its website. Craigslist adopted two remedial measures to block this conduct. The first was a cease and desist letter sent to 3Taps, showing that this conduct was unauthorized. The second was a configuration Craigslist built to obstruct deviant access from IP addresses.
In Craigslist Inc. v. 3Taps Inc., the parties agreed that 3Taps intentionally accessed a protected computer and copied data from it. The only controversy was whether 3Taps’ conduct was without authorization. On the plain language, this conduct was without authorization when 3Taps continued to glean data from Craigslist’s website after Craigslist revoked its authorization to access the website. The court interpreted ‘without authorization’ to mean that computer owners have the power to revoke the authorizations they grant. Therefore, once the permission for 3Taps was validly rescinded, 3Taps’ further conduct was ‘without authorization’.
The typical data scraping cases analyzed above are merely the tip of the iceberg, and many more cases are not included in the preceding analysis. From these typical cases, it can be concluded that the interpretation of the phrases ‘without authorization’, ‘exceeds authorized access’ and ‘damage or loss’ has received great attention from both academics and practitioners. In the US, academics and practitioners emphasize the weighing of different values, striking a balance between use value and privacy protection, freedom of speech and informed consent, public access and technical support, etc. They then tend to adopt a contextual model to explore what lies behind the authorization provisions and the damage or loss provisions of the CFAA.
The contextual model mainly derives from privacy theory and practice in the US. A hallmark of the contextual model is that it understands privacy in specific contextual situations instead of seeking to illuminate an abstract conception of privacy. Furthermore, Professor Helen Nissenbaum’s well-known theory of contextual integrity further rationalizes the contextual model: it holds that personal information should be distributed and flow within a specific context and in accordance with the contextual norms of information flow. Over the past few years, although the contextual model has taken different forms, its essence still highlights how important it is to consider a specific context when academics and practitioners face privacy issues. Building on this essence, the US has gradually extended the contextual model to criminal sanctions on data scraping. For ease of discussion, we summarize the US experience in applying the contextual model to criminal sanctions on data scraping and expand on three measures (the first two concern interpretation and the last concerns legislation).
The interpretation of the CFAA arises in the context of a further point of academic interest in the US: whether the CFAA should apply and how broad it should be have already become subjects of heated discussion. However, it seems clear that a web crawler that accesses a computer (information) system or a relatively private website and scrapes it for data could violate the CFAA. Web crawlers can operate in violation of a terms-of-service agreement and scrape data from specific areas by using servers’ resources, leading to the exfiltration of data and infringements on important interests.
1. The Concrete Interpretation of Phrases Regarding Authorization. — Phrases regarding authorization (e.g., ‘without authorization’) should be concretely interpreted depending on the situation. To determine what they mean, courts often seek to ascertain whether specific data breaches were committed by hackers. Sometimes this is not easy. In a computer (information) system or on a website, an authorization notice determines what a user can browse and obtain. This situation presents an accepting standard: if, without hacking, a user accesses an area within the scope of authority, the access to that area is legitimate.
Additionally, if a user can access an area by credentials (e.g., username and password) warranting legitimate access within the scope of authority, his access is also valid. In contrast, concerning data breaches, when the access is described as without authorization, it typically means that black-hat hacking has occurred, or that a user does not have credentials to access an area, such as a computer (information) system. Based on these situations, the contextual model could be taken seriously on the ground that a concrete interpretation clarifies the landscape of phrases regarding authorization.
For one thing, a workable mechanism for distinguishing the proper use of data from illicit access to data can be adopted to make the interpretation concrete. In US v. Nosal, the Ninth Circuit did not exclude the possibility that hackers could be insiders such as employees. At the same time, the court interpreted ‘without authorization’ in a way that applied to outside hackers who have no access to a computer. On this reasoning, only a concrete interpretation of authorization conformed to the plain meaning of the statute. Meanwhile, as stated above, ‘exceeds authorized access’ is limited to access restrictions, not use restrictions. In line with this consideration, the court’s focus was on the technical means by which data was obtained. Professor Orin S. Kerr explicitly pointed out that courts limit access without authorization according to the circumvention of code-based restrictions rather than contract-based restrictions.
For another, the underlying principle of constitutionality can also play a vital role in concretely interpreting related phrases. When courts interpret the criminal law, the principle of constitutionality means that citizens’ acts exercising fundamental rights stipulated in the Constitution cannot be construed as a crime. The principle helps delimit the scope of the CFAA provisions. If this principle is not followed when related phrases are concretely interpreted, the uncertainty surrounding the scope of these phrases may result in arbitrarily criminalizing ordinary behaviour (some of which is beneficial to data sharing).
This principle was deemed a tool to explain how to concretely interpret ‘without authorization’ in Craigslist Inc. v. 3Taps Inc. By using it in favor of defendants, this case evinced that when the court interpreted authorization concretely, the defendants’ acts not causing damage or loss could be excluded from criminal sanctions. Similarly, Professor Kerr proposed that it requires courts to adopt a narrow interpretation of unauthorized access. Given this view, clear notice of what would impose criminal sanctions and what norms need to be complied with should be given to citizens. In a sense, adopting a contextual model that emphasizes the values of different situations will facilitate the concrete interpretation of phrases regarding authorization, making its scope relatively limited.
2. The Concrete Analysis of Conditions Regarding Damage or Loss. — Analyzing whether there is damage or loss and if any, how to assess it will outweigh the concern about authorization and access, especially when scrapers are invading data hosts’ interests in unprecedented ways. The aftermath of data scraping lies in not only unauthorized access or exceeded access, but downstream illicit acts (e.g., data leaks, illicit data transfers, and unjustified monetization of data). Among them, disputes over damage or loss are becoming increasingly crucial when it comes to interpreting the CFAA.
In EF Cultural Travel BV v. Explorica, Inc., the court paid closer attention to whether the data scraping gave rise to damage or loss. The court noted that EF had suffered a loss because of Explorica’s scraping, reasoning that the damage or loss provision is meant to cover remedial expenses borne by victims that could not properly be considered direct damage caused by a computer hacker. Since EF had been compelled to implement specialized measures to ‘evaluate whether the website had been jeopardized’, it had suffered a loss. Although Explorica alleged that its act neither caused any physical damage nor placed any stress on EF’s website, the court insisted that EF indisputably suffered a detriment and a disadvantage by expending substantial sums to assess the extent, if any, of the physical damage to the website caused by Explorica’s invasion. On the foregoing grounds, expenses of at least 5,000 US dollars resulting from such an invasion are ‘losses’ and can be treated as damage or loss.
This case represents a tendency of courts to place increasing weight on the analysis of costs, which helps interpret damage or loss in actual situations, rather than purely determining whether scrapers’ acts violate the authorization provisions. That is to say, courts should carefully scrutinize the basis of any CFAA ‘loss’ in data scraping cases to determine what harm, if any, actually occurred. Under this circumstance, the questions of what counts as costs, when costs are to be regarded as damage or loss, and how damage or loss is eventually calculated in criminal justice are all matters of interpreting the CFAA.
Perhaps more significantly, scholars of the US have been cognizant of these questions and shed light on them. For instance, the US scholars Zachary Gold and Mark Latonero proposed that when courts impose criminal sanctions on data scraping, there will be a policy dilemma involving the disclosure of a significant amount of metadata about users, threatening data security, invading the privacy of internet users, misusing data, etc. Furthermore, another US scholar, Jeffrey Kenneth Hirschey, argued that several pragmatic factors might be taken into account before bringing suits, such as user demand for data, potential benefits of scraped data and public relations implications.
Given these views, it can be found that the CFAA has been elevated to data protection law going beyond the anti-hacking law, basically remaining consistent with one of the most influential views revealing that the CFAA is a major piece of federal sector-specific legislation. In addition to interpreting the CFAA, the US experience also encompasses the legislation. The legislation will be later discussed around the amendment of both the CFAA and other related regulations.
3. Creating Multifunctional Provisions in the CFAA. — The CFAA should be amended to be unambiguous through the addition of multifunctional provisions. The CFAA defines ‘exceeds authorized access’, but fails to define the words ‘authorization’ and ‘access’. Though damage or loss provisions are stipulated in the CFAA and explained in various ways, they are still thought to be ambiguous because of the absence of relatively certain rules of calculation. As Professor Kerr noted, at the time Congress passed the CFAA, it probably did not realize how complex these concepts were (and would become) in the computer context, resulting in a startling level of ambiguity. With the widespread use of cloud computing and automated technology, the complexity of this problem will become even more apparent.
As every principle has its exceptions, the CFAA should be amended to contain explicit exceptions, such as the copying of data, reverse engineering and security research. The exceptions can be carved out by modifying the CFAA provisions so that the CFAA applies appropriately to data scraping cases.
In addition to modifications to the phrases ‘exceeds authorized access’ and ‘without authorization’, inserting the phrase ‘and the fair market value of the information obtained exceeds 5,000 US dollars’ after ‘financial gain’ would create an exception within the damage or loss provision. The insertion after ‘financial gain’ in particular would greatly clarify the meaning of the damage or loss provision, because adopting the contextual model can shape what damage or loss means. In particular, the damage or loss provision can be modified by (i) analyzing the nature of the data and its value; (ii) distinguishing between access controls which by default deny all traffic and those that by default admit all traffic; and (iii) analyzing the reasonableness of incurring expenses under the circumstances. By doing so, courts not only avoid punishing innocuous activity but also incentivize proper web security.
As discussed above, we have drawn on the US experience and identified what is valued most when the US tackles issues over data scraping. In this sense, we propose that it is time to adopt contextualization-based individualized criminal sanctions on data scraping in China. In Chapter IV, we will offer domestic solutions to criminal sanctions on data scraping based on this approach, involving interpretative strategies and legislative improvements.
In China’s criminal justice system, it is necessary to apply the contextualization-based individualized principle to reasonably interpret related provisions. Particularly when courts face the relationship between data scraping cases and the crime of illegally obtaining computer information system data, legal interests and the term ‘serious circumstances’ need to be considered to reinforce interpretative strategies. In this sense, we will also offer solutions to criminal sanctions that data scraping may embody.
1. The Interpretation of the Term ‘Intrusions’ in Different Situations. — It is easy to see that if someone is charged with the crime of illegally obtaining computer information system data, three key terms must be satisfied: ‘intrusions’, ‘obtain the data’ and ‘serious circumstances’. Because loss is expressed as a specific quantity, once the quantitative requirement in the above judicial interpretation is met, there is no further requirement as to what kind of data a scraper must obtain. Hence, the term ‘intrusions’ is a central focus of this crime.
On the one hand, not every ‘bypassing restriction on access’ can constitute the term ‘intrusions’. The legislative goal of this law can be understood as penalizing anybody who breaks through hosts’ safeguards by hacking. As the CFAA is an anti-hacking provision enacted to prohibit hacking to illicitly access data, the legislative goal of this law in China is analogous to that of the CFAA provisions in the US. It thus appears that an illegal intrusion refers to a situation where access without authorization bypasses an authentication gate. For access to count as being without authorization, a website must be configured so that it does not respond to every request.
This requires not just any type of code-based limitation, but a code-based authentication gate that lets only authorized users in and keeps unwanted individuals out. In this instance, an IP address block or a restriction on browsing frequency is not a barrier to access. This is because an IP address block only restrains specific sorts of browsing without preventing specific individuals from browsing a computer information system. More significantly, the scraped data tends to be publicly available, and this availability is indispensable to how cyberspace works. Curbing it may stifle the free flow of data (which is encouraged and warranted by principles of constitutional law), and eventually lower users’ motivation to share data widely with others.
On the other hand, even if the access without authorization satisfies the requirement of bypassing an authentication gate, it should not be taken for granted that this access is bound to be an illegal intrusion in this crime. In most instances, the access without authorization, satisfying the requirement of bypassing an authentication gate, constitutes the term intrusions in this crime. However, there are a few exceptions, when bypassing an authentication gate does not reach a degree of serious circumstances.
The degree of serious circumstances, which will be discussed later, refers not only to the term ‘obtain the data’ but also to the term ‘intrusions’. It signifies that even if a certain act appears to bypass an authentication gate, it is not deemed a crime if it causes only remarkably minor harm and no serious harm. For example, imagine that a scraper unknowingly breaks into a website or a computer information system, bypassing an implied authentication gate, and then withdraws as soon as possible without causing any physical damage. For the above reasons, we insist that this access should not be an illegal intrusion in this crime.
In such cases of inadvertent access, the contextualization-based individualized principle can be applied to exclude those situations where scrapers do not cause any physical damage. One of the most visible merits of this principle is that it favours the scraper when the meaning of the term ‘intrusions’ is relatively broad. As in Craigslist Inc. v. 3Taps Inc., discussed earlier, courts interpret authorization narrowly, so that defendants’ acts not causing damage or loss are excluded from criminal liability.
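The two-step reading of the term ‘intrusions’ proposed in this subsection can be summarized as a small decision procedure. The following Python sketch is only an illustration under stated assumptions: the names Barrier, AccessEvent and is_intrusion are hypothetical and are not drawn from the Criminal Law, any judicial interpretation or the CFAA. It covers the ‘intrusions’ element only and says nothing about ‘obtain the data’ or ‘serious circumstances’.

```python
from dataclasses import dataclass
from enum import Enum


class Barrier(Enum):
    """Hypothetical categories of code-based limitation a scraper may bypass."""
    NONE = "none"
    IP_BLOCK = "ip_block"
    RATE_LIMIT = "rate_limit"
    AUTHENTICATION_GATE = "authentication_gate"


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical summary of the facts of one act of access."""
    barrier_bypassed: Barrier
    inadvertent: bool        # the scraper did not realize a gate was bypassed
    withdrew_promptly: bool  # the scraper left as soon as possible
    caused_damage: bool      # any physical damage to the system


def is_intrusion(event: AccessEvent) -> bool:
    """Sketch of the proposed reading of 'intrusions' (an assumption, not law)."""
    # Step 1: only bypassing an authentication gate counts; an IP address
    # block or a browsing-frequency limit is not treated as a barrier to access.
    if event.barrier_bypassed is not Barrier.AUTHENTICATION_GATE:
        return False
    # Step 2: exclude inadvertent, promptly abandoned access with no damage.
    trivially_harmless = (
        event.inadvertent and event.withdrew_promptly and not event.caused_damage
    )
    return not trivially_harmless


# Evading an IP block is not an intrusion under this reading.
print(is_intrusion(AccessEvent(Barrier.IP_BLOCK, False, False, False)))
# Inadvertently passing a gate and withdrawing at once is excluded as well.
print(is_intrusion(AccessEvent(Barrier.AUTHENTICATION_GATE, True, True, False)))
```

The sketch deliberately keeps the two steps separate, so that the question of which barrier was bypassed is answered before the contextual exception is considered.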
2. The Specific Interpretation of the Aftermath of Data Scraping. — Beyond intrusions, data scraping results in data copying, data leaks and data transmission. Data copying is directly associated with data scraping. Data leaks and data transmission are often regarded as downstream data breaches. In China’s criminal justice system, these three types of acts are extremely common, having cascading effects on the interests of different data hosts. However, criminal justice tends to overlook this reality, leaving many questions unanswered. What is striking about these questions is not whether they need to be emphasized, but rather how they need to be addressed. Viewed in this light, the interpretation of the aftermath of data scraping is the core of these questions.
On one side, the fundamental parameters of the term ‘data’ should be underscored. The crucial lesson from the US experience is that analyzing data scraping cases entails determining whether affected details of scraped data trigger damage or loss provisions. To handle it, the US courts wield significant power in assessing how much work they must do to look back and forth between facts of cases and legal requirements. The experience motivates China’s courts to shed light on analyzing it by tailoring the intensity of the analysis of the aftermath of data scraping.
Taking a data scraping case as an example, the key reason why the court convicted three scrapers of the crime of illegally obtaining computer information system data was its dominant analysis of the term ‘intrusions’. The court only mentioned that the three scrapers obtained customer data and then made an illegal profit of 37,000 yuan. As a matter of fact, the court was particularly well placed to analyze whether fundamental parameters of the term ‘data’ were harmed and how the facts could be interpreted as ‘serious circumstances’. For instance, the volume of data was fairly large, and the sensitivity of the data to customers was very high. Additionally, the period of data acquisition was particularly long and the scraped data was frequently used illicitly. All of these facts bear on whether the scrapers’ conduct amounted to ‘serious circumstances’ and whether the scrapers should be convicted of the crime. As a Chinese professor has proposed, it is vital to analyze the unlawfulness of data scraping by crawlers and the related criminal sanctions in a specific context. In a sense, this demonstrates that contextualization-based individualized criminal sanctions are conducive to the discovery of certain parameters of the term ‘data’.
In this regard, a Chinese scholar proposed that especially for the interpretation of the term ‘other serious circumstances’, volumes, times, means and latent risks of obtaining the data are at least involved to evaluate whether infringements of legal interests and accountability reach a degree required for criminal sanctions. We hold that this viewpoint offers a solution to interpret related phrases when it comes to data scraping cases. In a broader sense, other Chinese scholars claimed that, in an era of big data, it is essential to comprehend data itself to strengthen the understanding of the two terms ‘intrusions’ and ‘obtain the data’. In this way, in order to adequately protect the legal interests of this law, interpreters can embrace a notion of data, quite distinct from that of data only confined to computer (information) system. In a word, what needs to be implemented is that fundamental parameters of the term ‘data’ should be explicitly analyzed during the process of interpreting related provisions when it comes to data scraping cases.
On the other side, how these parameters of the term ‘data’ affect criminal sanctions on data scraping should also be underlined in the court’s decision. To some extent, criminal sanctions on data scraping rely on both static parameters of the term ‘data’ and dynamic combinations of these parameters. In this sense, how to combine them indeed plays an important role in criminal sanctions on data scraping. Now that data scraping may have some negative effects, such as easily acquiring large amounts of data, leading to data leakage, affecting the legal uses of data, etc., what will courts do to present these negative effects? What analysis method will be chosen to clarify them? For these questions, truly, courts need to pick out parameters of the term ‘data’ which may affect criminal sanctions, so that courts will have an understanding of what is expected of them and how they rightly apply the Criminal Law to data scraping cases. Based on the contextualization-based individualized criminal sanctions, we expand on how to specifically interpret serious circumstances in the crime of illegally obtaining computer information system data, involving a multiplicative method and a paratactic method.
The multiplicative method is used when scrapers repeatedly intrude into computer (information) systems and then obtain the data, or when they remain in the system for a long time. This method is conducive to comprehensively measuring the risk and degree of both the intrusions and the obtaining of the data. In the Internet era, we need to analyze the factual elements that can generate new ones through the multiplicative method. For example, suppose a scraper intrudes into a computer (information) system and obtains specific data, and does so five times, each time obtaining almost the same volume of data. Assuming that the volume of data obtained each time does not constitute serious circumstances, does the total volume of data collected by the scraper over the five iterations constitute (other) serious circumstances?
To answer this question, it is feasible to use the multiplicative method to measure the degree of risk in this act. That is, courts can determine whether this act constitutes (other) serious circumstances mainly by multiplying the number of times by the volume of data obtained each time. Certainly, each case has its peculiarities, and it is difficult to adopt a one-size-fits-all approach when measuring degrees of risk in related acts. This method may be customized in specific cases because multi-factor situations, like frequent intrusions into computer (information) systems, may also give rise to (other) serious circumstances.
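The arithmetic behind the multiplicative method can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions: the function names, the figure of 120 records per intrusion and the threshold of 500 records are all hypothetical and are not taken from the Criminal Law or any judicial interpretation.

```python
def cumulative_volume(times: int, volume_per_time: int) -> int:
    """Total volume obtained: number of intrusions multiplied by volume each time."""
    return times * volume_per_time


def reaches_threshold(times: int, volume_per_time: int, threshold: int) -> bool:
    """Whether the cumulative volume meets a quantitative threshold.

    The threshold is a hypothetical stand-in for whatever quantity a court
    treats as indicating (other) serious circumstances.
    """
    return cumulative_volume(times, volume_per_time) >= threshold


# Hypothetical figures, for illustration only.
VOLUME_EACH_TIME = 120  # records obtained per intrusion
THRESHOLD = 500         # records assumed to indicate serious circumstances

# A single intrusion stays below the assumed threshold.
print(reaches_threshold(1, VOLUME_EACH_TIME, THRESHOLD))
# Five intrusions of the same size, taken together, cross it.
print(reaches_threshold(5, VOLUME_EACH_TIME, THRESHOLD))
```

Under these assumed figures, no single intrusion reaches the threshold, while the five intrusions taken together do, which is the situation the multiplicative method is meant to capture.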
The paratactic method is used when a scraper intrudes into a computer (information) system and obtains data, and the elements of this act stand in parallel. Standing in parallel refers to a condition where the elements of this act cannot be multiplied, but can be placed side by side and simultaneously assessed. With its emphasis on contextualization, the paratactic method identifies, analyzes, evaluates, treats and reviews the risk triggered by data scraping.
We take the case of Beijing ByteDance Networking Technology Co., Ltd. as an example. The court should have enumerated several elements of the act in parallel, such as illegally intruding into computer information via the ‘tt_spider’ files, obtaining a volume of data, a loss of technical service fees of 20,000 yuan, and so on. The court should have then simultaneously assessed them and analyzed whether, taken together, they constituted other serious circumstances. The two steps could not be omitted or reversed. Moreover, the court could have clearly set out the volume of data, the kinds of data and the value of the data in its decision. In a word, with its focus on contextualization, the paratactic method should not downplay the sorts of personal data involved when measuring how the sensitivity of personal data affects citizens’ fundamental rights.
Interpreting the Criminal Law is merely a short-term strategy. In the long term, interpretation alone hardly paves the way for clearer criminal sanctions on data scraping, which legislatures can help clarify through legislation covering the relevant provisions in China’s legal system. Where legislation does not thoroughly resolve the potential overlap between cyber threats, criminal sanctions on data scraping suffer from many drawbacks. We therefore think it necessary to offer domestic solutions for China’s legislation, learning from the US experience, so that disputes over criminal sanctions on data scraping can be sufficiently addressed. Generally, these solutions can be split into two dimensions.
1. Amending the Term ‘Obtain the Data’. — We insist that the term ‘obtain the data’ should be amended adequately in the Criminal Law. In criminal legislation, the reason why criminal sanctions on data scraping raise greater controversies may derive from a lack of adequate and explicit explanations of data, let alone a concrete meaning for obtaining the data. According to paragraph 2 of article 285 of the Criminal Law, the term ‘intrusions’ essentially merges with the term ‘obtain the data’ in one article.
In an era of big data, it is imperative to regulate the unlawful acquisition of data separately. This is not only because interests in data should be emphasized, but also because unlawfully obtaining data has an illegality of a special nature, distinct from that of intrusions. In this sense, there are roughly two ways to amend the term ‘obtain the data’: adding legal requirements to the original article in the Criminal Law, and designing new articles in the Criminal Law.
Adding legal requirements to the original article is a relatively simple way of amending the term ‘obtain the data’. Specifically, other types of acts that resemble obtaining the data can be added to the original article. Although obtaining the data is one of the major components of the acts in data scraping cases, courts cannot rule out the possibility of other similar acts emerging in such cases. For instance, possessing, copying and stealing the data are to some extent analogous to obtaining the data, in that these acts involve both controlling the data and infringing data hosts’ rights of disposition.
Generally speaking, when legal requirements are added to the original article as the way of amending the term ‘obtain the data’, one fundamental principle should be adhered to: the illegality of the newly added types of acts ought to be consistent with that of the existing misdeed (i.e., illegally obtaining). On this view, whatever is added to paragraph 2 of article 285 of the Criminal Law will remain confined to undermining the safety of computer information systems rather than extending to the protection of data. This is the limitation of this approach.
Designing new articles is mainly appropriate where the original article cannot fully cover the protection of the legal interests at stake. Although little overlap exists between these articles, legislatures might well design new articles so that legislation can provide explicit and sufficient data protection. Two considerations support this choice: the scope of the original article and the necessity of protecting the data itself. On the one hand, in most data scraping cases, obtaining the data is the main component of the violation of legal interests. In other words, it is predominantly the acquisition of data that infringes data hosts’ rights, such as the rights of possession and disposition. On the other hand, legislatures need new articles specifically to combat data acquisition. As mentioned earlier, obtaining the data is clearly distinct from illegally intruding into computer information systems, and it leads to different degrees of downstream cyber threats such as data leakage and data transmission.
In light of these considerations, new articles may be designed along the following lines: whoever steals, possesses or uses other means to obtain the data shall, if the circumstances are serious, be sentenced to fixed-term imprisonment of not more than three years or criminal detention, and/or be fined.
2. Creating References to Criminal Sanctions in Other Regulations. — Another way to criminalize data scraping is to create references to criminal sanctions in other regulations. Professor Ric Simmons from the US has proposed that it is time to use administrative regulations to combat computer crimes because the CFAA has failed to effectively establish what constitutes ‘computer misuse’. Similarly, in China’s legal system, other regulations have many advantages over the Criminal Law because they can stipulate illicit acts, create separate rules in response to ever-changing data scraping, and pinpoint the differences between administrative and criminal liability.
Other regulations already contain similar articles in many contexts. In some (such as the Cybersecurity Law of the People’s Republic of China), an administrative provision on illegally using information networks is stipulated in article 46. In others (such as the E-Commerce Law of the People’s Republic of China), an administrative provision on infringing citizens’ personal information is stipulated in article 87.
In general, other regulations will be chosen when misconduct is complex or rapidly evolving, which puts courts in a better position to handle data scraping cases. Administrative regulations in particular have another distinct advantage: civil remedies and criminal sanctions can be easily distinguished, and the different situations attracting each kind of punishment can be enumerated. On a technical level, other regulations are expected to summarize specific types of data acquisition and assign proper levels of criminal sanctions depending on each situation. Specifically, they can explain how the diversity of data shapes different levels of punishment.
Take administrative regulations as an example: obtaining citizens’ sensitive data is more culpable than obtaining citizens’ non-sensitive data. Likewise, obtaining citizens’ private data is more culpable than obtaining citizens’ general data that does not involve privacy. The crime of illegally obtaining computer information system data does not distinguish among these different situations. Indeed, it is impossible to design any article in the Criminal Law that can make such distinctions, much less let these distinctions keep pace with constantly updated technology. Therefore, making such distinctions by designing articles in other regulations is a feasible solution to criminal sanctions on data scraping.
Concerning the application of criminal provisions (i.e., paragraph 2 of article 285 of the Criminal Law), criminal sanctions on data scraping face many challenges in China’s legal system. First, many intrusions into computer information systems can be regarded as violations of national regulations, thus leading to broad criminalization of ordinary behaviour. Second, the analysis of data acquisition tends to be overlooked in criminal decisions. And finally, interpretations of the term ‘(other) serious circumstances’ fail to indicate why some scrapers should be sanctioned by the Criminal Law.
To properly analyze criminal sanctions on data scraping, the US emphasizes the weighing of different values and adopts a contextual model to explore what is hidden inside the authorization provisions and the damage or loss provisions under the CFAA, through both interpretation and legislation. As to interpretation, concrete readings of the terms ‘without authorization’ and ‘exceeds authorized access’ are adopted based on the circumvention of code-based restrictions. In addition, the analysis of the damage or loss provisions receives greater attention from scholars and justices. As to legislation, multifunctional provisions should be added to the CFAA to make it unambiguous.
The domestic analysis of criminal sanctions on data scraping is still at a nascent stage. Under these circumstances, it is necessary to learn from other countries, among which the US has relatively mature experience. It is therefore time to adopt contextualization-based, individualized criminal sanctions on data scraping in China. We then offer solutions to criminal sanctions on data scraping, comprising interpretative strategies and legislative improvements. Specifically, the interpretative strategies entail the interpretation of the term ‘intrusions’ in different situations and the specific interpretation of the aftermath of data scraping. The legislative improvements include amending the term ‘obtain the data’ and creating references to criminal sanctions in other regulations.
This article analyzes criminal sanctions on data scraping only in the context of the legal systems of China and the US. This does not mean that other fields of academic research should be excluded from analyses of this topic. On the contrary, especially with the advent of the AI age, interdisciplinary research involving computer (data) science, ethics, philosophy, etc. is what will advance the analysis. All these fields deserve wider attention going forward.