Author’s Note*


There is an astounding amount of data generated every moment.[1] While the use of data has always been essential for a well-functioning market economy, the current scale, scope, and speed at which data is collected, organized, and analyzed is unprecedented—leading to the apt label of “big data.”[2] Importantly, an entire infrastructure has been built around data, including cloud computing, machine learning, artificial intelligence, and the 5G wireless network.[3]

In this chapter, we discuss the role of big data in antitrust with a particular focus on the rise and success of large digital platforms—as the importance of data has caught the attention of various reports on the digital economy[4] and also raises issues involving privacy.[5] We begin with an economic overview of data and what sets it apart from other firm assets. In this discussion, we will also assess the question of whether big data represents a sizeable barrier to entry that hinders the ability of entrants to compete on equal footing.[6]

Next, we discuss data in the context of network effects. While network effects are traditionally associated with a network of people, some have hypothesized that the advantages that data confers on platforms can also be framed as a type of network effect. The idea is that data creates a feedback loop of more data, which ultimately results in an impregnable bulwark against competition. We aim to demonstrate that, while there are commonalities between using data and network effects, there are important differences and distinctions worth highlighting.

We also explore a number of relevant legal considerations involving big data and antitrust. Generally, should courts administer cases that involve big data differently? Should there be a stronger presumption of market power when a large platform possesses big data? On the other side of the coin, can combining big data assets result in merger-specific efficiencies?

Finally, big data is increasingly discussed as a potential remedy for competition problems involving platforms. Proposals range from allowing users to more easily port their data across platforms to forced data sharing and interoperability.[7] We examine the incentive effects of imposing such remedies and their potential unintended consequences.[8] Ultimately, these proposals put the cart before the horse, as imposing remedies without an actual showing of an antitrust violation is not antitrust enforcement but sector regulation.

I. Understanding Big Data

There is a long tradition of using data to increase efficiency and create value. Mid-twentieth century grocery giant A&P used customer data, inter alia, to discern regional preferences and to forecast demand in order to reduce food waste.[9] An article from Harper’s Magazine in the late 1980s used the term “big data” to refer to the variety of customer data lists used to increase the effectiveness of direct mail advertisements.[10] The article found that direct mail services had access to data from credit-reporting agencies, magazine subscriptions, catalog purchases, customer questionnaires, real estate records, and other sources that could sketch a picture of the likely preferences of millions of individuals.

Today, continuing in the tradition of A&P, Walmart analyzes over 2.5 petabytes of data every hour in its Data Café in Bentonville.[11] In addition to helping it personalize advertisements, Walmart’s data analysis helps it run pharmacies more efficiently, speed up checkout, manage its supply chain, and optimize placement on store shelves.[12]

Yet, the sense that advances in data science are leading to an “overwhelming deluge of information” is not a new phenomenon.[13] That feeling traces back, at least, to the “avalanche of numbers” that occurred in the 1820s and 1830s, when national governments began accumulating more data than ever before to classify and tabulate information about the population in an attempt to improve governance.[14] Sociologist David Beer explains that, while “features of the current data movement are in some ways novel,” data has been scaling up for hundreds of years.[15]

Not only do firms have more data than ever, but there are also increasingly sophisticated methods for analyzing it, including machine learning and artificial intelligence. Routinely, firms collect and store data from disparate sources such as website traffic, credit card purchases, smartphones and wearables, and numerous other data streams to create a profile of their user base.[16] These advances benefit consumers by, among other things, allowing firms to improve products and lower costs, and enabling the growth of the free-to-consumers model widely employed in the online space.[17] The growth in the importance of data has also led to concerns that incumbent firms’ access to data can create barriers to entry in many industries. We address these issues below.

A.  Role of Big Data in the Production Process

In digital markets, while there is little doubt that big data represents an important factor of production, it is not an end in and of itself.[18] Just as labor, innovation, capital and entrepreneurial skill differentiate firms, so too does the ability to turn “big data” into something useful. For example, a website might track and store every click that its users make while on the site. This data, in turn, could allow the website to better tailor advertisements to a specific user’s interests; determine which features to promote and which features to drop; and improve the overall design of the site.

As in other areas of business, some firms are more proficient than others at using and organizing assets, as the value of big data can only be unlocked when combined with other inputs. Thus, rather than simply assuming that the sheer volume of data is what explains outcomes, it is also worth considering whether it is the skill and talent needed to combine the data with other inputs to produce something of value that differentiates firms.

Firms are also differentiated in the characteristics of their final product and in the mix of inputs used in their production processes. Thus, to produce a given level of quality, one firm might use a mix of big data, intellectual property, and highly skilled labor, while another firm might achieve a similar level of quality, using the same set of ingredients but in different proportions—relying on its particular comparative advantage. Professor Harold Demsetz observed that conditions frequently considered barriers to entry, such as scale economies, capital requirements, and advertising expenditures, are not the fundamental source of barriers; the fundamental barriers are rather the cost of information and the uncertainty that an entrant has to overcome.[19] In other words, it is not big data per se that represents the barrier to entry, but rather what big data helps a firm accomplish. This point is consistent with the observation that, over time, incumbents inevitably change how they combine their various inputs to achieve their levels of output and quality. For instance, an incumbent might have originally entered with a lower cost curve, a superior algorithm, or a valuable patent but, over time, to improve its product, it uses big data more extensively than when it first entered the market. Similarly, an entrant might initially operate with “small” or “medium” data but improve quality over time as its installed base of users grows.[20]
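To make the input-mix point concrete, consider a stylized sketch (the Cobb-Douglas functional form, the exponents, and the input levels below are illustrative assumptions, not estimates from this chapter) in which two firms reach a comparable quality level using very different proportions of data and skilled labor:

```python
# Stylized Cobb-Douglas illustration: two firms achieve a similar quality
# level using the same inputs (data, skilled labor) in different proportions.
# The functional form and all numbers are illustrative assumptions.

def quality(data, labor, alpha=0.3, beta=0.7, scale=1.0):
    """Cobb-Douglas quality index with diminishing returns to each input."""
    return scale * (data ** alpha) * (labor ** beta)

# A data-intensive mix versus a labor-intensive mix.
firm_a = quality(data=1000, labor=10)
firm_b = quality(data=100, labor=27)

print(f"Firm A quality: {firm_a:.2f}")  # data-heavy production process
print(f"Firm B quality: {firm_b:.2f}")  # labor-heavy production process
```

In this sketch, the data-intensive and labor-intensive mixes yield nearly the same quality index, consistent with the observation that firms can rely on their particular comparative advantage rather than replicating an incumbent’s exact combination of inputs.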

The larger point is that the use of big data is not, in and of itself, an indication that big data is required to compete effectively or that having significantly more data results in an insurmountable competitive advantage.[21] Consequently, for the purpose of competition policy, it is important to consider why a product produced with the aid of big data might be successful. For example, an innovative algorithm could be the primary reason a product is successful in gaining and retaining consumers. Superior design can also make the difference between a successful product and an unsuccessful one. Even if the use of big data is the primary reason for a firm’s success, a relevant question is whether comparable, but not necessarily equivalent, data are costly to acquire.

B.  Is Big Data a Barrier to Entry?

The prior discussion leads naturally to the question of whether big data is itself a “barrier to entry.” While a useful shorthand, it is not always entirely clear what constitutes a barrier to entry—as the term has a long history in economics, with various scholars emphasizing different aspects of entry in order to create a definition. Below, we briefly summarize some of the key developments.[22] Ultimately, rather than focus on definitions per se, it is more pertinent to directly analyze the specific entry conditions in each relevant market; although walking through the key developments can yield useful insights for that analysis.

In the 1950s, Professor Joseph Bain defined barriers to entry as structural factors that allow incumbents to persistently price above the competitive level without incurring the threat of entry.[23] Thus, Bain considered the following to be examples of barriers to entry: economies of scale that require large capital expenditures, product differentiation, and absolute cost advantages. About a decade later, Professor George Stigler considered barriers to entry as costs that an entrant must incur but incumbents do not.[24] Examples include patents and grandfathered government regulations but not economies of scale to the extent that an entrant has access to the same cost function. The appeal of Stigler’s definition is its recognition that incumbents can earn supra-normal profits over the long-term only if they have some persistent advantage over potential rivals. What is missing from both Bain and Stigler, however, is an assessment of welfare.

About a decade after Stigler, both Professors Franklin Fisher and C.C. von Weizsäcker filled this void with normative definitions that incorporate social welfare.[25] Fisher found “a barrier to entry exists when entry would be socially beneficial but is somehow prevented.”[26] Similarly, von Weizsäcker explained, “a barrier to entry is a cost of producing which must be borne by a firm which seeks to enter an industry but is not borne by firms already in the industry and which implies a distortion in the allocation of resources from the social point of view.”[27] If economies of scale can increase overall welfare and we associate entry barriers with inefficiencies, then, von Weizsäcker asks, “in which sense can we speak of a barrier to entry?”[28] According to Fisher, “the right issue is not whether there are barriers to entry into the production of a particular mousetrap, but whether there are barriers to entry into innovation in mousetraps.”[29]

The tension in defining barriers to entry is that there are really two ways in which the term is discussed in the context of antitrust. As Professor Dennis Carlton clarifies, “Trying to use ‘barriers to entry’ to refer to both the factors that influence the time it takes to reach a new equilibrium and to whether there are excess long-run profits is confusing.”[30] Therefore, for the purpose of competition policy, Carlton recommends that “rather than focusing on whether an entry barrier exists according to some definition, analysts should explain how the industry will behave over the next several years . . .  [which] will force them to pay attention to uncertainty and adjustment costs.”[31]

Consequently, we find that it is best to avoid suggesting that big data is or is not a barrier to entry.[32] Rather, the use of data is one potential factor when examining “the timeliness, likelihood, and sufficiency of entry efforts an entrant might practically employ.”[33] There are clearly impediments that an entrant must overcome in order to compete effectively. Common examples include regulatory compliance costs, expenditures on specialized equipment, developing intellectual property, and hiring skilled labor. Some obstacles are nominal. Some obstacles are substantial. Attempting to classify these impediments as entry barriers or not creates the confusion mentioned by Carlton.

It is also relevant to note that big data is not an exogenous factor that dictates the number of firms in a market, which in turn determines the degree of competition and the rate of return. Rather, big data is endogenous, as are other dimensions of non-price competition.[34] For instance, if a firm invests heavily in research and development, which allows it to introduce a new product or to substantially improve an existing product, we would not normally view this as anticompetitive conduct or even conduct that ultimately leads to anticompetitive results. Rather, we would consider investment in innovation to be procompetitive. Similarly, investments in big data can create competitive distance between a firm and its rivals, including potential entrants, but this distance is the result of a competitive desire to improve one’s product. Moreover, the observation that a firm is making large margins gives no indication of whether this reflects supra-competitive pricing if we properly consider the rate of return required over the whole production process, including its investment in big data.[35]

Even if big data represents an important component to the success of an incumbent, entrants can differentiate their products along other dimensions important to consumers. For instance, a grocery store entrant might focus more on carrying locally made produce or products that cater to specific diets. An online firm might focus more on building network effects or greater integration with complementary products rather than the use of big data. Additionally, a social media platform could focus narrowly on a particular format and demographic in order to expand its user base.[36] As Tucker & Wellford explain, “[t]he fact that some established online firms collect a large volume of data from their customers or other sources does not mean that new entrants must have the same quantity or type of data in order to enter and compete effectively . . . [L]ack of asset equivalence should not be a sufficient basis to define a barrier to entry.”[37]

In assessing the prospect of entry into any given market, an important consideration is the actual history of entry in that market.[38] The evidentiary value of prior instances of entry, however, depends upon the extent to which current entry conditions resemble prior entry conditions. The fact that entry occurred previously does not establish that entry is currently easy. On the other hand, one must be cautious before inferring that long periods without entry imply that entry would not occur if incumbents raised prices or reduced quality and innovation.

With these caveats in mind, we highlight some recent entry episodes in which entrants successfully overtook incumbents that arguably possessed big data. While the examples of Google disrupting Yahoo and Facebook displacing MySpace are well-documented,[39] they are far from the only instances of an incumbent with a seemingly significant big data advantage losing market share to a newcomer.

Apple’s iTunes is an example of a once-powerful incumbent that was dethroned from its market-leading position. Started in 2001, iTunes sold digital copies of songs and, eventually, audiobooks, eBooks, movies, and television shows.[40] By 2010, Apple’s iTunes had roughly 70 percent of the U.S. online music market.[41] With years in the market, along with the massive sales of Apple iPhones, iPods, and Mac computers, the company inevitably amassed large amounts of data.

That, however, did not stop Spotify from entering the market in 2008.[42] Along with others, Spotify changed the online music industry by offering a music streaming service, where customers could listen free of charge with advertisements or pay for a premium service with the ability to download songs and make their own playlists.[43] One might argue that Apple’s iTunes is not exactly the same product as Spotify’s streaming service and, consequently, that it is not an apples-to-apples comparison. However, Spotify’s differentiation from iTunes actually proves the larger point that markets evolve in ways that are not easily forecasted. Apple’s recent decision to discontinue iTunes in favor of Apple Music, among other products, further proves the point. Spotify’s success with streaming caused rivals to launch similar services, such as Apple Music and Amazon Music, to compete more effectively. Thus, despite being a relative newcomer and taking on the incumbent Apple, Spotify currently has double the number of subscribers of Apple Music.[44]

Similarly, at the start of 2010, Internet Explorer and Firefox had a combined 86 percent market share in web browsers. Yet, between 2011 and 2012, Chrome overtook both Firefox and Internet Explorer as the predominant web browser even though Chrome had a 6 percent share at the start of 2010.[45] Given that web browsers collect a tremendous amount of data on user behavior,[46] under a big data-centric view of entry barriers, Firefox and Internet Explorer should never have relinquished their market leading positions.

These episodes should perhaps not be surprising, as Professors Leslie Chiou and Catherine Tucker find little empirical evidence that the possession of historical data provides an advantage to firms in terms of their market shares.[47] As Schepp & Wambach point out, “[T]he origin of many innovative start-ups illustrates that companies with smaller but possibly more specialized datasets and analytical expertise may be able to challenge established companies.”[48] Importantly, referencing these prior episodes and research is not intended to suggest that data cannot confer a competitive advantage to digital platforms; rather, there should be no presumption that data creates an insurmountable barrier to entry or the associated conclusion that there is a lack of competition due to data.

II. Data Driven Network Effects

We next move to a discussion of data in the context of network effects—as there is a view that the advantages of big data can be framed as a data-driven network effect.[49] The idea is that the collection and use of data creates a feedback loop of more data,[50] which ultimately insulates incumbent platforms from entrants who, but for their data disadvantage, might offer a better product. We aim to establish that, while there are commonalities between using data and network effects, there are important differences and distinctions that should be considered.

The data-driven network effect argument is summarized in numerous reports on the digital economy.[51] The central idea is that having more data begets more data—all the while improving the quality of the platform for all the groups involved including users and advertisers, which causes them to use the platform more. The apparent concern from the reports is that these data advantages create an unstoppable snowball effect, which prevents or hinders new entry since entrants cannot hope to match the quality of incumbents due to their data advantages.[52]

While this theory of incumbent strength is intuitively appealing, there are a number of relevant points to consider as it applies to antitrust enforcement and policy. First, assuming arguendo that this effect exists, it is entirely based on increased platform quality and innovation. While an increased level of quality and innovation makes life considerably more difficult for rivals, this is an example of meritorious competition that the antitrust laws should not disincentivize. Further, the primary effect will be on undifferentiated entry, that is, entry based on trying to exactly replicate the incumbent’s product, rather than differentiated entry, which relies less on the type of data in which the incumbent holds an advantage.

This also puts a finer point on von Weizsäcker’s work on barriers to entry.[53] What is the point of defining barriers to entry if welfare-enhancing activities like improving a product now constitute a “barrier to entry”? What purpose does the definition serve? Such a broad, and indiscriminate, use of the term does not differentiate between welfare-enhancing and welfare-reducing activities. It would seem that all activities that create a competitive advantage become a barrier to entry, including innovations from R&D; learning by doing and trade secrets;[54] and hiring skilled labor and training them to increase productivity. We could add the development and maintenance of strong brand names and lower marginal costs. It seems that any conduct or practice that separates a firm from the pack could be classified as a barrier to entry and, consequently, would imply a poor outcome for markets.[55]

Second, while no one would seriously dispute that having more data is better than having less, the idea of a data-driven network effect focuses too narrowly on a single factor improving quality. As mentioned supra Section I.A, a variety of factors enter a firm’s production function to improve quality. For example, by all accounts, Google’s own entry into search and its ability to displace the market-leading incumbents Yahoo Search and AltaVista was due to the superiority of its “PageRank” algorithm.[56] In a nutshell, the problem with data-driven network effects is that the theory presumes that having more data results in market success or, alternatively, that incumbents enjoy their market position due to their size and data advantage.
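As a stylized illustration of how algorithmic design, rather than data volume, can drive a ranking product’s quality, the core of the PageRank idea can be sketched with a few lines of power iteration (the toy link graph, damping factor, and function names below are illustrative assumptions, not Google’s actual implementation):

```python
# Minimal power-iteration sketch of the PageRank idea.
# The toy link graph and damping factor are illustrative assumptions,
# not Google's production system.

def pagerank(links, damping=0.85, iterations=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += damping * share
            else:
                # Dangling page: distribute its rank evenly across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy web: page C is linked to by A, B, and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this toy graph, the heavily linked-to page ends up ranked highest: relevance emerges from the algorithm’s treatment of link structure, not from the sheer quantity of data the ranker possesses.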

Relatedly, even if data is primarily responsible for a platform’s quality improvements, these improvements do not simply materialize with the presence of more data—which differentiates the idea of data-driven network effects from direct network effects.[57] A firm needs to intentionally transform raw, collected data into something that provides analytical insights.[58] This transformation involves costs including those associated with data storage, organization, and analytics, which moves the idea of collecting more data away from a strict network effect to more of a “data opportunity.”[59] This data opportunity is a production opportunity based on residuals of consumption.[60] Firms that can take advantage of that opportunity and invest in innovation and quality achieve a competitive advantage—yet, as with all investments, there is no certain return.

Third, data has diminishing returns.[61] While it is ultimately an empirical and case-specific question when diminishing returns set in and how severe they are, this consideration should be a factor in all cases where data plays a major role. Additionally, depending on the specific market and platform, there are large, commercially available datasets on users,[62] which can supplement and bolster firm-specific data. Relatedly, there is an ever-increasing amount of “open data,” that is, public and private data that can be used without an explicit license.[63] Perhaps one of the most important sources of open data is existing social media posts.[64] In short, there is a massive amount of data available, both free and paid, to complement firm-specific data.
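The diminishing-returns point can be illustrated with a stylized learning curve (the square-root error form and the sample sizes below are illustrative assumptions, not empirical estimates from any particular platform):

```python
# Stylized illustration of diminishing returns to data.
# Assumes a quality proxy whose error term shrinks with the square root of
# sample size, a common stylized learning-curve shape; numbers are illustrative.
import math

def model_quality(n_observations, base_error=1.0):
    """Quality proxy: 1 minus an error term that shrinks as 1/sqrt(n)."""
    return 1.0 - base_error / math.sqrt(n_observations)

sizes = [1_000, 10_000, 100_000, 1_000_000]
gains = []
for prev, curr in zip(sizes, sizes[1:]):
    gain = model_quality(curr) - model_quality(prev)
    gains.append(gain)
    print(f"{prev:>9,} -> {curr:>9,} observations: quality gain {gain:.4f}")
```

Under this stylized curve, each tenfold increase in data yields a smaller quality gain than the last, which is why a large incumbent dataset need not translate into a proportionate quality advantage over a rival with a smaller one.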

III. Relevant Legal Aspects of Big Data

In this section, we explore a number of relevant legal considerations involving big data and antitrust. Generally, should courts administer cases that involve big data differently? Specifically, does the presence of big data imply a set of presumptions within the rule of reason framework—including a presumption of market power? For mergers, does the combination of big data assets result in cognizable efficiencies?[65]

Given the state of the current evidence—including a lack of a large body of prior cases and agency actions involving a central role for big data issues[66]—we believe there is an insufficient basis for having an antitrust presumption involving big data. Of course, this could change if the evidence mounts either in one direction or another. Yet, as has been previously discussed, big data is fundamentally about innovation—as a firm cannot gain a competitive advantage without some degree of effort and ingenuity to turn raw data into something that provides value. Thus, the presence of big data is more naturally categorized in a manner similar to R&D and other innovative activities. To the extent that these activities make it difficult for rivals to compete, then that is a potential consideration in terms of the degree of antitrust market power and the durability of that market power[67]—but it does not, in and of itself, suggest competitive harm.

Additionally, reflexive labelling of big data as a “barrier to entry” provides little guidance to courts as the term has no clear meaning in antitrust law.[68] A brief survey of recent antitrust cases confirms this observation that the current treatment of entry barriers remains relatively perfunctory and lacking in clarity. For instance, conclusions about what constitutes an entry barrier verge on the contradictory.[69] Even in cases where discussion of entry barriers is more thorough, the lack of a universal approach leaves courts with a great deal of uncertainty when it comes to deciding the question.[70] Thus, in terms of big data, courts should bypass labels and determine the precise role that big data plays in a given competition matter.[71]

In terms of efficiencies, there are compelling arguments that mergers that bring together various data sets and/or improve the ability to analyze data should be recognized as cognizable efficiencies. Luib & Cowie detail a number of recent competition matters involving the recognition of big data efficiencies.[72] The court in AT&T-Time Warner recognized that “the combined company can use information from AT&T’s mobile and video customers to better tailor Time Warner’s content and advertising to better compete with online platforms.”[73] In CVS Health-Aetna, “[w]e were able to show that the combination of CVS Health and Aetna would lead to better-integrated medical and pharmacy data.”[74] Also, in 2010, data efficiencies played a role in the Department of Justice’s decision to close its investigation into Microsoft and Yahoo’s search alliance agreement, which involved integrating, to a degree, their search algorithms, advertising assets, and data.[75] The DOJ explicitly invoked the role of data in providing efficiencies and improving Microsoft’s competitive position.[76]

In sum, the burden of production should remain with the plaintiff to demonstrate that the use of big data is a significant hindrance to the timeliness, likelihood, and sufficiency of entry and/or hinders the competitive process—as measured by the negative impact on consumer welfare. What should be avoided is a presumption that the mere presence of big data lowers the burden of production on the plaintiff to either meet its prima facie burden or its burden to rebut the defendant’s efficiency justifications. In contrast, what recent competition cases have illustrated is that big data can play an integral role in realizing efficiencies post-merger.

IV. Data Remedies

Finally, when it comes to data and antitrust, there is increasing momentum to use data and the related platform infrastructure as potential remedies for anticompetitive conduct.[77] These remedies typically include proposals for data portability, data sharing, and interoperability. While these proposals are putting the cart before the horse, in this section, we broadly examine the incentive effects from imposing such remedies and the potential unintended consequences.

The most basic proposal to alleviate the market leadership of various digital platforms is to allow platform participants, such as users and advertisers, to easily and seamlessly port their data to rival platforms. The idea is to lower switching costs and lock-in effects, which will facilitate the ability to multi-home or to switch altogether.[78] Of course, there is a major difference between using this proposal as a remedy to address demonstrable anticompetitive harms versus as a proactive regulatory intervention in an effort to improve market outcomes. As a remedy, the benefits of this proposal are that it can be targeted at specific practices, users, and platforms. These benefits might not translate to a regulatory setting, where the statutory language will likely have to be broader. This potentially opens the door to creative compliance, whereby a platform adheres to a strict reading of the regulation while implementing measures that undermine its effectiveness in actually lowering lock-in and switching costs. This concern is likely mitigated when data portability is used as a remedy—again, as remedies are intended to solve a specific competitive problem and, thus, attempts to undermine the stated objective will more likely be flagged as an order violation. Finally, given that the industry has already moved towards data portability as a best practice, the value of this proposal as an antitrust remedy is more limited.[79]

Various digital reports also call for more aggressive remedies, including forced data sharing and interoperability requirements. Data sharing goes to the heart of the debate, as it involves taking a platform’s intellectual property and requiring the platform to share it with rivals. One initial observation is that data sharing raises at least a nominal concern that it would facilitate coordination and, consequently, would be detrimental to the welfare of consumers. Additionally, there are major questions whether data sharing would actually solve the problem it is trying to address—given that advantages in data do not necessarily translate into advantages in innovation and quality.[80] It would seem that a necessary condition to impose such a remedy is a finding that data is an essential facility.[81]

Relatedly, forcing platforms to be interoperable goes a step further, as it requires leading platforms to coordinate on the design and infrastructure of their platforms. While this is ostensibly appealing, it also raises concerns about “bad” coordination, that is, agreements that soften competition rather than sharpen it.[82] Further, “standardization” could be implemented in an overly technical and complex manner, where the intent is to actually hinder, rather than facilitate, entry. Finally, forced sharing—of any kind—has been demonstrated to dampen incentives and reduce innovation.[83]

In sum, the basic message is that regulations intended to fix the competitive deficiencies of a digital market are inevitably going to lead to unintended consequences, which can ultimately do more harm than good. Further, attempts to promote rival products through forced sharing and interoperability are highly likely to negatively impact dynamic incentives. Thus, these proposals, while well-intended, should only be seriously considered after a thorough analysis of the full benefits and costs. As a remedy, there certainly can be specific instances where these proposals could address the harm from proven anticompetitive conduct. Yet, even in those instances, agencies and courts should be aware of their inherent weaknesses and, thus, they should be implemented in the narrowest manner possible.


Conclusion

The sheer volume, velocity, and variety of data collected throughout an increasingly digital economy has given rise to the big data era. Big data is being used to innovate in many sectors of the economy including in healthcare, farming, and education.[84] Yet, the widespread use of big data has also given rise to antitrust concerns that big data is actually resulting in reductions in welfare by hindering competition and protecting big tech incumbents. In this chapter, we have examined the role of big data in antitrust with a particular focus on how it fits into the larger production process and whether it should be considered a barrier to entry. Further, we considered the idea that big data is part of a positive feedback loop where increased quality precipitates an increase in users which further increases quality and whether this is ultimately harmful. Finally, we considered whether there is a need to change our antitrust presumptions regarding market power and big data and the merits of various proposed remedies to the alleged big data problem. Ultimately, if big data alters the competitive process, then we should focus on the testable implications that emerge. What we should avoid, however, are shortcuts and regulation, or unwarranted antitrust intervention, based on the “bigness” of big data and perfunctory labels such as barriers to entry.


* This chapter builds on a number of prior works including the GAI Comment on The Federal Trade Commission’s Hearings on Competition and Consumer Protection in the 21st Century, Big Data, and Competition, Nov. 5, 2018; the GAI Comment on the Canadian Competition Bureau’s White Paper, “Big Data and Innovation: Implications for Competition Policy in Canada,” Nov. 17, 2017; and John M. Yun, Antitrust After Big Data, 4 Criterion J. on Innovation 407 (2019), which was published with kind permission from the Baltic Y.B. Int’l L. (forthcoming). I thank Scalia Law student Rachel Burke for excellent research assistance.

[1] See, e.g., Data Never Sleeps 6.0, Domo (2018), (“By 2020, it’s estimated that for every person on earth, 1.7 MB of data will be created every second.”). In 2018, the International Data Corporation (IDC) estimated the global volume of data to be 33 zettabytes, which is equivalent to 33 trillion gigabytes—forecasting it to grow to 175 zettabytes in 2025. David Reinsel, John Gantz & John Rydning, The Digitization of the World: From Edge to Core 6 (2018),

[2] The phrase “big data” likely first entered the economic lexicon in 2000, when economist Francis Diebold used the term. Francis X. Diebold, On the Origin(s) and Development of “Big Data”: The Phenomenon, the Term, and the Discipline 2 (Feb. 13, 2019) (unpublished manuscript), Regardless of its origin, most definitions of big data are similar to the one provided by the OECD: “Big Data is commonly understood as the use of large scale computing power and technologically advanced software in order to collect, process and analyse data characterised by a large volume, velocity, variety and value.” OECD Directorate for Financial and Enterprise Affairs, Competition Committee, Big Data: Bringing Competition Policy to the Digital Era 2, No. DAF/COMP(2016)14 (Nov. 2016),

[3] See, e.g., OECD, Private Equity Investment in Artificial Intelligence 1, Going Digital Policy Note (Dec. 2018), (“. . . AI start-ups have so far attracted around 12% of all worldwide private equity investments in the first half of 2018, a steep increase from just 3% in 2011.”).

[4] See, e.g., Austl. Competition & Consumer Comm’n, Digital Platforms Inquiry Final Report 11 (2019) [hereinafter ACCC Report], (“While user data is not rare, and a large number of businesses track consumers’ digital footprints, no other businesses come close to the level of tracking undertaken by Google and Facebook.”); Directorate-General for Competition, Eur. Comm’n, Competition Policy for the Digital Era 4 (2019) [hereinafter Crémer Report], (“. . . any discussion of market power should analyse, case by case, the access to data available to the presumed dominant firm but not to competitors, and the sustainability of any such differential access to data.”); Dig. Competition Expert Panel, Unlocking Digital Competition 23 (2019) [hereinafter Furman Report], (“. . . the scale and breadth of data that large digital companies have been able to amass, usually generated as a by-product of an activity, is unprecedented.”); Stigler Comm. on Dig. Platforms, Stigler Ctr., Final Report 111 (2019) [hereinafter Stigler Report], at 17 (“DPs [digital platforms] use their control over specific types of data to increase their market power and, more importantly, their political power.”).

[5] Professor James C. Cooper analyzes privacy issues in-depth in his chapter, Antitrust and Privacy, in The GAI Report on the Digital Economy (2020).

[6] See, e.g., ACCC Report, supra note 4, at 11 (“The breadth and depth of user data collected by the incumbent digital platforms provides them with a strong competitive advantage, creating barriers to rivals entering and expanding in relevant markets, and allowing the incumbent digital platforms to expand into adjacent markets.”); Stigler Report, supra note 4, at 40 (“Barriers to equivalent data resources, a side effect of not having the history, scale, or scope of the incumbent, can inhibit entry, expansion, and innovation.”).

[7] See, e.g., ACCC Report, supra note 4, at 11 (“The ACCC considers that opening up the data, or the routes to data, held by the major digital platforms may reduce the barriers to competition in existing markets and assist competitive innovation in future markets. This could be achieved by requiring leading digital platforms to share the data with potential rivals . . . Another is to require the platforms to provide interoperability with other services.”); Furman Report supra note 4, at 76 (“The digital markets unit should use data openness as a tool to promote competition, where it determines this is necessary and proportionate to achieve its aims . . . privacy . . . One model would be to require a dataset to be shared in a controlled environment, with access granted to approved businesses.”).

[8] Professor Justin (Gus) Hurwitz provides a more thorough treatment of various proposals in his chapter, Digital Duty to Deal, Data Portability, and Interoperability, in The GAI Report on the Digital Economy (2020).

[9] See Timothy J. Muris & John E. Nuechterlein, Antitrust in the Internet Era: The Legacy of United States v. A&P, 54 Rev. Indus. Org. 651, 657 (2018) (“A&P also succeeded because it did what many tech companies do today, albeit amid much controversy: use data to create greater consumer value.”).

[10] See Erik Larson, What Sort of Car-rt-sort Am I? Junk Mail and the Search for Self, Harper’s Mag., Jul. 1989, at 64.

[11] Bernard Marr, Really Big Data at Walmart: Real-Time Insights from Their 40+ Petabyte Data Cloud, Forbes (Jan. 23, 2017),

[12] Kim Souza, Wal-Mart Works to Use Big Data to Improve Checkout Process, Manage Supply Chain, Talk Bus. & Pol. (Aug. 10, 2017),

[13] David Beer, How Should We Do a History of Big Data?, 3 Big Data & Soc’y 1, 2–4 (2016) (“[T]he sense that we are being faced with a deluge of data about people is not new, in fact it has a long history.”).

[14] Id.

[15] Id.

[16] See Joshua D. Wright & Elyse Dorsey, Antitrust Analysis of Big Data, 2 Competition L. & Pol’y Debate 35, 36 (2016).

[17] See D. Daniel Sokol & Roisin Comerford, Antitrust and Regulating Big Data, 23 Geo. Mason L. Rev. 1129, 1133–40 (2016). The use of “free” in this context is simply to identify the common phrasing associated with this business model—as sensitivities have grown around the use of “free” versus “zero price.” See, e.g., David S. Evans, Antitrust Economics of Free, 7 Competition Pol’y Int’l 71, 72 (2011).

[18] See, e.g., Catherine Tucker, Digital Data as an Essential Facility: Control, CPI Antitrust Chron., Feb. 2020 at 11 (“. . . ultimately the value of data is not the raw manifestation of the data itself, but the ability of a firm to use this data as an input to insight.”).

[19] See Harold Demsetz, Barriers to Entry, 72 Am. Econ. Rev. 47, 51 (1982).

[20] See, e.g., Darren S. Tucker & Hill B. Wellford, Big Mistakes Regarding Big Data, Antitrust Source, Dec. 2014, at 7 (“Entering the market and then collecting and analyzing user data is not a theoretical approach but rather the very model followed by many of the leading online firms when they were startups or virtual unknowns, including Google, Facebook, Yelp, Amazon, eBay, Pinterest, and Twitter.”).

[21] See, e.g., OECD, Big Data: Bringing Competition Policy to the Digital Era 3, No. DAF/COMP/M(2016)2/ANN4/FINAL (Apr. 2017), (“The control over a large volume of data is not a sufficient factor to establish market power, as nowadays a variety of data can be easily and cheaply collected by small companies—for instance, through point of sale terminals, web logs and sensors—or acquired from the broker industry.”).

[22] For a more in-depth discussion of barriers to entry in economics, see generally R. Preston McAfee et al., What is a Barrier to Entry?, 94 Am. Econ. Rev. 461 (2004). For a thorough discussion of entry in antitrust, see generally Jonathan B. Baker, Responding to Developments in Economics and the Courts: Entry in the Merger Guidelines, Jun. 10, 2002,

[23] See Joseph Bain, Barriers to New Competition 3 (1956).

[24] George J. Stigler, The Organization of Industry 67 (1968) (“A barrier to entry may be defined as a cost of producing (at some or every rate of output) which must be borne by a firm which seeks to enter an industry but is not borne by firms already in the industry.”).

[25] Franklin M. Fisher, Diagnosing Monopoly, 19 Q. Rev. Econ. & Bus. 7, 23 (1979); C.C. von Weizsäcker, A Welfare Analysis of Barriers to Entry, 11 The Bell J. Econ. 399, 400–401 (1980).

[26] Fisher, supra note 25, at 23.

[27] von Weizsäcker, supra note 25, at 400.

[28] Id. at 401.

[29] Fisher, supra note 25, at 27.

[30] Dennis W. Carlton, Barriers to Entry, 1 Issues in Competition L. & Pol’y 601, 606 (2008).

[31] Id. at 615.

[32] See, e.g., W. Kip Viscusi, John M. Vernon & Joseph E. Harrington, Jr., Economics of Regulation and Antitrust 168 (4th ed., 2005) (“There is perhaps no subject that has created more controversy among industrial organization economists than that of barriers to entry. At one extreme, some economists argue that the only real barriers are government related . . . At the other end of the spectrum, some economists argue that almost any large expenditure necessary to start up a business is a barrier to entry.”).

[33] U.S. Dep’t of Justice & Fed. Trade Comm’n, Horizontal Merger Guidelines § 9 (2010) [hereinafter Horizontal Merger Guidelines].

[34] See Carlton, supra note 30, at 604 (“Models that focus on only price competition may fail miserably to correctly predict industry concentration and consumer welfare when there are other product dimensions along which competition occurs. This is likely to be particularly true in industries requiring investment and creation of new products.”).

[35] See Michael L. Katz & Carl Shapiro, Systems Competition and Network Effects, 8 J. Econ. Perspectives 93, 107 (1994) (“[M]erely observing a firm with a position of market dominance does not imply that the firm is earning super-normal profits: the firm’s quasi-rents may merely reflect costs incurred earlier to obtain the position of market leadership.”).

[36] Snapchat and TikTok are recent examples. Among users aged 12 to 17, Snapchat is the social media market leader with 16.8 million users, while Instagram and Facebook are second and third at 12.8 million and 11.5 million, respectively. See Facebook is Tops with Everyone but Teens, eMarketer (Aug. 28, 2018), In terms of daily active users, TikTok has grown to 800 million users, which is nearly 50 percent of Facebook’s 1.7 billion users. See Mike Vorhaus, ByteDance, Chinese Digital Giant and Owner of TikTok, Reported to Have Revenues of $17 Billion, Forbes (May 27, 2020),

[37] Tucker & Wellford, supra note 20, at 7.

[38] See Horizontal Merger Guidelines, supra note 33, § 9 (“The Agencies consider the actual history of entry into the relevant market and give substantial weight to this evidence.”).

[39] See, e.g., Tucker & Wellford, supra note 20, at 7.

[40] See Kirk McElhearn, 15 Years of iTunes: a Look at Apple’s Media App and Its Influence on an Industry, Macworld (Jan. 9, 2016),

[41] See Gregg Keizer, Apple Controls 70% of U.S. Music Download Biz, Computerworld (May 26, 2010),

[42] See Company Info, Spotify, (last visited Aug. 19, 2020).

[43] See, e.g., Ingrid Lunden, In Europe, Spotify Royalties Overtake iTunes Earnings by 13%, TechCrunch (Nov. 4, 2014); Mark Mulligan, Mid-Year 2018 Streaming Market Shares, Midia (Sept. 13, 2018),

[44] See Evan Niu, Is Apple Music at 70 Million Subscribers Yet?, The Motley Fool (Feb. 6, 2020),

[45] See Desktop, Mobile & Tablet Browser Market Share Worldwide, StatCounter GlobalStats, (last visited Aug. 19, 2020). Of course, one could argue that Chrome’s ownership by Google and its trove of data are responsible for its success. According to one research paper, however, Chrome’s success was due to its sheer technical superiority. See Jonathan Tamary & Dror G. Feitelson, The Rise of Chrome, PeerJ Computer Science (Oct. 28, 2015),

[46] See, e.g., Dan Price, 10 Types of Data Your Browser is Collecting About You Right Now, MakeUseOf (Oct. 12, 2018),

[47] Lesley Chiou & Catherine Tucker, Search Engines and Data Retention: Implications for Privacy and Antitrust 19–20 (NBER Working Paper No. 23815, 2017),

[48] Nils-Peter Schepp & Achim Wambach, On Big Data and Its Relevance for Market Power Assessment, 7 J. Eur. Comp. L. & Prac. 120, 122 (2016).

[49] See Cédric Argenton & Jens Prüfer, Search Engine Competition with Network Externalities, 8 J. Comp. L. & Econ. 73, 74 (2012); Jens Prüfer & Christoph Schottmüller, Competing with Big Data 6 (Tilburg L. Sch. Legal Stud. Res. Paper Series No. 06/2017, 2017),

[50] See, e.g., Maurice E. Stucke & Allen P. Grunes, Big Data and Competition Policy 170 (2016) (“[T]he more people actively or passively contribute data, the more the company can improve the quality of its product, the more attractive the product is to other users, the more data the company has to further improve its product, which becomes more attractive to prospective users.”).

[51] See Furman Report, supra note 4, at 33 (“The mechanism through which data provide incumbent businesses with a competitive advantage is known as a feedback loop . . . user feedback loops occur when companies collect data from users which they use to improve the quality of their product or service, which then draws in more users, creating a virtuous circle.”); ACCC Report, supra note 4, at 11 (“The multiple touch points that Google and Facebook each have with their users enable them to collect more user data, improve their services and attract more users and advertisers, creating a virtuous feedback loop.”); Stigler Report, supra note 4, at 40 (“A data advantage over rivals can enable a company to achieve a virtuous circle of critical economies of scale leading to network effects, and a competitive balance in its favor, leading to the gathering of yet more data.”).

[52] See Furman Report, supra note 4, at 33–34 (“Data can act as a barrier to entry in digital markets. A data-rich incumbent is able to cement its position by improving its service and making it more targeted for users, as well as making more money by better targeting its advertising . . . The extent to which data are of central importance to the offer but inaccessible to competitors, in terms of volume, velocity or variety, may confer a form of unmatchable advantage on the incumbent business, making successful rivalry less likely.”).

[53] See von Weizsäcker, supra note 25.

[54] Learning by doing is an economic concept based on the idea that firms enjoy lower costs of production due to the cumulative effects of experience in production, which means a more efficient production process. See Kenneth J. Arrow, The Economic Implications of Learning by Doing, 29 Rev. Econ. Studies 155, 156 (1962). See also Peter Thompson, Learning by Doing, in Handbook of Economics of Innovation (Bronwyn Hall & Nathan Rosenberg, eds., 2010). Similar to using big data, learning by doing is not a passive activity as it involves investment in data collection, analysis, and experimentation. See Steven D. Levitt et al., Toward an Understanding of Learning by Doing: Evidence from an Automobile Assembly Plant, 121 J. Pol. Econ. 643, 647 (2013); see also John M. Dutton & Annie Thomas, Treating Progress Functions as a Managerial Opportunity, 9 Acad. Mgmt. Rev. 235, 235 (1984).

[55] An example of an improper condemnation of greater efficiencies is the decision in United States v. Aluminum Co. of Am., 148 F.2d 416, 431 (2d Cir. 1945) (“Nothing compelled [Alcoa] to keep doubling and redoubling its capacity before others entered the field. It insists that it never excluded competitors; but we can think of no more effective exclusion than progressively to embrace each new opportunity as it opened, and to face every newcomer with new capacity already geared into a great organization, having the advantage of experience, trade connections and the elite of personnel.”).

[56] Sergey Brin & Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, 30 Computer Networks & ISDN Systems 107, 109 (1998).

[57] Direct network effects occur when, for instance, adding more people to a communications network (such as, telephone, fax machine, email) results in greater value to all the participants on the network, as there are now a greater number of possible connections.

[58] Of course, there can be situations where the data provided by users is in a more direct format, such as user reviews on Amazon and TripAdvisor. In these instances, the platform does not need to engage in extensive processing of the data to transform it to something of value to other users. However, these instances are more analogous to a traditional, direct network effect than to a data-driven network effect. The reason is that the benefits from a direct network effect derive, not just from the mere presence of other users on the network, but also from their active participation (in this example, through reviews).

[59] See, e.g., John M. Yun, Does Antitrust Have Digital Blind Spots?, 72 S.C. L. Rev. 1, 25 (forthcoming 2020). See also Alexander Krzepicki et al., The Impulse to Condemn the Strange: Assessing Big Data in Antitrust, CPI Antitrust Chron., Feb. 2020, at 16.

[60] See Hal Varian, Use and Abuse of Network Effects, in Toward a Just Society: Joseph Stiglitz and Twenty-First Century Economics 227, 232 (Martin Guzman ed., 2018) (“The key claim here is that ‘more users lead to more data leads to more product improvements which leads to more users.’ This is not really a network effect, direct or indirect. It is a supply side effect: more data allows the search engine to produce higher quality products which in turn attract more users . . . Mere data by itself doesn’t confer a competitive advantage; that data has to be translated into information, knowledge, and action.”).

[61] See, e.g., Xavier Amatriain, Mining Large Streams of User Data for Personalized Recommendations, 14 SIGKDD Explorations 37, 43 (2013) (Discussing his research at Netflix: “The previous discussion on models vs. data has recently become a favorite—and controversial—topic. The improvements enabled thanks to the availability of large volumes of data together with a certain Big Data ‘hype’ have driven many people to conclude that it is ‘all about the data’. But in most cases, data by itself does not help in making our predictive models better.”). See also Enric Junqué de Fortuny et al., Predictive Modeling with Big Data, 1 Big Data 215, 219 (2013) (“[F]or most of the datasets the performance keeps improving even when we sample more than millions of individuals for training the models. One should note, however, that the curves do seem to show some diminishing returns to scale.”).

[62] See, e.g., Anja Lambrecht & Catherine Tucker, Can Big Data Protect a Firm from Competition?, CPI Antitrust Chron., Jan. 2017, at 12 (“This type of commercially available big data typically has broad reach and coverage, allowing many firms whose business does not usually generate big data to gain insights similar to those available to firms that own big data on a large number of customers. There are many examples for very big commercially available data sets.”).

[63] In 2009, the U.S. government created, which is a repository of open data and is part of a larger U.S. federal government initiative to have an open data policy. While predominantly federal agency data, also includes data from states, counties, cities, universities, private entities, and non-profits. See Data Catalog, (last visited Aug. 19, 2020). Cities such as New York also have their own open data portals. See NYC Open Data,

[64] For example, Twitter offers free and paid access to its various application programming interfaces (APIs), which are “products, tools, and resources that enable you to harness the power of Twitter’s open, global, and real-time communication network.” See Get Started with the Twitter Developer Platform, Twitter, (last visited Aug. 19, 2020).

[65] According to the Merger Guidelines § 10 at 30, “[c]ognizable efficiencies are merger-specific efficiencies that have been verified and do not arise from anticompetitive reductions in output or service.” Horizontal Merger Guidelines, supra note 33, § 10.

[66] See Sokol & Comerford, supra note 17, at 1130 (“. . . arguments for antitrust intervention when Big Data has come up as an issue have never carried the day for any merger or decided conduct case in any Department of Justice Antitrust Division (‘DOJ’), Federal Trade Commission (‘FTC’) or Directorate-General for Competition (‘DG Competition’) case to date.”). See also Margrethe Vestager, Competition in a Big Data World, speech at Digital Life Design (Jan. 17, 2016), (“[W]e shouldn’t take action just because a company holds a lot of data. After all, data doesn’t automatically equal power . . . The Commission has looked at this issue in two merger cases—Google’s acquisition of DoubleClick, and Facebook’s purchase of WhatsApp. In the particular circumstances of those cases, there was no serious cause for concern . . . We continue to look carefully at this issue, but we haven’t found a competition problem yet.”). A recent German competition authority case involving Facebook certainly invokes the idea of big data; although, the case is more about Facebook’s policy of combining data across its various online properties and the implicit privacy bargain that its users must make. See, e.g., Ursula Knapp & Douglas Busvine, Top German Court Reimposes Data Curbs on Facebook, Reuters (Jun. 23, 2020),

[67] See, e.g., Daniel L. Rubinfeld & Michael S. Gal, Access Barriers to Big Data, 59 Ariz. L. Rev. 339, 346 (2017) (“Taken together, these facts imply that determining whether a big-data collector possesses market power mandates that one define the market in which the data collector operates, as well as the use(s) of such data, much like any other market analysis.”).

[68] See, e.g., Daniel E. Lazaroff, Entry Barriers and Contemporary Antitrust Litigation, 7 U.C. Davis Bus. L.J. 1, 6 (2006) (“. . . the Supreme Court has really never provided a comprehensive analysis of barriers to entry and their role in interpreting the Sherman, Clayton and Federal Trade Commission Acts. Rather, the Court has periodically referenced entry barriers in antitrust cases, resulting in a somewhat cryptic and uncertain message to lower courts, litigants and students of antitrust law.”).

[69] Compare GDHI Mktg. LLC v. Antsel Mktg. LLC, No. 18-CV-2672-MSK-NRN, 2019 WL 4572853 at *9 n.5 (D. Colo. Sept. 20, 2019) (“The mere cost of capital is not a barrier to entry.”) with Philadelphia Taxi Ass’n, Inc. v. Uber Techs., Inc., 886 F.3d 332, 342 (3d Cir. 2018) (“Entry barriers include . . . high capital costs.”).

[70] See Buccaneer Energy (USA) Inc. v. Gunnison Energy Corp., 846 F.3d 1297, 1316–17 (10th Cir. 2017).

[71] For instance, when Apple proposed to acquire the music recognition app Shazam in 2018, the European Commission examined several ways in which the acquisition could incentivize Apple to engage in anticompetitive conduct including exploiting its “big data” advantage from having access to Shazam data. Ultimately, the Commission cleared the deal based, in part, on a finding that Shazam’s data did not represent a particularly unique asset in the market. See Nicolo Zingales, Apple/Shazam: Data is Power, But Not a Problem Here, CPI EU News Column (Dec. 2018),

[72] Greg P. Luib & Mike Cowie, Big (But Not Bad) Data and Merger Efficiencies, Dechert LLP: ONPOINT (Jan. 2020),

[73] Id.

[74] Id.

[75] See Press Release, U.S. Dep’t of Justice, Statement of the Department of Justice Antitrust Division on its Decision to Close Its Investigation of the Internet Search and Paid Search Advertising Agreement Between Microsoft Corporation and Yahoo! Inc. (Feb. 18, 2010),

[76] Id. (“The increased queries received by the combined operation will further provide Microsoft with a much larger pool of data than it currently has or is likely to obtain without this transaction. This larger data pool may enable more effective testing and thus more rapid innovation of potential new search-related products, changes in the presentation of search results and paid search listings, other changes in the user interface, and changes in the search or paid search algorithms.”).

[77] See supra note 7.

[78] Multi-homing is the practice of concurrently using two or more competing platforms.

[79] See About Us, Data Transfer Project, (last visited Aug. 19, 2020) (“The Data Transfer Project was launched in 2018 to create an open-source, service-to-service data portability platform so that all individuals across the web could easily move their data between online service providers whenever they want.” Apple, Facebook, Google, Microsoft, and Twitter have all committed to the project.).

[80] See supra Sections I & II.

[81] The essential facilities doctrine has a long history in antitrust jurisprudence and involves the recognition that, while firms normally have no duty to help their rivals, there are instances when a monopolist’s control over an input is so essential to a rival’s ability to compete that withholding or foreclosing that input is considered an illegal restraint of trade. See, e.g., Robert Pitofsky, The Essential Facilities Doctrine Under U.S. Antitrust Law, 70 Antitrust L.J. 443, 444 (2002).

[82] For details on the types of agreements between competitors that can result in anticompetitive harm, see U.S. Fed. Trade Comm’n & Dep’t of Justice, Antitrust Guidelines for Collaboration Among Competitors (2000).

[83] See, e.g., Thomas W. Hazlett & Anil Caliskan, Natural Experiments in U.S. Broadband Regulation, 7 Rev. of Network Econ. 460, 477 (2008) (“Cable modem services held nearly a two-to-one market share advantage when DSL carriers were most heavily obligated to provide ‘open access’ to competing ISPs. Once the FCC eliminated a key provision of that access regime . . . DSL subscribership increased dramatically . . . [and] was 65% higher—more than 9 million households—than it would have been under the linear trend established under ‘open access’ regulation.”).

[84] See, e.g., Ashley Brooks, 7 Data Innovations That are Advancing Industries of All Kinds, Rasmussen College Technology Blog (Jun. 3, 2019),; Lisa Balboa, Divining with Data, The Actuary (Apr. 2019),