Recently, a U.S. judge ruled that Google must share its search results and parts of its search data with competitors. According to economist Jens Prüfer of Tilburg University and co-authors, this ruling confirms exactly what their new study shows: without mandatory data sharing, dominant players like Google remain untouchable because they possess far more data than their competitors.
‘As long as only Google has access to what people worldwide type and click on, smaller search engines simply cannot compete’, says Prof. Dr. Jens Prüfer, one of the authors. ‘Even with a comparable algorithm, they do not get the chance to improve themselves because they have too little data.’

Importance of user data for search quality

The researchers conducted an experiment with the independent, privacy-focused search engine Cliqz. In the experiment, the algorithm was kept constant, but the amount of available user data was varied. What did they find? For common search queries, such as ‘weather today’ or ‘pasta recipe’, the amount of data makes little difference to the quality of results. For rare search queries, so-called tail queries, such as medical questions or local information, data is crucial. And these rare queries are decisive: they account for 74% of all search traffic. Without sufficient data, smaller search engines therefore have no realistic chance of becoming full-fledged competitors.

Fair access to data is crucial

The researchers advocate for mandatory data sharing by dominant players like Google. In the EU, this is already included in the Digital Markets Act (DMA). The recent U.S. ruling gives this argument extra weight. ‘Access to data is the new access to the market’, says Prof. Dr. Tobias Klein. Large tech companies increasingly determine how we find information, and search engines are no longer neutral tools but gatekeepers. Without intervention, one player will continue to dominate the information landscape. Mandatory data sharing can break this pattern without compromising user privacy, because data can be anonymized and shared safely.

The power of data

Without sufficient data, smaller search engines remain powerless, even with good technology. This is also evident in the battle between ChatGPT and Bard, says Jens Prüfer. ‘When OpenAI launched ChatGPT at the end of 2022, Microsoft immediately invested to integrate it into Bing.
Google responded with Bard, but it has a major advantage: it possesses far more user data. As a result, Google is likely to remain on top, even if ChatGPT is smarter. Only rules that mandate data sharing can change this.’

Publication

The article ‘How Important Are User-Generated Data for Search Result Quality?’ was published in the Journal of Law and Economics. The authors are Tobias J. Klein (Tilburg University), Madina Kurmangaliyeva (Trinity College Dublin), Jens Prüfer (Tilburg University), and Patricia Prüfer (Centerdata).