Text and Data Mining: how the Future TDM workshop highlighted the draft exception must be improved for TDM to have a future in Europe

For the legal geeks among us, it is now old news that the European Commission, after promising to modernise copyright, issued a rather unhinged and disappointing copyright review proposal aimed at creating what it claims to be a ‘well-functioning marketplace’.

Neighbouring rights aka ancillary copyright for media snippets, robocopyright type content filtering on user uploaded content, mandatory exceptions that can be overridden by Member States or in case of licensing deals (huh?), … you name it: the review has it.

There is however one small light at the end of that very skewed and scary-looking tunnel: the copyright review does comprise a mandatory exception for text and data mining (aka TDM) in its Article 3 (with additional explanations in Recitals 8 to 13), a crucial element to enable the use of modern techniques on copyrighted material. To show how important TDM is and what’s at stake, we actually put together a short video which we encourage you to share. (Want to skip directly to our ‘magic recipe’ for a workable TDM exception? Click here.)

Why is everyone in the research and innovation fields not throwing a party then? Well, because the proposal as drafted by the European Commission comprises considerable flaws, many of which were highlighted at the FutureTDM workshop.

Where the proposed TDM exception gets it right

Text and data mining is defined under Article 2 sub (2) as ‘any automated analytical technique aiming to analyse text and data in digital form in order to generate information such as patterns, trends and correlations’ and the proposed TDM exception basically reads:

Article 3
Text and data mining

  1. Member States shall provide for an exception to the rights provided for in Article 2 of Directive 2001/29/EC, Articles 5(a) and 7(1) of Directive 96/9/EC and Article 11(1) of this Directive for reproductions and extractions made by research organisations in order to carry out text and data mining of works or other subject matter to which they have lawful access for the purposes of scientific research.
  2. Any contractual provision contrary to the exception provided for in paragraph 1 shall be unenforceable.
  3. Rightholders shall be allowed to apply measures to ensure the security and integrity of the networks and databases where the works or other subject-matter are hosted. Such measures shall not go beyond what is necessary to achieve that objective.
  4. Member States shall encourage rightholders and research organisations to define commonly-agreed best practices concerning the application of the measures referred to in paragraph 3.


 The proposal comprises four positive elements:

  1. There is an exception: this may seem ridiculous but seeing the lack of ambition of the proposed copyright review, one tends to count one’s blessings these days.
  2. The exception is mandatory, as opposed to the approach based on voluntary exceptions of the current copyright framework (as set out in the InfoSoc Directive), which results in a patchwork of implementations and total legal uncertainty in an online or cross-border environment.
  3. The exception explicitly states that contractual bypasses will not be allowed (art 3 par 2). Frankly, such a principle should be applied to all the existing exceptions: one can hardly understand why policy makers spend months crafting exceptions, arguing over every comma and negotiating their scope, scale and detail, only to have all of that legislative work brushed aside by one obscure contractual clause that the parties at the table who do not hold the copyright often cannot negotiate. But let us rejoice at least that one exception will get the common sense treatment of ‘the law is worth more than a contract’.
  4. The exception is not limited to non-commercial activities. This is important as research activities, even within institutions such as universities, are often conducted through public-private partnerships or with some form of private funding, which makes any restriction to non-commercial uses unworkable in practice.

Where the proposed TDM exception fails to deliver a positive outcome for Europe

The main legal shortcomings were highlighted in the presentation given by Lucie Guibault, Associate Professor at the Institute for Information Law of the University of Amsterdam, whilst the ‘security & integrity’ addition creates a major practical loophole in the entire legal provision:


Presentation by Prof. Lucie Guibault at the FutureTDM Workshop:

  1. The beneficiaries of the TDM exception are too limited in scope (Article 3 par 1 & Recital 11): the beneficiaries should not be limited to ‘research organisations’ as this is detrimental at two levels: on the one hand, it excludes businesses from benefiting from this exception, at a time when a vibrant start-up community is looking into the potential of these new techniques, and on the other, it prevents individual researchers who are not affiliated to a research organisation from working independently if they need to use TDM with legal certainty. The latter group also includes investigative journalists, and their exclusion runs counter to the European Commission’s claim that it wants to promote ‘citizen science’.
  2. The purpose of use is too narrowly defined and could give rise to discussions (Article 3 par 1 & Recital 12): the proposed draft only covers ‘scientific research’, an extremely limited scope that could, even within the scientific community, lead to discussions between the proponents of soft sciences (social sciences) and those that only see the merit of hard sciences (natural sciences). It certainly excludes, for no obvious reason, many innovative uses of TDM that bring benefits to our society (or could have the potential to do so).
  3. The types of material that are covered by the exception could be interpreted in an unduly restrictive manner (Article 3 par 1): can TDM be applied in an unrestricted manner to any type of minable content or does the exception only cover materials ‘associated with scientific publication’?
  4. The possibility for rightholders to neutralise the exception in practice through so-called security & integrity measures creates a gaping loophole for abuses (Article 3 par 3 & Recital 12): by allowing publishers to introduce random measures to protect the ‘security and integrity’ of their network, the effective use of TDM could simply be rendered impossible, or the use of the publishers’ own platforms could become the only viable alternative for researchers. There are already known cases of Captcha measures being implemented if researchers want to download articles in bulk (which means algorithms cannot work as human intervention is constantly needed), or measures whereby only one article can be downloaded every 20 seconds (which, as pointed out by Professor Ananiadou from the University of Manchester at the FutureTDM workshop, sounds like a lot but actually means you need more than 12 years to download 20 million documents). This loophole could allow rightholders to arbitrarily block access for researchers trying to conduct text and data mining. Safeguards in line with those put in place in the context of ‘traffic management’ by telecom operators (see Article 3 par 3 of the Telecoms Single Market Regulation [EU 2015/2120]), with requirements of proportionality, efficiency and non-discrimination (for example between the security measures applied to researchers’ algorithms and those applied to the publishers’ own platform), could be a good starting point to frame this measure.
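
The download-rate figure in point 4 can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a single connection downloading around the clock with no interruptions, and the function name `years_to_download` is ours, not something from the workshop.

```python
# Back-of-the-envelope check of the rate-limit example:
# one article every 20 seconds, 20 million documents in total.
# Assumption: downloads run 24/7 over one connection, with no pauses.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year, including leap days


def years_to_download(documents: int, seconds_per_document: float) -> float:
    """Return the number of years needed to fetch `documents` items
    when only one item may be downloaded every `seconds_per_document`."""
    total_seconds = documents * seconds_per_document
    return total_seconds / SECONDS_PER_YEAR


years = years_to_download(20_000_000, 20)
print(f"{years:.1f} years")  # prints "12.7 years"
```

Under these assumptions the result is a little over twelve and a half years, consistent with the figure quoted at the workshop. Any downtime or stricter limit only makes it longer.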

So what is needed?

The good news is that the Members of the European Parliament (MEPs) present at the FutureTDM Workshop certainly seemed aware that there was room for improvement and willing to tackle the issue. But let’s also be realistic: those were three very well-informed MEPs, out of 751 MEPs in total, so there is a lot of work to be done to inform their colleagues of the need for a proper TDM exception.

Whilst the UK opened the door in Europe for a TDM exception, the one they drafted is also far from perfect, if only because they felt that the existing InfoSoc Directive made it impossible for them to adopt a TDM exception that would cover commercial uses, hence making it skewed from the start.

Singapore, after introducing fair use a couple of years ago, is now also looking into introducing a TDM exception and, in doing so, is making some valid points in its consultation proposal (see pp. 34-35):

  • 3.64 We propose to create a new exception in the CA, which allows the copying of copyrighted works for the purposes of data analysis. The user of the work must have had legitimate access to the work in the first place (e.g. a subscription to an academic journal, or collating online articles which are not locked behind a pay-wall), and the exception would not differentiate between commercial or non-commercial activities, which means the final analysis can be commercialised. However, the exception is not intended to cover situations where commercial benefit came from the actual copies of the works instead of the analysis. An example is where someone copies the works to collate into a large database for sale as a service without doing any analysis on it.

Muthu works at a media monitoring company, which has taken on a project by a fast food chain to help determine customer sentiment towards their latest menu item. Muthu starts by collating any social media and food blog posts which mentioned the menu item’s name, as well as comments left on review websites and replies on the fast food chain’s websites and social media outlets. As part of the collation, he ends up making a copy of all of the posts, comments and reviews. He then uses his company’s proprietary tool to analyse the data and determine whether general customer sentiment was good or bad towards the new menu item. This sentiment analysis was then passed on to the fast food chain. Under the current CA, any of the people who had made the posts, replies or comments could potentially claim that Muthu did not ask their permission to make copies of their creative works. With the proposed exception, the copying of such creative works can be done without permission as long as the purpose is for data analysis. However, if Muthu’s company simply forwarded the copies of all of the posts, comments and reviews without analysing them, to the fast food chain, the exception would not apply.

In other words, here are the ingredients for the magic recipe:

  • Keep what’s good in the proposed TDM exception: it should be mandatory, not distinguish between commercial and non-commercial and not be bypassed by contractual provisions.
  • Expand the scope and scale of the beneficiaries: the beneficiaries should be both natural persons (=human beings) and legal persons (=organisations), and should not be limited to research organisations.
  • Do not limit the purpose to scientific research, nor the scope of the minable materials.
  • Ensure that any security or integrity measures implemented by rightholders are open to a rigorous scrutiny and must abide by a set of parameters that prevent abuse.


EC Failed to #FixCopyright: Stop ‘RoboCopyright’ and Ancillary Copyright & Start to Focus on Users and Creators

The European Commission promised to modernise copyright, but instead of creating a well-functioning legal framework addressing the concerns of creators and end-users it proposes to protect old business models by creating what it claims to be a ‘well-functioning marketplace’. To do so, the EC creates ‘RoboCopyright’, compelling intermediaries hosting user-uploaded content to implement content filtering technologies and handing over the content policing to the right holders. Our message to the EC: Stop ‘RoboCopyright’ and ancillary copyright, and start to focus on users and creators.


Caroline De Cock, C4C Coordinator


Following the publication of the European Commission’s (EC) proposal for a Directive on ‘Copyright in the Digital Single Market’, the C4C coalition would like to share its outcry about the EC’s lack of ambition and the missed opportunity of this copyright review. Our 3 major concerns are as follows (detailed overview below):

  1. Not addressing the promised objective: The EC’s reform proposal starts from the premise that it is more important to achieve ‘a well-functioning marketplace for copyright’ than to create a well-functioning legal framework for copyright that addresses the concerns of citizens and end-users, and enables a digital single market.
  2. The introduction of ‘RoboCopyright’: Ignoring any threats to users’ fundamental freedoms, the EC seems to consider that algorithms run by private companies should filter European citizens’ content on the Internet. (check out ‘RoboCopyright 2.0’)
  3. Blatant disregard of citizens’ voices: The EC has shrugged off the input to the consultations on the role of publishers in the copyright value chain and on the ‘panorama exception’, which gave clear indications of what Europeans wanted (results). Instead, the EC (1) proposes an EU-wide retroactive ancillary copyright lasting 20 years, and (2) ignores freedom of panorama, save for a footnote in the Impact Assessment.

The EC claims it listens to the concerns of citizens and takes them into account.
Why not on copyright?


More Detailed Overview

Topic / Subtopic / C4C’s Position
Missing elements

C4C regrets that the Commission does not seem to be considering the following elements:

  • updates to the other exceptions (many of which are drafted in obsolete terms) and also making them mandatory (Recital 5);
  • an exception for freedom of panorama;
  • an exception for remote access to library catalogues; and,
  • the introduction of a flexible norm complementing the list of exceptions.

Measures to achieve a well-functioning marketplace for copyright

Rights in publications – Protection of news publications concerning online uses (i.e. ‘ancillary copyright’)

C4C is deeply worried about the Commission moving forward with the introduction of ancillary copyright at EU level (Article 11 – Recitals 31-35). We have concerns regarding the underlying logic of such an approach where, against a perceived failure of a commercial nature, the proposed remedy is one that creates new rights under the ‘copyright’ umbrella as opposed to a more ‘ex post’ approach. See our infographic.

In short:

  • The right itself should not be introduced, as it does not deliver positive results (see its failure in Spain and Germany);
  • Article 11 §4: A retroactive right for 20 years on news items is simply absurd and disproportionate in light of the economic reality;
  • Considering that hypothetically throwing more money at news publishers will improve journalistic quality seems a bit of a shortcut at best; and,
  • Making new companies subsidise an old business model is not known to be an incentive for the traditional players to adapt to the new digital realities.

Rights in publications – Claims to fair compensation

C4C has reservations about the Commission’s reasoning that publishers should be able to claim a share of the compensation for uses under exceptions (Article 12 – Recital 36).

In short:

  • This seems to contradict the judgement of the Court of Justice of the European Union (CJEU) in the Reprobel case (C-572/13). The CJEU confirmed that the fair compensation requirement is intended to compensate for the harm suffered by right holders, and concluded that publishers are not subject to any harm by, in this case, the reprography and private copying exception.
  • As a result, this provision does not create benefits for creators, who are ‘the forgotten’ stakeholders in this review (except for minimal contractual safeguards, left at the mercy of Member States).

Certain uses of protected content by online services

C4C considers that the Commission’s intentions in this area go beyond the scope of a copyright review, as they fundamentally affect both the e-Commerce Directive and the IPR Enforcement Directive (Article 13 – Recitals 37-39).

In short:

  • Article 13 & Recital 38: The text considerably expands the definition of communication to the public to any act of uploading and sharing through online service providers; and,
  • C4C fears that the Commission’s intentions will compel all intermediaries dealing with user uploaded content, including cloud services, Wikimedia, etc., to:
  1. negotiate licences with right holders; and,
  2. implement content filtering technologies. We wonder how this can be achieved without a reform of Article 3 of the InfoSoc Directive (see here), Articles 14 and 15 of the e-Commerce Directive, and Article 3 of the IPR Enforcement Directive.

We encourage you to read this analysis by Martin Husovec from Tilburg University, and see also our infographic.

Measures to adapt exceptions and limitations to the digital and cross-border environment

Text and data mining (TDM)

Although having an exception on text and data mining is positive, some elements are worrisome (Article 3 – Recitals 8-13).

In short:

  1. Article 3 §1 & Recital 11: the beneficiaries of a text and data mining exception should not be limited to ‘research organisations’, to avoid crippling any opportunities for start-ups and individual researchers in this area; and,
  2. Article 3 §3 & Recital 12: allowing academic publishers to introduce random measures to protect the ‘security and integrity’ of their network could allow them to arbitrarily block access for researchers trying to conduct text and data mining. Safeguards in line with those put in place in the context of ‘traffic management’ by telecom operators could be considered (see Article 3 § 3 of the Telecoms Single Market Regulation [EU 2015/2120]).

C4C furthermore welcomes the Commission’s intention to make this a mandatory exception and to not limit it to non-commercial uses only. See our infographic.

Use of works and other subject-matters in digital and cross-border teaching activities

C4C welcomes a mandatory exception in this area (Article 4 – Recitals 14-17), but worries about the Commission’s plan to allow Member States to ignore and by-pass this exception through licensing schemes (Article 4 §2).

Preservation of cultural heritage

C4C considers that the Commission’s intention to update the exception on preservation of cultural heritage (Article 5 – Recitals 18-22) does not go beyond what was already decided by the Court of Justice of the European Union in the Ulmer case (C-117/13). Furthermore, the Commission seems to only enable preservation of objects permanently in the collection. This could create interpretation issues as regards online material and does not recognise the collaboration efforts between cultural heritage institutions to share artworks so that as wide a public as possible can enjoy them. We do applaud the fact that the Commission wants to make this a mandatory exception.

Fair remuneration in contracts of authors and performers

C4C applauds that the Commission steps up to ensure more transparency and appropriate remuneration for creators (Title IV Chapter 3 – Recitals 40-43).

However, this needs to be ensured throughout the whole value chain in the various creative industries. We stress the need to focus on the whole of the value chain, because the Commission has focused on the so-called ‘value-gap’ (Recitals 37-39) in reference to online services, without acknowledging that creators often do not get a fair deal from their recording companies or publishing houses in the first place (see here).

Measures to improve licensing practices and ensure wider access to content

Use of out-of-commerce works by cultural heritage institutions

C4C welcomes the fact that the Commission is considering collective agreements for the digitisation and dissemination of out-of-commerce works (Articles 7-9 – Recitals 23-28). The Commission’s intention seems to be to model this on the Scandinavian “Extended Collective Licensing” (ECL) scheme, allowing collecting societies to assign non-exclusive licenses for non-commercial use of out-of-commerce works, even for non-members. This would enable works to be shared and accessed across the EU.


Other resources