Mining of Political Texts



In representative democracies, people and states are represented at the legislative level by a parliament and at the executive level by an executive body. In the European Union (EU), the legislature consists of two bodies: the European Parliament (EP), with 751 members representing the people, and the Council of the European Union (CEU, also called the Council of Ministers), with 28 members representing the member states. The executive body is the European Commission (EC), with 28 members, one per member state.

The ordinary legislative procedure is a co-decision procedure between the EP and the CEU; 89% of European laws follow this procedure. It works as follows. The EC drafts a proposal for a new law and sends it to the relevant EP committee (for example, a law about cars goes to the Transport committee). A rapporteur is elected and is responsible for drafting a report on the proposal. Her task is to analyse the proposal, consult specialists in the field, and discuss it with the Members of the European Parliament (MEPs) sitting on the committee. These MEPs can then propose amendments (adding, removing, or modifying text), which are voted on within the committee.

Next, the report (containing the amended text) is voted on in plenary session and submitted to the CEU, which can in turn amend it. If the two bodies do not agree, the process iterates (at most three times) until a final text is agreed upon and eventually proposed for adoption.

Data set

We collected a data set of 240,000 legislative amendments proposed by the 751 MEPs. Each amendment is composed of small edits to the text of the proposal. For each edit, we can extract whether it was accepted or rejected, its history, and which other edits it conflicts with. This makes for a rich and unique data set, useful to computer scientists, political scientists, journalists, or any citizen interested in the activity of their representatives.
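
To make the edit-level structure concrete, here is a minimal sketch of one way to represent edits and the conflicts between them. It assumes, hypothetically, that each edit records its author, the article it targets, the character span it modifies, and whether it was accepted, and that two edits conflict when they touch overlapping spans of the same article. The `Edit` fields, the overlap rule, and the helper names are illustrative assumptions, not the schema of the actual data set.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


# Hypothetical schema: field names are illustrative, not the real data set's.
@dataclass(frozen=True)
class Edit:
    edit_id: str
    author: str    # MEP who proposed the edit
    article: str   # article of the proposal that the edit targets
    start: int     # character offset where the edit begins
    end: int       # character offset where the edit ends (exclusive)
    accepted: bool


def conflicts(a: Edit, b: Edit) -> bool:
    """Assumed rule: two edits conflict if they overlap in the same article."""
    return a.article == b.article and a.start < b.end and b.start < a.end


def conflict_graph(edits):
    """Return adjacency sets mapping each edit_id to the ids it conflicts with."""
    graph = {e.edit_id: set() for e in edits}
    by_article = defaultdict(list)
    for e in edits:
        by_article[e.article].append(e)
    # Only edits on the same article can conflict, so compare within groups.
    for group in by_article.values():
        for a, b in combinations(group, 2):
            if conflicts(a, b):
                graph[a.edit_id].add(b.edit_id)
                graph[b.edit_id].add(a.edit_id)
    return graph


def acceptance_rate(edits):
    """Fraction of each MEP's edits that were accepted."""
    total = defaultdict(int)
    accepted = defaultdict(int)
    for e in edits:
        total[e.author] += 1
        accepted[e.author] += int(e.accepted)
    return {mep: accepted[mep] / total[mep] for mep in total}


edits = [
    Edit("e1", "MEP A", "Art. 3", 0, 40, True),
    Edit("e2", "MEP B", "Art. 3", 30, 60, False),
    Edit("e3", "MEP A", "Art. 5", 0, 10, False),
    Edit("e4", "MEP C", "Art. 3", 60, 80, True),
]
print(conflict_graph(edits))
print(acceptance_rate(edits))
```

Here `e1` and `e2` overlap on the same article and are linked, while `e4` only touches the end of `e2` and is not. A graph like this is a natural input for graph analytics, and per-MEP acceptance rates are a simple first statistic to explore.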


You will have to

  • scrape data from the Web,
  • write Python scripts to extract textual information,
  • explore the data using visualization methods, and
  • postulate and validate models of this data.
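
As an illustration of the text-extraction step, the sketch below recovers word-level edits from a single amendment, assuming the original and amended versions of a passage are already available as two strings (how they are scraped and paired is left open). It relies on Python's standard `difflib.SequenceMatcher`; the tokenizer, the function names, and the example sentence are hypothetical.

```python
import difflib
import re


def tokenize(text):
    """Split text into words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def extract_edits(original, amended):
    """Return (operation, removed_text, added_text) for each word-level change.

    operation is one of difflib's opcodes: 'replace', 'delete' or 'insert'.
    """
    a, b = tokenize(original), tokenize(amended)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only the spans that actually changed
            edits.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


# Hypothetical amendment to a sentence about vehicles.
original = "Member States shall ensure that vehicles comply with the limits."
amended = "Member States shall ensure that all new vehicles comply with the stricter limits."
print(extract_edits(original, amended))
# [('insert', '', 'all new'), ('insert', '', 'stricter')]
```

Working on word tokens rather than characters keeps each recovered edit readable; the regular expression in `tokenize` is one simple choice among many.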

Requirements

  • Very good knowledge of Python and related libraries.
  • Good knowledge of graph analytics.
  • Good knowledge of machine learning and statistical modelling.
  • Knowledge of the web stack (HTML/CSS/JavaScript) a big plus.
  • Familiarity with regular expressions a plus.
  • Interest in and familiarity with politics a plus.

Applying for the project

This semester project is aimed at one Bachelor's or Master's student. To apply, please send your grades and CV to Victor Kristof.