parleh-mate/singapore-parliament-speeches-dbt

Structured Datasets for Singapore's Parliament Speeches

This project aims to make parliament speeches from Singapore's Parliament Hansard structured and accessible.

A structured format is an enabler, with applications in computational linguistic analysis, classification, and political science (Dritsa et al., 2022). Further empirical research on parliamentary discourse and its wider societal impact is ever more important, given the decisive role of parliaments and their rapidly changing relations with the public and media (Erjavec et al., 2023).

This effort addresses the lack of a centralised dataset for analysing Singapore's parliamentary data. Rauh and Schwalbach (2020) observed that while more and more political text is available online in principle, bringing the various, often only rather loosely structured sources into a machine-readable format that is readily amenable to automated analysis still presents a major hurdle. This initiative seeks to overcome that hurdle.

Disclaimer

Please note that this is an entirely independent effort. This initiative is not affiliated with the Singapore Parliament or the Singapore Government.

While best efforts are made to ensure the information is accurate, some parsing errors are inevitable. Please use the information here with caution and check it against the underlying data.

This repository

This repository contains the data modelling code. It builds downstream models from the raw data generated by the earlier data pipeline.

Please refer to the dbt Documentation, which lists the available columns and their descriptions. The documentation was created with the help of this article.

The main data products intended for use are:

| model | description |
| --- | --- |
| mart_attendance | By member, by sitting date, whether the member attended the parliamentary sitting or not. This is supplemented with information about the member and sitting. |
| mart_speeches | Each row represents one paragraph of text, based on the Hansard, during the parliamentary sitting. This text corresponds to a speech (or part of a speech) made by a Member of Parliament on a given topic. This is supplemented with information about the topic, the sitting, and the member. |
| mart_bills | By bill, shows a summary of the bill's passage through parliament. |
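
As a rough illustration of how a mart with one row per member per sitting date might be queried, the sketch below computes an attendance rate per member. It is not taken from this repository: the column names `member_name`, `sitting_date`, and `is_present` and the sample rows are assumptions made for the example, and an in-memory SQLite table stands in for the real warehouse table. Please check the dbt Documentation for the actual columns.

```python
import sqlite3

# Hypothetical sample rows: (member_name, sitting_date, is_present).
# These column names are assumptions, not the real mart_attendance schema.
ROWS = [
    ("Member A", "2023-01-09", 1),
    ("Member A", "2023-01-10", 0),
    ("Member B", "2023-01-09", 1),
    ("Member B", "2023-01-10", 1),
]


def attendance_rates(rows):
    """Return {member_name: share of sittings attended} for the given rows."""
    conn = sqlite3.connect(":memory:")
    # Stand-in table that mimics the "by member, by sitting date" grain.
    conn.execute(
        "CREATE TABLE mart_attendance ("
        "member_name TEXT, sitting_date TEXT, is_present INTEGER)"
    )
    conn.executemany("INSERT INTO mart_attendance VALUES (?, ?, ?)", rows)
    query = """
        SELECT member_name, AVG(is_present) AS attendance_rate
        FROM mart_attendance
        GROUP BY member_name
        ORDER BY member_name
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


rates = attendance_rates(ROWS)
for member, rate in rates.items():
    print(f"{member}: {rate:.0%}")
```

The same `GROUP BY` pattern should carry over to the real table once the assumed column names are replaced with the documented ones.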

As an example of how this dataset is being used, this Looker Studio dashboard shows overall attendance, bills, and members' speech information.

The services used in this repository are:

How to contribute

If you are interested in contributing, please reach out to [email protected].

References

  • Dritsa, K., Thoma, A., Pavlopoulos, I., & Louridas, P. (2022). A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis. Advances in Neural Information Processing Systems, 35, 28874-28888.
  • Erjavec, T., Ogrodniczuk, M., Osenova, P., Ljubešić, N., Simov, K., Pančur, A., ... & Fišer, D. (2023). The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation, 57(1), 415-448.
  • Rauh, C., & Schwalbach, J. (2020). The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies.

About

dbt Modelling Repository for Singapore Parliament Speeches
