Parse parliamentary proceedings (protocols) as well as media library RSS feeds in various formats and transform them into a standardized format.
- Unit per file: 1 session (including agenda items & speeches)
- Data Fields & Mapping: https://docs.google.com/spreadsheets/d/1NglOO9Ss6797RmgY0bBJAlqfdc-LnV5rIJSC4ui30ps/edit?usp=sharing
- Directory Naming Convention: https://github.com/OpenParliamentTV/OpenParliamentTV-Architecture/blob/main/SHORTCODES.md
- Parla-CLARIN GitHub Repo: https://github.com/clarin-eric/parla-clarin/
- Parla-CLARIN TEI XML Schema: https://clarin-eric.github.io/parla-clarin/
- Parliamentary corpora in the CLARIN infrastructure: https://www.clarin.eu/resource-families/parliamentary-corpora#additional-materials
- Proceedings & Media RSS Feed Examples: https://github.com/OpenParliamentTV/OpenParliamentTV-Parsers/tree/main/parliaments/DE/data/examples
- Example of GermaParl TEI XML (not Parla-CLARIN): https://github.com/PolMine/GermaParlTEI
- Open Discourse Project (Parsers etc. for German Bundestag Data):
- Old German Bundestag Parsers & Code Snippets:
- "Bundestag" GitHub Organisation (various scripts & tools): https://github.com/bundestag
- Official Bundestag Open Data Resources: https://www.bundestag.de/services/opendata
- Official Bundestag Video Podcast XML Feed (frequently returns HTTP 503 errors, so the data needs to be mirrored): http://webtv.bundestag.de/player/macros/bttv/podcast/video/plenar.xml?period=19&meetingNumber=11
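Because the video podcast feed is unreliable, one option is to fetch each session's feed once, retry on HTTP 503 with exponential backoff, and keep a local copy to parse from. The sketch below uses only the Python standard library and takes the `period` and `meetingNumber` query parameters from the feed URL. It assumes, without having checked it against the live feed, that the feed is plain RSS 2.0 with un-namespaced `<item>`, `<title>`, `<link>` and `<enclosure url="…">` elements. The function names, the mirror file name pattern `plenar_{period}_{meeting}.xml` and the retry defaults are all hypothetical.

```python
import time
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

# Feed URL from the list above; period/meetingNumber select one plenary session.
FEED_URL = ("http://webtv.bundestag.de/player/macros/bttv/podcast/video/plenar.xml"
            "?period={period}&meetingNumber={meeting}")


def fetch_with_retry(url: str, retries: int = 5, backoff: float = 2.0) -> bytes:
    """Download url, retrying with exponential backoff on HTTP 503 only."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Treat only 503 (Service Unavailable) as transient; re-raise
            # anything else, or the last 503 once retries are exhausted.
            if err.code != 503 or attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
    raise ValueError("retries must be at least 1")


def mirror_feed(period: int, meeting: int, mirror_dir: Path) -> Path:
    """Return a local copy of one session feed, downloading it only if missing."""
    # Hypothetical file naming scheme for the local mirror.
    target = mirror_dir / f"plenar_{period}_{meeting}.xml"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        url = FEED_URL.format(period=period, meeting=meeting)
        target.write_bytes(fetch_with_retry(url))
    return target


def parse_items(xml_bytes: bytes) -> list[dict]:
    """Extract title, link and enclosure URL from RSS 2.0 <item> elements."""
    root = ET.fromstring(xml_bytes)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "video_url": enclosure.get("url") if enclosure is not None else None,
        })
    return items
```

A call such as `parse_items(mirror_feed(19, 11, Path("mirror")).read_bytes())` hits the server only on the first run and reads from disk afterwards. If the real feed declares an XML namespace, the tag names passed to `iter`/`find` would need to be qualified accordingly.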