A new, comprehensive database of all proceedings of the Australian Parliamentary Debates (1998-2022)

From MaRDI portal



DOI: 10.5281/zenodo.8121950; Zenodo: 8121950; MaRDI QID: Q6685049; FDO: Q6685049

Dataset published in the Zenodo repository.

Rohan Alexander, Lindsay Katz

Publication date: 6 July 2023

Copyright license: Creative Commons Attribution 4.0 International



This database contains data on the proceedings of each sitting day of the House of Representatives in the Australian Parliament from 2 March 1998 to 8 September 2022, in both CSV and Parquet formats. The data were parsed entirely from the XML Hansard transcripts available on the Australian Parliament website. The database is organized as follows:

hansard-daily-csv.zip contains all individual Hansard sitting-day files in CSV format.

hansard-daily-parquet.zip contains all individual Hansard sitting-day files in Parquet format.

hansard-corpus.zip contains the full Hansard corpus in both CSV and Parquet formats.

hansard-code.zip contains all the R scripts we used to build the database, along with the CSV files needed to run them. The README.md file in this folder describes each script in detail, outlines our workflow, and provides example R code for users of the database.

hansard-supplementary-data.zip contains data on Hansard debate topics and on the divisions in the House that were transcribed during our time frame. This folder also contains the CSV file we used to map PartyFacts IDs to the party abbreviations found in the database.
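For users working outside R, the following is a minimal Python sketch (standard library only) of one way to read every per-day CSV file in hansard-daily-csv.zip into a single list of rows. It assumes each CSV member of the archive is one sitting day with a header row. The toy archive, its file names, and its columns (speaker, party, body) are hypothetical stand-ins; the real schema is described in the README.md inside hansard-code.zip.

```python
import csv
import io
import zipfile


def read_daily_csvs(zip_source):
    """Read every CSV member of a zip archive and return all rows as one list.

    zip_source may be a file path or a binary file-like object. Each row is a
    dict keyed by that file's header, plus a 'source_file' key recording which
    daily file the row came from. No particular column schema is assumed.
    """
    rows = []
    with zipfile.ZipFile(zip_source) as archive:
        # Sort member names so rows come out in a stable order.
        for name in sorted(archive.namelist()):
            if not name.endswith(".csv"):
                continue  # skip directories and non-CSV members
            with io.TextIOWrapper(archive.open(name), encoding="utf-8", newline="") as text:
                for row in csv.DictReader(text):
                    row["source_file"] = name
                    rows.append(row)
    return rows


def build_toy_archive():
    """Build a small in-memory zip standing in for hansard-daily-csv.zip.

    Hypothetical placeholder data: the file names and the columns
    'speaker', 'party', 'body' are NOT the real Hansard schema.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("2022-09-07.csv", "speaker,party,body\nA,ALP,First speech\n")
        archive.writestr("2022-09-08.csv", "speaker,party,body\nB,LP,Second speech\nC,ALP,Third\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    rows = read_daily_csvs(build_toy_archive())
    print(len(rows))  # 3
    print(rows[0]["source_file"])  # 2022-09-07.csv
```

With the real archive, read_daily_csvs("hansard-daily-csv.zip") would be called on the downloaded file instead of the toy buffer. For analyses over the whole period, the single corpus file in hansard-corpus.zip avoids this step entirely.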