19.1. Overview of Parliamentary Papers

What are Parliamentary Papers?

Parliamentary Papers are documents presented to the Australian Parliament. Sometimes this is required by law. Other times it’s just for information. The Parliament of Australia website notes:

Documents presented include the annual reports of all government agencies, reports of royal commissions and other government inquiries, parliamentary committee reports, and a wide variety of other material.

As well as in Trove, Parliamentary Papers can be found in ParlInfo, Parliament’s own online database.

Here are a few randomly selected examples:

title: Australia's overseas development assistance program
contributor: Australia.
date: 1985
fulltext_url: https://nla.gov.au/nla.obj-2498391464

title: Annual report
contributor: Australia. Department of the Prime Minister and Cabinet. 89299 065e2a55-f786-5427-b492-5419b1c67d38
date: 1979
fulltext_url: https://nla.gov.au/nla.obj-843197239

title: PP no. 83 of 1993|The manner in which Commonwealth pharmaceutical restructuring measures are being implemented in the Anna Bay area of New South Wales|The manner in which Commonwealth pharmaceutical restructuring measures are being implemented in the Anna Bay area of New South Wales / report of the Senate Standing Committee on Community Affairs.
contributor: Australia. Parliament. Senate. Standing Committee on Community Affairs|Australia. Parliament. Senate. Standing Committee on Community Affairs.|Australia. Parliament. Senate. Standing Committee on Community Affairs. 61551 6b54627c-b59a-5591-89f4-dce52334fd30|West, S. (Suzanne Margaret), 1947-
date: 1993
fulltext_url: https://nla.gov.au/nla.obj-2221037685

How many Parliamentary Papers are digitised in Trove?

Many Commonwealth Parliamentary Papers have been digitised and made available through Trove. But, because of the way they’re arranged and described, it’s difficult to know exactly how many there are. I’ve attempted to harvest details of all the Parliamentary Papers in Trove using a combination of techniques. Based on this dataset, it seems there are currently 24,990 digitised Parliamentary Papers in Trove. Here are some more statistics from this dataset:

import pandas as pd

df = pd.read_csv(
    "https://github.com/GLAM-Workbench/trove-parliamentary-papers-data/raw/main/trove-parliamentary-papers.csv",
    keep_default_na=False,
)

stats = [
    ["Number of digitised Parliamentary Papers", df.shape[0]],
    [
        "Number of Parliamentary Papers with OCRd text",
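        # keep_default_na=False loads empty cells as "" rather than NaN, so
        # notnull() is True for every row; compare against "" instead to
        # count only the rows that actually name a text file.
        df.loc[df["text_file"].notnull()].shape[0],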
    ],
    ["Total number of pages", df["pages"].sum()],
    ["Median number of pages per publication", df["pages"].median()],
]

stats_df = pd.DataFrame(stats)
stats_df.style.format(thousands=",", precision=0).hide().hide(axis=1).set_properties(
    **{"text-align": "left"}
)
Number of digitised Parliamentary Papers 24,990
Number of Parliamentary Papers with OCRd text 24,990
Total number of pages 2,448,522
Median number of pages per publication 60

Most of the Parliamentary Papers in Trove were published before 2013. If you search ParlInfo for Parliamentary Papers published before 2013, the total number of results is 25,853 – close, but not exactly the same. There could be publications missing from Trove, or duplicates in the ParlInfo results.
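
To put a number on "close", the gap between the two totals quoted above can be worked out directly. This is a minimal sketch that uses only the figures given in the text; the variable names are illustrative.

```python
# Totals quoted in the text above
trove_total = 24_990     # digitised Parliamentary Papers in the harvested dataset
parlinfo_total = 25_853  # ParlInfo results for papers published before 2013

# Absolute gap, and the gap as a share of the ParlInfo total
difference = parlinfo_total - trove_total
share = difference / parlinfo_total

print(f"Difference: {difference:,} ({share:.1%} of the ParlInfo total)")
```

The comparison is only indicative, because the Trove figure counts every paper in the dataset while the ParlInfo figure is limited to papers published before 2013.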

When were the Parliamentary Papers published?

The date metadata is not always accurate, but it seems good enough to explore the distribution of Trove’s Parliamentary Papers over time.

import altair as alt

df["year"] = df["date"].str.extract(r"\b(\d{4})$")
years = df["year"].value_counts().to_frame().reset_index()

chart_dates = (
    alt.Chart(years)
    .mark_bar(size=3)
    .encode(
        x="year:T", y="count:Q", tooltip=[alt.Tooltip("year:T", format="%Y"), "count:Q"]
    )
    .properties(width="container")
)

display(chart_dates)

Fig. 19.1 Publication dates of digitised Parliamentary Papers in Trove
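
The chart depends on pulling a four-digit year from the end of each date value. The sketch below applies the same regular expression with Python's `re` module, as a plain-Python approximation of the `str.extract` call, to show which kinds of values yield a year. The sample date strings and the `extract_year` helper are made up for illustration and are not taken from the dataset.

```python
import re

# Made-up date strings (assumptions, not values from the dataset)
sample_dates = ["1985", "12 May 1901", "1901-1902", "1973?", ""]

# Same pattern as the chart code: four digits at the very end of the string
YEAR_AT_END = re.compile(r"\b(\d{4})$")


def extract_year(date):
    """Return the four-digit year at the end of the string, or None."""
    match = YEAR_AT_END.search(date)
    return match.group(1) if match else None


years = [extract_year(d) for d in sample_dates]
print(years)
```

A date range contributes only its final year, and a value that doesn't end in four digits yields no year at all. In the pandas version those rows get a missing value, which `value_counts()` drops by default, so they don't appear in the chart.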

From the chart above it looks like the earliest Parliamentary Paper pre-dates the Commonwealth Parliament. What is it?

from IPython.display import HTML, display

df["year"] = df["year"].astype("Int64")
earliest = df.loc[df["year"].idxmin()]
display(
    HTML(
        f"<a href='{earliest['fulltext_url']}'>{earliest['title']} / {earliest['sub_unit']}</a>"
    )
)

Titles and topics of Parliamentary Papers

What are all these Parliamentary Papers about? You can use the title, subject, and contributor fields to explore their content.

Here, for example, is a word cloud generated from the title field. There are a lot of annual reports, and many of the titles include the abbreviation “PP”, so I’ve excluded the words “report”, “annual”, “PP”, and “AR”.

from wordcloud import STOPWORDS, WordCloud

# Add to the list of standard stopwords
stopwords = ["report", "annual", "pp", "AR"] + list(STOPWORDS)

titles = " ".join(df["title"].to_list())
wc = WordCloud(stopwords=stopwords, width=800, height=300)
wc.generate(titles).to_image()
[Image: word cloud generated from the titles of Parliamentary Papers]
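
A word cloud gives a visual impression, but if you want actual counts you can tally the words yourself. The sketch below is a simplified stand-in for what the word cloud does, not a reproduction of how `WordCloud` tokenises text. The sample titles and the extra short stopwords are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical titles standing in for df["title"]
titles = [
    "Annual report",
    "PP no. 83 of 1993|Report on tariff revision",
    "Tariff Board report: woollen piece goods",
]

# The extra stopwords used above (lower-cased), plus a few short common words
stopwords = {"report", "annual", "pp", "ar", "no", "of", "on"}

# Lower-case everything and keep alphabetic runs only, so numbers are ignored
words = re.findall(r"[a-z]+", " ".join(titles).lower())
counts = Counter(w for w in words if w not in stopwords)

print(counts.most_common(3))
```

With the real data you could replace `titles` with `df["title"].to_list()` and extend `stopwords` with the `STOPWORDS` set from the `wordcloud` package.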

The subject field contains a list of standard(ish) subject headings. Here are the top twenty values:

import re


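def split_and_clean(value):
    """Split a pipe-delimited subject string into a de-duplicated list of headings."""
    values = value.split("|")
    # Normalise "A--B" to "A -- B", strip leading and trailing full stops,
    # and drop empty values
    return list(
        {re.sub(r"(\w)--(\w)", r"\1 -- \2", v).strip(".") for v in values if v}
    )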


df["subject"] = df["subject"].apply(split_and_clean)

subjects = df["subject"].explode().to_frame()
# Remove trailing full stops
subjects["subject"] = subjects["subject"].str.strip(".")
subjects["subject"].value_counts().to_frame().reset_index()[:20].style.format(
    thousands=","
).hide()
subject count
Australia 6,522
Australian 6,485
Tariff -- Australia 1,575
Finance, Public -- Australia -- Accounting -- Periodicals 1,568
Administrative agencies -- Australia -- Auditing -- Periodicals 1,165
Finance, Public -- Australia -- Auditing 1,149
Finance, Public -- Auditing 1,139
Executive departments -- Australia -- Auditing -- Periodicals 1,135
Tariff Australia 1,115
Legislative auditing -- Australia -- Periodicals 1,106
Australia -- Appropriations and expenditures -- Periodicals 1,097
Federal issue 1,087
Public works -- Australia -- Periodicals 947
Australia. Parliament. Standing Committee on Public Works -- Periodicals 910
Public buildings -- Australia -- Periodicals 862
Industries -- Australia -- Periodicals 756
Finance, Public -- Australia -- Periodicals 754
Australia -- Industries -- Periodicals 678
Tariff -- Australia -- Periodicals 551
Periodicals 505
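
To see what the cleaning step does and doesn't merge, here's a small sketch that runs the same logic over a made-up subject string. The sample value is an assumption for illustration, and this version returns a sorted list rather than `list(set(...))` so that the output order is stable.

```python
import re


def split_and_clean(value):
    """Split on pipes, normalise "--" spacing, strip full stops, de-duplicate."""
    values = value.split("|")
    return sorted(
        {re.sub(r"(\w)--(\w)", r"\1 -- \2", v).strip(".") for v in values if v}
    )


# Made-up subject string with spacing and punctuation variants
sample = "Tariff--Australia|Tariff -- Australia.|Tariff Australia||Periodicals."
cleaned = split_and_clean(sample)
print(cleaned)
```

The two `--` variants collapse into a single heading, but a heading with no separator at all, such as "Tariff Australia", is left alone. That's why both "Tariff -- Australia" and "Tariff Australia" appear in the table above.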

The name of the agency that created a particular publication can also give an indication of its content. Here are the top twenty contributing organisations:

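def clean_contributor(value):
    """Remove a trailing numeric id and identifier string from a contributor name."""
    # Some values end with a number and a UUID-like string, e.g. "... 89299 065e2a55-..."
    if cleaned := re.search(r"(.*?) [0-9]+ [0-9a-z\-]+$", str(value)):
        return cleaned.group(1).strip(".")
    else:
        return str(value).strip(".")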


contributors = df["contributor"].str.split("|").explode().to_frame()
contributors["cleaned name"] = contributors["contributor"].apply(clean_contributor)
contributors.dropna()["cleaned name"].value_counts().to_frame().reset_index()[
    :20
].style.format(thousands=",").hide()
cleaned name count
Australia. Tariff Board 3,799
Australia. Parliament 3,275
Australian National Audit Office 3,012
Australia. Parliament. Standing Committee on Public Works 2,041
1,564
Australia. Industries Assistance Commission 1,049
Australia. Parliament. Joint Committee of Public Accounts 820
Australia. Parliament. issuing body 787
Australia 417
Australia. Parliament. Senate. Committee of Privileges 388
Australia. Parliament. Joint Standing Committee on Treaties 341
Australia. Parliament. House of Representatives, issuing body 305
Australia. Parliament, 295
Australia. Royal Commission into Aboriginal Deaths in Custody 284
Australia. Inter-State Commission 276
Australia. Special Advisory Authority 240
Australia. Inter-state Commission 239
Australia. Treasury 233
Australia. Parliament. Senate. Standing Committee on Regulations and Ordinances 218
Australia. Parliament. The Senate, issuing body 212
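
The sketch below applies the same cleaning function to contributor values like those in the randomly selected examples earlier in this section, to show what the regular expression removes. It's an illustration of the pattern's behaviour rather than a full normalisation of contributor names.

```python
import re


def clean_contributor(value):
    """Remove a trailing numeric id and identifier string, then strip full stops."""
    if cleaned := re.search(r"(.*?) [0-9]+ [0-9a-z\-]+$", str(value)):
        return cleaned.group(1).strip(".")
    return str(value).strip(".")


# Contributor values like those in the sample records near the top of this section
samples = [
    "Australia. Department of the Prime Minister and Cabinet. 89299 065e2a55-f786-5427-b492-5419b1c67d38",
    "West, S. (Suzanne Margaret), 1947-",
    "Australia.",
]

results = [clean_contributor(s) for s in samples]
for name in results:
    print(name)
```

Only the trailing identifiers and full stops are removed. Differences in case or in qualifiers such as "issuing body" are untouched, which is why variants like "Australia. Inter-State Commission" and "Australia. Inter-state Commission" are counted separately in the table above.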