15.4. Accessing data about newspaper & gazette titles

What’s a title?

‘Titles’ in this context refers to the names and details of the publications whose articles are digitised in Trove’s Newspapers & Gazettes category. For example: Canberra Times, Sydney Morning Herald, or Commonwealth of Australia Gazette.

Title metadata

Get a list of newspaper & gazette titles

You can get information about newspaper and gazette titles in Trove from these API endpoints:

  • newspaper/titles

  • gazette/titles

The data isn’t paginated, so you get all the titles at once. Here’s a basic example showing how to get a list of all the titles from the newspaper/titles endpoint.

import requests

# Set encoding parameter to JSON
params = {"encoding": "json"}

# Supply API key using headers
headers = {"X-API-KEY": YOUR_API_KEY}

# Make the request
response = requests.get(
    "https://api.trove.nla.gov.au/v3/newspaper/titles", params=params, headers=headers
)

# Get the JSON data from the response
data = response.json()

# Get the list of newspapers
newspapers = data["newspaper"]

# Display the first title in the list
newspapers[0]
{'id': '166',
 'title': 'Canberra Community News (ACT : 1925 - 1927)',
 'state': 'ACT',
 'issn': '18388671',
 'troveUrl': 'https://nla.gov.au/nla.news-title166',
 'startDate': '1925-10-14',
 'endDate': '1927-12-16'}

How many newspaper titles are there?

The responses you get back from the newspaper/titles or gazette/titles endpoints include a total value that tells you the number of titles matching your request. Reusing the data from the request above, we can get the total number of newspaper titles like this:

data["total"]
1792
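If you want to see how these titles are distributed across states and territories, you could count the values in each record’s state field. This is a minimal sketch using Python’s collections.Counter; count_titles_by_state is a hypothetical helper, not part of the Trove API. Note that the state values in the examples in this section aren’t consistent (compare 'ACT' with 'Victoria'), so the counts simply reflect whatever strings the API returns.

```python
from collections import Counter


def count_titles_by_state(titles):
    """Count title records by the value of their 'state' field."""
    # Records without a 'state' value are grouped under 'Unknown'
    return Counter(title.get("state", "Unknown") for title in titles)


# Usage, assuming `newspapers` holds data["newspaper"] from the request above:
# for state, count in count_titles_by_state(newspapers).most_common():
#     print(f"{state}: {count}")
```

Because the list isn’t paginated, the counts should add up to the length of the newspapers list, which you’d expect to match the total value.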

Get a list of newspaper titles from a particular state

You can filter the list of titles by adding the state parameter. Possible values for state are:

  • nsw

  • act

  • qld

  • tas

  • sa

  • nt

  • wa

  • vic

  • national

  • international

Here’s an example showing how to get only newspapers published in Victoria.

import requests

# Add the state parameter and set it to 'vic' to get titles from Victoria
params = {"encoding": "json", "state": "vic"}

# Supply API key using headers
headers = {"X-API-KEY": YOUR_API_KEY}

response = requests.get(
    "https://api.trove.nla.gov.au/v3/newspaper/titles", params=params, headers=headers
)

data = response.json()
newspapers = data["newspaper"]

# Display the first title in the list
newspapers[0]
{'id': '295',
 'title': 'Advertiser (Footscray, Vic. : 1914 - 1918)',
 'state': 'Victoria',
 'issn': '22000941',
 'troveUrl': 'https://nla.gov.au/nla.news-title295',
 'startDate': '1914-01-10',
 'endDate': '1918-12-21'}

Get details of a single newspaper or gazette title

To retrieve information about an individual title, use the newspaper/title or gazette/title endpoint with the title’s numeric identifier. To construct the request URL, add the identifier to the end of the endpoint:

https://api.trove.nla.gov.au/v3/newspaper/title/[TITLE ID].

For example, to request metadata about the Canberra Times you’d use:

https://api.trove.nla.gov.au/v3/newspaper/title/11


Here’s how you’d make that request using Python:

import requests

# Numeric id of the title you want
title_id = "11"

request_url = f"https://api.trove.nla.gov.au/v3/newspaper/title/{title_id}"

# Set encoding parameter to JSON
params = {"encoding": "json"}

# Supply API key using headers
headers = {"X-API-KEY": YOUR_API_KEY}

# Make the API request
response = requests.get(request_url, params=params, headers=headers)

# Extract the JSON data
data = response.json()

data
{'id': '11',
 'title': 'The Canberra Times (ACT : 1926 - 1995)',
 'state': 'ACT',
 'issn': '01576925',
 'troveUrl': 'https://nla.gov.au/nla.news-title11',
 'startDate': '1926-09-03',
 'endDate': '1995-12-31'}

You can also use the newspaper/title and gazette/title endpoints to find out which issues of a particular title are available on Trove. By setting the include parameter to years, you get the total number of issues per year.

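Here’s a sketch of how you might turn this into a table of issues per year. It assumes the request is made like the Canberra Times example above, with include set to years. It also assumes the response contains a year list whose items have date (the year) and issuecount keys. These field names follow earlier versions of the Trove API, so check them against a real response. issues_per_year is a hypothetical helper.

```python
# Parameters for a newspaper/title request that includes yearly issue totals
params = {"encoding": "json", "include": "years"}


def issues_per_year(title_data):
    """Return a {year: number of issues} dict from a title record.

    Assumes title_data has a 'year' list whose items have 'date' (the year)
    and 'issuecount' keys; check these names against a real response.
    """
    return {
        int(year["date"]): int(year["issuecount"])
        for year in title_data.get("year", [])
    }


# Usage, assuming `data` comes from a request like the Canberra Times example
# above, made with the params defined here:
# for year, count in issues_per_year(data).items():
#     print(year, count)
```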

If you want more information about individual issues, you need to set the range parameter to a specific date range.
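The sketch below pulls issue identifiers and dates out of such a response. It makes several assumptions that you should check against the API documentation. First, that range takes two dates in YYYYMMDD format joined by a hyphen. Second, that range is used together with include set to years. Third, that each item in the year list then contains an issue list whose items have id and date keys. issues_in_range is a hypothetical helper.

```python
# Assumed format: two YYYYMMDD dates joined by a hyphen
params = {"encoding": "json", "include": "years", "range": "19390101-19391231"}


def issues_in_range(title_data):
    """Return a list of (issue id, issue date) tuples from a title record.

    Assumes each item in the 'year' list has an 'issue' list whose items
    have 'id' and 'date' keys (only populated when a range is requested).
    """
    issues = []
    for year in title_data.get("year", []):
        for issue in year.get("issue", []):
            issues.append((issue["id"], issue["date"]))
    return issues
```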

Aggregate search results by title using the l-title facet

You can also explore the characteristics of newspaper titles in Trove by using the API’s /result endpoint with category set to newspaper, and l-title set to the numeric identifier of a title. For example, to find out how many digitised articles from the Canberra Times are available on Trove, you can just make an API request without any search terms:

import requests

# Set the `l-title` parameter to a title's numeric id
params = {"category": "newspaper", "l-title": "11", "encoding": "json"}

# Supply API key using headers
headers = {"X-API-KEY": YOUR_API_KEY}

# Make the API request
response = requests.get(
    "https://api.trove.nla.gov.au/v3/result", params=params, headers=headers
)

# Extract the JSON data
data = response.json()

# Find and display the total number of articles
total = data["category"][0]["records"]["total"]

print(f"There are {total:,} articles from the Canberra Times in Trove.")
There are 3,265,867 articles from the Canberra Times in Trove.

You can use facets such as decade, year, category, and illustrationType to examine other characteristics of an individual title.
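As a minimal sketch, you could add facet set to decade to the parameters used in the Canberra Times example above, and then pull the counts out of the response. The structure assumed here is a facets object inside the category record, containing a facet list. Each facet has name and term values, and each term has display and count keys. This layout is an assumption about the v3 response format, so check it against a real response. facet_counts is a hypothetical helper.

```python
# Parameters for a /result request asking for decade facet counts for one title
params = {
    "category": "newspaper",
    "l-title": "11",
    "facet": "decade",
    "encoding": "json",
}


def facet_counts(data, facet_name):
    """Return a {term: article count} dict for one facet in a /result response.

    Assumes facets are returned under data["category"][0]["facets"]["facet"],
    with each facet having a 'name' and a list of 'term' items that have
    'display' and 'count' keys.
    """
    facets = data["category"][0].get("facets", {}).get("facet", [])
    # Guard against a single facet being returned as a dict rather than a list
    if isinstance(facets, dict):
        facets = [facets]
    for facet in facets:
        if facet.get("name") == facet_name:
            return {term["display"]: int(term["count"]) for term in facet.get("term", [])}
    return {}
```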

The GLAM Workbench notebook Visualise Trove newspaper searches over time shows how you can use the decade and year facets with l-title to explore changes in a title over time, and even compare the content of different titles. This chart shows the number of articles containing the term ‘worker’ in three different newspapers, the Tribune, the Sydney Morning Herald, and the Sydney Sun.


Fig. 15.13 The raw number and proportion of articles containing the term ‘worker’ by year in the Tribune, Sydney Morning Herald, and Sydney Sun
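A proportion like the one charted can be calculated by dividing the number of matching articles in each period by the total number of articles from the same title in that period. This sketch assumes you already have two {period: count} dicts. For example, you might build them from the year or decade facet of two /result requests for the same l-title value, one with q set to the search term and one without. proportions is a hypothetical helper, and this isn’t necessarily how the GLAM Workbench notebook does it.

```python
def proportions(term_counts, total_counts):
    """Return {period: share of articles that matched the search term}.

    term_counts and total_counts are {period: number of articles} dicts for
    the same title; periods with no articles at all are skipped to avoid
    dividing by zero.
    """
    return {
        period: term_counts.get(period, 0) / total
        for period, total in total_counts.items()
        if total > 0
    }
```

Using the total for the same title and period as the denominator makes titles of very different sizes comparable, which the raw counts alone don’t allow.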

Find catalogue entries for newspaper titles

==Update this section once search and books sections are done==

  • use ISSNs to search in “Books & Libraries”

  • note that the issn field in API records doesn’t always contain ISSNs

  • search for format:Periodical/Newspaper, add filters such as “nla.gov.au/nla.news” and “trove.nla.gov.au”, weed out journals and eDeposit (how many are there?)

Title text

With the exception of some Government Gazettes which are available as bulk downloads, there’s no direct way of accessing all the text of a title. You’d need to use the /result endpoint to assemble a collection of articles and then aggregate the OCRd text from the individual articles. This could be done issue by issue, or by setting the l-title facet without a search query, and then harvesting the complete result set.
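Here’s a minimal sketch of harvesting the OCRd text of every article from a title by paging through the /result endpoint. It assumes the v3 cursor-based paging scheme, where s starts at * and each response’s records include a nextStart value to use for the next request. It also assumes the bulkHarvest, n, and include (set to articletext) parameters, and an articleText field in each article record. harvest_title_text and the fetch_json function you pass in are hypothetical. Keeping the request itself outside the generator makes it easy to add your API key, retries, or pauses between requests.

```python
def harvest_title_text(fetch_json, title_id, page_size=100):
    """Yield (article id, OCRd text) for every article from a newspaper title.

    fetch_json is a function you supply: it takes a dict of query parameters,
    requests the /result endpoint, and returns the decoded JSON response.
    """
    params = {
        "category": "newspaper",
        "l-title": title_id,
        "include": "articletext",
        "bulkHarvest": "true",
        "n": page_size,
        "s": "*",
        "encoding": "json",
    }
    while True:
        data = fetch_json(params)
        records = data["category"][0]["records"]
        for article in records.get("article", []):
            yield article["id"], article.get("articleText", "")
        # Assumed: the last page of results has no nextStart value
        next_start = records.get("nextStart")
        if not next_start:
            break
        # Build a new dict rather than mutating the one passed to fetch_json
        params = {**params, "s": next_start}


# Usage, assuming `headers` holds your API key as in the examples above:
# import requests
# def fetch_json(params):
#     url = "https://api.trove.nla.gov.au/v3/result"
#     return requests.get(url, params=params, headers=headers).json()
# for article_id, text in harvest_title_text(fetch_json, "11"):
#     ...
```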

Depending on the title, this could take a significant amount of time and generate a large amount of data. You might want to use the Trove Newspaper & Gazette Harvester for a job like this.

Images and PDFs from titles

There’s no direct method for downloading all the images or PDFs from a newspaper title in Trove. However, there are methods for getting issues as PDFs and assembling a collection of front page images.

Downloading every issue as a PDF

If you have an issue’s identifier you can download it as a PDF. You can get a complete list of issue identifiers for a title from the /newspaper/title endpoint. So it’s possible to work through all the issue identifiers to download every issue of a title as a PDF. This method is documented in the GLAM Workbench notebook Harvest the issues of a newspaper as PDFs.
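One cautious way to collect every issue identifier is to request a title’s issues one year at a time. You can use the startDate and endDate values from the title’s metadata to work out which years to ask for. The sketch below only builds the range values. It assumes range accepts YYYYMMDD-YYYYMMDD dates, and year_ranges is a hypothetical helper. You’d then make one newspaper/title request per range, with include set to years, and collect the issue identifiers from each response.

```python
def year_ranges(start_date, end_date):
    """Return one 'YYYYMMDD-YYYYMMDD' range value per calendar year.

    start_date and end_date are ISO dates (YYYY-MM-DD), like the startDate
    and endDate values in a title record.
    """
    start_year = int(start_date[:4])
    end_year = int(end_date[:4])
    return [f"{year}0101-{year}1231" for year in range(start_year, end_year + 1)]


# Usage with the Canberra Times metadata shown earlier:
# for date_range in year_ranges("1926-09-03", "1995-12-31"):
#     params = {"encoding": "json", "include": "years", "range": date_range}
#     ...make the newspaper/title request and collect the issue ids...
```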

Downloading pages as images or PDFs

To download pages from a newspaper in Trove, you’d need to assemble a collection of page identifiers. If all you want are the front pages of a newspaper, you can obtain the page identifiers from the issue metadata.

If you want more pages, you could try using the /result endpoint with the l-title facet to download the metadata of every article in a newspaper. You’d also need to set the reclevel parameter to full to include the page identifiers in the article records. You could then extract the page identifiers from the article records and remove any duplicates. However, there’s no guarantee that this method will find every page, as it depends on how the articles are indexed.
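This sketch shows the extraction and de-duplication step. It assumes that each article record requested with reclevel set to full includes a pdf field listing URLs. It also assumes those URLs contain page identifiers in the form nla.news-page followed by digits. Check the field name and URL format against real records before relying on it. unique_page_ids is a hypothetical helper. It uses a dict rather than a set so the page identifiers stay in the order they were first found.

```python
import re

# Assumed pattern for page identifiers inside URLs, e.g. '.../nla.news-page1234/...'
PAGE_ID_PATTERN = re.compile(r"nla\.news-page(\d+)")


def unique_page_ids(articles):
    """Collect page identifiers from full article records, dropping duplicates.

    Assumes each record has a 'pdf' value listing URLs that include the
    page identifier.
    """
    page_ids = {}
    for article in articles:
        urls = article.get("pdf", [])
        # Treat a single URL string as a one-item list
        if isinstance(urls, str):
            urls = [urls]
        for url in urls:
            match = PAGE_ID_PATTERN.search(url)
            if match:
                # dict keys keep insertion order and ignore repeats
                page_ids[match.group(1)] = None
    return list(page_ids)
```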

Once you have assembled a collection of page identifiers you can download the pages as images or PDFs.