4. Collections within collections#

4.1. A collection of collections#

Trove is a collection of collections. Some of those collections are harvested into Trove from collaborating organisations. Other collections, such as the digitised newspapers, are a product of the NLA’s own processing pipelines. Within these groups, resources are linked together in a variety of different ways. For example, a newspaper article is part of an issue, a periodical issue is part of a title, a photograph might be part of an album, a book might be part of a series, and a letter might be part of a manuscript collection. These sorts of relationships can help you navigate through the layers of collections, moving from whole to part and back again.

Unfortunately, Trove doesn’t have a standard way of representing these sorts of parent/child relationships. Instead, a variety of metadata fields, facets, hierarchical structures, and interfaces group things in inconsistent and sometimes confusing ways. This means that collections are often fragmented in Trove search results, making it hard to see patterns and connections.

4.2. Trove’s categories are not collections!#

Trove’s top-level categories look like collections, but beware! While some categories, like Newspapers & Gazettes, have clear cut boundaries, others don’t. As I’ve suggested in the section on categories and zones, it’s best to think of the categories as contexts for discovery rather than collections.

4.3. Works and versions can hide collections#

Trove creates many thousands of little collections by trying to group different versions of the same thing together as ‘works’. This grouping operates on a couple of levels and results in a hierarchical structure where works contain versions, and versions contain the original metadata records. However, it’s not always successful. As well as obvious grouping errors, works sometimes cram different members of a collection together, as if they’re actually the same thing. Even more concerning, it seems that works and versions have sometimes been used deliberately to describe collections of digitised resources. The problem with this is that the members of these collections are difficult to find and use as their individual metadata has either not been recorded or has been munged together as a single ‘work’. In effect, work groupings hide collections. This is one of the reasons why I recommend unpacking all the versions from works and saving them individually when you’re harvesting data.

4.4. Finding collections by contributor#

You can explore the different collections harvested into Trove by filtering your search results according to the source of the metadata. Organisations contributing data to Trove are assigned a National Union Catalogue (NUC) identifier. You can find an organisation’s NUC by searching the Australian Libraries Gateway, or by looking in the dropdown list of ‘Organisations’ in Trove’s advanced search form (the NUC is in square brackets).

../_images/nuc-dropdown.png

Fig. 4.1 NUC identifiers are displayed in square brackets#

Advanced search for NUCs is broken!

While you can use Trove’s advanced search to find a NUC, you can’t reliably use it to search for items using that NUC. There’s a longstanding bug that means if you select an organisation whose NUC includes a colon you’ll get no results. You can fix the broken url by simply putting double quotes around the NUC value, or use one of the methods below.

Once you have a NUC you can find records from that organisation by using either the nuc: index or the partnerNuc facet – as far as I can tell the results are the same. For example, to find records from the ANU’s institutional repository you’d search for nuc:"ANU:IR" (note the double quotes around the NUC value).

Try it!

4.5. isPartOf relationships#

Some parent/child relationships in Trove are documented using the Dublin Core isPartOf metadata field. This field can appear in records aggregated into Trove from other organisations, as well as in records of digitised resources created by the NLA itself. In the web interface, isPartOf values can be displayed under a variety of headings, including ‘Appears in’, ‘Part of’, and ‘Series’. Here’s an example linking an individual oral history interview to an oral history project:

../_images/part-of-example.png

Fig. 4.2 Example of the way isPartOf values are displayed in the Trove web interface#

Here’s how the same record appears in the API:

"isPartOf": [
    {
        "value": "Australian women scientists oral history project [sound recording].",
        "type": "publication"
    },
    {
        "value": "Australian women scientists oral history project.",
        "type": "series"
    }
],

Try it!

The value of isPartOf is a text string rather than an identifier, so the connections can be a bit fuzzy. Also, a record can have multiple isPartOf values linking it to different points in a collection hierarchy – a record might be linked both to its direct parent and the top-level collection record. This means you can’t reliably reconstruct a collection from the isPartOf values alone. Nonetheless, they can be useful in finding groupings of related resources.

As you can see in the API example above, isPartOf values can be qualified by supplying a type. The most common types seem to be series and publication. The Trove Data Dictionary doesn’t explain how the type qualifiers are meant to be used, but it seems that series generally denotes a collection, while publication is a more restricted grouping, such as the title of a parent publication.

To limit your search results to items with a particular isPartOf value, you can use either the series: index or the contribcollection facet, however, they behave slightly differently.

Use the contribcollection facet to filter searches#

The contribcollection facet seems to match all isPartOf values, irrespective of their type. However, the matches must be exact, and are case-sensitive – searching by Annual report, Annual Report and Annual report. will all produce different results! For example, to find more oral histories from the ‘Australian women scientists oral history project’ you could add l-contribcollection=Australian women scientists oral history project. to your query, but it only returns two results.

Try it!

If you remove the trailing full stop from the collection name, you’ll find another 22!

Try it!

Using contribcollection in the web interface

The contribcollection facet is not displayed in the web interface, so you can’t just check a box to filter results. To use it you’ll need to manually edit the search url adding &l-contribcollection=[YOUR IS_PART_OF VALUE].

Inconsistencies in the metadata, such as trailing full stops, and the unforgiving nature of the contribcollection facet makes it tricky to use. You need to know the exact isPartOf value in advance, and be prepared to make multiple queries to capture any variations.

To complicate matters further, the title facet used in the Magazines & Newsletters category also works with values from the isPartOf field. It seems to return a subset of the results you get from contribcollection, perhaps because it’s only using isPartOf values which have type set to publication. It’s also case-sensitive, and expects exact matches, but it looks like trailing full stops have been removed.

Search the isPartOf field using the series index#

You can use the series index to search isPartOf values. It seems to only match records where the isPartOf type is series, but you can it’s much more flexible than the contribcollection facet as it accepts partial matches and is case-insensitive. For example, a search for series:"Australian women scientists oral history project" returns 25 results – no need to worry about trailing full stops!

Try it!

Because it accepts partial matches, you can use the series index to search for items from collections that include certain keywords. For example, there’s a lots of separate ephemera collections from the NLA in Trove, to find items from all of them you can search for series:ephemera.

Try it!

This lets you poke around to see what collections are available, rather than having to know the isPartOf value in advance.

Harvesting all isPartOf values#

Your ability to find the range of available isPartOf values using standard search queries is limited. Using the API you can set facet to contribcollection to get a list of the 100 most common isPartOf values related to your search. For example, to find a list of the NLA’s oral history collections you can search for nuc:ANL in the music category, set the format facet to Sound/Interview, lecture, talk, and facet to contribcollection.

Hide code cell source
import requests
import pandas as pd

params = {
    "q": "nuc:ANL",
    "category": "music",
    "l-format": "Sound/Interview, lecture, talk",
    "facet": "contribcollection",
    "encoding": "json",
    "n": 0
}

headers = {"X-API-KEY": YOUR_API_KEY}

response = requests.get(
        "https://api.trove.nla.gov.au/v3/result", params=params, headers=headers
    )
data = response.json()

facets = data["category"][0]["facets"]["facet"][0]["term"]

# Display the top ten values
df = pd.DataFrame(facets)
df[["display", "count"]][:10].style.hide()
display count
Hazel de Berg collection 1268
National Press Club luncheon address 906
Rob and Olya Willis folklore collection 761
Australia 1938 oral history project 588
Cultural context of unemployment oral history project 488
National Press Club luncheon address. 442
Menzies MS 4936 collection 347
Bringing them home oral history project 336
Australian generations oral history project 297
Chris Sullivan folklore collection 266

To get a complete list of collections you’d need to harvest the isPartOf values from the full results set using the API. A method for doing this is described in HOW TO: Harvest data relating to digitised resources.

4.6. Digitised collections#

Resources digitised by the NLA and delivered through Trove are sometimes grouped into collections. In the Digitised Newspapers & Gazettes category, relationships exist between the different parts of a newspaper – titles, issues, pages, and articles – enabling you move from one level to another, and to access and aggregate data from the different components. For example, you can find articles published on a particular page.

These sorts of relationships are not as clearly defined for other types of digitised resources, and little data about them is directly available from the Trove API. The Other digitised resources section describes in detail the issues and possible workarounds for different content types.

4.7. Finding aids#

Finding aids have their own in-built hierarchical structure of collections, series, sub-series, and items. They’re mostly used in describing manuscript collections, but some of the NLA’s photograph and ephemera collections are also described using finding aids.

Finding aids are created using the Encoded Archival Description (EAD) standard. The original EAD hierarchy is presented as a nested list in Trove. Links from the list go to the digitised item viewer.