Collections within collections

4. Collections within collections#

Learn about the different ways Trove creates and represents collections of resources.

4.1. A collection of collections#

Trove is a collection of collections. Some of those collections are harvested into Trove from collaborating organisations. Other collections, such as the digitised newspapers, are a product of the NLA’s own processing pipelines. Within these groups, resources are linked together in a variety of different ways. For example, a newspaper article is part of an issue, a periodical issue is part of a title, a photograph might be part of an album, a book might be part of a series, and a letter might be part of a manuscript collection. These sorts of relationships can help you navigate through the layers of collections, moving from whole to part and back again.

Unfortunately, Trove doesn’t have a standard way of representing these sorts of parent/child relationships. Instead, a variety of metadata fields, facets, hierarchical structures, and interfaces group things in inconsistent and sometimes confusing ways. This means that collections are often fragmented in Trove search results, making it hard to see patterns and connections.

4.2. Trove’s categories are not collections!#

Trove’s top-level categories look like collections, but beware! While some categories, like Newspapers & Gazettes, have clear-cut boundaries, others don’t. As I’ve suggested in the section on categories and zones, it’s best to think of the categories as contexts for discovery rather than collections.

4.3. Works and versions can hide collections#

Trove creates many thousands of little collections by trying to group different versions of the same thing together as ‘works’. This grouping operates on a couple of levels and results in a hierarchical structure where works contain versions, and versions contain the original metadata records. However, it’s not always successful. As well as obvious grouping errors, works sometimes cram different members of a collection together, as if they’re actually the same thing. Even more concerning, it seems that works and versions have sometimes been used deliberately to describe collections of digitised resources. The problem with this is that the members of these collections are difficult to find and use as their individual metadata has either not been recorded or has been munged together as a single ‘work’. In effect, work groupings hide collections. This is one of the reasons why I recommend unpacking all the versions from works and saving them individually when you’re harvesting data.

4.4. Finding collections by contributor#

You can explore the different collections harvested into Trove by filtering your search results according to the source of the metadata. Organisations contributing data to Trove are assigned a National Union Catalogue (NUC) identifier. You can find an organisation’s NUC by searching the Australian Libraries Gateway, or by looking in the dropdown list of ‘Organisations’ in Trove’s advanced search form (the NUC is in square brackets).

Advanced search for NUCs is broken!

While you can use Trove’s advanced search to find a NUC, you can’t reliably use it to search for items using that NUC. There’s a longstanding bug that means that if you select an organisation whose NUC includes a colon, you’ll get no results. You can fix the broken URL by putting double quotes around the NUC value, or use one of the methods below.

Once you have a NUC you can find records from that organisation by using either the nuc: index or the partnerNuc facet – as far as I can tell the results are the same. For example, to find records from the ANU’s institutional repository you’d search for nuc:"ANU:IR" (note the double quotes around the NUC value).
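As a rough sketch of how these two approaches might translate into API requests, the Python below builds a search URL for each. The base URL, the `category` parameter, and the `l-partnerNuc` limit syntax are my assumptions about version 3 of the Trove API, not something stated above, so check them against the API documentation. Nothing is actually requested here, and a real request would also need an API key.

```python
from urllib.parse import urlencode

# Assumed search endpoint for version 3 of the Trove API.
API_URL = "https://api.trove.nla.gov.au/v3/result"


def nuc_index_url(nuc, category):
    """Build a search URL that uses the nuc: index.

    The NUC is wrapped in double quotes so that values containing
    a colon (such as ANU:IR) are treated as a single term.
    """
    params = {"category": category, "q": f'nuc:"{nuc}"'}
    return f"{API_URL}?{urlencode(params)}"


def nuc_facet_url(nuc, category, query=""):
    """Build a search URL that applies the partnerNuc facet as a limit.

    The `l-` prefix for facet limits is an assumption about the API.
    """
    params = {"category": category, "l-partnerNuc": nuc}
    if query:
        params["q"] = query
    return f"{API_URL}?{urlencode(params)}"


print(nuc_index_url("ANU:IR", "all"))
print(nuc_facet_url("ANU:IR", "all"))
```

Building the URL with `urlencode` means the colon and double quotes in the NUC are percent-encoded for you, which avoids the sort of breakage described in the warning above.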

4.5. `isPartOf` relationships#

Some parent/child relationships in Trove are documented using the Dublin Core isPartOf metadata field. This field can appear in records aggregated into Trove from other organisations, as well as in records of digitised resources created by the NLA itself. In the web interface, isPartOf values can be displayed under a variety of headings, including ‘Appears in’, ‘Part of’, and ‘Series’. Here’s an example linking an individual oral history interview to an oral history project:

Fig. 4.2 Example of the way `isPartOf` values are displayed in the Trove web interface

Here’s how the same record appears in the API:

"isPartOf": [
    {
        "value": "Australian women scientists oral history project [sound recording].",
        "type": "publication"
    },
    {
        "value": "Australian women scientists oral history project.",
        "type": "series"
    }
],

The value of isPartOf is a text string rather than an identifier, so the connections can be a bit fuzzy. Also, a record can have multiple isPartOf values linking it to different points in a collection hierarchy – a record might be linked both to its direct parent and the top-level collection record. This means you can’t reliably reconstruct a collection from the isPartOf values alone. Nonetheless, they can be useful in finding groupings of related resources.

As you can see in the API example above, isPartOf values can be qualified by supplying a type. The most common types seem to be series and publication. The Trove Data Dictionary doesn’t explain how the type qualifiers are meant to be used, but it seems that series generally denotes a collection, while publication is a more restricted grouping, such as the title of a parent publication.
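If you’re working with record data in Python, a small helper can separate `isPartOf` values by their type. This is only a sketch: it assumes `isPartOf` is a list of objects shaped like the API example above, falls back gracefully if an entry is a plain string, and uses an `untyped` label that I’ve invented for entries without a type. It also strips trailing full stops, which is a convenience for comparison rather than anything Trove does.

```python
from collections import defaultdict


def group_is_part_of(record):
    """Group a record's isPartOf values by their type qualifier."""
    grouped = defaultdict(list)
    for entry in record.get("isPartOf", []):
        # Some entries might be bare strings rather than objects (an assumption).
        if isinstance(entry, str):
            entry = {"value": entry}
        # "untyped" is a made-up label for entries with no type.
        part_type = entry.get("type", "untyped")
        # Strip whitespace and trailing full stops so near-identical values match.
        grouped[part_type].append(entry["value"].strip().rstrip("."))
    return dict(grouped)


# The isPartOf values from the API example above.
record = {
    "isPartOf": [
        {
            "value": "Australian women scientists oral history project [sound recording].",
            "type": "publication",
        },
        {
            "value": "Australian women scientists oral history project.",
            "type": "series",
        },
    ]
}

print(group_is_part_of(record))
```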

To limit your search results to items with a particular isPartOf value, you can use either the series: index or the contribcollection facet. However, they behave slightly differently.

Search the `isPartOf` field using the `series` index#

You can use the series index to search isPartOf values. It seems to only match records where the isPartOf type is series, but it’s much more flexible than the contribcollection facet as it accepts partial matches and is case-insensitive. For example, a search for series:"Australian women scientists oral history project" returns 25 results – no need to worry about trailing full stops!

Because it accepts partial matches, you can use the series index to search for items from collections that include certain keywords. For example, there are lots of separate ephemera collections from the NLA in Trove. To find items from all of them, you can search for series:ephemera.

This lets you poke around to see what collections are available, rather than having to know the isPartOf value in advance.
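If you’re generating these queries in code, a tiny helper can take care of the quoting. This is a minimal sketch, with a function name of my own choosing; it simply wraps multi-word values in double quotes, as in the examples above, and leaves single keywords bare.

```python
def series_query(text):
    """Return a series: query, quoting multi-word text as a phrase."""
    # Collapse stray whitespace before deciding whether quotes are needed.
    cleaned = " ".join(text.split())
    if " " in cleaned:
        return f'series:"{cleaned}"'
    return f"series:{cleaned}"


print(series_query("Australian women scientists oral history project"))
print(series_query("ephemera"))
```

The resulting string can be used as the search query in the web interface or supplied as the query value in an API request.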

Harvesting all `isPartOf` values#

Your ability to find the range of available isPartOf values using standard search queries is limited. Using the API you can set facet to contribcollection to get a list of the 100 most common isPartOf values related to your search. For example, to find a list of the NLA’s oral history collections you can search for nuc:ANL in the music category, set the format facet to Sound/Interview, lecture, talk, and facet to contribcollection.

display	count
Hazel de Berg collection	1268
National Press Club luncheon address	906
Rob and Olya Willis folklore collection	761
Australia 1938 oral history project	588
Cultural context of unemployment oral history project	488
National Press Club luncheon address.	442
Menzies MS 4936 collection	347
Bringing them home oral history project	336
Australian generations oral history project	297
Chris Sullivan folklore collection	266
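Notice that ‘National Press Club luncheon address’ appears twice in this list, with and without a trailing full stop, because facet values are matched as exact strings. The sketch below first expresses the request described above as a set of parameters (the `l-format` limit syntax is an assumption about the API), then shows one way of merging facet values that differ only by a trailing full stop. Merging assumes the two values really do refer to the same collection, which you’d want to check.

```python
from collections import Counter

# Parameters for the facet request described above.
# The `l-` prefix for the format limit is an assumption about the API.
params = {
    "q": "nuc:ANL",
    "category": "music",
    "l-format": "Sound/Interview, lecture, talk",
    "facet": "contribcollection",
}

# Facet values and counts copied from the list above.
facet_terms = [
    ("Hazel de Berg collection", 1268),
    ("National Press Club luncheon address", 906),
    ("Rob and Olya Willis folklore collection", 761),
    ("Australia 1938 oral history project", 588),
    ("Cultural context of unemployment oral history project", 488),
    ("National Press Club luncheon address.", 442),
    ("Menzies MS 4936 collection", 347),
    ("Bringing them home oral history project", 336),
    ("Australian generations oral history project", 297),
    ("Chris Sullivan folklore collection", 266),
]


def merge_trailing_stops(terms):
    """Add together counts for values that differ only by a trailing full stop."""
    merged = Counter()
    for display, count in terms:
        merged[display.strip().rstrip(".")] += count
    return merged


merged = merge_trailing_stops(facet_terms)
for display, count in merged.most_common():
    print(f"{display}\t{count}")
```

After merging, the two National Press Club entries are combined into a single value with a count of 1348.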

To get a complete list of collections you’d need to harvest the isPartOf values from the full results set using the API. A method for doing this is described in HOW TO: Harvest data relating to digitised resources.
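To give a sense of what such a harvest involves, here’s a minimal sketch of the tallying step. It is not the method described in the HOW TO. The `fetch_page` function is hypothetical: you’d need to write it yourself to request a page of results from the API and return the records along with a cursor for the next page. The starting cursor value of `*` is also an assumption.

```python
from collections import Counter


def harvest_is_part_of(fetch_page):
    """Tally isPartOf values across every page of a result set.

    `fetch_page(cursor)` is a hypothetical callable that returns a tuple
    of (records, next_cursor), with next_cursor set to None on the last page.
    """
    counts = Counter()
    cursor = "*"  # assumed starting cursor value
    while cursor is not None:
        records, cursor = fetch_page(cursor)
        for record in records:
            # Records without an isPartOf field are simply skipped.
            for entry in record.get("isPartOf", []):
                counts[(entry.get("type"), entry["value"])] += 1
    return counts


# Stand-in for real API calls: two small pages of placeholder records.
pages = {
    "*": ([{"isPartOf": [{"value": "Example collection", "type": "series"}]}], "page2"),
    "page2": (
        [
            {"isPartOf": [{"value": "Example collection", "type": "series"}]},
            {"title": "A record with no parent"},
        ],
        None,
    ),
}

counts = harvest_is_part_of(lambda cursor: pages[cursor])
print(counts)
```

Keying the tally on both type and value keeps series and publication values separate, even when the text is the same.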

4.6. Digitised collections#

Resources digitised by the NLA and delivered through Trove are sometimes grouped into collections. In the Digitised Newspapers & Gazettes category, relationships exist between the different parts of a newspaper – titles, issues, pages, and articles – enabling you to move from one level to another, and to access and aggregate data from the different components. For example, you can find articles published on a particular page.

These sorts of relationships are not as clearly defined for other types of digitised resources, and little data about them is directly available from the Trove API. The Other digitised resources section describes in detail the issues and possible workarounds for different content types.

4.7. Finding aids#

Finding aids have their own in-built hierarchical structure of collections, series, sub-series, and items. They’re mostly used in describing manuscript collections, but some of the NLA’s photograph and ephemera collections are also described using finding aids.

Finding aids are created using the Encoded Archival Description (EAD) standard. The original EAD hierarchy is presented as a nested list in Trove. Links from the list go to the digitised item viewer.
