Version: 1.12.x

Collector Plugins

Collectors are core HopprPlugin subclasses used to retrieve artifacts from wherever they are maintained (e.g., a URL or a specialized repository) and place them in a directory structure. That directory structure can be used by other plugins (in particular, Bundlers) to create bundles that can be transferred to isolated networks. Hoppr supports a number of basic PURL collectors out of the box, and we encourage you to open source new collectors.

Classes

BaseCollectorPlugin Class

An SBOM does not always specify the source repositories from which components are to be copied. The repositories to search for components can also be specified by supplying manifest configuration file(s). Often there will be more than one repository for a given PURL type. Therefore, a collector plugin is expected to either:

  • Search through all appropriate repositories until the requested component is found, or
  • Prior to collection, temporarily configure the underlying CLI tool (skopeo, yum, dnf, helm, etc.) to only search for requested components in these repositories

The BaseCollectorPlugin class contains functionality that is common to both of these scenarios. Implementation of these scenarios is handled by its subclasses: SerialCollectorPlugin and BatchCollectorPlugin respectively.

We strongly recommend that any plugin class intended to act as a collector inherit from either the SerialCollectorPlugin or the BatchCollectorPlugin class (both in the hoppr.base_plugins.collector module). Both classes override the HopprPlugin process_component method and mark it @final, so subclasses of SerialCollectorPlugin or BatchCollectorPlugin should not override process_component.
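The division of labor described above, where a base class owns process_component and a subclass only supplies collect, can be sketched in plain Python. This is a hypothetical illustration rather than Hoppr's code: SketchCollectorBase, EchoCollector, and the minimal Result class are stand-ins. Note that typing.final is enforced only by static type checkers such as mypy; Python itself does not block an override at runtime.

```python
from abc import ABC, abstractmethod
from typing import final


class Result:
    """Hypothetical stand-in for hoppr.result.Result."""

    def __init__(self, status: str, message: str = "") -> None:
        self.status = status
        self.message = message

    def is_success(self) -> bool:
        return self.status == "SUCCESS"


class SketchCollectorBase(ABC):
    """Hypothetical base class: owns process_component and delegates retrieval to collect."""

    @final  # type checkers flag any subclass that overrides this method
    def process_component(self, comp: str) -> Result:
        # Shared logic (repository lookup, result validation, ...) would live here.
        return self.collect(comp)

    @abstractmethod
    def collect(self, comp: str) -> Result:
        """Subclasses implement the actual retrieval."""


class EchoCollector(SketchCollectorBase):
    """Hypothetical concrete collector: only collect is implemented."""

    def collect(self, comp: str) -> Result:
        return Result("SUCCESS", f"collected {comp}")
```

Because collect is abstract, the base class cannot be instantiated on its own; only concrete collectors that implement it can.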

All collectors should update the BOM with parameters identifying how the collection was made. The parameters that must be updated are:

  • hoppr:collection:directory
  • hoppr:collection:plugin
  • hoppr:collection:repository
  • hoppr:collection:timetag

The set_collection_params method in the Collector base class is an easy way to accomplish this. After making these changes, the updated component must be returned as part of the Result for the process.

Note that both SerialCollectorPlugin and BatchCollectorPlugin fail any collection that does not properly perform these updates.
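A minimal sketch of what this BOM update amounts to, assuming components are CycloneDX-style dictionaries with a properties list of name/value pairs. set_collection_props and has_collection_props are hypothetical helpers, not Hoppr's set_collection_params, and the ISO 8601 format used for hoppr:collection:timetag is an assumption.

```python
from datetime import datetime, timezone

REQUIRED_PROPS = (
    "hoppr:collection:directory",
    "hoppr:collection:plugin",
    "hoppr:collection:repository",
    "hoppr:collection:timetag",
)


def set_collection_props(component: dict, plugin: str, repository: str, directory: str) -> dict:
    """Hypothetical helper: record how a component was collected, replacing any earlier values."""
    values = {
        "hoppr:collection:directory": directory,
        "hoppr:collection:plugin": plugin,
        "hoppr:collection:repository": repository,
        # Assumption: an ISO 8601 UTC timestamp; Hoppr's exact timetag format may differ.
        "hoppr:collection:timetag": datetime.now(timezone.utc).isoformat(),
    }
    kept = [p for p in component.get("properties", []) if p["name"] not in values]
    component["properties"] = kept + [{"name": k, "value": v} for k, v in values.items()]
    return component


def has_collection_props(component: dict) -> bool:
    """True only if every required collection property is present."""
    names = {p["name"] for p in component.get("properties", [])}
    return all(name in names for name in REQUIRED_PROPS)
```

A base class could run a check like has_collection_props on the component returned in the Result and turn the outcome into a failure when it returns False, which mirrors the behavior described above.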

SerialCollectorPlugin Class

This class provides the base logic to search each repository listed in the manifest for each component listed in the SBOM.

The process_component method of the SerialCollectorPlugin class:

  • Loops through each applicable repository and finds any credentials necessary for access.
  • Calls the abstract method collect for the specified component, repository, and credentials.
  • If the collect method succeeds, processing exits without checking any more repositories.
  • If no repository returns a successful collect response, the last failed response is returned to the caller.
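The loop above can be sketched as follows. serial_collect, the find_credentials callback, and the Result dataclass are hypothetical stand-ins for illustration; in Hoppr this loop lives inside SerialCollectorPlugin.process_component and calls the plugin's own collect method. The "no repositories" fallback result is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Hypothetical stand-in for hoppr.result.Result."""

    status: str  # "SUCCESS", "FAIL", or "RETRY"
    message: str = ""

    def is_success(self) -> bool:
        return self.status == "SUCCESS"


def serial_collect(
    comp: str,
    repositories: list[str],
    find_credentials: Callable[[str], Optional[dict]],
    collect: Callable[[str, str, Optional[dict]], Result],
) -> Result:
    """Try each repository in order and stop at the first success."""
    result = Result("FAIL", "no repositories to search")  # assumed fallback
    for repo_url in repositories:
        creds = find_credentials(repo_url)  # None means anonymous access
        result = collect(comp, repo_url, creds)
        if result.is_success():
            return result  # later repositories are not checked
    return result  # the last failed response
```

Ordering matters here: repositories earlier in the list win whenever they can supply the component.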

BatchCollectorPlugin Class

This class does not loop over the repositories listed in the manifest. It contains a config_file class attribute that can be used to define a temporary location to write a configuration file for the underlying CLI tool. Collector plugins inheriting from this class should override the pre_stage_process method inherited from the HopprPlugin class to pre-process the list of repositories in the manifest and generate a temporary configuration file.

The process_component method of the BatchCollectorPlugin class:

  • Calls the abstract method collect for the specified component, repository, and credentials.
  • If the collect method succeeds, processing exits without checking any more repositories.
  • If no repository returns a successful collect response, the last failed response is returned to the caller.
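As a hypothetical example of the pre_stage_process step, a batch collector wrapping yum or dnf might render every manifest repository into a single .repo file. write_repo_config, the hoppr-N repository IDs, and the hoppr.repo file name are assumptions for illustration; a real plugin would keep the resulting path in its config_file attribute and point the CLI tool at it during collection.

```python
from pathlib import Path


def write_repo_config(repositories: list[str], config_dir: Path) -> Path:
    """Hypothetical pre_stage_process helper: one yum/dnf .repo file covering every manifest repository."""
    sections = []
    for index, url in enumerate(repositories):
        section_id = f"hoppr-{index}"  # assumption: any unique repository ID works
        sections.append(
            f"[{section_id}]\n"
            f"name={section_id}\n"
            f"baseurl={url}\n"
            "enabled=1\n"
        )
    config_file = config_dir / "hoppr.repo"
    config_file.write_text("\n".join(sections), encoding="utf-8")
    return config_file
```

Writing the configuration once, before any component is processed, is what lets the underlying tool search all manifest repositories itself instead of the plugin looping over them.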

Sample Collector Plugin Implementation

from __future__ import annotations

from typing import Any

import requests
from requests.auth import HTTPBasicAuth

from hoppr.base_plugins.collector import SerialCollectorPlugin
from hoppr.base_plugins.hoppr import hoppr_rerunner
from hoppr.result import Result

# CredObject is Hoppr's credential type; import it from wherever your Hoppr version
# defines it. With postponed annotations (see the __future__ import) the name is only
# needed by type checkers.


class MyCollectorPlugin(SerialCollectorPlugin):

    # This plugin will only run its process_component method for
    # components with the generic purl type
    supported_purl_types = ["generic"]

    def get_version(self):  # Required
        return "1.0.7"

    @hoppr_rerunner
    def collect(self, comp: Any, repo_url: str, creds: CredObject | None = None) -> Result:
        # Use HTTP basic auth only when credentials were found for this repository
        authentication = None
        if creds is not None:
            authentication = HTTPBasicAuth(creds.userid, creds.password)

        response = requests.get(repo_url, auth=authentication)
        result = Result.from_http_response(response)  # Result class convenience method
        if result.is_success():
            # Record how the component was collected, then return it with the Result
            self.set_collection_params(comp, repo_url, "")
            result = Result.success(result.message, return_obj=comp)

        return result

This collector retrieves an artifact directly from the repo_url. This is not realistic. To be useful, there would need to be at least a few changes:

  • It would need to specify a path or query parameter on the URL, likely based on the component PURL.
  • It does not save the downloaded file in the directory structure specified in the context, and the relative directory passed to set_collection_params is empty.
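Those two missing pieces could look roughly like this for the generic PURL type. The URL layout <repo_url>/<namespace>/<name>/<version>/<name>, parse_generic_purl, artifact_url, and save_artifact are all hypothetical; a real repository defines its own layout, and a full PURL parser (for example, the packageurl-python library) handles edge cases this minimal one ignores.

```python
from pathlib import Path
from urllib.parse import quote, unquote


def parse_generic_purl(purl: str) -> tuple[str, str, str]:
    """Minimal parser for pkg:generic/[namespace/]name[@version]; drops qualifiers and subpath."""
    prefix = "pkg:generic/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a generic PURL: {purl}")
    body = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    if "@" in body:
        path, _, version = body.rpartition("@")
    else:
        path, version = body, ""
    namespace, _, name = path.rpartition("/")
    return unquote(namespace), unquote(name), unquote(version)


def artifact_url(repo_url: str, purl: str) -> str:
    """Hypothetical layout: <repo_url>/<namespace>/<name>/<version>/<name>."""
    namespace, name, version = parse_generic_purl(purl)
    segments = [s for s in (namespace, name, version, name) if s]
    return repo_url.rstrip("/") + "/" + "/".join(quote(s, safe="/") for s in segments)


def save_artifact(collect_root: Path, relative_dir: str, file_name: str, content: bytes) -> Path:
    """Write downloaded bytes under the collection root, creating directories as needed."""
    target = collect_root / relative_dir / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```

In a collector built this way, the relative_dir chosen for save_artifact is the value that would be passed to set_collection_params in place of the empty string.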

The Result returned is based on the request response, using a convenience method that sets the result type from the HTTP response code (< 300: SUCCESS; 300-499: FAIL; >= 500: RETRY).
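For reference, the status-code mapping can be written as a small function. status_from_http_code is a hypothetical name and the strings stand in for Hoppr's result types; this sketch treats 500 itself as RETRY, so every code from 500 up is retried. The reasoning behind the split is that server errors are often transient, while a 4xx answer such as 404 will not change on a second attempt.

```python
def status_from_http_code(status_code: int) -> str:
    """Map an HTTP status code to a collection result type."""
    if status_code < 300:
        return "SUCCESS"
    if status_code < 500:
        return "FAIL"  # redirects left unresolved and client errors
    return "RETRY"  # server-side errors may succeed on a later attempt
```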

Existing Collector Plugins

To see the currently implemented collectors, see the Hoppr source files whose names begin with `collect_`.