Collector Plugins
Collectors are core HopprPlugin subclasses used to retrieve artifacts from wherever they are
maintained (e.g. a URL, or a specialized repository), and place them in a directory structure. That
directory structure can be used by other plugins (in particular, Bundlers) to create bundles that
can be transferred to isolated networks. Hoppr supports a number of basic PURL collectors out of
the box and we encourage you to open source new collectors.
Classes

BaseCollectorPlugin Class
An SBOM does not always specify the source repositories from which components are to be copied.
The repositories to search for components can also be specified by supplying manifest
configuration file(s). Often there will be more than one repository for a given PURL type.
Therefore, a collector plugin is expected to either:
- Search through all appropriate repositories until the requested component is found, or
- Prior to collection, temporarily configure the underlying CLI tool (skopeo, yum, dnf, helm, etc.) to search only these repositories for the requested components.
The BaseCollectorPlugin class contains functionality that is common to both of these scenarios.
The scenarios themselves are implemented by its subclasses, SerialCollectorPlugin and
BatchCollectorPlugin, respectively.
We strongly recommend that any plugin class intended to act as a collector inherit from either the
SerialCollectorPlugin or BatchCollectorPlugin class (in the hoppr.base_plugins.collector
module). Both classes override the HopprPlugin process_component method and mark it @final, so
subclasses of SerialCollectorPlugin or BatchCollectorPlugin should not override process_component.
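The arrangement described above is a template-method pattern: the base class owns process_component and delegates the repository-specific work to an abstract collect method. The following is a minimal standalone sketch of that pattern, not Hoppr's code; SketchCollectorBase and EchoCollector are hypothetical names, and the real methods take and return richer types.

```python
from abc import ABC, abstractmethod
from typing import final


class SketchCollectorBase(ABC):
    """Hypothetical stand-in for a collector base class (not Hoppr's)."""

    @final
    def process_component(self, component: str) -> str:
        # Shared bookkeeping would live here; subclasses only supply collect().
        return self.collect(component)

    @abstractmethod
    def collect(self, component: str) -> str:
        """Retrieve a single component; implemented by each collector."""


class EchoCollector(SketchCollectorBase):
    """Toy collector that only reports what it was asked to fetch."""

    def collect(self, component: str) -> str:
        return f"collected {component}"


message = EchoCollector().process_component("pkg:generic/example@1.0")
```

Note that typing.final is checked by static type checkers such as mypy; Python itself does not stop a subclass from overriding the method at runtime.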
All collectors should update the BOM with parameters identifying how the collection was made. The parameters that must be updated are:
- hoppr:collection:directory
- hoppr:collection:plugin
- hoppr:collection:repository
- hoppr:collection:timetag
The set_collection_params method in the Collector base class is an easy way to accomplish this. After making these changes, the updated component must be returned as part of the Result for the process.
Note that both SerialCollectorPlugin and BatchCollectorPlugin fail any collection that does not properly perform these updates.
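To make the bookkeeping concrete, here is a standalone sketch of recording the four parameters on a component and checking that none is missing. It is an illustration under assumptions, not the set_collection_params implementation: the component is modeled as a plain dict with a CycloneDX-style list of name/value properties, the helper names are hypothetical, and the ISO 8601 timetag format is a guess.

```python
from datetime import datetime, timezone

REQUIRED_PROPERTIES = (
    "hoppr:collection:directory",
    "hoppr:collection:plugin",
    "hoppr:collection:repository",
    "hoppr:collection:timetag",
)


def record_collection(component: dict, repository: str, directory: str, plugin: str) -> dict:
    """Set the four collection properties, replacing any earlier values."""
    values = {
        "hoppr:collection:directory": directory,
        "hoppr:collection:plugin": plugin,
        "hoppr:collection:repository": repository,
        # Assumed format; the real timetag format may differ.
        "hoppr:collection:timetag": datetime.now(timezone.utc).isoformat(),
    }
    kept = [p for p in component.get("properties", []) if p["name"] not in values]
    kept.extend({"name": name, "value": value} for name, value in values.items())
    component["properties"] = kept
    return component


def missing_collection_params(component: dict) -> list:
    """Return the required property names that the component lacks."""
    present = {p["name"] for p in component.get("properties", [])}
    return [name for name in REQUIRED_PROPERTIES if name not in present]


component = {"purl": "pkg:generic/example@1.0"}
before = missing_collection_params(component)
record_collection(component, "https://files.example", "generic/example", "MyCollectorPlugin")
after = missing_collection_params(component)
```

A check along the lines of missing_collection_params is the kind of validation that would cause a collection to fail when the updates are skipped.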
SerialCollectorPlugin Class
This class provides the base logic to search each repository listed in the manifest for each component listed in the SBOM.
The process_component method of the SerialCollectorPlugin class:
- Loops through each applicable repository and finds any credentials necessary for access.
- Calls the abstract method collect for the specified component, repository, and credentials.
- If the collect method succeeds, processing exits without checking any more repositories.
- If no repository returns a successful collect response, the last failed response is returned to the caller.
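The steps above can be sketched as a simple loop. This is a standalone illustration of the search logic, not the SerialCollectorPlugin source; SketchResult, serial_search, and fake_collect are hypothetical stand-ins, and credentials are reduced to a dict keyed by repository URL.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for Hoppr's Result class."""
    ok: bool
    message: str = ""


def serial_search(
    component: str,
    repositories: list,
    credentials: dict,
    collect: Callable[[str, str, Optional[str]], SketchResult],
) -> SketchResult:
    """Try each repository in order; stop at the first success."""
    result = SketchResult(ok=False, message="no repositories configured")
    for repo_url in repositories:
        creds = credentials.get(repo_url)  # None when no credentials are needed
        result = collect(component, repo_url, creds)
        if result.ok:
            return result  # remaining repositories are not checked
    return result  # the last failed response


visited = []


def fake_collect(component: str, repo_url: str, creds: Optional[str]) -> SketchResult:
    """Pretend only mirror-b holds the component."""
    visited.append(repo_url)
    if "mirror-b" in repo_url:
        return SketchResult(True, f"collected {component} from {repo_url}")
    return SketchResult(False, f"{component} not found in {repo_url}")


outcome = serial_search(
    "pkg:generic/example@1.0",
    ["https://mirror-a.example", "https://mirror-b.example", "https://mirror-c.example"],
    {},
    fake_collect,
)
```

In this run the search stops at mirror-b, so mirror-c is never contacted.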
BatchCollectorPlugin Class
This class does not loop over the repositories listed in the manifest. It contains a config_file class attribute that can be used to
define a temporary location to write a configuration file for the underlying CLI tool. Collector plugins inheriting from this class
should override the pre_stage_process method inherited from the HopprPlugin class to pre-process the list of repositories in the
manifest and generate a temporary configuration file.
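As a rough illustration of the pre-processing described above, the sketch below writes a temporary yum/dnf-style repository file from a list of repository URLs. It is standalone and does not use Hoppr: the write_repo_config helper, the section names, and the example URLs are all hypothetical, and a real pre_stage_process override would read the repositories from the manifest and write to the plugin's config_file location.

```python
import configparser
import tempfile
from pathlib import Path


def write_repo_config(repo_urls: list, config_file: Path) -> Path:
    """Write one [section] per repository in yum .repo (INI) format."""
    # interpolation=None keeps '%' characters in URLs from being misread.
    parser = configparser.ConfigParser(interpolation=None)
    for index, url in enumerate(repo_urls):
        parser[f"hoppr-tmp-{index}"] = {
            "name": f"Temporary repository {index}",
            "baseurl": url,
            "enabled": "1",
        }
    with config_file.open("w", encoding="utf-8") as handle:
        parser.write(handle, space_around_delimiters=False)
    return config_file


with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_repo_config(
        ["https://rpm.example/os", "https://rpm.example/updates"],
        Path(tmp_dir) / "hoppr-tmp.repo",
    )
    config_text = path.read_text(encoding="utf-8")
```

The CLI tool would then be pointed at the generated file for the duration of the stage, so that it searches only the listed repositories.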
The process_component method of the BatchCollectorPlugin class:
- Calls the abstract method collect for the specified component, repository, and credentials.
- If the collect method succeeds, processing exits without checking any more repositories.
- If no repository returns a successful collect response, the last failed response is returned to the caller.
Sample Collector Plugin Implementation
import requests
from requests.auth import HTTPBasicAuth
from typing import Any

from hoppr.base_plugins.hoppr import hoppr_rerunner
from hoppr.base_plugins.collector import SerialCollectorPlugin
from hoppr.result import Result
from hoppr.context import Context
# Assumed import path for CredObject; adjust it to match your Hoppr version
from hoppr.hoppr_types.cred_object import CredObject


class MyCollectorPlugin(SerialCollectorPlugin):
    # This plugin will only run its process_component method for
    # components with the generic purl type
    supported_purl_types = ["generic"]

    def get_version(self):  # Required
        return "1.0.7"

    @hoppr_rerunner
    def collect(self, comp: Any, repo_url: str, creds: CredObject = None) -> Result:
        # Use HTTP basic authentication only when credentials were supplied
        authentication = None
        if creds is not None:
            authentication = HTTPBasicAuth(creds.userid, creds.password)

        response = requests.get(repo_url, auth=authentication)
        result = Result.from_http_response(response)  # Result class convenience method

        if result.is_success():
            # Record how the component was collected; the relative directory is left empty here
            self.set_collection_params(comp, repo_url, "")
            result = Result.success(result.message, return_obj=comp)

        return result
This collector retrieves an artifact directly from the repo_url, which is not realistic. To be useful, it would need at least the following changes:
- It would need to specify a path or query parameter on the URL, likely based on the component PURL.
- It would need to save the downloaded file in the directory structure specified in the context, and pass the corresponding relative directory to set_collection_params instead of an empty string.
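A sketch of the two missing pieces might look like the following. Everything here is an assumption made for illustration: the helper names are hypothetical, the PURL parsing handles only the simple pkg:generic/<name>@<version> form, and both the URL layout (<repo_url>/<name>/<version>) and the directory layout are invented, not Hoppr's actual conventions.

```python
from pathlib import Path
from urllib.parse import quote_plus


def split_generic_purl(purl: str) -> tuple:
    """Return (name, version) for a simple pkg:generic/<name>@<version> PURL.

    Simplified: qualifiers and subpaths are dropped, namespaces are not handled.
    """
    prefix = "pkg:generic/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a generic PURL: {purl}")
    body = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    name, _, version = body.partition("@")
    return name, version


def download_url(repo_url: str, purl: str) -> str:
    """Build a download URL under the repository (hypothetical layout)."""
    name, version = split_generic_purl(purl)
    return "/".join(part for part in (repo_url.rstrip("/"), name, version) if part)


def relative_target_dir(repo_url: str, purl: str) -> Path:
    """Directory, relative to the collection root, for the artifact (hypothetical layout)."""
    name, _ = split_generic_purl(purl)
    return Path("generic") / quote_plus(repo_url.rstrip("/")) / name


purl = "pkg:generic/tool@1.2.3?checksum=abc"
url = download_url("https://files.example/artifacts/", purl)
target = relative_target_dir("https://files.example/artifacts/", purl)
```

In a real collector, the response body would be written beneath the collection root at a path like target, and that same relative directory would be passed to set_collection_params.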
The Result returned is based on the HTTP response, using a convenience function that sets the result type from the HTTP status code (< 300: SUCCESS; 300-499: FAIL; >= 500: RETRY).
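The mapping can be written out as a small function. This is a standalone sketch rather than the Result.from_http_response implementation; the Status enum and classify_status name are hypothetical, and treating a status of exactly 500 as retryable is an assumption.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical stand-in for the result types named above."""
    SUCCESS = "success"
    FAIL = "fail"
    RETRY = "retry"


def classify_status(status_code: int) -> Status:
    """Map an HTTP status code to a result type."""
    if status_code < 300:
        return Status.SUCCESS
    if status_code >= 500:  # assumed to include 500 itself
        return Status.RETRY
    return Status.FAIL  # 300-499


results = {code: classify_status(code) for code in (200, 404, 503)}
```

Server-side errors are worth retrying because they are often transient, whereas a 404 or 403 will usually repeat on the next attempt.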
Existing Collector Plugins
For the currently implemented collectors, see the `collect_` files.