Collector Plugins
Collectors are core HopprPlugin subclasses that retrieve artifacts from wherever they are
maintained (e.g. a URL or a specialized repository) and place them in a directory structure. That
directory structure can be used by other plugins (in particular, Bundlers) to create bundles that
can be transferred to isolated networks. Hoppr supports a number of basic PURL collectors out of
the box, and we encourage you to open source new collectors.
Classes
BaseCollectorPlugin Class
An SBOM does not always specify the source repositories from which components are to be copied.
The repositories to search for components can also be specified by supplying manifest
configuration file(s). Often there will be more than one repository for a given PURL type.
Therefore, a collector plugin is expected to either:
- Search through all appropriate repositories until the requested component is found, or
- Prior to collection, temporarily configure the underlying CLI tool (skopeo, yum, dnf, helm, etc.) to search for requested components only in these repositories.
The BaseCollectorPlugin class contains functionality that is common to both of these scenarios.
Implementation of these scenarios is handled by its subclasses, SerialCollectorPlugin and BatchCollectorPlugin, respectively.
We strongly recommend that any plugin class intended to act as a collector inherit from either the
SerialCollectorPlugin or BatchCollectorPlugin class (in the hoppr.base_plugins.collector module).
Each of these classes overrides the HopprPlugin class's process_component as a @final method, so subclasses
of SerialCollectorPlugin or BatchCollectorPlugin should not override process_component.
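The relationship described above can be sketched without Hoppr installed. The classes below are simplified stand-ins with hypothetical names, not Hoppr's real implementations: the base class marks process_component as final and delegates to an abstract collect method, which is the only method a concrete collector supplies.

```python
from abc import ABC, abstractmethod
from typing import final


class SketchCollectorBase(ABC):
    """Stand-in for a collector base class; not Hoppr's real code."""

    @final
    def process_component(self, comp: dict) -> str:
        # Shared bookkeeping would live here, so subclasses must not override it.
        # Note: @final is checked by static type checkers, not enforced at runtime.
        return self.collect(comp)

    @abstractmethod
    def collect(self, comp: dict) -> str:
        """Retrieve one component; implemented by each concrete collector."""


class SketchGenericCollector(SketchCollectorBase):
    def collect(self, comp: dict) -> str:
        return f"collected {comp['purl']}"


outcome = SketchGenericCollector().process_component({"purl": "pkg:generic/example@1.0"})
print(outcome)
```

Keeping the shared logic in one final method means every collector gets the same repository handling and validation, while each subclass only describes how to fetch a single component.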
All collectors should update the BOM with parameters identifying how the collection was made. The parameters that must be updated are:
hoppr:collection:directory
hoppr:collection:plugin
hoppr:collection:repository
hoppr:collection:timetag
The set_collection_params method in the Collector base class is an easy way to accomplish this. After making these changes, the updated component must be returned as part of the Result for the process.
Note that both SerialCollectorPlugin and BatchCollectorPlugin fail any collection that does not properly perform these updates.
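As a rough illustration of what recording these parameters might look like, the sketch below stores them as name/value properties on a plain dictionary standing in for a component. The helper names, the dictionary layout, and the ISO 8601 timestamp format are all assumptions made for this example; in a real plugin, set_collection_params does this work.

```python
from datetime import datetime, timezone

REQUIRED_COLLECTION_PROPERTIES = (
    "hoppr:collection:directory",
    "hoppr:collection:plugin",
    "hoppr:collection:repository",
    "hoppr:collection:timetag",
)


def set_collection_properties(component: dict, repository: str, directory: str, plugin: str) -> dict:
    """Hypothetical helper: record how a component was collected."""
    values = {
        "hoppr:collection:directory": directory,
        "hoppr:collection:plugin": plugin,
        "hoppr:collection:repository": repository,
        # Timestamp format is an assumption (ISO 8601, UTC).
        "hoppr:collection:timetag": datetime.now(timezone.utc).isoformat(),
    }
    # Drop stale values for these names, then append the fresh ones.
    kept = [p for p in component.get("properties", []) if p["name"] not in values]
    component["properties"] = kept + [{"name": n, "value": v} for n, v in values.items()]
    return component


def missing_collection_properties(component: dict) -> list[str]:
    """Hypothetical check: list required property names that are absent."""
    present = {p["name"] for p in component.get("properties", [])}
    return [name for name in REQUIRED_COLLECTION_PROPERTIES if name not in present]


component = {"purl": "pkg:generic/example@1.0", "properties": [{"name": "other", "value": "x"}]}
before = missing_collection_properties(component)
set_collection_properties(component, "https://repo.example.com", "generic/example", "MyCollectorPlugin")
after = missing_collection_properties(component)
```

A check like missing_collection_properties mirrors the idea that a collection is treated as failed when any of the four parameters has not been set.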
SerialCollectorPlugin Class
This class provides the base logic to search each repository listed in the manifest for each component listed in the SBOM.
The process_component method of the SerialCollectorPlugin class:
- Loops through each applicable repository and finds any credentials necessary for access.
- Calls the abstract method collect for the specified component, repository, and credentials.
  - If the collect method succeeds, processing exits without checking any more repositories.
  - If no repository returns a successful collect response, the last failed response is returned to the caller.
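The loop described above can be sketched as a standalone function. SketchResult, collect_serially, and the credential lookup by repository URL are hypothetical stand-ins used only for illustration, as is the default result returned when no repositories are configured; Hoppr's own classes are not used here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchResult:
    status: str  # "SUCCESS", "FAIL" or "RETRY"
    message: str = ""

    def is_success(self) -> bool:
        return self.status == "SUCCESS"


def collect_serially(
    comp: dict,
    repositories: list[str],
    credentials: dict[str, tuple[str, str]],
    collect: Callable[[dict, str, Optional[tuple[str, str]]], SketchResult],
) -> SketchResult:
    # Assumed default for the case where the repository list is empty.
    result = SketchResult("FAIL", "No repositories configured")
    for repo_url in repositories:
        creds = credentials.get(repo_url)  # None when no credentials are needed
        result = collect(comp, repo_url, creds)
        if result.is_success():
            break  # stop at the first repository that has the component
    return result  # either the success or the last failed response


attempted = []


def fake_collect(comp, repo_url, creds):
    attempted.append(repo_url)
    if repo_url.endswith("/b"):
        return SketchResult("SUCCESS", f"found in {repo_url}")
    return SketchResult("FAIL", f"not in {repo_url}")


repos = ["https://repo.example.com/a", "https://repo.example.com/b", "https://repo.example.com/c"]
found = collect_serially({"purl": "pkg:generic/example@1.0"}, repos, {}, fake_collect)
missing = collect_serially({"purl": "pkg:generic/other@1.0"}, repos[:1], {}, fake_collect)
```

In the usage above, the third repository is never contacted for the first component because the second one succeeds, and the second call returns the failure from the only repository it tried.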
BatchCollectorPlugin Class
This class does not loop over the repositories listed in the manifest. It contains a config_file
class attribute that can be used to define a temporary location to write a configuration file for
the underlying CLI tool. Collector plugins inheriting from this class should override the
pre_stage_process method inherited from the HopprPlugin class to pre-process the list of
repositories in the manifest and generate a temporary configuration file.
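A pre-stage step of this kind might look roughly like the sketch below, which writes one section per repository to a temporary file. The function name, the section names, and the INI-style layout are assumptions chosen for illustration; each real CLI tool expects its own configuration format.

```python
import configparser
import tempfile
from pathlib import Path


def write_repo_config(repositories: list[str], config_file: Path) -> Path:
    """Hypothetical pre-stage step: write one section per manifest repository."""
    # Interpolation is disabled so that "%" characters in URLs are kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    for index, repo_url in enumerate(repositories):
        section = f"sketch-repo-{index}"
        parser[section] = {"name": section, "baseurl": repo_url, "enabled": "1"}
    with config_file.open("w", encoding="utf-8") as handle:
        parser.write(handle)
    return config_file


with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_repo_config(
        ["https://repo.example.com/a", "https://repo.example.com/b"],
        Path(tmp_dir) / "sketch.repo",
    )
    # Read the file back to confirm what the CLI tool would see.
    written = configparser.ConfigParser(interpolation=None)
    written.read(path, encoding="utf-8")
    sections = written.sections()
    first_url = written["sketch-repo-0"]["baseurl"]
```

Writing the file once before the stage runs lets the CLI tool resolve each component against all listed repositories itself, instead of the plugin trying them one at a time.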
The process_component method of the BatchCollectorPlugin class:
- Calls the abstract method collect for the specified component, repository, and credentials.
  - If the collect method succeeds, processing exits without checking any more repositories.
  - If no repository returns a successful collect response, the last failed response is returned to the caller.
Sample Collector Plugin Implementation
import requests
from requests.auth import HTTPBasicAuth
from typing import Any, Optional

from hoppr.base_plugins.hoppr import hoppr_rerunner
from hoppr.base_plugins.collector import SerialCollectorPlugin
from hoppr.result import Result
from hoppr.context import Context
# CredObject was not imported in the original sample; this module path is an
# assumption and may differ between Hoppr versions.
from hoppr.hoppr_types.cred_object import CredObject


class MyCollectorPlugin(SerialCollectorPlugin):
    # This plugin will only run its process_component method for
    # components with the generic purl type
    supported_purl_types = ["generic"]

    def get_version(self):  # Required
        return "1.0.7"

    @hoppr_rerunner
    def collect(self, comp: Any, repo_url: str, creds: Optional[CredObject] = None) -> Result:
        authentication = None
        if creds is not None:
            authentication = HTTPBasicAuth(creds.userid, creds.password)

        response = requests.get(repo_url, auth=authentication)
        result = Result.from_http_response(response)  # Result class convenience method

        if result.is_success():
            self.set_collection_params(comp, repo_url, "")
            result = Result.success(result.message, return_obj=comp)

        return result
This collector retrieves an artifact directly from the repo_url, which is not realistic. To be useful, it would need at least the following changes:
- It would need to specify a path or query parameter on the URL, likely based on the component PURL.
- It would need to save the downloaded file in the directory structure specified in the context and pass the corresponding relative directory to set_collection_params (the sample passes an empty string).
The Result returned is based on the request response, using a convenience function that sets the result status based on the HTTP status code (< 300: SUCCESS; 300-499: FAIL; >= 500: RETRY).
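The status-code mapping just described can be written out as a small standalone function. This is a sketch of the rule as stated in the text, not Hoppr's actual Result.from_http_response; SketchStatus and status_from_http_code are hypothetical names, and treating a status of exactly 500 as RETRY is an assumption.

```python
from enum import Enum


class SketchStatus(Enum):
    SUCCESS = "SUCCESS"
    FAIL = "FAIL"
    RETRY = "RETRY"


def status_from_http_code(status_code: int) -> SketchStatus:
    """Map an HTTP status code to a collection outcome."""
    if status_code < 300:
        return SketchStatus.SUCCESS
    if status_code < 500:
        # Redirects and client errors are unlikely to change on a retry.
        return SketchStatus.FAIL
    # Server errors may be transient, so the request is worth retrying.
    return SketchStatus.RETRY


ok = status_from_http_code(200)
not_found = status_from_http_code(404)
unavailable = status_from_http_code(503)
```

Separating FAIL from RETRY is what lets a rerun wrapper such as hoppr_rerunner repeat only the attempts that have a chance of succeeding.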
Existing Collector Plugins
To see the currently implemented collectors, see the `collect_*` files.