popv.hub.HubModel

class popv.hub.HubModel(local_dir, metadata=None, repo_name=None, model_card=None, ontology_dir=None)

Wrapper for BaseModelClass backed by HuggingFace Hub.

Parameters:
  • repo_name (str | None (default: None)) – ID of the HuggingFace repo where this model is uploaded

  • local_dir (str) – Local directory where the data and pre-trained model reside.

  • metadata (dict | str | None (default: None)) – Dict or a path to a file on disk where this metadata can be read from.

  • model_card (HubModelCardHelper | ModelCard | str | None (default: None)) – The model card for this pre-trained model. Model card is a markdown file that describes the pre-trained model/data and is displayed on HuggingFace. This can be either an instance of ModelCard or an instance of HubModelCardHelper that wraps the model card or a path to a file on disk where the model card can be read from.

Attributes table

adata

Returns the full training data for this model.

local_dir

The local directory where the data and pre-trained model reside.

metadata

The metadata for this model.

minified_adata

Returns the minified data for this model.

model_card

The model card for this model.

ontology_dir

The local directory where the ontology files are stored.

repo_name

ID of the HuggingFace repo where this model is uploaded.

Methods table

annotate_data(query_adata[, ...])

Annotate the query data with the trained model.

map_genes(adata, gene_symbols, organism)

Map genes to CELLxGENE census gene IDs.

pull_from_huggingface_hub(repo_name[, ...])

Download the given model repo from HuggingFace.

push_to_huggingface_hub(repo_name[, ...])

Push this model to HuggingFace.

save([overwrite])

Save the model card and metadata to the model directory.

Attributes

HubModel.adata

Returns the full training data for this model.

If the data has not been loaded yet, this will call cellxgene_census.download_source_h5ad(). Otherwise, it will simply return the loaded data.

HubModel.local_dir

The local directory where the data and pre-trained model reside.

HubModel.metadata

The metadata for this model.

HubModel.minified_adata

Returns the minified data for this model.

If the data has not been loaded yet, this will call scanpy.read_h5ad(). Otherwise, it will simply return the loaded data.
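The load-on-first-access behaviour described for adata and minified_adata can be illustrated with a small, generic sketch. The class name LazyDataHolder and its loader argument are hypothetical and are not part of popv; the actual properties call cellxgene_census.download_source_h5ad() or scanpy.read_h5ad() as stated above.

```python
class LazyDataHolder:
    """Hypothetical stand-in for a load-on-first-access property; not popv code."""

    def __init__(self, loader):
        self._loader = loader   # callable that performs the expensive read
        self._data = None       # nothing has been loaded yet
        self.load_count = 0     # counts how often the loader actually ran

    @property
    def data(self):
        # Run the loader only on the first access, then reuse the cached object.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


# A plain dict stands in for an AnnData object here.
holder = LazyDataHolder(lambda: {"n_obs": 3})
first = holder.data    # triggers the loader
second = holder.data   # returns the cached object without reloading
```

In this sketch the second access returns the same object as the first, which mirrors the "otherwise, it will simply return the loaded data" behaviour.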

HubModel.model_card

The model card for this model.

HubModel.ontology_dir

The local directory where the ontology files are stored.

HubModel.repo_name

ID of the HuggingFace repo where this model is uploaded.

Methods

HubModel.annotate_data(query_adata, query_batch_key=None, save_path='tmp', prediction_mode='fast', methods=None, gene_symbols=None)

Annotate the query data with the trained model.

Parameters:
  • query_adata (AnnData) – The query data to annotate.

  • query_batch_key (str | None (default: None)) – The batch key in the query data.

  • save_path (str (default: 'tmp')) – Path to save the query models.

  • prediction_mode (str (default: 'fast')) – The prediction mode to use. Either “fast” or “inference”. “fast” will only predict on the query data, while “inference” will integrate query and reference data.

  • methods (list | None (default: None)) – List of methods to use for annotation. If None, all methods in the model will be used.

  • gene_symbols (str | None (default: None)) – Gene symbols given as query_adata.var_names.

Return type:

AnnData

Returns:

The annotated data.
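A minimal sketch of calling annotate_data through a thin wrapper might look as follows. The helper annotate_query and the constant VALID_PREDICTION_MODES are hypothetical; the early check on prediction_mode is an assumption added for illustration, and whether annotate_data performs its own validation is not stated here.

```python
# The two modes named in the parameter description above.
VALID_PREDICTION_MODES = ("fast", "inference")


def annotate_query(hub_model, query_adata, prediction_mode="fast",
                   query_batch_key=None, save_path="tmp"):
    """Hypothetical wrapper around HubModel.annotate_data (not part of popv)."""
    # Fail early on a misspelled mode instead of partway through annotation.
    if prediction_mode not in VALID_PREDICTION_MODES:
        raise ValueError(
            f"prediction_mode must be one of {VALID_PREDICTION_MODES}, "
            f"got {prediction_mode!r}"
        )
    # Forward only arguments that appear in the documented signature.
    return hub_model.annotate_data(
        query_adata,
        query_batch_key=query_batch_key,
        save_path=save_path,
        prediction_mode=prediction_mode,
    )
```

Here hub_model is expected to be a HubModel instance and query_adata an AnnData object; leaving methods unset means all methods in the model are used, as described above.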

HubModel.map_genes(adata, gene_symbols, organism)

Map genes to CELLxGENE census gene IDs.

Return type:

AnnData | None

classmethod HubModel.pull_from_huggingface_hub(repo_name, cache_dir=None, revision=None, **kwargs)

Download the given model repo from HuggingFace.

The model, its card, data, and metadata are downloaded to a cache location on disk selected by HuggingFace. An instance of this class is then created from the downloaded files and returned.

Parameters:
  • repo_name (str) – ID of the HuggingFace repo from which this model is downloaded

  • cache_dir (str | None (default: None)) – The directory where the downloaded model artifacts will be cached

  • revision (str | None (default: None)) – The revision to pull from the repo. This can be a branch name, a tag, or a full-length commit hash. If None, the default (latest) revision is pulled.

  • kwargs – Additional keyword arguments to pass to huggingface_hub.snapshot_download().
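As a rough sketch, downloading a model could be wrapped as shown below. The helper name load_hub_model is hypothetical, and the import is deferred into the function body so the snippet can be defined without popv installed; calling it requires popv and network access.

```python
def load_hub_model(repo_name, cache_dir=None, revision=None):
    """Hypothetical convenience wrapper; not part of popv."""
    # Imported lazily so that defining this function does not require popv.
    from popv.hub import HubModel

    # Classmethod call using only the parameters documented above.
    return HubModel.pull_from_huggingface_hub(
        repo_name,
        cache_dir=cache_dir,
        revision=revision,
    )
```

A call such as load_hub_model("your-org/your-model", revision="main") would then return a HubModel instance; the repo ID shown is a placeholder, not a real repository.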

HubModel.push_to_huggingface_hub(repo_name, repo_token=None, repo_create=False, repo_create_kwargs=None, collection_slug=None, delete_existing_files=False, **kwargs)

Push this model to HuggingFace.

If the dataset is too large to upload to HuggingFace, this will raise an exception prompting the user to upload the data elsewhere. Otherwise, the data, model card, and metadata are all uploaded to the given model repo.

Parameters:
  • repo_name (str) – ID of the HuggingFace repo where this model needs to be uploaded

  • repo_token (str | None (default: None)) – HuggingFace API token with write permissions. If None, the token returned by HfFolder.get_token() is used.

  • repo_create (bool (default: False)) – Whether to create the repo

  • repo_create_kwargs (dict | None (default: None)) – Keyword arguments passed into huggingface_hub.HfApi.create_repo() if repo_create=True.

  • collection_slug (str | None (default: None)) – The internal name in HuggingFace for a dataset collection.

  • delete_existing_files (bool (default: False)) – Whether to delete existing files in the repo before uploading new ones.

  • **kwargs – Additional keyword arguments passed into huggingface_hub.HfApi.upload_file().
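One possible way to call push_to_huggingface_hub is sketched below. The helper publish_hub_model and its token_env_var argument are hypothetical; reading the token from an environment variable is an illustrative choice, not something this class requires.

```python
import os


def publish_hub_model(hub_model, repo_name, create_repo=False,
                      token_env_var="HF_TOKEN"):
    """Hypothetical helper around HubModel.push_to_huggingface_hub."""
    # If the variable is unset this is None, in which case the documented
    # fallback (the token from HfFolder.get_token()) applies.
    repo_token = os.environ.get(token_env_var)

    hub_model.push_to_huggingface_hub(
        repo_name,
        repo_token=repo_token,
        repo_create=create_repo,
    )
    return repo_name
```

Passing create_repo=True corresponds to repo_create=True, so the repo is created before the upload; any repo_create_kwargs would need to be forwarded separately.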

HubModel.save(overwrite=False)

Save the model card and metadata to the model directory.

Parameters:
  • overwrite (bool (default: False)) – Whether to overwrite existing files.

Return type:

None