popv.preprocessing.Process_Query

class popv.preprocessing.Process_Query(query_adata, ref_adata, ref_labels_key, ref_batch_key, cl_obo_folder, query_batch_key=None, query_layer_key=None, ref_layer_key=None, prediction_mode='retrain', unknown_celltype_label='unknown', n_samples_per_label=300, save_path_trained_models='tmp/', pretrained_scvi_path=None, relabel_reference_cells=False, hvg=4000)

Processes the query and reference datasets in preparation for the annotation pipeline.

Parameters:
  • query_adata (AnnData) – AnnData of query cells

  • ref_adata (AnnData | bool) – AnnData of reference cells. Can contain only latent spaces if prediction_mode is not 'retrain'.

  • ref_labels_key (str) – Key in obs field of reference AnnData with cell-type information

  • ref_batch_key (str) – Key, or list of keys (or None), in the obs field of the reference AnnData to use as batch covariate

  • cl_obo_folder (list | str | bool) – Folder containing the cell-type obo for OnClass, the ontologies for OnClass and the nlp embedding of cell-types. Passing a list uses the first element as obo, the second as ontologies and the third as nlp embedding. Setting it to False disables ontology use.

  • query_batch_key (str | None (default: None)) – Key in obs field of query adata for batch information.

  • query_layer_key (str | None (default: None)) – If not None, expects raw_count data in query_layer_key.

  • ref_layer_key (str | None (default: None)) – If not None, expects raw_count data in ref_layer_key.

  • prediction_mode (str | None (default: 'retrain')) – Execution mode of cell-type annotation. 'retrain': trains all prediction models and saves them to disk if save_path_trained_models is not None. 'inference': classifies all cells based on pretrained models. 'fast': fast inference using only query cells and a single epoch in scArches.

  • unknown_celltype_label (str (default: 'unknown')) – Label for cells without a known cell-type.

  • n_samples_per_label (int | None (default: 300)) – The reference AnnData is subset to this number of cells per cell-type to increase speed.

  • pretrained_scvi_path (str | None (default: None)) – If None, scVI is trained from scratch. If pretrained_scvi_path is set and all genes in the pretrained models are present in the query adata, the scArches versions of scVI and scANVI are trained instead, resulting in faster training times.

  • relabel_reference_cells (bool (default: False)) – If True, will relabel reference cells with cell-type information from query cells in inference mode.

  • save_path_trained_models (str (default: 'tmp/')) – If prediction_mode is 'retrain', models are saved to this directory. Otherwise, trained models are expected in this folder.

  • hvg (int | None (default: 4000)) – If an int, subsets the data to that number of highly variable genes according to sc.pp.highly_variable_genes
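
The parameters above can be assembled along the following lines. This is a minimal sketch, not PopV's own code: build_process_query_kwargs and preprocess are hypothetical helpers, and the obs column names ('cell_type', 'donor', 'sample') are assumptions chosen for illustration. The helper only checks prediction_mode against the three modes listed above; popv is imported lazily, so the argument-building step runs without it.

```python
from typing import Any, Optional

# Modes listed in the prediction_mode description above.
VALID_MODES = ("retrain", "inference", "fast")


def build_process_query_kwargs(
    ref_labels_key: str,
    ref_batch_key: str,
    cl_obo_folder,
    prediction_mode: str = "retrain",
    save_path_trained_models: str = "tmp/",
    pretrained_scvi_path: Optional[str] = None,
    **extra: Any,
) -> dict:
    """Collect keyword arguments for Process_Query and reject unknown modes.

    Hypothetical helper for illustration; it is not part of PopV.
    """
    if prediction_mode not in VALID_MODES:
        raise ValueError(
            f"prediction_mode must be one of {VALID_MODES}, got {prediction_mode!r}"
        )
    kwargs = {
        "ref_labels_key": ref_labels_key,
        "ref_batch_key": ref_batch_key,
        "cl_obo_folder": cl_obo_folder,
        "prediction_mode": prediction_mode,
        "save_path_trained_models": save_path_trained_models,
        "pretrained_scvi_path": pretrained_scvi_path,
    }
    # Remaining documented options (query_batch_key, hvg, ...) pass through as-is.
    kwargs.update(extra)
    return kwargs


def preprocess(query_adata, ref_adata, **kwargs):
    """Construct Process_Query from two AnnData objects; requires popv."""
    # Imported here so the helper above stays usable without popv installed.
    from popv.preprocessing import Process_Query

    return Process_Query(query_adata, ref_adata, **kwargs)


kwargs = build_process_query_kwargs(
    ref_labels_key="cell_type",   # assumed column in ref_adata.obs
    ref_batch_key="donor",        # assumed column in ref_adata.obs
    cl_obo_folder=False,          # False disables ontology use
    prediction_mode="retrain",
    save_path_trained_models="tmp/",
    query_batch_key="sample",     # assumed column in query_adata.obs
    n_samples_per_label=300,
    hvg=4000,
)
```

With popv installed and two AnnData objects loaded, preprocess(query_adata, ref_adata, **kwargs) would then construct the class. Checking the mode up front keeps a misspelled prediction_mode from surfacing only after data loading has started.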

Methods table

Methods