popv.preprocessing.Process_Query
- class popv.preprocessing.Process_Query(query_adata, ref_adata, ref_labels_key, ref_batch_key, cl_obo_folder, query_batch_key=None, query_layer_key=None, ref_layer_key=None, prediction_mode='retrain', unknown_celltype_label='unknown', n_samples_per_label=300, save_path_trained_models='tmp/', pretrained_scvi_path=None, relabel_reference_cells=False, hvg=4000)
Processes the query and reference datasets in preparation for the annotation pipeline.
- Parameters:
  - query_adata (AnnData) – AnnData of query cells.
  - ref_adata (AnnData | bool) – AnnData of reference cells. Can contain only latent spaces if prediction_mode is not 'retrain'.
  - ref_labels_key (str) – Key in the obs field of the reference AnnData containing cell-type information.
  - ref_batch_key (str) – Key, or list of keys (or None), in the obs field of the reference AnnData to use as batch covariate(s).
  - cl_obo_folder (list | str | bool) – Folder containing the cell-type obo for OnClass, the ontologies for OnClass, and the NLP embedding of cell types. If a list is passed, its first element is used as the obo, the second as the ontologies, and the third as the NLP embedding. Setting it to False disables ontology use.
  - query_batch_key (str | None (default: None)) – Key in the obs field of the query AnnData containing batch information.
  - query_layer_key (str | None (default: None)) – If not None, raw count data is expected in the query_layer_key layer of the query AnnData.
  - ref_layer_key (str | None (default: None)) – If not None, raw count data is expected in the ref_layer_key layer of the reference AnnData.
  - prediction_mode (str | None (default: 'retrain')) – Execution mode of cell-type annotation:
    - 'retrain': trains all prediction models and saves them to disk if save_path_trained_models is not None.
    - 'inference': classifies all cells using pretrained models.
    - 'fast': fast inference using only the query cells and a single epoch in scArches.
  - unknown_celltype_label (str (default: 'unknown')) – Label for cells without a known cell type.
  - n_samples_per_label (int | None (default: 300)) – The reference AnnData is subset to this number of cells per cell type to increase speed.
  - pretrained_scvi_path (str | None (default: None)) – If None, scVI is trained from scratch. Otherwise, if all genes in the pretrained models are present in the query AnnData, the scArches versions of scVI and scANVI are trained, resulting in faster training.
  - relabel_reference_cells (bool (default: False)) – If True, relabels reference cells with cell-type information from query cells in inference mode.
  - save_path_trained_models (str (default: 'tmp/')) – If prediction_mode is 'retrain', trained models are saved to this directory; otherwise, trained models are expected in this directory.
  - hvg (int | None (default: 4000)) – If an int, subsets the data to that many highly variable genes, as selected by sc.pp.highly_variable_genes.
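The n_samples_per_label option reduces the reference to a fixed number of cells per cell type before training. The function below is a minimal, hypothetical sketch of that kind of per-label subsampling using NumPy; `subsample_per_label` and its `seed` argument are illustrative names, not part of popV, and popV's own implementation may choose cells differently. The sketch assumes that cell types with fewer cells than the limit are kept whole.

```python
import numpy as np


def subsample_per_label(labels, n_samples_per_label, seed=0):
    """Hypothetical helper (not popV's API): return sorted integer positions
    keeping at most `n_samples_per_label` cells for each distinct label."""
    labels = np.asarray(labels)
    # None (or an empty label vector) means no subsetting: keep every cell.
    if n_samples_per_label is None or labels.size == 0:
        return np.arange(labels.size)
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(labels):
        # Positions of all cells carrying this label.
        idx = np.flatnonzero(labels == label)
        # Randomly draw without replacement only if the label exceeds the cap.
        if idx.size > n_samples_per_label:
            idx = rng.choice(idx, size=n_samples_per_label, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))


labels = ["T cell"] * 5 + ["B cell"] * 2 + ["NK cell"] * 4
positions = subsample_per_label(labels, n_samples_per_label=3)
print(len(positions))  # 8: three T cells, both B cells, three NK cells
```

Assuming `ref_adata` is an AnnData object, the returned positions could be used to subset it along the cell axis, for example `ref_adata[subsample_per_label(ref_adata.obs[ref_labels_key], 300)]`.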