SaSi: A Self-augmented and Self-Interpreted Deep-Learning Approach for Few-shot Cryo-ET Particle Detection
- Gokul Adethya T
- Bhanu Pratyush Mantha
- Tianyang Wang
- Xingjian Li
- Min Xu
Abstract
Cryo-electron tomography (cryo-ET) has emerged as a powerful technique for imaging macromolecular complexes in their near-native states. However, localizing 3D particles in cellular environments remains a significant challenge due to low signal-to-noise ratios and missing-wedge artifacts. Deep learning approaches have shown great potential, but they typically require large amounts of training data, which is a limitation in cryo-ET, where labeled data is often scarce. In this paper, we propose a novel Self-augmented and Self-interpreted (SaSi) deep learning approach for few-shot particle detection in 3D cryo-ET images. Our method builds on self-augmentation techniques to improve data utilization and introduces a self-interpreted segmentation strategy that alleviates the dependency on labeled data, thereby improving generalization and robustness. Experiments on both simulated and real-world cryo-ET datasets show that SaSi significantly outperforms existing state-of-the-art methods for particle localization and is compatible with model architectures ranging from CNNs to Vision Transformers (ViTs). This research advances the understanding of particle detection with very few labels in cryo-ET and sets a new benchmark for few-shot learning in structural biology.
Approach
Sampling
Tomograms are divided into smaller W × W × W subvolumes, with window sliding to reduce boundary information loss. Samples are centered on particle centroids, balanced by particle type, and augmented with spatial transformations. Weak point labels are converted to pseudo-strong labels via spheres.
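The sampling step above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's exact pipeline: the window size, stride, and sphere radius are hypothetical parameters.

```python
import numpy as np

def extract_subvolumes(tomogram, window=32, stride=16):
    """Slide a W x W x W window over the tomogram with overlap
    (stride < window) to reduce boundary information loss."""
    d, h, w = tomogram.shape
    subvols = []
    for z in range(0, d - window + 1, stride):
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                subvols.append(tomogram[z:z + window, y:y + window, x:x + window])
    return np.stack(subvols)

def points_to_sphere_mask(shape, centroids, radius=2):
    """Convert weak point labels (particle centroids) into pseudo-strong
    labels by painting a sphere of the given radius around each point."""
    mask = np.zeros(shape, dtype=np.uint8)
    zz, yy, xx = np.indices(shape)
    for cz, cy, cx in centroids:
        dist2 = (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2
        mask[dist2 <= radius ** 2] = 1
    return mask
```

In a real pipeline the subvolumes would additionally be centered on particle centroids, class-balanced, and spatially augmented before training.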
Post-Processing
Voxel-wise classification is obtained by applying an arg max operation to the model output. We then apply 3D connected components (cc3d) with 26-connectivity, which proves reliable in few-shot settings and requires no hyperparameter tuning. cc3d outperforms the Meanshift and MP-NMS post-processing strategies of DeepFinder and DeepETPicker, making it a strong baseline for these methods.
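A minimal sketch of this post-processing step, using `scipy.ndimage` as a stand-in for the cc3d library (a 3×3×3 structuring element gives the same 26-connectivity); the function name and tensor layout are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def localize_particles(logits):
    """Arg max over the class axis gives a voxel-wise classification;
    26-connected components then yield particle instances, whose
    centroids serve as the predicted particle locations.
    `logits` is assumed to have shape (C, D, H, W)."""
    seg = np.argmax(logits, axis=0)                # (C, D, H, W) -> (D, H, W)
    structure = np.ones((3, 3, 3), dtype=int)      # 26-connectivity
    centroids = {}
    for cls in np.unique(seg):
        if cls == 0:                               # skip background class
            continue
        labeled, n = ndimage.label(seg == cls, structure=structure)
        centroids[int(cls)] = ndimage.center_of_mass(
            seg == cls, labeled, range(1, n + 1))
    return centroids
```

Note the absence of tunable parameters: unlike Meanshift (bandwidth) or MP-NMS (suppression thresholds), connected components only needs the connectivity choice.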
Augmix
AugMix enhances generalization by generating augmented particles and increasing volumetric density in cryo-ET images. Only random shifting, rotation, and flipping are applied to prevent structural distortion. Multiple augmented variants are mixed using a Dirichlet-weighted sum, combined with the original image via skip connections, and the resulting ground truth masks are generated using the arg max operation.
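The mixing scheme above can be illustrated with a small numpy sketch. This is a hedged approximation of AugMix for volumes, not the paper's exact code: the shift range, number of chains, and `alpha` are illustrative, and only the structure-preserving operations mentioned (shift, rotation, flip) are used.

```python
import numpy as np

def augmix_volume(vol, n_chains=3, alpha=1.0, rng=None):
    """AugMix-style mixing for a 3D volume: several augmentation chains
    (shift, rotation, flip only, to avoid structural distortion) are
    combined with Dirichlet weights, then blended with the original
    image via a Beta-weighted skip connection."""
    rng = np.random.default_rng(rng)
    ops = [
        lambda v: np.roll(v, rng.integers(-4, 5), axis=rng.integers(0, 3)),  # shift
        lambda v: np.rot90(v, k=rng.integers(1, 4), axes=(1, 2)),            # rotation
        lambda v: np.flip(v, axis=rng.integers(0, 3)),                       # flip
    ]
    w = rng.dirichlet([alpha] * n_chains)      # Dirichlet weights over chains
    mixed = np.zeros_like(vol, dtype=float)
    for i in range(n_chains):
        aug = vol.copy()
        for op in rng.choice(ops, size=rng.integers(1, 4)):
            aug = op(aug)
        mixed += w[i] * aug
    m = rng.beta(alpha, alpha)                 # skip connection with original
    return m * vol + (1 - m) * mixed
```

The corresponding ground-truth masks would be transformed with the same spatial operations and resolved per voxel via arg max, as described above.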
Self-Supervised
We apply contrastive learning with SimCLR and the NT-Xent loss to learn general cryo-ET features. Augmented variants are encoded and projected to 128-dimensional vectors, maximizing the similarity between views of the same sample and minimizing it across different samples. We use two configurations: Periodic Self-Supervision (PSS) and Initial Self-Supervision (ISS). Self-supervised learning enhances feature extraction before transitioning to supervised learning for task-specific adaptation.
Self-Interpreted Image Segmentation
This method trains segmentation models without ground truth masks by enforcing consistency between predictions and spatially transformed inputs. Given an input and its transformed variant, the model learns to align their segmentation maps. This approach applies to both supervised and self-supervised phases, covering encoder and decoder learning. The model's ability to maintain consistent segmentation under transformations indicates structural understanding.
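The consistency objective can be sketched as follows; all names here are illustrative, not the paper's exact API, and MSE is one plausible choice of discrepancy measure under these assumptions.

```python
import numpy as np

def consistency_loss(model, vol, transform, inverse):
    """Self-interpreted consistency: the segmentation of a transformed
    input should match the transformed segmentation of the original.
    `model` maps a volume to per-voxel predictions; `transform` and
    `inverse` are a spatial transform and its inverse (e.g. a flip)."""
    pred = model(vol)                      # segment the original volume
    pred_t = model(transform(vol))         # segment the transformed input
    aligned = inverse(pred_t)              # map back to the original frame
    return float(np.mean((pred - aligned) ** 2))  # MSE consistency term
```

A model that is equivariant to the transform incurs zero loss; any inconsistency under the transform is penalized, which is the sense in which consistent segmentation indicates structural understanding.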
Results
Post-Processing
cc3d performed consistently better than the other post-processing strategies, possibly because Meanshift and MP-NMS depend more heavily on the quality of the model's base predictions. Moreover, cc3d requires no parameter tuning.
SaSi on real-world dataset
SaSi yields larger improvements on the SHREC dataset than on the real-world dataset, where particle detection remains challenging.
AugMix Performance
For N = 3:
- AugMix performs worse than baselines across models.
- Limited training samples increase uncertainty and distort information.
For N = 5, 10:
- Significant performance improvement with AugMix.
- Robustness of AugMix becomes evident.
Self-Supervised Learning
- Performance Trends: SSL combined with AugMix generally enhances performance across models, except for DeepETPicker, which exhibits anomalous behavior; a possible cause is its high parameter count, which makes it overfit faster.
- SSL Effectiveness: SSL aids in learning robust representations by leveraging intrinsic data properties, reducing label biases, and improving performance for imbalanced classes.
- PSS Advantages: PSS effectively integrates SSL and supervised learning, mitigating overfitting through comprehensive use of tomograms.
- ISS Simplicity: ISS simplifies implementation by using a fixed number of SSL steps and generally performs better as the number of SSL steps increases, avoiding periodicity-related hyperparameter tuning.
- Challenges: ISS outperforms PSS by avoiding complex tuning, though it depends on AugMix and the consistency loss for robustness, since the supervised learning component can still induce overfitting.
Consistency
- Consistency Loss Complementarity: Consistency loss complements AugMix, PSS, and ISS, enhancing performance with minimal trade-offs.
- AugMix Improvement: Consistency loss significantly improves AugMix's performance for small sample sizes (e.g., N = 3), addressing prior shortcomings.
- Combination Benefits: The combination of self-interpreted learning, AugMix, and SSL yields notable improvements, suggesting that consistency mechanisms provide effective regularization for better generalization, though training remains somewhat unstable.