CRISPR-Cas9 technology has transformed genetic engineering and holds immense potential for therapeutic interventions. However, off-target mutations and the system's tolerance of mismatches between the guide RNA and the target site pose significant challenges to its safe and precise use. In this study, we explore the implications of off-target effects on critical genomic regions, including exons, introns, and intergenic regions. Using a benchmark dataset and dedicated data preprocessing techniques, we present the advantages of categorical encoding over one-hot encoding for training machine learning (ML) classifiers. Crucially, we use latent class analysis (LCA) to uncover subclasses within the off-target range, revealing distinct patterns of gene region disruption. Our approach highlights the role of model complexity in CRISPR applications and offers an off-target scoring procedure based on ML classifiers and LCA. By bridging the gap between traditional off-target scoring and comprehensive model analysis, our study advances the understanding of off-target effects and opens new avenues for precision genome editing in diverse biological contexts. This work is a step toward ensuring the safety and efficacy of CRISPR-based therapies, and it underscores the importance of responsible genetic manipulation in future therapeutic applications.
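To make the contrast between the two encodings concrete, the following is a minimal sketch of how a nucleotide sequence could be represented under each scheme. It is an illustration only: the alphabet order, the integer codes, and the function names `categorical_encode` and `one_hot_encode` are assumptions made for this example, not the preprocessing pipeline used in the study.

```python
# Hypothetical illustration: the alphabet order and integer codes below are
# assumptions for this sketch, not the encoding scheme used in the study.
NUCLEOTIDES = "ACGT"
INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}


def categorical_encode(seq):
    """Map each base to one integer code: one feature per position."""
    return [INDEX[base] for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector, flattened:
    four features per position."""
    encoded = []
    for base in seq.upper():
        vec = [0] * len(NUCLEOTIDES)
        vec[INDEX[base]] = 1
        encoded.extend(vec)
    return encoded


guide = "GACGT"  # made-up example sequence
print(categorical_encode(guide))   # [2, 0, 1, 2, 3]
print(len(one_hot_encode(guide)))  # 20
```

Under these assumptions, the categorical form yields one feature per sequence position, whereas the one-hot form yields four, so the input dimensionality seen by a classifier differs by a factor of four.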