Spatial transcriptomics: regional DEGs, gene modules, heatmaps, and GO analysis
After the spatial transcriptome has been annotated manually, we have region annotations for the spatial data. The next step is to get functional annotation of the differentially expressed genes (DEGs) in each region. There is no universal script for this; I only describe one approach, for reference.
Most of our spatial data is stored in anndata's h5ad format, so we first need to convert it into an rds file that Seurat can read. Here we use a convenient tool covered in a previous blog post on converting between rds and h5ad. The approximate code is as follows:
sceasy::convertFormat('/data/*.h5ad',
This gives us the corresponding rds file. We then use Seurat's FindAllMarkers function to screen for differentially expressed genes and remove genes we don't need, such as mitochondrial genes, ribosomal protein genes, hemoglobin genes, cytoplasmic genes, collagen genes, and so on. These can be filtered with regular expressions. We then sort by avg_log2FC, keep the top 50 differentially expressed genes of each group, and use them to calculate gene modules. Note that I did not normalize the spatial data when calculating DEGs, because it is bin50 data. Normalizing is not wrong, and you can decide whether to do it. When calculating gene modules, I ran the calculation on both the raw matrix and the normalized matrix, so you can choose based on the results, even though the documentation of the module function states that the input should be a normalized matrix.
Some of the code in the script may look strange: the matrix used to calculate DEGs and the matrix used to calculate gene modules can be different. The reason is that certain positions on the spatial chip may be captured differently from the rest and need to be removed. In that case, we calculate DEGs on the matrix with those areas removed, and then calculate gene modules on the full chip.
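As a rough illustration of the filtering step, here is a minimal pandas sketch. It assumes the marker table has been exported to a data frame with the `gene`, `cluster`, and `avg_log2FC` columns that FindAllMarkers produces. The regular expression and the helper name `top_markers` are my own illustrative choices, not the patterns used in the actual script, and the gene symbols follow human naming.

```python
import pandas as pd

# Illustrative patterns only (assumption): mitochondrial, ribosomal protein,
# hemoglobin and collagen genes. Adjust to your species and gene naming.
UNWANTED = r"^(?:MT-|RP[SL]\d|HB[ABDEGMQZ]\d?$|COL\d)"


def top_markers(markers: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Drop unwanted genes, then keep the n highest avg_log2FC genes per cluster."""
    unwanted = markers["gene"].str.contains(UNWANTED, case=False, regex=True)
    filtered = markers[~unwanted]
    return (
        filtered.sort_values("avg_log2FC", ascending=False)
        .groupby("cluster", sort=False)
        .head(n)
        .reset_index(drop=True)
    )


# Toy marker table standing in for the FindAllMarkers output.
markers = pd.DataFrame({
    "gene": ["MT-CO1", "RPL13", "HBB", "COL1A1", "GFAP", "SNAP25", "MBP", "PLP1"],
    "cluster": ["A", "A", "A", "B", "A", "A", "B", "B"],
    "avg_log2FC": [3.0, 2.5, 2.0, 1.8, 1.5, 1.2, 2.2, 0.9],
})
top = top_markers(markers, n=1)
print(top)
```

With `n=50` the same function gives the top 50 genes per region, which can then be passed on to the gene module calculation.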
# Rscript your_script.R path_to_rds_file.rds refined_pred_column_name path_to_output_csv_file.csv
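For intuition about what a gene module score measures, the sketch below shows the general idea, assuming the score follows the approach behind Seurat's AddModuleScore: the mean expression of the module genes minus the mean expression of control genes with similar average expression. This is a simplified illustration, not the actual implementation; the function name `module_score`, the bin count, and the number of control genes are all hypothetical.

```python
import numpy as np
import pandas as pd


def module_score(expr, module_genes, n_bins=4, n_ctrl=3, seed=0):
    """expr: spots x genes data frame. Returns one score per spot."""
    rng = np.random.default_rng(seed)
    module = [g for g in module_genes if g in expr.columns]
    # Bin genes by average expression so controls resemble the module genes.
    gene_means = expr.mean(axis=0)
    bins = pd.qcut(gene_means.rank(method="first"), n_bins, labels=False)
    ctrl = set()
    for g in module:
        in_bin = (bins == bins[g]) & ~bins.index.isin(module)
        pool = bins.index[in_bin].to_numpy()
        k = min(n_ctrl, len(pool))
        ctrl.update(rng.choice(pool, size=k, replace=False))
    if not ctrl:
        raise ValueError("no control genes available; use fewer bins")
    # Score = module mean minus control mean, per spot.
    return expr[module].mean(axis=1) - expr[sorted(ctrl)].mean(axis=1)


# Toy data: 30 spots x 20 genes; g0-g2 are boosted in the first 10 spots.
rng = np.random.default_rng(1)
genes = [f"g{i}" for i in range(20)]
expr = pd.DataFrame(rng.poisson(5, size=(30, 20)).astype(float), columns=genes)
expr.loc[:9, ["g0", "g1", "g2"]] += 20
scores = module_score(expr, ["g0", "g1", "g2"])
print(scores.iloc[:10].mean(), scores.iloc[10:].mean())
```

Spots in the region where the module genes are up should get clearly higher scores than the rest, which is what we later look for on the spatial plot.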
After calculating the gene module scores, we plot them on the spatial data to check whether the DEGs are enriched in the regions we annotated. Scanpy is used here because its spatial plotting functions are more versatile and convenient.
# python your_script.py --adata_file_path path_to_adata_file --csv_file_path path_to_csv_file --gene_modules module_name
At this point the DEGs and gene modules have been calculated and plotted. Next comes the heatmap. Here we use Python for the calculation and R for the plotting. We treat each region as a pseudo-bulk sample: for each gene, we calculate its average expression in each pseudo-bulk, save the result as a csv, and then plot it in R.
import pandas as pd
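The pseudo-bulk averaging itself can be sketched in a few lines of pandas. This is only a minimal example under my own assumptions: the expression matrix is a spots x genes data frame (for an AnnData object, `adata.to_df()` would give one), the region labels are a per-spot series, and the function name and output file name are hypothetical.

```python
import pandas as pd


def pseudobulk_mean(expr: pd.DataFrame, regions: pd.Series) -> pd.DataFrame:
    """Average each gene over the spots of each region; returns genes x regions."""
    return expr.groupby(regions).mean().T


# Toy data: four spots, two genes, two annotated regions.
expr = pd.DataFrame(
    {"GFAP": [4.0, 2.0, 0.0, 0.0], "MBP": [0.0, 0.0, 3.0, 5.0]},
    index=["s1", "s2", "s3", "s4"],
)
regions = pd.Series(
    ["cortex", "cortex", "white_matter", "white_matter"],
    index=expr.index,
    name="region",
)
bulk = pseudobulk_mean(expr, regions)
print(bulk)
# bulk.to_csv("pseudobulk_mean.csv")  # hypothetical file name, read later from R
```

The resulting genes x regions table is the kind of csv that the R heatmap script can read.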
The plotting function looks like this:
# Rscript 1.R gene_list_value rds_path_value gene_modules_value gene_module_path_value gene_module_normalized_path_value
The R heatmap script draws several plots rather than one, so you can pick the one you need. The last step is GO enrichment analysis, for which I use clusterProfiler. The code is relatively simple and can be used as a reference.
# Rscript your_script.R path_to_rds_file.rds refined_pred_column_name path_to_output_csv_file.csv
This code gives a rough reference. Sometimes we only want the BP (biological process) terms. In that case we can read the table generated above and redraw the plot. The code is as follows:
import pandas as pd
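As a hedged sketch of the BP-only step, the example below filters an enrichment table and prepares the values for plotting. It assumes the table has the `ONTOLOGY`, `Description`, and `p.adjust` columns that clusterProfiler writes when all three ontologies are tested together; the helper name `top_bp_terms` and the toy rows are made up for illustration.

```python
import numpy as np
import pandas as pd


def top_bp_terms(go: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Keep BP terms only and return the n most significant ones."""
    bp = go[go["ONTOLOGY"] == "BP"].copy()
    # -log10(adjusted p) is a common bar length or dot size in GO plots.
    bp["neg_log10_padj"] = -np.log10(bp["p.adjust"])
    return bp.nsmallest(n, "p.adjust").reset_index(drop=True)


# Toy table; in practice this would come from pd.read_csv on the GO output.
go = pd.DataFrame({
    "ONTOLOGY": ["BP", "CC", "BP", "MF"],
    "Description": [
        "axon ensheathment",
        "myelin sheath",
        "gliogenesis",
        "structural constituent of myelin sheath",
    ],
    "p.adjust": [1e-8, 1e-6, 1e-4, 1e-3],
})
bp_top = top_bp_terms(go, n=2)
print(bp_top[["Description", "neg_log10_padj"]])
```

The filtered table can then be drawn as a bar or dot plot with whichever plotting library you prefer.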
To summarize, for spatial data we have calculated the DEGs, calculated and plotted the gene modules, calculated the expression of each gene in each annotated region and drawn the heatmap, and run GO analysis for each region.
Finally, a simple sh script runs these 6 Python and R scripts in one go, as shown below:
Save address of all files
That's all.