demoRAT
A demo for RAT is provided with the package. The folder demo/demo3_rat/ contains two bash scripts, run_rat_preprocess.sh and run_rat.sh; run_rat_preprocess.sh must be run first. Both are explained below.
Step 1: Preparing files for dds_analysis
The first step involves preparing files for dds_analysis by running the preprocess command. This command integrates DMR and DEG data and prepares the necessary input files for further analysis. Here is the code:
dds_analysis preprocess \
-in_folder ../../data/rat_data/in_data/final_demo_data/rat_data/out_data/DMR_CpG_context/out_map2genome/ \
-in_string '_rat' \
-in_tss_file_mr ../../data/rat_data/in_data/final_demo_data/rat_data/out_data/DMR_CpG_context/out_map2genome/5_chroms_all_mr_data_range_dmrRanking_TSS_Up5000_Down1000_removedShort_overlap1e-09.bed \
-in_dist_file ../../data/rat_data/in_data/final_demo_data/rat_data/out_data/DMR_CpG_context/out_map2genome/5_chroms_all_mr_data_range_dmrRanking_noGenes_5dist_Up1000000_Up5000removedShort_overlap1e-09.bed \
-in_deg_file ../../data/rat_data/in_data/final_demo_data/rat_data/in_data/DEG/Adrenal1vsAdrenal2_DEG_genes_zscores.tsv \
-out_folder ../../data/rat_data/out_data/ \
-tss_file ../../data/rat_data/in_data/final_demo_data/rat_data/out_data/DMR_CpG_context/data/TSS_Up5000_Down1000_removedShort.bed \
-full_mr_file ../../data/rat_data/in_data/final_demo_data/rat_data/out_data/DMR_CpG_context/5_chroms_all_mr_data_range_dmrRanking.bed \
-in_genome_file ../../data/rat_data/in_data//final_demo_data/genome/rn6/rn6.enhancers_all_rn5_merged_rn6liftOvered_4dmr.bed \
-gene_col_name 'gene_name'
echo "To find DMR regions that are overlapping with TSS or 5distance regions of DEG - and preprocess Done"
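Because the preprocess call takes several long input paths, a missing file is easy to overlook. The sketch below is not part of the package; it simply checks that each path exists before the command is run. The helper name `check_inputs` and the variable `IN_BASE` are hypothetical, and only two of the demo inputs are listed as an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dds_analysis): report every path that
# does not exist and return non-zero if at least one is missing.
check_inputs() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "Missing input: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage with two of the inputs from the preprocess command;
# extend the list to cover the remaining paths as needed.
IN_BASE='../../data/rat_data/in_data/final_demo_data/rat_data'
if check_inputs \
  "${IN_BASE}/out_data/DMR_CpG_context/out_map2genome/" \
  "${IN_BASE}/in_data/DEG/Adrenal1vsAdrenal2_DEG_genes_zscores.tsv"; then
  echo "All inputs found"
else
  echo "Some inputs are missing; fix the paths before running preprocess" >&2
fi
```

Running the check from inside demo/demo3_rat/ matters, since all paths in the demo are relative to that folder.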
Methylation region data with TSS: 5_chroms_all_mr_data_range_dmrRanking_TSS_Up5000_Down1000_removedShort_overlap1e-09.bed
Methylation region data with 5’distance: 5_chroms_all_mr_data_range_dmrRanking_noGenes_5dist_Up1000000_Up5000removedShort_overlap1e-09.bed
Header and first two rows of the DEG file: Adrenal1vsAdrenal2_DEG_genes_zscores.tsv
gene_name A_1_ A_2_ A_3_ A_4_ A_5_ A_6_ A_7_ A_8_ A_9_ A_10_ A_11_ A_12_ A_13_ A_14_ A_15_ A_16_ A_17_ A_18_ A_19_ A_20_
Arid1a 0.7561530862764123 0.7823439770635116 0.758952508244845 0.7698258391396798 0.7497426117338984 0.769720948287346 0.732882230955056 0.7296251043089049 0.7514965201895426 0.7338532863506189 0.7886223599117105 0.8529983070846237 0.7885741576782851 0.8218302605645192 0.8091759613067595 0.8418558180210152 0.8557029309669116 0.8693442751832348 0.845426060550429 0.8252216195608033
Thrap3 0.9440078946411076 0.9447565791934174 0.9617047828634032 0.9798829216418276 0.9845821803378495 1.0257580329373464 0.9429194156914074 1.0212971018858565 0.9475315367780314 0.9980933740996731 1.0788865992755503 1.0205072299517544 1.0017010542203388 1.1051539689926302 1.0494474332333024 1.0576674827137567 1.0449053416313256 1.100779331162259 1.089149536395491 1.0448197466318785
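A quick way to inspect such an expression table is to count its sample columns and gene rows. The sketch below assumes the file is tab-separated with the gene name in the first column, as the `gene_name` header and the `-expTAB` option used later suggest; the function name `summarize_deg` is hypothetical and not part of the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of sample columns and gene rows
# of a tab-separated expression table whose first column holds gene names.
summarize_deg() {
  awk -F'\t' '
    NR == 1 { samples = NF - 1; next }   # header: every column after the first is a sample
    NF > 0  { genes++ }                  # count non-empty data rows
    END     { printf "samples=%d genes=%d\n", samples, genes }
  ' "$1"
}

# Example usage (path as used in this demo):
# summarize_deg ../../data/rat_data/in_data/final_demo_data/rat_data/in_data/DEG/Adrenal1vsAdrenal2_DEG_genes_zscores.tsv
```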
Head of Full MR data: 5_chroms_all_mr_data_range_dmrRanking.bed
chr1 1606237 1607593 chr1:mr0:hyper:D 0.9922001543476859
chr1 1608763 1614639 chr1:mr1:hypo:D 0.6957408652288558
chr1 1616202 1632163 chr1:mr2:mix:U 0.002962664792698039
chr1 1633282 1670344 chr1:mr3:mix:U 0.5354765760385694
chr1 1672428 1702222 chr1:mr4:hypo:D 0.6678625470204508
chr1 1703720 1752087 chr1:mr5:hypo:D 0.5689430990479308
chr1 1753230 1757707 chr1:mr6:mix:D 0.9692464930654068
chr1 1759129 1759849 chr1:mr7:hypo:U 0.015063164586032518
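The fourth column of the MR file packs several labels into one colon-separated string, for example `chr1:mr0:hyper:D`. The sketch below tallies how many regions carry each label in the third position (`hyper`, `hypo`, `mix`). It assumes whitespace-separated text laid out as in the excerpt above; the function name `count_mr_labels` is hypothetical, and no interpretation of the trailing `D`/`U` flag or of the fifth column is implied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regions per label found in the third
# colon-separated part of column 4 (e.g. "hyper" in chr1:mr0:hyper:D).
count_mr_labels() {
  awk '
    {
      split($4, parts, ":")    # parts[1]=chromosome, parts[2]=MR id, parts[3]=label
      counts[parts[3]]++
    }
    END {
      for (label in counts) print label, counts[label]
    }
  ' "$1" | sort
}

# Example usage (path as used in this demo):
# count_mr_labels ../../data/rat_data/in_data/final_demo_data/rat_data/out_data/DMR_CpG_context/5_chroms_all_mr_data_range_dmrRanking.bed
```

Applied to only the eight rows shown above, this would report `hyper 1`, `hypo 4`, and `mix 3`.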
Step 2: Export data
The second step involves running the dmr_analysis dmr_exportData command to export relevant methylation region data.
Defining input/output paths
IN_DATA_PATH='../../data/rat_data/in_data/final_demo_data/rat_data/'
IN_MR_PATH=${IN_DATA_PATH}'/out_data/DMR_CpG_context/'
IN_DEG_PATH=${IN_DATA_PATH}'/in_data/DEG/'
# Define output path
OUT_PATH='../../data/rat_data/out_data/'
# Define file paths
FILE_FOLD=${OUT_PATH}/out4mr_not_in_tss_enhancer
BACK_FILE=${OUT_PATH}/background_samples_list.tsv
# Set variables
in_data_str='_rat'
is_run_dmr_export=1
is_run_dtarget=1
Export data for DMRs overlapping with TSS or 5’distance regions
if [ $is_run_dmr_export == 1 ]; then
dmr_analysis dmr_exportData \
--input_mr_data_folder ${IN_MR_PATH} \
--output_file_folder ${OUT_PATH}/out4dmr_in_deg_tss_5dist \
--input_file_format 0 \
--number_of_processes 10 --input_file ${OUT_PATH}'/uqdmr_regions_in_deg_tss_5dist'${in_data_str}'.bed' -wtStr '_Ctrl'
echo "Export data of DMRs overlapping to TSS or 5distance - Done "
echo ""
# Export data for MRs that are not in TSS or enhancer regions
dmr_analysis dmr_exportData \
--input_mr_data_folder ${IN_MR_PATH} \
--output_file_folder ${OUT_PATH}/out4mr_not_in_tss_enhancer \
--input_file_format 0 \
--number_of_processes 10 --input_file ${OUT_PATH}'/mr_regions_not_in_enhancers'${in_data_str}'_tss.bed' -wtStr '_Ctrl'
echo "Export data of MRs not in TSS or enhancers - Done "
fi
An excerpt of the exported output file:
chr1 1606237 1607593 chr1:mr0:hyper:D 0.9922011104993337
chr1 1608763 1614639 chr1:mr1:hypo:D 0.6973919073178528
chr1 1616202 1632163 chr1:mr2:mix:U 0.002969192663881476
chr1 1753230 1757707 chr1:mr6:mix:D 0.9680422087690608
chr1 1759129 1759849 chr1:mr7:hypo:U 0.01498598083737056
chr1 2046752 2046955 chr1:mr14:mix:U 0.0014361082628083287
chr1 2066186 2066327 chr1:mr15:hypo:D 0.9992651107122464
chr1 2238597 2239647 chr1:mr21:mix:U 0.010873500659218552
Create background file list if it does not exist
if ! [ -f "$BACK_FILE" ]; then
echo "$BACK_FILE does not exist; creating it."
if [ -e "$FILE_FOLD" ]; then
# List all raw data files of the exported MRs, one path per line
ls ./"${FILE_FOLD}"/chr*/data/*raw*.* > "$BACK_FILE"
echo "Created $BACK_FILE"
else
echo "Cannot create background file because the data folder was not found: $FILE_FOLD"
fi
fi
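Since the target prediction step reads this list through `-inBackgroundList`, it can help to confirm that the list is non-empty and that every listed file still exists. The following is a minimal sketch, assuming one file path per line as produced by the `ls` redirection above; `check_background_list` is a hypothetical helper name, not a command of the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a background list file is non-empty
# and that each path listed in it points to an existing file.
check_background_list() {
  local list_file=$1
  local total=0 missing=0 entry
  if [ ! -s "$list_file" ]; then
    echo "Background list is missing or empty: $list_file" >&2
    return 1
  fi
  while IFS= read -r entry; do
    [ -z "$entry" ] && continue        # skip blank lines
    total=$((total + 1))
    if [ ! -f "$entry" ]; then
      missing=$((missing + 1))
    fi
  done < "$list_file"
  echo "entries=$total missing=$missing"
  [ "$missing" -eq 0 ]                 # succeed only if nothing is missing
}

# Example usage (path as defined in this demo):
# check_background_list ../../data/rat_data/out_data/background_samples_list.tsv
```

The paths in the list are relative, so the check should be run from the same folder in which the list was created.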
Step 3: Running dds_analysis dTarget_methy_vs_express
The third step involves running the dds_analysis dTarget_methy_vs_express command to predict putative target genes for DMRs based on their associations from either TSS or 5’distance regions. Here is the code:
# Run dTarget_methy_vs_express for predicting target genes
if [ $is_run_dtarget == 1 ]; then
gene_mr_file=${OUT_PATH}'/uqGeneDmr_regions_in_deg_tss'${in_data_str}'.bed'
gene_exp_file=${IN_DEG_PATH}'/Adrenal1vsAdrenal2_DEG_genes_zscores.tsv'
in_mr_data_folder=${OUT_PATH}/out4dmr_in_deg_tss_5dist
in_background_mr_file=$BACK_FILE
number_of_samples=10
# Test target gene and DMR associations from TSS regions
dds_analysis dTarget_methy_vs_express -inGeneMRfile $gene_mr_file -mrTAB \
-inGeneEXPfile $gene_exp_file -expTAB \
-inMRfolder $in_mr_data_folder -outName 'tss_region_' \
-output_path $OUT_PATH -sampleName 'sample_name4replace.tsv' \
-pathDepth 1 -inBackgroundList $in_background_mr_file -cutoff 0.05 -totalSamples $number_of_samples -numOfprocesses 10
echo "Done with TSS target gene prediction"
# Test target gene and DMR associations from 5'distance regions
gene_mr_file=${OUT_PATH}'/uqGeneDmr_regions_in_deg_5dist'${in_data_str}'_overlap_enhancer.bed'
dds_analysis dTarget_methy_vs_express -inGeneMRfile $gene_mr_file -mrTAB \
-inGeneEXPfile $gene_exp_file -expTAB \
-inMRfolder $in_mr_data_folder -outName 'distance_region_' \
-output_path $OUT_PATH -sampleName 'sample_name4replace.tsv' \
-pathDepth 1 -inBackgroundList $in_background_mr_file -cutoff 0.01 -totalSamples $number_of_samples -numOfprocesses 10
echo "Done with 5'distance target gene prediction"
fi
Step 4: Plotting selected target gene and DMR associations
The fourth step involves running the dds_analysis plot_mr_vs_exp command to plot a selected target gene against a methylation region. Here is the code:
gene_exp_file=${IN_DEG_PATH}'/Adrenal1vsAdrenal2_DEG_genes_zscores.tsv'
OUT_PATH='../../data/rat_data/out_data/'
dds_analysis plot_mr_vs_exp -inGeneEXPfile ${gene_exp_file} \
-dpi 300 -inMRfolder ${OUT_PATH}/out4dmr_in_deg_tss_5dist \
-sampleName sample_name4replace.tsv -expTAB -inGene 'Tab2' -inMR 'chr1:mr16' -wtStr '_Ctrl' -output_path ${OUT_PATH}
The output of the above command, which plots methylation region 16 on chromosome 1 (chr1:mr16) against the expression of gene Tab2, is the following:
