======== demoFLMR ======== This demo has five shell scripts. s1* should be run first, s2* should run second and so on for complete and correct run of the demo. Step 0: ------- Generate predefined genomic regions files based on gene Code annotation: .. code-block:: bash OUT_FOLDER='../../data/fl_mr/in_data/in_genome_regions/' IN_FOLDER='../../data/fl_mr/in_data/gencode/' dmr_analysis dmr_gene_annotation -F ${OUT_FOLDER} -i no -l 10 \ -rem no -xL 50000000 -X 10000 -Y 10000 -M 10000 -N 1000000 -hu yes -n no \ -r ${IN_FOLDER}/gencode.v19.annotation_gene.refFlat \ -g ${IN_FOLDER}/hg19.chrom.sizes.clear.sorted echo gene_annotation-DON Please note that the script assumes certain file paths and data organization. Make sure to modify the paths according to your specific file structure and requirements. Also, remember to add the enhancer file manually in the `list_region_files.txt` file after running this script. Here we assume 1. mutation blocks overlapping to enhancers and differential methylation regions are done (e.g., demo7 of bpb3 package) 2. and differential methylated regions (DMR) are already done by dmr_analysis This script is similar to demo1 (e.g., integrated analysis of DEG, DMR, mutation blocks, and TAD - in paper https://pubmed.ncbi.nlm.nih.gov/35495111/) But we only consider mutation blocks that are overlapping with enhancers and differential DNA methylation regions The output folders: out_chromSegment, out_expression, out_genome, out_dds may need to be created manually!! .. code-block:: bash #path of main folder to output data OUT_FOLDER='../../data/fl_mr/out_data/' # path of input data folder IN_FOLDER='../../data/fl_mr/in_data/' #Path of mutation blocks and block summary file exported by bpb3 package IN_BLOCKS=${IN_FOLDER}'/bpb3_demo7_mr_enhancers/out/mussd_blocks/' IN_BLOCK_SUMMARY='blocks_summary.tsv' #Path of mutation blocks with patient ID that will be exported by dds_analysis find_block_patientID #IN_BLOCK_PATIENT=${IN_BLOCKS}'/blocks_summary_and_patientID2.tsv' #Other INPUT PATH and files et al #Convert bpb3 mutation blocks summary file to a bed format file with positions dds_analysis bpb3summary2bed_format --in_block_summary_file ${IN_BLOCKS}/$IN_BLOCK_SUMMARY echo ${IN_BLOCKS}'/'${IN_BLOCK_SUMMARY} echo "preprocess_data - Done" replace='_block_position.bed' finds='.tsv' IN_BLOCK_FILE=${IN_BLOCK_SUMMARY//$finds/$replace} #Input path of a list of predefined genomic regions that will be used to map mutation blocks, and DMRs. IN_GENOME_REGIONS=${IN_FOLDER}'/in_genome_regions/' IN_LIST_REGIONS='list_region_files.txt_gencode' IN_GENOME_refFlat=${IN_GENOME_REGIONS}'/data/gencode.v19.annotation_gene.ref_clean_sorted.bed' #path of genome lib information files such as TAD, chromatin segment, and genome files et al IN_GENOME_LIB='../../data/' #path of chromatin segment files that will be used to intersect with mutation blocks IN_CHROMSEG=${IN_GENOME_LIB}'/chromSegment/hg19/' #path of TAD information that will be used to map mutation blocks IN_TAD=${IN_GENOME_LIB}'/in_tad/' #path to common TAD positions across 5 human cell lines IN_TAD_INFO=${IN_TAD}'/Table4_TAD_annotations_sorted_chr.bed' #path to common TAD boundary position among 5 human cell lines IN_TAD_BOUNDARY=${IN_TAD}'/Table1_common_boundaries_merged_sorted_chr.bed' #path to gene positions in sorted bed format which is generated by s1_run_annotation.sh IN_GENE_POSITION=${IN_GENOME_REGIONS}'data/gene_Up1_Down1removedShort.bed' #Path of differentially methylated regions, differentially expressed genes, and output folder for mutation blocks mapped to predefined genomic regions. #Input path to DMRs/MRs predicted and ranked by dmr_analysis IN_DMR=${IN_FOLDER}'/DMR_CpG_context/2_chroms_all_mr_data_range_dmrRanking.bed' #path to DEG genes exported from bpb3 package IN_EXPRESS=${IN_FOLDER}'/in_DEG/differentially_expressed_genes_nonfolder.txt' #add N bp flank region on the two sides of mutation blocks before intersecting them to DMRs FLANK_BP=0 #OUTPUT PATH #path for exporting mutation blocks mapped to predefined genomic regions OUT_GENOME=${OUT_FOLDER}'/out_genome/' #path for exporting mutation blocks mapped to chromatin segmentations OUT_CHROMSEG=${OUT_FOLDER}'/out_chromSegment/' #path for exported mutation blocks associated with DEGs OUT_EXPRESS=${OUT_FOLDER}'/out_expression/' #path for final results OUT_FINAL=${OUT_FOLDER}/'out_DmrDeg2block' Block summary input file is following: .. code-block:: bash block_id chrom start_pos end_pos number_of_mutations number_of_patients mutation_distribution region_names block_2_18_60987833_60988552 18 60987808 60988576 118 9 35,21,13,8,11,9,8,5,8 chr18:mr22253:hypo:D block_1_18_60984827_60987028 18 60984802 60987052 160 9 24,32,10,22,5,22,24,6,15 chr18:mr22251:hyper:D,chr18:mr22252:hypo:D,chr18:mr22253:hypo:D block_0_18_60983893_60984144 18 60983868 60984168 22 4 8,4,9,1 chr18:mr22250:hyper:D,chr18:mr22251:hyper:D,chr18:mr22252:hypo:D,chr18:mr22253:hypo:D DMR file: 2_chroms_all_mr_data_range_dmrRanking.bed .. code-block:: bash chr18 57001634 57003436 chr18:mr0:mix:U 0.1308206406073999 chr18 57005127 57007836 chr18:mr1:mix:U 0.0674646718818239 chr18 57012900 57015233 chr18:mr2:mix:U 0.08007764185618908 chr18 57017008 57018986 chr18:mr3:hypo:U 0.0020920250443539944 chr18 57019993 57020668 chr18:mr4:mix:U 0.3942126190276321 chr18 57022124 57027729 chr18:mr5:hypo:D 0.9998645092181799 chr18 57028952 57030829 chr18:mr6:hypo:D 0.9999113822308282 chr18 57033570 57042036 chr18:mr7:mix:U 0.16519832525306027 chr18 57043738 57045135 chr18:mr8:mix:D 0.9931967885078828 chr18 57046285 57065804 chr18:mr9:mix:U 0.1181969538449906 chr18 57066825 57071330 chr18:mr10:mix:U 0.09771325259636 chr18 57076353 57080279 chr18:mr11:hypo:D 0.9999480400136458 Head of the TAD boundaries Table1_common_boundaries_merged_sorted_chr.bed .. code-block:: bash chr1 1200000 1275000 1 75000 chr1 3700000 3775000 1 75000 chr1 5925000 6000000 1 75000 chr1 6475000 6550000 1 75000 chr1 7900000 7975000 1 75000 chr1 8350000 8425000 1 75000 chr1 8875000 8950000 1 75000 chr1 9550000 9625000 1 75000 chr1 10450000 10525000 1 75000 chr1 10975000 11050000 1 75000 chr1 12050000 12125000 1 75000 Step 1: _______ Map mutation block to predefined genomic regions .. code-block:: bash dds_analysis map_block2genome --in_sortedBlock_file $IN_BLOCK_FILE \ --in_genomeRegion_file $IN_LIST_REGIONS \ --in_referenceGenome_file $IN_GENOME_refFlat \ --out_file_folder $OUT_GENOME --is_MR_or_Blocks 1 echo map_block2genome - Done echo "" #read -n 1 -p "Click for continuing 1 :" echo "" Step 2: _______ Map mutation block to chromSegment, here assume all input mutation block has the same input file as the map_block2genome .. code-block:: bash dds_analysis map_block2chromSegment --in_sortedBlock_file $IN_BLOCK_FILE \ --in_chromSegment_folder $IN_CHROMSEG \ --out_file_folder $OUT_CHROMSEG --is_MR_or_Blocks 1 echo map_block2chromSegment - Done echo "" #read -n 1 -p "Click for continuing 2 :" echo "" Step 3: ------- Map mutation blocks to DMR after adding flank regions to two-sides of blocks if it is needed .. code-block:: bash dds_analysis map_block2dmr --in_sortedBlock_file $IN_BLOCK_FILE \ --in_dmr_file $IN_DMR \ --in_flank_region2block $FLANK_BP --out_file_folder $OUT_FOLDER echo map_block2dmr - Done echo "" #if the file name is known then skip this read -p "Enter exported map_block2dmr file name with full path for continuing 3: " IN_BLOCK_DMR echo "" echo "IN_BLOCK_DMR: $IN_BLOCK_DMR " echo "" Step 4: ------- Combine genomic regions with block information and find their associated differential expressed genes .. code-block:: bash dds_analysis find_geneExp4block --in_blocks_genome_folder $OUT_GENOME \ --in_sortedBlock_file $IN_BLOCK_FILE \ --in_deg_file $IN_EXPRESS \ --in_feature_list 'TSS,gene,TES,5dist,enhancers' \ --out_file_folder $OUT_EXPRESS echo find_geneExp4block - Done #read -n 1 -p "Click for continuing 4 :" Step 5: ------- Find patient IDs for each mutation block, this step can be slow if the number of blocks is large! but only run once for block summary from BB2 .. code-block:: bash dds_analysis find_block_patientID \ --in_block_summary_file ${IN_BLOCKS}/$IN_BLOCK_SUMMARY \ --in_block_folder $IN_BLOCKS echo find_block_patientID - Done echo "" #if file name is known then skip this line #read -p "Enter exported patient ID file name with full path for continuing 4: " IN_BLOCK_PATIENT echo "" IN_BLOCK_PATIENT=${IN_BLOCKS}'/blocks_summary_and_patientID2.tsv' echo "IN_BLOCK_PATIENT: $IN_BLOCK_PATIENT " echo "" #read -n 1 -p "Click for continuing 5 :" echo "" #This file exported from earlier step: map_block2dmr #this file name need be manually input #IN_BLOCK_DMR='blocks_summary_block_position_0flank_0.7Proba_2blocks_2blocks2mr_2blocks2dmr' Step 6: ------- Combine DMR, DEG and block information together .. code-block:: bash dds_analysis combine_dmr_deg2block --in_sortedBlock_patient_file $IN_BLOCK_PATIENT \ --in_dmr_file ${IN_BLOCK_DMR} \ --in_deg_folder_and_file_suffix ${OUT_EXPRESS}'/*.tsv' \ --out_file_folder $OUT_FINAL echo combine_dmr_deg2block - Done finds='.tsv' replace='' FULL_PATH_IN_BLOCK_DMR=${IN_BLOCK_DMR//$finds/$replace} IN_BLOCK_DMR_NAME=$(basename "$FULL_PATH_IN_BLOCK_DMR") echo $IN_BLOCK_DMR_NAME blocks_summary_block_position_0flank_0.6Proba_3blocks_2blocks2mr_2blocks2dmr_deg_info_filtered_DMR_or_DEG_uniqGene_commonTAD_Boundary.tsv: .. code-block:: bash gene_name patients gene_type block_id new_mr_sites patient_id enhancers TAD2gene Boundary2gene TAD2block Boundary2block isTAD BCL2 9 TSS block_2_18_60987833_60988552 chr18:mr621 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 nan 18:60675000:61075000:Low-active na 18:60675000:61075000:Low-active na 1.0 BCL2 9 TSS block_1_18_60984827_60987028 chr18:mr621 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 enhancer 18:60675000:61075000:Low-active na 18:60675000:61075000:Low-active na 1.0 BCL2 4 TSS block_0_18_60983893_60984144 nan patient_0,patient_3,patient_4,patient_7 enhancer 18:60675000:61075000:Low-active na 18:60675000:61075000:Low-active na 1.0 KDSR 9 TES block_2_18_60987833_60988552 chr18:mr621 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 nan 18:60675000:61075000:Low-active na 18:60675000:61075000:Low-active na 1.0 KDSR 9 TES block_1_18_60984827_60987028 chr18:mr621 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 enhancer 18:60675000:61075000:Low-active na 18:60675000:61075000:Low-active na 1.0 SERPINB5 9 5dist block_2_18_60987833_60988552 chr18:mr621 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 nan 18:61150000:61575000:Low 18:61075000:61150000:1 18:60675000:61075000:Low-active na 0.0 HMSD 9 5dist block_2_18_60987833_60988552 chr18:mr621 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 nan na 18:61575000:61650000:1 18:60675000:61075000:Low-active na 0.0 Step 7: ------- Filter blocks based DMR or DEG information .. code-block:: bash dds_analysis filter_blocks --in_combined_DmrDegBlock_file $OUT_FINAL/${IN_BLOCK_DMR_NAME}_deg_info.tsv echo filter_blocks - Done #read -n 1 -p "Click for continuing 7: " Step 8: ------- Collect unique genes name from predicted blocks with DMR and DEG .. code-block:: bash dds_analysis collect_gene_names4blocks --in_filtered_blockGene_file $OUT_FINAL/${IN_BLOCK_DMR_NAME}_deg_info_filtered_DMR_and_DEG.tsv dds_analysis collect_gene_names4blocks --in_filtered_blockGene_file $OUT_FINAL/${IN_BLOCK_DMR_NAME}_deg_info_filtered_DMR_or_DEG.tsv echo collect_gene_names2blocks - Done blocks_summary_block_position_0flank_0.6Proba_3blocks_2blocks2mr_2blocks2dmr_deg_info_filtered_DMR_and_DEG.tsv .. code-block:: bash block_id chrom start_pos end_pos number_of_mutations number_of_patients mutation_distribution patient_id new_mr_sites mr_logReg_proba filtered_TSS filtered_deg_p_value2TSS filtered_TES filtered_deg_p_value2TES filtered_5dist filtered_deg_p_value25dist enhancers deg_p_value2enhancers block_2_18_60987833_60988552 18 60987808 60988576 118 9 35,21,13,8,11,9,8,5,8 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 chr18:mr621 0.624000054239737 BCL2 1.4739795539267432e-09 KDSR 2.253337153854709e-06 SERPINB5,HMSD 0.0188981284877041,1.3204080353838545e-07 block_1_18_60984827_60987028 18 60984802 60987052 160 9 24,32,10,22,5,22,24,6,15 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 chr18:mr621 0.624000054239737 BCL2 1.4739795539267432e-09 KDSR 2.253337153854709e-06 enhancer ,enhancer nan,nan blocks_summary_block_position_0flank_0.6Proba_3blocks_2blocks2mr_2blocks2dmr_deg_info_filtered_DMR_or_DEG .. code-block:: bash block_id chrom start_pos end_pos number_of_mutations number_of_patients mutation_distribution patient_id new_mr_sites mr_logReg_proba filtered_TSS filtered_deg_p_value2TSS filtered_TES filtered_deg_p_value2TES filtered_5dist filtered_deg_p_value25dist enhancers deg_p_value2enhancers block_2_18_60987833_60988552 18 60987808 60988576 118 9 35,21,13,8,11,9,8,5,8 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 chr18:mr621 0.624000054239737 BCL2 1.4739795539267432e-09 KDSR 2.253337153854709e-06 SERPINB5,HMSD 0.0188981284877041,1.3204080353838545e-07 block_1_18_60984827_60987028 18 60984802 60987052 160 9 24,32,10,22,5,22,24,6,15 patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 chr18:mr621 0.624000054239737 BCL2 1.4739795539267432e-09 KDSR 2.253337153854709e-06 enhancer ,enhancer nan,nan block_0_18_60983893_60984144 18 60983868 60984168 22 4 8,4,9,1 patient_0,patient_3,patient_4,patient_7 BCL2 1.4739795539267432e-09 enhancer If no TAD information available then we can skip below two lines of check_block_gene_inTAD but use the exported file from collect_gene_names4blocks for step 10 gene ranking !! Step 9: ------- Check whether the block/gene/ in the same TAD or boundary .. code-block:: bash dds_analysis check_block_gene_inTAD --in_filtered_blockUqGene_file $OUT_FINAL/${IN_BLOCK_DMR_NAME}_deg_info_filtered_DMR_and_DEG_uniqGene.tsv \ --in_gene_position_file ${IN_GENE_POSITION} \ --in_TAD_position_file ${IN_TAD_INFO} \ --in_TAD_boundary_file ${IN_TAD_BOUNDARY} echo check_block_gene_inTAD - Done for DMR and DEG dds_analysis check_block_gene_inTAD --in_filtered_blockUqGene_file $OUT_FINAL/${IN_BLOCK_DMR_NAME}_deg_info_filtered_DMR_or_DEG_uniqGene.tsv \ --in_gene_position_file ${IN_GENE_POSITION} \ --in_TAD_position_file ${IN_TAD_INFO} \ --in_TAD_boundary_file ${IN_TAD_BOUNDARY} echo check_block_gene_inTAD - Done for DMR or DEG Step 10: -------- Rank gene list Here tss=4, tes=enhancer=gene=2, 5dist=1 If there is not a TAD information available for the filtering, then we can use $OUT_FINAL/${IN_BLOCK_DMR_NAME}_deg_info_filtered_DMR_or_DEG_uniqGene.tsv as input for unique_gene_file .. code-block:: bash dds_analysis dds_geneRanking --in_unique_gene_file $OUT_FINAL/${IN_BLOCK_DMR_NAME}_deg_info_filtered_DMR_or_DEG_uniqGene_commonTAD_Boundary_list2UqGene.tsv \ --in_DEG_file ${IN_EXPRESS} \ --in_DMR_file ${IN_DMR} -inCutoff 0.5 echo geneRanking - Done blocks_summary_block_position_0flank_0.6Proba_3blocks_2blocks2mr_2blocks2dmr_deg_info_filtered_DMR_or_DEG_uniqGene_commonTAD_Boundary_list2UqGene.tsv .. code-block:: bash gene_name gene_type block_id new_mr_sites patients isTAD enhancers patient_id BCL2 TSS~TSS~TSS block_2_18_60987833_60988552~block_1_18_60984827_60987028~block_0_18_60983893_60984144 chr18:mr621~chr18:mr621~nan 9~9~4 1.0~1.0~1.0 nan~enhancer~enhancer patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10~patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10~patient_0,patient_3,patient_4,patient_7 KDSR TES~TES block_2_18_60987833_60988552~block_1_18_60984827_60987028 chr18:mr621~chr18:mr621 9~9 1.0~1.0 nan~enhancer patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10~patient_0,patient_2,patient_3,patient_4,patient_5,patient_7,patient_8,patient_9,patient_10 Step 11: -------- Find enhancer target genes .. code-block:: bash echo " For results geneRanking in out_DmrDeg2block (either DMR_and_DEG or DMR_or_DEG" #if file name is known then skip this line #read -p "Enter dds_geneRanking exported blocks_summary_block_position_* file with full path for continuing : " IN_GENE2BLOCK IN_GENE2BLOCK='_deg_info_filtered_DMR_or_DEG_uniqGene_commonTAD_Boundary_list2UqGene.tsv' finds='.tsv' replace='_selectedGenes_gt_0.5.txt' IN_GENE2BLOCK=${IN_GENE2BLOCK//$finds/$replace} IN_GENE2BLOCK=$OUT_FINAL/${IN_BLOCK_DMR_NAME}${IN_GENE2BLOCK} echo "IN_GENE2BLOCK: $IN_GENE2BLOCK " SELECTED_GENE='BCL2' dds_analysis find_enhancer_target_genes --in_enhancer_file_folder ${IN_FOLDER}/in_enhancer/hg19_enhancer2gene_bed \ --in_DDS_file ${IN_GENE2BLOCK} \ --in_selected_gene ${SELECTED_GENE} --out_folder ${OUT_FOLDER}/out_enhancers echo find_enhancer_target_genes - Done - ${SELECTED_GENE} BCL2_blocks_sorted.bed : .. code-block:: bash chr18 60983893 60984144 block_0_18_60983893_60984144 nan TSS 4 enhancer chr18 60984827 60987028 block_1_18_60984827_60987028 chr18:mr621 TSS 9 enhancer Preprocess for next analysis: ----------------------------- Before running following steps, it assumes that DMRs are already predicted by dmr_analysis The script is used to prepare files for dds_analysis dTarget_methy_vs_express A description of parameters: in_folder: is path of DMRs mapped to genome in_string: string in exported file in_tss_file_mr: DMRs mapped to TSS regions in_dist_file: DMRs mapped to 5distance regions in_deg_file: differential expression file (bpb3 output format) out_folder: output file path tss_file: a predefined TSS regions for further analysis full_mr_file: a list of all ranked dmr from dmr_analysis in_genome_file: a bed formatted enhancer position file gene_col_name: column name of gene ID/name .. code-block:: bash dds_analysis preprocess \ -in_folder '../../data/fl_mr/out_data/out_map2genome/' \ -in_string '_fl' \ -in_tss_file_mr '../../data/fl_mr/out_data/out_map2genome/2_chroms_all_mr_data_range_dmrRanking_TSS_Up10000_Down10000_overlap1e-09.bed' \ -in_dist_file '../../data/fl_mr/out_data/out_map2genome/2_chroms_all_mr_data_range_dmrRanking_noGenes_5dist_Up1000000_Up10000_overlap1e-09.bed' \ -in_deg_file '../../data/fl_mr/in_data/in_DEG/differentially_expressed_genes_nonfolder.txt' \ -out_folder '../../data/fl_mr/out_data/' \ -tss_file '../../data/fl_mr/in_data/in_genome_regions/data/TSS_Up10000_Down10000.bed' \ -full_mr_file '../../data/fl_mr/in_data/DMR_CpG_context/2_chroms_all_mr_data_range_dmrRanking.bed' \ -in_genome_file '../../data/fl_mr/in_data/in_genome_regions/data/hg19_all_enhancers_merged_4dmr.bed' \ -gene_col_name '#gene' echo "To find DMR regions that are overlapping with TSS or 5distance regions of DEG - and preprocess Done" Further Steps: -------------- This script is used to provide a demo for performing dds_analysis dTarget_methy_vs_express on FL data by integrating DMR and DEG data together and predicting the putative target genes of a DMR based on their associations with either TSS or 5'distance reginos. .. code-block:: bash #main path of input data IN_DATA_PATH='../../data/fl_mr/' #path of DMR results from dmr_analysis IN_MR_PATH=${IN_DATA_PATH}'/in_data/DMR_CpG_context/' #path of DEG results from bpb3 IN_DEG_PATH=${IN_DATA_PATH}'/in_data/in_DEG/' #DEG file name from bpb3 differential_analysis, the original DEF file from bpb3 that was used to convert Zscores #in dds_analysis filterDEG4bpb3 IN_DEG_FILE='differentially_expressed_genes_nonfolder.txt' in_data_str='_fl' #path to output data OUT_PATH='../../data/fl_mr/out_data/' #path to exported MRs that are not located in TSS or enhancer regions FILE_FOLD=${OUT_PATH}/out4mr_not_in_tss_enhancer #file name for background sample list that contain all MRs not located in TSS or enhancers BACK_FILE=${OUT_PATH}/background_samples_list.tsv #whether to skip below two steps in the pipeline is_run_dmr_export=1 # 1 for exporting, 0 or other values for skipping this step is_run_dtarget=1 # 1 for run dTarget prediction , 0 or other values for skipping this step if [ $is_run_dmr_export == 1 ]; then ##15 here two input file are generated by dds_analysis preprocess #1. export data of dmr regions that are located in either TSS or 5distance regions by using dmr_analysis dmr_analysis dmr_exportData \ --input_mr_data_folder ${IN_MR_PATH} \ --output_file_folder ${OUT_PATH}/out4dmr_in_deg_tss_5dist \ --input_file_format 0 \ --number_of_processes 10 --input_file ${OUT_PATH}/uqdmr_regions_in_deg_tss_5dist${in_data_str}.bed -wtStr 'gcb_' echo "Export data of DMRs overlapping to TSS or 5distance - Done " echo "" #2. export data of mrs that are not in tss and enhancer dmr_analysis dmr_exportData \ --input_mr_data_folder ${IN_MR_PATH} \ --output_file_folder ${OUT_PATH}/out4mr_not_in_tss_enhancer \ --input_file_format 0 \ --number_of_processes 10 --input_file ${OUT_PATH}/mr_regions_not_in_enhancers${in_data_str}_tss.bed -wtStr 'gcb_' echo "Export data of MRs not in TSS or enhancers - Done " echo "" fi #end export data #16. #3. create background file list if it is not exist. #for large list of files, bash script may not work but a python script is needed! if ! [ -f $BACK_FILE ]; then echo $BACK_FILE " not exists and create one ! " if [ -e $FILE_FOLD ]; then ls ./${FILE_FOLD}/chr*/data/*raw*.* > $BACK_FILE echo "Create background file list in " $BACK_FILE else echo "Cannot create background file because no data folder find! " $FILE_FOLD fi fi #17. #4. to run dds_analysis dTarget_methy_vs_express for predicting putative target genes to DMRs based on gene expression profiles # unique gene to DMR file prepared by dds_analysis preprocess gene_mr_file=${OUT_PATH}/uqGeneDmr_regions_in_deg_tss${in_data_str}.bed #prepare a tab delimited gene expression file in which the group mean values and rratio are added. #this file will be used to plot average methylation levels of selected gene in TSS and Enhancer regions #After inputting a DEG file exported by bpb3 differential_expression, it exports a tab delimiated file #by adding three columns values of the group mean and rratio. #This filtered DEG file will only be used in plot_tss_enhancer_mrs for exporting data #and the function only consider input data as RPKM values dds_analysis filterDEG4bpb3 --in_group1_str '41' --in_group2_str 'SRR' \ --in_folder ${IN_DEG_PATH} \ --in_file ${IN_DEG_FILE} \ --min_median_RPKM 0 --rr_cutoff 0.1 #we can skip this manual input step if know the input gene expression file name if [ 1 == 2 ]; then echo "" read -p "To continue please copy the exported zscore cluster file name and path from bpb3 filterDEG4bpb3 then click return: " gene_exp_file echo "" echo "" echo "gene_exp_file is : $gene_exp_file " echo "" read -p "To continue please copy the exported group mean file name and path from bpb3 filterDEG4bpb3 then click return: " IN_DEG_FILE echo "" echo "" echo "IN_DEG_FILE is : $IN_DEG_FILE " echo "" echo "" fi #end test #Here, we assume the input gene exp file name is known gene_exp_file0=${IN_DEG_PATH}/${IN_DEG_FILE} #here we assume file name end with .txt finds='.txt' replace1='_rratio_filtered4cluster.csv' replace2='_rratio_filtered.csv' gene_exp_file=${gene_exp_file0//$finds/$replace1} IN_DEG_FILE=${gene_exp_file0//$finds/$replace2} echo "" echo "gene_exp_file is : $gene_exp_file " echo "" echo "IN_DEG_FILE is : $IN_DEG_FILE " echo "" # path of DMRs associated with DEG, TSS and 5distance, prepared by run step 1. in_mr_data_folder=${OUT_PATH}/out4dmr_in_deg_tss_5dist # a file for a list of background MR samples in_background_mr_file=$BACK_FILE #number of random sampling for the test number_of_samples=10 if [ $is_run_dtarget == 1 ]; then #18. here for a demo purpose the p-val reg_cutoff >0.05 , but it shall use a pval at least <0.05 and the number of samples >=1000 in real prediction ! #or users can filter the results based on exported data. #5. to test target gene and DMRs association from TSS regions dds_analysis dTarget_methy_vs_express -inGeneMRfile $gene_mr_file -mrTAB \ -inGeneEXPfile $gene_exp_file -expTAB \ -inMRfolder $in_mr_data_folder -outName 'tss_region_' \ -output_path $OUT_PATH -sampleName sample_name4replace.tsv \ -pathDepth 1 -inBackgroundList $in_background_mr_file -reg_cutoff 0.1 -cutoff 0.5 -totalSamples $number_of_samples -numOfprocesses 10 echo "Done with TSS target gene prediction" echo "" #19. #6. to test target gene and DMRs association from 5dist regions gene_mr_file=${OUT_PATH}/uqGeneDmr_regions_in_deg_5dist${in_data_str}_overlap_enhancer.bed dds_analysis dTarget_methy_vs_express -inGeneMRfile $gene_mr_file -mrTAB \ -inGeneEXPfile $gene_exp_file -expTAB \ -inMRfolder $in_mr_data_folder -outName 'distance_region_' \ -output_path $OUT_PATH -sampleName sample_name4replace.tsv \ -pathDepth 1 -inBackgroundList $in_background_mr_file -reg_cutoff 0.1 -cutoff 0.1 -totalSamples $number_of_samples -numOfprocesses 10 echo "Done with 5distance region target gene prediction" echo "" fi #end dTarget run Plotting: _________ Plot selected target gene and DMR associations .. code-block:: bash echo ${gene_exp_file} echo ${OUT_PATH}/out4dmr_in_deg_tss_5dist dds_analysis plot_mr_vs_exp -inGeneEXPfile ${gene_exp_file} \ -dpi 300 -inMRfolder ${OUT_PATH}/out4dmr_in_deg_tss_5dist \ -expTAB -inGene 'BCL2' -inMR 'chr18:mr639' -wtStr 'gcb_' -output_path ${OUT_PATH} Plotting of methylation region 639 in chromosome 18 can be seen in the figure given below: .. image:: BCL2_chr18_mr639.jpg :alt: BCL2 vs Chr18:mr639 Plotting: _________ Plot average methylation in TSS and enhancer regions for selected target gene .. code-block:: bash dds_analysis plot_tss_enhancer_mrs \ -exp_file $IN_DEG_FILE \ -dmr_file ${IN_MR_PATH}/2_chroms_all_mr_data_range_dmrRanking.bed \ -tss_file ${OUT_PATH}'/tss_region_'${number_of_samples}'sampling.csv' \ -enc_file ${OUT_PATH}'/distance_region_'${number_of_samples}'sampling.csv' \ -is_negative 2 -genes 'BCL2,KIAA1468,AKAP17A' -mr_folder ${OUT_PATH}/out4dmr_in_deg_tss_5dist/ \ -folder_name '' --dmr_file_not_compressed \ -gX 2000 -gY 1000 -wtStr 'gcb_' \ -out_folder ${OUT_PATH}/plot_tss_enhancer_mrs echo "Done with plot_tss_enhancer_mrs" .. image:: flmr_5mC_TSS5dist_X2000_Y1000_G2000_binsize100_sigma50_all_DMRs_2023-05-24.jpg :alt: Enhancer region with methylation region for BCL2