demoHAP1

Demo for hap1 cells is provided with the package. There are two bash scripts provided inside the demo/demo2_hap1_cell/. First one is run_hap1_preprocess.sh and second is run_hap1_cell where run_hap1_preprocess.sh should be run first. Both are explained as under.

Step 1: dds_analysis preprocess

The dds_analysis preprocess step prepares the input files for further analysis using the dds_analysis module. It requires specifying various parameters and input file paths. Here is the code:

# Before running following steps, it assumes that DMRs are already predicted by dmr_analysis

dds_analysis preprocess \
      -in_folder ../../data/hap1_cell/in_data/final_demo_data/hap1_cell/out_data/DMR_CpG_context/out_map2genome/ \
      -in_string '_hap1' \
      -in_tss_file_mr ../../data/hap1_cell/in_data/final_demo_data/hap1_cell/out_data/DMR_CpG_context/out_map2genome/3_chroms_all_mr_data_range_dmrRanking_TSS_Up5000_Down1000_removedShort_overlap1e-09.bed \
      -in_dist_file ../../data/hap1_cell/in_data/final_demo_data/hap1_cell/out_data/DMR_CpG_context/out_map2genome/3_chroms_all_mr_data_range_dmrRanking_noGenes_5dist_Up1000000_Up5000removedShort_overlap1e-09.bed \
      -in_deg_file ../../data/hap1_cell/in_data/final_demo_data/hap1_cell/in_data/DEG/HAP1_P1_vs_HAP1_KO1_differentially_expressed_genes_min1.1Fd_min1RPKM.txt \
      -out_folder ../../data/hap1_cell/out_data/ \
      -tss_file  ../../data/hap1_cell/in_data/final_demo_data/hap1_cell/out_data/DMR_CpG_context/data/TSS_Up5000_Down1000_removedShort.bed \
      -full_mr_file ../../data/hap1_cell/in_data/final_demo_data/hap1_cell/out_data/DMR_CpG_context/3_chroms_all_mr_data_range_dmrRanking.bed \
      -in_genome_file ../../data/hap1_cell/in_data//final_demo_data/genome/hg38/hg38_all_enhancers_merged_hglft_genome_327b3_4dmr.bed \
      -gene_col_name '#gene'

echo "To find DMR regions that are overlapping with TSS or 5distance regions of DEG - and preprocess Done"

Here head of MR file with TSS region used as input is displayed: 3_chroms_all_mr_data_range_dmrRanking_TSS_Up5000_Down1000_removedShort_overlap1e-09.bed

chr3    8500658 8506658 chr3:8500658:8506658:NR_033378||TSS:5000:1000||LMCD1-AS1:-:8221146:8501658      chr3    8462210 8501263 chr3:mr61:hypo:U        0.009313857563925538
chr3    8500658 8506658 chr3:8500658:8506658:NR_033378||TSS:5000:1000||LMCD1-AS1:-:8221146:8501658      chr3    8502690 8519521 chr3:mr62:hypo:D        0.9473746593032959
chr3    8496806 8502806 chr3:8496806:8502806:NM_001278234||TSS:5000:1000||LMCD1:+:8501806:8568125       chr3    8462210 8501263 chr3:mr61:hypo:U        0.009313857563925538
chr3    8496806 8502806 chr3:8496806:8502806:NM_001278234||TSS:5000:1000||LMCD1:+:8501806:8568125       chr3    8502690 8519521 chr3:mr62:hypo:D        0.9473746593032959
chr3    8496822 8502822 chr3:8496822:8502822:NM_001278233&NM_001278235||TSS:5000:1000||LMCD1:+:8501822:8574668&8551274  chr3    8462210 8501263 chr3:mr61:hypo:U        0.009313857563925538
chr3    8496822 8502822 chr3:8496822:8502822:NM_001278233&NM_001278235||TSS:5000:1000||LMCD1:+:8501822:8574668&8551274  chr3    8502690 8519521 chr3:mr62:hypo:D        0.9473746593032959

While head of methylation regions file overlapped with 5 dist region is following: 3_chroms_all_mr_data_range_dmrRanking_noGenes_5dist_Up1000000_Up5000removedShort_overlap1e-09.bed

chr3    7540284 8535284 chr3:7540284:8535284:NR_046606||5dist:5000:1000000||GRM7-AS1:-:7519740:7535284  chr3    8101574 8101679 chr3:mr0:mix:U  0.0020392338824331358
chr3    7540284 8535284 chr3:7540284:8535284:NR_046606||5dist:5000:1000000||GRM7-AS1:-:7519740:7535284  chr3    8102773 8107735 chr3:mr1:mix:D  0.9863575510990631
chr3    7540284 8535284 chr3:7540284:8535284:NR_046606||5dist:5000:1000000||GRM7-AS1:-:7519740:7535284  chr3    8119920 8125713 chr3:mr3:mix:D  0.9800274651202444
chr3    7540284 8535284 chr3:7540284:8535284:NR_046606||5dist:5000:1000000||GRM7-AS1:-:7519740:7535284  chr3    8136677 8139855 chr3:mr4:mix:D  0.9795108977746336

Input file that consist of methylation regions is following: 3_chroms_all_mr_data_range_dmrRanking.bed

chr3    8101574 8101679 chr3:mr0:mix:U  0.0020392338824331358
chr3    8102773 8107735 chr3:mr1:mix:D  0.9863575510990631
chr3    8108750 8115603 chr3:mr2:mix:D  0.9763066966833287
chr3    8119920 8125713 chr3:mr3:mix:D  0.9800274651202444
chr3    8136677 8139855 chr3:mr4:mix:D  0.9795108977746336

Differentially expression file is below: HAP1_P1_vs_HAP1_KO1_differentially_expressed_genes_min1.1Fd_min1RPKM.txt

AADAT   0.006070520526336222    16.90556267502578       15.3097562941675        16.90556267502578       13.389386678473848      12.573766472283713      11.850516581141726
AARS    0.0006937443220722942   58.2440643381434        58.810394758261374      59.96440144545457       33.27901746502996       32.504309785698375      29.03704380096956
AATK    0.02271698518484375     2.5554957565029683      2.643258236240186       2.0719454919445686      3.1558313097737094      3.4185648937579978      3.93619336697585
ABAT    1.9842909957276316e-05  0.36312747169342474     0.4141701884134888      0.4404845359100948      1.1351009276855188      1.1024911071781391      1.1728150315920693
ABCA1   0.013656558666489764    3.074263544209392       2.5879886642784986      2.4299433764226444      1.6758838422325957      1.4414555073263386      1.3930064239495081

Set paths:

#main path of input data
IN_DATA_PATH='../../data/hap1_cell/in_data/final_demo_data/hap1_cell/'

#path of DMR results from dmr_analysis
IN_MR_PATH=${IN_DATA_PATH}'/out_data/DMR_CpG_context/'

#path of DEG results from bpb3
IN_DEG_PATH=${IN_DATA_PATH}'/in_data/DEG/'

#DEG file name from bpb3 differential_analysis, the original DEF file from bpb3 that was used to convert Zscores in dds_analysis preprocess
IN_DEG_FILE='HAP1_P1_vs_HAP1_KO1_differentially_expressed_genes_min0Fd_min0RPKM.txt '
in_data_str='_hap1'

#path to output data
OUT_PATH='../../data/hap1_cell/out_data/'

#path to exported MRs that are not located in TSS or enhancer regions
FILE_FOLD=${OUT_PATH}/out4mr_not_in_tss_enhancer
#file name for background sample list that contain all MRs not located in TSS or enhancers
BACK_FILE=${OUT_PATH}/background_samples_list.tsv

#whether to skip below two steps in the pipeline
is_run_dmr_export=1 # 1 for exporting, other values for skipping this step
is_run_dtarget=1    # 1 for run dTarget prediction , other values for skipping this step

Step 2: DMR data export

The DMR data export step involves exporting DMR data that is either located in TSS or 5’distance regions. It uses the dmr_exportData command from the dmr_analysis module. Here is the code:

# Before running this step, ensure DMRs are already predicted by dmr_analysis

dmr_analysis dmr_exportData \
      --input_mr_data_folder ${IN_MR_PATH} \
      --output_file_folder ${OUT_PATH}/out4dmr_in_deg_tss_5dist \
      --input_file_format 0 \
      --number_of_processes 10 --input_file ${OUT_PATH}/uqdmr_regions_in_deg_tss_5dist${in_data_str}.bed -wtStr 'HAP1_P_'

echo "Export data of DMRs overlapping to TSS or 5distance - Done"
 chr13       95217448        95424140        chr13:mr8:hypo:D        chr13:95300446:95306446:NM_005845||TSS:5000:1000||ABCC4:-:95019828:95301446;chr13:95300451:95306451:NM_001301829&NM_001105515||TSS:5000:1000||ABCC4:-:95019834&95095770:95301451
 chr3        9055959 9261176 chr3:mr88:mix:D chr3:9248646:9254646:NM_014850||TSS:5000:1000||SRGAP3:-:8980590:9249646


# Export data of MRs not in TSS or enhancers
dmr_analysis dmr_exportData  \
      --input_mr_data_folder ${IN_MR_PATH} \
      --output_file_folder ${OUT_PATH}/out4background \
      --input_file_format 0 \
      --number_of_processes 10 --input_file ${OUT_PATH}/uqdmr_regions_not_in_deg_tss_5dist${in_data_str}.bed -wtStr 'HAP1_P_'

echo "Export data of MRs not in TSS or enhancers - Done"

Step 3: Background file list creation

The background file list creation step involves creating a background file list if it doesn’t already exist. The background file list contains all MRs that are not located in TSS or enhancer regions. Here is the code:

background_filelist="${OUT_PATH}/backgroundFileList.txt"

if [[ ! -f "$background_filelist" ]]; then
    cd ${OUT_PATH}/out4background
    ls | grep ".bed" > $background_filelist
    cd -
    echo "Creating a list of background files - Done"
else
    echo "Background file list already exists."
fi

echo "Background file list creation - Done"

Step 4: DEG File preparation:

Prepare a tab delimited gene expression file in which the group mean values and rratio are added. This file will be used to plot average methylation levels of selected gene in TSS and Enhancer regions After inputting a DEG file exported by bpb3 differential_expression, it exports a tab delimited file by adding three columns values of the group mean and rratio. This filtered DEG file will only be used in plot_tss_enhancer_mrs for exporting data This function only consider input data as RPKM values

gene_mr_file=${OUT_PATH}/uqGeneDmr_regions_in_deg_tss${in_data_str}.bed



dds_analysis filterDEG4bpb3 --in_group1_str 'HAP1_P1' --in_group2_str 'HAP1_KO1' \
        --in_folder ${IN_DEG_PATH} \
        --in_file ${IN_DEG_FILE} \
        --min_median_RPKM 0 --rr_cutoff 0.0

#we can skip this manual input step if know the input gene expression file name
if [ 1 == 2 ];
then
read -p "To continue please copy the exported zscore cluster file name and path from bpb3 filterDEG4bpb3 then click return: " gene_exp_file

echo "gene_exp_file is :  $gene_exp_file "
read -p "To continue please copy the exported group mean file name and path from bpb3 filterDEG4bpb3 then click return: " IN_DEG_FILE

echo "IN_DEG_FILE is :  $IN_DEG_FILE "

fi
#end test

We assume know the input gene exp file name gene_exp_file0=${IN_DEG_PATH}/${IN_DEG_FILE} Here we assume file name end with .txt

finds='.txt'
replace1='_rratio_filtered4cluster.csv'
replace2='_rratio_filtered.csv'
gene_exp_file=${gene_exp_file0//$finds/$replace1}
IN_DEG_FILE=${gene_exp_file0//$finds/$replace2}
echo ""
echo "gene_exp_file is :  $gene_exp_file "
echo ""
echo "IN_DEG_FILE is :  $IN_DEG_FILE "
echo ""

# path of DMRs associated with DEG, TSS and 5distance, prepared by run dds_analysis preprocess
in_mr_data_folder=${OUT_PATH}/out4dmr_in_deg_tss_5dist
# a file for a list of background samples
in_background_mr_file=$BACK_FILE
#number of random sampling for the test
number_of_samples=10

HAP1_P1_vs_HAP1_KO1_differentially_expressed_genes_min1.1Fd_min1RPKM_rratio_filtered4cluster.txt head is following:

FAM166AP3   -0.4982081983511595     -0.9394358401186298     -0.6707136220439952     -1.7706649617423529     -1.7471009146010719     -1.759213053860002
RP11-561I11.3       -0.7538360859791011     -0.8163195171038196     -1.0486517498078607     -1.7688207617054021     -1.745213065641895      -1.7573721462587126
AC009299.4  -1.240842456343193      -0.9978063637216715     -0.8863905055414616     -1.7671318296845058     -1.743530972980885      -1.7556519670664428
FAM41C      -1.1073411320948858     -1.332098180960522      -1.106948209449783      -1.7284061127409744     -1.7596852452138803     -1.7717564601163207
ARSE        -0.8784524519818689     -1.0690546795243978     -0.8785276236434105     -1.7487097864101793     -1.7626589252195035     -1.639011789153589
RP11-442H21.2       0.05741831209072534     -0.13825346076745668    -0.22036980044826926    -1.3796771543980877     -1.529168641624075      -1.7594854289267512
RP11-422P24.12      -1.096537825707665      -1.0424048390869258     -1.282182441903589      -1.7668478469733657     -1.7432581180749236     -1.6746361853756075
RP4-590F24.2        -1.1047323074565782     -1.228979590661412      -1.1307512157539712     -1.676088854525317      -1.760365854510401      -1.755233404634837
MIR4419A    -0.16271440608073698    -0.16705481613532722    -0.49940830855525836    -1.582622407034913      -1.545955456168695      -1.5599446568981838
INHBE       0.4194153086557727      0.37903743335847656     0.4158868327542566      -1.3577006540490815     -1.153783055168014      -1.4861394387459987
CHAC1       2.3416693314259813      2.3286405094448774      2.3462661518248256      0.2075086031948541      0.21713102593121023     0.05283409591964295

Step 5: dTarget Prediction based in expression profiles:

The dTarget prediction step predicts putative target genes for DMRs based on gene expression profiles. It performs the prediction separately for DMRs associated with TSS regions and 5’distance regions. Here is the code:

# Predict target genes for DMRs overlapping with TSS regions
dds_analysis dTarget_methy_vs_express -inGeneMRfile $gene_mr_file  -mrTAB  \
     -inGeneEXPfile $gene_exp_file -expTAB \
     -inMRfolder $in_mr_data_folder -outName 'tss_region_' \
     -output_path $OUT_PATH -sampleName sample_name4replace.tsv \
     -pathDepth 1 -inBackgroundList $in_background_mr_file -reg_cutoff 0.5 -cutoff 1.0 -totalSamples $number_of_samples -numOfprocesses 10

 echo "Done with TSS target gene prediction"

 echo "Target gene prediction for DMRs overlapping with TSS regions - Done"

# Predict target genes for DMRs associated with 5'distance regions

 dds_analysis dTarget_methy_vs_express -inGeneMRfile $gene_mr_file -mrTAB  \
     -inGeneEXPfile $gene_exp_file -expTAB \
     -inMRfolder $in_mr_data_folder -outName 'distance_region_'  \
      -output_path $OUT_PATH -sampleName sample_name4replace.tsv \
     -pathDepth 1 -inBackgroundList $in_background_mr_file -reg_cutoff 0.5 -cutoff 1.0 -totalSamples $number_of_samples -numOfprocesses 10

echo “Done with 5distance region target gene prediction”

echo “Target gene prediction for DMRs associated with 5’distance regions - Done”

Step 6: Plotting selected target gene and DMR associations

This step involves plotting the associations between selected target genes and DMRs based on gene expression profiles. Here is the code:

echo ${gene_exp_file}
echo  ${OUT_PATH}/out4dmr_in_deg_tss_5dist
dds_analysis plot_mr_vs_exp -inGeneEXPfile ${gene_exp_file}  \
        -dpi 300 -inMRfolder ${OUT_PATH}/out4dmr_in_deg_tss_5dist \
    -expTAB -inGene 'TRIM32' -inMR 'chr9:mr104' -wtStr 'HAP1_P' -output_path ${OUT_PATH}
echo "Done with plot_mr_vs_exp "
../_images/Trim32_5mC_TSS5dist_X2000_Y1000_G2000_binsize100_sigma50_neg_DMRs_2023-05-24.jpg

Step 7: Plotting average methylation pattern

The final step involves plotting the average methylation in TSS and enhancer regions for selected target gene. Here is the code:

dds_analysis plot_tss_enhancer_mrs \
    -exp_file $IN_DEG_FILE \
    -dmr_file ${IN_MR_PATH}/3_chroms_all_mr_data_range_dmrRanking.bed  \
    -tss_file ${OUT_PATH}/tss_region_10sampling.csv  \
    -enc_file ${OUT_PATH}/distance_region_10sampling.csv \
    -is_negative 1 -genes 'MRPS25,PLCL2,TRIM32' -mr_folder ${OUT_PATH}/out4dmr_in_deg_tss_5dist/ \
    -folder_name '' --dmr_file_not_compressed \
    -gX 2000 -gY 1000 -wtStr 'HAP1_P_' \
    -out_folder ${OUT_PATH}/plot_tss_enhancer_mrs

echo "Done with plot_tss_enhancer_mrs"
Enhancer vs Methylation level