
[Tool] Detecting contamination with the ArtDeCo tool

Heeseo Cho 2024. 3. 22. 15:33

Sample contamination is an issue that anyone running NGS experiments has probably thought about at least once.

During an experiment it is very easy to contaminate a neighboring well, but hard to work out which sample the contamination came from.

While looking for tools to address this contamination issue, I came across one called ArtDeCo.

 

1. What is ArtDeCo?

ArtDeCo is a tool developed to detect cross-contamination between DNA samples in NGS.

According to the paper below, it checks the AR (allelic ratio) of SNVs shared among the samples in a batch to identify contaminated samples.

 

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03551-0

 

ARTDeco: automatic readthrough transcription detection - BMC Bioinformatics


 

In more detail, the algorithm detects contamination through the WCS percentage of contamination.

WCS (worst-case scenario) = max(r*2, (1-a)*2)

1. If the WCS percentage of contamination is >= 1%, the sample is predicted to be contaminated.

2. To confirm the contamination, the contaminating sample is identified by comparing the genotype of the contaminated sample with those of the other samples. (Only the homozygous SNPs of the contaminated sample are used; heterozygous SNPs are excluded because of their variability. That is, only homozygous SNPs such as Ref/Ref and Alt/Alt are used.)

3. Only SNPs with AR < 0.25 or AR > 0.75 are used. (This covers SNPs that are contaminated as well as SNPs that are not contaminated but carry noise (< 0.005, > 0.995).)
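
The WCS rule above can be sketched in a few lines of Python. This is only an illustration of the formula, not ArtDeCo's own code. The post does not define r and a, so the sketch assumes that r is the median AR of the SNPs with AR < 0.25 and a is the median AR of the SNPs with AR > 0.75. The function names are made up for this example.

```python
from statistics import median

LOW_AR = 0.25      # lower AR threshold quoted in the post
HIGH_AR = 0.75     # upper AR threshold quoted in the post
WCS_CUTOFF = 0.01  # 1% WCS threshold quoted in the post


def allelic_ratio(alt_reads, total_reads):
    """Fraction of reads at a SNP that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def worst_case_contamination(ratios):
    """WCS = max(r*2, (1-a)*2) over the near-homozygous SNPs.

    ASSUMPTION (not stated in the post): r is the median AR of SNPs with
    AR < 0.25 and a is the median AR of SNPs with AR > 0.75.
    """
    low = [x for x in ratios if x < LOW_AR]
    high = [x for x in ratios if x > HIGH_AR]
    r = median(low) if low else 0.0   # no low-AR SNPs: no signal on that side
    a = median(high) if high else 1.0  # no high-AR SNPs: no signal on that side
    return max(r * 2, (1 - a) * 2)


def is_contaminated(ratios):
    """Apply the >= 1% WCS rule from step 1."""
    return worst_case_contamination(ratios) >= WCS_CUTOFF
```

For example, `is_contaminated([0.02, 0.03, 0.5, 0.97, 0.98])` flags the sample, because the near-homozygous SNPs sit about 2.5% away from 0 and 1. The heterozygous SNP at 0.5 is ignored, matching steps 2 and 3.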

 

2. How to use the tool

ArtDeCo needs a few inputs.

Two files are required: 1. a file with SNP information, and 2. a coverage input.

It is also a good idea to give the output file path at the end.

 

1. SNP information file

The SNP information file is an SNP reference file, such as 1000 Genomes or KRGDB.

The paper used the 1000 Genomes database.

2. Coverage input

The coverage input file has to be generated from the BAM file by computing the depth, which gives the coverage at each base.

I used GATK's DepthOfCoverage for this.
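
As a rough sketch of the coverage step, the Python below builds a DepthOfCoverage command line and reads the per-base table back in. It rests on several assumptions. The invocation is GATK4-style (`-R`, `-I`, `-L`, `-O`). The per-locus table is assumed to have the locus in the first column and the total depth in the second, separated by tabs or commas. All file names are placeholders. The exact coverage format ArtDeCo expects is not described here, so a conversion step may still be needed.

```python
import csv
import io


def depth_of_coverage_command(bam, reference, intervals, out_prefix):
    """Build (but do not run) a GATK4-style DepthOfCoverage command.

    ASSUMPTION: GATK4 is installed and exposes the `gatk` launcher.
    """
    return ["gatk", "DepthOfCoverage",
            "-R", reference,   # reference FASTA
            "-I", bam,         # input BAM
            "-L", intervals,   # target intervals
            "-O", out_prefix]  # output prefix


def read_per_base_depth(handle):
    """Map 'chrom:pos' to total depth from a per-locus depth table.

    ASSUMPTION: column 1 is the locus, column 2 is the total depth, and
    the delimiter is a tab or a comma (detected from the header line).
    """
    header = handle.readline()
    delimiter = "\t" if "\t" in header else ","
    depths = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        depths[row[0]] = int(row[1])
    return depths


if __name__ == "__main__":
    # Made-up table, used only to show the parser on in-memory data.
    example = io.StringIO("Locus,Total_Depth\nchr1:100,35\nchr1:101,40\n")
    print(read_per_base_depth(example))
    print(" ".join(depth_of_coverage_command(
        "sample.bam", "reference.fasta", "targets.bed", "sample_coverage")))
```

The command is returned as a list so that it can be passed to `subprocess.run` without shell quoting issues.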

 

Once these files are ready, run ArtDeCo with a command like the one below.