Telomere-to-Telomere Y Chromosome Experiments
The Telomere-to-Telomere (T2T) consortium released the full assembly of NA24385's Y chromosome. This presents some interesting opportunities to attempt to access more of the regions left undefined in GRCh38. To that end we have created a reference combining it with the previously released CHM13 assembly. The combined reference was then used with bwa-mem2 to align a WGS sample from each major subclade and plot a callable loci histogram for each.
Zimin et al. 2021, "A reference-quality, fully annotated genome from a Puerto Rican individual", added CM034974.1 for HG01243. This individual is R1b-DF27 and therefore closer to the GRCh38 reference.
Sample | Approximate HG | Test Type | Accession | Callable | Poor Mapping | Histogram |
---|---|---|---|---|---|---|
WGS1213 | A00 | 30x WGS150 | CP086569.1 | 15,254,009 | 21,161,595 | |
HG01890 | A0 | 40x WGS150 | CP086569.1 | 16,615,534 | 19,895,879 | |
HG02613 | A1a | 40x WGS150 | CP086569.1 | 16,586,917 | 29,939,569 | |
SAMEA3302894 | A1b1 | 40x WGS100 | CP086569.1 | 15,345,990 | 33,570,375 | |
HG03225 | B-FT315355 | 40x WGS150 | CP086569.1 | 16,704,898 | 29,629,714 | |
HG00628 | C-MF2235 | 40x WGS150 | CP086569.1 | 16,923,490 | 24,421,698 | |
NA19004 | D-CTS131 | 40x WGS150 | CP086569.1 | 16,982,847 | 28,296,213 | |
NA20348 | E2b-CTS1441 | 40x WGS150 | CP086569.1 | 16,907,287 | 30,656,243 | |
HG02040 | F1 | 40x WGS150 | CP086569.1 | 16,970,015 | 35,699,948 | |
NA20870 | G-Z3043 | 40x WGS150 | CP086569.1 | 17,093,678 | 24,008,773 | |
HG02686 | H-Z34945 | 40x WGS150 | CP086569.1 | 17,152,735 | 27,197,525 | |
HG00360 | I2a-CTS695 | 40x WGS150 | CP086569.1 | 17,192,655 | 32,083,004 | |
NA24385 | J1a | 100x WGS250 | CP086569.1 | 19,964,587 | 42,398,216 | |
HG01258 | J1b-YP1273 | 40x WGS150 | CP086569.1 | 17,403,602 | 28,031,420 | |
HG03790 | L-M2355 | 40x WGS150 | CP086569.1 | 17,030,927 | 32,256,361 | |
SAMEA3302892 | M-Z42281 | 40x WGS100 | CP086569.1 | 15,618,252 | 29,181,429 | |
NA18748 | N-CTS582 | 40x WGS150 | CP086569.1 | 16,997,997 | 29,733,540 | |
NA18647 | O1a-MF643967 | 40x WGS150 | CP086569.1 | 17,056,721 | 34,581,137 | |
HG03914 | Q-YP748 | 40x WGS150 | CP086569.1 | 17,150,579 | 24,771,588 | |
WGS229 | R1b-CTS4466 | 30x WGS150 | CM034974.1 | 16,873,503 | 23,373,816 | |
WGS229 | R1b-CTS4466 | 30x WGS150 | CP086569.2 | 16,650,411 | 23,674,644 | |
SAMEA3302626 | S-Z33752 | 50x WGS100 | CP086569.1 | 15,605,348 | 35,614,665 | |
NA20758 | T-CTS54 | 40x WGS150 | CP086569.1 | 17,096,414 | 24,671,518 |
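
The Callable and Poor Mapping totals above are per-state base counts. As a minimal sketch of how such totals can be tallied, the following assumes a four-column BED file (contig, start, end, state) with state labels in the style of GATK's CallableLoci, such as `CALLABLE` and `POOR_MAPPING_QUALITY`. The file layout, the state names, and the `tally_states` helper are assumptions for illustration, not a description of the pipeline used here.

```python
from collections import Counter
from typing import Iterable


def tally_states(bed_lines: Iterable[str], contig: str) -> Counter:
    """Sum the bases assigned to each state on one contig.

    Assumes whitespace-separated BED-like lines: contig, start, end, state,
    with half-open intervals (end - start bases per line).
    """
    totals: Counter = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != contig:
            continue  # skip malformed lines and other contigs
        start, end, state = int(fields[1]), int(fields[2]), fields[3]
        totals[state] += end - start
    return totals


# Hypothetical intervals, not real data.
example = [
    "CP086569.1 0 100 NO_COVERAGE",
    "CP086569.1 100 350 CALLABLE",
    "CP086569.1 350 400 POOR_MAPPING_QUALITY",
    "CP086569.1 400 1000 CALLABLE",
    "chrX 0 500 CALLABLE",
]
totals = tally_states(example, "CP086569.1")
for state, bases in sorted(totals.items()):
    print(f"{state}\t{bases:,}")
```

Restricting the tally to a single contig keeps reads aligned elsewhere in the combined reference out of the Y totals.
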
Sample | Subclade | Test Type | Total Reads | Accession | Y Reads | Callable | Poor Mapping | Low Coverage | No Coverage | Homozygous SNPs |
---|---|---|---|---|---|---|---|---|---|---|
HG005 aka NA24631 | D-CTS932 | PacBio | 9,730,321 | CM034974.1 | 78,098 | 21,033,140 | 15,500,660 | 14,131,990 | 15,067,964 | 24,755 |
HG005 aka NA24631 | D-CTS932 | PacBio | 9,730,321 | CP086569.2 | 78,613 | 20,784,173 | 15,571,397 | 15,083,428 | 11,021,031 | 19,815 |
HG006 aka NA24694 | D-CTS932 | PacBio | 7,592,384 | CM034974.1 | 68,487 | 21,150,279 | 18,512,662 | 14,825,451 | 7,991,795 | 37,703 |
HG006 aka NA24694 | D-CTS932 | PacBio | 7,592,384 | CP086569.2 | 68,935 | 20,922,984 | 19,932,169 | 14,581,211 | 7,023,665 | 28,665 |
HG02486 | E | PacBio | 6,766,931 | CM034974.1 | 63,201 | 21,473,376 | 18,777,624 | 14,558,065 | 7,671,122 | 39,626 |
HG02486 | E | PacBio | 6,766,931 | CP086569.2 | 63,380 | 20,834,760 | 20,298,306 | 15,009,225 | 6,317,738 | 33,564 |
HG02572 | E1a | PacBio | 4,959,945 | CM034974.1 | 41,808 | 21,520,972 | 16,183,558 | 15,596,949 | 9,178,708 | 41,808 |
HG02572 | E1a | PacBio | 4,959,945 | CP086569.2 | 47,393 | 20,796,414 | 17,744,846 | 15,608,384 | 8,310,385 | 31,937 |
HG03098 | E1a | PacBio | 5,720,080 | CM034974.1 | 52,376 | 21,862,244 | 18,054,828 | 14,651,183 | 7,911,932 | 40,996 |
HG03098 | E1a | PacBio | 5,720,080 | CP086569.2 | 52,377 | 21,053,190 | 19,022,551 | 16,070,689 | 6,313,599 | 32,677 |
HG01106 | E1b | PacBio | 6,660,104 | CM034974.1 | 73,992 | 21,010,710 | 24,938,293 | 11,367,766 | 5,163,418 | 52,698 |
HG01106 | E1b | PacBio | 6,660,104 | CP086569.2 | 74,681 | 20,823,985 | 26,344,273 | 11,316,395 | 3,975,376 | 42,951 |
HG01109 | E1b | PacBio | 4,795,279 | CM034974.1 | 40,277 | 21,541,073 | 15,300,209 | 16,121,354 | 9,517,551 | 34,979 |
HG01109 | E1b | PacBio | 4,795,279 | CP086569.2 | 40,389 | 21,033,765 | 16,392,387 | 16,236,877 | 8,797,000 | 30,245 |
HG02055 | E1b | PacBio | 6,390,548 | CM034974.1 | 56,821 | 21,681,642 | 18,068,529 | 14,891,146 | 7,838,870 | 41,986 |
HG02055 | E1b | PacBio | 6,390,548 | CP086569.2 | 56,916 | 20,806,458 | 19,510,255 | 15,850,014 | 6,293,302 | 35,737 |
HG02145 | E1b | PacBio | 6,253,441 | CM034974.1 | 26,558 | 20,844,546 | 8,883,824 | 20,755,633 | 11,996,184 | 28,246 |
HG02145 | E1b | PacBio | 6,253,441 | CP086569.2 | 26,765 | 20,156,734 | 9,520,859 | 22,151,520 | 10,630,916 | 24,267 |
HG02717 | E1b | PacBio | 8,437,382 | CM034974.1 | 72,150 | 21,477,696 | 18,606,132 | 14,551,331 | 7,845,028 | 39,173 |
HG02717 | E1b | PacBio | 8,437,382 | CP086569.2 | 72,524 | 20,656,240 | 20,108,707 | 15,436,045 | 6,259,037 | 31,389 |
HG03579 submitted by open-genomes.org | G-Z1911 | PacBio | | PR1/CM034974.1 | 68,238 | 23,122,656 | 20,969,062 | 13,337,800 | 5,050,669 | 50,946 |
HG03579 submitted by open-genomes.org | G-Z1911 | PacBio | | PR1/CP086569.1 | 68,034 | 22,501,167 | 22,919,714 | 12,782,183 | 4,253,768 | 40,413 |
HG03579 | G-Z1911 | PacBio | 7,835,850 | CHM13/CM034974.1 | 67,987 | 23,133,136 | 21,041,524 | 13,288,135 | 5,017,392 | 51,154 |
HG03579 | G-Z1911 | PacBio | 7,835,850 | CHM13/CP086569.2 | 67,971 | 22,416,564 | 22,808,334 | 12,870,510 | 4,364,621 | 40,330 |
HG002 aka NA24385 | J1a | PacBio | 8,963,661 | CM034974.1 | 75,299 | 28,399,685 | 15,243,611 | 12,641,758 | 6,195,133 | 33,063 |
HG002 aka NA24385 | J1a | PacBio | 8,963,661 | CP086569.2 | 76,036 | 39,426,936 | 12,080,876 | 10,779,593 | 172,624 | 7 |
HG003 aka NA24149 | J1a | PacBio | 12,896,867 | CM034974.1 | 116,815 | 30,526,251 | 21,355,454 | 7,059,384 | 3,526,251 | 56,491 |
HG003 aka NA24149 | J1a | PacBio | 12,896,867 | CP086569.2 | 119,185 | 44,299,910 | 13,092,169 | 4,973,221 | 94,729 | 72 |
HG01258 | J1b-YP1273 | PacBio | 5,736,451 | CM034974.1 | 47,067 | 25,546,746 | 16,466,468 | 12,512,788 | 7,954,185 | 49,987 |
HG01258 | J1b-YP1273 | PacBio | 5,736,451 | CP086569.2 | 47,389 | 27,046,827 | 16,589,688 | 11,116,730 | 7,706,784 | 22,524 |
HG00621 | O2a | PacBio | 5,709,327 | CM034974.1 | 46,995 | 22,667,644 | 17,123,691 | 13,766,385 | 8,922,467 | 46,921 |
HG00621 | O2a | PacBio | 5,709,327 | CP086569.2 | 47,240 | 22,433,710 | 17,801,371 | 14,000,663 | 8,224,285 | 35,117 |
HG00673 | O2a | PacBio | 6,030,186 | CM034974.1 | 57,411 | 22,632,708 | 19,530,346 | 12,298,086 | 8,019,047 | 48,348 |
HG00673 | O2a | PacBio | 6,030,186 | CP086569.2 | 57,804 | 22,546,893 | 20,738,629 | 12,572,166 | 6,602,341 | 35,526 |
HG01928 | Q | PacBio | 4,738,407 | CM034974.1 | 39,425 | 23,613,031 | 16,086,870 | 14,551,296 | 8,228,990 | 46,177 |
HG01928 | Q | PacBio | 4,738,407 | CP086569.2 | 39,810 | 22,588,475 | 17,930,022 | 14,586,712 | 7,354,820 | 39,467 |
HG01952 | Q | PacBio | 6,099,801 | CM034974.1 | 49,783 | 23,627,288 | 16,767,891 | 13,803,453 | 8,281,555 | 34,126 |
HG01952 | Q | PacBio | 6,099,801 | CP086569.2 | 50,461 | 22,455,141 | 18,527,028 | 13,793,919 | 7,683,941 | 36,330 |
HG03492 | R1a | PacBio | 5,160,141 | CM034974.1 | 39,269 | 24,262,986 | 14,316,214 | 16,025,035 | 7,875,952 | 34,126 |
HG03492 | R1a | PacBio | 5,160,141 | CP086569.2 | 39,821 | 22,660,521 | 16,093,553 | 17,105,546 | 6,600,409 | 32,869 |
HG01243 | R1b-DF27 | PacBio | 5,584,056 | CM034974.1 | 41,235 | 29,388,453 | 9,526,987 | 9,691,121 | 13,873,626 | 11,982 |
HG01243 | R1b-DF27 | PacBio | 5,584,056 | CP086569.2 | 40,703 | 22,524,984 | 13,757,202 | 13,478,913 | 12,698,930 | 28,965 |
HG01442 | R1b-S227 | PacBio | 6,591,165 | CM034974.1 | 52,923 | 25,758,400 | 15,339,027 | 12,963,387 | 8,419,373 | 24,059 |
HG01442 | R1b-S227 | PacBio | 6,591,165 | CP086569.2 | 53,312 | 22,163,767 | 18,461,738 | 14,584,180 | 7,250,344 | 7,250,344 |
HG01358 | R1b-U152 | PacBio | 6,510,450 | CM034974.1 | 56,383 | 26,023,197 | 16,396,753 | 12,312,905 | 7,747,332 | 27,154 |
HG01358 | R1b-U152 | PacBio | 6,510,450 | CP086569.2 | 56,826 | 22,655,326 | 19,507,970 | 14,022,105 | 6,274,628 | 36,161 |
9LKSM | R1b-CTS4466 | Chromium Linked-Read | 733,125,588 | CP086569.1 | 6,056,810 | 22,523,320 | 14,984,385 | 22,512,802 | 2,436,325 | 5,347 |
Combination of Multiple Sequencing Runs
One of the future considerations for YDNA-Warehouse is to begin combining sequencing alignments into a unified profile. Possible advantages lie in increasing overlapping coverage between runs or leveraging different read lengths. This report takes two 30x WGS results from different vendors and combines them with a Y Elite from the same test donor.
Sample | Test Type | Callable | Poor Mapping | Histogram |
---|---|---|---|---|
WGS229 | 30x WGS150 | 16,843,063 | 23,364,582 | |
60820188481374 | 30x WGS150 | 16,826,633 | 23,119,073 | |
Combined | 60x WGS150 | 16,575,004 | 36,830,821 | |
B6564 | Y Elite | 14,936,429 | 23,749,094 | |
Combined | All Reads | 15,998,221 | 40,630,116 | |
4KE57JNV is the closest Y-DNA match to the donor of the sample above. Both generations of his Big Y BAMs are included here for comparison.
Sample | Test Type | Callable | Poor Mapping | Histogram |
---|---|---|---|---|
4KE57JNV | Big Y | 9,528,605 | 6,284,314 | |
4KE57JNV | Big Y 700 | 14,511,256 | 7,825,080 | |
4KE57JNV | All Reads | 14,869,487 | 8,120,772 | |
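
One way to quantify the overlapping coverage between two runs is to compare their callable intervals directly. The sketch below is illustrative only: it assumes each run's callable regions are available as half-open `(start, end)` intervals on the same contig, and the helper names `merge`, `intersect`, and `total_bases` are hypothetical. It does not reproduce the combined figures in the tables, which come from re-evaluating the merged reads rather than from interval arithmetic.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single contig


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions present in both interval sets."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def total_bases(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


# Hypothetical callable intervals for two runs of the same donor.
run_a = [(0, 100), (200, 300)]
run_b = [(50, 250)]
shared = intersect(run_a, run_b)
union = merge(run_a + run_b)
print("callable in both:", total_bases(shared))
print("callable in either:", total_bases(union))
```
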
Additional resources
FamilyTree DNA has started a page on ISOGG's Wiki, the FTT SNP Index. We will use the same coordinate naming for ease of cross-reference.
YSEQ has created a hg38ToCP086569 liftover chain. This should allow tools like liftOver and CrossMap to map the known GRCh38 variants into CP086569.1's coordinate frame.
Liftover Chains
We have created a set of liftover chains for internal development. They are not fully accurate, but allow early experimentation with comparing sites between Y-DNA coordinate spaces.
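
To illustrate what a chain file encodes, here is a minimal sketch that maps a single 0-based position through UCSC chain-format text. It assumes the standard layout (a `chain` header line followed by `size dt dq` block lines) and a forward-strand source contig. The `lift_position` function and the example chain are hypothetical. Real tools such as liftOver and CrossMap handle intervals, multiple mappings, and many edge cases that this sketch ignores.

```python
from typing import Iterable, Optional, Tuple


def lift_position(
    chain_lines: Iterable[str], contig: str, pos: int
) -> Optional[Tuple[str, int, str]]:
    """Map a 0-based position on `contig` to the destination assembly.

    Returns (name, position, strand), or None when the position falls in an
    unaligned gap or outside every chain. Assumes the source strand is '+'.
    """
    t_name = q_name = q_strand = ""
    t_pos = q_pos = q_size = 0
    for line in chain_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd
            #       qName qSize qStrand qStart qEnd id
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_size = fields[7], int(fields[8])
            q_strand, q_pos = fields[9], int(fields[10])
            continue
        size = int(fields[0])  # length of an ungapped aligned block
        if t_name == contig and t_pos <= pos < t_pos + size:
            mapped = q_pos + (pos - t_pos)
            if q_strand == "-":
                # Minus-strand coordinates count from the reverse complement.
                mapped = q_size - 1 - mapped
            return q_name, mapped, q_strand
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # gaps before the next block: dt, dq
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return None


# Hypothetical chain: two aligned blocks separated by a gap.
EXAMPLE_CHAIN = [
    "chain 1000 chrY 1000 + 100 400 CP086569.1 1200 + 150 460 1",
    "100 50 60",
    "150",
]
print(lift_position(EXAMPLE_CHAIN, "chrY", 120))
print(lift_position(EXAMPLE_CHAIN, "chrY", 220))
```

A position that lands in a gap returns `None`, which is one reason lifted site lists are usually shorter than their source.
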
Known SNP Positions
These VCFs contain all the sites from YBrowse.org that can be positioned on CM034974.1 or CP086569.2. The INFO field indicates when the bases need to be swapped or reverse complemented.
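
As a sketch of how a consumer might apply those INFO annotations, the following adjusts a record's REF and ALT alleles. The flag names `SWAP` and `REVCOMP` are placeholders invented for this example; the actual INFO keys in the VCFs may differ and should be read from the file headers.

```python
from typing import Dict, Tuple, Union

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into key/value pairs; bare keys become True."""
    parsed: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        parsed[key] = value if value else True
    return parsed


def adjust_alleles(ref: str, alt: str, info: str) -> Tuple[str, str]:
    """Apply the hypothetical REVCOMP and SWAP flags to a REF/ALT pair."""
    flags = parse_info(info)
    if "REVCOMP" in flags:  # assumed key name
        ref, alt = reverse_complement(ref), reverse_complement(alt)
    if "SWAP" in flags:  # assumed key name
        ref, alt = alt, ref
    return ref, alt


print(adjust_alleles("A", "G", "SWAP"))
print(adjust_alleles("AC", "G", "REVCOMP"))
```
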