Telomere-to-Telomere Y Chromosome Experiments

The Telomere-to-Telomere (T2T) consortium released the full assembly of NA24385's Y chromosome. This presents an opportunity to access more of the regions left undefined in GRCh38. To that end, we created a reference that combines this Y assembly with the previously released CHM13 assembly. We then used bwa-mem2 to align a WGS sample from each major subclade against the combined reference and plotted a callable loci histogram for each.

Zimin et al. (2021), "A reference-quality, fully annotated genome from a Puerto Rican individual," added CM034974.1 for HG01243. This individual is R1b-DF27 and therefore closer to the GRCh38 reference.
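
The Callable, Poor Mapping, Low Coverage, and No Coverage figures reported in the tables on this page are per-state base totals. As a minimal sketch of how such totals can be tallied, the Python below assumes the callable loci step writes a BED-like file with the state name in the fourth column, as GATK 3's CallableLoci does. The tool actually used here is not stated, and the `summarize_callable_bed` function and the sample intervals are hypothetical.

```python
from collections import Counter
import io


def summarize_callable_bed(lines, contig=None):
    """Sum bases per state from BED-like rows: contig, start, end, state."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end, state = line.split()[:4]
        if contig is not None and chrom != contig:
            continue
        # BED intervals are half-open, so the length is end - start.
        totals[state] += int(end) - int(start)
    return totals


# Made-up intervals for illustration only; not real results.
example = io.StringIO(
    "CP086569.1\t0\t100\tNO_COVERAGE\n"
    "CP086569.1\t100\t350\tCALLABLE\n"
    "CP086569.1\t350\t400\tPOOR_MAPPING_QUALITY\n"
    "CP086569.1\t400\t500\tCALLABLE\n"
)
totals = summarize_callable_bed(example, contig="CP086569.1")
for state, bases in sorted(totals.items()):
    print(f"{state}\t{bases:,}")
```

Passing `contig` restricts the tally to the Y accession of interest when the file covers a whole combined reference.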

Short-read WGS Coverage Histograms
Sample        Approximate HG  Test Type    Accession   Callable    Poor Mapping
WGS1213       A00             30x WGS150   CP086569.1  15,254,009  21,161,595
HG01890       A0              40x WGS150   CP086569.1  16,615,534  19,895,879
HG02613       A1a             40x WGS150   CP086569.1  16,586,917  29,939,569
SAMEA3302894  A1b1            40x WGS100   CP086569.1  15,345,990  33,570,375
HG03225       B-FT315355      40x WGS150   CP086569.1  16,704,898  29,629,714
HG00628       C-MF2235        40x WGS150   CP086569.1  16,923,490  24,421,698
NA19004       D-CTS131        40x WGS150   CP086569.1  16,982,847  28,296,213
NA20348       E2b-CTS1441     40x WGS150   CP086569.1  16,907,287  30,656,243
HG02040       F1              40x WGS150   CP086569.1  16,970,015  35,699,948
NA20870       G-Z3043         40x WGS150   CP086569.1  17,093,678  24,008,773
HG02686       H-Z34945        40x WGS150   CP086569.1  17,152,735  27,197,525
HG00360       I2a-CTS695      40x WGS150   CP086569.1  17,192,655  32,083,004
NA24385       J1a             100x WGS250  CP086569.1  19,964,587  42,398,216
HG01258       J1b-YP1273      40x WGS150   CP086569.1  17,403,602  28,031,420
HG03790       L-M2355         40x WGS150   CP086569.1  17,030,927  32,256,361
SAMEA3302892  M-Z42281        40x WGS100   CP086569.1  15,618,252  29,181,429
NA18748       N-CTS582        40x WGS150   CP086569.1  16,997,997  29,733,540
NA18647       O1a-MF643967    40x WGS150   CP086569.1  17,056,721  34,581,137
HG03914       Q-YP748         40x WGS150   CP086569.1  17,150,579  24,771,588
WGS229        R1b-CTS4466     30x WGS150   CM034974.1  16,873,503  23,373,816
                                           CP086569.2  16,650,411  23,674,644
SAMEA3302626  S-Z33752        50x WGS100   CP086569.1  15,605,348  35,614,665
NA20758       T-CTS54         40x WGS150   CP086569.1  17,096,414  24,671,518

Coverage Histograms
Sample                                 Subclade     Test Type             Total Reads  Accession         Y Reads    Callable    Poor Mapping  Low Coverage  No Coverage  Homozygous SNPs
HG005 aka NA24143                      D-CTS932     PacBio                9,730,321    CM034974.1        78,098     21,033,140  15,500,660    14,131,990    15,067,964   24,755
                                                                                       CP086569.2        78,613     20,784,173  15,571,397    15,083,428    11,021,031   19,815
HG006 aka NA24594                      D-CTS932     PacBio                7,592,384    CM034974.1        68,487     21,150,279  18,512,662    14,825,451    7,991,795    37,703
                                                                                       CP086569.2        68,935     20,922,984  19,932,169    14,581,211    7,023,665    28,665
HG02486                                E            PacBio                6,766,931    CM034974.1        63,201     21,473,376  18,777,624    14,558,065    7,671,122    39,626
                                                                                       CP086569.2        63,380     20,834,760  20,298,306    15,009,225    6,317,738    33,564
HG02572                                E1a          PacBio                4,959,945    CM034974.1        41,808     21,520,972  16,183,558    15,596,949    9,178,708    41,808
                                                                                       CP086569.2        47,393     20,796,414  17,744,846    15,608,384    8,310,385    31,937
HG03098                                E1a          PacBio                5,720,080    CM034974.1        52,376     21,862,244  18,054,828    14,651,183    7,911,932    40,996
                                                                                       CP086569.2        52,377     21,053,190  19,022,551    16,070,689    6,313,599    32,677
HG01106                                E1b          PacBio                6,660,104    CM034974.1        73,992     21,010,710  24,938,293    11,367,766    5,163,418    52,698
                                                                                       CP086569.2        74,681     20,823,985  26,344,273    11,316,395    3,975,376    42,951
HG01109                                E1b          PacBio                4,795,279    CM034974.1        40,277     21,541,073  15,300,209    16,121,354    9,517,551    34,979
                                                                                       CP086569.2        40,389     21,033,765  16,392,387    16,236,877    8,797,000    30,245
HG02055                                E1b          PacBio                6,390,548    CM034974.1        56,821     21,681,642  18,068,529    14,891,146    7,838,870    41,986
                                                                                       CP086569.2        56,916     20,806,458  19,510,255    15,850,014    6,293,302    35,737
HG02145                                E1b          PacBio                6,253,441    CM034974.1        26,558     20,844,546  8,883,824     20,755,633    11,996,184   28,246
                                                                                       CP086569.2        26,765     20,156,734  9,520,859     22,151,520    10,630,916   24,267
HG02717                                E1b          PacBio                8,437,382    CM034974.1        72,150     21,477,696  18,606,132    14,551,331    7,845,028    39,173
                                                                                       CP086569.2        72,524     20,656,240  20,108,707    15,436,045    6,259,037    31,389
HG03579 submitted by open-genomes.org  G-Z1911      PacBio                             PR1/CM034974.1    68,238     23,122,656  20,969,062    13,337,800    5,050,669    50,946
                                                                                       PR1/CP086569.1    68,034     22,501,167  22,919,714    12,782,183    4,253,768    40,413
HG03579                                G-Z1911      PacBio                7,835,850    CHM13/CM034974.1  67,987     23,133,136  21,041,524    13,288,135    5,017,392    51,154
                                                                                       CHM13/CP086569.2  67,971     22,416,564  22,808,334    12,870,510    4,364,621    40,330
HG002 aka NA24385                      J1a          PacBio                8,963,661    CM034974.1        75,299     28,399,685  15,243,611    12,641,758    6,195,133    33,063
                                                                                       CP086569.2        76,036     39,426,936  12,080,876    10,779,593    172,624      7
HG003 aka NA24149                      J1a          PacBio                12,896,867   CM034974.1        116,815    30,526,251  21,355,454    7,059,384     3,526,251    56,491
                                                                                       CP086569.2        119,185    44,299,910  13,092,169    4,973,221     94,729       72
HG01258                                J1b-YP1273   PacBio                5,736,451    CM034974.1        47,067     25,546,746  16,466,468    12,512,788    7,954,185    49,987
                                                                                       CP086569.2        47,389     27,046,827  16,589,688    11,116,730    7,706,784    22,524
HG00621                                O2a          PacBio                5,709,327    CM034974.1        46,995     22,667,644  17,123,691    13,766,385    8,922,467    46,921
                                                                                       CP086569.2        47,240     22,433,710  17,801,371    14,000,663    8,224,285    35,117
HG00673                                O2a          PacBio                6,030,186    CM034974.1        57,411     22,632,708  19,530,346    12,298,086    8,019,047    48,348
                                                                                       CP086569.2        57,804     22,546,893  20,738,629    12,572,166    6,602,341    35,526
HG01928                                Q            PacBio                4,738,407    CM034974.1        39,425     23,613,031  16,086,870    14,551,296    8,228,990    46,177
                                                                                       CP086569.2        39,810     22,588,475  17,930,022    14,586,712    7,354,820    39,467
HG01952                                Q            PacBio                6,099,801    CM034974.1        49,783     23,627,288  16,767,891    13,803,453    8,281,555    34,126
                                                                                       CP086569.2        50,461     22,455,141  18,527,028    13,793,919    7,683,941    36,330
HG03492                                R1a          PacBio                5,160,141    CM034974.1        39,269     24,262,986  14,316,214    16,025,035    7,875,952    34,126
                                                                                       CP086569.2        39,821     22,660,521  16,093,553    17,105,546    6,600,409    32,869
HG01243                                R1b-DF27     PacBio                5,584,056    CM034974.1        41,235     29,388,453  9,526,987     9,691,121     13,873,626   11,982
                                                                                       CP086569.2        40,703     22,524,984  13,757,202    13,478,913    12,698,930   28,965
HG01442                                R1b-S227     PacBio                6,591,165    CM034974.1        52,923     25,758,400  15,339,027    12,963,387    8,419,373    24,059
                                                                                       CP086569.2        53,312     22,163,767  18,461,738    14,584,180    7,250,344    7,250,344
HG01358                                R1b-U152     PacBio                6,510,450    CM034974.1        56,383     26,023,197  16,396,753    12,312,905    7,747,332    27,154
                                                                                       CP086569.2        56,826     22,655,326  19,507,970    14,022,105    6,274,628    36,161
9LKSM                                  R1b-CTS4466  Chromium Linked-Read  733,125,588  CP086569.1        6,056,810  22,523,320  14,984,385    22,512,802    2,436,325    5,347

Combination of Multiple Sequencing Runs

One future consideration for YDNA-Warehouse is combining sequencing alignments into a unified profile. Possible advantages include greater overlapping coverage between runs and the ability to leverage different read lengths. This report takes two 30x WGS results from different vendors and combines them with a Y Elite result from the same donor.
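
How pooling reads can help is easiest to see with depth alone. The toy sketch below sums per-position depth from two runs and counts the positions that reach a minimum depth. The depth lists are made up, the `callable_positions` helper is hypothetical, and the threshold of 4 is an assumed value rather than the setting used for this report. It deliberately ignores mapping quality, which also decides whether a position ends up callable.

```python
def callable_positions(depths, min_depth=4):
    """Count positions whose depth reaches min_depth (an assumed threshold)."""
    return sum(1 for d in depths if d >= min_depth)


# Hypothetical per-position depths over the same six sites in two runs.
run_a = [5, 2, 0, 3, 6, 1]
run_b = [1, 3, 0, 2, 4, 3]

# Pooling reads from both runs adds their depths position by position.
combined = [a + b for a, b in zip(run_a, run_b)]

print(callable_positions(run_a))     # 2
print(callable_positions(run_b))     # 1
print(callable_positions(combined))  # 5
```

A site with no reads in either run stays uncovered after pooling, so combining runs helps thinly covered regions rather than regions no run reaches.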

Combined Coverage Histograms
Sample          Test Type   Callable    Poor Mapping
WGS229          30x WGS150  16,843,063  23,364,582
60820188481374  30x WGS150  16,826,633  23,119,073
                60x WGS150  16,575,004  36,830,821
B6564           Y Elite     14,936,429  23,749,094
                All Reads   15,998,221  40,630,116

4KE57JNV is the closest Y-DNA match to the donor of the sample above. Both generations of his Big Y BAMs are included here for comparison.

Big Y Combined Coverage Histograms
Sample    Test Type  Callable    Poor Mapping
4KE57JNV  Big Y      9,528,605   6,284,314
4KE57JNV  Big Y 700  14,511,256  7,825,080
          All Reads  14,869,487  8,120,772

Additional resources

FamilyTree DNA has started an FTT SNP Index page on ISOGG's Wiki. We will use the same coordinate naming for ease of cross-reference.

YSEQ has created an hg38ToCP086569 liftover chain. This should allow tools like liftOver and CrossMap to map known GRCh38 variants into CP086569.1's coordinate frame.
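
To make the coordinate mapping concrete, here is a minimal pure-Python sketch of what a chain-based lift does for a single position. It is not how liftOver or CrossMap are implemented and is not a replacement for them. The chain text is synthetic, the `parse_chain` and `lift` helpers are hypothetical, and the contig names `chrY` and `CP086569.1` are assumptions about how a real chain file labels the two assemblies. Positions are 0-based.

```python
def parse_chain(text):
    """Parse UCSC chain text into a list of ungapped aligned blocks."""
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_name, t_strand, t_pos = fields[2], fields[4], int(fields[5])
            q_name, q_size = fields[7], int(fields[8])
            q_strand, q_pos = fields[9], int(fields[10])
            if t_strand != "+":
                raise ValueError("this sketch only handles a '+' source strand")
            continue
        size = int(fields[0])
        blocks.append((t_name, t_pos, t_pos + size, q_name, q_size, q_strand, q_pos))
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # "size dt dq": skip the gaps on each side
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return blocks


def lift(blocks, contig, pos):
    """Return (contig, pos, strand) in the destination frame, or None if unmapped."""
    for t_name, t_start, t_end, q_name, q_size, q_strand, q_start in blocks:
        if t_name == contig and t_start <= pos < t_end:
            q = q_start + (pos - t_start)
            if q_strand == "-":
                # Minus-strand chain coordinates count from the far end of the contig.
                q = q_size - 1 - q
            return q_name, q, q_strand
    return None


# Synthetic chain: two aligned blocks separated by a gap on both sides.
CHAIN = """chain 1000 chrY 1000 + 100 400 CP086569.1 1200 + 150 460 1
100 50 60
150
"""
blocks = parse_chain(CHAIN)
print(lift(blocks, "chrY", 120))  # ('CP086569.1', 170, '+')
print(lift(blocks, "chrY", 210))  # None: falls in the gap between blocks
print(lift(blocks, "chrY", 260))  # ('CP086569.1', 320, '+')
```

The linear scan keeps the sketch short; a real chain file has many blocks, so an interval index or binary search would be the usual choice.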

Liftover Chains

We have created a set of liftover chains for internal development. They are not fully accurate, but allow early experimentation with comparing sites between Y-DNA coordinate spaces.

HG38 Chromosome Y (CM000686.2) to CP086569.1 Over Chain
HG38 Chromosome Y (CM000686.2) to CP086569.2 Over Chain
HG38 Chromosome Y (CM000686.2) to CM034974.1 Over Chain
CP086569.1 to HG38 Chromosome Y (CM000686.2) Over Chain
CP086569.2 to HG38 Chromosome Y (CM000686.2) Over Chain
CM034974.1 to HG38 Chromosome Y (CM000686.2) Over Chain
CP086569.1 to CM034974.1 Over Chain
CP086569.1 to CP086569.2 Over Chain
CP086569.2 to CM034974.1 Over Chain
CP086569.2 to CP086569.1 Over Chain
CM034974.1 to CP086569.1 Over Chain
CM034974.1 to CP086569.2 Over Chain

Known SNP Positions

These VCFs contain all the sites from YBrowse.org that can be positioned on CM034974.1 or CP086569.2. The INFO field indicates when the bases need to be swapped or reverse complemented.

CM034974.1 Known Y SNPs
CP086569.2 Known Y SNPs
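
As an illustration of what swapping and reverse complementing mean for a site's alleles, the sketch below applies both operations to a REF/ALT pair. The `swap` and `revcomp` arguments are hypothetical stand-ins; the actual INFO keys used in these VCFs are not given here and should be read from the VCF header.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adjust_alleles(ref, alt, swap=False, revcomp=False):
    """Apply strand and orientation fixes to one REF/ALT pair.

    swap and revcomp are hypothetical flags standing in for whatever
    the VCF's INFO field actually records.
    """
    if revcomp:
        ref, alt = reverse_complement(ref), reverse_complement(alt)
    if swap:
        ref, alt = alt, ref
    return ref, alt


print(adjust_alleles("A", "G", revcomp=True))             # ('T', 'C')
print(adjust_alleles("A", "G", swap=True))                # ('G', 'A')
print(adjust_alleles("A", "G", swap=True, revcomp=True))  # ('C', 'T')
```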