Telomere-to-Telomere Y Chromosome Experiments

This page is an archived record of early 2022 experiments with the NA24385 and HG01243 gapless Y chromosomes. It will not be updated.

The Telomere-to-Telomere (T2T) consortium released the full assembly of NA24385's Y chromosome. This presents an opportunity to reach more of the regions left undefined in GRCh38. To that end, we created a reference that combines the new Y chromosome with the previously released CHM13 assembly. The combined reference was then used with bwa-mem2 to align a WGS sample from each major subclade, and the alignments were used to plot callable loci histograms.
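
The categories reported in the tables on this page (callable, poor mapping, low coverage, no coverage) can be illustrated with a minimal Python sketch. The thresholds and function names below are assumptions chosen for illustration; they are not the settings or the tool used to produce the numbers on this page.

```python
from collections import Counter

# Illustrative thresholds only (assumptions, not the values behind the tables).
MIN_DEPTH = 4                 # fewer well-mapped reads than this is "low coverage"
MIN_MAPQ = 10                 # reads below this mapping quality count as poorly mapped
MAX_LOW_MAPQ_FRACTION = 0.1   # tolerated share of poorly mapped reads at a site


def classify_site(mapqs):
    """Classify one reference position from the MAPQ values of the reads covering it."""
    depth = len(mapqs)
    if depth == 0:
        return "NO_COVERAGE"
    low_mapq = sum(1 for q in mapqs if q < MIN_MAPQ)
    if low_mapq / depth > MAX_LOW_MAPQ_FRACTION:
        return "POOR_MAPPING"
    if depth - low_mapq < MIN_DEPTH:
        return "LOW_COVERAGE"
    return "CALLABLE"


def summarize(sites):
    """Count how many positions fall into each category."""
    return Counter(classify_site(mapqs) for mapqs in sites)


if __name__ == "__main__":
    # Four toy positions: uncovered, well covered, ambiguous mapping, thin coverage.
    toy_sites = [[], [60, 60, 60, 60, 60], [0, 0, 60, 60], [60, 60]]
    print(summarize(toy_sites))
```

Summing the per-category counts over every position of the Y chromosome would give one row of a histogram table like the ones below.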

Zimin et al. 2021, "A reference-quality, fully annotated genome from a Puerto Rican individual", added CM034974.1 for HG01243. This individual is R1b-DF27 and is therefore closer to the GRCh38 reference.

Short-read WGS Coverage Histograms
Sample	Approximate Haplogroup	Test Type	Accession	Callable (bp)	Poor Mapping (bp)
WGS1213	A00	30x WGS150	CP086569.1	15,254,009	21,161,595
HG01890	A0	40x WGS150	CP086569.1	16,615,534	19,895,879
HG02613	A1a	40x WGS150	CP086569.1	16,586,917	29,939,569
SAMEA3302894	A1b1	40x WGS100	CP086569.1	15,345,990	33,570,375
HG03225	B-FT315355	40x WGS150	CP086569.1	16,704,898	29,629,714
HG00628	C-MF2235	40x WGS150	CP086569.1	16,923,490	24,421,698
NA19004	D-CTS131	40x WGS150	CP086569.1	16,982,847	28,296,213
NA20348	E2b-CTS1441	40x WGS150	CP086569.1	16,907,287	30,656,243
HG02040	F1	40x WGS150	CP086569.1	16,970,015	35,699,948
NA20870	G-Z3043	40x WGS150	CP086569.1	17,093,678	24,008,773
HG02686	H-Z34945	40x WGS150	CP086569.1	17,152,735	27,197,525
HG00360	I2a-CTS695	40x WGS150	CP086569.1	17,192,655	32,083,004
NA24385	J1a	100x WGS250	CP086569.1	19,964,587	42,398,216
HG01258	J1b-YP1273	40x WGS150	CP086569.1	17,403,602	28,031,420
HG03790	L-M2355	40x WGS150	CP086569.1	17,030,927	32,256,361
SAMEA3302892	M-Z42281	40x WGS100	CP086569.1	15,618,252	29,181,429
NA18748	N-CTS582	40x WGS150	CP086569.1	16,997,997	29,733,540
NA18647	O1a-MF643967	40x WGS150	CP086569.1	17,056,721	34,581,137
HG03914	Q-YP748	40x WGS150	CP086569.1	17,150,579	24,771,588
WGS229	R1b-CTS4466	30x WGS150	CM034974.1	16,873,503	23,373,816
WGS229	R1b-CTS4466	30x WGS150	CP086569.2	16,650,411	23,674,644
SAMEA3302626	S-Z33752	50x WGS100	CP086569.1	15,605,348	35,614,665
NA20758	T-CTS54	40x WGS150	CP086569.1	17,096,414	24,671,518

Coverage Histograms
Sample	Subclade	Test Type	Total Reads	Accession	Y Reads	Callable (bp)	Poor Mapping (bp)	Low Coverage (bp)	No Coverage (bp)	Homozygous SNPs
HG005 aka NA24143	D-CTS932	PacBio	9,730,321	CM034974.1	78,098	21,033,140	15,500,660	14,131,990	15,067,964	24,755
				CP086569.2	78,613	20,784,173	15,571,397	15,083,428	11,021,031	19,815
HG006 aka NA24594	D-CTS932	PacBio	7,592,384	CM034974.1	68,487	21,150,279	18,512,662	14,825,451	7,991,795	37,703
				CP086569.2	68,935	20,922,984	19,932,169	14,581,211	7,023,665	28,665
HG02486	E	PacBio	6,766,931	CM034974.1	63,201	21,473,376	18,777,624	14,558,065	7,671,122	39,626
				CP086569.2	63,380	20,834,760	20,298,306	15,009,225	6,317,738	33,564
HG02572	E1a	PacBio	4,959,945	CM034974.1	41,808	21,520,972	16,183,558	15,596,949	9,178,708	41,808
				CP086569.2	47,393	20,796,414	17,744,846	15,608,384	8,310,385	31,937
HG03098	E1a	PacBio	5,720,080	CM034974.1	52,376	21,862,244	18,054,828	14,651,183	7,911,932	40,996
				CP086569.2	52,377	21,053,190	19,022,551	16,070,689	6,313,599	32,677
HG01106	E1b	PacBio	6,660,104	CM034974.1	73,992	21,010,710	24,938,293	11,367,766	5,163,418	52,698
				CP086569.2	74,681	20,823,985	26,344,273	11,316,395	3,975,376	42,951
HG01109	E1b	PacBio	4,795,279	CM034974.1	40,277	21,541,073	15,300,209	16,121,354	9,517,551	34,979
				CP086569.2	40,389	21,033,765	16,392,387	16,236,877	8,797,000	30,245
HG02055	E1b	PacBio	6,390,548	CM034974.1	56,821	21,681,642	18,068,529	14,891,146	7,838,870	41,986
				CP086569.2	56,916	20,806,458	19,510,255	15,850,014	6,293,302	35,737
HG02145	E1b	PacBio	6,253,441	CM034974.1	26,558	20,844,546	8,883,824	20,755,633	11,996,184	28,246
				CP086569.2	26,765	20,156,734	9,520,859	22,151,520	10,630,916	24,267
HG02717	E1b	PacBio	8,437,382	CM034974.1	72,150	21,477,696	18,606,132	14,551,331	7,845,028	39,173
				CP086569.2	72,524	20,656,240	20,108,707	15,436,045	6,259,037	31,389
HG03579 submitted by open-genomes.org	G-Z1911	PacBio		PR1/CM034974.1	68,238	23,122,656	20,969,062	13,337,800	5,050,669	50,946
				PR1/CP086569.1	68,034	22,501,167	22,919,714	12,782,183	4,253,768	40,413
HG03579	G-Z1911	PacBio	7,835,850	CHM13/CM034974.1	67,987	23,133,136	21,041,524	13,288,135	5,017,392	51,154
				CHM13/CP086569.2	67,971	22,416,564	22,808,334	12,870,510	4,364,621	40,330
HG002 aka NA24385	J1a	PacBio	8,963,661	CM034974.1	75,299	28,399,685	15,243,611	12,641,758	6,195,133	33,063
				CP086569.2	76,036	39,426,936	12,080,876	10,779,593	172,624	7
HG003 aka NA24149	J1a	PacBio	12,896,867	CM034974.1	116,815	30,526,251	21,355,454	7,059,384	3,526,251	56,491
				CP086569.2	119,185	44,299,910	13,092,169	4,973,221	94,729	72
HG01258	J1b-YP1273	PacBio	5,736,451	CM034974.1	47,067	25,546,746	16,466,468	12,512,788	7,954,185	49,987
				CP086569.2	47,389	27,046,827	16,589,688	11,116,730	7,706,784	22,524
HG00621	O2a	PacBio	5,709,327	CM034974.1	46,995	22,667,644	17,123,691	13,766,385	8,922,467	46,921
				CP086569.2	47,240	22,433,710	17,801,371	14,000,663	8,224,285	35,117
HG00673	O2a	PacBio	6,030,186	CM034974.1	57,411	22,632,708	19,530,346	12,298,086	8,019,047	48,348
				CP086569.2	57,804	22,546,893	20,738,629	12,572,166	6,602,341	35,526
HG01928	Q	PacBio	4,738,407	CM034974.1	39,425	23,613,031	16,086,870	14,551,296	8,228,990	46,177
				CP086569.2	39,810	22,588,475	17,930,022	14,586,712	7,354,820	39,467
HG01952	Q	PacBio	6,099,801	CM034974.1	49,783	23,627,288	16,767,891	13,803,453	8,281,555	34,126
				CP086569.2	50,461	22,455,141	18,527,028	13,793,919	7,683,941	36,330
HG03492	R1a	PacBio	5,160,141	CM034974.1	39,269	24,262,986	14,316,214	16,025,035	7,875,952	34,126
				CP086569.2	39,821	22,660,521	16,093,553	17,105,546	6,600,409	32,869
HG01243	R1b-DF27	PacBio	5,584,056	CM034974.1	41,235	29,388,453	9,526,987	9,691,121	13,873,626	11,982
				CP086569.2	40,703	22,524,984	13,757,202	13,478,913	12,698,930	28,965
HG01442	R1b-S227	PacBio	6,591,165	CM034974.1	52,923	25,758,400	15,339,027	12,963,387	8,419,373	24,059
				CP086569.2	53,312	22,163,767	18,461,738	14,584,180	7,250,344	7,250,344
HG01358	R1b-U152	PacBio	6,510,450	CM034974.1	56,383	26,023,197	16,396,753	12,312,905	7,747,332	27,154
				CP086569.2	56,826	22,655,326	19,507,970	14,022,105	6,274,628	36,161
9LKSM	R1b-CTS4466	Chromium Linked-Read	733,125,588	CP086569.1	6,056,810	22,523,320	14,984,385	22,512,802	2,436,325	5,347

Combination of Multiple Sequencing Runs

One future consideration for YDNA-Warehouse is to begin combining sequencing alignments into a unified profile. Possible advantages include more overlapping coverage between runs and the ability to leverage different read lengths. This report takes two 30x WGS results from different vendors and combines them with a Y Elite test from the same donor.

Combined Coverage Histograms
Sample	Test Type	Callable (bp)	Poor Mapping (bp)
WGS229	30x WGS150	16,843,063	23,364,582
60820188481374	30x WGS150	16,826,633	23,119,073
	60x WGS150	16,575,004	36,830,821
B6564	Y Elite	14,936,429	23,749,094
	All Reads	15,998,221	40,630,116
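
The idea of pooling runs can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: each run is represented as a hypothetical mapping from position to read depth, and the minimum depth of 4 is an arbitrary example value, not the threshold used for the table above.

```python
def combine_depths(*runs):
    """Sum per-position read depth across runs; each run maps position -> depth."""
    combined = {}
    for run in runs:
        for pos, depth in run.items():
            combined[pos] = combined.get(pos, 0) + depth
    return combined


def positions_at_least(depths, min_depth):
    """Return the sorted positions whose depth reaches min_depth."""
    return sorted(pos for pos, depth in depths.items() if depth >= min_depth)


if __name__ == "__main__":
    run_a = {100: 3, 101: 1, 102: 0}   # toy depths from one sequencing run
    run_b = {100: 2, 101: 2, 103: 5}   # toy depths from a second run
    pooled = combine_depths(run_a, run_b)
    print(positions_at_least(run_a, 4))    # positions passing in run A alone
    print(positions_at_least(run_b, 4))    # positions passing in run B alone
    print(positions_at_least(pooled, 4))   # positions passing after pooling
```

Depth is only one of the inputs. In the table above, the pooled 60x row has fewer callable bases and more poorly mapped bases than either 30x run on its own, so pooling does not automatically raise the callable count.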

4KE57JNV is the closest Y-DNA match to the donor of the sample above. Both generations of his Big Y BAMs are included here for comparison.

Big Y Combined Coverage Histograms
Sample	Test Type	Callable (bp)	Poor Mapping (bp)
4KE57JNV	Big Y	9,528,605	6,284,314
4KE57JNV	Big Y 700	14,511,256	7,825,080
	All Reads	14,869,487	8,120,772

Additional resources

FamilyTreeDNA has started an FTT SNP Index page on ISOGG's Wiki. We will use the same coordinate naming for ease of cross-reference.

YSEQ has created an hg38ToCP086569 liftover chain. This should allow tools like liftOver and CrossMap to map known GRCh38 variants into CP086569.1's coordinate frame.
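
To show what a chain file encodes, here is a minimal Python sketch that parses UCSC chain text and lifts a single 0-based position. The chain below is a made-up toy with invented coordinates, not an excerpt from the YSEQ file, and the function names are hypothetical. For real work, liftOver or CrossMap remain the appropriate tools.

```python
# Toy chain text with invented numbers, for illustration only.
TOY_CHAIN = """\
chain 1000 chrY 1000 + 100 400 CP086569.1 1200 + 150 460 1
100 50 60
150

chain 500 chrY 1000 + 500 600 CP086569.1 1200 - 0 100 2
100
"""


def parse_chain(text):
    """Parse chain text into ungapped blocks of source and destination coordinates."""
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_size = fields[7], int(fields[8])
            q_strand, q_pos = fields[9], int(fields[10])
            continue
        size = int(fields[0])
        blocks.append((t_name, t_pos, t_pos + size, q_name, q_pos, q_strand, q_size))
        if len(fields) == 3:
            # "size dt dq": skip the unaligned gap on each side before the next block.
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return blocks


def lift(blocks, chrom, pos):
    """Lift a 0-based position; return (name, pos, strand) or None if it falls in a gap."""
    for t_name, t_start, t_end, q_name, q_start, q_strand, q_size in blocks:
        if t_name == chrom and t_start <= pos < t_end:
            q_pos = q_start + (pos - t_start)
            if q_strand == "-":
                # Minus-strand coordinates count from the end of the destination sequence.
                q_pos = q_size - 1 - q_pos
            return q_name, q_pos, q_strand
    return None


if __name__ == "__main__":
    toy_blocks = parse_chain(TOY_CHAIN)
    for position in (120, 220, 260, 500):
        print(position, lift(toy_blocks, "chrY", position))
```

A position that falls between aligned blocks has no counterpart in the destination assembly, which is why lifted variant sets are usually smaller than the input.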

Liftover Chains

We have created a set of liftover chains for internal development. They are not fully accurate, but allow early experimentation with comparing sites between Y-DNA coordinate spaces.

HG38 Chromosome Y (CM000686.2) to CP086569.1 Over Chain

HG38 Chromosome Y (CM000686.2) to CP086569.2 Over Chain

HG38 Chromosome Y (CM000686.2) to CM034974.1 Over Chain

CP086569.1 to HG38 Chromosome Y (CM000686.2) Over Chain

CP086569.2 to HG38 Chromosome Y (CM000686.2) Over Chain

CM034974.1 to HG38 Chromosome Y (CM000686.2) Over Chain

CP086569.1 to CM034974.1 Over Chain

CP086569.1 to CP086569.2 Over Chain

CP086569.2 to CM034974.1 Over Chain

CP086569.2 to CP086569.1 Over Chain

CM034974.1 to CP086569.1 Over Chain

CM034974.1 to CP086569.2 Over Chain

Known SNP Positions

These VCFs contain all the sites from YBrowse.org that can be positioned on CM034974.1 or CP086569.2. The INFO field indicates when the bases need to be swapped or reverse complemented.

CM034974.1 Known Y SNPs

CP086569.2 Known Y SNPs
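
As a rough illustration of what swapping or reverse complementing the bases means for a site, the Python sketch below applies both operations to a REF/ALT pair. The INFO flag names `REVCOMP` and `SWAP` are hypothetical placeholders; the actual keys used in these VCFs are not documented on this page and should be checked in the file headers.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def normalize_alleles(ref, alt, info):
    """Apply strand and allele-order adjustments signalled in a VCF INFO string."""
    flags = set(info.split(";"))
    if "REVCOMP" in flags:   # hypothetical flag name: site lies on the opposite strand
        ref, alt = reverse_complement(ref), reverse_complement(alt)
    if "SWAP" in flags:      # hypothetical flag name: reference carries the derived allele
        ref, alt = alt, ref
    return ref, alt


if __name__ == "__main__":
    print(normalize_alleles("A", "G", "."))
    print(normalize_alleles("A", "G", "REVCOMP"))
    print(normalize_alleles("AC", "G", "REVCOMP;SWAP"))
```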