It is surprising that the genetic variation in the upstream regions is still overlooked, given that the leader regions (L-PART1, L-PART2) are now frequently used as primer binding sites in immunoglobulin repertoire library preparation protocols (31,48,49). The reason for the existence of upstream polymorphisms is unclear, but conceivably such polymorphisms might have functional relevance by influencing the stability of the mRNA or by affecting the binding of regulatory proteins (50,51).

[...] novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-REGION in the 5′UTR, L-PART1 and L-PART2 sequences, and found that identical V-REGION alleles can differ in upstream sequences. Thus, we have identified a large genetic variation not only in the V-REGION but also in the upstream sequences of IGHV genes. Our findings provide a new perspective for annotating immunoglobulin repertoire sequencing data.

INTRODUCTION

Immunoglobulins are an important part of the adaptive immune system. They exert their function either as the antigen receptor of B cells, which is essential for the antigen presentation capacity of these cells (1), or as secreted antibodies that survey the extracellular fluids of the body. Immunoglobulins can bind a plethora of antigen epitopes via their paratopes, which are composed of combinations of heavy and light chain variable regions. A huge diversity of paratopes is established by recombination of variable (V), diversity (D) (not in light chains) and joining (J) genes, and by the pairing of heavy and light chains (2).
The genes of the heavy chain are located on chromosome 14 (14q32.33) (3), while the light chain genes are present at two separate loci, kappa and lambda, which are located on chromosome 2 (2p11.2) and chromosome 22 (22q11.2), respectively (4). These loci remain incompletely characterized because they contain many repetitive sequence segments with many duplicated genes (5), which makes it difficult to correctly assemble short reads from whole genome sequencing. To date, a limited number of genomically sequenced (6–8) and inferred (9,10) haplotypes of the heavy chain and the two light chain loci have been described. Different databases exist for genomic immune receptor DNA sequences (IMGT/GENE-DB (11)), putative novel variants from inferred data (IgPdb, https://cgi.cse.unsw.edu.au/ihmmune/IgPdb/information.php) or entire immune receptor repertoires (OGRDB (12)). The usage of immunoglobulin heavy chain variable (IGHV) genes and their mutational status are most frequently studied in relation to cancer (13,14), responses to vaccines (15,16), or autoimmune diseases (17–19). Most IGHV genes have several allelic variants, and more alleles are being discovered as a result of adaptive immune receptor repertoire sequencing (AIRR-seq) (20,21). Software tools such as TIgGER (22,23), IgDiscover (24) and partis (25) allow germline alleles to be inferred from such repertoire data. Based on these inferred alleles, the data can then be passed to other tools that infer haplotypes and repertoire deletions (26). Incorrect annotation could lead to falsely inferred deletions and biased assessments. Therefore, a full overview of germline variants is essential for studying the adaptive immune response with high accuracy. Some allelic variants have been associated with increased disease susceptibility (27,28), yet the impact of immunoglobulin gene variation on disease risk is still unknown (29).
These regions have not been sufficiently covered in the numerous genome-wide association studies performed to date. More comprehensive maps of polymorphisms are required for proper analysis. Here, we have used previously generated AIRR-seq data (30) from naïve B cells of 98 Norwegian individuals to identify novel IGHV alleles, a selection of which we then validated from genomic DNA (gDNA) of non-B cells, i.e. T cells and monocytes. We also analyzed the sequences upstream of the V-REGION and constructed consensus sequences for the upstream variants present in the cohort. These results expand our knowledge of this important locus and deepen our understanding of allelic diversity within the Caucasian population. In addition, the results of this study can be used to improve the accuracy of currently used bioinformatics tools for the analysis of immunoglobulin repertoire sequencing data.

MATERIALS AND METHODS

AIRR sequencing of naïve B cells

The data were obtained as part of a previously published study (30) and is available.

By memorial2014