{"id":18,"date":"2011-07-21T07:17:33","date_gmt":"2011-07-21T05:17:33","guid":{"rendered":"http:\/\/team.inria.fr\/deptdkm1\/?page_id=18"},"modified":"2025-05-19T10:50:49","modified_gmt":"2025-05-19T08:50:49","slug":"seminar","status":"publish","type":"page","link":"https:\/\/dept-dkm.irisa.fr\/fr\/seminar\/","title":{"rendered":"(English) Seminar"},"content":{"rendered":"<blockquote>You can use our plugin to insert elements of your activity report <a href=\"http:\/\/raweb.inria.fr\">(raweb)<\/a>. Note that since the report exists only in English, the page generated here will also be in English.\r\n<h2>The presentation page<\/h2>\r\nExample: <a href=\"http:\/\/raweb.inria.fr\/rapportsactivite\/RA2016\/abs\/uid3.html\">abs<\/a><\/blockquote>\r\n                 <h3>Overall objectives<\/h3>                 <subsection id=\"ABS-RA-2025-uid1\" level=\"1\">                   <h3>Biomolecules and their function(s).<\/h3>                   <p>\u00a0Computational Structural Biology (CSB) is the scientific domain concerned with the development of algorithms and software to understand and predict the structure and function of biological macromolecules. This research field is inherently multi-disciplinary. On the experimental side, biology and medicine provide the objects studied, while biophysics and bioinformatics supply experimental data, which are of two main kinds. On the one hand, genome sequencing projects supply protein sequences, and ~200 million sequences have been archived in <hi rend=\"tt\">UniProtKB<\/hi>\/<hi rend=\"tt\">TrEMBL<\/hi>, which collects the protein sequences yielded by genome sequencing projects. On the other hand, structure determination experiments (notably X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy) give access to geometric models of molecules \u2013 atomic coordinates. 
Alas, only ~150,000 structures have been solved and deposited in the Protein Data Bank (PDB), a number to be compared against the <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mo>\u223c<\/mo><msup><mn>10<\/mn><mn>8<\/mn><\/msup><\/mrow><\/math><\/formula> sequences found in <hi rend=\"tt\">UniProtKB<\/hi>\/<hi rend=\"tt\">TrEMBL<\/hi>. With one structure for ~1000 sequences, we hardly know anything about biological functions at the atomic\/structural level. Complementing experiments, physical chemistry\/chemical physics supply the required models (energies, thermodynamics, etc.). More specifically, let us recall that a protein with <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>n<\/mi><\/math><\/formula> atoms has <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>=<\/mo><mn>3<\/mn><mi>n<\/mi><\/mrow><\/math><\/formula> Cartesian coordinates, and fixing these (up to rigid motions) defines a conformation. As conveyed by the iconic <hi rend=\"it\">lock-and-key<\/hi> metaphor for interacting molecules, biology is based on the interactions stable conformations make with each other. Turning these intuitive notions into quantitative ones requires delving into statistical physics, as macroscopic properties are average properties computed over ensembles of conformations. Developing effective algorithms to perform accurate simulations is especially challenging for two main reasons. The first one is the high dimension of conformational spaces (see <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>=<\/mo><mn>3<\/mn><mi>n<\/mi><\/mrow><\/math><\/formula> above, typically several tens of thousands), together with the nonlinearity of the energy functionals used. 
The second one is the multiscale nature of the phenomena studied: with biologically relevant time scales beyond the millisecond, and atomic vibration periods on the order of femtoseconds, simulating such phenomena typically requires <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mo>\u226b<\/mo><msup><mn>10<\/mn><mn>12<\/mn><\/msup><\/mrow><\/math><\/formula> conformations\/frames, a (brute) <hi rend=\"it\">tour de force<\/hi> rarely achieved\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_shaw2010atomic\">38<\/ref>.<\/p>                                    <\/subsection>                 <subsection id=\"ABS-RA-2025-uid2\" level=\"1\">                   <h3>Computational Structural Biology: three main challenges.<\/h3>                   <p>\u00a0The first challenge, <hi rend=\"it\">sequence-to-structure prediction<\/hi>, aims to infer the possible structure(s) of a protein from its amino acid sequence. While progress has been made recently, in particular using deep learning techniques <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_senior2020improved\">37<\/ref>, the models obtained so far are static and coarse-grained.<\/p>                   <p>The second one is <hi rend=\"it\">protein function prediction<\/hi>. Given a protein with known structure, <hi rend=\"it\">i.e.<\/hi>, 3D coordinates, the goal is to predict the partners of this protein, in terms of stability and specificity. This understanding is fundamental to biology and medicine, as illustrated by the example of the SARS-CoV-2 virus responsible for the Covid-19 pandemic. To infect a host, the virus first fuses its envelope with the membrane of a target cell, and then injects its genetic material into that cell. 
Fusion is achieved by a so-called class I fusion protein, also found in other viruses (influenza, SARS-CoV-1, HIV, etc.). The fusion process is a highly dynamic process involving large-amplitude conformational changes of the molecules. It is poorly understood, which hinders our ability to design therapeutics to block it.<\/p>                   <object id=\"ABS-RA-2025_label_fig:fig-baker\"><table><tr><td><ressource xlink:href=\"WEB_IMG\/baker-miniproteins-against-RBD-science-2020-montage-cropped.png\" type=\"figure\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" id=\"ABS-RA-2025-uid3\" media=\"WEB\"\/><\/td><\/tr><\/table><caption>Figure 1: <hi rend=\"bold\">The synergy modeling &#8211; experiments, and challenges faced in CSB: illustration on the problem of designing miniproteins blocking the entry of SARS-CoV-2 into cells. From <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_cao2020novo\">29<\/ref>.<\/hi> Of note: the first step of the infection by SARS-CoV-2 is the attachment of the receptor binding domain of its spike (RBD, blue molecule) to a target protein found on the membrane of our cells, ACE2 (orange molecule). A strategy to block infection is therefore to engineer a molecule binding the RBD, preventing its attachment to ACE2. <hi rend=\"bold\">(A)<\/hi> Design of a helical protein (orange) mimicking a region of the ACE2 protein. <hi rend=\"bold\">(B)<\/hi> Assessment of binding modes (conformation, binding energies) of candidate miniproteins neutralizing the RBD.<\/caption><\/object>                   <p>Finally, the third one, <hi rend=\"it\">large assembly reconstruction<\/hi>, aims at solving (coarse-grain) structures of molecular machines involving tens or even hundreds of subunits. 
This research vein was promoted about 15 years back by the work on the nuclear pore complex <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_aal-manpc-07\">26<\/ref>. It is often referred to as <hi rend=\"it\">reconstruction by data integration<\/hi>, as it requires combining coarse-grain models (notably from cryo-electron microscopy (cryo-EM) and native mass spectrometry) with atomic models of subunits obtained from X-ray crystallography. Fitting the latter into the former requires exploring the conformation space of subunits, whence the importance of protein dynamics.<\/p>                   <p>As an illustration of these three challenges, consider the problem of designing proteins blocking the entry of SARS-CoV-2 into our cells (Fig. <ref location=\"intern\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_label_fig-fig-baker\">1<\/ref>). The first challenge is illustrated by the problem of predicting the structure of a blocker protein from its sequence of amino acids \u2013 a tractable problem here since the miniproteins used only comprise on the order of 50 amino acids (Fig. <ref location=\"intern\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_label_fig-fig-baker\">1<\/ref>(A), <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_cao2020novo\">29<\/ref>). The second challenge is illustrated by the calculation of the binding modes and the binding affinity of the designed proteins for the RBD of SARS-CoV-2 (Fig. <ref location=\"intern\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_label_fig-fig-baker\">1<\/ref>(B)). 
Finally, the last challenge is illustrated by the problem of solving structures of the virus with a cell, to understand how many spikes are involved in the fusion mechanism leading to infection. In <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_cao2020novo\">29<\/ref>, the promising designs suggested by modeling have been assessed by an array of wet lab experiments (affinity measurements, circular dichroism for thermal stability assessment, structure resolution by cryo-EM). The <hi rend=\"it\">hyperstable<\/hi> minibinders identified provide starting points for SARS-CoV-2 therapeutics\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_cao2020novo\">29<\/ref>. We note in passing that this is truly remarkable work, yet, the designed proteins stem from a template (the <hi rend=\"it\">bottom<\/hi> helix from ACE2), and are rather small.<\/p>                   <object id=\"ABS-RA-2025_label_fig:fig\"><table><tr><td><ressource xlink:href=\"WEB_IMG\/PEL-landscape-artistic-v1-montage-erc--copy.png\" type=\"figure\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" id=\"ABS-RA-2025-uid4\" media=\"WEB\"\/><\/td><\/tr><\/table><caption>Figure 2: The main challenges of molecular simulation: Finding significant local minima of the energy landscape, computing statistical weights of catchment basins by integrating Boltzmann&rsquo;s factor, and identifying transitions. 
Practically, <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>&gt;<\/mo><mn>100<\/mn><\/mrow><\/math><\/formula>.<\/caption><\/object>                 <\/subsection>                 <subsection id=\"ABS-RA-2025-uid5\" level=\"1\">                   <h3>Protein dynamics: core CS &#8211; maths challenges.<\/h3>                   <p>To present challenges in structural modeling, let us recall the following ingredients (Fig. <ref location=\"intern\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_label_fig-fig\">2<\/ref>). First, a molecular model with <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>n<\/mi><\/math><\/formula> atoms is parameterized over a conformational space <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>\ud835\udcb3<\/mi><\/math><\/formula> of dimension <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>=<\/mo><mn>3<\/mn><mi>n<\/mi><\/mrow><\/math><\/formula> in Cartesian coordinates, or <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>=<\/mo><mn>3<\/mn><mi>n<\/mi><mo>&#8722;<\/mo><mn>6<\/mn><\/mrow><\/math><\/formula> in internal coordinates (upon removing rigid motions); these coordinates are also called degrees of freedom (<hi rend=\"it\">d.o.f.<\/hi>). 
Second, recall that the <hi rend=\"it\">potential energy landscape<\/hi> (PEL) is the mapping <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>V<\/mi><mo>(<\/mo><mo>\u00b7<\/mo><mo>)<\/mo><\/mrow><\/math><\/formula> from <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><msup><mi>\u211d<\/mi><mi>d<\/mi><\/msup><\/math><\/formula> to <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>\u211d<\/mi><\/math><\/formula> providing a potential energy for each conformation\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_wales2003energy\">39<\/ref>, <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_schon2009prediction\">36<\/ref>. Example potential energies (PE) are <hi rend=\"tt\">CHARMM<\/hi>, <hi rend=\"tt\">AMBER<\/hi>, <hi rend=\"tt\">MARTINI<\/hi>, etc. Such PE belong to the realm of molecular mechanics, and implement atomic or coarse-grain models. They may include a solvent model, either explicit or implicit. Their definition requires a significant number of parameters (up to <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mo>\u223c<\/mo><mn>1<\/mn><mo>,<\/mo><mn>000<\/mn><\/mrow><\/math><\/formula>), fitted to reproduce physico-chemical properties of (bio-)molecules\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_wang2014building\">40<\/ref>.<\/p>                   <p>These PE are usually considered good enough to study non-covalent interactions \u2013 our focus, even though they do not cover the modification of chemical bonds. 
In any case, we take such a function for granted <ref xlink:href=\"#ABS-RA-2025-note-uid6\" location=\"extern\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\">1<\/ref>.<\/p>                   <p>The PEL encodes all <hi rend=\"bold\">structural, thermodynamic<\/hi>, and <hi rend=\"bold\">kinetic<\/hi> properties, which can be obtained by averaging properties of conformations over so-called <hi rend=\"it\">thermodynamic ensembles<\/hi>. The <hi rend=\"bold\">structure<\/hi> of a macromolecular system requires the characterization of active conformations and important intermediates in functional pathways involving significant basins. In assigning occupation probabilities to these conformations by integrating Boltzmann&rsquo;s distribution, one treats <hi rend=\"bold\">thermodynamics<\/hi>. Finally, transitions between the states, modeled, say, by a master equation (a continuous-time Markov process), correspond to <hi rend=\"bold\">kinetics<\/hi>. Classical simulation methods based on molecular dynamics (MD) and Monte Carlo sampling (MC) are developed in the lineage of the seminal work by the 2013 recipients of the Nobel prize in chemistry (Karplus, Levitt, Warshel), which was awarded \u201c<hi rend=\"it\">for the development of multiscale models for complex chemical systems<\/hi>\u201d. However, except for highly specialized cases where massive calculations have been used <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_shaw2010atomic\">38<\/ref>, neither MD nor MC gives access to the aforementioned time scales. In fact, the main limitation of such methods is that they treat structural, thermodynamic and kinetic aspects at once <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_frenkel2002understanding\">32<\/ref>. 
The absence of specific insights on these three complementary pieces of the puzzle makes it impossible to optimize simulation methods, and results in general in the inability to obtain converged simulations on biologically relevant time-scales.<\/p>                   <p>The hardness of structural modeling owes to three intertwined reasons.<\/p>                   <p>First, PELs of biomolecules usually exhibit a number of critical points exponential in the dimension\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_ball1999dynamics\">27<\/ref>; fortunately, they enjoy a multi-scale structure\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_energy2016carr\">30<\/ref>. Intuitively, the significant local minima\/basins are those which are <hi rend=\"it\">deep<\/hi> or <hi rend=\"it\">isolated\/wide<\/hi>, two notions which are mathematically qualified by the concepts of persistence and prominence. Mathematically, problems are plagued with the curse of dimensionality and measure concentration phenomena. Second, biomolecular processes are inherently multi-scale, with motions spanning <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mo>\u223c<\/mo><\/math><\/formula> 15 and <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mo>\u223c<\/mo><\/math><\/formula> 4 orders of magnitude in time and amplitude respectively <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_adcock2006molecular\">25<\/ref>. Developing methods able to exploit this multi-scale structure has remained elusive. 
Third, macroscopic properties of biomolecules, <hi rend=\"it\">i.e.<\/hi>, observables, are average properties computed over ensembles of conformations, which calls for a multi-scale statistical treatment both of thermodynamics and kinetics.<\/p>                 <\/subsection>                 <subsection id=\"ABS-RA-2025-uid7\" level=\"1\">                   <h3>Validating models.<\/h3>                   <p>A natural and critical question concerns the validation of models proposed in structural bioinformatics. For all three types of questions of interest (structures, thermodynamics, kinetics), there exist experiments against which the models must be checked \u2013 when the experiments can be conducted.<\/p>                   <p>For structures, the models proposed can readily be compared against experimental results stemming from X-ray crystallography, NMR, or cryo-electron microscopy. For thermodynamics, which we illustrate here with binding affinities, predictions can be compared against measurements provided by calorimetry or surface plasmon resonance. 
Lastly, kinetic predictions can also be assessed by various experiments such as binding affinity measurements (for the prediction of <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><msub><mi>K<\/mi><mrow><mi>o<\/mi><mi>n<\/mi><\/mrow><\/msub><\/math><\/formula> and <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><msub><mi>K<\/mi><mrow><mi>o<\/mi><mi>f<\/mi><mi>f<\/mi><\/mrow><\/msub><\/math><\/formula>), or fluorescence based methods (for kinetics of folding).<\/p>                 <\/subsection>               <h3>Last activity report : 2025 <\/h3><ul><li>2025 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2025\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2025\/abs\/uid0.html\">HTML<\/a><\/li><li>2024 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2024\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2024\/abs\/uid0.html\">HTML<\/a><\/li><li>2023 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2023\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2023\/abs\/uid0.html\">HTML<\/a><\/li><li>2022 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2022\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2022\/abs\/uid0.html\">HTML<\/a><\/li><li>2021 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2021\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2021\/abs\/uid0.html\">HTML<\/a><\/li><li>2020 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2020\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2020\/abs\/uid0.html\">HTML<\/a><\/li><li>2019 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2019\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2019\/abs\/uid0.html\">HTML<\/a><\/li><li>2018 : <a 
href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2018\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2018\/abs\/uid0.html\">HTML<\/a><\/li><li>2017 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2017\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2017\/abs\/uid0.html\">HTML<\/a><\/li><li>2016 : <a href=\"http:\/\/radar.inria.fr\/rapportsactivite\/RA2016\/abs\/abs.pdf\">PDF<\/a> &#8211; <a href=\"http:\/\/radar.inria.fr\/report\/2016\/abs\/uid0.html\">HTML<\/a><\/li><\/ul>\r\n<h2>Results<\/h2>\r\n                 <h4>New results<\/h4>                                                   <div class='subsecClass'>                   <h4>Modeling the dynamics of proteins<\/h4>                   <subsection_keyword_list><b>Keywords: <\/b>Protein flexibility, protein conformations, collective coordinates, conformational sampling, loop closure, kinematics, dimensionality reduction.<\/subsection_keyword_list>                   <subsection id=\"ABS-RA-2025-uid29\" level=\"2\">                     <h4>Simpler protein domain identification using spectral clustering<\/h4>                                                               <p>The decomposition of a biomolecular complex into domains is an important step to investigate biological functions and ease structure determination. A successful approach to do so is the <hi rend=\"tt\">SPECTRUS<\/hi> algorithm, which provides a segmentation based on spectral clustering applied to a graph coding inter-atomic fluctuations derived from an elastic network model.<\/p>                     <p>We present a method\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_cazals-hal-04504447\">19<\/ref> that makes three straightforward and useful additions to <hi rend=\"tt\">SPECTRUS<\/hi>. 
For single structures, we show that high quality partitionings can be obtained from a graph Laplacian derived from pairwise interactions, without normal modes. For sets of homologous structures, we introduce a Multiple Sequence Alignment mode, exploiting both the sequence based information (MSA) and the geometric information embodied in experimental structures. Finally, we propose to analyze the clusters\/domains delivered using the so-called D-Family matching algorithm, which establishes a correspondence between domains yielded by two decompositions, and can be used to handle fragmentation issues.<\/p>                     <p>Our domains compare favorably to those of the original <hi rend=\"tt\">SPECTRUS<\/hi>, and those of the deep learning based method <hi rend=\"tt\">Chainsaw<\/hi>. Using two complex cases, we show in particular that our method is the only one handling complex conformational changes involving several sub-domains. Finally, a comparison of our method and <hi rend=\"tt\">Chainsaw<\/hi>, using the manually curated domain classification <hi rend=\"tt\">ECOD<\/hi> as a reference, shows that high quality domains are obtained without using any evolutionary related piece of information.<\/p>                     <p>The implementation is provided in the Structural Bioinformatics Library, see <ref xlink:href=\"http:\/\/sbl.inria.fr\" location=\"extern\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\">SBL<\/ref> and <ref xlink:href=\"https:\/\/sbl.inria.fr\/doc\/Spectral_domain_explorer-user-manual.html\" location=\"extern\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\">Spectral domain explorer<\/ref>.<\/p>                   <\/div>                 <\/subsection>                 <div class='subsecClass'>                   <h4>Algorithmic foundations<\/h4>                   <subsection_keyword_list><b>Keywords: <\/b>Computational geometry, computational topology, optimization, graph theory, data analysis, statistical 
physics.<\/subsection_keyword_list>                   <subsection id=\"ABS-RA-2025-uid31\" level=\"2\">                     <h4>Improved seeding strategies for k-means and k-GMM<\/h4>                                                               <p>In <ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_carriere-hal-05441325\">18<\/ref>, we revisit the randomized seeding techniques for <hi rend=\"tt\">k-means<\/hi> clustering and <hi rend=\"tt\">k-GMM<\/hi>  (Gaussian Mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of candidate seeds, and the metric used for seed selection. This analysis yields novel families of initialization methods exploiting a <hi rend=\"it\">lookahead<\/hi> principle\u2013conditioning the seed selection to an enhanced coherence with the final metric used to assess the algorithm, and a <hi rend=\"it\">multipass strategy<\/hi> to tame down the effect of randomization.<\/p>                     <p>Experiments show a significant improvement over classical contenders. 
In particular, for <hi rend=\"tt\">k-means<\/hi>, our methods improve on the recently designed multi-swap strategy (similar results in terms of sum of squared errors (SSE), seeding <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mo>\u223c<\/mo><mo>\u00d7<\/mo><mn>6<\/mn><\/mrow><\/math><\/formula> faster), which was the first one to outperform the greedy <hi rend=\"tt\">k-means++<\/hi> seeding.<\/p>                     <p>Our experimental analysis also sheds light on subtle properties of <hi rend=\"tt\">k-means<\/hi> often overlooked, including the (lack of) correlations between the SSE upon seeding and the final SSE, the variance reduction phenomena observed in iterative seeding methods, and the sensitivity of the final SSE to the pool size for greedy methods.<\/p>                     <p>Practically, our most effective seeding methods are strong candidates to become one of the\u2013if not the\u2013standard technique(s). From a theoretical perspective, our formalization of seeding opens the door to a new line of analytical approaches.<\/p>                   <\/div>                   <div class='subsecClass'>                     <h4>Modeling high dimensional point clouds with the spherical cluster model<\/h4>                                                               <div class='moreClass'>In collaboration with L. Goldenberg (former Inria intern). <\/div>                                          <p>A parametric cluster model is a statistical model providing geometric insights into the points defining a cluster. 
The <hi rend=\"it\">spherical cluster model<\/hi> (SC) approximates a finite point set <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>P<\/mi><mo>\u2282<\/mo><msup><mi>\u211d<\/mi><mi>d<\/mi><\/msup><\/mrow><\/math><\/formula> by a sphere <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>S<\/mi><mo>(<\/mo><mi>c<\/mi><mo>,<\/mo><mi>r<\/mi><mo>)<\/mo><\/mrow><\/math><\/formula> as follows. Taking <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>r<\/mi><\/math><\/formula> as a fraction <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>\u03b7<\/mi><mo>\u2208<\/mo><mo>(<\/mo><mn>0<\/mn><mo>,<\/mo><mn>1<\/mn><mo>)<\/mo><\/mrow><\/math><\/formula> (hyper-parameter) of the standard deviation of distances between the center <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>c<\/mi><\/math><\/formula> and the data points, the cost of the SC model is the sum over all data points lying outside the sphere <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>S<\/mi><\/math><\/formula> of their power distance with respect to <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>S<\/mi><\/math><\/formula>. The center <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>c<\/mi><\/math><\/formula> of the SC model is the point minimizing this cost. Note that <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>\u03b7<\/mi><mo>=<\/mo><mn>0<\/mn><\/mrow><\/math><\/formula> yields the celebrated center of mass used in KMeans clustering. 
We make three contributions\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_cazals-hal-05442010\">21<\/ref>.<\/p>                     <p>First, we show that fitting a spherical cluster yields a strictly convex but not smooth combinatorial optimization problem. Second, we present an exact solver using the Clarke gradient on a suitable stratified cell complex defined from an arrangement of hyper-spheres. Finally, we present experiments on a variety of datasets ranging in dimension from <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>=<\/mo><mn>9<\/mn><\/mrow><\/math><\/formula> to <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>=<\/mo><mn>10<\/mn><mo>,<\/mo><mn>000<\/mn><\/mrow><\/math><\/formula>, with two main observations. First, the exact algorithm is orders of magnitude faster than Broyden-Fletcher-Goldfarb-Shanno (BFGS) based heuristics for datasets of small\/intermediate dimension and small values of <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>\u03b7<\/mi><\/math><\/formula>, and for high dimensional datasets (say <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mrow><mi>d<\/mi><mo>&gt;<\/mo><mn>100<\/mn><\/mrow><\/math><\/formula>) whatever the value of <formula type=\"inline\"><math xmlns=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mi>\u03b7<\/mi><\/math><\/formula>. 
Second, the center of the SC model behaves as a parameterized high-dimensional median.<\/p>                     <p>The SC model is of direct interest for high dimensional multivariate data analysis, and the application to the design of mixtures of SC will be reported in a companion paper.<\/p>                   <\/div>                 <\/subsection>                 <div class='subsecClass'>                   <h4>Applications in structural bioinformatics and beyond<\/h4>                   <subsection_keyword_list><b>Keywords: <\/b>Docking, scoring, interfaces, protein complexes, phylogeny, evolution.<\/subsection_keyword_list>                   <subsection id=\"ABS-RA-2025-uid34\" level=\"2\">                     <h4>Fold or flop: quality assessment of AlphaFold predictions on whole proteomes<\/h4>                                                               <p>Reliability of <hi rend=\"tt\">AlphaFold<\/hi> predictions is primarily assessed by the method\u2019s self-reported score, the predicted Local Distance Difference Test (pLDDT). For model organisms, <hi rend=\"tt\">AlphaFold<\/hi> predictions show that 30% to 40% of all amino acids fall into the low-confidence range of pLDDT. Moreover, pLDDT has occasionally failed to flag predictions that are physically implausible. This raises two fundamental questions: can we identify more robust indicators of reliability? And do unreliable predictions exhibit shared structural or biophysical traits?<\/p>                     <p>To address these questions, we introduce semi-global statistics characterizing packing properties at multiple scales, and perform dimensionality reduction and clustering at once\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_sarti-hal-05438855\">23<\/ref>. 
We use these to perform a systematic whole-proteome structural quality assessment of the predictions contained in the AlphaFold Database (AFDB), investigating connections between unreliable predictions, fold classification, and intrinsic disorder propensity.<\/p>                     <p>Our results reveal consistent relationships between low-confidence predictions, clustering of intrinsically disordered regions (IDRs), and distinctive packing properties, thereby highlighting both strengths and limitations of current self-assessment metrics. This work provides a framework for deeper confidence assessment of <hi rend=\"tt\">AlphaFold<\/hi> predictions and offers generalizable strategies for distinguishing reliable from unreliable structural models.<\/p>                   <\/div>                   <div class='subsecClass'>                     <h4>Characterizing the fragmentation of AlphaFold predictions<\/h4>                                                               <p>The Nobel Prize-winning program <hi rend=\"tt\">AlphaFold<\/hi> computes plausible structures of (well) folded proteins. The main quality assessment is based on the <hi rend=\"it\">predicted Local Distance Difference Test<\/hi> (pLDDT), a per-amino-acid confidence score. To enhance quality assessment, we provide novel quantitative measures to identify <hi rend=\"it\">coherent<\/hi> amino acid (a.a.) stretches along the sequence in terms of pLDDT values\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_cazals-hal-05438856\">22<\/ref>. These measures, which rely on standard tools from topological data analysis and combinatorics, qualify the coherence \/ fragmentation of <hi rend=\"tt\">AlphaFold<\/hi> predictions. 
The outcome of our analysis can readily be used to select reliable regions\/domains within proteins whose pLDDT values span the entire pLDDT range.<\/p>                   <\/div>                   <div class='subsecClass'>                     <h4>Orphan genes survey<\/h4>                                                               <p>Orphan genes are protein-coding genes that lack detectable homologs in other species, making them lineage-specific and evolutionarily enigmatic. This review\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_seckin-hal-05455139\">20<\/ref>\u00a0 synthesizes research on orphan genes in animals and fungi, summarizing their prevalence, proposed origins (including divergence and de novo emergence), and biological roles. Orphan genes are implicated in diverse processes such as reproduction, development, adaptation, and disease, highlighting their functional importance. They are especially interesting for computational biology because identifying them challenges homology-based annotation methods and requires novel comparative and statistical approaches. By consolidating scattered knowledge, this work provides a foundation for developing better computational tools to detect, classify, and model the evolution and function of orphan genes.<\/p>                   <\/div>                   <div class='subsecClass'>                     <h4>Orphan genes detection and classification<\/h4>                                                               <p>Building on the broader synthesis of orphan gene prevalence and function, we provide a focused, data-driven case in plant-parasitic nematodes of the genus Meloidogyne. 
Using comparative genomics across 85 nematode species, we show that orphan genes are not rare anomalies but constitute 18% of the genome, with strong transcriptional support\u00a0<ref location=\"biblio\" xlink:type=\"simple\" xlink:show=\"replace\" xlink:actuate=\"onRequest\" xlink:href=\"#ABS-RA-2025_bibitem_seckin-hal-05438858\">24<\/ref>. By integrating synteny and ancestral sequence reconstruction, the work quantifies the relative contributions of divergence and de novo gene birth, directly addressing questions raised in the earlier review. Proteomic and translatomic evidence further validates these genes as bona fide coding sequences with distinctive molecular features. Together, this study builds a new and effective pipeline for detecting and classifying orphan genes, and exemplifies how computational approaches can move from cataloging orphan genes to dissecting their origins and linking them to lineage-specific adaptations such as parasitism.<\/p>                   <\/div>                 <\/subsection>               \r\n<blockquote>You can add formatted text here as needed.<\/blockquote>\r\n<ul>\r\n \t<li>\r\n<h2>Research axis 1<\/h2>\r\n&#8230;&#8230;.<\/li>\r\n \t<li>\r\n<h2>Research axis 2<\/h2>\r\n&#8230;&#8230;&#8230;.<\/li>\r\n \t<li>\r\n<h2>Research axis 3<\/h2>\r\n&#8230;&#8230;&#8230;.<\/li>\r\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>You can use our plugin to insert elements of your activity report (raweb). 
Note that, as the report exists only in English,\u2026<\/p>\n<p> <a class=\"continue-reading-link\" href=\"https:\/\/dept-dkm.irisa.fr\/fr\/seminar\/\"><span>Continue reading<\/span><i class=\"crycon-right-dir\"><\/i><\/a> <\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-18","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/pages\/18","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/comments?post=18"}],"version-history":[{"count":73,"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/pages\/18\/revisions"}],"predecessor-version":[{"id":1372,"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/pages\/18\/revisions\/1372"}],"wp:attachment":[{"href":"https:\/\/dept-dkm.irisa.fr\/fr\/wp-json\/wp\/v2\/media?parent=18"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}