Annual Seminar

Annual Meeting. 11 February 2021.

13h45-14h10 A. Siegel: welcome and presentation of the department's project

 

14h10-14h30. Presentation of the two running-theme articles

  • Preselected articles:

 

Thomas Guyet (Lacodam). “NegPSpan: efficient extraction of negative sequential patterns with embedding constraints”, KDD 2018. http://people.irisa.fr/Thomas.Guyet/negativepatterns/

David Gross-Amblard (Druid). “Overlapping Hierarchical Clustering (OHC)”, IDA 2020. https://hal.inria.fr/hal-02452729/file/Overlapping_Hierarchical_Clustering_IDA2020_Camera_Ready_.pdf

 

  • Objective/evaluation criterion for the group work

 

14h30-14h45 Break

 

14h45-16h15 Reproducibility: invited talks and team positions

  • (1) N. Rougier: presentation of ReScience
  • (2) A. Legrand: benefits and use of an online lab notebook for reproducibility.
  • (3) Questions
  • (4) O. Collin: reproducibility and data management on the GenOuest platform
  • (5) Round table: the teams' reproducibility policies, tools available through GenOuest.
(1) Nicolas Rougier
Title: Let’s redo Science!
Abstract: ReScience C is a platinum open-access peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible. You can read about the ideas behind ReScience C in the article “Sustainable computational science: the ReScience initiative” [1]. To achieve this goal, the whole publishing chain is radically different from that of traditional scientific journals. ReScience C lives on GitHub, where each new implementation of a computational study is made available together with comments, explanations and tests. Each submission takes the form of an issue that is publicly reviewed and tested in order to guarantee that any researcher can re-use it. If you have ever replicated (or failed to replicate) computational results from the literature in your research, ReScience C is the perfect place to publish your new implementation. ReScience C is collaborative and open by design. Everything can be forked and modified. Don’t hesitate to write a submission, join us and become a reviewer. [1] Rougier, Hinsen et al., “Sustainable computational science: the ReScience initiative”, PeerJ Computer Science, 2017. DOI: 10.7717/peerj-cs.142
(2) Arnaud Legrand
Title: Using Laboratory Notebooks to Improve the Reproducibility of your Research
Abstract: Although laboratory notebooks are commonly used in experimental science to document hypotheses, experimental conditions and initial analysis, the physical version (a stack of numbered pages) is often ill-suited to modern practices, which involve a lot of digital information such as data, computations and visualizations. Many so-called computational notebooks (e.g., Jupyter or RStudio with the Rmd format) have thus emerged and provide literate programming capabilities, although they do not necessarily encourage journaling as it is commonly done with classical laboratory notebooks. In this presentation, I will quickly demo some of these tools, present how I use them on a daily basis as an electronic laboratory notebook, and show how they allow me to write reproducible articles that give readers all the important research information.
(4) Olivier Collin
Title: Reproducibility and data management on the GenOuest facility
Abstract: In the context of FAIR data and Open Science in bioinformatics, new environments have emerged that help scientists improve the reproducibility of their computational workflows. In recent years, GenOuest has built such environments to help scientists address reproducibility challenges.
(5) Round-table participants, “Team reproducibility policies”

Olivier Dameron (Dyliss)

Charles Deltel (Genscale)

Luis Galarraga (Lacodam)

François Goasdoué (Shaman)

David Gross-Amblard (Druid)

Fabrice Legeai (Genscale)

16h15-16h30. Break
16h30-17h30 Group work on reproducibility
  • Running theme: reproduce one of the selected articles, with a reproducibility objective (+1 quality indicator).

 

17h30-18h Group reports and prize award.

Annual meeting of PhD students. 12 February 2021.

9h-9h40 Round table: how should a PhD student choose a publisher or conference?

Speakers:

  • Dominique Lavenier (national committee, section 6 and CID51)
  • Elisa Fromont (INS2I scientific council)
  • Alexandre Termier (team leader)
  • Mireille Ducassé (former CNU member)
  • Arnaud Martin (former CNU member)

9h40-10h Break

10h-10h30 Feedback from PhD students: how did the article submission process go?

10h30-11h Ice-breaker break

https://www.wonder.me/r?id=tpod7z-7kjq7  (best used with Firefox or Chrome)

11h-11h30 First-year PhD students present their thesis topics

11h30-12h Posters by second-year PhD students.

  • Specialized session
  • Non-specialized session

_________________________________

Annual Meeting. 13 February 2020.

Organizers. Johanne Bakalara (Lacodam). Francesco Bariatti (SemLIS). Arnaud Belcour (Dyliss). Kevin Da Silva (Genscale). Ludivine Duroyon (Shaman). Marie Le Roic (assistants' office). Anne Siegel (Dyliss). Constance Thierry (Druid).

Slides.

  • Mathieu Guillermin (pdf)
  • Benoit Frenay (pdf)
  • Emmanuelle Becker (pdf)

 

Presentation of DKM (Anne Siegel)

Invited talk. Mathieu Guillermin (Université Catholique de Lyon). Ethical challenges of AI and big data: the need for bridges

Abstract. Over the last decades, the rapid progress of AI and big data processing techniques has triggered deep worries as well as optimistic enthusiasm. Steering technological development in these fields is becoming an increasingly pressing challenge, reinforced by the high pace at which research breakthroughs in information technologies are transferred to applications deployed in society. In this presentation, I will discuss the form ethical investigation could take to ensure an enlightened development of AI and big data technologies. I will draw on several case studies to highlight the importance of building bridges not only between different types of expertise, but also between experts and users or affected persons.

Highlights

  • Structuring and analyzing data according to domain knowledge.
    • Zoltan Miklos (Druid): Automatic reconstruction of the history of science, Big Data style
    • Olivier Dameron (Dyliss): Increasing life science resources re-usability using Semantic Web technologies
  • Improving performance and expressivity of data management and representation frameworks
    • Pierre Peterlongo (GenScale): A resource-frugal probabilistic dictionary and applications in bioinformatics
  • Guiding the data exploration
    • Sébastien Ferré (SemLIS): Sparklis: an expressive query builder for SPARQL endpoints with guidance in natural language
  • Generic methods for the extraction of complex features and knowledge from data.
    • Christine Largouët (Lacodam): Anomaly detection with extreme value theory
    • Grégory Smits (Shaman): Efficient generation of reliable estimates of linguistic summaries

Speed-dating

  • Head to the Petri/Markov/Turing rooms.
  • Go to the board that matches the color of the sticker on your badge.
  • Find your partner's name, then find them among the people standing near the board.
  • You can talk for 10 minutes (a list of topics to start the conversation will be on the table).
  • After 10 minutes, look up the name of your second partner and start again.

Invited talk. Benoit Frenay (Namur). Machine Learning: Getting Back Into the Loop

Abstract. Machine learning provides powerful predictive tools, yet it leaves users with little opportunity to control decision making. First, models are often hard to understand and decisions are consequently difficult to interpret. Second, the mechanics of algorithms cannot be easily altered to integrate user feedback (e.g. because the user is not happy with some predictions). Third, the statistical nature of machine learning algorithms makes it difficult to enforce domain constraints. In this talk, I will show three examples of methods to address those limitations. First, I will discuss recent works on interpretability for dimensionality reduction, in particular with MDS and t-SNE. Then, I will show how priors can be used to embed user feedback in Bayesian PCA. Finally, I will present ongoing work on constraint enforcement for decision trees.

Highlights

  • Structuring and analyzing data according to domain knowledge.
    • David Gross-Amblard (Druid): Crowdsourcing beyond Amazon Mechanical Turk
    • Claire Lemaitre (GenScale): Multiple comparative metagenomics using multiset k-mer counting
  • Improving performance and expressivity of data management and representation frameworks
    • François Goasdoué (Shaman): Efficient database management in the presence of ontological constraints
  • Guiding the data exploration
    • Anne Siegel (Dyliss): Scalable analysis of families of metabolic-based biological systems
  • Generic methods for the extraction of complex features and knowledge from data.
    • Elisa Fromont (Lacodam): Cost sensitive imbalanced classification
    • Peggy Cellier (SemLIS): Graph mining for knowledge graphs

 

Round table: the future of health data. Moderators: Constance Thierry and Johanne Bakalara.

  • T. Allard (Druid – Privacy).
  • L. D’Orazio (Shaman – Computing from distributed sources).
  • T. Guyet (Lacodam – Query classes and knowledge representation).
  • D. Lavenier (Genscale – Future genomic health data).
  • O. Ridoux (SemLIS – Energy impact).
  • N. Théret (Dyliss – Ethical issues).

 

Annual meeting of PhD students. 14 February 2020.

Emmanuelle Becker (Dyliss). How (not) to botch your poster

PhD student presentations.

  • Kévin Da Silva (Genscale) : Identification and quantification of strains in metagenomics sample using variation graphs
  • Hugo Talibart (Dyliss): Protein homology search using residues coevolution
  • Rituraj Singh (Druid): Reducing the Cost of Aggregation in Crowdsourcing
  • Thi To Quyen Tran (Shaman): Filter-based fuzzy big joins
  • Colin Leverger (Lacodam): Toward a framework for seasonal time series forecasting using clustering

Poster flash presentations (1 min/person, watch the timer).

PhD student presentations.

  • Mael Conan (Dyliss): Predictive approach to assess the genotoxicity of environmental contaminants during liver fibrosis
  • Johanne Bakalara (Lacodam) : Temporal Model to explore Medico-Administrative Data
  • Tompoariniaina Andriamilanto (Druid): Leveraging Browser Fingerprinting for Web Authentication
  • Van Hoang Tran (Shaman) : Performance Analysis of Big Data Management Systems for Cyber Security
  • Francesco Bariatti (SemLIS) : Graph pattern selection based on Minimum Description Length

Buffet with posters.

  1. Marine Louarn (Dyliss): Increasing life science resources re-usability using Semantic Web technologies
  2. Ian Jeantet (Druid): Building metro maps of science evolution
  3. Ludivine Duroyon (Shaman): A Linked Data Model for Facts, Statements and Beliefs
  4. Méline Wery (Dyliss) : Identification of causal signature using omics data integration and network reasoning-based analysis
  5. Aurélien Lamercerie (SemLIS): An algebra of deterministic propositional acceptance automata (DPAA) for the design of cyber-physical systems (CPS)
  6. Erwan Bourrand (Lacodam) : Discovering Useful Compact Sets of Sequential Rules in a Long Sequence
  7. Lolita Lecompte (Genscale) : SVJedi : Structural variation genotyping using long reads
  8. Nicolas Guillaudeux (Dyliss): Predicting isoform transcripts: What does the comparison of known transcripts in human, mouse and dog tell us?
  9. Grégoire Siekaniec (Genscale): Differential characterization of Streptococcus thermophilus strains by Nanopore sequencing (MinION)
  10. Joris Dugueperoux (Druid): Guaranteed Confidentiality and Efficiency in Crowdsourcing Platforms
  11. Yichang Wang (Lacodam) : Learning interpretable shapelets for time series classification through adversarial regularization
  12. Constance Thierry (Druid): MONITOR what next?
  13. Heng Zhang (Lacodam) : Multi-spectral object detection for all-day video surveillance