Click here to join the Bioinformatics Seminar mailing list.
29.10.2025. 15:15h, Faculty of Mathematics (online)
Extending AlphaFold to Non-globular Proteins: Challenges and Solutions
Dr. László Dobson
Protein Bioinformatics Research Group, Institute of Molecular Life Sciences of the Research Center for Natural Sciences, Budapest, Hungary
A meeting of the Bioinformatics seminar will be held on Thursday, October 29th, starting at 15:15, in the online classroom. Teachers and students of doctoral and master's studies in computer science, mathematics, biology and other related disciplines are invited to join us.
LINK: https://zoom.us/j/2183428158?pwd=ouAZtpLrbPnOBsKjQiarS9Rh59fyqF.1
Please note that the seminar time has been moved to 15:15 CET this semester.
Abstract
AlphaFold2/3 has revolutionized protein structure prediction, achieving remarkable accuracy for many globular proteins. However, as a supervised learning method, it has limitations when applied to protein families that are underrepresented in its training data, particularly those that are difficult to align, which hinders the use of residue co-variation signals. Despite the broad adoption of AlphaFold2 in hundreds of downstream applications, these limitations propagate to methods built upon it.
Membrane proteins are significantly underrepresented in the Protein Data Bank (PDB), and multiple sequence alignments (MSAs) for them are often sparse or noisy. Intrinsically disordered proteins are also challenging to align, and they are typically excluded from the training set unless they are captured in complex with structured partners. As a result, AlphaFold’s performance is suboptimal for these proteins, and its limitations can skew the predictions of downstream applications. These applications may exhibit artificially high accuracy due to data leakage or misalignment between the training and target protein datasets. We demonstrate how AlphaFold’s training set biases impact prediction quality for these proteins, and discuss strategies to mitigate these effects.
Lecturer
László Dobson received his MSc in Info-Bionics Engineering and earned his PhD in Computational Biology at the Institute of Enzymology in Budapest, focusing on the structural analysis of transmembrane proteins. During his doctoral studies, he developed the CCTOP topology prediction algorithm and the Human Transmembrane Proteome database. Following his PhD, he worked in Zoltán Gáspári’s group on postsynaptic proteins, and later joined Toby Gibson’s team at the European Molecular Biology Laboratory (EMBL), where he investigated signalling systems in Leishmania pathogens. He returned to the Institute of Enzymology in 2023 to join Gábor Tusnády’s group. His current research focuses on non-globular proteins, including membrane protein–related resources and short linear motifs.
Seminar
The organizer of the seminar is BIRBI. The heads of the seminar are Prof. dr Nataša Pržulj and dr Jovana Kovačević.
15.10.2025. 15:15h, Faculty of Mathematics (online)
Synthetic Data in Biomedicine: When it helps, when it hurts, and why it matters?
Dr.sc. Sandi Baressi Šegota
Faculty of Informatics, Juraj Dobrila University of Pula, Croatia
Video: Recorded lecture (MP4, 67min, 270MB)
A meeting of the Bioinformatics seminar will be held on Thursday, October 15th, starting at 15:15, in the online classroom. Teachers and students of doctoral and master's studies in computer science, mathematics, biology and other related disciplines are invited to join us.
Abstract
Synthetic data has emerged as a powerful tool in biomedical research, offering ways to overcome limitations of real-world datasets. Its origins lie in computer simulations, initially applied in engineering to reproduce rare conditions and test computational models without interrupting production processes. More recently, machine learning methods, particularly generative approaches, have enabled the creation of data “from data” itself, where algorithms learn statistical distributions and generate artificial samples that mimic observed patterns. Techniques such as copula-based modeling, adversarial learning, and autoencoders provide the mathematical and computational foundation for these processes. In biomedicine, synthetic data can accelerate research by addressing sparse datasets, reducing the burden of data collection, and balancing highly skewed class distributions—common, for instance, when healthy patients are underrepresented compared to symptomatic ones. These benefits, however, come with significant pitfalls. Synthetic data is not a substitute for real-world measurements; it inherits biases from the source data, cannot compensate for missing biological complexity, and should never be used as validation material. Statistical descriptors can often distinguish synthetic from real data, and naive reliance on synthetic datasets risks misleading conclusions. Ethical questions also arise regarding ownership, authorship, and responsibility for derived data, especially when patient information is indirectly embedded. Despite these challenges, synthetic data use is expanding rapidly, facilitated by open-source tools and frameworks that lower the technical barrier to entry. This democratization increases both the potential for impactful applications and the risk of misuse. The future of synthetic data in biomedicine depends on careful evaluation, transparent disclosure, and development of best-practice guidelines. 
By highlighting both its promise and its limitations, this work underscores why synthetic data matters—and why its role in biomedical research must be critically examined.
Lecturer
Dr. sc. Sandi Baressi Šegota, mag. ing. comp. is a teaching assistant at the Department of Automation and Electronics, Faculty of Engineering, University of Rijeka, and a member of the Chair of Electronics, Robotics and Automation, as well as the Laboratory for Automation and Robotics. He earned his PhD in Electrical Engineering in 2025. He holds a bachelor’s degree (2017) and a master’s degree (2019) in Computer Science from the University of Rijeka. In 2019, he also enrolled in doctoral studies and joined the University as a junior researcher, contributing to major projects including CEKOM SmartCity.4DII and the Scientific Center of Research Excellence in Data Science “DATACROSS.”
19.6.2025. 15:15h, Faculty of Mathematics (online and live in room BIM)
Application of Structural Bioinformatics to Analyze a Biomedical Question: The Case of Essential Thrombocythemia
Dr. Alexandre de Brevern
DSIMB Bioinformatics team, INSERM UMR_S 1134, BIGR unit, Université Paris Cité & Université de la Réunion, Necker Hospital, Paris, France
Video: Recorded lecture (MP4, 68min, 120MB)
A meeting of the Bioinformatics seminar will be held on Thursday, June 19th, starting at 15:15, in room BIM and in the online classroom. Teachers and students of doctoral and master's studies in computer science, mathematics, biology and other related disciplines are invited to join us.
Abstract
Essential thrombocythemia (ET) is a blood cancer belonging to the family of myeloproliferative neoplasms (MPNs). ET is characterized by increased production of blood platelets. The high platelet levels associated with ET lead to complications such as thrombosis, that is, clots (agglutinations of platelets) that can partially or totally obstruct a blood vessel, or even haemorrhages. The disease is uncommon, with 2.3 cases per 100,000 people each year. Several mutations are associated with ET. In 2005, a first mutation was found in Janus Kinase 2 (JAK2), namely V617F; one year later, a mutation was found in another protein that binds JAK2, namely MPL, with the W515L mutation. In 2013, the calreticulin (CALR) protein was implicated. Together, these three proteins account for 90% of ET patients. For the past few years, we have therefore worked to characterize them better, sometimes uncovering unusual properties.
The first case is CALR, a protein that ends with a highly disordered or flexible domain. A frameshift mutation in CALR implicated in ET generates a novel carboxyl-terminal sequence (the variant is named CALR-ET) and removes the ER retention peptide. CALR-ET therefore tends to leave the ER. CALR-ET variants mediate intermolecular interactions to form homodimers, bind MPL and activate it, leading to the ET phenotype. We have provided a new classification of CALR-ET variants that reveals different dynamical properties and also shows potential for sequencing annotation. For JAK2, two domains were analysed. The first is the JH2 domain, which carries the pathological mutation V617F; it is in fact quite rigid. We highlighted the interest of a new drug. The second is JH1, which binds an essential FDA-approved drug, Ruxolitinib. We highlighted why this drug represses the function when it is slightly phosphorylated, but not when it is entirely phosphorylated. Finally, we show very recent results on MPL.
Lecturer
Trained as a Cell Biologist, Alexandre G. de Brevern is a Structural Bioinformatician from Université Paris Cité. A Senior Researcher at the French National Institute for Health and Medical Research (INSERM), he is the head of DSIMB, the Bioinformatics team of the BIGR unit (12 permanent researchers located in Paris and Saint-Denis de la Réunion). He has two main research axes: (i) developing innovative methodologies useful for the scientific community and (ii) specific applications to proteins implicated in diseases and pathologies, mainly linked to haematology and transfusion.
Concerning the first axis, he has provided 20 tools, webservers and databases. He is a recognized specialist in protein local conformations, e.g. the extension of the definition of β-turn classes. He is the designer of the most important structural alphabet able to approximate protein structures, the Protein Blocks (PBs). PBs have been used to analyse protein structures, protein dynamics, disordered proteins, binding sites, protein superimposition and prediction.
For prediction purposes, he uses biostatistics and learning approaches ranging from Bayesian approaches to Artificial Neural Networks, Support Vector Machines and Deep Learning.
Concerning the second axis, he has used structural modelling and molecular dynamics to analyse Red Blood Cell and platelet proteins. He specializes in transmembrane proteins and proteins implicated in blood group transfusion.
He has also extended his work to drug design with collaborations with companies and NGS. He has authored more than 180 publications and one book, and is an editor of 8 peer-reviewed journals. He is involved in numerous scientific societies and has received the French Molecular Modelling group award (GGMM) and the Prix Maurice Nicloux (SFBBM). He takes part in the evaluation of many institutes in France, the Czech Republic, Finland and Poland, and has international collaborations, e.g. with India, Taiwan, Lebanon and Serbia.
