Lecture 2026/05/06 (Disorder prediction)
Transcript
00:00:00I don't think so. So, about the first mid-term.
00:00:08I have uploaded a sort of solution, or at least my solution, and the results were very good for everyone.
00:00:21There are just some minor differences.
00:00:25I have uploaded this file with the solution figures, that is, the figures generated from the results; essentially it regards only the numerical part.
00:00:45So essentially what you see is my distance matrix, my contact map, the number of contacts that I get with the custom function, and the number of contacts that I get with the NeighborSearch function from Biopython.
00:01:17Then you see the slices around the main diagonal, which give the number of contacts within each range of sequence separation: below six, from six to twelve, and from twelve to twenty-four.
00:01:31The other thing that you see here is the Ramachandran plot.
00:01:41So you see the regions and the points for this specific protein, and you have the number of residues that fall within the solid regions and, for the shaded regions, the number of residues that are in those regions.
00:02:04The regions are identified by two, one and zero, corresponding to high probability, medium probability and outliers.
00:02:17So this number here is the number of residues that fall outside any colored region; you see two of them here and another one here.
00:02:32What went wrong in some cases? Some people, when rendering the background regions, flipped them on the y axis.
00:02:50In some other cases the count of the outliers was wrong.
00:02:58In other cases, I think the most problematic ones, the issue was the counting of the contacts: in some cases the numbers were huge.
00:03:09So I guess that in those cases, instead of the sequence separation, a different kind of range was evaluated.
00:03:22Instead of checking contacts within the distance threshold at a given sequence separation, someone counted the number of contacts within six ångström, twelve ångström and twenty-four ångström.
00:03:39Of course in that case the number is not the number of contacts, it is not the number of pairs of atoms making meaningful contacts, but a random number.
00:03:56So you have the solution figures, and you can check the numbers that you got against the numbers that I got.
00:04:08I was very flexible: if there were differences in the counts, it was fine as long as the trend was in line with my solution for the ranges below six of sequence separation, from six to twelve, and from twelve to twenty-four.
00:04:28In similar cases, you can count the contacts two times because the matrix is symmetric: instead of counting the contacts on one half of the matrix, you also count the contacts on the other half.
00:04:48That is of course inaccurate, but I did not penalize anyone for that.
00:04:53In some other cases the differences are very minor, so maybe there were some differences in the handling of insertion codes or things like that.
00:05:20So, just to show, I want to show very quickly the code that I used.
00:05:37So, the distance matrix: it is the usual function, identical to the function that we had in class, and you see it takes a list of residues.
00:05:55The problem here is how to select the meaningful atoms.
00:06:01I think in the mid-term I asked you to use the carbon beta [unclear].
00:06:16So in the code for the structure contacts you see where this happens.
00:06:26You see I select the carbon beta if the residue has a carbon beta, otherwise the alpha carbon.
00:06:35Every hetero atom in principle should be skipped.
00:06:47So you check that the first element of the residue identifier tuple is a space, just to make sure that the residues are not hetero groups, and that is the way they are selected.
00:07:07I calculate the distance matrix using that function and I find the contacts as all those distances that are below five ångström.
00:07:26And the other function gets the contact map using the neighbor search.
00:07:30So in the end the distance matrix is calculated using my custom function, and the contact map is used to calculate the number of contacts for the various ranges.
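The atom selection and the thresholding described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code shown in class: residues are represented here as simple dictionaries (a hypothetical stand-in for Biopython residue objects, where the hetero flag is the first element of `residue.id`), and the function names are made up for the example. The five ångström threshold is the value mentioned in the lecture.

```python
import math

def representative_atom(residue):
    """Return the CB coordinates if present, else CA; None for hetero groups."""
    if residue["hetflag"] != " ":       # a non-blank flag marks hetero groups or water
        return None
    atoms = residue["atoms"]
    if "CB" in atoms:
        return atoms["CB"]
    return atoms.get("CA")              # e.g. glycine has no CB

def distance_matrix(coords):
    """All-against-all Euclidean distances between the selected atoms."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dist, threshold=5.0):
    """Boolean matrix, True where the distance is below the threshold."""
    return [[d < threshold for d in row] for row in dist]

# Toy input (made-up coordinates, for illustration only).
residues = [
    {"hetflag": " ", "atoms": {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0)}},
    {"hetflag": " ", "atoms": {"CA": (4.0, 0.0, 0.0)}},              # no CB
    {"hetflag": "W", "atoms": {"O": (2.0, 2.0, 0.0)}},               # water, skipped
    {"hetflag": " ", "atoms": {"CA": (20.0, 0.0, 0.0), "CB": (21.0, 0.0, 0.0)}},
]
coords = [c for c in map(representative_atom, residues) if c is not None]
dist = distance_matrix(coords)
contacts = contact_map(dist)
```

Note that the diagonal of `contacts` is trivially True (distance zero), which is one more reason to count contacts only at a minimum sequence separation.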
00:07:59So I used the contact map calculated by the custom function to count the contacts, and I used the triangular functions, which select the upper triangular half of the matrix and let you define the distance from the main diagonal.
00:08:25If the offset is set to zero you get the triangle that includes the diagonal; with six you get everything that is at least six positions away from the main diagonal.
00:08:44So you take the upper triangular matrix minus a smaller triangle that starts further from the diagonal, and you calculate the sum.
00:09:02So you sum all the contacts in a very easy way: you select the matrix after the sixth index after the main diagonal, after the twelfth diagonal, and so on.
00:09:21These are the three slices for the various ranges, and that's it for what regards the contacts.
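The band counting described above can be sketched with explicit loops, which is equivalent to taking the difference of two upper triangles (NumPy's `np.triu(m, k)` provides the same selection through its `k` offset). The range boundaries 6, 12 and 24 follow the lecture; treating the last range as open-ended, and the names `short_range`, `medium_range` and `long_range`, are assumptions made for this example.

```python
def count_contacts(contacts, min_sep, max_sep=None):
    """Count pairs (i, j) in contact with i < j and min_sep <= j - i < max_sep."""
    n = len(contacts)
    total = 0
    for i in range(n):
        upper = n if max_sep is None else min(n, i + max_sep)
        for j in range(i + min_sep, upper):   # only the upper triangle is visited
            if contacts[i][j]:
                total += 1
    return total

# Toy symmetric contact map (made-up contacts, for illustration only).
n = 30
cmap = [[False] * n for _ in range(n)]
for i, j in [(5, 6), (0, 7), (3, 15), (2, 29)]:
    cmap[i][j] = cmap[j][i] = True

short_range = count_contacts(cmap, 6, 12)     # separation 6..11
medium_range = count_contacts(cmap, 12, 24)   # separation 12..23
long_range = count_contacts(cmap, 24)         # separation 24 and above

# Summing the full symmetric matrix counts every contact twice.
double_counted = sum(sum(row) for row in cmap)
```

Because only pairs with `i < j` are visited, each contact is counted once; summing the whole matrix, as in `double_counted`, gives twice the number of pairs.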
00:09:33Then the count of the outliers.
00:09:38The trick in that case was to remember the following.
00:09:56We have axes here in the plot where you have zero in the middle, for phi and also for the other one, but the data regarding the areas, the regions with the probability, start from zero, so you have coordinates that go from zero up to three hundred sixty.
00:10:24So in order to convert the indices to the same reference you have to add one hundred eighty to the phi angle and also to the other one, but that one is the opposite, so I multiply it by minus one to switch the axis.
00:10:46Then I can simply count, for every new psi and phi index, how many times I see a specific value, that is, the value that is in my probability matrix.
00:11:11You count for every possible probability class, which are just two, one and zero, where I find the point in my original matrix, and that is how I count the outliers.
00:11:32Of course, if you made a mistake here your counting would be wrong. So these were the numerical things.
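The index conversion described above can be sketched as follows. This is a minimal illustration, assuming a 360 by 360 grid of region codes (2 for high probability, 1 for medium probability, 0 for outliers, as in the lecture), with phi on the columns and the flipped psi on the rows; the orientation of the grid and the toy regions are assumptions, not the actual data file used for the mid-term.

```python
from collections import Counter

def region_counts(angles, regions):
    """Count residues per region code; angles are (phi, psi) pairs in degrees."""
    size = len(regions)
    counts = Counter()
    for phi, psi in angles:
        col = min(int(phi + 180), size - 1)    # shift -180..180 to 0..360
        row = min(int(-psi + 180), size - 1)   # flip the axis, then shift
        counts[regions[row][col]] += 1
    return counts

# Toy grid: two rectangular regions, everything else is outlier territory (0).
regions = [[0] * 360 for _ in range(360)]
for row in range(190, 251):
    for col in range(90, 151):
        regions[row][col] = 2                  # made-up high-probability block
for row in range(10, 81):
    for col in range(30, 121):
        regions[row][col] = 1                  # made-up medium-probability block

angles = [(-60.0, -45.0), (-120.0, 130.0), (60.0, -150.0)]
counts = region_counts(angles, regions)
outliers = counts[0]
```

Forgetting the sign flip on one axis mirrors every point across that axis, which changes the region each residue falls in and therefore the outlier count.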
00:11:48As for the other things, the questions, I think most of you answered all of them.
00:12:03So, what is the difference between an ionic bond and a covalent bond?
00:12:09In an ionic bond you have an electrostatic interaction between ions of opposite sign, formed between two very different elements for which the electronegativity difference is very high.
00:12:21In a covalent bond you actually share electrons, so you form molecular orbitals: the electrons move within orbitals that are shared by the two or more atoms.
00:12:43What is the difference between weak and strong acids? Strong acids dissociate completely, weak acids don't, and you can calculate the amount to which an acid dissociates.
00:12:57The isoelectric point of a protein is the pH at which the net charge is zero.
00:13:03What is the native conformation?
00:13:05Here of course it is a more philosophical question: it is the functional conformation of the protein.
00:13:15In most cases it corresponds to the global energy minimum among the possible conformations.
00:13:26And that's it. What is the difference between the biological assembly, the asymmetric unit and the unit cell?
00:13:36The unit cell is actually the repeating element that is in the crystal.
00:13:40The asymmetric unit is the minimum element that is reported, the one that you actually see in the PDB structure, and it is the minimum set of coordinates that you can use to reconstruct the unit cell with crystallographic operations and translations.
00:14:08The biological assembly, of course, is the functional version of the complex, the assembly of what you see in the asymmetric unit that makes it work.
00:14:21What are missing residues? They are those that you don't see in the crystal.
00:14:29They are not resolved, and in many cases, most of them, they correspond to disordered regions, regions that are dynamic and flexible, so they are not arranged in the same way throughout the crystal lattice and they are not resolved in the diffraction pattern of the X-ray experiment.
00:14:55So do you have doubts or questions? Is there anything that is not clear?
00:15:08[Student question about insertions and deletions.]
00:15:15I think that is not accurate, in the sense that with insertions and deletions you usually mean something that is found or described with respect to other versions of the same gene, for instance in other species. It is an evolutionary event.
00:15:41What I wanted as an answer is that what is used for the experiment in most cases is not the natural gene, but an engineered construct.
00:15:58So they add something at the beginning, they add something at the end of the protein, sometimes they remove some positions or they change some.
00:16:13So you don't say that these are indels: an indel is the result of an evolutionary process, whereas when you modify the sequence it is an engineered sequence, and most of the sequences are engineered.
00:16:25Indeed you have a field which is the natural gene and another field where you have the expression organism. There should also be another one, I think.
00:16:40There are different fields to specify what the organism is, because you can use bacteria for expression, so you can express a human gene in bacteria, but the human gene can also be a chimera, maybe part from human and part from another species.
00:17:03You see all these sorts of weird things when you have engineered constructs.
00:17:15Questions? Okay, so I consider the first mid-term concluded.
00:17:22You have the figures uploaded, and thanks for the survey: I saw the responses, and many of you provided feedback.
00:17:42Today I would like to finish the part on disorder prediction, just showing an example.
00:17:49I also want to show you an implementation that I did of it, which is very simple, and the databases that are related to disorder.
00:18:21So, last time we saw that disordered proteins have a bias, especially in the information content, or the complexity, of the sequence associated with the protein.
00:18:59When we say information content we mean something like this: this is a sequence that has low complexity.
00:19:10You can calculate it using the Shannon entropy formula: you measure the frequency of each amino acid, you multiply it by its logarithm, and you sum over all possible amino acids within a window of the size you want to consider.
00:19:35Because these things are local, it doesn't make sense to evaluate the complexity considering the whole sequence.
00:19:42We saw that the bias is particularly clear when you separate disordered from globular proteins.
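The local complexity calculation described above can be sketched as a Shannon entropy over a sliding window. This is a minimal illustration; the window size, the base-2 logarithm and the way the window is truncated at the sequence ends are assumptions chosen for the example, and the sequences are made up.

```python
import math
from collections import Counter

def shannon_entropy(segment):
    """Entropy in bits of the amino acid composition of a sequence segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def windowed_entropy(sequence, window=5):
    """Entropy of the window centered on each position (truncated at the ends)."""
    half = window // 2
    return [
        shannon_entropy(sequence[max(0, i - half): i + half + 1])
        for i in range(len(sequence))
    ]

low_complexity = "QQQQQQQQQQ"        # a single residue type: entropy 0
mixed = "ACDEFGHIKL"                 # all different: maximal entropy for the window
profile = windowed_entropy(low_complexity + mixed)
```

A homopolymeric stretch gives zero entropy, while a window in which every residue is different gives the maximum for that window size, so the profile drops over low-complexity regions.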
00:20:02We also considered the residues that are hydrophobic and those that are not, and the charges, which are another feature in addition to the complexity.
00:20:20So: charge, hydrophobicity, complexity of the sequence, and a special case, which is proline, whose side chain is cyclic and connected to the backbone, so it limits the degrees of freedom, the freedom of movement and rotation, around some of the bonds [unclear].
00:20:55You can also classify these regions further by looking at the distribution of the charges.
00:21:06In this case, for instance, the composition is the same, so the amount of charge of these two sequences is the same, but the entropy is different, the distribution is different, and the final result on the structure, on the conformation, is also different.
00:21:31In this case you can have a sort of transient globule-like state [unclear].
00:21:49We also introduced the critical assessment challenge organized by our lab, where we evaluate the ability of sequence-based methods to predict the propensity of positions in the sequence to be disordered.
00:22:18And we started to look at IUPred. This is one of the most interpretable tools for disorder, because it is based on a very simple assumption: disordered residues are expected to be interacting less with the rest of the structure.
00:22:48So you can build a predictor that guesses the amount of contacts that a residue is making.
00:23:04So we have these two situations.
00:23:09We can have the structure: we have the sequence, and this is the contact map for this specific protein, so we have the structural information and we can calculate, or guess, the energy associated with every position by counting how many contacts, and which types of contacts, it makes.
00:23:34This is performed on the structure, so you can for instance take a close-up.
00:23:52Anyway, if you see from the contact map or from the structure that this residue is in contact with this one and this other one, then depending on the type of pair that is in contact you can assign an energy.
00:24:14For instance, we know that this one is charged and this one is, for instance, hydrophobic.
00:24:21We can estimate the energy, from the PDB for instance or with some force field, and then you can sum, depending on the type of contact and the type of pair, the overall energy associated with the position.
00:24:42If we don't have this dataset, or we don't have this type of information, we can also do something else: we can for instance treat the problem as if all the neighbors in the sequence were in contact.
00:25:09Of course this is incorrect, but we can simply calculate the statistics of all possible residues in the neighborhood and compare these frequencies with the energy that we get from actual real structures.
00:25:35So you can calculate a matrix like this from real data, which can be the probability or the energy associated with every possible pair, and you see the scale.
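One common way to turn observed pair frequencies into energies is a log-odds (statistical potential) score, sketched below. This is an illustration of the general idea only: the reference state (pair frequency expected from the single-residue frequencies), the arbitrary energy units and the toy contact list are assumptions, and this is not claimed to be the procedure used to derive the matrix shown in the lecture.

```python
import math
from collections import Counter

def pair_energies(contact_pairs):
    """Log-odds energy per residue-type pair from a list of observed contacts."""
    pair_counts = Counter(tuple(sorted(pair)) for pair in contact_pairs)
    type_counts = Counter(res for pair in contact_pairs for res in pair)
    total_pairs = sum(pair_counts.values())
    total_res = sum(type_counts.values())
    energies = {}
    for (a, b), n_ab in pair_counts.items():
        observed = n_ab / total_pairs
        fa = type_counts[a] / total_res
        fb = type_counts[b] / total_res
        # Unordered pairs of different types can occur in two orders.
        expected = fa * fb if a == b else 2 * fa * fb
        energies[(a, b)] = -math.log(observed / expected)
    return energies

# Toy contacts (made-up counts, for illustration only).
toy_contacts = [("C", "C")] * 4 + [("C", "K")] * 1 + [("K", "E")] * 3
energies = pair_energies(toy_contacts)
```

With this sign convention, pairs observed more often than expected get a negative (favorable) energy and pairs observed less often than expected get a positive (unfavorable) one.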
00:25:50You see you can have a positive energy, which means that the pair is unfavorable.
00:25:59[unclear]
00:26:07Cysteine with cysteine is the most favorable, the most stable pair, and this is a good match with biological expectations, because cysteines form actual chemical covalent bonds, the disulfide bonds between cysteines.
00:26:35And you can find nice explanations also for other pairs [unclear], a good descriptor, a good indicator of stability [unclear].
00:26:51There is an exception: there is a signal for tryptophan that is difficult to explain.
00:27:00You also see other amino acids that are charged, and you see they are unfavorable against all other amino acids except those that carry another charge, which compensates.
00:27:24So you can try to speculate on all the possible pairs and find some sort of explanation
00:27:32sometimes. Okay, so there is another matrix, which is the prediction matrix, and the idea is similar.
00:27:44You can calculate the energy for a position in the sequence by multiplying the matrix value, and I will explain how this is computed, by the frequency of a specific amino acid in the neighborhood.
00:28:03So you multiply by the fraction of each amino acid that you have in the neighborhood and, similarly, you sum the contributions.
00:28:14Of course in this case the big role is played not by the fact that you are considering contacts, but by the fact that you are considering the distribution of amino acids around the position in the sequence, so the pattern and the contributions are similar but not the same.
00:28:37You see, for instance, that having cysteines close to other cysteines, nearby in the sequence, is an indicator that the region is folded.
00:28:51But most of the cells in the matrix are not as easy to interpret; it is difficult to explain the contribution.
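The per-position estimate described above, a matrix value multiplied by the neighborhood frequency of each amino acid and summed, can be sketched as follows. The two-letter alphabet, the matrix values and the window size are made up for illustration and are not the published parameters of any predictor; a real matrix would cover all twenty amino acids.

```python
from collections import Counter

def estimated_energy(sequence, matrix, window=10):
    """Estimate an energy per position from the composition of its neighborhood."""
    energies = []
    for i, aa in enumerate(sequence):
        # Neighbors on both sides, excluding the position itself.
        neighbors = sequence[max(0, i - window): i] + sequence[i + 1: i + 1 + window]
        n = len(neighbors)
        if n == 0:
            energies.append(0.0)
            continue
        counts = Counter(neighbors)
        energies.append(sum(matrix[aa][other] * c / n for other, c in counts.items()))
    return energies

# Hypothetical toy matrix: L with L is favorable, K with K is unfavorable.
toy_matrix = {
    "L": {"L": -1.0, "K": 0.5},
    "K": {"L": 0.5, "K": 1.0},
}
toy_energies = estimated_energy("LLLLKKKK", toy_matrix, window=2)
```

In this toy case the hydrophobic-like half of the sequence gets negative estimated energies and the charged-like half gets positive ones, which is the kind of signal that separates ordered-like from disordered-like stretches.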
00:29:07Also, you see the scale: the scale is different from the other matrix.
00:29:13How has this matrix been obtained? As I said, by comparison with the energies from structures.
00:29:20So the energy can be calculated from the structure, and those are the actual energies, and then you have the estimate, the energy estimated from the frequency, the probability of having some pairs, and the matrix is the one for which the difference between the two is minimum.
00:29:52This is the result.
00:29:55[unclear] some globular proteins and some intrinsically disordered proteins [unclear] for the structured proteins.
00:30:15You see that the energy of globular proteins is towards negative, marginally negative values, down to about minus 0.05 [unclear].
00:30:35At zero or above they are not stable. So that is how stable, what is the likelihood of being stable, for a given stretch of amino acids.
00:30:53Okay, so the other interesting thing that they did in the paper, I think, is to analyze the values of the matrix, to understand if there is some feature that can explain the matrix, so they computed the eigenvectors.
00:31:23These are the eigenvectors from the paper.
00:31:29On this side you have the negative ones, which are expected to be stabilizing, and if you look at the highest of these and at the order of the amino acids, it perfectly captures the hydrophobicity of the amino acids, from the first ones [unclear] to the end, where you have the least.
00:32:01The second vector captures the cysteines: you see that cysteine is singled out by this vector [unclear].
00:32:23The positive eigenvectors are expected to be destabilizing. They didn't provide an explanation for this one, they did not find any biological explanation.
00:32:40This one is about structure breaking, so you see proline here.
00:32:49Methionine is an amino acid that you usually see at the beginning of a protein. Of course there are methionines also at the end and in the middle, but usually if you see it outside the initial position, not as the first residue of the sequence, it is an indicator of something weird.
00:33:15And the last one is the charge, where you see at the two extremes of the scale the charged amino acids, such as aspartate and arginine [unclear].
00:33:44These are a few examples.
00:33:58You can apply it, and this is what we will see in the practical: you can actually calculate the energy by yourself [unclear].
00:34:12For instance, I think this is [unclear].
00:34:28As an example, at the beginning, at the start of the protein, an alpha helix is present when the protein interacts with the membrane; otherwise you see the residues have a sort of tendency to disorder [unclear].
00:34:51So you can calculate the global energy, and you can calculate the local energy, where you see there are some regions [unclear].
00:35:06What do you do? You consider a sliding window and calculate the energy for every position of the sliding window.
00:35:16For what regards the structure-based energy, you can simply count the contacts and the types of contact that are within the window; actually, even without using a window you can just consider the contacts.
00:35:37In the case of the estimated energy, you see what the situation is: just at the beginning, and in addition in the middle, there are regions that correspond to this energy.
00:35:54So, according to the estimate, it doesn't capture it very well [unclear], but you see the trend is there.
00:36:09So the last part is definitely disordered; for this one there is some signal here, but it is not enough to go above the threshold.
00:36:25So in the end the structure-based one is more accurate; this one is sequence based, but still it captures the signal.
00:36:39So the other thing you may notice is that you have two lines here, the continuous line and the dashed line. The dashed line is the actual value, and the continuous one is just a moving average of the other.
00:36:59So the output of the algorithm is the blue dashed line in both cases.
00:37:13You see this is quite, let's say, a common thing to see in sequence analysis: when you analyze a sequence, residues that are close in the sequence can have opposite properties.
00:37:36I think I also showed you an example with hydrophobicity, and you see that it goes up and down continuously, so you can only capture the average behavior if you apply a moving average, a signal-processing operation.
00:37:59Just to complete the analysis, you also see the conservation among the various items, local and global, which are computed across the models that you have for this protein structure, and you can also analyze the conservation when many models have the same sequence [unclear].
00:38:46You can also analyze the file in a standard way [unclear].
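The moving average used for the continuous line mentioned above can be sketched in a few lines. The window size and the truncation of the window at the two ends of the profile are assumptions for this example; the lecture does not state which settings were used for the plot.

```python
def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# A made-up profile that alternates between two values at every position.
raw = [0, 3, 0, 3, 0]
smooth = moving_average(raw, window=3)
```

The raw profile jumps between 0 and 3 at every position, while the smoothed values stay between 1.0 and 2.0, which is why the average line is easier to read than the per-residue output.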
00:39:09For example, in this case we have a structure where you see you have a few regions with disordered loops, and you see the energy estimated with the matrix.
00:39:28What is the estimate? What is it?
00:39:36It is the probability of being in contact.
00:39:40So this is also the reason why it is one of the best methods of the past ten to fifteen years: because it can be interpreted.
00:39:49You can say, okay, a high or low score corresponds to the probability of being able to form contacts with the rest of the structure. That is very useful for this.
00:40:05And it is superfast: the implementation is just a few lines of code.
00:40:13If you want to implement it, it is just about multiplying and summing things. Okay, other questions?
00:40:24Next is the implementation, but before that I want to show you the databases, the two databases that are the major resources for disorder.
00:40:41One is MobiDB.
00:40:47I think this slide shows the distribution of amino acids in comparison to [unclear], and you have various databases, many of them curated databases.
00:41:06You see they claim to capture different things, but there is a certain amount of similarity across them; you see they behave in a similar way, even though there are some slight differences.
00:41:31I would say that this one is the most different, and it is also just a bank of proteins for which you have a disorder annotation [unclear].
00:41:49At some point their curation effort focused on some specific families of proteins, and you see they are also enriched in cysteine, because their starting point, the funding they started with, was for a family of metalloproteins.
00:42:13Indeed cysteine coordinates metal ions. So these are two different, very small databases; for the rest they are sort of similar.
00:42:28And then the databases that we are talking about.
00:42:40how have information about this order.
00:42:43So we so probably you know how.
00:42:47B. B. B. Which is
00:42:50the one story chemical shift PDB is the European version of
00:42:56the PDB PC di DB database of specific experiments called
00:43:03circular decrease sas BDB is
00:43:07small angle scattering similar to X-Ray,
00:43:10but side of capture in
00:43:12this cutter plot that you gives you information
00:43:15about how big is protein so the shape of the surface of protein.
00:43:22Intel is like this is a database of models of protein families.
00:43:31And di five year old database and that they are focus on the.
00:43:42The other thing is that the Human Proteome Organization is an organization that is responsible for defining standards, for instance for defining the names of the genes for the human organism, so every gene has its own specific name that has been standardized over the years [unclear] for all the organisms.
00:44:09It has some working groups that are focusing on defining standards for experiments like molecular interactions, and there is one for intrinsically disordered proteins.
00:44:20And then the Gene Ontology is a huge consortium that defines the descriptors for gene products, for genes.
00:44:31ECO is the Evidence and Conclusion Ontology, so it enumerates all possible experiments that you can have.
00:44:43Then there are some projects and some communities that add on [unclear] that have some focus on this. Then, I think, MobiDB.
00:45:06per Monster per dei this is per dei this about
00:45:11this user per but dei now is more and you have different type of
00:45:22annotation that are integrated to mob mob it integrated
00:45:26data from both primary databases database is
00:45:32that capture experimental data that internet capture
00:45:39by the database is to they
00:45:43are linked and they are integrated in to be.
00:45:48You also have to the all that are not in
00:45:53this database are search against Unicode and
00:45:59if there is a position and information is transfer to
00:46:02the omologo protein you can have
00:46:05the right information so if you
00:46:07have structured information or protein
00:46:10for instance for health and you can for instance check.
00:46:15If protein is product with low or confidence and
00:46:21use that values to to this order from the PDB.
00:46:28You can check it is missing so if the end,
00:46:34of course you can ad una large scale and
00:46:38prediction for all you so what about to sports.
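The two derived annotations described above, low-confidence positions in a predicted model and residues missing from an experimental structure, can be sketched as follows. The confidence cutoff of 70, the one-based numbering and the function names are assumptions made for this example; they are not taken from the lecture or from the MobiDB pipeline.

```python
def low_confidence_positions(plddt, cutoff=70.0):
    """One-based positions whose per-residue confidence is below the cutoff."""
    return [i for i, score in enumerate(plddt, start=1) if score < cutoff]

def missing_positions(sequence_length, observed):
    """One-based positions of the full sequence with no observed coordinates."""
    observed = set(observed)
    return [i for i in range(1, sequence_length + 1) if i not in observed]

def to_regions(positions):
    """Collapse sorted positions into inclusive (start, end) regions."""
    regions = []
    for p in positions:
        if regions and p == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], p)   # extend the current region
        else:
            regions.append((p, p))              # start a new region
    return regions

# Made-up inputs, for illustration only.
predicted = to_regions(low_confidence_positions([90.0, 40.0, 45.0, 95.0, 30.0]))
missing = to_regions(missing_positions(6, observed=[2, 3, 4]))
```

Both annotations end up in the same form, a list of regions on the sequence, which is what makes it possible to show them side by side as tracks.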
00:46:48And then it is also more complicated, because it is not just about disorder, but also about defining what the binding regions within disordered regions are, and it can also say something about the function of the various regions.
00:47:03So I think I can show you an example on the website.
00:47:21For instance you can click on the main example, which is p53 [unclear].
00:47:31You have a lot of disorder everywhere, and you see the AlphaFold prediction: there is a central region that is able to fold and form a complex with other copies of the same protein, and you have the N-terminal and the C-terminal regions, which are completely disordered and can interact with hundreds of protein partners.
00:47:54In the overview, the idea is that it aggregates information from various sources.
00:48:06If you click on disorder you see at the top the consensus of the things that are integrated from other sources, and you can see the annotations from curated databases [unclear].
00:48:25You see the missing residues in the PDB, so this gives you an idea of how different the structures available in the PDB are: you see you have observed positions and missing ones, and different PDB entries show different situations.
00:48:45This is difficult to manage, of course, because maybe here you see missing residues because the protein was in isolation, and maybe here you see the full structure just because there was a partner associated.
00:49:24The paper was this is interesting.
00:49:30This is the same setting.
00:49:42And that we can other cases like this is one,
00:49:48this is not the truth in contact with something.
00:49:54Okay, so you see in this case.
00:50:06Again. You see missing they
00:50:11are parts of protein connected with indicate in where
00:50:16is the continuation of
00:50:18the sequence or the chain and you see their some parts that
00:50:22are structured here so I know how they
00:50:26managed to to get them structured probably addensanti,
00:50:29but we can the title dei.
00:50:33Big three and fifty protein complex so
00:50:38this is the protein in complex with something else.
00:50:43Sì, Easton.
00:50:47Hundred. And tumor antigene we were looking for you
00:50:58see you have this alpha elixir
00:51:02actually di structure part of
00:51:06the of the complex and this is capture year.
00:51:12And you see for this protein that's very very well did you see
00:51:18o many pdb different pdb You have
00:51:22so different papers that try
00:51:24to the structure in different conditions.
00:51:29Situations with different partners and so on.
00:51:35You can do the same for NMR structures and calculate the regions where you see most of the deviation across models.
00:51:44And in this space you see the predictors; you also have IUPred, which comes in two flavors, long and short, where the difference is simply the sliding window or some of the hyperparameters.
00:52:07And you see MobiDB-lite, which is essentially a consensus of all of these, and you see the probability, that is, how many times the different predictors say a residue is disordered.
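The consensus idea described above, counting how many predictors call a residue disordered, can be sketched as a simple per-position vote. This is a simplified illustration: the strict-majority threshold and the absence of any smoothing or minimum region length are assumptions, so it should not be read as the actual MobiDB-lite procedure.

```python
def consensus(predictions, min_fraction=0.5):
    """Per-position agreement among binary predictions and the resulting calls."""
    n_predictors = len(predictions)
    length = len(predictions[0])
    agreement = [
        sum(pred[i] for pred in predictions) / n_predictors
        for i in range(length)
    ]
    calls = [fraction > min_fraction for fraction in agreement]
    return agreement, calls

# Three hypothetical predictors over a four-residue sequence (1 = disordered).
toy_predictions = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
agreement, calls = consensus(toy_predictions)
```

The `agreement` values play the role of the probability shown in the track: the fraction of predictors that agree on disorder at each position.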
00:52:32So you have this score, and these are the two regions that are predicted with high confidence [unclear], and these are the regions where you have observations [unclear].
00:52:53So you see this is almost complementary to the previous one, but not always [unclear].
00:53:16This is the disorder part. Then you also have binding predictors.
00:53:24Again, binding can be derived from the PDB or predicted by a predictor.
00:53:31You have interactions; in this case they are those from the PDB, so you have the PDB entries that are matched to the partner proteins.
00:53:43So you see you can have interactions with the same protein: you actually see PDB entries showing p53 binding p53.
00:53:52The partner can be other things, like this one; if you click you should see the structure showing the chains that are in contact with your protein, with p53 in blue.
00:54:13And function is a completely different story: for every position, every region, you have the Gene Ontology terms associated.
00:54:22For instance you have protein binding here, molecular function activator activity, protein binding, a specific type of binding, self-inhibition, which is specific for this protein, and so on.
00:54:43Okay, I don't want to confuse you more with MobiDB, so let me show you DisProt very quickly.
00:54:56The difference between MobiDB and DisProt is that DisProt is fully focused on papers and experimental evidence.
00:55:17What?
00:55:31Is one?
00:55:35So you don't have [unclear] about this; you simply have a mapping between the regions here and the meaning of the regions, which in this case says it is disorder, and a paper describing this type of evidence.
00:56:00So if you click here, you see the annotation.
00:56:07You see a description of what you are talking about, that is, the position and the evidence code.
00:56:17So you see the description here is cleavage assay evidence used in manual assertion.
00:56:27Just to give you an idea of what the ECO ontology is: this one is, for instance, nuclear magnetic resonance evidence used in manual assertion.
00:56:40Here, X-ray crystallography-based structure model with missing residue coordinates used in manual assertion.
00:56:54And you can see how every term is located within the graph,
00:57:05so you see the same term is a child
00:57:09of multiple parents. You see, in
00:57:15this branch you have structure determination evidence, and in
00:57:23the other branch you see evidence used in manual assertion, because it is
00:57:27actually a curator who transferred that information.
00:57:36These are the two main things, so the term at
00:57:40the same time captures the fact
00:57:42that it is a manual assertion and, at the same time,
00:57:45that it is crystallographic evidence.
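The point about a term having several parents can be illustrated with a small sketch. The term names and the hierarchy below are simplified and hypothetical (they are not the real ECO identifiers or the real ECO graph); the sketch only shows how walking up the parent links collects both the "type of experiment" branch and the "manual assertion" branch.

```python
# Hypothetical, simplified fragment of an evidence ontology.
# Term names and links are illustrative only, not the real ECO hierarchy.
PARENTS = {
    "x-ray evidence used in manual assertion": [
        "structure determination evidence",
        "evidence used in manual assertion",
    ],
    "structure determination evidence": ["experimental evidence"],
    "experimental evidence": ["evidence"],
    "evidence used in manual assertion": ["evidence"],
    "evidence": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


found = ancestors("x-ray evidence used in manual assertion")
print(sorted(found))
```

Because the graph is a directed acyclic graph rather than a tree, a single annotation answers two questions at once: which experiment produced the evidence, and who asserted it.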
00:57:49This is what the curator
00:57:54actually did: assigning or identifying what is
00:58:00the term that best describes
00:58:03the function, another term in another ontology that
00:58:08describes the type of evidence, and the reference, of course,
00:58:13the paper. Sometimes they also add
00:58:16the statement, so where in the paper
00:58:19this information is provided. So you see the sentence,
00:58:22the snippet of text that is
00:58:24describing this type of evidence, and you see
00:58:27this is written by the authors of the paper:
00:58:31only the residues of the domain, which was also identified
00:58:36by a method, were visible in the density maps
00:58:41before and after density modification. So this region,
00:58:47which is before this fragment: that
00:58:53is a clear statement saying this is disordered. Of
00:58:57course, authors can say things in different ways: they can say this is
00:59:02disordered, or they could say there is no evidence of this region.
00:59:11Yes. You also have transition
00:59:15evidence, disorder to order, so this is
00:59:18a switch from a disordered conformation to an ordered conformation.
00:59:26Event and you see of di
00:59:34Twenty six bit catering using the crystal
00:59:38only a level segment is order in the structure so this is saying.
00:59:49And it is also written that
00:59:52the doubly phosphorylated form is the destruction motif;
00:59:57probably phosphorylation plays a role in the fold switch.
01:00:04This is this this is about this is the number.
01:00:21One to one entries mob as productions
01:00:26for all proteins like to more than millions products.
01:00:32Uhm, okay.
01:00:34This is this project.
01:00:40And I think I only have a few slides describing
01:00:49a couple of things about MobiDB, and
01:00:54especially how function predictions may be
01:00:58interesting for you. We saw that we can
01:01:02use homology-based prediction transfer.
01:01:12Above a certain sequence identity you can transfer
01:01:20the structure information. For function it is more complicated,
01:01:25because function is a broader concept: it is not just about the shape,
01:01:33but also about how proteins interact, what types of
01:01:38complexes they form, and also about differences across species.
01:01:43So the same protein can perform other things in different species. If you look at
01:01:51high-level functions like
01:01:54regulation of biological processes, there is more
01:01:58than simple molecular similarity. So, that said,
01:02:09you can use sequence to transfer function,
01:02:14but the constraints are way
01:02:19more restrictive, so you cannot transfer function at low identity.
01:02:26You can probably safely transfer
01:02:28function between two proteins that are very similar.
01:02:32The idea now is to exploit the same thing, similar proteins,
01:02:42but instead of their sequences
01:02:44you compare embeddings: you can convert the
01:02:49sequence into an embedding space with the encoder of a transformer,
01:02:56and you get a vector for each residue.
01:03:00And you compare them. In MobiDB we do a very simple thing:
01:03:10we take the start and end positions of
01:03:15the regions and compute the embedding of the regions.
01:03:22That is, the embedding of the region;
01:03:28this is of course a simplification,
01:03:32but if for every residue you have a vector of one thousand twenty-four
01:03:39dimensions, you calculate the
01:03:44average across the various amino acids,
01:03:47so you still get one
01:03:49vector of one thousand twenty-four
01:03:51dimensions, and that single vector represents the region.
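The averaging step described here can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the per-residue embeddings are already available as an array of shape (sequence length, 1024); the function name `region_embedding`, the random placeholder data and the 1-based inclusive coordinates are assumptions made for the example, not the database's actual code.

```python
import numpy as np


def region_embedding(residue_embeddings, start, end):
    """Average per-residue vectors over a region.

    start and end are 1-based and inclusive, as region coordinates
    are usually reported in annotation databases (an assumption here).
    """
    region = residue_embeddings[start - 1:end]
    return region.mean(axis=0)


# Placeholder data: 300 residues, 1024 dimensions each (random, for illustration).
rng = np.random.default_rng(0)
protein = rng.normal(size=(300, 1024))

vector = region_embedding(protein, start=10, end=40)
print(vector.shape)
```

Whatever the length of the region, the result is a single vector with the same number of dimensions as one residue embedding, which is what makes regions of different lengths directly comparable.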
01:03:58So we have vectors associated to regions in MobiDB
01:04:07proteins, and you can start
01:04:10comparing those vectors. For every region you have
01:04:13one vector of the same length, and you can apply some distance measure
01:04:20across this set of vectors and see who is
01:04:24similar to you. So what we
01:04:27have is a set that contains functional information
01:04:31for a lot of proteins, against which you can compare your vector:
01:04:37the representation of your region against
01:04:41the vectors that you get from this set.
01:04:42And this is done
01:04:47in the latent space.
01:04:53So what you can do is simply a k-NN:
01:04:58you take your vector,
01:05:02you place your vector in the space, and you see what are
01:05:05the neighbours and what is the function of
01:05:09the neighbours, of the other proteins, and therefore you can
01:05:13transfer the function based on the number of
01:05:17proteins that have the same function in the neighbourhood.
01:05:22You can define the minimal number of proteins,
01:05:24the number of neighbours to use,
01:05:27and also you can define different ways to
01:05:31measure the distance; the two main ones are
01:05:35Euclidean distance and cosine similarity.
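A minimal sketch of this nearest-neighbour transfer follows. It assumes each annotated region is already reduced to one vector and one function label; the function names, the toy two-dimensional vectors, the labels, and the parameters `k` and `min_votes` are all illustrative assumptions, not the settings used in the database.

```python
from collections import Counter

import numpy as np


def euclidean_distance(query, vectors):
    """Euclidean distance from the query to every row of `vectors`."""
    return np.linalg.norm(vectors - query, axis=1)


def cosine_distance(query, vectors):
    """One minus cosine similarity, so that smaller means more similar."""
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return 1.0 - sims


def knn_transfer(query, vectors, labels, k=5, min_votes=3, metric=cosine_distance):
    """Return the majority label among the k nearest regions,
    or None if fewer than `min_votes` neighbours agree."""
    distances = metric(query, vectors)
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


# Toy annotated set (illustrative only).
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]])
labels = ["linker", "linker", "linker", "tail", "tail"]

print(knn_transfer(np.array([1.0, 0.05]), vectors, labels, k=3, min_votes=3))
print(knn_transfer(np.array([0.5, 0.6]), vectors, labels, k=3, min_votes=3))
```

Requiring a minimum number of agreeing neighbours is one simple way to express the trade-off discussed in the lecture: a stricter threshold transfers fewer annotations but lowers the risk of propagating a wrong one.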
01:05:47To the internet things di amount of function
01:05:52that is starting from aim
01:05:57for this function that you if
01:06:01addison we have the current proteins.
01:06:09You can transfer this information using blueprint to
01:06:13the one thousand proteins using you explain to
01:06:19twenty-three million proteins, and this is the number that you can get using
01:06:29this method. Of course,
01:06:32this number is very low, because the constraints
01:06:39are very strict here: you
01:06:42don't want to propagate information that is not reliable in
01:06:45databases. What you
01:06:48see is that it is better to propagate information
01:06:52where it is reliable rather than having a larger coverage
01:07:00but with a number of false positives. So usually, in databases,
01:07:05what you want to do is to minimise false positives at the cost
01:07:09of getting more false negatives.
01:07:17This is an example, and it is interesting.
01:07:21After doing these experiments, what you can see,
01:07:25for instance: these are the vectors
01:07:27representing the annotated regions, and you have
01:07:34just three annotations represented here. One is saying
01:07:38the region is disordered and N-terminal, so this
01:07:44is the beginning of the protein; one is for the C-terminal, the end of the protein;
01:07:51and the linker, which by definition is in
01:07:55the middle. And you see how they are coloured in the space.
01:08:00So this represents the approximate distance
01:08:06between regions as embedding vectors, and you see that they are
01:08:15separated quite well: here we have the terminal regions,
01:08:25and so on. But you see also that we have
01:08:28some cases where you have points of another colour in
01:08:32the blue region. We checked those annotations and we
01:08:36found out that they were wrong. This is also a way to
01:08:42go back to the curators
01:08:45and revise those annotations, because they are wrong.
01:08:55You see it is annotated as flexible C-terminal tail,
01:09:02but the fragment starts from the first residue, so it is
01:09:08the beginning of the protein: clearly a mistake,
01:09:11it should be N-terminal. And this is the same:
01:09:14this is a terminal tail annotation,
01:09:17but the region position is at the beginning.
01:09:21lo porti su.
01:09:28Di. Lui o su di.
01:09:36So this is
01:09:38the manual curation of the regions, so
01:09:46the classes, the colours, are
01:09:50manually associated to the proteins,
01:09:53and also these are manually
01:09:56done. The position of the points
01:10:01is given by the cosine similarity using the embedding representation of the regions.
01:10:07Is for.
01:10:10You calcolate di of the new.
01:10:15This is an example of
01:10:25these three types of annotation, and it is very
01:10:29clear that these are the easy ones.
01:10:33It is more complicated where you have functions like
01:10:35activator or regulator, which are difficult to check. Here it is easy to see
01:10:43that there is an error, because you have positions, so you can
01:10:45compare the classification with the actual position of
01:10:48the region. For abstract functions it is more difficult,
01:10:56but you can still try to do it.
01:11:02This is the last thing I show
01:11:04you. This is a collaboration with
01:11:15another lab. They implemented a model to
01:11:23classify regions based on compactness.
01:11:27They used a number of
01:11:32features, sequence features such as the charge pattern, and
01:11:39they simulated the behaviour,
01:11:44the structural behaviour, of some proteins using force fields, so some
01:11:50sort of simulation on a dataset
01:11:56of proteins, and they compared the sequence properties
01:12:05associated to compact and the sequence properties
01:12:11associated to extended. So
01:12:14they found a set of parameters that can separate
01:12:18these two classes, and in MobiDB
01:12:21now you have this type of classification.
01:12:25You can see if
01:12:28this region is extended or if it is compact.
01:12:36On this page you can
01:12:41see the number of things that are represented here.
01:12:45So you have the type of the feature, the source, and for extended and
01:12:53compact you see that for
01:12:57compact we have more than one million proteins; for extended
01:13:02we have twenty-six million proteins.
01:13:05So it seems like the two things are
01:13:06not balanced,
01:13:11but it is probably expected. This means
01:13:15that on average, at the database level,
01:13:22we have more extended than compact. It could also be a bias of the method,
01:13:30of course, but we don't know that yet.
01:13:35Okay, so I finish here.
01:13:45Questions? I know it is a lot. I don't
01:13:51expect you to remember all the details about MobiDB,
01:13:58just that it is an aggregated database that also includes predictors.
01:14:07It contains a huge amount of information and is meant for
01:14:14scientists that work on proteins.
01:14:23Another interesting thing about it is that it
01:14:27has different levels of evidence, so you have
01:14:30predicted data and data
01:14:34from the PDB, and different aspects of disorder:
01:14:40the connection between what is
01:14:43disordered and what is the function of the region,
01:14:47which is by definition understudied
01:14:51and poorly characterized. So, for instance,
01:14:55if you look at the number of functions.
01:15:00A number of functions come from the Gene Ontology,
01:15:05but you see the number of functions that are
01:15:08specific for disorder,
01:15:13just the ones that you see here, is
01:15:18just about twenty different types of functions,
01:15:21and most of them are simple.
01:15:27For instance, under entropic chain
01:15:29you have flexible C-terminal tail, flexible N-terminal tail,
01:15:33flexible linker. And you also see
01:15:37more complicated functions like
01:15:40non-stoichiometric molecular recognition, a molecular recognition process
01:15:46that is mediated
01:15:47by
01:15:50multivalent, dynamic or fuzzy interactions, and you see
01:15:54condensate-related terms. These are new terms, and there
01:15:58are very few examples in the database for these terms.
01:16:04This is something that was picked up
01:16:07recently, but still the number of terms
01:16:12is limited. It is not like the Gene Ontology, which
01:16:15contains about forty thousand different terms.
01:16:20We are talking about forty at the most, so really
01:16:26disorder function is under-represented
01:16:29in databases and in general.
01:16:34So the last thing that I want to show you, and we will probably also continue tomorrow:
01:16:45open the notebook now.
01:16:56You can copy this notebook,
01:17:01the prediction one. It is essentially the implementation of a predictor.
01:17:23The code is just a few cells, as you see, and the first three are
01:17:34just about the usual things: connecting to
01:17:39Drive, setting the working directory, installing
01:17:46a few dependencies.
01:17:52In this case Biopython and the plotting libraries.
01:17:56Those are just for
01:18:01plotting,
01:18:04and Biopython is just for processing the structure.
01:18:08So the code for the computation is really simple.
01:18:16So the first thing that I do, and that you need to do as well,
01:18:22is to download or get this file.
01:18:31Please check that you have access to this file.
01:18:36In the middle.
01:18:46In folder.
01:18:54This is provided as a file with this data;
01:19:00actually this is just the easy way that I found to import
01:19:07a variable, so there is nothing special in it.
01:19:13So download it.
01:19:29Ok. This file contains various variables.
01:19:36We have the matrix,
01:19:37which is a list of values: one matrix and
01:19:41another matrix here.
01:19:46I got them just by reading the paper, from
01:19:48the PDF; I just copy-pasted the values
01:19:53into this file. And this is just the list of
01:19:57amino acids, the reference for the order of the amino acids in the matrix.
01:20:05So you see the way I import
01:20:09the variables. This is possible, although it is probably
01:20:12not the most Pythonic way: it is by executing
01:20:16this file with the exec command,
01:20:24a built-in command from Python, and
01:20:27you see the variables that are imported
01:20:30are just the list of amino acids and the matrices,
01:20:34each of which is a two-dimensional array.
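As a rough illustration of this way of importing variables, the sketch below writes a tiny variables file and then loads it with Python's built-in `exec`. The file name `variables.py`, the variable names `aa_list` and `matrix`, and the values are all hypothetical placeholders, not the contents of the course file; passing an explicit dictionary as namespace is a design choice of this sketch, not necessarily what the notebook does.

```python
import os
import tempfile

# Hypothetical contents of the variables file (names and values are illustrative).
source = "aa_list = ['A', 'C']\nmatrix = [[-1.0, 0.5], [0.5, -2.0]]\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "variables.py")
    with open(path, "w") as handle:
        handle.write(source)

    # exec() runs the text of the file; every name it defines lands in `namespace`.
    namespace = {}
    with open(path) as handle:
        exec(handle.read(), namespace)

aa_list = namespace["aa_list"]
matrix = namespace["matrix"]
print(aa_list, matrix)
```

Since `exec` runs whatever code the file contains, it should only be used on files you trust; a regular `import` or a data format such as JSON is the more conventional choice.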
01:20:43Ok in D.
01:20:52Usual methods.
01:20:57Parser and.
01:21:05Okay. In this block I just
01:21:11plot the two matrices with colours; there is nothing special here, just a comparison.
01:21:19And you see they look similar, with some differences,
01:21:25but you see the scale is the
01:21:29same and the pattern is very similar. Maybe one matrix has
01:21:34more negative, or stabilizing, interactions, while in
01:21:39the other matrix the interactions are less
01:21:46important. Ok.
01:21:55And here we have the implementation.
01:21:58The implementation is just about
01:22:00a few things. As parameters we have the
01:22:04sequence separation, the size of the window where we calculate
01:22:08the actual frequency
01:22:13of the possible pairs, and the smoothing parameter,
01:22:18which is the size of the sliding window applied to
01:22:26the initial output.
01:22:29Okay, so first I calculate the prediction:
01:22:37I calculate the energy from the matrix associated to the
01:22:41possible pairs, and then
01:22:43I calculate a smoothed version of this energy, where
01:22:46I calculate the moving average. So I define the indices, and here the
01:22:53complication is just that I am using
01:22:55matrix operations to make the calculation more efficient.
01:23:00It would still be efficient if you do all
01:23:04of this with loops over the sequence, because we are just
01:23:09talking about a small matrix and a sequence
01:23:14that is still a small sequence.
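The two steps described here (a per-residue energy from the pair matrix, then a moving average) can be sketched with plain loops. This is only an illustration under stated assumptions: the amino acid order, the function names, the `half_window` smoothing parameter and the way the neighbourhood is cut at the sequence ends are choices made for this sketch, and any matrix passed in the usage lines is a placeholder, not the published values. The defaults `window=100` and `min_sep=2` follow the values quoted in the lecture.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row/column order of the matrix
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def energy_profile(sequence, matrix, window=100, min_sep=2):
    """Per-residue energy: row of the pair matrix for the residue type,
    weighted by the amino acid frequencies of its sequence neighbours
    (positions at distance min_sep..window on either side)."""
    idx = np.array([AA_INDEX[aa] for aa in sequence])
    n = len(idx)
    energies = np.zeros(n)
    for i in range(n):
        left = idx[max(0, i - window):max(0, i - min_sep + 1)]
        right = idx[i + min_sep:i + window + 1]
        neighbours = np.concatenate([left, right])
        if neighbours.size == 0:
            continue  # no neighbours in range: leave the energy at zero
        composition = np.bincount(neighbours, minlength=20) / neighbours.size
        energies[i] = matrix[idx[i]] @ composition
    return energies


def smooth(values, half_window=10):
    """Moving average over up to half_window positions on each side."""
    n = len(values)
    out = np.empty(n)
    for i in range(n):
        out[i] = values[max(0, i - half_window):i + half_window + 1].mean()
    return out


# Usage with a placeholder matrix (all ones), for illustration only.
placeholder = np.ones((20, 20))
profile = smooth(energy_profile("ACDEFGHIKL", placeholder))
print(profile)
```

With a 20 by 20 matrix and sequences of a few hundred residues, the loop version is fast enough; vectorised matrix operations mainly pay off when many sequences are processed.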
01:23:22So for every index
01:23:26in the amino acid list I calculate a slice that goes between the minimum and the maximum.
01:23:36The parameters here I got from the paper, so the
01:23:40window size of a hundred is actually the window size that is used
01:23:44in the paper, and also the minimum sequence separation of
01:23:49two is the actual value described in the paper. As for
01:23:54the implementation, we are late, so it is probably better to explain
01:24:04it tomorrow without risking mistakes. I will explain it tomorrow,
01:24:13but you can have a look at it. Tomorrow we will use it with a couple of
01:24:17examples, and then we will probably also have time to continue with the next topic,
01:24:24which is the notion of molecular dynamics simulation:
01:24:27what force fields are, how force fields are
01:24:31implemented, and how they can capture, or simulate,
01:24:36the physical properties of molecular interactions.
01:24:42Questions? Okay, so tomorrow, if you want to
01:24:48see this piece of code on your laptop, bring it with you.
01:24:53Ok, thank you.