Arizona State University College of Liberal Arts and Sciences

Experiment III

Bioinformatics

 

The amount of DNA sequence information is staggering.  The genome sequences of hundreds of organisms are known, and the total amount of sequence information numbers in the hundreds of billions of nucleotides.  All this information would be pretty useless unless we had excellent tools to search through and utilize it.  And we do.

The purpose of this “in-silico experiment” is for you to become familiar with some of the tools to utilize genomic and proteomic information.  And really the only way to become familiar is by practice.  For this reason, you will be mostly on your own on this (the TA will be there to explain and assist, but obviously not to answer the questions for you); you can explore for yourself, and see what all is available for interpretation of genomic information.  “Help” menus are an awesome resource if you are using web pages and tools for the first time, so please make frequent use of them.  If you don’t quite get done with the “experiments” during the lab period, just continue with them elsewhere; the web tools are accessible from any computer connected to the internet.

As with other experiments, there are Prelab questions that are due at the start of each lab.  Moreover, as each week of Experiment III pretty much stands on its own, your report updates are due weekly for this experiment.  In your report updates you must address all questions asked in the narrative of the Experiment of that week, and from what you include in your lab updates it must be clear that you went through the experiment point by point, and got the right results.  This may be best done by having each sub-question/part of the experiment (a, b, c, etc.) be addressed in a separate paragraph in the Results.  It will be most efficient if you write the skeleton of the Results updates while you are going through the experiment.  You may want to bring your memory stick or equivalent to the lab in order to be able to copy the fruits of your labor on to it. 

The Discussion section of your report serves to integrate what you learned in this experiment and to discuss general questions that are in the Experiment III narrative.  In addition, there are discussion points at the end of each week’s narrative that you need to include in your weekly report update as well.

Experiment III-9

Prelab question:

What can BLAST be used for?  What is the input (DNA or protein sequence, or both)?  What is the output?  What do e-values mean?  If you need a refresher on this, go to http://www.ncbi.nlm.nih.gov/BLAST/ and look at the “help” pages.
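
As a rough illustration of how e-values behave (not a substitute for the BLAST help pages): for a hit with bit score S', the expected number of chance hits is approximately E = m x n x 2^(-S'), where m is the query length and n is the total length of the database searched.  The short Python sketch below assumes this textbook approximation; the function and parameter names are illustrative only, and the BLAST server applies further corrections (such as effective lengths), so the values it reports will differ somewhat.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate e-value from a bit score (textbook approximation).

    Assumption: E = m * n * 2**(-S'), with m the query length and
    n the total number of residues in the database searched.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Illustrative numbers only: a 500-residue query against 1e9 residues.
for score in (30, 50, 100):
    print(score, expect_value(score, 500, 10**9))
```

Note that each additional bit of score halves the e-value, and that the same score is less significant in a larger database.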

The “experiment” for today:

a. Using BLAST (accessed via http://www.ncbi.nlm.nih.gov/) determine what a protein with the following sequence is, and what organism it is coming from:

MVTLLENPFRTGLRQERTPEPLILTIFGASGDLTQRKLVPAIYQMKRERRLPPELTVVGFARRDWSHDHFREQMRKGI
EEFSTGIGSEDLWNEFAQGLFYCSGNMDDPESYLKLKNFLGELDEKRNTRGNRVFYLAVSPNFFPPGIKQLGAAGMLS
DPVKSRIVIEKPFGRDLSSAQSLNRVVQSVCKENQVYRIDHYLGKETVQNLMVFRFANAIFEPLWNRQFVDHVQITVA
ETVGVEERAGYYESAGALRDMVQNHLMQLFCLTAMDPPNAIDADSIRNEKVKVLQATRLADINNLENAGIRGQYKAGW
MGGKPVPGYREEPGVDPSSTTPTFAALKLMVDNWRWQGVPFYLRTGKRMPKKVSEIAIQFRQVPLLIFQSVAHQANPN
VLSLRIQPNEGISLRFEAKMPGSELRTRTVDMDFSYGSSFGVAAADAYHRLLLDCMLGDQTLFTRADEVEEAWRVVTP
VLSAWDAPSDPLSMPLYEAGTWEPAEAEWLINKDGRRWRRL

b. You hopefully figured out successfully what enzyme this protein is, and in what organism it occurs.  The next question is where in the metabolic pathway this enzyme is functioning.  A very useful website with information on which enzymes do what is KEGG, the Kyoto Encyclopedia of Genes and Genomes (http://www.genome.ad.jp/kegg/).  KEGG Overview and KEGG Databases (see links on left margin) provide you with an overview of what’s all on the site.  What do you think is a good way to find out in which KEGG Pathway Module the enzyme in a. is located?

c. Based on your research in b., which pathway(s) in the KEGG Pathway database (http://www.genome.ad.jp/kegg/pathway.html) is/are the one(s) that include(s) your enzyme?  Click on the corresponding pathway in Section 1.1 (Carbohydrate Metabolism).  What do you find?  Is there more than one way from D-glucose to 6-phospho-D-gluconate (remember that the first step of glycolysis is conversion of glucose to glucose-6-phosphate)?  And what do you think those numbers (e.g., 1.1.1.49) mean?

d. What you need to know is what pathways may actually exist in the particular organism that your enzyme came from.  Near the top of the screen, select the organism you found in a. that your protein was coming from.  You will see some color changes on your screen.  What do the green boxes mean, you think?  Can you determine from this screen whether there is a potentially full pathway to go from glucose-6-phosphate to glyceraldehyde-3-phosphate?

e. You hopefully remember from earlier classes that glucose-6-phosphate and glyceraldehyde-3-phosphate are in the glycolysis pathway.  Can you figure out whether in this organism there are genes coding for all required steps of glycolysis as well?

f. Now look at glycolysis in Geobacter sulfurreducens, a common soil bacterium.  Does this organism have all the genes required to do glycolysis?  If not, how do you think it survives?

g. The genome sequence provides a wonderfully predictive view of what reactions the organism may be able to perform.  Discuss some of the caveats in these predictions: do you think it is possible that an organism cannot perform a certain function even though it has the gene for it?  Conversely, would it be possible that an organism can catalyze a certain enzymatic conversion even though it does not seem to have the appropriate genes?
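
For part a, BLAST accepts the sequence as pasted, but it can help to clean it up into FASTA format first (this is also the format you will meet again in later weeks).  The Python sketch below is one possible way to do that; the function name, the record name "query_protein", and the 60-character line width are arbitrary choices, and the example uses only the first 30 residues of the sequence given in a.

```python
# Standard one-letter amino acid codes, plus X for "unknown".
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def to_fasta(name, raw_sequence, width=60):
    """Return a FASTA record from a sequence pasted with spaces or line breaks."""
    # Remove all whitespace and normalize to upper case.
    sequence = "".join(raw_sequence.split()).upper()
    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(unexpected))
    # Wrap the sequence into lines of fixed width.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines)


# First 30 residues of the protein in part a, pasted with stray whitespace.
record = to_fasta("query_protein", "MVTLLENPFR TGLRQERTPE\nPLILTIFGAS")
print(record)
```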

In your weekly update (and therefore also in your lab report), not only address all points and questions (a through g) above, but also provide a narrative that integrates the questions.  The Discussion section of your report is particularly suitable for this purpose.  Examples of materials to include in your Discussion section of your weekly update this week:

1. Discuss the significance of using the alignment properties of the BLAST algorithm.

2. Point out and discuss the major differences shown between the glycolysis/gluconeogenesis pathways for Synechocystis sp. PCC 6803 and Geobacter sulfurreducens.

Experiment III-10

Prelab questions:

Continuing with what we did in Experiment III-9, this week we will look some more at metabolic pathways and their presence in specific types of organisms.

1. Does the presence of a gene in an organism mean that the metabolic reaction step catalyzed by the corresponding enzyme is efficient?  Explain. 

2. Does the presence of a metabolic reaction step that requires a specific enzyme in an organism imply that the gene for such an enzyme is actually present in the organism?  Explain. 

3. In prokaryotes the operon arrangement of genes often has functional significance.  Describe briefly in your own words what that functional significance is: how would you be able to predict the function of one of the gene products in the operon if you knew the function of one of the other gene products in the operon?

The “experiment” for today:

First we will continue with the information available on KEGG.  Last week you saw that there was a real difference between organisms in how they do even the most basic reactions such as conversion of glucose to pyruvate.  You probably have gotten the impression from your textbooks that glycolysis is the only game in town to do this, but many prokaryotes do just fine without it, and use other pathways.  It would be good to be able to find out which organisms do what, and compare.  So we will explore this in some more detail below.

a. Go to http://www.genome.jp/kegg/metabolism.html, and click “glycolysis/gluconeogenesis”. This will lead you to the glycolysis pathway and a bunch of related pathways, similar to what you saw last week. Now click on “Ortholog table” at the top of the screen (orthologs are related proteins of similar function, but from different organisms).  At the top of the table you will see the enzyme numbers and the enzyme names, and the first column represents different organisms.  You can click on “Select” and enter specific three-letter codes to see which organisms the codes correspond to.  Do all organisms have a gene for phosphofructokinase (a key enzyme for glycolysis)?  If not, select five organisms that do not have a phosphofructokinase gene; you will be finding out in a subsequent question in this experiment what they use instead.

b. First a little more about this ortholog table.  By clicking on a link in a table matrix you will get information on the sequence of the gene and protein of a specific function in a specific organism.  In the left column of the table, clicking on “P” behind an organism code will lead to a color-coded chart of metabolic steps that the organism appears to have according to its genome (see last week’s experiment).  “G” will give you the location of the various genes in this chart on the genome, and “T” gives information regarding the proteins and genes in this table for the selected organism (including their sequence in FASTA format).  Get phosphofructokinase protein sequences from ten prokaryotes in the table in FASTA format; also get the ones from Homo sapiens, Xenopus laevis, Caenorhabditis elegans, Drosophila melanogaster, and Strongylocentrotus purpuratus (sea urchin).  Align the 15 different sequences using the ClustalW program at http://www.ebi.ac.uk/Tools/clustalw/ or http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_clustalw.html.  Which ones are closest to each other?  Do you think this makes sense?  Do you think that the very different N-termini for the eukaryotes (and perhaps also the prokaryotes) are real?

c. Now go back to the five organisms you selected that did not have phosphofructokinase.  Find out what they do instead (if anything) to go from glucose-6-phosphate to pyruvate.  By now you should know what to “click” in order to get the information needed.
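
For part b, one simple way to put a number on "closest to each other" is the percent identity between two rows of the alignment.  The Python sketch below assumes you have copied two aligned sequences (same length, with "-" for gaps) out of the ClustalW output; the function name and the toy sequences are made up for illustration, and columns containing a gap in either sequence are simply ignored, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy example (not real phosphofructokinase sequences).
print(percent_identity("MK-TA", "MKVTG"))
```

Computing this for every pair of the 15 sequences gives a small table that you can compare with the tree or ordering that ClustalW reports.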

However, KEGG is not “the only game in town” to provide this type of information.  Another useful site is BioCyc, with the related EcoCyc and MetaCyc toolkits.  EcoCyc (http://ecocyc.org) essentially is an E. coli "encyclopedia"; E. coli has been chosen for this as there is much information that has accumulated regarding this model organism over the past decades, but similar pages (perhaps less detailed) exist for other organisms.  Look over the EcoCyc overview on http://ecocyc.org/background.shtml.

d. On the Project Overview site, click on "metabolic pathways", then find "glycolysis" and click on "glycolysis I" (quite a long list of stuff, eh?).  The link will lead you to the reaction steps, and by mousing over the steps you will find more detail regarding the reactions, enzyme names, activators and inhibitors, etc., than on the KEGG site.

e. Scroll down to "Locations of Mapped Genes": What do you think the circle represents? What happens if you click on the little purple lines on the circle? What do you think those purple lines are?

f. Click on all purple lines on the circle, and for each of them note the chromosomal location and the function of the corresponding gene. Do you find any operons?  Reconstruct the glycolysis reactions with the enzymes coded for by the different genes. Why do you think some steps require more than one protein? What do you think is the numerical code under “gene-reaction schematic”?

g. On the “glycolysis I” chart, go to the “Genetic Regulation Schematic”.  Explain in your own words what this chart is telling you.

If you are interested in a specific metabolic pathway, we have seen that in KEGG you can find out in which organisms genes for particular enzymes in this pathway are found.  We will now look at another site (MetaCyc, related to EcoCyc) that provides complementary information.

h. Go to MetaCyc (http://metacyc.org). Click on “Database Search”. On the resulting BioCyc Query Page, select “MetaCyc” as dataset, and then click on “Choose from a list of all pathways”. The list you now get is a lot longer than the list of metabolic pathways in EcoCyc. Why? Find “glycolysis” again. You will see that there is a link to “glycolysis I”, “glycolysis II”, “glycolysis III”, “glycolysis IV”, and “glycolysis V”. What are the differences between the various ones? Do they occur in the same organism?

i. On the glycolysis I page in MetaCyc, click on the arrow leading from fructose-6-phosphate to fructose-1,6-bisphosphate. You will find a list of about eleven phosphofructokinases. Click on each. Looking at the info on these pages, discuss the properties of the various types of phosphofructokinases found in nature.

j. On the same page as where your eleven phosphofructokinases were listed, click on “cross-species comparison” at the top of the page.  Then select all species.  Do all species have a phosphofructokinase gene according to this list?  Specifically look for Synechocystis sp. PCC 6803, and note what is listed there.

k. Now go to CyanoBase, click on Synechocystis sp. PCC 6803, and search for phosphofructokinase.  What do you find?  Are they “real”?  (How would you find out?  Hint: do a sequence alignment).  The moral of the story?

In your weekly update (and therefore also in your lab report), not only address all points and questions (a through k) above, but also provide a narrative that integrates the questions.  The Discussion section of your report is particularly suitable for this purpose.  An example of materials to include in your Discussion section of your weekly update this week:

Experiment III-11

Prelab questions:

Today we will explore what information can be gleaned from comparing complete genome sequences.  Just to get into the spirit, please visit listings of prokaryotic strains and species with sequenced genomes at JGI (Joint Genome Institute) (http://genome.jgi-psf.org/mic_home.html) and TIGR (The Institute for Genomic Research; now part of the J. Craig Venter Institute) (http://cmr.tigr.org/tigr-scripts/CMR/shared/Genomes.cgi?crumbs=genomes); also look at some of the eukaryotes with a sequenced genome on http://genome.jgi-psf.org/euk_cur1.html and on the sub-pages of http://www.tigr.org/db.shtml.

1.  About how many sequenced prokaryotes did you find?  And how many sequenced eukaryotes?  Why are these numbers so different, you think?

2.  About how many genes in total do you think the sequenced prokaryotes represent?  And how many genes do the sequenced eukaryotes represent?  Explain how you got to your numbers.

3.  Go to http://www.nature.com/nature/journal/v437/n7055/full/nature04072.html (there’s also a copy of this chimpanzee sequence paper on Blackboard).  In the first part of this paper, what can you glean regarding the similarity between humans and chimpanzees?

The “experiment” for today:

The Integrated Microbial Genomes resource on the JGI (Joint Genome Institute) website is at http://genome.jgi-psf.org/mic_home.html and is quite a useful site. 

a. Go to the Genome Browser at http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=FindGenomes&page=findGenomes, with the aim of comparing genomes from Escherichia coli K12 and Escherichia coli O157:H7 EDL 933.  At IMG Home, which tab on top do you think you need to click for this purpose?  When you do that, what do you see?  Is it just comparing the ones you want?  Hint: what about “find genomes” first?  Now, try again...

b. What are the “genome statistics” on the comparison page telling you?  What do they represent?  Just the statistics of one of the two E. coli strains?  If you compare genomes from two E. coli strains, how similar do you expect them to be?  Why is that?

c. Let’s see whether reality agrees with your intuition.  On the bottom of the Genome Statistics page, go to “Breakdown by selected genomes, general statistics” and “Breakdown by selected genomes and COG function categories”.  By modifying the information requested on these pages, see how similar these two Escherichia coli strains actually are.  Include information on the number of genes, genome length, % of the genome that is coding, % of genes with a functional prediction, and % of genes coding for enzymes.

d. Now go to “VISTA” (can be reached from the Genome Statistics page) for a more visual comparison of the two sequences.  When you click on one of the strain names, what do you see?  What do you think the “15 kb” on the top of the page means?  How much do you actually want to compare in one view?  How do you think you can get there?  (Hint: try a right-click on the mouse when you see a horizontal double arrow on the bar that indicates the alignment length; the re-alignment will take a while as there’s a ton of data to adjust.)  With any of these web-based tools, there’s a bit of trial and error when using the tools for the first time.  How similar are the two genomes to each other over as much of the genome as you can find?  Is the similarity pretty much identical over the entire genome?  What are your thoughts on this?

e. To get a feel for how similar unrelated prokaryotes may be to each other, compare two prokaryotes from different genera with each other in the same way you have compared the two Escherichia coli strains.  Discuss your results in your weekly update and report.

f. You will see that eukaryotes are absent from the VISTA pages.  One might think that that’s simply because we are looking at the Integrated Microbial Genomes page here, but there’s more behind it:  What do you think would happen if you were going to compare the genomes of a human and a chimpanzee and a mouse using a tool like VISTA?  Where do you think the primary areas of similarity are going to be?  There is a bit on this in the chimpanzee sequence paper listed above, and some more on this in one of the mouse genome papers (on Blackboard; Waterston et al. (2002) Initial sequencing and comparative analysis of the mouse genome.  Nature 420, 520-562).  Discuss this question of prokaryotic vs. eukaryotic alignments in the Discussion section of your weekly lab update and report.

In your weekly update (and therefore also in your lab report), not only address all points and questions (a through f) above, but also provide a narrative that integrates the questions.  The Discussion section of your report is particularly suitable for this purpose.  Examples of materials to include in your Discussion section of your weekly update this week:

1.  Keeping in mind the results you obtained from the comparison of the two E. coli strains vs. the two unrelated organisms you chose, what can you say about prokaryotic “species”?  How do you think strains can obtain large amounts of new DNA, and what would be evolutionary pressures to retain the new DNA?  What does this tell you about how gradual or non-gradual evolution can be?

2.  Explain possible reasons why VISTA is a tool solely designed for the comparison of prokaryotic genomes.

 

Experiment III-13

Prelab question:

In most organisms, at least a quarter or a third of the open reading frames in genome sequences do not code for known proteins.  It is important to try to predict whether one of those open reading frames codes for a “real” protein of as yet unknown function, or the open reading frame most likely is just a “fluke”.  What are three experimental or in silico criteria (you may name more if you like) that could help you determine whether a specific open reading frame found in the genome sequence of an organism most likely is coding for a real protein?

The “experiment” for today:

Via the NCBI website you used before, get the protein sequence of Slr0408, annotated as a hypothetical protein from Synechocystis.  We will be trying to find out some of its properties and what it is related to, so that we may get some clues about its function.

a. How many amino acid residues does the protein have?  What, therefore, is its approximate size (in kDa)?  Relative to other proteins you know of, is it large or small?  What is the chance of a random sequence of the corresponding DNA being an open reading frame of this length (in other words, what is the chance of having a string of codons of this length that does not contain a stop codon)?

b. Do a BLAST on this protein to see whether there are proteins in Synechocystis or in other organisms that look like it, and that maybe have an assigned function.  BLAST on this protein is going to take a while (discuss why), so continue with the next sections while you keep BLAST running in the background.  Copy and paste the BLAST results into a file that you save on your memory stick (or into a new document that you email to yourself) to use in your report.  Make sure to discuss the results: do other proteins align along the entire length of Slr0408?  What do you think that may mean?  What is the organism from which the two best hits come?  What do you think that implies?  What are some of the assigned functions (protein names) for the best 50 hits?  Can you give a quick summary of what those proteins are or do?

c. Visit Expasy (http://us.expasy.org/).  This site has a lot of useful bioinformatic protein analysis tools.  Focus on “Tools and software packages”, and click on “Proteomics and sequence analysis tools”.  First, we want to know whether this is a membrane protein or a soluble protein (why would this be useful information?).  To get some information that may help to answer this question, scroll down to the “Topology prediction” section, click on “TMPred”, paste in the Slr0408 sequence, and run the program.  What does it tell you?  Again, it is a good idea to save the results on your memory stick, or email them to yourself, so you can interpret and discuss them in your weekly update and report.

d. Another useful piece of info may be whether this protein has repeated domains.  This can be done using the REPRO tool that is under “Primary structure analysis” on Expasy.  There is a lot of info to sort through for a computer, and this by necessity leads to a slow processing time.  Note the address of the web page where your results will appear, and access this information later at home.  In any case, what do you think is striking in the results of the analysis, and how would you interpret the data?  What do you think is the statistical probability that two amino acids in an aligned sequence are identical?  How do gaps in the sequence alignment influence this probability?

e. Yet another way to get potentially useful information is a search for functionally relevant motifs in the primary structure of the protein.  This is done by InterPro Scan, again accessible from the link on the Expasy website.  In this case, the output may be hard to interpret in detail, but it does not hurt to just paste in your sequence and see what InterPro Scan comes up with.  As time permits and if you are interested, please read the Help files.

f. PROSITE Scan (again accessible via a link on Expasy) is another good one.  Have a look at all the motifs it recognizes.  Note that the presence of any motifs (glycosylation, etc.) does not mean that such a post-translational modification must occur, but it gives you something to consider.  Also note that many of the modification motifs have quite a bit of degeneracy (how can you see that in your results?), so for a protein the size of Slr0408 there is some chance that hits are random.  For this reason, there are “randomized probabilities” listed in the PROSITE Scan results:  Interpret your data with these probabilities in mind.

g. It will be important to know whether the protein is ever found “in real life”.  There is no published evidence yet that Slr0408 has been found in proteomic studies (where proteins present in an extract or preparation are identified).  However, this is not surprising as large proteins are easy to miss in gel-based approaches (why?).  An alternative approach is to see whether transcripts of the gene are found.  In some microarray studies, transcripts for this gene have been detected, and therefore it is likely to be “real”.  How does this relate to your discussion re. point a.?
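
For part a (and the follow-up in g), the arithmetic can be sketched in a few lines of Python.  This is a back-of-the-envelope estimate under stated assumptions, not a definitive calculation: it assumes an average residue mass of about 110 Da, and it treats random DNA as having all four bases equally frequent and all codons independent, so that 61 of the 64 codons are not stop codons.  The function names are illustrative, and the 300-residue example is arbitrary; plug in the length you found for Slr0408.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rough average over the 20 amino acids
SENSE_CODONS = 61                # 64 codons minus the 3 stop codons
ALL_CODONS = 64


def approximate_mass_kda(n_residues):
    """Rough protein mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def chance_of_no_stop(n_codons):
    """Chance that n random codons in a row contain no stop codon.

    Assumes equal base frequencies and independent codons.
    """
    return (SENSE_CODONS / ALL_CODONS) ** n_codons


# Arbitrary example length; replace with the actual length of your protein.
n = 300
print(approximate_mass_kda(n), "kDa")
print(chance_of_no_stop(n))
```

A real genome has a biased base composition, so the true chance will differ; think about in which direction a high or low GC content would push it.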

When you integrate this information in your weekly update and report, try to think about what this protein may do.  You most likely will not come up with a detailed hypothesis (after all, the gene is annotated as coding for an “unknown protein”), but you should think about and comment on what are some of the things you can find out about proteins from their sequence, and how some of the tools that are available to you can give you ideas about possible function.
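
To get a feel for why short, degenerate motifs (part f) can turn up by chance in a large protein, the Python sketch below scans a sequence for the N-glycosylation pattern, written in PROSITE syntax as N-{P}-[ST]-{P}, and estimates how many hits a random sequence of the same length would give.  The estimate assumes that all 20 amino acids are equally frequent, which real proteins do not obey, so treat the number as an order-of-magnitude guide; the function names and the toy sequence are made up for illustration.

```python
import re

# N-{P}-[ST]-{P} as a regular expression; the lookahead allows overlapping hits.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def motif_positions(sequence):
    """1-based start positions of N-{P}-[ST]-{P} in the sequence."""
    return [m.start() + 1 for m in N_GLYCOSYLATION.finditer(sequence.upper())]


def expected_random_hits(length):
    """Expected chance hits in a random sequence with uniform residue frequencies."""
    # P(N) * P(not P) * P(S or T) * P(not P)
    p_site = (1 / 20) * (19 / 20) * (2 / 20) * (19 / 20)
    return max(length - 3, 0) * p_site


toy = "ANGSAPNPSA"  # toy sequence, not Slr0408
print(motif_positions(toy))
print(expected_random_hits(len(toy)))
```

Comparing the number of hits you actually get with the expected number for a protein of that length is one way to read the “randomized probabilities” that PROSITE Scan lists.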

In your weekly update (and therefore also in your lab report), not only address all points and questions (a through g) above, but also provide a narrative that integrates the questions.  The Discussion section of your report is particularly suitable for this purpose.  Examples of materials to include in your Discussion section of your weekly update this week:

  1. Explain the likelihood of the actual existence of the Slr0408 protein.
  2. How is the composite of the websites you used capable of giving you integrative information on your protein?  Is there other info on this protein that you feel must exist but is lacking from these sites?

 

Experiment III-14

An issue that comes up frequently in proteomics is how one identifies an isolated protein (for example, excised after 2D gel electrophoresis).  A convenient method is to look at the mass of a protein.  Mass spectrometry is very precise and requires only small quantities of material.  However, knowing the mass of a complete protein isolated from an organism is not necessarily going to give you an unambiguous identification of the protein because often the mass does not match anything that is in the database of predicted protein masses (calculated from adding up the masses of the residues according to the predicted sequence, and taking into account natural isotope abundance).

Prelab question:

What are some of the reasons (try to think of three different ones) why the experimentally determined mass (determined by mass spectrometry on the isolated protein) may differ from the mass calculated from adding up the masses of the residues according to the predicted sequence, and taking into account natural isotope abundance?

The “experiment” for today:

To counter the issues you hopefully identified in the prelab question, it is usually good to do a trypsin digestion of the protein (or of a protein mixture with limited complexity, i.e., a mixture with not too many different proteins), and then to determine (using a MALDI-TOF mass spectrometer) the masses of tryptic fragments of your protein(s) (tryptic fragments means that these are fragments that resulted after treatment of the protein with trypsin).  Once you have these data, you can figure out what are good matches.
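
To see what kind of numbers a tool such as MS-Fit compares your data against, the Python sketch below performs an in-silico tryptic digest and computes monoisotopic masses.  It is a simplified illustration under stated assumptions: trypsin is taken to cut after K or R except when the next residue is P, peptides are taken to be singly protonated ([M+H]+, the usual case in MALDI), and no modifications are considered.  The function names and the toy sequence are illustrative; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # charge carrier for [M+H]+


def tryptic_peptides(sequence, missed_cleavages=0):
    """Peptides from cutting after K/R (not before P), allowing missed sites."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            pieces.append(sequence[start:i + 1])
            start = i + 1
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


for pep in tryptic_peptides("AKRPGKMLSR", missed_cleavages=1):  # toy sequence
    print(pep, round(mh_plus(pep), 4))
```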

Assume you have isolated a protein or a protein complex from blue-green-looking bacteria that may or may not be an axenic (pure) culture.  Listed below are the 343 monoisotopic sizes (m/z) of the tryptic fragments that you have found upon MALDI-TOF analysis.  You know that trypsin may not necessarily cut at all sites all the time, so it is possible that it misses cleavage once in a while.  Questions to you:

a. Using MS-Fit, which is part of ProteinProspector (see the Expasy website for the link), find the protein(s) that match best, and interpret in your own words what the web tool is telling you.

b. The protein complex came from an environmental sample with lots of different “strains” from a microbial mat.  Can you determine which strain this sample most likely came from?

c. To get an idea how many mass fragments actually are needed for an unambiguous identification of the protein complex, take 10, 50 and 175 masses from the list below (randomly selected), and redo your MS-Fit search.  What do you find?  Interpretation?

d. You may not have all masses in your MALDI-TOF that you expected for the protein, and you may have other masses in your MALDI-TOF results that were not predicted bioinformatically.  Where would these discrepancies come from, and are such discrepancies a major problem?  Explain.
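
For part c, the random subsets of 10, 50 and 175 masses can be drawn by hand, but a few lines of Python make the selection unbiased and repeatable.  The sketch below assumes you have saved the m/z list in a plain text file with one value per line (the file name "masses.txt" is hypothetical); the function names and the optional seed are illustrative choices.

```python
import random


def read_masses(path):
    """Read one m/z value per line, skipping blank lines."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def random_subset(masses, how_many, seed=None):
    """Return a sorted random selection of the masses, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(rng.sample(masses, how_many))


if __name__ == "__main__":
    all_masses = read_masses("masses.txt")  # hypothetical file name
    for size in (10, 50, 175):
        subset = random_subset(all_masses, size, seed=size)
        print(size, "masses:")
        print("\n".join(str(m) for m in subset))
```

Repeating the selection with a different seed shows how much the MS-Fit result depends on which particular masses happen to be picked.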

In your weekly update (and therefore also in your lab report), not only address all points and questions (a through d) above, but also provide a narrative that integrates the questions.  The Discussion section of your report is particularly suitable for this purpose.  Examples of materials to include in your Discussion section of your weekly update this week:

1.  Explain the rationale behind doing a trypsin digestion of a protein to be identified, and why this approach is most commonly used to fragment target proteins.

2.  What kind of data are inputs and outputs when using ProteinProspector tools?  How would one get the input data, and what can the output data tell us?

 

Monoisotopic m/z values:


347.1925

359.2401

361.183

374.2034

393.2245

418.2045

423.2602

446.247

450.2195

460.2514

460.2514

521.3194

521.3194

536.3191

549.3144

571.3198

582.3068

598.3017

604.3049

606.3206

645.3566

658.3883

676.3988

718.3883

739.341

744.4614

747.3995

747.4182

762.3781

763.4131

763.4825

764.3937

770.362

779.4046

801.4577

832.4999

833.4304

846.5043

874.4894

875.4945

889.4738

922.4604

928.4734

935.5343

938.4553

951.5292

964.4523

967.536

978.5037

981.4789

987.5218

991.532

994.4986

1030.542

1059.562

1075.557

1155.591

1169.559

1197.659

1205.627

1206.659

1210.69

1227.6

1243.679

1279.633

1295.628

1296.659

1299.72

1302.626

1311.622

1311.692

1312.654

1315.715

1318.621

1325.66

1328.649

1393.679

1407.635

1423.63

1424.776

1444.711

1454.796

1459.733

1480.71

1496.705

1544.79

1547.686

1547.69

1549.78

1651.847

1664.782

1667.842

1700.891

1718.88

1807.948

1823.943

1848.868

1853.983

1864.863

1876.992

1889.882

1945.981

1975.007

1987.163

2002.912

2018.085

2018.907

2021.072

2034.08

2043.999

2044.928

2052.923

2059.994

2060.923

2075.989

2110.102

2126.096

2188.009

2218.087

2229.076

2232.054

2245.071

2248.049

2253.107

2286.093

2296.045

2302.088

2318.083

2352.042

2357.062

2374.189

2390.225

2442.194

2458.189

2472.198

2474.184

2480.137

2488.193

2527.287

2543.282

2574.295

2588.234

2604.229

2651.271

2667.266

2715.509

2731.504

2756.273

2759.372

2769.457

2770.409

2775.367

2793.275

2833.516

2837.425

2849.511

2958.631

2974.626

2981.489

2985.464

2996.595

3001.459

3006.457

3017.454

3022.452

3037.443

3165.538

3227.518

3278.659

3297.611

3303.794

3313.606

3316.547

3319.789

3332.542

3335.784

3354.727

3366.642

3382.637

3459.895

3468.644

3475.89

3491.885

3522.743

3538.738

3570.7

3586.695

3593.903

3609.898

3628.724

3643.78

3691.815

3707.81

3716.678

3723.805

3857.903

3873.898

3971.056

3987.051

4081.195

4097.189

4286.16

4302.155

4318.15

4334.145

4437.23

4453.225

4469.22

4512.376

4528.371

4544.366

4699.403

4715.398

4731.392

4777.452

4791.364

4793.447

4807.359

4809.442

4823.354

4996.642

5012.636

5102.632

5118.627

5157.553

5173.548

5189.543

5258.734

5274.729

5473.801

5489.796

5505.791

5521.786

5563.801

5579.796

5595.791

5611.786

5627.781

5643.776

5740.002

5755.997

5757.028

5773.023

5892.876

5908.871

5924.866

6016.025

6032.02

6048.015

6117.081

6133.076

6232.079

6248.074

6264.069

6393.103

6409.098

6419.238

6425.093

6435.233

6545.286

6549.204

6565.199

6581.194

6792.289

6808.284

6824.279

7070.777

7086.772

7216.621

7232.616

7556.81

7805.901

7821.896

7822.928

7837.891

7838.923

7854.917

7918.787

7934.782

7950.776

7966.771

8179.082

8195.077

8343.166

8359.161

8614.294

8630.289

9237.997

9239.731

9253.992

9255.726

9269.987

9271.721

9486.755

9502.75

9518.745

9534.74

9550.735

9566.73

9582.725

9643.974

9659.969

9675.964

9963.44

9979.435

9995.43

10251.25

10267.25

10283.24

10432.19

10448.18

10449.22

10464.18

10465.21

10480.17

10481.21

10496.17

10497.2

10512.16

10513.2

10528.16

10529.19

10545.19

10870.5

10886.5

10902.49

10918.48

11015.42

11031.42

11047.41

11063.41

11079.4

11095.4

11111.39

11157.66

11173.66

11189.65

11271.9

11278.8

11287.9

11294.79

11303.89

11310.79


 

Experiment III-15

We have explored some useful websites regarding genomes and protein analysis, but just as important is understanding which protein in a cell may interact with which other protein(s), as this provides a next level of insight into regulation of proteins, protein complex formation, protein assembly, etc.  However, protein interactions are sometimes hard to detect experimentally (some interactions are loose or temporary), and there is often a host of literature on the subject without much of a consensus.  To help you get an overview of which protein-protein interactions are feasible and which ones are likely and potentially functionally relevant, today we will be exploring the web tool STRING (http://string.embl.de/) that “predicts” protein-protein interactions for a particular protein from an organism with a fully sequenced genome.  The predictions are often less than correct, but usually one can find out what the basis for the predictions was, and in some cases the predictions can provide ideas on what a protein may be doing or what it may be interacting with.  So, knowing its limitations, it’s actually quite a useful program as long as the results are viewed as in silico predictions rather than experimental truth.

Prelab question (or experiment, actually):

Write down three proteins from sequenced species that are of interest to you and that you hopefully know a little bit about, and prioritize them.  Each student will work on a different protein.  Then go to string.embl.de, type in the protein and choose the organism, choose “proteins” for interactors wanted, and verify that entries indeed are available for your top three choices.  If not, adjust your choices and repeat the procedure.  Make sure to show up at the lab with three choices that you are prepared to go into more detail for and possibly read a bit about.

The “experiment” for today:

First, the group will compile which protein(s) each of you would like to query, and compare it to lists of earlier lab sections, and the TA will assign to each of you your “own” protein and organism based on your choices and on what other students already did or are doing.

Now, go to STRING, type in your protein and organism (again, “proteins” for interactors) to see which 20 proteins your protein is most likely interacting with (20 is not the default setting; it can be changed once you have done the initial search).  You will see an interaction matrix (neighborhood, gene fusion, co-occurrence, etc.) with dots (black if clear, grey if in a grey area) representing a “hit” for that type of evidence.  You can get more information on these hits by clicking on the corresponding rectangle on the “Views” line.

Evaluate the information that you obtain (make sure you visit most or all of the “Views” as well as the “Summary Network”), and discuss in your weekly update and report at least the following aspects:

1.  What is the nature of the interaction of your protein with at least three of the proteins on your list (do they form a complex, etc.)?

2.  Are there any functional interactions that you expected and that did not show up, or are there putative interactions that surprised you?

3.  What is the summary network telling you?

4.  Explain what type of information is provided in all the rectangles on the Views line.

Happy thinking!

 


Center for Bioenergy & Photosynthesis

Arizona State University

Box 871604

Room PSD 209

Tempe, AZ 85287-1604

 

28 August 2007

phone: (480) 965-1963

fax: (480) 965-2747
