STRING/Cytoscape Exercises

We are going to use the stringApp for Cytoscape (http://apps.cytoscape.org/apps/stringapp) to query the DISEASES database (http://diseases.jensenlab.org) and retrieve a network of proteins associated with diabetes from the STRING database (http://string-db.org). We will investigate this network in Cytoscape, make some observations about the proteins and processes involved in the disease, and compare it to a related disease (cardiovascular system disease).

1. Retrieve network for diabetes mellitus

Open a new session in Cytoscape from the menu File -> New -> Session and go to the menu File -> Import -> Network -> Public Databases. In the import dialog, choose ‘STRING: disease query’ as the ‘Data Source’ and enter ‘diabetes mellitus’ into the ‘Enter disease term’ field. The next dialog shows all the matches that the stringApp finds for your query, with the first one selected by default. Make sure the right term is selected, then continue with the import by pressing the Import button.

2. Browse the node attributes table

Note that the retrieved network contains a lot of additional information associated with the nodes and edges, such as the protein sequence, tissue expression data and disease score, as well as the confidence scores for the different types of interaction evidence. In the following, we will explore these data using Cytoscape.

Browse through the node attributes table and find the disease score column. Sort it in descending order to see the highest disease scores. You can highlight the corresponding nodes by selecting the rows in the table, bringing up the context menu (right-click the selected rows) and choosing the ‘Select nodes from selected rows’ option. Give an example of a node with a disease score of 5 and one with a disease score below 4.

Rename the ‘disease score’ column to ‘diabetes score’ by right-clicking the column name and choosing the ‘Rename column’ option. We will need this column later in the exercise.

3. Inspect tissue expression data

The stringApp automatically retrieves information from the TISSUES database about which tissues the proteins are expressed in. We will take a look at this database first to better understand the data. Go to http://tissues.jensenlab.org/ and enter insulin (gene name: INS) into the search box. The resulting page shows an overall representation of where in the body insulin is found, and below it you can see tables containing the specific lines of evidence that contribute to the overall score. In which tissues is insulin present with a confidence of 4 or above? Which sources do these associations come from?
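Sorting the disease score column and asking for a ‘confidence of 4 or above’ both come down to a sort and a threshold on one column of the node table. The Python sketch below mimics these steps on a node table exported from Cytoscape as CSV (Cytoscape can export tables to a file; the exact menu path depends on the version). It is only an illustration under stated assumptions: the column headers are guesses based on the names used in this exercise, and the protein names and values are invented placeholders, not stringApp output. It also counts empty cells, which is one way to check for proteins without a tissue score.

```python
import csv
import io

# Hypothetical CSV export of the node table. The headers mirror the column
# names used in this exercise, but real stringApp exports may name them
# differently. Protein names and values are invented placeholders.
NODE_TABLE_CSV = """display name,diabetes score,tissue pancreas
PROT_A,5.0,4.2
PROT_B,3.6,
PROT_C,4.8,1.5
PROT_D,2.9,4.9
"""


def parse_score(cell):
    """Convert a cell to float; an empty cell means 'no score' (None)."""
    return float(cell) if cell.strip() else None


def load_nodes(csv_text):
    """Read the CSV into a list of dicts with short keys."""
    nodes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        nodes.append({
            "name": row["display name"],
            "diabetes": parse_score(row["diabetes score"]),
            "pancreas": parse_score(row["tissue pancreas"]),
        })
    return nodes


def sort_descending(nodes, key):
    """Highest values first; nodes without a value go to the end."""
    return sorted(nodes, key=lambda n: (n[key] is None, -(n[key] or 0.0)))


def names_at_least(nodes, key, cutoff):
    """Names of nodes with value >= cutoff; missing values never pass."""
    return [n["name"] for n in nodes if n[key] is not None and n[key] >= cutoff]


def count_missing(nodes, key):
    """Number of nodes with no value in the given column."""
    return sum(1 for n in nodes if n[key] is None)


nodes = load_nodes(NODE_TABLE_CSV)
ranking = [n["name"] for n in sort_descending(nodes, "diabetes")]
print(ranking)                                  # ['PROT_A', 'PROT_C', 'PROT_B', 'PROT_D']
print(names_at_least(nodes, "pancreas", 4.0))   # ['PROT_A', 'PROT_D']
print(count_missing(nodes, "pancreas"))         # 1
```

In Cytoscape itself the column sort, the ‘Select nodes from selected rows’ option and the Select tab filters do this interactively; the sketch only makes the underlying logic explicit.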
4. Style the network

Cytoscape allows you to map properties of the nodes and edges to visual parameters such as node colour and edge width. We will map the pancreas tissue expression data to the node colour. From the top of the left panel, select the 'Style' tab (to the right of 'Network'). Then click the triangle to the right of the property you want to change, for example ‘Fill Color’. Next, set the 'Column' to the node column containing the data that you want to use (tissue pancreas). Since this is a numeric value, we will use 'Continuous Mapping' as the ‘Mapping Type’ and set a colour gradient reflecting the confidence that each protein is expressed in the pancreas.

Double-click the rectangle to the right of ‘Current Mapping’ to bring up the 'Continuous Mapping Editor' window and edit the colours for the continuous mapping. For example, you can use a white-to-black gradient for low-to-high tissue expression confidence. In the mapping editor dialog, the colour used for the minimum value is on the left, and the colour for the maximum value is on the right. Double-click the triangles on the top and sides of the gradient to change the colours. The triangles on the top mark the values at which the data will be clipped: anything above the right triangle will be shown in the maximum colour. This is useful if a small number of values are much higher than the median. As you move the triangles and change the colours, the network view updates automatically, so this is all easier to do than to explain! If at any point it doesn't work as expected, it's easiest to delete the styling and start again.

Many proteins will not be strongly associated with the pancreas; they will remain white or light grey. Are there proteins in the network that do not have a confidence score for pancreas tissue expression? How many?

5. Select pancreas-associated proteins

We will create a subnetwork of just the proteins that are strongly associated with the pancreas. One way to do this is to create a selection filter in the Select tab (to the right of ‘Style’). Click the + button and choose ‘Column filter’ from the drop-down menu. Then find and select the attribute ‘Node: tissue pancreas’. To select all nodes with a tissue expression value of 4 and up, set the lower bound to 4 by entering the number or using the slider. How many proteins are found in pancreas tissue with a confidence of 4 and above? Repeat the filter for blood tissue: how many proteins pass the same threshold?

6. Use clusterMaker to find clusters

clusterMaker2 is a Cytoscape App that provides several clustering algorithms. If clusterMaker2 is not installed, go to Apps -> App Manager and type clusterMaker2 into the search box. Select it from the middle pane and click Install. You will need an internet connection for this step. When the installation is done, close the App Manager; you will then find clusterMaker under the Apps menu.
To start the app, go to Apps -> clusterMaker and choose the MCODE clustering algorithm. When it finishes, it will have added a column to the node table containing the number of the cluster each node has been assigned to. To visualize the clusters as a network, go to Apps -> clusterMaker Visualizations -> Create network from clusters. How many clusters are generated? (Do not count nodes with degree 0.) Look at the cluster containing INSR. Which proteins in this cluster are most strongly related to the pancreas? Look up the functions of the proteins if you don't know them. Hint: the stringApp retrieves some additional information about each protein and shows it in the Results panel when the corresponding node is selected; enable this panel from the menu Apps -> STRING -> Show results panel.

7. Merge the network with another disease network

Now we will retrieve a STRING network for a related disease by querying the term ‘cardiovascular system disease’ in the same way as for ‘diabetes mellitus’ (File -> Import -> Network -> Public Databases … -> STRING: disease query). Next, find the disease score column in the ‘cardiovascular system disease’ network and rename it, for example to ‘CVD score’ (right-click the column name and choose the ‘Rename column’ option). Cytoscape can merge two networks by building their union, intersection or difference, using the menu Tools -> Merge -> Networks. Select the two disease networks in the ‘Available Networks’ list and move them to the ‘Networks to Merge’ list by clicking the > button. Make sure the Union button is selected and click the Merge button to start the merge. What does the resulting network look like, and why? Note that you can apply one of the many layouts to visualize the merged network better.

8. Explore related diseases

In this advanced exercise, we will retrieve all interactions between the top 100 diabetes mellitus and cardiovascular system disease proteins from STRING and identify clusters in the merged network. Go back to the merged network from exercise 7. Since this network was not created by the stringApp, we have to manually mark it as a STRING network from the stringApp menu (Apps -> STRING -> Set as STRING network). Next, use the Apps -> STRING -> Change confidence menu to increase the confidence to 1, which effectively removes all edges from the network. Then bring up the Change confidence window again and set the confidence to 0.4, which prompts the stringApp to retrieve from STRING all interactions with a confidence score above 0.4 between all proteins in the current network. How many nodes and edges are in the new network, and why?
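Exercises 7 and 8 rest on two simple set operations: a union of node and edge sets, and re-querying the edges above a confidence cutoff among a fixed set of nodes. The following Python sketch models both on invented toy data (the protein names P1 to P9 and all scores are placeholders, not STRING output) and makes no calls to Cytoscape or STRING. It illustrates one plausible reason the re-queried network can differ from the plain union: the union only keeps edges that were already in one of the input networks, while re-querying can add edges that link diabetes and cardiovascular proteins. Treating the cutoff as inclusive is an assumption of the sketch.

```python
# Toy model of exercises 7 and 8. Protein names (P1..P9) and confidence
# scores are invented placeholders, not real STRING data.

# Nodes and edges returned by the two disease queries.
diabetes_nodes = {"P1", "P2", "P3"}
cvd_nodes = {"P3", "P4", "P5"}
diabetes_edges = {frozenset({"P1", "P2"})}
cvd_edges = {frozenset({"P4", "P5"})}

# Union merge: shared nodes (here P3) appear once; the only edges are those
# that already existed in one of the input networks.
merged_nodes = diabetes_nodes | cvd_nodes
merged_edges = diabetes_edges | cvd_edges

# Stand-in for STRING: combined confidence scores for protein pairs.
STRING_SCORES = {
    frozenset({"P1", "P2"}): 0.91,
    frozenset({"P4", "P5"}): 0.72,
    frozenset({"P2", "P4"}): 0.55,  # links a diabetes and a CVD protein
    frozenset({"P3", "P5"}): 0.38,  # below the 0.4 cutoff
    frozenset({"P1", "P9"}): 0.99,  # P9 is not in the merged network
}


def edges_at_cutoff(nodes, scores, cutoff):
    """Pairs with both proteins in `nodes` and a score of at least `cutoff`.

    Whether the real cutoff is inclusive is an assumption of this sketch.
    """
    return {pair for pair, score in scores.items()
            if pair <= nodes and score >= cutoff}


# Confidence 1: no toy score reaches it, so every edge disappears.
no_edges = edges_at_cutoff(merged_nodes, STRING_SCORES, 1.0)
# Confidence 0.4: re-query all pairs among the merged nodes.
requeried = edges_at_cutoff(merged_nodes, STRING_SCORES, 0.4)

print(len(merged_nodes), len(merged_edges))   # 5 2
print(len(no_edges))                          # 0
print(sorted(sorted(p) for p in requeried))   # [['P1', 'P2'], ['P2', 'P4'], ['P4', 'P5']]
print(requeried - merged_edges)               # the new cross-disease edge
```

The cutoff of 1 mirrors the exercise's note that this setting effectively removes all edges; the node set stays the same throughout, and only the edge set is recomputed.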