

  1. www.inforsense.com

  2. Outline
     Part I – Import & Export
     • Import data from different data sources and file types
     • Query from a relational database
     • Export data to different locations
     Part II – Pre-processing
     • Description of pre-processing nodes
     • Build using table editor
     Part III – Introduction to Interactive Browser
     • Filtering
     • Visualizing and Deployment of Results

  3. Outline
     Part IV – Introduction to Chemscience
     • Import and Export SD files
     • Clean SD files and Table nodes
     • Convert Molecular format nodes
     • Calculate Molecular properties
     • Create Fingerprint Matrix
     • Export to SD file
     Part V – Structure and Similarity Searching
     • Introduction to Sketch Pad
     • Substructure filter and match nodes
     • Similarity filter and score nodes
     Part VI – Predictive Modelling
     • Identifying active and inactive compounds using similarity
     • Predicting activity with classification algorithms based on chemical fingerprints

  4. Project Windows
     A project window is the area where you capture knowledge processes by building, editing and running data-mining tasks.
     • Capture theories and analytical processes and their information relationships
     • Organize and plan complex processes
     • Integrate data and applications in a single process representing integrative knowledge
     • Capture process provenance and auditable history
     • Share and reuse workflows through templates
     • Manage and deploy workflow to share process knowledge

  5. Anatomy of the workspace
     [Screenshot labels: Menu bar; Userspace; Project window; Navigator; Tool bar; Task manager and Properties editor; Inforsense.net]

  6. Importing data
     Data must be imported before it can be used in Kensington models. The compatible sources are:
     • Oracle, Access and MySQL databases
     • A text file (flat file or tall-slim format)
     • The clipboard, which is ideal for copying and pasting data from MS Excel or Access
     • A bookmark file
     To import data, select the appropriate option under the File → Import Data menu. Right-clicking on a user's workspace or folder icon allows you to create a new database bookmark.

  7. Database bookmark manager
     [Screenshot: node icons]
     • A new bookmark is created by launching the Database bookmark manager.
     • In this example, an Oracle database is connected to the KDE. The server's host name and port number, an Oracle user name and password, and the name of the database are required. Please consult your database administrator for details.
     • Access and MySQL databases are also supported.
     • To test the database connection (to check that the supplied parameters are correct), press the "Test Connection" button.
     • Once the required details have been filled in, clicking the "Save" button launches a dialog asking where in the user space the database bookmark is to be stored.

  8. Database manager
     • The "Database Manager" window can be launched by double-clicking the database bookmark icon or by selecting it on a database node.
     • Data can be selected either in table-browser mode or by performing a manual query.
     • The object types that appear in the list are tables, views, aliases and synonyms.
     • Column names and types are displayed so that the user can choose which columns to select.
     • The number of rows to load is also an option.
     • Importing data creates a table in the userspace.
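
The table-browser behaviour described on this slide (list the selectable objects, show column names and types, then load chosen columns with a row limit) can be approximated in plain Python. This is only a sketch: an in-memory SQLite database stands in for the Oracle, Access or MySQL source, and the `compounds` table, the `heavy` view and the `max_rows` variable are hypothetical names that do not come from the slides.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database behind a bookmark.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compounds (id INTEGER, name TEXT, weight REAL);
    INSERT INTO compounds VALUES (1, 'aspirin', 180.16),
                                 (2, 'caffeine', 194.19),
                                 (3, 'ibuprofen', 206.28);
    CREATE VIEW heavy AS SELECT * FROM compounds WHERE weight > 190;
""")

# Table-browser mode: list the objects (tables and views) that can be selected.
objects = conn.execute(
    "SELECT name, type FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

# Column names and declared types of one object, as shown to the user.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(compounds)")]

# Load only the chosen columns and a limited number of rows.
max_rows = 2
rows = conn.execute(
    "SELECT name, weight FROM compounds ORDER BY id LIMIT ?", (max_rows,)
).fetchall()
```

Here `rows` plays the role of the table that an import would create in the userspace.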

  9. Import Group
     • The DB Procedure node, specific to Oracle databases, provides another method for importing data.
     • Data can be imported from a database using the SQL node. The user specifies an SQL SELECT query, which produces a table to be loaded into the system.
     • In the DB Procedure and SQL nodes, execution is delayed until the data is needed, so the mining procedure can be defined before any Oracle processing takes place.
     • Export to File produces a comma- or tab-delimited file to be used in other applications.
     • Export to Userspace produces a table in a specific userspace.
     • Export to Database allows you to create, update or re-create a table of data in an Oracle database.
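
The delayed execution mentioned for the DB Procedure and SQL nodes can be pictured as a node that stores its query and runs it only when a downstream step asks for rows. The sketch below is not KDE code: `LazySQLNode`, the `assay` table and the SQLite stand-in database are all hypothetical and serve only to illustrate the idea.

```python
import sqlite3


class LazySQLNode:
    """Hypothetical node: keeps the query and runs it only on first request."""

    def __init__(self, conn, query):
        self.conn = conn
        self.query = query
        self.executed = False
        self._rows = None

    def rows(self):
        # The database is contacted here, not when the node is created.
        if not self.executed:
            self._rows = self.conn.execute(self.query).fetchall()
            self.executed = True
        return self._rows


# Assumption: in-memory SQLite stands in for the Oracle database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assay (compound TEXT, activity REAL)")
conn.executemany("INSERT INTO assay VALUES (?, ?)", [("A", 0.9), ("B", 0.2)])

node = LazySQLNode(conn, "SELECT compound FROM assay WHERE activity > 0.5")
executed_after_definition = node.executed  # workflow defined, query not yet run
result = node.rows()                       # the query runs at this point
```

Defining the node costs nothing on the database side, which is what lets a whole workflow be assembled before any query is sent.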

  10. Import & Export
      Create a simple workflow that illustrates the process of importing and exporting data. Our example shows how to:
      • Import data from a database (e.g. Oracle, Access)
      • Perform an SQL query
      • Export data to the user space and to a file
      • Perform a pre-processing task
      • Export data back to a database
      To get started, we will add the previously created Training_Material_DB bookmark to a new project.

  11. SQL Node
      The SQL node allows the user to perform an SQL SELECT query on the database.
      • Step 1: Illustrates the connectivity of the nodes. The exclamation mark indicates that the properties of the SQL node need to be defined.
      • Step 2: The query can be typed, or loaded from a text file, into the properties editor.
      • Step 3: The output metadata must be specified before use, either by retrieving it from the database (by pressing the "Load metadata from database" button in the "Output" panel) or by typing it manually into the "Output" panel.
      • Step 4: Illustrates the final connections.
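
Steps 2 and 3 amount to running a SELECT query and then describing its output columns. A rough Python equivalent, assuming an in-memory SQLite database in place of the real source and a hypothetical `training` table, reads the column names from the cursor and infers a type from the first returned row. How KDE itself loads metadata is not described in the slides.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database behind the bookmark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training (id INTEGER, label TEXT, score REAL)")
conn.execute("INSERT INTO training VALUES (1, 'active', 0.87)")

# Step 2: the SELECT query that would be typed into the properties editor.
query = "SELECT id, score FROM training WHERE label = 'active'"
cursor = conn.execute(query)

# Step 3: output metadata, here the column names reported by the cursor
# paired with the Python type of each value in the first row.
column_names = [description[0] for description in cursor.description]
first_row = cursor.fetchone()
metadata = [
    (name, type(value).__name__) for name, value in zip(column_names, first_row)
]
```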

  12. Export to Userspace & File
      The Export to Userspace and Export to File nodes each produce a data table.
      • Step 1: Illustrates the connectivity of the nodes.
      • Step 2: The table is exported under a user-defined name and location within a userspace.
      • Step 3: The table is exported as a tab-delimited file outside of KDE. Files can also be exported as comma-, space- or semicolon-delimited files. The text delimiter (quote character) can be set to none, single quotes or double quotes.
      • Step 4: Illustrates the final connections after both nodes are executed. Tables were created in the appropriate userspace and file locations.
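
The delimiter and quoting options in step 3 correspond closely to what Python's standard `csv` module offers, so the effect of each setting can be tried outside KDE. In this sketch `write_delimited` and the sample rows are hypothetical, and the output is written to a string rather than a file so that the result is easy to inspect.

```python
import csv
import io

# Hypothetical sample table: a header row followed by data rows.
rows = [["id", "name"], [1, "acetic acid"], [2, "benzene"]]


def write_delimited(rows, delimiter="\t", quotechar=None):
    """Return the table as delimited text; quotechar=None means no quoting."""
    buffer = io.StringIO()
    if quotechar is None:
        writer = csv.writer(
            buffer, delimiter=delimiter, quoting=csv.QUOTE_NONE,
            escapechar="\\", lineterminator="\n",
        )
    else:
        # Quote every non-numeric field with the chosen quote character.
        writer = csv.writer(
            buffer, delimiter=delimiter, quotechar=quotechar,
            quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n",
        )
    writer.writerows(rows)
    return buffer.getvalue()


tab_text = write_delimited(rows)                               # tab, no quotes
comma_text = write_delimited(rows, delimiter=",", quotechar='"')
```

With quoting switched off, a field that contains the delimiter has to be escaped instead, which is why an escape character is supplied in that branch.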

  13. Export to Database Node
      The pre-processing Delete node modifies the queried table, and the Export to Database node then creates a new table within the database.
      • Step 1: Illustrates node connectivity.
      • Step 2: The columns to be deleted are selected and moved to the right-hand side of the properties editor.
      • Step 3: Shows that the properties editor for the Export to Database node needs to be edited. In this case, the wizard will be executed. Screenshots appear on the next two slides.
      • Step 4: Shows the Export to Database properties entered after running the wizard. The red exclamation mark shown in step 3 then disappears.

  14. Table Export Wizard: Steps 1 – 4 [screenshots of steps 1 to 4]

  15. Table Export Wizard: Steps 5 & 6 [screenshots of steps 5 and 6]
      • Step 1: Allows the selection of the Create, Update or Recreate table options.
      • Step 2: A table name is given.
      • Step 3: Allows the selection of data columns for export.
      • Step 4: SQL to create the table, as illustrated in step 3. Step 3 can be skipped and the SQL placed directly into step 4.
      • Step 5: Column mapping for updating tables. In our example we are creating a table, so the user-defined columns are empty.
      • Step 6: SQL to update a table, as illustrated in step 5.
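
The three wizard options can be thought of as three different SQL patterns. The sketch below shows one plausible reading, using SQLite in place of Oracle: Create issues CREATE TABLE plus INSERT, Recreate drops the table first, and Update issues UPDATE statements keyed on the first column. The function `export_to_database`, the `results` table and the choice of key column are assumptions; the SQL that the wizard actually generates is not given in the slides.

```python
import sqlite3


def export_to_database(conn, table, columns, rows, mode):
    """mode is 'create', 'update' or 'recreate' (named after the wizard options)."""
    # Identifiers cannot be bound as parameters, so they are interpolated here;
    # in real use they would need to be validated first.
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    if mode == "recreate":
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    if mode in ("create", "recreate"):
        conn.execute(f"CREATE TABLE {table} ({column_list})")
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    elif mode == "update":
        # Assumption: the first column is the key and the rest are updated.
        key, *others = columns
        assignments = ", ".join(f"{name} = ?" for name in others)
        conn.executemany(
            f"UPDATE {table} SET {assignments} WHERE {key} = ?",
            [tuple(row[1:]) + (row[0],) for row in rows],
        )
    else:
        raise ValueError(f"unknown mode: {mode}")


def read_all(conn, table):
    return conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()


conn = sqlite3.connect(":memory:")
export_to_database(conn, "results", ["id", "label"], [(1, "a"), (2, "b")], "create")
after_create = read_all(conn, "results")
export_to_database(conn, "results", ["id", "label"], [(2, "z")], "update")
after_update = read_all(conn, "results")
export_to_database(conn, "results", ["id", "label"], [(9, "x")], "recreate")
after_recreate = read_all(conn, "results")
```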

  16. Import Text
      Two formats of data can be imported into KDE:
      • Text (standard flat text file): a normalized table format.
      • Tall-slim: an unnormalised table that contains a set of rows, each of which contains a repeated series of cases.
      The import wizards illustrated on the next two slides take us through the steps for importing data from a file and from the clipboard. In each case, the text table exported to file in the previous example was used.
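
To make the difference between the two formats concrete, the sketch below converts a tall-slim table into a flat one. It assumes that tall-slim means one (case, attribute, value) triple per row, which is a common meaning of the term but is not spelled out in the slides. The sample data and the `to_flat` helper are hypothetical.

```python
# Assumed tall-slim layout: one (case, attribute, value) triple per row.
tall_slim = [
    ("c1", "weight", "180.2"),
    ("c1", "logp", "1.2"),
    ("c2", "weight", "194.2"),
    ("c2", "logp", "-0.1"),
]


def to_flat(triples):
    """Pivot triples into one record per case, with one field per attribute."""
    flat = {}
    for case, attribute, value in triples:
        flat.setdefault(case, {})[attribute] = value
    return flat


flat = to_flat(tall_slim)
```

Each key of `flat` then corresponds to one row of a standard flat text table, with the attributes as its columns.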

  17. Steps for Importing Text [screenshots of steps 1 to 4]

  18. Steps for Copy & Paste Text [screenshots of steps 1 to 4]

  19. Preprocessing
      • Data pre-processing and transformation operations enable you to clean and prepare your data before executing data mining tasks.
      • Inforsense KDE supports a wide range of pre-processing functions that operate on either single or multiple tables. The input to a pre-processing component is one or more tables, and the output is a table.
      Some examples of single-table pre-processing functions include:
      • Deleting columns from a table.
      • Deriving and inserting new columns.
      • Filtering data records in a table (based on some condition).
      • Renaming or replacing columns of data.
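
The four single-table operations listed above each take a table and return a table, so they can be chained. The following sketch mimics that with plain Python lists of dictionaries. The function names, the sample columns and the thresholds are hypothetical and are not KDE node names.

```python
# Hypothetical input table: one dictionary per row.
table = [
    {"id": 1, "mw": 180.0, "activity": 0.9, "notes": "x"},
    {"id": 2, "mw": 300.0, "activity": 0.1, "notes": "y"},
]


def delete_columns(rows, names):
    """Drop the named columns from every row."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


def derive_column(rows, name, function):
    """Add a new column computed from each row."""
    return [{**row, name: function(row)} for row in rows]


def filter_rows(rows, predicate):
    """Keep only the rows that satisfy the condition."""
    return [row for row in rows if predicate(row)]


def rename_column(rows, old, new):
    """Rename one column, leaving the values untouched."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


# Each step takes a table in and hands a table on, like connected nodes.
step1 = delete_columns(table, {"notes"})
step2 = derive_column(step1, "is_active", lambda row: row["activity"] > 0.5)
step3 = filter_rows(step2, lambda row: row["mw"] < 250)
result = rename_column(step3, "mw", "mol_weight")
```

None of the functions modifies its input, so intermediate tables such as `step1` stay available for inspection.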

  20. Preprocessing in the Table Editor
      [Screenshot callout: Select & Right click]
      • Preprocessing nodes are best described through examples. We will build upon the previous workflow, querying data from the training material database.
      • Preprocessing nodes can be assembled to modify data either by dragging and dropping them onto the project workspace or by using the table editor.
      • The Table Editor is used to examine values in a data table. It also provides an easy interface for performing data pre-processing operations interactively, updating the display to show the effect of each operation.
      • In the example, the SQL query is executed when the table editor is opened.

  21. Anatomy of the table editor
      [Screenshot labels: Pre-Processing Menu; Toolbar; Table statistics; Editing Data Area; Table]

  22. Stats Summary
      • The Stats Summary pane in the table editor provides a statistical summary of the sample rows of the active table that are used as input to the table editor.
      • Each "Calculate" button calculates the statistics for the column to which it corresponds and returns the relevant information.
      • The following screenshot shows a typical initial view, containing information that has been generated from the sample dataset. Columns with no duplicates will not show any statistics.
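
The remark that columns without duplicates show no statistics suggests a frequency-based summary. The sketch below assumes exactly that: it counts the values in each column of a sample and skips any column in which every value is unique. Which statistics the pane really reports is not stated in the slides, and `stats_summary` and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows, as they might be passed to the table editor.
sample = [
    {"id": 1, "class": "active"},
    {"id": 2, "class": "inactive"},
    {"id": 3, "class": "active"},
]


def stats_summary(rows):
    """Frequency summary per column; columns with no duplicates are skipped."""
    summary = {}
    for column in rows[0]:
        counts = Counter(row[column] for row in rows)
        if len(counts) == len(rows):
            continue  # every value is unique, so there is nothing to report
        value, frequency = counts.most_common(1)[0]
        summary[column] = {
            "distinct": len(counts),
            "most_common": value,
            "most_common_count": frequency,
        }
    return summary


summary = stats_summary(sample)
```

In this sample the `id` column is omitted from `summary` because all of its values are distinct.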
