Introduction to Data Analysis with Orange
Amy Larner Giroux, PhD
UCF Center for Humanities & Digital Research NEH Digital Culture Summer Institute
Welcome
The tutorial outlined in this slide deck will step you through your first workflow in Orange. This is the short and sweet
Tweet Analysis Tutorial” PDF available in Google Classroom. That document covers both tutorials for this week in minute detail.
discussion only, not to be done on your data this week)
your data file to producing a word cloud
If you have not already installed Orange and the Text Add-on, please see the detailed directions in the PDF.
project using data and tweets are no exception.
data, you need to become intimately familiar with the content, regardless of the size of the dataset.
tweet dataset, but you should sample the data for a set to read closely and choose ways to categorize your data for more comparative analyses.
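For readers who later want to script the sampling step, drawing a random subset for close reading can be sketched in Python. The list of tweets and the sample size of 25 are placeholders chosen for illustration; they are not values from the tutorial.

```python
import random

# Hypothetical stand-in for the text column of your tweet spreadsheet.
tweets = [f"tweet number {i}" for i in range(1, 501)]

def sample_for_close_reading(items, size, seed=42):
    """Return `size` items chosen at random without replacement."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(items, min(size, len(items)))

reading_set = sample_for_close_reading(tweets, 25)
print(len(reading_set), "tweets selected for close reading")
```

Fixing the seed is a design choice: it lets you return to the same reading set later when you decide on categories.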
You are not expected to complete this level of data preparation on your data for this tutorial. Read through the information to understand it in terms of the sample dataset, but do not try to perform these steps with your own data until after the Institute is over.
The dataset for the screen shots is from a project where we wanted to focus on groups with
criteria. The dataset contains 1,121 tweets matching these criteria that were retweeted 100 or more times. The sample screens in this tutorial are from a dataset that we will use in the second tutorial. Throughout the current tutorial, please use the tweet data sent to you prior to the Institute.
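The 100-retweet threshold used to build the sample dataset can be sketched in Python as a simple filter. The column names (`text`, `retweet_count`) and the inline sample rows are assumptions for illustration, not the layout of the project's actual spreadsheet.

```python
import csv
import io

# Hypothetical sample data; a real export's column names may differ.
SAMPLE_CSV = """text,retweet_count
first tweet,250
second tweet,12
third tweet,100
"""

def filter_by_retweets(csv_file, minimum=100, count_column="retweet_count"):
    """Return the rows whose retweet count is at least `minimum`."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row[count_column]) >= minimum]

popular = filter_by_retweets(io.StringIO(SAMPLE_CSV))
print(len(popular), "tweets retweeted 100 or more times")
```

To run this against a file on disk, pass an open file handle, for example `open("tweets.csv", newline="", encoding="utf-8")`, where the filename is again a placeholder.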
As you read through the reasoning for the categorization chosen for the sample dataset, think about how your tweet dataset could be categorized.
in the dataset
categorize the users in three ways:
When you are ready to work on your data after the Institute, please see the detailed directions in the PDF on how to use pivot tables and VLOOKUP in Excel to set up your data categorization.
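For those more comfortable scripting than working in Excel, the same two operations can be sketched in Python: a count per user stands in for the pivot table, and a dictionary lookup stands in for VLOOKUP. The user names, category labels, and tweets below are invented for illustration; the PDF remains the reference for the Excel steps.

```python
from collections import Counter

# Hypothetical tweets as (user, text) pairs; real data would come from your CSV.
tweets = [
    ("user_a", "tweet one"),
    ("user_b", "tweet two"),
    ("user_a", "tweet three"),
]

# Hypothetical lookup table mapping each user to a category you assigned.
user_categories = {"user_a": "organization", "user_b": "individual"}

# Pivot-table equivalent: count how many tweets each user posted.
tweets_per_user = Counter(user for user, _ in tweets)

# VLOOKUP equivalent: attach the category to every tweet, falling back to
# a default label for users missing from the lookup table.
categorized = [
    (user, text, user_categories.get(user, "uncategorized"))
    for user, text in tweets
]

print(tweets_per_user.most_common())
print(categorized[0])
```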
Like any piece of software, Orange has nuances in the process that can sometimes be confusing. The following list covers some things to be aware of to help you recognize what is happening in the program.
In this example “data” is used between the first 4 widgets and then the Corpus widget converts the “data” to “corpus” and that is used in the last two widgets.
connecting line will turn red to notify you of the issue.
Depending on the size of the dataset, processing of the steps in an Orange workflow can take some time. The default for all the widgets is to process automatically when connected into the workflow. At times this can cause the program to show (Not Responding) in the title bar and grey itself out as in this example.
The red dot next to the Tweet Profiler widget shows that it is processing data. Be patient and let the process finish.
Most widgets have an option that you can uncheck to prevent them from running automatically. This example from the Tweet Profiler has a checkbox next to the “Commit Automatically” button. Unchecking the box allows you to run the widget when you want to by clicking the button. The terminology of the buttons is inconsistent. For example, in Sentiment Analysis the checkbox makes the button have “Autocommit is on.” Just be aware of the concept and look for the checkbox/button combinations.
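The checkbox/button concept can be illustrated with a small toy model in Python. This is a sketch of the idea only: the class and its methods are hypothetical and are not part of Orange's actual code.

```python
class SketchWidget:
    """Toy model of a widget with a commit checkbox; not Orange's real API."""

    def __init__(self, auto_commit=True):
        self.auto_commit = auto_commit  # state of the checkbox
        self.pending = None             # input waiting to be processed
        self.output = None              # last processed result

    def receive(self, data):
        """Called when data arrives from the previous widget."""
        self.pending = data
        if self.auto_commit:
            self.commit()

    def commit(self):
        """Process pending input; this is what clicking the button does."""
        if self.pending is not None:
            self.output = [item.upper() for item in self.pending]
            self.pending = None


widget = SketchWidget(auto_commit=False)  # box unchecked
widget.receive(["slow", "step"])
print(widget.output)  # still empty: nothing runs until you commit
widget.commit()       # the equivalent of clicking the button
print(widget.output)
```

With the box checked (`auto_commit=True`), `receive` triggers the processing immediately, which is the default behavior described above.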
You can rename the widgets to make notes to yourself about the process.
Either right-click on the widget and select Rename from the popup menu, or click once on the widget to select it and press F2.
There are three options to open a saved workflow.
To prevent the automated workflow from running immediately, you can use the Open and Freeze.
1. Double-click on the .ows file in File Explorer
2. Ctrl-O or File -> Open
3. Ctrl-Alt-O or File -> Open and Freeze
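If you want to see what a saved workflow contains without running it at all, the .ows file can also be read as text. The sketch below assumes the file is XML with `node` elements for widgets and `link` elements for connections; the embedded sample and its attribute names are a simplified illustration of that assumed layout, so check them against one of your own files before relying on them.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the contents of a saved .ows file.
SAMPLE_OWS = """<scheme version="2.0" title="Tweet workflow">
  <nodes>
    <node id="0" name="File" title="File" />
    <node id="1" name="Corpus" title="Corpus" />
  </nodes>
  <links>
    <link id="0" source_node_id="0" sink_node_id="1"
          source_channel="Data" sink_channel="Data" />
  </links>
</scheme>"""

def list_widgets(ows_text):
    """Return the title of every widget recorded in the workflow."""
    root = ET.fromstring(ows_text)
    return [node.get("title") for node in root.iter("node")]

def list_links(ows_text):
    """Return (source title, sink title) for every connection."""
    root = ET.fromstring(ows_text)
    titles = {node.get("id"): node.get("title") for node in root.iter("node")}
    return [
        (titles[link.get("source_node_id")], titles[link.get("sink_node_id")])
        for link in root.iter("link")
    ]

print(list_widgets(SAMPLE_OWS))
print(list_links(SAMPLE_OWS))
```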
Sometimes you will leave a window open while you do other tasks and watch how those tasks affect your output (e.g. Word Cloud).
safe to close a window.
clicking on the work area or selecting from the widget window.
For this hands-on portion, please use the tweet data that was emailed to you that contained the hashtags and date range you requested. Open your CSV file in Excel:
slx)
Your columns should look like this.
Widget toolbox
The widget toolbox is open by default. As you get familiar with Orange you can minimize it to have more room in the work area.
Use Ctrl-S or File -> Save to save your workflow. Do this fairly often to ensure you don’t lose your work.
The red X just means a file hasn’t been loaded yet.
Click on File in the Data section of the widget window to add a File widget to the work area.
If you have opened files in other workflows, the last file may be
that file still exists. In this example the Tweet-Profiled-ReadyForOrange spreadsheet was not found.
Double-click on the File widget to open it.
The columns of the spreadsheet will be displayed with the data type, role, and values determined by Orange. In some instances Orange chooses a different type than you want and you can double-click on the type column to change it. Additionally on this screen it will show you the number of rows (1121 instances), the number
categorical) and the number of text fields (meta). If you change data in the underlying spreadsheet, use the Reload button to reimport the data.
Using the open button, browse and choose your spreadsheet
After opening your data file and examining the column information, you can close this window.
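The kind of summary the File widget displays (a row count and a type per column) can be approximated outside Orange with a short Python sketch. This is not how Orange infers types; the two-way numeric/text rule and the sample column names are simplifying assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample; your spreadsheet's columns will differ.
SAMPLE_CSV = """text,retweet_count,user
hello world,120,user_a
another tweet,300,user_b
"""

def is_number(value):
    """Return True if the cell can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def summarize_columns(csv_file):
    """Return (row count, {column name: 'numeric' or 'text'})."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        return 0, {}
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        summary[name] = "numeric" if all(is_number(v) for v in values) else "text"
    return len(rows), summary

row_count, column_types = summarize_columns(io.StringIO(SAMPLE_CSV))
print(row_count, "rows")
print(column_types)
```

A quick check like this can help you spot a column that Orange is likely to type differently than you intend before you load the file.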