Mining Lectures Marcel Caraciolo - @marcelcaraciolo 1
Who am I? Marcel Pinheiro Caraciolo. Brazilian, lover of crabs. Director of R&D at the Brazilian startup Orygens. M.Sc. candidate in Data Mining and Recommender Systems. Current moderator of the local Python User Group in Pernambuco. Interested in machine learning, recommender systems and mobile computing. Blogging about machine learning with Python since 2008 (http://aimotion.blogspot.com). Young apprentice in Python programming since 2008. 2
How did I start this analysis? 24 hours ago... 3
Question: How were the topics distributed across the SciPy Conference general sessions? 4
Scraping the SciPy Conference: a small web crawler for extracting the approved lectures. urllib2, re, BeautifulSoup... 5
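A minimal sketch of the scraping step, under stated assumptions. The slides name urllib2, re and BeautifulSoup but do not show the crawler or the conference page markup, so this example parses a made-up inline HTML string using only `re` and `html` from the Python 3 standard library. The `<h3 class="talk">` layout, the sample titles and the helper name `extract_lectures` are all hypothetical.

```python
import html
import re

# Hypothetical sample markup; the real schedule page is not shown in the slides.
SAMPLE_PAGE = """
<div class="schedule">
  <h3 class="talk"><a href="/talks/1">Example Talk One</a></h3>
  <h3 class="talk"><a href="/talks/2">Example Talk Two &amp; More</a></h3>
  <h3 class="other">Lunch</h3>
</div>
"""

# Captures the link target and link text of each <h3 class="talk"> (assumed layout).
TALK_RE = re.compile(
    r'<h3 class="talk">\s*<a href="([^"]+)">(.*?)</a>\s*</h3>', re.S
)


def extract_lectures(page):
    """Return a list of (url, title) pairs found in the page source."""
    return [
        (url, html.unescape(title.strip()))
        for url, title in TALK_RE.findall(page)
    ]


lectures = extract_lectures(SAMPLE_PAGE)
for url, title in lectures:
    print(url, title)
```

For real pages a proper HTML parser such as BeautifulSoup, which the slide mentions, is usually more robust than regular expressions alone.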
Summary: 41 lectures, 820 minutes in total 6
That means... ≈ 4100 tweets posted. 7
Or watch... Star Wars Trilogy 2x 8
Or finish the Super Mario game... 82x! 9
Or open Eclipse 2x! 10
Most popular authors: Dharhas Pothina - 3, Wes McKinney - 2, all the others - 1 each 11
Playing with the text... The most frequent words at the conference nltk, re 12
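A rough sketch of how the most frequent words could be counted. The slide names nltk and re; to stay dependency-free, this example swaps in `collections.Counter` and a tiny hand-written stopword list in place of nltk. The abstracts, the stopword list and the function names are made up for illustration.

```python
import re
from collections import Counter

# Tiny hand-written stopword list (assumption); nltk ships a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def most_frequent_words(documents, top_n=3):
    """Count non-stopword tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tok for tok in tokenize(doc) if tok not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up abstracts for illustration only.
abstracts = [
    "Plotting data with Python",
    "Parallel Python on the GPU",
    "Analysis of data in Python",
]
top_words = most_frequent_words(abstracts, top_n=2)
print(top_words)
```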
But let’s take a deeper look. I used the K-Means clustering algorithm. Tool used for visualization: Ubigraph 13
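A minimal sketch of K-Means applied to short texts, written in plain Python. The slides only state that K-Means was used; the bag-of-words features, squared Euclidean distance, explicitly supplied initial centroids and the sample titles below are all assumptions made for illustration. The Ubigraph visualization is not covered here.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Turn a text into a vector of word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[w]) for w in vocabulary]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels, centroids


# Made-up titles for illustration only.
titles = [
    "plotting with matplotlib",
    "matplotlib plotting tips",
    "gpu parallel computing",
    "parallel computing on gpu clusters",
]
vocabulary = sorted({w for t in titles for w in re.findall(r"[a-z]+", t.lower())})
vectors = [bag_of_words(t, vocabulary) for t in titles]
labels, centroids = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)
```

Passing the initial centroids explicitly keeps the run deterministic; a random initialization is more common in practice, and the result can then vary between runs.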
Distribution of the Lectures Basic Frameworks matplotlib, ipython Building frameworks performance, models, web services Parallelism performance, gpu, statistical Visualization Numpy data analysis, statistical toolkits using Numpy 14
To sum up... Mining English text is so much easier! Submit your work too! Spread scientific Python throughout the community. I expect to be back at SciPy next year! 15
https://github.com/marcelcaraciolo/clustering_scipy Mining Lectures Marcel Caraciolo - @marcelcaraciolo 16