
NumPy: Vectorize your brain



  1. Ekaterina Tuzova Numpy: Vectorize your brain

  2. K nearest neighbors https://archive.ics.uci.edu/ml/datasets/Wine

  3. NumPy

  4. What is NumPy? NumPy is the fundamental package for scientific computing with Python.

  5. IPython

  6. Python and Performance

  7. Python is fast

  8. Python is slow

  9. Euclidean distance

  10. “Magic” timeit

  11. Euclidean distance. C

  12. Euclidean distance. C

  13. Euclidean distance

  14. line_profiler and “magic” lprun

  15. Euclidean distance

  16. Compiled languages

  17. Interpreted languages

  18. What can be done?

  19. NumPy

  20. Ufuncs

  21. Universal functions. A special type of function defined in the NumPy library that operates element-wise on arrays.
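
As a minimal sketch of what "element-wise" means here (the array values are made up for illustration and do not come from the slides), a ufunc such as `np.add` or `np.sqrt` applies one operation to every element without an explicit Python loop:

```python
import numpy as np

# Illustrative values, not taken from the slides.
a = np.array([1.0, 4.0, 9.0])
b = np.array([10.0, 20.0, 30.0])

# The + operator on arrays dispatches to the np.add ufunc.
summed = a + b                  # same result as np.add(a, b)
roots = np.sqrt(a)              # unary ufunc, applied to each element
is_big = np.greater(b, 15.0)    # comparison ufunc, returns a boolean array

# Equivalent pure-Python loop, shown only for comparison.
summed_loop = [x + y for x, y in zip(a, b)]
```

The loop version produces the same numbers, but each addition goes through the interpreter, which is the overhead ufuncs are meant to avoid.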

  22. Arithmetic operations

  23. Arithmetic operations

  24. Arithmetic operations

  25. Arithmetic operations

  26. Ufuncs available: arithmetic, bitwise, comparison, trigonometric, floating …

  27. Slicing and indexing

  28. Slicing and indexing

  29. Slicing and indexing

  30. Multidimensional arrays

  31. Multidimensional arrays

  32. Index arrays

  33. Index arrays

  34. Index arrays

  35. Masking

  36. Masking

  37. Test train split

  38. Test train split

  39. Broadcasting

  40. Broadcasting. Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

  41. Broadcasting rules. 1. If two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side. 2. If the shapes of the two arrays do not match in some dimension, the array with size 1 in that dimension is stretched to match the other shape. 3. If in any dimension the sizes disagree and neither is equal to 1, NumPy raises a ValueError: operands could not be broadcast together with shapes
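
The three rules can be tried out on small arrays. This is only an illustrative sketch; the shapes and variable names are chosen here and are not from the slides:

```python
import numpy as np

a = np.ones((3, 4))      # shape (3, 4)
b = np.arange(4)         # shape (4,)

# Rule 1: b is treated as if it had shape (1, 4).
# Rule 2: its size-1 axis is stretched to 3, so the result has shape (3, 4).
c = a + b

# A column of shape (3, 1) combined with a row of shape (4,) also gives (3, 4).
col = np.arange(3)[:, np.newaxis]
d = col + b

# Rule 3: shapes (3, 4) and (3,) disagree in the last axis (4 vs 3)
# and neither size is 1, so NumPy raises ValueError.
try:
    a + np.arange(3)
    failed = False
except ValueError:
    failed = True
```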

  42. Broadcasting. Example

  43. np.newaxis

  44. np.newaxis

  45. np.newaxis

  46. Aggregations

  47. Aggregations

  48. Aggregations

  49. NumPy summary. Basic ideas to make your code faster: ufuncs, slicing and indexing, broadcasting, aggregations.
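
To show how these ideas combine, here is a hedged sketch of a nearest-neighbor lookup written twice: once with Python loops and once vectorized. The data is random and the function names `nearest_loop` and `nearest_vectorized` are hypothetical; this is not the code from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 5))   # hypothetical training points
query = rng.normal(size=(10, 5))    # hypothetical query points

def nearest_loop(train, query):
    """Index of the closest training point per query, using Python loops."""
    result = []
    for q in query:
        best_index, best_dist = -1, float("inf")
        for i, t in enumerate(train):
            dist = sum((qi - ti) ** 2 for qi, ti in zip(q, t)) ** 0.5
            if dist < best_dist:
                best_index, best_dist = i, dist
        result.append(best_index)
    return np.array(result)

def nearest_vectorized(train, query):
    """Same result, with the loops replaced by array operations."""
    # Indexing with np.newaxis plus broadcasting:
    # (10, 1, 5) - (1, 100, 5) -> (10, 100, 5)
    diff = query[:, np.newaxis, :] - train[np.newaxis, :, :]
    # Ufuncs (power, sqrt) and an aggregation (sum over the feature axis).
    dist = np.sqrt((diff ** 2).sum(axis=2))   # shape (10, 100)
    # Aggregation: position of the minimum along the training axis.
    return dist.argmin(axis=1)
```

In IPython the two versions can be compared with `%timeit nearest_loop(train, query)` and `%timeit nearest_vectorized(train, query)`. The vectorized version builds a temporary array of shape (queries, training points, features), so it trades memory for speed.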

  50. k-means

  51. Algorithm. 1. Cluster the data into k groups, where k is predefined. 2. Select k points at random as cluster centers. 3. Assign objects to their closest cluster center according to the Euclidean distance function. 4. Calculate the centroid or mean of all objects in each cluster. 5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.
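
The algorithm above can be sketched in NumPy roughly as follows. This is an illustrative version under stated assumptions, not the implementation shown in the talk: the function name `kmeans`, the `max_iter` and `seed` parameters, and the handling of empty clusters are all choices made here. The random centers are picked once, and only the assignment and centroid steps are repeated.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, labels) for an array of shape (n_points, n_features)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Select k distinct data points at random as the initial centers.
    # Fancy indexing returns a copy, so `points` is never modified.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its closest center (Euclidean distance),
        # using broadcasting: (n, 1, f) - (1, k, f) -> (n, k, f).
        diff = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        new_labels = dist.argmin(axis=1)
        # Stop when the assignment is the same as in the previous round.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each center as the mean of its assigned points.
        # Assumption: an empty cluster keeps its previous center.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels
```

A call such as `centers, labels = kmeans(data, 3)` returns the final centers and one cluster index per row of `data`. The result depends on the random initial centers, so different `seed` values can give different clusterings.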

  52. Synthetic data

  53. Vectorized Euclidean distance

  54. k-means

  55. Thank you. @ktisha
