Anima Anandkumar ROLE OF TENSORS IN ML
TRINITY OF AI/ML: ALGORITHMS + COMPUTE + DATA
EXAMPLE AI TASK: IMAGE CLASSIFICATION
Example labels: Maple Tree, Villa, Backyard, Plant, Potted Plant, Garden, Swimming Pool, Water.
DATA: LABELED IMAGES FOR TRAINING AI (ImageNet)
• 14 million images across 1,000 categories.
• Largest database of labeled images.
• Example: images in the Fish category capture natural variations of fish.
Picture credits: Image-net.org, ZDnet.com
MODEL: CONVOLUTIONAL NEURAL NETWORK
Example output: p(cat) = 0.02, p(dog) = 0.85.
• Deep learning: many layers give the model a large capacity to learn from data.
• Inductive bias: prior knowledge about natural images.
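A minimal sketch of such a classifier in PyTorch; the layer sizes and the two-class output are illustrative assumptions, not the slide's architecture:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local image features
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 224 -> 112
    nn.Flatten(),
    nn.Linear(16 * 112 * 112, 2),                # two classes: cat, dog
    nn.Softmax(dim=1),                           # turn scores into probabilities
)
probs = model(torch.randn(1, 3, 224, 224))       # e.g. [p(cat), p(dog)]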
DEEP LEARNING: LAYERS OF PROCESSING
Picture credits: Zeiler et al.
COMPUTE INFRASTRUCTURE FOR AI: GPU
• More than a billion operations per image.
• NVIDIA GPUs enable parallel operations.
• Enables large-scale AI.
MOORE'S LAW: A SUPERCHARGED LAW
RISE OF GPU COMPUTING
[Chart: single-threaded CPU performance grows 1.1x per year, while GPU computing performance grows 1.5x per year across applications, algorithms, systems (CUDA), and architecture, for a projected 1000x gain by 2025.]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; new plot and data for 2010-2015 collected by K. Rupp.
HOW GPU ACCELERATION WORKS
GPUs and CPUs work together: the CPU runs the rest of the sequential code, while the compute-intensive functions (roughly 5% of the code) are parallelized on the GPU.
GPU: many thousand smaller cores (>5,000); throughput-optimized design targeting maximum throughput of many threads.
CPU: only a few "fat" cores (8-20 typical); latency-oriented design targeting minimal latency of a single thread.
PROGRESS IN TRAINING IMAGENET
[Chart: top-5 error rate, the error in making 5 guesses about the image category, falling from roughly 30% in 2010 to below human-level error by 2015.]
Need the Trinity of AI: Data + Algorithms + Compute.
Source: Statista, The Statistics Portal.
TENSORS PLAY A CENTRAL ROLE: ALGORITHMS + COMPUTE + DATA
TENSOR: EXTENSION OF MATRIX
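In code, the extension is just the number of array modes; a minimal sketch in NumPy (shapes chosen arbitrarily):

import numpy as np

scalar = np.float64(3.0)      # order 0
vector = np.ones(4)           # order 1
matrix = np.ones((4, 5))      # order 2
tensor = np.ones((4, 5, 6))   # order 3: extends the matrix with a third mode
print(tensor.ndim)            # 3, the order (number of modes)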
TENSORS FOR DATA ENCODE MULTI-DIMENSIONALITY
Image: 3 dimensions (Width * Height * Channels). Video: 4 dimensions (Width * Height * Channels * Time).
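A concrete sketch; the image and clip sizes are illustrative assumptions:

import numpy as np

image = np.zeros((224, 224, 3))       # width x height x channels
video = np.zeros((224, 224, 3, 16))   # width x height x channels x time
print(image.ndim, video.ndim)         # 3 4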
TENSORS FOR ML ALGORITHMS ENCODE HIGHER-ORDER MOMENTS
Pairwise correlations: $E(x \otimes x)_{i,j} = E(x_i x_j)$
Third-order correlations: $E(x \otimes x \otimes x)_{i,j,k} = E(x_i x_j x_k)$
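Both moments are easy to estimate from samples; a minimal sketch in NumPy, with sample size and dimension as illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # 1000 samples of a 5-dim x

M2 = np.einsum('ni,nj->ij', X, X) / len(X)         # E(x ⊗ x): pairwise correlations
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / len(X)  # E(x ⊗ x ⊗ x): triple correlations
print(M2.shape, M3.shape)                          # (5, 5) (5, 5, 5)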
TENSORS FOR MODELS: STANDARD CNNs USE LINEAR ALGEBRA
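One way to see this: a standard convolution is a matrix product over image patches (im2col). A minimal single-channel sketch with assumed shapes:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.random.randn(8, 8)                  # single-channel image
k = np.random.randn(3, 3)                  # convolution kernel

patches = sliding_window_view(x, (3, 3))   # all 3x3 windows: shape (6, 6, 3, 3)
cols = patches.reshape(-1, 9)              # im2col: one flattened patch per row
out = (cols @ k.reshape(9)).reshape(6, 6)  # convolution as a matrix-vector product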
TENSORS FOR MODELS: TENSORIZED NEURAL NETWORKS
Jean Kossaifi, Zack Chase Lipton, Aran Khanna, Tommaso Furlanello, Anima Anandkumar
Jupyter notebooks: https://github.com/JeanKossaifi/tensorly-notebooks
SPACE SAVING IN DEEP TENSORIZED NETWORKS
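To see where the saving comes from, Tucker-factorize a weight tensor and count parameters. A hedged sketch with TensorLy; the kernel shape and ranks are illustrative assumptions, not the networks benchmarked on the slide:

import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

W = tl.tensor(np.random.randn(256, 256, 3, 3))   # hypothetical 4-way conv kernel
core, factors = tucker(W, rank=[64, 64, 3, 3])   # low-rank Tucker factorization

full = W.size
compressed = core.size + sum(f.size for f in factors)
print(full, compressed)                          # 589824 vs 69650: ~8.5x fewer parameters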
TENSORS FOR LONG-TERM FORECASTING
Difficulties in long-term forecasting:
• Long-term dependencies
• High-order correlations
• Error propagation
RNNS: FIRST-ORDER MARKOV MODELS
Input $y_t$, hidden state $h_t$, output $z_t$:
$h_t = g(y_t, h_{t-1}; \theta)$, $z_t = f(h_t; \theta)$
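A minimal sketch of one recurrent step; the weight matrices and the tanh/linear choices for g and f are assumptions for illustration:

import numpy as np

def rnn_step(y_t, h_prev, W_y, W_h, W_z):
    h_t = np.tanh(W_y @ y_t + W_h @ h_prev)  # h_t = g(y_t, h_{t-1}; theta)
    z_t = W_z @ h_t                          # z_t = f(h_t; theta)
    return h_t, z_t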
TENSOR-TRAIN RNNS AND LSTMS
[Diagrams: seq2seq architecture; TT-LSTM cells.]
TENSOR LSTM FOR LONG-TERM FORECASTING
[Results on a traffic dataset and a climate dataset.]
UNSUPERVISED LEARNING: TOPIC MODELS THROUGH TENSORS
Example topics: Justice, Education, Sports.
TENSORS FOR MODELING: TOPIC DETECTION IN TEXT
Co-occurrence of word triplets, clustered by latent topic (Topic 1, Topic 2).
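A toy sketch of the statistic being formed; the vocabulary, documents, and dense counting loop are illustrative assumptions (practical implementations use sparse counts):

import numpy as np
from itertools import permutations

docs = [[0, 2, 1], [1, 3, 2], [0, 3, 1]]   # toy documents as word-id lists
V = 4                                      # vocabulary size

T = np.zeros((V, V, V))                    # word-triplet co-occurrence tensor
for doc in docs:
    for i, j, k in permutations(doc, 3):   # ordered triples of distinct positions
        T[i, j, k] += 1
T /= T.sum()                               # decomposing T recovers the topics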
TENSOR-BASED TOPIC MODELING IS FASTER
[Charts: training time in minutes vs. number of topics, spectral (tensor) method vs. Mallet. NYTimes, 300,000 documents: 22x faster on average. PubMed, 8 million documents: 12x faster on average.]
• Mallet is an open-source framework for topic modeling.
• Benchmarked on the AWS SageMaker platform.
• Built into the AWS Comprehend NLP service.
TENSORLY: HIGH-LEVEL API FOR TENSOR ALGEBRA
• Python programming
• User-friendly API
• Multiple backends: flexible + scalable
• Example notebooks in repository
TENSORLY WITH PYTORCH BACKEND

import torch
from torch.autograd import Variable
import tensorly as tl
from tensorly import tucker_to_tensor
from tensorly.random import tucker_tensor

tl.set_backend('pytorch')                                 # set PyTorch backend

# Assumed setup, not on the slide: learning rate, iteration count,
# and a target tensor to approximate.
lr, n_iter = 1e-2, 1000
tensor = tl.tensor(torch.randn(5, 5, 5))

core, factors = tucker_tensor((5, 5, 5), rank=(3, 3, 3))  # random Tucker form
core = Variable(core, requires_grad=True)                 # attach gradients
factors = [Variable(f, requires_grad=True) for f in factors]
optimiser = torch.optim.Adam([core] + factors, lr=lr)     # set optimizer

for i in range(1, n_iter):
    optimiser.zero_grad()
    rec = tucker_to_tensor(core, factors)                 # reconstruct full tensor
    loss = (rec - tensor).pow(2).sum()                    # squared reconstruction error
    for f in factors:
        loss = loss + 0.01 * f.pow(2).sum()               # L2 penalty on factors
    loss.backward()
    optimiser.step()
TENSORS FOR COMPUTE: TENSOR CONTRACTION PRIMITIVE
Extends the notion of matrix product.
Matrix product: $Mv = \sum_j v_j M_{:,j}$
Tensor contraction: $T(u, v, \cdot) = \sum_{i,j} u_i v_j T_{i,j,:}$
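Both sums are a single einsum call; a minimal sketch with arbitrary shapes:

import numpy as np

M = np.random.randn(4, 5)
T = np.random.randn(4, 5, 6)
u, v = np.random.randn(4), np.random.randn(5)

Mv = np.einsum('ij,j->i', M, v)          # matrix product: sum_j v_j M_{:,j}
Tuv = np.einsum('ijk,i,j->k', T, u, v)   # contraction T(u, v, ·): sum_{i,j} u_i v_j T_{i,j,:}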