Matching fashion products with image similarity
Eddie Bell, eddie@lyst.com, @ejlbell
We collect the world of fashion into a customisable shopping experience
A Python data startup that happens to be in fashion
What makes us different? All data is scraped from retailers: 400 spiders (Scrapy), 9000 designers. Almost everything is automated: SEO, recommendation, classification, sales. This architecture comes with a few problems.
Duplicates
Why do we get duplicates? There is no ISBN for fashion. Duplicates are either inter-retailer (example: Burberry and Selfridges) or intra-retailer.
How we used to find duplicates: Lucene fuzzy string matching. It doesn’t really work: yoox.com has 3000 products called ‘dress’ and 7000 products called ‘shirt’.
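The failure mode on this slide can be reproduced in a few lines. The sketch below is not Lucene: it swaps in Python's standard-library `difflib.SequenceMatcher` as a stand-in fuzzy matcher, and the product ids, titles, and the 0.9 threshold are invented for illustration. It only shows that generic titles make unrelated products look identical.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two case-folded titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(rows, threshold=0.9):
    """Pair up products whose titles are at least `threshold` similar.

    `rows` is a list of (product_id, title) tuples. The threshold is an
    arbitrary illustrative value, not one taken from the talk.
    """
    pairs = []
    for i, (id_a, title_a) in enumerate(rows):
        for id_b, title_b in rows[i + 1:]:
            if title_similarity(title_a, title_b) >= threshold:
                pairs.append((id_a, id_b))
    return pairs


# Hypothetical catalogue rows; p1 and p2 are meant to be different garments
# that a retailer has given the same generic title.
products = [
    ("p1", "Dress"),
    ("p2", "Dress"),
    ("p3", "Shirt"),
]

if __name__ == "__main__":
    # p1 and p2 are flagged as duplicates purely because of the shared title.
    print(candidate_duplicates(products))
```

With thousands of products sharing one title, every pair among them would be flagged, which is why the title alone cannot identify a duplicate.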
How we detect duplicates now: BRISK image descriptors. Leutenegger, Chli and Siegwart. BRISK: Binary Robust Invariant Scalable Keypoints. ICCV 2011: 2548-2555.
How we detect duplicates now: BRISK octaves
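BRISK descriptors are binary strings of 512 bits (64 bytes), and binary descriptors of this kind are usually compared by Hamming distance. The sketch below shows that comparison on two made-up descriptors. It does not extract descriptors from an image, and the function name and sample values are illustrative, not taken from the talk.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two invented 64-byte descriptors that differ only in the first byte.
d1 = bytes(64)
d2 = bytes([0b00000111]) + bytes(63)

if __name__ == "__main__":
    print(hamming_distance(d1, d2))
```

A small distance means the two keypoints look alike; a distance near 256 (half the bits) is what unrelated descriptors tend to produce.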
From descriptors to image similarity: n x 64 descriptors -> k-means -> n x 1 cluster assignments -> bag of words -> 1 x k vector per image
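The pipeline on this slide can be sketched as follows, under several assumptions that the talk does not confirm: the k centroids have already been learned by k-means elsewhere, descriptors are treated as plain numeric vectors with squared Euclidean distance, and two images are compared by the cosine similarity of their histograms. The toy data uses 2-dimensional descriptors instead of 64 to stay readable.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Index of the centroid closest to `descriptor` (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def bag_of_words(descriptors, centroids):
    """Turn n descriptors into a length-k histogram of visual-word counts."""
    histogram = [0] * len(centroids)
    for descriptor in descriptors:
        histogram[nearest_centroid(descriptor, centroids)] += 1
    return histogram


def cosine_similarity(u, v):
    """Cosine of the angle between two histograms (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented toy data: k = 3 centroids and three images of four descriptors each.
centroids = [(0, 0), (10, 10), (0, 10)]
image_a = [(1, 0), (0, 1), (9, 10), (1, 9)]
image_b = [(0, 2), (1, 1), (10, 9), (0, 8)]
image_c = [(10, 10), (9, 9), (11, 10), (10, 8)]

if __name__ == "__main__":
    hist_a = bag_of_words(image_a, centroids)
    hist_b = bag_of_words(image_b, centroids)
    hist_c = bag_of_words(image_c, centroids)
    print(hist_a, hist_b, hist_c)
    print(cosine_similarity(hist_a, hist_b), cosine_similarity(hist_a, hist_c))
```

The useful property is that every image ends up as a fixed-length 1 x k vector regardless of how many keypoints it has, so images can be indexed and compared like documents.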
Architecture: started in Storm / Java (very painful), ended up in Celery (much nicer). Matching is done in Elasticsearch.
Architecture
Results
Bonus: colour variants
Bonus: matching sets
Bonus: model detection
Moderation
What’s next? Reverse image search; similar textual features; dual image / text vector embeddings.
Thank you