Log-Linear Relation Between Amount of Computations and Effectiveness of the Result – a Relation that Motivates the Need for Big Data

Francisco Zapata, Olga Kosheleva, and Vladik Kreinovich
University of Texas at El Paso, El Paso, TX 79968
fazg74@gmail.com, olgak@utep.edu, vladik@utep.edu

1. Log-Linear Relation: A Brief Description
• It is known that:
  – the more computations we perform,
  – the more efficient the decisions and designs resulting from these computations.
• Empirical data shows that there is a log-linear dependence between:
  – the effectiveness $e$ of an application and
  – the amount $d$ of computations that led to this application.
• Specifically, we have $e = a + b \cdot \ln(d)$ for some constants $a$ and $b$.

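A log-linear dependence of this kind could be estimated from observations by ordinary least squares on ln(d). The following is a minimal sketch only: the function name `fit_log_linear` is hypothetical, and the data points are synthetic, generated from assumed constants a = 2.0 and b = 0.5 rather than taken from any measurement.

```python
import math

def fit_log_linear(d_values, e_values):
    """Least-squares fit of e = a + b * ln(d); returns the pair (a, b)."""
    xs = [math.log(d) for d in d_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_e = sum(e_values) / n
    # Slope and intercept of the regression line of e on ln(d).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxe = sum((x - mean_x) * (e - mean_e) for x, e in zip(xs, e_values))
    b = sxe / sxx
    a = mean_e - b * mean_x
    return a, b

# Synthetic, noise-free illustration (NOT measured data):
# generated with assumed constants a = 2.0 and b = 0.5.
d_values = [10, 100, 1_000, 10_000]
e_values = [2.0 + 0.5 * math.log(d) for d in d_values]

a, b = fit_log_linear(d_values, e_values)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the synthetic points lie exactly on the curve, the fit recovers the assumed constants up to rounding; with real, noisy data the estimates would only approximate them.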
2. This Empirical Relation Explains Why We Need Big Data
• Reminder: the formula $e = a + b \cdot \ln(d)$ describes the relation between:
  – the effectiveness $e$ of an application and
  – the amount $d$ of computations that led to this application.
• This empirical relation can be reformulated as $d \sim \exp(\mathrm{const} \cdot e)$.
• This reformulation explains why we need big data:
  – every time we want to increase effectiveness by one unit,
  – we need to multiply the amount of processed data by a constant factor (e.g., to double it, for an appropriate choice of units).
• What we do: we provide an explanation for the empirical log-linear dependence.

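The reformulation can be spelled out step by step. The following is a short derivation sketch, assuming $b > 0$ (an assumption not stated explicitly on the slide) and writing $d(e)$ for the amount of computations needed to reach effectiveness $e$. It shows that the constant in the exponent is $1/b$ and that one extra unit of effectiveness multiplies $d$ by the fixed factor $\exp(1/b)$.

```latex
\begin{align*}
e &= a + b \cdot \ln(d) \\
\ln(d) &= \frac{e - a}{b} \\
d &= \exp\left(-\frac{a}{b}\right) \cdot \exp\left(\frac{e}{b}\right)
   \sim \exp(\mathrm{const} \cdot e), \qquad \mathrm{const} = \frac{1}{b} \\
\frac{d(e + 1)}{d(e)} &= \exp\left(\frac{1}{b}\right)
\end{align*}
```

In particular, this factor equals 2 exactly when $b = 1/\ln(2)$, i.e., when effectiveness is measured in units for which one unit corresponds to a doubling of $d$.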
3. How to Estimate Effectiveness
• The effectiveness $e$ of an application is proportional to the number $m$ of useful features that this design has.
• For example, let us look at a headache medicine.
• Its first, and most important, feature is that it should cure headaches.
• If it also avoids negative effects on the stomach, this is better.
• If it also clears your sinuses, even better, etc.
• Let us denote the average probability that a randomly selected substance (or design) has a feature by $p$.
• The features are usually independent.
• So, the probability that a randomly selected design has $m$ features is $p^m$.

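The product rule for independent features can be checked numerically. The sketch below is a Monte Carlo illustration under stated assumptions: each of the m features is present independently with the same probability p, the values p = 0.5 and m = 3 are chosen arbitrarily, and the function name is hypothetical.

```python
import random

def estimate_all_features_probability(p, m, trials, seed=0):
    """Estimate the probability that a random design has all m features,
    assuming each feature is present independently with probability p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A design counts as a hit only if every one of its m features is present.
        if all(rng.random() < p for _ in range(m)):
            hits += 1
    return hits / trials

p, m = 0.5, 3  # illustrative values only
estimate = estimate_all_features_probability(p, m, trials=200_000)
exact = p ** m
print(f"simulated: {estimate:.4f}, p^m: {exact:.4f}")
```

The simulated frequency should be close to p^m; the agreement relies on the independence assumption, and correlated features would give a different value.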
4. Towards an Explanation
• The probability that a randomly selected design has $m$ features is $p^m$.
• According to statistics:
  – if a rare event has probability $q$,
  – then we need, on average, a sample of size $\approx \frac{1}{q}$ to observe at least one such event.
• So, to find a design with $m$ features, we need to test $\frac{1}{p^m}$ different designs.
• The resulting amount of computations $d$ is proportional to the number of tested designs, i.e., to $\frac{1}{p^m}$:
  $$d = c' \cdot \left(\frac{1}{p}\right)^m.$$

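The 1/q estimate can be illustrated by simulating the search for a design with all m features. This is a rough sketch, assuming designs are tested one at a time and each test succeeds independently with probability q = p^m; the function names and the values p = 0.5, m = 4 are illustrative choices, not part of the original argument.

```python
import random

def trials_until_first_success(q, rng):
    """Count how many independent tests are needed until the first success."""
    count = 1
    while rng.random() >= q:
        count += 1
    return count

def average_trials(q, repetitions, seed=0):
    """Average the number of tests needed over many repeated searches."""
    rng = random.Random(seed)
    total = sum(trials_until_first_success(q, rng) for _ in range(repetitions))
    return total / repetitions

p, m = 0.5, 4          # illustrative values only
q = p ** m             # probability that a random design has all m features
avg = average_trials(q, repetitions=20_000)
expected = 1 / q       # the 1/q estimate from the text
print(f"average tests needed: {avg:.2f}, 1/q: {expected:.2f}")
```

Under these assumptions the number of tests follows a geometric distribution, whose mean is 1/q, so the simulated average should land near (1/p)^m.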
5. Resulting Explanation
• The amount of computations is
  $$d = c' \cdot \left(\frac{1}{p}\right)^m.$$
• By taking logarithms of both sides, we get
  $$\ln(d) = \ln(c') + m \cdot \ln\left(\frac{1}{p}\right).$$
• So $m = A + B \cdot \ln(d)$, where
  $$B = \frac{1}{\ln(1/p)} \quad \text{and} \quad A = -\frac{\ln(c')}{\ln(1/p)}.$$
• On the other hand, the effectiveness $e$ of a design is proportional to $m$: $e = c'' \cdot m$ for some constant $c''$.
• Hence $e = a + b \cdot \ln(d)$, where $a = c'' \cdot A$ and $b = c'' \cdot B$.

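The algebra above can be verified numerically. In the sketch below, all numeric values are arbitrary illustrations, the function name is hypothetical, and the proportionality constant between e and m is called `c_eff` to keep it distinct from the constant c′ in the formula for d.

```python
import math

def log_linear_coefficients(p, c_prime, c_eff):
    """Return (a, b) such that e = a + b * ln(d), given
    d = c_prime * (1/p)**m and e = c_eff * m."""
    log_inv_p = math.log(1 / p)
    B = 1 / log_inv_p                    # coefficient of ln(d) in m = A + B * ln(d)
    A = -math.log(c_prime) / log_inv_p   # constant term in m = A + B * ln(d)
    return c_eff * A, c_eff * B

# Arbitrary illustrative constants (assumptions, not values from the source).
p, c_prime, c_eff = 0.5, 3.0, 2.0
a, b = log_linear_coefficients(p, c_prime, c_eff)

m = 5
d = c_prime * (1 / p) ** m           # amount of computations for m features
e_direct = c_eff * m                 # effectiveness computed directly from m
e_from_formula = a + b * math.log(d) # effectiveness from the log-linear relation
print(e_direct, e_from_formula)
```

Both routes give the same effectiveness up to floating-point rounding, which confirms that the expressions for A and B are consistent with the formula for d.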
6. Conclusions
• Thus, we indeed get a log-linear dependence between:
  – the effectiveness $e$ of an application and
  – the amount $d$ of computations that led to this application.

7. References
• M. Banko and E. Brill, “Scaling to very very large corpora for natural language disambiguation”, Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 2001, pp. 26–33.
• T. Brants et al., “Large language models in machine translation”, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007, pp. 858–867.
• J. Lin, “Is big data a transient problem?”, IEEE Internet Computing, September/October 2015, pp. 86–90.