Boosting Neural Networks

Holger Schwenk
LIMSI-CNRS, bât. 508, BP 133, 91403 Orsay cedex, France

Yoshua Bengio
DIRO, Université de Montréal, Succ. Centre-Ville, CP 6128, Montréal, Qc, H3C 3J7, Canada

To appear in Neural Computation

Abstract

"Boosting" is a general method for improving the performance of learning algorithms. A recently proposed boosting algorithm is AdaBoost. It has been applied with great success to several benchmark machine learning problems, using mainly decision trees as base classifiers. In this paper we investigate whether AdaBoost also works as well with neural networks, and we discuss the advantages and drawbacks of different versions of the AdaBoost algorithm. In particular, we compare training methods based on sampling the training set and on weighting the cost function. The results suggest that random resampling of the training data is not the main explanation of the success of the improvements brought by AdaBoost. This is in contrast to Bagging, which directly aims at reducing variance and for which random resampling is essential to obtain the reduction in generalization error. Our system achieves about 1.4% error on a data set of online handwritten digits from more than 200 writers. A boosted multi-layer network achieved 1.5% error on the UCI Letters and 8.1% error on the UCI satellite data set, which is significantly better than boosted decision trees.

Keywords: AdaBoost, boosting, Bagging, ensemble learning, multi-layer neural networks, generalization
1 Introduction

"Boosting" is a general method for improving the performance of a learning algorithm. It is a method for finding a highly accurate classifier on the training set by combining "weak hypotheses" (Schapire, 1990), each of which needs only to be moderately accurate on the training set. An earlier overview of different ways to combine neural networks can be found in (Perrone, 1993). A recently proposed boosting algorithm is AdaBoost (Freund, 1995), which stands for "Adaptive Boosting". During the last two years, many empirical studies have been published that use decision trees as base classifiers for AdaBoost (Breiman, 1998; Drucker and Cortes, 1996; Freund and Schapire, 1996a; Quinlan, 1996; Maclin and Opitz, 1997; Bauer and Kohavi, 1999; Dietterich, 1998b; Grove and Schuurmans, 1998). All these experiments have shown impressive improvements in generalization behavior and suggest that AdaBoost tends to be robust to overfitting. In fact, in many experiments it has been observed that the generalization error continues to decrease towards an apparent asymptote after the training error has reached zero. Schapire et al. (1998) suggest a possible explanation for this unusual behavior based on the definition of the margin of classification. Other attempts to understand boosting theoretically can be found in (Schapire et al., 1998; Breiman, 1997a; Breiman, 1998; Friedman et al., 1998; Schapire, 1999). AdaBoost has also been linked with game theory (Freund and Schapire, 1996b; Breiman, 1997b; Grove and Schuurmans, 1998; Freund and Schapire, 1999) in order to understand its behavior and to propose alternative algorithms. Mason and Baxter (1999) propose a new variant of boosting based on the direct optimization of margins. Additionally, there is recent evidence that AdaBoost may very well overfit if several hundred thousand classifiers are combined (Grove and Schuurmans, 1998). It also seems that the performance of AdaBoost degrades considerably in the presence of significant amounts of noise (Dietterich, 1998b; Rätsch et al., 1998).

Although much useful work has been done, both theoretically and experimentally, there is still a lot that is not well understood about the impressive generalization behavior of AdaBoost. To the best of our knowledge, applications of AdaBoost have so far all been to decision trees, and no applications to multi-layer artificial neural networks have been reported in the literature. This paper extends and provides a deeper experimental analysis of our first experiments with the application of AdaBoost to neural networks (Schwenk and Bengio, 1997; Schwenk and Bengio, 1998).

In this paper we consider the following questions. Does AdaBoost work as well for neural networks as for decision trees? Short answer: yes, sometimes even better. Does it behave in a similar way to what was observed previously in the literature? Short answer: yes. Furthermore, are there particulars in the way neural networks are trained with gradient back-propagation that should be taken into account when choosing a particular version of AdaBoost? Short answer: yes, because it is possible to directly weight the cost function of neural networks. Is overfitting of the individual neural networks a concern? Short answer: not as much as when boosting is not used. Is the random resampling used in previous implementations of AdaBoost critical, or can we get similar performance by weighting the training criterion (which can easily be done with neural networks)? Short answer: it is not critical for generalization, but it helps.
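To make the last two questions concrete, the sketch below shows a generic AdaBoost.M1 loop around an arbitrary base learner, with the two training schemes discussed in this paper selectable by a flag: drawing a bootstrap sample according to the current example weights (resampling), or handing the weights to the base learner so that each example's contribution to the training cost is scaled directly (weighting the cost function). This is a minimal illustration rather than the exact procedure used in our experiments; the `train_base` callback, its weight argument, and the `.predict` interface are hypothetical placeholders for whatever neural-network training routine is available.

```python
import numpy as np

def adaboost_m1(X, y, train_base, n_rounds=10, use_resampling=True, rng=None):
    """Generic AdaBoost.M1 loop (sketch, not the exact experimental setup).

    X, y           -- numpy arrays of training inputs and class labels
    train_base     -- hypothetical callback: train_base(X, y, w) returns an
                      object with a .predict(X) method; w are example weights
    use_resampling -- True: resample the training set according to the weights
                      False: pass the weights so the base learner can scale
                             each example's cost directly
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    D = np.full(n, 1.0 / n)                   # distribution over training examples
    hypotheses, alphas = [], []

    for _ in range(n_rounds):
        if use_resampling:
            idx = rng.choice(n, size=n, p=D)  # bootstrap sample drawn from D
            h = train_base(X[idx], y[idx], np.full(n, 1.0 / n))
        else:
            h = train_base(X, y, D)           # weighted training criterion

        miss = (h.predict(X) != y)
        eps = float(np.dot(D, miss))          # weighted training error of h_t
        if eps == 0.0 or eps >= 0.5:          # stop if perfect or too weak
            break
        beta = eps / (1.0 - eps)
        hypotheses.append(h)
        alphas.append(np.log(1.0 / beta))     # voting weight of h_t

        D = D * np.where(miss, 1.0, beta)     # shrink weights of correct examples
        D /= D.sum()                          # renormalize to a distribution

    classes = np.unique(y)

    def predict(X_new):
        """Combine the hypotheses by a weighted vote over the classes."""
        votes = np.zeros((len(X_new), len(classes)))
        for h, a in zip(hypotheses, alphas):
            p = h.predict(X_new)
            for j, c in enumerate(classes):
                votes[:, j] += a * (p == c)
        return classes[votes.argmax(axis=1)]

    return predict
```

Either branch yields the same form of final weighted vote; the comparison studied in this paper is whether the stochastic resampling step itself matters once the same reweighting is applied exactly through the training cost.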