Ioannis Caragiannis, University of Patras. Joint work with George Krimpas and Alexandros Voudouris.
massive: available to a large number of people (16-18 million students); online: through the internet/web; open: no cost for the students; courses: series of lectures on a subject.
www.edx.org, www.coursera.org, www.udacity.com: more than 100 employees each. Business model: verified certificates, head-hunting (connecting students to industry), specializations, corporate collaborations.
400+ universities; 2400+ courses; 22 of the top-25 US universities; 3000+ instructors, TAs, video assistants; 13 languages (80% English, 8.5% Spanish, French, Chinese). Subjects: humanities, computer science, business & management.
Daphne Koller, Andrew Ng (Coursera founders): “… courses in the humanities and social sciences - in which the material is more open to interpretation - have proven more complicated to translate into an online format, especially when it came to the assessment and grading of the students.”
What? Assessment should result in quantitative information: successfully completed her class, achieved a 9/10 (A+), ranked in the top 1% of her class of 100,000, etc. Why? This information goes in the verified certificate and is important for employers (new revenue source). Who? Experts (graders, TAs) are costly. A common solution: automatic grading (multiple-choice questions).
Automatic grading is highly unsatisfactory when evaluating the students' ability to prove a mathematical statement, express their critical thinking on an issue, or demonstrate their creative writing skills. In these cases, assessment and grading is a human computation task. Alternative solution: peer grading, i.e., outsource the grading task to the students.
How does it work? Each student grades some of the other students' assignments (as part of her own assignment). Allowing the students to grade using cardinal scores is risky: they are not experienced in assessing their peers' performance in absolute terms, and they have strong incentives to assign low scores. Solution: ordinal peer grading.
Cardinal peer grading: Piech, Huang, Chen, Do, Ng, & Koller (2013); Kulkarni, Wei, Le, Chia, Papadopoulos, Cheng, Koller, & Klemmer (2013); Walsh (2014); de Alfaro & Shavlovsky (2014), www.crowdgrader.org.
Ordinal peer grading: Raman & Joachims (2014); Shah, Bradley, Parekh, Wainwright, & Ramachandran (2014).
n students (exam papers).
Distributing the exam papers: each student gets k << n exam papers to grade, so that each exam paper is given to k students.
Grading: each student ranks the exam papers assigned to her.
Rank aggregation: compute a global ranking from the partial rankings.
Goal: to come up with a global ranking that is "as correct as possible".
Similarities: on input a profile of rankings, compute a final full ranking. Differences: each student is simultaneously an alternative and a voter; voters do not have to rank all alternatives; the alternatives to be ranked are decided externally.
(n, k)-bundle graph: a k-regular bipartite graph G = (U, V, E) with |U| = |V| = n. U: exam papers (randomly assigned to nodes). V: graders. An edge (u, v) with u in U and v in V indicates that exam paper u will be given to student v. Warning! Nodes corresponding to a grader and her own exam paper should not be connected.
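A minimal sketch of one way to build such a bundle graph, assuming a simple cyclic construction that is not taken from the talk: students are placed in a random circular order and each one grades the k papers that follow her. The function name `cyclic_bundle_graph` and its parameters are hypothetical. The result is k-regular and has no grader connected to her own paper, but for k >= 3 it contains 4-cycles.

```python
import random

def cyclic_bundle_graph(n, k, seed=None):
    """Return {grader: [papers to grade]} for n students and bundles of size k.

    Assumption (illustrative only): students sit in a random circular order
    and each grades the k papers of the students that follow her.
    Student i is identified with exam paper i.
    """
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # order[p] is the student sitting at position p
    bundles = {}
    for p, grader in enumerate(order):
        # offsets 1..k never wrap back to position p because k < n,
        # so no grader receives her own paper
        bundles[grader] = [order[(p + j) % n] for j in range(1, k + 1)]
    return bundles
```

For example, `cyclic_bundle_graph(7, 3, seed=1)` gives every grader 3 papers and every paper 3 graders.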
The students participate in the exam and submit their papers. Scenario I: the instructor announces indicative solutions and grading instructions, and the students use this information when grading. Scenario II: no information is given by the instructor, and the students' grading performance is similar to their performance in the exam.
Basic assumption: there is a ground truth ranking of the exam papers. Perfect grading: each grader ranks the k exam papers she gets consistently with the ground truth.
Quality measure: the number of pairs of exam papers that compare in the global ranking as in the ground truth, i.e., the total number of pairs minus the Kendall tau distance. (Bad) example: a random permutation correctly recovers 50% of the pairwise relations on average.
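The quality measure can be sketched as follows, normalized to a fraction of pairs. The function name `fraction_correct_pairs` is hypothetical, and both rankings are assumed to be lists of the same papers ordered from best to worst.

```python
from itertools import combinations

def fraction_correct_pairs(ranking, ground_truth):
    """Fraction of pairs ordered in `ranking` as in `ground_truth`.

    Equals 1 - (Kendall tau distance) / (number of pairs).
    """
    pos = {paper: i for i, paper in enumerate(ranking)}
    # combinations() yields (a, b) with a ahead of b in the ground truth
    pairs = list(combinations(ground_truth, 2))
    agree = sum(pos[a] < pos[b] for a, b in pairs)
    return agree / len(pairs)
```

An identical ranking scores 1.0, the reversed ranking scores 0.0, and swapping two adjacent papers among four leaves 5 of the 6 pairs correct.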
Find the minimum-degree (n, k)-bundle graph that guarantees that the whole ground truth is always recovered if perfect grading is used. That is, find a minimum-degree diameter-3 bipartite graph: k = Θ(n^{1/2}), Miller and Siran (2013).
[Figure: bundle graph with graders 1-7 on one side and exam papers 1-7 on the other]
Use much simpler bundle graphs, e.g., any k-regular bipartite graph for small values of k, even one obtained by putting together copies of K_{k,k}, or a k-regular bipartite graph not containing a 4-cycle. Aggregation rules: plurality, approval, Borda, random serial dictatorship, Markov-chain-based aggregation rules.
Each grader gives k-i+1 points to the exam paper she ranks i-th (k points to her top paper, 1 point to her last). The global ranking is obtained by sorting the exam papers in non-increasing order of total points (Borda score). Ties are broken randomly.
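A minimal sketch of this rule, assuming the usual Borda convention in which the top-ranked paper in a bundle of k receives k points and the last one receives 1. The function name `borda_aggregate` is hypothetical; each partial ranking is a list of papers from best to worst.

```python
import random
from collections import defaultdict

def borda_aggregate(partial_rankings, seed=None):
    """Return (global ranking, Borda scores) from the graders' partial rankings."""
    rng = random.Random(seed)
    scores = defaultdict(int)
    for ranking in partial_rankings:
        k = len(ranking)
        for i, paper in enumerate(ranking, start=1):
            scores[paper] += k - i + 1  # k points for rank 1, 1 point for rank k
    papers = list(scores)
    rng.shuffle(papers)  # random tie-breaking: the stable sort keeps this order
    papers.sort(key=lambda p: scores[p], reverse=True)
    return papers, dict(scores)
```

With the three partial rankings `["a","b","c"]`, `["a","c","b"]`, `["a","b","c"]`, the scores are a: 9, b: 5, c: 4, so the global ranking is a, b, c.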
Theorem: When Borda is applied on partial rankings that are consistent with the ground truth, the expected fraction of correctly recovered pairwise relations is at least 1-O(1/k) when the bundle graph is 4-cycle-free and at least 1-O(1/k^{1/2}) in general.
Students have qualities in [1/2, 1]: a student's quality is her ability to compare two exam papers correctly (the probability of finding the correct outcome). The qualities define the ground truth ranking σ*. Grading follows a Mallows noise model for generating random rankings: each grader of quality p ranks each pair among the k exam papers she gets as in σ* with probability p and incorrectly with probability 1-p; if the pairwise outcomes do not define a ranking, she repeats. Caragiannis, Procaccia, & Shah (2013).
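The noisy grading step can be sketched by rejection sampling: draw every pairwise outcome independently and start over whenever the outcomes are not transitive. The function name `noisy_ranking` and the representation of σ* as a position dictionary `true_pos` are assumptions made for illustration.

```python
import random
from itertools import combinations

def noisy_ranking(bundle, true_pos, p, rng):
    """Sample the ranking a grader of quality p reports for her bundle.

    bundle:   list of exam papers given to the grader
    true_pos: dict paper -> position in the ground truth (0 is best)
    p:        probability of comparing a pair correctly, in [1/2, 1]
    rng:      a random.Random instance
    """
    k = len(bundle)
    while True:
        wins = {paper: 0 for paper in bundle}
        for a, b in combinations(bundle, 2):
            better, worse = (a, b) if true_pos[a] < true_pos[b] else (b, a)
            if rng.random() < p:
                wins[better] += 1  # pair compared as in the ground truth
            else:
                wins[worse] += 1   # pair compared incorrectly
        # the outcomes define a ranking exactly when the win counts are 0..k-1
        if sorted(wins.values()) == list(range(k)):
            return sorted(bundle, key=lambda x: wins[x], reverse=True)
```

With p = 1 the grader always reproduces the ground truth order of her bundle; for p close to 1/2 and larger k, many repetitions may be needed before a valid ranking appears.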
Comparison of Borda and RSD in 500 executions ( n = 1000, k = 8)
Theory: Is a 1-O(1/k^2) fraction (or better) possible? Upper bounds? Analysis for noisy grading? Impact of incentives?
Practice: Which is the most realistic noise model for grading? How do the methods considered perform in practice (with real students)?