
CSE 416, Section 1: Semester Project Approach



  1. Session 3 – Project Approach
     CSE 416, Section 1: Semester Project Approach
     © Robert Kelly, 2020 (11/9/2020)

     Session Objectives
     - Understand the analysis approach taken by MGGG in their analysis of the effects of racial gerrymandering in the Virginia House of Delegates
     - Understand your high-level approach to the project
     - Begin to think about design choices
     - Begin to understand data requirements to support analysis

  2. Reading
     - Comparison of Districting Plans for the Virginia House of Delegates, Metric Geometry and Gerrymandering Group (MGGG), https://mggg.org/VA-report.pdf
     - Wikipedia, https://en.wikipedia.org/wiki/Graph_partition (basic background)
     Note: There are many approaches to graph partitioning, but we are not concerned with minimizing edges and generating equality of nodes, so we use a multi-level method.

     Project Background
     - Project based on the MGGG analysis for the court
     - Virginia House of Delegates
     - Eleven state districts were ruled unconstitutional
     - Analysis examined the original (unconstitutional) plan and possible replacement plans (e.g., the Republican-suggested plan)
     - Analysis method “highlights and quantifies the dilutive effects of packing Black Voting Age Population (BVAP)” (>55% in 11 districts)
     - Analysis looked at the 11 districts and their immediate neighbors (33 in total)
     Note: The study remarks that 37% BVAP is the empirical line for African-American representation; >55% is considered excessive.
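The two BVAP thresholds quoted above (37% and 55%) can be applied to a district with a few lines of Python. This is only an illustrative sketch: the label strings and the sample counts are made up, and the slides do not say how the report treats districts that fall between the two lines.

```python
# Thresholds quoted on the slide; the label names below are hypothetical.
EMPIRICAL_LINE = 0.37  # empirical line for African-American representation
PACKING_LINE = 0.55    # above this, BVAP is considered excessive (packed)


def bvap_share(bvap, vap):
    """Return BVAP as a fraction of the total voting age population."""
    if vap <= 0:
        raise ValueError("voting age population must be positive")
    return bvap / vap


def classify_district(bvap, vap):
    """Label a district by where its BVAP share falls relative to the two lines."""
    share = bvap_share(bvap, vap)
    if share > PACKING_LINE:
        return "packed"
    if share >= EMPIRICAL_LINE:
        return "at or above empirical line"
    return "below empirical line"


# Made-up (BVAP, VAP) counts for three imaginary districts.
districts = {"A": (6000, 10000), "B": (4000, 10000), "C": (2000, 10000)}
labels = {name: classify_district(b, v) for name, (b, v) in districts.items()}
```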

  3. Analysis Results
     Typical box and whisker plot
     - Districts sorted by BVAP (lowest to highest)
     - Each district shows the range of BVAP values in the ensemble

     Box And Whisker Plot
     [Figure: annotated box and whisker plot; labels: 99th percentile, target BVAP range, evaluated plan result, median]
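The data behind such a plot can be sketched as follows: sort each plan's districts by BVAP, then summarize each rank position across the ensemble. This is a simplified illustration, not the MGGG code. It uses min and max where the slide shows percentile whiskers, and the function name and sample numbers are invented.

```python
from statistics import median


def rank_ordered_summary(ensemble):
    """Summarize BVAP shares per rank position across an ensemble of plans.

    ensemble: list of plans, each a list of per-district BVAP shares.
    Returns one dict per rank (lowest-BVAP district first).
    """
    sorted_plans = [sorted(plan) for plan in ensemble]
    n = len(sorted_plans[0])
    if any(len(plan) != n for plan in sorted_plans):
        raise ValueError("every plan must have the same number of districts")
    summary = []
    for rank in range(n):
        values = sorted(plan[rank] for plan in sorted_plans)
        summary.append(
            {"min": values[0], "median": median(values), "max": values[-1]}
        )
    return summary


# Made-up ensemble of three plans with three districts each.
ensemble = [[0.50, 0.20, 0.35], [0.30, 0.45, 0.25], [0.40, 0.15, 0.60]]
summary = rank_ordered_summary(ensemble)

# Compare a (made-up) evaluated plan against the ensemble range at each rank.
evaluated = sorted([0.70, 0.10, 0.35])
outside = [
    rank
    for rank, value in enumerate(evaluated)
    if not summary[rank]["min"] <= value <= summary[rank]["max"]
]
```

Here `outside` lists the rank positions where the evaluated plan falls outside everything seen in the ensemble.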

  4. How Do You Generate a “Random” Districting Plan?
     - Recall that a districting plan is a partition of the k-node precinct graph into n subgraphs, each of which 1) is connected and 2) adheres to state districting requirements (e.g., equal population, compact, fewer counties, etc.)
     - 2-stage process in which initially each node is considered a cluster
       1. Recursively combine neighboring clusters until the overall graph reaches n clusters
       2. Recursively balance pairs of clusters until eventually each cluster achieves the compactness goal and the population distribution goal

     How Do We Form n Clusters?
     - In each iteration
       - For each cluster, select a random neighboring cluster, and combine the two clusters into a new cluster
       - Update neighbors for each newly formed cluster
     - Terminate when the number of clusters equals n
     Note: There will likely be optional use cases in which you can try out variations on this algorithm.
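The cluster-forming stage described above might be sketched like this. It is one possible reading of the slide, not a reference implementation. The function name `form_clusters`, the dictionary-based graph representation, and the 3x3 grid test graph are assumptions made for illustration.

```python
import random


def form_clusters(adjacency, n, seed=None):
    """Merge random neighboring clusters until only n clusters remain.

    adjacency: dict mapping each node to the set of its neighboring nodes.
    Returns a dict mapping cluster id -> set of member nodes.
    Node ids must be sortable so that runs are reproducible for a given seed.
    """
    rng = random.Random(seed)
    members = {node: {node} for node in adjacency}  # each node starts as a cluster
    neighbors = {node: set(adj) for node, adj in adjacency.items()}
    while len(members) > n:
        merged_any = False
        for cid in list(members):
            if len(members) <= n:
                break
            if cid not in members or not neighbors[cid]:
                continue  # already absorbed, or isolated
            other = rng.choice(sorted(neighbors[cid]))
            # Combine the two clusters into one (kept under the id `cid`).
            members[cid] |= members.pop(other)
            other_neighbors = neighbors.pop(other)
            neighbors[cid] = (neighbors[cid] | other_neighbors) - {cid, other}
            # Update the neighbor sets of clusters that bordered `other`.
            for nb in other_neighbors:
                if nb != cid:
                    neighbors[nb].discard(other)
                    neighbors[nb].add(cid)
            merged_any = True
        if not merged_any:
            raise ValueError("graph is too disconnected to reach n clusters")
    return members


# Hypothetical 3x3 grid of precincts, adjacent horizontally and vertically.
grid = {(r, c): set() for r in range(3) for c in range(3)}
for (r, c) in grid:
    for dr, dc in ((0, 1), (1, 0)):
        nb = (r + dr, c + dc)
        if nb in grid:
            grid[(r, c)].add(nb)
            grid[nb].add((r, c))

clusters = form_clusters(grid, n=3, seed=0)
```

Because only adjacent clusters are ever merged, each resulting cluster is connected, but nothing here balances population; that is left to the second stage.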

  5. How Do We Balance the Clusters?
     - For each cluster
       - Select a neighboring cluster at random to rebalance
       - Generate a spanning tree of the combined cluster (you can try different approaches)
       - Form the set of edges to cut that will “improve” some combination of 1) compactness and 2) population equality
     Definition (Wikipedia): A tree is a connected undirected graph with no cycles. It is a spanning tree of a graph G if it spans G (that is, it includes every vertex of G) and is a subgraph of G (every edge in the tree belongs to G). A spanning tree of a connected graph G can also be defined as a maximal set of edges of G that contains no cycle, or as a minimal set of edges that connect all vertices.

     When Do We Terminate the Algorithm?
     - Terminate when the redistricting plan has
       1. A population difference between the most populous cluster (district) and the least populous cluster (district) in the state that is less than a user-provided threshold
       2. A compactness measure for each district that is less than a user-provided threshold
     Questions: Will this approach provide “random” districting plans? How do you determine if the plans appear random?
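A minimal sketch of the rebalancing step and the termination test follows. It assumes a randomized depth-first spanning tree (one of the "different approaches" the slide allows, and not a uniformly random tree) and considers population equality only, ignoring compactness when choosing the cut. All function names and the six-precinct example are hypothetical.

```python
import random


def random_spanning_tree(nodes, adjacency, rng):
    """Build a randomized depth-first spanning tree over `nodes`.

    Returns (parent, order): parent maps child -> parent (root -> None),
    and order lists nodes so that every parent precedes its children.
    """
    nodes = set(nodes)
    root = rng.choice(sorted(nodes))
    parent, order, stack = {root: None}, [root], [root]
    while stack:
        node = stack.pop()
        fresh = [nb for nb in sorted(adjacency[node])
                 if nb in nodes and nb not in parent]
        rng.shuffle(fresh)
        for nb in fresh:
            parent[nb] = node
            order.append(nb)
            stack.append(nb)
    return parent, order


def best_population_cut(nodes, adjacency, population, rng):
    """Cut the single tree edge that best equalizes the two sides' populations."""
    parent, order = random_spanning_tree(nodes, adjacency, rng)
    if len(order) != len(set(nodes)) or len(order) < 2:
        raise ValueError("combined cluster must be connected with 2+ nodes")
    subtree = {node: population[node] for node in order}
    for node in reversed(order):  # children are summed before their parents
        if parent[node] is not None:
            subtree[parent[node]] += subtree[node]
    total = subtree[order[0]]
    # Cutting the edge above `node` splits off that node's subtree.
    cut = min(order[1:], key=lambda node: abs(total - 2 * subtree[node]))
    side = {cut}
    for node in order:
        if parent[node] in side:
            side.add(node)
    return side, set(nodes) - side


def should_terminate(district_pops, compactness, pop_threshold, compact_threshold):
    """Apply the two stopping conditions listed on the slide."""
    pop_ok = max(district_pops) - min(district_pops) < pop_threshold
    compact_ok = all(c < compact_threshold for c in compactness)
    return pop_ok and compact_ok


# Made-up combined cluster: six precincts in a line, 100 people each.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
pops = {i: 100 for i in range(6)}
side_a, side_b = best_population_cut(range(6), path, pops, random.Random(1))
done = should_terminate([300, 300], [0.2, 0.3], pop_threshold=50, compact_threshold=0.5)
```

On this line graph the only spanning tree is the line itself, so the best cut always separates precincts 0-2 from precincts 3-5.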

  6. Begin to Think about Implications
     - How do we represent a node (i.e., precinct)?
     - How do we represent a cluster (i.e., district)?
     - How do we calculate neighbors?
     - How do we measure compactness?
     - How do we display a “random” district plan to the user?
     - How do we verify that the results are random?

     Multiple District Plans
     - You will generate multiple district plans (usually referred to as districtings)
     - Run multiple Python processes, each of which will generate a plan
     - Run a small number of processes on your laptop/desktop, but a larger number on the SeaWulf
     - Each process might run multiple algorithms in sequence until the desired number of plans is generated
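As a starting point for the representation and compactness questions above, here is one possible sketch. The class and field names are hypothetical, and Polsby-Popper is used only as an example of a commonly used compactness score; the slides do not name a specific measure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Precinct:
    """A graph node. Field names are illustrative, not prescribed by the course."""
    precinct_id: str
    population: int
    neighbors: set = field(default_factory=set)  # ids of adjacent precincts


@dataclass
class District:
    """A cluster of precincts, stored as a set of precinct ids."""
    district_id: int
    precinct_ids: set = field(default_factory=set)


def district_population(district, precincts):
    """Sum the population of a district's precincts (precincts: id -> Precinct)."""
    return sum(precincts[pid].population for pid in district.precinct_ids)


def polsby_popper(area, perimeter):
    """Polsby-Popper score 4*pi*A / P^2: 1.0 for a circle, near 0 for contorted shapes."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2


# Made-up example: two adjacent precincts forming one district.
precincts = {
    "p1": Precinct("p1", 1200, {"p2"}),
    "p2": Precinct("p2", 800, {"p1"}),
}
district = District(1, {"p1", "p2"})
total = district_population(district, precincts)
unit_square_score = polsby_popper(area=1.0, perimeter=4.0)
```

Note that a higher Polsby-Popper score means a more compact shape, so a threshold test on it would need to be oriented accordingly.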

  7. Graph Node Geography
     - Project building block unit is the geography of a precinct
     - Boundary data is generally available, but may not be totally accurate
     - Combine/split of a cluster requires calculating the new cluster boundaries
     - BVAP (and other minority) population data is generally not available for a precinct (you will need to map census data to it)
     Note: The MGGG paper uses census blocks as the building block.

     Reports
     - Your project will generate summary data for a variety of runs
     - For example
       - Independence of results from seed
       - Change in population balance vs. iteration
       - Ideal Markov chain length
       - Single seed districting plan vs. one for each random district
       - Comparison with Gerrymandering measure results
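Mapping census data onto precincts could look roughly like the sketch below. It assumes a block-to-precinct assignment table already exists and that every block lies wholly inside one precinct; real data may instead require a spatial join against the precinct boundaries. The function name, field names, and sample numbers are all made up.

```python
from collections import defaultdict


def aggregate_blocks(block_to_precinct, block_data):
    """Sum census-block counts up to precinct level.

    block_to_precinct: dict block id -> precinct id (assumed to be given).
    block_data: dict block id -> {"vap": int, "bvap": int}.
    Returns (totals, unassigned), where unassigned lists blocks with no precinct.
    """
    totals = defaultdict(lambda: {"vap": 0, "bvap": 0})
    unassigned = []
    for block_id, counts in block_data.items():
        precinct_id = block_to_precinct.get(block_id)
        if precinct_id is None:
            unassigned.append(block_id)  # report rather than silently drop
            continue
        totals[precinct_id]["vap"] += counts["vap"]
        totals[precinct_id]["bvap"] += counts["bvap"]
    return dict(totals), unassigned


# Made-up blocks and assignments.
block_to_precinct = {"b1": "P1", "b2": "P1", "b3": "P2"}
block_data = {
    "b1": {"vap": 100, "bvap": 40},
    "b2": {"vap": 50, "bvap": 10},
    "b3": {"vap": 200, "bvap": 20},
    "b4": {"vap": 30, "bvap": 5},
}
precinct_totals, unassigned = aggregate_blocks(block_to_precinct, block_data)
```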

  8. Differences Between Project and MGGG Report
     This comparison is meant to help you when reading the MGGG paper.

     | MGGG                              | CSE416                                   |
     |-----------------------------------|------------------------------------------|
     | State districting                 | Congressional districting                |
     | Virginia                          | Multiple states                          |
     | Analysis of 33 VA house districts | Complete state analysis                  |
     | Markov chain                      | No concern for Markov chain validity     |
     | 100 seed plans                    | Phase 1 graph partition approach         |
     | Census block building blocks      | Precinct building blocks                 |
     | Seed plan population balanced     | Seed plans unbalanced by population      |
     | Spanning tree cuts balance        | Spanning tree cuts reduce pop. disparity |
     | Phase 2: Flip, ReCom, and Mix     | Phase 2: Modified ReCom only             |
     | Specific compactness measure      | Multiple compactness measures            |

     Things to Think About
     - What data is needed to run the algorithm?
     - What data is needed to display the results in the Web client?
     - How and when do you transmit data from the client to the server?
     - Can you store partial results when building a districting ensemble?
     - How are results passed from the SeaWulf to your server?
     - What does the GUI look like?
     - How does the user request a run of multiple districting plans?
     - How do you display summary data from a run?
     - What debugging features should be built into the GUI?
     - How do you keep track of design options/decisions/questions?

  9. Top-Level System Architecture
     [Figure: architecture diagram, © Robert Kelly, 2017-2020; component labels: Server Logic, SeaWulf, GUI, Data Population, Resource DB, Project DB, Data sources; language labels: Java, Java, JavaScript, Python]

     Have You Satisfied the Objectives?
     - Understand the analysis approach taken by MGGG in their analysis of the effects of racial gerrymandering in the Virginia House of Delegates
     - Understand your high-level approach to the project
     - Begin to think about design choices
     - Begin to understand data requirements to support analysis
