Stuart Sierra, Program on Law & Technology, Columbia Law School

Feb 01, 2023 •327 likes •514 views


Stuart Sierra
Program on Law & Technology
Columbia Law School
http://altlaw.org/ - the site
http://lawcommons.org/ - wiki & mailing list
http://columbialawtech.org/ - my employer

Talking Points
● AltLaw
  – History, motivation
  – Data sources
  – Back-end
● Semantic Web
  – What I've done
  – What I want
  – Problems I see
Front-end

Data Sources – Large Corpora
● Paul Ohm's corpus, http://bulk.altlaw.org/
  – 7 GB, 200,000+ files harvested from court web sites
● Cornell U.S. Code
  – 748 MB of XML
● http://bulk.resource.org/courts.gov/c/
  – 2 GB, 700,000+ federal cases, XHTML
● http://pacer.resource.org/
  – 736 GB, 2.7 million PDFs, 1.8 million HTML files

Data Sources – Court Web Sites
www.supremecourtus.gov
www.ca1.uscourts.gov
www.ca2.uscourts.gov
www.ca3.uscourts.gov
www.ca4.uscourts.gov
www.ca5.uscourts.gov
www.ca6.uscourts.gov
. . .
● 20-40 new cases daily
● PDF, WordPerfect, HTML, plain text
14 appeals courts total
94 district courts
?? state courts
?? local/other courts

Back-end (1)
[Diagram; labels: Large Corpora, Daily Crawls, Big Merge, Common Data Model]

Back-end (2)
[Diagram; labels: Common Data Model, Big Merge, Duplicate Detection, Citation Graph, Ranking, Clustering, Entity Extraction, Semantic Analysis, Enhanced Common Data Model]

Scaling Stuart
● Java
● Ruby
● Clojure

The Grand Unified Data Model
● Key-value pairs? (files, Berkeley DB)
● Documents? (Solr/Lucene, CouchDB)
● Trees? (XML, JSON, Objects)
● Graphs? (RDF)
● Tables? (SQL)

● “Disk is the new tape.”
  – NO random access
  – NO disk seeks
  – Run at full disk transfer rate, not seek rate
● Data must be splittable
● Process each record in isolation
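
The constraints on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical flat file with one tab-separated key/value record per line; the record layout, function names, and sample data are illustrative assumptions, not AltLaw's actual code. The input is read front to back in a single pass, and each record is handled without reference to any other.

```python
import io
from typing import Callable, Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]


def read_records(stream: TextIO) -> Iterator[Record]:
    """Yield (key, value) pairs in file order, one line at a time, never seeking."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        yield key, value


def process_stream(records: Iterable[Record],
                   fn: Callable[[str, str], Record]) -> Iterator[Record]:
    """Apply fn to each record in isolation; fn only ever sees one record."""
    for key, value in records:
        yield fn(key, value)


def word_count(key: str, text: str) -> Record:
    """Hypothetical per-record step: count the words in one document."""
    return key, str(len(text.split()))


# Hypothetical sample data standing in for a large flat file on disk.
sample = io.StringIO("case-1\tthe court held that\ncase-2\taffirmed\n")
results = list(process_stream(read_records(sample), word_count))
```

Because no record depends on another, any contiguous run of lines could be handed to a separate worker, which is what makes data in this shape splittable.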

Secret Weapons
● Hadoop – open-source MapReduce
● Amazon EC2 – cluster by the hour
● Clojure – Lisp on the JVM
● Solr – full-text search + document storage; no SQL database!
● Ruby on Rails
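
For readers unfamiliar with MapReduce, the shape of a job can be sketched in a few lines of plain Python. This is an in-memory illustration only: it does not use the Hadoop API, and the job (counting how often each case is cited), the function names, and the case identifiers are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_citations(case_id: str, cited: List[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (cited_case, 1) for every citation in one record."""
    for target in cited:
        yield target, 1


def reduce_counts(target: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the values emitted for one key."""
    return target, sum(counts)


def run_job(records: Iterable[Tuple[str, List[str]]]) -> Dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in memory."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for case_id, cited in records:
        for key, value in map_citations(case_id, cited):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


# Hypothetical records: (case id, list of case ids it cites).
cases = [("a", ["b", "c"]), ("b", ["c"]), ("c", [])]
in_degree = run_job(cases)
```

A real framework distributes the map and reduce calls across machines and performs the grouping on disk; the per-record map function is the part that stays the same.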

The Grand Unified Data Model
● Key-value pairs? (files, Berkeley DB)
● Documents? (Solr/Lucene, CouchDB)
● Trees? (XML, JSON, Objects)
● Graphs? (RDF)
● Tables? (SQL)

Mismatch
● Hadoop
  – Disk is the new tape
  – Flat key/value files
  – Isolated records
● Solr / Lucene
  – Denormalized
  – Flat documents
● RDF
  – Normalized
  – Random access
  – Graph structure
  – Linked records
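
The mismatch can be made concrete with a small sketch that turns linked, normalized triples into the flat, denormalized documents a search index expects. The triples, predicate names, and the `denormalize` helper are hypothetical and chosen only for illustration; this is not AltLaw's schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical normalized data: the court's name is stored once and linked to.
triples: List[Triple] = [
    ("case:1", "title", "Example v. Sample"),
    ("case:1", "court", "court:ca3"),
    ("court:ca3", "name", "Third Circuit"),
    ("case:2", "title", "Placeholder v. Demo"),
    ("case:2", "court", "court:ca3"),
]


def index_by_subject(data: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Build a lookup table by subject; this is the random-access step."""
    index: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, predicate, obj in data:
        index[subject][predicate] = obj
    return index


def denormalize(data: List[Triple], link: str, copy: str) -> List[Dict[str, str]]:
    """Copy one property of the linked record into each flat document."""
    index = index_by_subject(data)
    docs = []
    for subject, props in sorted(index.items()):
        if link not in props:
            continue  # only subjects that carry the link become documents
        doc = {"id": subject, **props}
        linked = index.get(props[link], {})
        doc[f"{link}_{copy}"] = linked.get(copy, "")
        docs.append(doc)
    return docs


docs = denormalize(triples, "court", "name")
```

Following the `court` link requires looking up a second record by key, which is exactly the access pattern that an isolated-record, sequential-scan model rules out; the flat output repeats the court name in every document instead.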

Semantic Web – What I Want
● Publish linked data for others
● Accept new data without writing new parsers/scrapers
● Richer internal data model
● Inference over multiple data sources

AltLaw on the Semantic Web
● Persistent URIs for federal courts
  – e.g. http://id.altlaw.org/courts/us/fed/app/3
  – 303 redirects to HTML/RDF
● Beginnings of an ontology
  – http://github.com/lawcommons/altlaw-vocab
  – Extension of Dublin Core & Bibliontology
● Semantic web crawler
  – Output uses “HTTP Vocabulary in RDF”
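
The 303-redirect pattern for persistent URIs can be sketched as a small content-negotiation function. The target hosts (`example.org`), the `.html`/`.rdf` suffixes, and the simple substring check on the `Accept` header are assumptions for illustration; the slides do not say how AltLaw's redirects were implemented, and a production server would also honor q-values.

```python
from typing import Tuple

# Hypothetical document locations; the real redirect targets are not given.
HTML_BASE = "http://example.org/html"
RDF_BASE = "http://example.org/rdf"


def negotiate(path: str, accept: str) -> Tuple[int, str]:
    """Return (status, Location) for a request to an identifier URI.

    The identifier names a thing (a court), not a document, so the server
    answers 303 See Other and points at a document describing that thing.
    """
    wants_rdf = "application/rdf+xml" in accept.lower()
    base = RDF_BASE if wants_rdf else HTML_BASE
    suffix = ".rdf" if wants_rdf else ".html"
    return 303, f"{base}{path}{suffix}"


status, location = negotiate("/courts/us/fed/app/3", "application/rdf+xml")
```

A browser sending `Accept: text/html` would be sent to the HTML page for the same path, so one stable identifier serves both people and RDF clients.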

Questions
● What's in it for you?
  – How do you want my data?
    ● Bulk RDF/XML downloads
    ● RDFa embedded in HTML
    ● SPARQL endpoint
  – What would you do with it?
● What's in it for me?
  – Universal data model
  – Less data transformation
