Advanced Photon Source—Users’ Week 2008 Workshop on Software for Challenging Cases in Macromolecular Crystallography 6 May 2008 Web-Ice and Labelit: Tools for Convenient Diffraction Analysis at the Beamline Nicholas Sauter Lawrence Berkeley National Laboratory Collaborators: Stanford Synchrotron Radiation Laboratory Berkeley Center for Structural Biology/ALS
Sector 5 ALS Automounter
[Figure labels: Microscope, Cryo Stream, Quantum 315 X-ray Detector, Gripper, Goniometer, Dewar]
Present Goals: • Screen for best crystal growth conditions • Select the highest-quality samples from a batch • Discovery of drug leads and protein-ligand complexes • Enable multi-crystal dataset acquisition • Perform initial characterization with minimal radiation dose
Also: • Single-run data collection • ALS-style puck: 112 Crystal Samples
Eventual Goals (Later…): • Beamline Operating System (BOS) control • Liquid Nitrogen Autofill
First task, crystal screening: preliminary characterization of X-ray diffraction quality • Identify crystal lattice and cell dimensions • Good fit between model and observation (r.m.s.d.) • Diffraction to high resolution • Minimal crystal disorder (mosaicity) • Minimal diffraction artifacts (ice rings) The challenge is to perform this analysis reliably in a high-throughput automated setting!
Screening results can be viewed both locally & over the Web. González et al. (2008) J Appl Cryst 41:176
Workflow: Collect 2 oscillation frames 90° apart → DISTL (~5 sec) → LABELIT (~25 sec) → MOSFLM / BEST / RADDOSE (1 min)
DISTL: the selection of candidate Bragg spots. Zhang et al. (2006) J Appl Cryst 39:112
LABELIT: characterization of the lattice. Sauter et al. (2004) J Appl Cryst 37:399
Results are viewed in Blu-Ice / BOS (graphical beamline interface) or in Web-Ice (Web viewer)
Second task: selecting the best crystal and deciding on a data collection strategy
Heuristic score: $Q = 1 - 0.7\,e^{-4/\mathrm{resolution}} - 1.5\,\mathrm{rmsResidual} - 0.02\,\mathrm{mosaicity}$
BEST (Popov & Bourenkov, 2003): optimization of exposure time, Δφ, and distance so as to maximize the signal-to-noise (I/σ) in the dataset with a given radiation dose.
RADDOSE (Murray et al., 2005): predict the absorbed radiation dose that limits the useful lifetime of the crystal sample.
Beamline-specific and experiment-specific calibration
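The heuristic score Q can be sketched in a few lines of Python. This is a minimal illustration, not the Web-Ice code: the function names `heuristic_score` and `rank_crystals` are hypothetical, and the units (resolution in Å, mosaicity in degrees, rmsResidual in the units reported by the indexing program) are assumptions that the slide does not state.

```python
import math

def heuristic_score(resolution, rms_residual, mosaicity):
    """Q = 1 - 0.7*exp(-4/resolution) - 1.5*rms_residual - 0.02*mosaicity."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return (1.0
            - 0.7 * math.exp(-4.0 / resolution)
            - 1.5 * rms_residual
            - 0.02 * mosaicity)

def rank_crystals(crystals):
    """Sort (label, resolution, rms_residual, mosaicity) tuples, best score first."""
    return sorted(crystals, key=lambda c: heuristic_score(*c[1:]), reverse=True)

if __name__ == "__main__":
    # Hypothetical screening results for three samples.
    batch = [("A1", 2.8, 0.15, 0.9), ("A2", 1.9, 0.08, 0.4), ("A3", 2.2, 0.30, 1.5)]
    for label, *params in rank_crystals(batch):
        print(label, round(heuristic_score(*params), 3))
```

A smaller resolution value, a smaller residual, and a smaller mosaicity each raise Q, so the top-ranked sample is the one to carry forward to strategy calculation.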
Details of the “View Strategy” Implementation • Calculate strategy in the correct Laue group • Initiate data collection • Process data after autoindexing (at the command line)
Web-Ice goals: scalability, extensibility, portability Main site: http://smb.slac.stanford.edu/research/developments/webice Basic idea: the beamline crystallographer logs in to a Unix account with a user name & password. Command-line scripts are run to process the data: run_best run_mosflm run_labelit run_distl The output files are written to the user’s home directory, which is cross-mounted on all Unix systems at the beamline. The Web-Ice architecture offers the opportunity (through collaboration) to extend beamline efficiency and ultimately improve the science. Developers’ wiki: https://smb.slac.stanford.edu/wikipub
Autoindexing gives the reduced cell, but can only guess at the Bravais lattice
[Figure panels: Reduced cell (Triclinic) • Monoclinic C-centered (three panels) • Hexagonal Rhombohedral]
Collaborative Goals to Extend Beamline Science • Early detection of the Laue group with labelit.rsymop / POINTLESS • phenix.xtriage: detection of twinning • Real-time monitoring of radiation damage or heavy-atom signal • Fully automated data collection with multi-wavelength protocol • Combination of multiple crystals to form a complete dataset Web-Ice is not so much an application as a computing architecture on which to hang different applications. Already-implemented features include beamline control & beamline video.
Under the hood: Systems computing on a handshake
Step 1. Getting a ticket: the User contacts the Authentication Server over https:// (secure sockets): “give me a ticket … here’s my password”. The server answers: “here’s your ticket”.
• Authentication Server: Java webapp running on Apache Tomcat; a global server that keeps track of all user login sessions
• Pluggable Authentication Modules (PAM): Unix login, LDAP, other modules. LDAP implemented by John Taylor & Scott Classen
Step 2. Using a ticket: the User contacts the Impersonation Daemon: “https://execute job … here’s my ticket”.
• Impersonation Daemon: C++ application running as root. Is this ticket valid? If yes, change process ownership to user & execute job (run_mosflm, run_labelit, run_distl)
• As many servers in the cluster as needed to process the data
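The two-step handshake can be illustrated with a small in-memory Python sketch. It is not the Web-Ice implementation (which the slide describes as a Java webapp plus a C++ daemon running as root): the class names, the password table, the ticket lifetime, and the returned dictionary are all hypothetical, and no process ownership is actually changed.

```python
import secrets
import time

class AuthenticationServer:
    """Hypothetical in-memory stand-in for the ticket-issuing server."""

    def __init__(self, passwords, lifetime_s=3600.0):
        self._passwords = dict(passwords)   # user -> password (illustration only)
        self._sessions = {}                 # ticket -> (user, expiry time)
        self._lifetime_s = lifetime_s

    def get_ticket(self, user, password):
        """Step 1: exchange a user name and password for a random ticket."""
        if self._passwords.get(user) != password:
            raise PermissionError("authentication failed")
        ticket = secrets.token_hex(16)
        self._sessions[ticket] = (user, time.time() + self._lifetime_s)
        return ticket

    def validate(self, ticket):
        """Return the user who owns a valid ticket, else None."""
        entry = self._sessions.get(ticket)
        if entry is None or entry[1] < time.time():
            return None
        return entry[0]


class ImpersonationDaemon:
    """Hypothetical stand-in: checks the ticket, then reports the job it would run."""

    ALLOWED_JOBS = {"run_mosflm", "run_labelit", "run_distl"}

    def __init__(self, auth_server):
        self._auth = auth_server

    def execute(self, ticket, job):
        """Step 2: validate the ticket before running a job on the user's behalf."""
        user = self._auth.validate(ticket)
        if user is None:
            raise PermissionError("invalid or expired ticket")
        if job not in self.ALLOWED_JOBS:
            raise ValueError("unknown job: " + job)
        # A real daemon running as root would change process ownership to
        # `user` here and launch the script; this sketch only reports it.
        return {"run_as": user, "job": job}
```

The point of the design is that the password is presented once, to one server; every processing node only ever sees the ticket and asks the central server whether it is still valid.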
High throughput automatic signaling
[Diagram] Local User → Blu-Ice or BOS (graphical data collection interface) → signal each time a new image is collected → Web-Ice Crystal Analysis Webapp → ticket → Impersonation Daemon → run_mosflm, run_labelit, run_distl
[Diagram] Remote User → Web-Ice Front Page (SSRL or ALS code) → ticket → manual data processing
Managing the Sample List: Different Choices at SSRL and ALS
[Diagram components: Local User with Blu-Ice or BOS • Remote User with Web-Ice Front Page • Web-Ice • Sample Information List Server • Beamline database • Standard http protocol • Excel spreadsheet • site labels: SSRL or ALS]
Software demo: • SIL server • Image server & color markup • AJAX client
A Historical Note on Automatic Processing • LABELIT represented a new software approach to autoindexing – The initial approach of writing shell scripts to wrap existing software was changed early in development (2003), as legacy software relied too heavily on human input to make choices – Basic well-known algorithms had to be re-examined (cell reduction; Fourier-based autoindexing) – Use of the Python language to rapidly prototype new approaches was indispensable – A core library of C++ crystallography algorithms (cctbx; Grosse-Kunstleve et al. 2002, J Appl Cryst 35:126) was exposed at the Python scripting level with Boost.Python bindings • Achieving automation has been an enormous challenge – There are additional challenges related to instrumentation, record-keeping, and communication – Physical properties of macromolecular diffraction patterns are very diverse; the simplest algorithms are inadequate for outlying cases
Very Large Unit Cells: Tightly Packed Diffraction Spots • 621Å cubic cell (virus crystal) leads to barely-separated diffraction spots • Results from the indexing algorithm are degraded when two bright spots are categorized as a single spot at the average position • Special fix: – Find the brightest spots (Blue) – Find the best-fit ellipse – Find each spot’s nearest neighbor (Univ. Maryland ANN) – Plot all nearest-neighbor vectors on top of each other – Vector-clusters are probable reciprocal cell vectors – Throw out the large “blobs” longer than the probable reciprocal cell lengths – Special allowance made to accept spots with very little baseline separation; balanced against need for sufficient background
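The nearest-neighbor steps above can be sketched as follows. This is an illustration under stated assumptions, not the LABELIT code: it substitutes a brute-force search for the Univ. Maryland ANN library named on the slide, works on 2-D detector coordinates, and clusters vectors by simple binning. The function names and the `bin_size` parameter are hypothetical.

```python
import math
from collections import Counter

def nearest_neighbor_vectors(spots):
    """For each spot (x, y), return the vector to its nearest neighbor."""
    vectors = []
    for i, (xi, yi) in enumerate(spots):
        best, best_d2 = None, math.inf
        for j, (xj, yj) in enumerate(spots):
            if i == j:
                continue
            d2 = (xj - xi) ** 2 + (yj - yi) ** 2
            if d2 < best_d2:
                best_d2, best = d2, (xj - xi, yj - yi)
        if best is not None:
            vectors.append(best)
    return vectors

def cluster_vectors(vectors, bin_size=1.0):
    """Overlay all vectors on a coarse grid and count the occupancy of each bin.

    A vector and its negative describe the same spacing, so both are mapped
    to one representative bin. Returns [(bin, count), ...], most populated first.
    """
    counts = Counter()
    for x, y in vectors:
        bx, by = round(x / bin_size), round(y / bin_size)
        if bx < 0 or (bx == 0 and by < 0):
            bx, by = -bx, -by
        counts[(bx, by)] += 1
    return counts.most_common()

if __name__ == "__main__":
    # Synthetic spots: 10-pixel spacing along x, 25-pixel spacing along y.
    spots = [(10.0 * i, 25.0 * j) for i in range(5) for j in range(4)]
    print(cluster_vectors(nearest_neighbor_vectors(spots))[:3])
```

The heavily populated bins are candidates for the projected reciprocal cell vectors; a production version would use a spatial index rather than the quadratic loop shown here.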
Pseudocentering: systematically weak Bragg spots • The true symmetry is P2₁ with two protein molecules per asymmetric unit, related by a non-crystallographic translation. • The NCS translation is ½ the cell length, approximating an additional symmetry operator, giving rise to alternating weak spots (Hauptman & Karle, 1953). • If weak spots are ignored, the symmetry is C-centered orthorhombic with one protein molecule per asymmetric unit. • Automatic indexing relies on picking the brightest spots, so it is easy to pick the oC cell by chance. • Lowering the spot-picking threshold to find the weak spots is counterproductive.
Construction of the Sublattice: Cell Doubling
[Figure: unit cell with axes a, b, c]
Basis vectors          Strong reflections    Patterson peak
a, b, c                hkl                   0, 0, 0
2a, b, c               h = 2n                ½, 0, 0
a, 2b, c               k = 2n                0, ½, 0
a, b, 2c               l = 2n                0, 0, ½
2a, b + a, c           h + k = 2n            ½, ½, 0
2a, b, c + a           h + l = 2n            ½, 0, ½
a, 2b, c + b           k + l = 2n            0, ½, ½
2a, b + a, c + a       h + k + l = 2n        ½, ½, ½
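The seven parity conditions in the table lend themselves to a short check on indexed data. The sketch below is illustrative only: it assumes reflections are already indexed as (h, k, l, intensity), and it uses the ratio of mean intensities (reflections violating a condition over reflections obeying it) as a stand-in statistic, which is an assumption rather than the measure used in LABELIT. The names `CONDITIONS` and `weak_to_strong_ratios` are hypothetical.

```python
# Parity condition obeyed by the strong reflections for each candidate sublattice.
CONDITIONS = {
    "doubled a (h = 2n)": lambda h, k, l: h % 2 == 0,
    "doubled b (k = 2n)": lambda h, k, l: k % 2 == 0,
    "doubled c (l = 2n)": lambda h, k, l: l % 2 == 0,
    "pseudo C-centered (h + k = 2n)": lambda h, k, l: (h + k) % 2 == 0,
    "pseudo B-centered (h + l = 2n)": lambda h, k, l: (h + l) % 2 == 0,
    "pseudo A-centered (k + l = 2n)": lambda h, k, l: (k + l) % 2 == 0,
    "pseudo body-centered (h + k + l = 2n)": lambda h, k, l: (h + k + l) % 2 == 0,
}

def weak_to_strong_ratios(reflections):
    """Map each condition to mean I(violating) / mean I(obeying).

    `reflections` is a list of (h, k, l, intensity) tuples. A ratio far
    below 1 means the violating class is systematically weak, which is
    the signature of the corresponding cell doubling or pseudocentering.
    """
    ratios = {}
    for name, obeys in CONDITIONS.items():
        strong = [i for h, k, l, i in reflections if obeys(h, k, l)]
        weak = [i for h, k, l, i in reflections if not obeys(h, k, l)]
        if strong and weak and sum(strong) > 0:
            ratios[name] = (sum(weak) / len(weak)) / (sum(strong) / len(strong))
    return ratios

if __name__ == "__main__":
    # Synthetic data in which reflections with k + l odd are weak.
    data = [(h, k, l, 100.0 if (k + l) % 2 == 0 else 5.0)
            for h in range(-2, 3) for k in range(-2, 3) for l in range(-2, 3)]
    ratios = weak_to_strong_ratios(data)
    print(min(ratios, key=ratios.get))
```

Several conditions can show a depressed ratio at once because the reflection classes overlap, so the smallest ratio is only a starting point for deciding which sublattice is real.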
Evidence for Cell Doubling in the Raw Data
[Figure panels: Original Cell • Doubled a-axis • Doubled b-axis • Doubled c-axis • Pseudo C-face centered • Pseudo B-face centered • Pseudo A-face centered • Pseudo body-centered]
Filtering out decoy signals Should the lattice be reindexed by imposing pseudo A-centering … or … pseudo body-centering?
Statistical outlier rejection
[Figure: two histograms. Panel titles: “Distribution of peak-heights for the pseudo A-centered coset” and “Distribution of peak-heights for the pseudo body-centered coset”. Horizontal axis: peak height of candidate spot. Annotations: Exponential Distribution; Gaussian Distribution; Outlier]
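One way to make the outlier idea concrete is a tail test against an exponential distribution. The slide does not give the actual procedure, so everything in this sketch is an assumption: the scale is estimated from all peak heights except the largest, and the largest is flagged when the chance of seeing a value that high among n exponential draws falls below `alpha`. The function names are hypothetical.

```python
import math

def exponential_outlier_pvalue(heights):
    """Probability that at least one of n exponential draws reaches the largest height.

    The exponential mean is estimated from every height except the largest,
    so that a genuine outlier does not inflate the scale it is tested against.
    """
    if len(heights) < 3:
        raise ValueError("need at least three peak heights")
    ordered = sorted(heights)
    largest, rest = ordered[-1], ordered[:-1]
    mean = sum(rest) / len(rest)
    if mean <= 0:
        raise ValueError("peak heights must have a positive mean")
    tail = math.exp(-largest / mean)   # P(single draw >= largest)
    return 1.0 - (1.0 - tail) ** len(heights)

def is_outlier(heights, alpha=0.01):
    """True if the largest peak height is improbable under the exponential model."""
    return exponential_outlier_pvalue(heights) < alpha

if __name__ == "__main__":
    background = [1.0, 2.0, 1.5, 0.5, 1.0, 2.5, 1.0, 0.8, 1.2]
    print(is_outlier(background + [30.0]))   # one very tall peak
    print(is_outlier(background + [3.0]))    # largest peak is unremarkable
```

A coset whose peak heights follow the smooth background distribution would pass this test, while a single spot standing far above the rest would be set aside as a decoy.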
More decoy signals to filter out • Inadequate mosaicity model • Mismatched or non-Bragg-like profile
In Summary • There is still work to be done so that the most challenging cases can be processed automatically; these cases include samples with large unit cells (viruses) and crystals with pseudo-symmetry. • While screening has been automated, the longer-term goal of automated dataset collection is only beginning to be addressed. • Web-Ice has been successfully ported from SSRL to BCSB, and will be the focus of continued efforts at real-time data analysis, to enable better high-throughput data collection.