The Closest Substring problem with small distances, Dániel Marx

The Closest Substring problem with small distances
Dániel Marx
dmarx@informatik.hu-berlin.de
June 10, 2005
The Closest Substring problem with small distances – p.1/28

The Closest String problem
CLOSEST STRING
Input: Strings $s_1, \ldots, s_k$ of length $L$.
Solution: A string $s$ of length $L$ (center string).
Minimize: $\max_{i=1}^{k} d(s, s_i)$, where $d(w_1, w_2)$ is the number of positions where $w_1$ and $w_2$ differ (Hamming distance).
Applications: computational biology (e.g., finding common ancestors).
The problem is NP-hard even with a binary alphabet [Frances and Litman, 1997].
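To make the Hamming distance and the objective concrete, here is a minimal Python sketch (not from the talk) of an exhaustive CLOSEST STRING solver. The names `hamming` and `closest_string_brute_force` are hypothetical, and enumerating all $|\Sigma|^L$ centers is only a naive baseline for tiny inputs, consistent with the problem being NP-hard in general.

```python
from itertools import product


def hamming(w1: str, w2: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(w1) != len(w2):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(w1, w2))


def closest_string_brute_force(strings: list[str], alphabet: str = "01") -> tuple[str, int]:
    """Exhaustive CLOSEST STRING: try all |alphabet|**L centers (tiny inputs only)."""
    length = len(strings[0])
    best_center, best_radius = "", length + 1
    for letters in product(alphabet, repeat=length):
        center = "".join(letters)
        # The objective: the largest distance from the center to any input string.
        radius = max(hamming(center, s) for s in strings)
        if radius < best_radius:
            best_center, best_radius = center, radius
    return best_center, best_radius


print(closest_string_brute_force(["0000", "0011", "1100"]))  # ('0000', 2)
```

In the example the optimum radius is 2: `0011` and `1100` differ in all four positions, so no center can be within distance 1 of both.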

The Closest Substring problem
CLOSEST SUBSTRING
Input: Strings $s_1, \ldots, s_k$ and an integer $L$.
Solution:
- a string $s$ of length $L$ (center string),
- a length-$L$ substring $s'_i$ of $s_i$ for every $i$.
Minimize: $\max_{i=1}^{k} d(s, s'_i)$.
Remark: For a given $s$, it is easy to find the best $s'_i$ for every $i$.
Applications: finding common patterns, drug design.
The problem is NP-hard even with a binary alphabet (CLOSEST STRING is the special case $|s_i| = L$).
CLOSEST SUBSTRING admits a PTAS [Li, Ma, & Wang, 2002]: for every $\epsilon > 0$ there is an $n^{O(1/\epsilon^4)}$ time algorithm that produces a $(1+\epsilon)$-approximation.
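The remark that the best $s'_i$ is easy to find for a fixed $s$ can be illustrated with a sliding-window scan. A minimal sketch, assuming Python strings over any alphabet; `best_substring` and `objective` are hypothetical helper names:

```python
def hamming(w1: str, w2: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(w1, w2))


def best_substring(center: str, s: str) -> tuple[str, int]:
    """Return the length-L window of s closest to center, with L = len(center)."""
    L = len(center)
    if len(s) < L:
        raise ValueError("s is shorter than the center string")
    # Scan every window once: O(|s| * L) time for one input string.
    start = min(range(len(s) - L + 1), key=lambda j: hamming(center, s[j:j + L]))
    window = s[start:start + L]
    return window, hamming(center, window)


def objective(center: str, strings: list[str]) -> int:
    """CLOSEST SUBSTRING objective for a fixed center: max over i of d(s, s'_i)."""
    return max(best_substring(center, s)[1] for s in strings)


print(best_substring("101", "0001010"))      # ('101', 0)
print(objective("101", ["0001010", "111"]))  # 1
```

So the hard part of CLOSEST SUBSTRING is choosing the center $s$; once it is fixed, each $s'_i$ is found independently in polynomial time.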

Parameterized Complexity
Goal: restrict the exponential growth of the running time to one parameter of the input.
Definition: A problem is fixed-parameter tractable (FPT) with parameter $k$ if there is an algorithm with running time $f(k) \cdot n^c$, where $c$ is a constant not depending on $k$.
Definition: A problem is fixed-parameter tractable (FPT) with parameters $k_1$ and $k_2$ if there is an algorithm with running time $f(k_1, k_2) \cdot n^c$, where $c$ is a constant not depending on $k_1$ and $k_2$.
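As a rough illustration of why the definition insists that $c$ not depend on $k$, here are hypothetical numbers (not from the talk): take $f(k) = 2^k$, $c = 1$, $k = 10$ and $n = 1000$, and compare with an algorithm running in $n^k$ time.

```latex
f(k)\cdot n = 2^{10}\cdot 10^{3} = 1\,024\,000 \approx 10^{6},
\qquad
n^{k} = \left(10^{3}\right)^{10} = 10^{30}.
```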

Parameterized intractability
We expect that MAXIMUM INDEPENDENT SET is not fixed-parameter tractable; no $n^{o(k)}$ algorithm is known.
W[1]-complete ≈ "as hard as MAXIMUM INDEPENDENT SET"
Parameterized reductions: $L_1$ is reducible to $L_2$ if there is a function $f$ that transforms $(x, k)$ to $(x', k')$ such that
- $(x, k) \in L_1$ if and only if $(x', k') \in L_2$,
- $f$ can be computed in $g(k) \cdot |x|^c$ time for some function $g$,
- $k'$ depends only on $k$.
If $L_1$ is reducible to $L_2$ and $L_2$ is in FPT, then $L_1$ is in FPT as well.
Most NP-completeness proofs do not give parameterized reductions.

Parameterized Closest Substring
CLOSEST SUBSTRING
Input: Strings $s_1, \ldots, s_k$ over $\Sigma$, integers $L$ and $d$.
Find:
- a string $s$ of length $L$ (center string),
- a length-$L$ substring $s'_i$ of $s_i$ for every $i$,
such that $d(s, s'_i) \le d$ for every $i$.
Possible parameters:
- $k$: might be small,
- $d$: might be small,
- $L$: usually large,
- $|\Sigma|$: usually a small constant.
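A minimal sketch of checking a candidate center against the decision version above (the helper name `is_solution` is hypothetical): the center is accepted if every $s_i$ has some length-$L$ window within Hamming distance $d$ of it.

```python
def hamming(w1: str, w2: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(w1, w2))


def is_solution(center: str, strings: list[str], L: int, d: int) -> bool:
    """Decide whether center witnesses a yes-instance of CLOSEST SUBSTRING."""
    if len(center) != L:
        return False
    for s in strings:
        windows = (s[j:j + L] for j in range(len(s) - L + 1))
        # Some length-L substring s'_i of s_i must satisfy d(center, s'_i) <= d.
        if not any(hamming(center, w) <= d for w in windows):
            return False
    return True


print(is_solution("101", ["0001010", "111"], L=3, d=1))  # True
print(is_solution("101", ["0001010", "111"], L=3, d=0))  # False
```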

Closest Substring: Results
parameter   |Σ| is constant   |Σ| is parameter   |Σ| is unbounded
d           ?                 ?                  W[1]-hard
k           W[1]-hard         W[1]-hard          W[1]-hard
d, k        ?                 ?                  W[1]-hard
L           FPT               FPT                W[1]-hard
d, k, L     FPT               FPT                W[1]-hard
(Hardness results by [Fellows, Gramm, Niedermeier 2002].)

Closest Substring: Results
parameter   |Σ| is constant   |Σ| is parameter   |Σ| is unbounded
d           W[1]-hard         W[1]-hard          W[1]-hard
k           W[1]-hard         W[1]-hard          W[1]-hard
d, k        W[1]-hard         W[1]-hard          W[1]-hard
L           FPT               FPT                W[1]-hard
d, k, L     FPT               FPT                W[1]-hard
(Hardness results by [Fellows, Gramm, Niedermeier 2002].)
Theorem: [D.M.] CLOSEST SUBSTRING is W[1]-hard with parameters $k$ and $d$, even if $|\Sigma| = 2$.
(In the rest of the talk, $\Sigma$ is always $\{0, 1\}$.)
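The FPT entries in the $L$ row can be seen from a simple exhaustive search, sketched below under the assumption of a small explicit alphabet (function names are hypothetical, not from the talk). Trying all $|\Sigma|^L$ centers costs $|\Sigma|^L$ times a polynomial in the input size, which is FPT when $|\Sigma|$ is a constant or a parameter, but not when $|\Sigma|$ is unbounded.

```python
from itertools import product
from typing import Optional


def hamming(w1: str, w2: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(w1, w2))


def is_solution(center: str, strings: list[str], L: int, d: int) -> bool:
    """Every s_i must contain a length-L window within distance d of center."""
    return all(
        any(hamming(center, s[j:j + L]) <= d for j in range(len(s) - L + 1))
        for s in strings
    )


def solve_by_enumeration(strings: list[str], L: int, d: int, alphabet: str = "01") -> Optional[str]:
    """Try all |alphabet|**L centers: |alphabet|**L * poly(n) time."""
    for letters in product(alphabet, repeat=L):
        center = "".join(letters)
        if is_solution(center, strings, L, d):
            return center
    return None


print(solve_by_enumeration(["0000", "1111"], L=2, d=1))  # 01
print(solve_by_enumeration(["0000", "1111"], L=2, d=0))  # None
```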

Hardness of Closest Substring
Theorem: [D.M.] CLOSEST SUBSTRING is W[1]-hard with parameters $k$ and $d$.
Proof by parameterized reduction from MAXIMUM INDEPENDENT SET:
MAXIMUM INDEPENDENT SET instance $(G, t)$ ⇒ CLOSEST SUBSTRING instance with $k = 2^{2^{O(t)}}$ and $d = 2^{O(t)}$.
Corollary: No $f(k, d) \cdot n^c$ algorithm for CLOSEST SUBSTRING unless FPT = W[1].
Corollary: No $f(k, d) \cdot n^{o(\log d)}$ or $f(k, d) \cdot n^{o(\log \log k)}$ algorithm for CLOSEST SUBSTRING unless MAXIMUM INDEPENDENT SET has an $f(t) \cdot n^{o(t)}$ algorithm.
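The second corollary follows by plugging the reduction's bounds into the exponent. A sketch of the bookkeeping, assuming (these names are hypothetical, not from the talk) that the reduction runs in $g(t) \cdot n^c$ time, so the produced instance has size $n' \le g(t) \cdot n^c$, and that $\log d$ and $\log\log k$ grow as $\Theta(t)$, so that $o(\log d)$ and $o(\log\log k)$ are both $o(t)$:

```latex
\begin{aligned}
d = 2^{O(t)} &\;\Longrightarrow\; \log d = O(t),\\
k = 2^{2^{O(t)}} &\;\Longrightarrow\; \log\log k = O(t),\\
n' \le g(t)\cdot n^{c} &\;\Longrightarrow\;
  f(k,d)\cdot (n')^{o(t)} \le f(k,d)\cdot g(t)^{o(t)}\cdot n^{c\cdot o(t)} = f'(t)\cdot n^{o(t)}.
\end{aligned}
```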

Hardness of Closest Substring
Corollary: No $f(k, d) \cdot n^{o(\log d)}$ or $f(k, d) \cdot n^{o(\log \log k)}$ algorithm for CLOSEST SUBSTRING unless MAXIMUM INDEPENDENT SET has an $f(t) \cdot n^{o(t)}$ algorithm.
MAXIMUM INDEPENDENT SET has an $f(t) \cdot n^{o(t)}$ algorithm
⇓
$n$-variable 3-SAT can be solved in $2^{o(n)}$ time
⇕
FPT = M[1]
The lower bound on the exponent of $n$ is best possible:
Theorem: [D.M.] CLOSEST SUBSTRING can be solved in $f_1(d, k) \cdot n^{O(\log d)}$ time.
Theorem: [D.M.] CLOSEST SUBSTRING can be solved in $f_2(d, k) \cdot n^{O(\log \log k)}$ time.

Relation to approximability
PTAS: an algorithm that produces a $(1+\epsilon)$-approximation in time $n^{f(\epsilon)}$.
EPTAS (efficient PTAS): a PTAS with running time $f(\epsilon) \cdot n^{O(1)}$.
Observation: if $\epsilon = \frac{1}{d+1}$, then a $(1+\epsilon)$-approximation algorithm can correctly decide whether the optimum is at most $d$ or at least $d+1$ ⇒ if an optimization problem has an EPTAS, then it is FPT (with the optimum value as the parameter).
Corollary: CLOSEST SUBSTRING has no EPTAS, unless FPT = W[1].
Corollary: CLOSEST SUBSTRING has no $f(\epsilon) \cdot n^{o(\log(1/\epsilon))}$ time PTAS, unless FPT = M[1].
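The observation can be checked in a few lines. For a minimization problem with an integer-valued optimum, let ALG be the value returned by a $(1+\epsilon)$-approximation with $\epsilon = \frac{1}{d+1}$ (ALG is a label introduced here, not the talk's notation); comparing ALG with $d$ then decides the question:

```latex
\begin{aligned}
\mathrm{OPT} \le d &\;\Longrightarrow\; \mathrm{ALG} \le (1+\epsilon)\,\mathrm{OPT} \le d + \frac{d}{d+1} < d+1
  \;\Longrightarrow\; \mathrm{ALG} \le d \quad (\mathrm{ALG}\ \text{is an integer}),\\
\mathrm{OPT} \ge d+1 &\;\Longrightarrow\; \mathrm{ALG} \ge \mathrm{OPT} \ge d+1,\\
\text{EPTAS time: } f(\epsilon)\cdot n^{O(1)} &= f\!\left(\tfrac{1}{d+1}\right)\cdot n^{O(1)} \quad (\text{FPT with parameter } d).
\end{aligned}
```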

What's next?
- $f_1(d, k) \cdot n^{O(\log d)}$ time algorithm
- Some results on hypergraphs
- $f_2(d, k) \cdot n^{O(\log \log k)}$ time algorithm
- Sketch of the completeness proof
- Conclusions
- Lunch

The first algorithm
Definition: A solution is a minimal solution if $\sum_{i=1}^{k} d(s, s'_i)$ is as small as possible (and $d(s, s'_i) \le d$ for every $i$).
Definition: A set $G$ of length-$L$ strings generates a length-$L$ string $s$ if, whenever the strings in $G$ agree at the $i$-th position, $s$ has the same character at this position.
Example: $G_1$ generates $s$, but $G_2$ does not.
G1: 110101    G2: 110111
    010111        010111
    110011        110011
s:  110101    s:  110101
$G_2$ fails at position 5: all strings of $G_2$ have 1 there, while $s$ has 0.
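The definition of generating translates directly into a column check. A minimal Python sketch (the function name `generates` is hypothetical), run on the $G_1$/$G_2$ example:

```python
def generates(group: list[str], s: str) -> bool:
    """True if s matches every position where all strings in group agree."""
    for i, ch in enumerate(s):
        column = {g[i] for g in group}
        # A column with a single character is forced; s must use that character.
        if len(column) == 1 and ch not in column:
            return False
    return True


G1 = ["110101", "010111", "110011"]
G2 = ["110111", "010111", "110011"]
s = "110101"
print(generates(G1, s), generates(G2, s))  # True False
```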

First algorithm
Let $\mathcal{S}$ be the set of all length-$L$ substrings of $s_1, \ldots, s_k$. Clearly, $|\mathcal{S}| \le n$.
Lemma: If $s$ is the center string of a minimal solution, then $\mathcal{S}$ has a subset $G$ of size $O(\log d)$ that generates $s$, and the strings in $G$ agree in all but at most $O(d \log d)$ positions.
Algorithm:
- Construct the set $\mathcal{S}$.
- Consider every subset $G \subseteq \mathcal{S}$ of size $O(\log d)$.
- If the strings in $G$ disagree in at most $O(d \log d)$ positions, then try every center string generated by $G$.
Running time: $|\Sigma|^{O(d \log d)} \cdot n^{O(\log d)}$.
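A minimal sketch of the first algorithm, written for illustration rather than efficiency. The function name `first_algorithm` is hypothetical, and so is the explicit subset-size bound `d.bit_length() + 1` (one starting string plus at most $\lfloor \log_2 d \rfloor + 1$ halving steps, following the proof sketch of the lemma). The disagreement bound $|G| \cdot d$ holds because at any position where $G$ disagrees, at least one string of $G$ differs from $s$. Every returned center is checked, so it is always valid; that no solution is missed relies on the lemma.

```python
from itertools import combinations, product
from typing import Optional


def hamming(w1: str, w2: str) -> int:
    return sum(a != b for a, b in zip(w1, w2))


def is_solution(center: str, strings: list[str], L: int, d: int) -> bool:
    """Every s_i must contain a length-L window within distance d of center."""
    return all(
        any(hamming(center, s[j:j + L]) <= d for j in range(len(s) - L + 1))
        for s in strings
    )


def first_algorithm(strings: list[str], L: int, d: int, alphabet: str = "01") -> Optional[str]:
    # The set S of all distinct length-L substrings (|S| <= n).
    S = sorted({s[j:j + L] for s in strings for j in range(len(s) - L + 1)})
    # Hypothetical explicit O(log d) bound: one start string plus d.bit_length() halving steps.
    max_size = d.bit_length() + 1
    for size in range(1, max_size + 1):
        for group in combinations(S, size):
            # Positions where the strings of G disagree; all other positions are forced.
            free = [i for i in range(L) if len({g[i] for g in group}) > 1]
            # Each disagreement forces some string of G to differ from s: at most |G| * d.
            if len(free) > size * d:
                continue
            template = list(group[0])
            # Try every center generated by G: any characters at the free positions.
            for letters in product(alphabet, repeat=len(free)):
                for i, ch in zip(free, letters):
                    template[i] = ch
                center = "".join(template)
                if is_solution(center, strings, L, d):
                    return center
    return None


print(first_algorithm(["000111", "111000"], L=3, d=0))  # 000
print(first_algorithm(["0000", "1111"], L=2, d=1))      # 01
```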

Proof of the lemma
Lemma: If $s$ is the center string of a minimal solution, then $\mathcal{S}$ has a subset $G$ of size $O(\log d)$ that generates $s$, and the strings in $G$ agree in all but at most $O(d \log d)$ positions.
Proof: Let $(s, s'_1, \ldots, s'_k)$ be a minimal solution. We show that $\{s'_1, \ldots, s'_k\}$ has a subset of size $O(\log d)$ that generates $s$.
The bad positions of a set of strings are the positions where all strings of the set agree, but $s$ is different.
Clearly, $\{s'_1\}$ has at most $d$ bad positions.
We show that if a set of strings has $p$ bad positions, then we can decrease the number of bad positions to at most $p/2$ by adding a string $s'_i$ ⇒ no bad position remains after adding $O(\log d)$ strings.
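The halving argument suggests a greedy procedure: start from $\{s'_1\}$ and repeatedly add the $s'_i$ that leaves the fewest bad positions; a set generates $s$ exactly when it has no bad positions. The sketch below is hypothetical (the names `bad_positions` and `greedy_generating_set` are not from the talk). Since the proof only claims halving for a minimal solution, the function gives up and returns `None` if no candidate makes progress. Positions are 0-indexed.

```python
from typing import Optional


def bad_positions(group: list[str], s: str) -> list[int]:
    """Positions where all strings in group agree but s has a different character."""
    bad = []
    for i, ch in enumerate(s):
        column = {g[i] for g in group}
        if len(column) == 1 and ch not in column:
            bad.append(i)
    return bad


def greedy_generating_set(s: str, substrings: list[str]) -> Optional[list[str]]:
    """Start from {s'_1}; repeatedly add the s'_i that leaves the fewest bad positions."""
    group = [substrings[0]]
    remaining = list(substrings[1:])
    while bad_positions(group, s):
        if not remaining:
            return None
        best = min(remaining, key=lambda w: len(bad_positions(group + [w], s)))
        if len(bad_positions(group + [best], s)) >= len(bad_positions(group, s)):
            return None  # stalled: halving is only claimed for minimal solutions
        group.append(best)
        remaining.remove(best)
    return group


s = "110101"
print(bad_positions(["110011"], s))                              # [3, 4]
print(greedy_generating_set(s, ["110011", "010111", "111101"]))  # ['110011', '111101']
```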
