001 Notes

String Structure and Matching Professional Playbook

Overview and Reading Path

String structure problems become efficient when repeated character comparisons are replaced with reusable memory.

Use this family when the problem asks about:

  • finding patterns
  • prefix/suffix borders
  • repeated substrings
  • dictionary prefixes
  • many patterns at once
  • palindromes
  • subsequences and edit distance
  • suffixes, duplicate substrings, or substring fingerprints

Read in this order:

  1. Identify the string relationship.
  2. Decide whether matches are contiguous or gapped.
  3. Choose the reusable memory structure.
  4. Define the mismatch or transition rule.
  5. Verify boundary handling.

Core Vocabulary

TermMeaning
PatternString being searched for or matched
TextString being scanned
PrefixBeginning segment of a string
SuffixEnding segment of a string
BorderA substring that is both prefix and suffix
FingerprintHash or encoded representation of a substring
Trie nodeA shared prefix state for dictionary words
Alignment statePair of positions in two strings

Beginner Mental Model

Think of searching a word in a book. The slow way restarts from scratch after every mismatch.

String structure matching asks:

What did the previous comparisons already prove?
Can I reuse that proof instead of comparing characters again?

Sometimes the reusable memory is a prefix table. Sometimes it is a hash, trie, palindrome radius, or DP table.

From Brute Force To This Pattern

Brute force compares characters again and again:

try a start position
compare characters
on mismatch, move one step
compare the same characters again

String structure matching changes the question:

What did the previous comparisons prove, and how can I reuse that proof?

The shift is from repeated character comparison to reusable string memory.

What Information Becomes Reusable

Reusable information depends on the string relationship:

  • KMP stores reusable prefix/suffix borders
  • Z stores prefix-match lengths at every position
  • rolling hash stores substring fingerprints
  • tries store shared dictionary prefixes
  • failure links store fallback states for many patterns
  • palindrome centers store mirror expansion facts
  • two-string DP stores alignment subproblems

What This Mental Model Prevents

This model prevents:

  • restarting a pattern match from scratch after every mismatch
  • confusing substring with subsequence
  • recomputing substring equality character by character
  • searching every dictionary word independently
  • using floating or ad hoc string keys when a structural index is needed
  • losing endpoint correctness in substring ranges

Small Visual Model

text:    a b a b a c
pattern: a b a b x
              ^
              mismatch

reuse border:
matched "abab" has border "ab"
continue from the border instead of restarting from zero

Core Mental Model

The central question is:

What string relationship can I remember so future comparisons do less work?

String structure problems are not solved by comparing characters harder. They are solved by naming a repeated relationship and storing enough memory to avoid restarting from scratch.

String algorithms are usually built from four pieces:

unit:
    character, substring, prefix, suffix, word, or alignment state

memory:
    border table, hash prefix, trie node, palindrome radius, DP cell, suffix rank

fallback or transition:
    what happens when the next character does or does not match

answer surface:
    index, count, length, reconstructed string, set of matches, or decision

How To Read This Family Visually

Use the same symbols throughout this section:

[]    active substring, matched prefix, trie path, DP cell, or suffix interval
{}    reusable memory: border, hash, trie node, radius, DP table, suffix order
^     current character, center, fallback point, or alignment state
-->   match extension, fallback, trie edge, or DP transition
x     mismatch, impossible alignment, or discarded comparison
==    proven equal string relation
...   unchecked text or remaining suffix

One-minute mental model:

see repeated string comparison
    -> ask what relationship can be remembered
    -> store border, hash, trie path, radius, DP cell, or suffix order
    -> extend on match and fall back/transition on mismatch
    -> invariant: memory represents the longest/exact reusable relation so far

Read every diagram as:

shape:
    what string unit is being compared?
memory:
    what relation is remembered from previous characters?
action:
    does the next character extend, fall back, or branch?
invariant:
    what equality or alignment fact remains true?

ASCII Visual Map

String structure should feel like saving a relationship so the next mismatch does not erase all previous work.

KMP border fallback:

pattern:  a b a b a c
text:     a b a b a x
                  ^
                  mismatch

matched prefix "ababa" has border "aba"

fallback:
    keep "aba" as useful matched work
    continue from that border length
rolling hash interval:

string:     a  b  c  d  e  f
prefix:  H0 H1 H2 H3 H4 H5 H6

hash [l, r):

         l        r
         v        v
         c   d   e

hash = H[r] - H[l] * base^(r-l)
trie shared prefixes:

          root
          /  \
         c    d
        /      \
       a        o
      / \        \
     r   t        g

"car" and "cat" reuse the path c -> a before splitting.
palindrome center expansion:

chars:    a   b   a   b   a
                  ^
                center

compare left/right mirrors:

        a   b   a   b   a
        ^       |       ^
        left  center  right
two-string DP grid:

             t index ->
          +---+---+---+---+
s index   |   |   |   |   |
   |      +---+---+---+---+
   v      |   |(i,j)      |
          +---+---+---+---+

Cell (i, j) means the answer for two prefixes or two suffixes, depending on
the chosen convention.

Unit of Comparison

Choose what one step compares.

Problem shapeUnit
exact pattern searchpattern character against text character
repeated substringprefix/suffix border
substring equalityinterval hash or suffix rank
dictionary matchingtrie edge or word end
palindromemirrored character pair around a center
edit distance(i, j) alignment state
subsequence matchingnext character position with gaps allowed
suffix indexingsuffix order and common prefix length

Problem Shape Diagrams

exact pattern search:

text:     a b c a b c a b
pattern:      a b c
              ^ ^ ^

unit = compare pattern[j] with text[i]
invariant: matched prefix equals the current text suffix
repeated substring:

string:  a b a b a b
prefix:  a b a b
suffix:      a b a b

border = prefix that is also suffix
invariant: border can be reused after a mismatch
substring equality:

string:  a b c d a b c x
         [a b c]   [a b c]
          l1 r1     l2 r2

unit = interval hash or suffix-rank evidence
invariant: equal interval evidence implies equal substring under the model
dictionary matching:

          root
           |
           c
          / \
         a   o
        /     \
       t       d

unit = trie edge or terminal word marker
invariant: current trie path equals the scanned prefix
palindrome:

chars:  r a c e c a r
              ^
            center
        ^           ^
       mirror pair compare
invariant: all matched mirror pairs around the center are equal
edit distance:

          target index ->
        +---+---+---+
source  |   |   |   |
index   +---+---+---+
  |     |   | i,j   |
  v     +---+---+---+

unit = alignment state (i, j)
invariant: cell (i, j) owns the answer for chosen prefixes or suffixes
subsequence matching:

text:    a x b y c z d
pattern: a   b   c
         ^   ^   ^

Gaps are allowed; order must be preserved.
invariant: matched pattern characters appear in increasing text positions
suffix indexing:

string: banana

suffixes:
0 banana
1 anana
2 nana
3 ana
4 na
5 a

sorted suffix order exposes common prefixes and duplicates.
invariant: adjacent sorted suffixes reveal reusable common-prefix structure

The unit decides the state. A substring problem can be about ranges, hashes, suffixes, or DP intervals depending on what relationship is reused.

Reusable Memory

The memory structure stores a proof about earlier characters.

StrategyReusable memory
KMPlongest proper prefix that is also suffix for each pattern prefix
Z algorithmlongest match from current position to global prefix
rolling hashnumeric fingerprint of every prefix
trieshared prefix path across many words
Aho-Corasicktrie plus failure links to next valid border
palindrome expansioncenter and radius or DP interval truth
two-string DPbest answer for prefixes or suffixes (i, j)
suffix indexingsorted suffixes, LCPs, or hash+binary-search evidence

Good memory answers the next comparison question directly. Bad memory stores raw substrings repeatedly and recreates the same comparisons.

Fallback or Transition

The fallback rule is what makes the algorithm professional.

match:
    extend the current relationship

mismatch:
    reuse the largest still-valid relationship instead of restarting blindly

complete match:
    record an answer and fall back to allow overlapping matches

For KMP, fallback follows the prefix table. For Aho-Corasick, failure links move to the longest suffix that is also a trie prefix. For edit DP, mismatch chooses insert, delete, or replace transitions.

Equality Representation

String equality can be represented in several ways.

NeedRepresentation
exact online pattern matchKMP or Z
many substring equality queriesrolling hash or suffix structure
many dictionary prefixestrie
many patterns in one scanAho-Corasick
mirror equalitypalindrome center/radius
gapped equalitysubsequence or edit DP

Rolling hash is a probabilistic shortcut unless collision handling or double hashing is used. KMP/Z are exact but specialized for prefix-style reuse.

Endpoint and Interval Meaning

String bugs often come from unclear interval ownership.

prefix length k:
    characters [0, k)

substring start, length:
    characters [start, start + length)

DP state (i, j):
    usually prefixes s[0:i], t[0:j] or suffixes s[i:], t[j:]

State the convention before mixing hashes, DP tables, or substring boundaries.

Universal String Structure Template

identify the string unit and match relation
choose exact reusable memory for that relation
build or update the memory with a clear endpoint convention

scan, query, or transition through the string:
    compare the current unit
    extend on match
    fall back, branch, or update DP on mismatch
    record answer when the memory proves the requested relation

return index, count, length, reconstruction, or decision

Strategy-Specific Mental Model

StrategyMental modelMain risk
KMP/Zprefix relationships prevent restartwrong fallback after full match
rolling hashsubstring equality becomes prefix arithmeticcollision or modulo mistakes
triewords share prefix pathsforgetting terminal word state
Aho-Corasickmany patterns share trie and fallback linksmissing output links
palindromemirror equality expands around centerseven/odd center mismatch
two-string DPprefixes/suffixes encode alignment choiceswrong base row/column
suffix indexingsuffix order exposes duplicate/distinct substringsunclear LCP semantics

Mini Translation Table

Statement signalLikely memory
”find all occurrences of pattern”KMP/Z
”many words with shared prefixes”trie
”many patterns in one text”Aho-Corasick
”repeated/duplicate substring”suffix structure or rolling hash
”palindromic substring/subsequence”center/radius or interval DP
”minimum edits”two-string DP
”subsequence count”alignment DP or next-position table

Mini Traces

Trace 1: KMP fallback.

matched:
pattern: a b a b a c
text:    a b a b a x
                 ^
                 mismatch

step 1:
    matched length = 5 ("ababa")
    invariant: first 5 pattern chars matched text suffix

step 2:
    fallback to border length 3 ("aba")
    invariant: "aba" is still a valid matched suffix, so work is reused

Trace 2: trie dictionary scan.

words:
cat, car, dog

scan prefix "ca":
    root -> c -> a

step:
    next char t reaches terminal "cat"

invariant:
    current trie node represents exactly the scanned prefix path

Contrast Diagrams

Trie vs KMP vs rolling hash:

trie:
    many dictionary words share prefix paths
    invariant: node path equals a word prefix

KMP:
    one pattern reuses its own borders during text scan
    invariant: border length is the longest reusable matched prefix

rolling hash:
    many substring equality checks become numeric interval checks
    invariant: hash interval represents substring content under the hash model

Palindrome DP vs expand-around-center:

expand center:
    left <- center -> right
    invariant: mirrored chars matched for this center radius

interval DP:
    dp[l][r] depends on dp[l+1][r-1]
    invariant: cell records whether substring [l, r] is palindromic

Misclassification Checks

Use this family when repeated textual relationships are the bottleneck. Be careful when:

SymptomBetter family may be
string is only scanned with countsSequential Aggregation
problem is mostly graph reachabilityConnectivity Discovery
choices over string cuts repeatState Space Exploration
range queries over character counts dominateRange Query & Indexed Aggregation
numeric parsing/formula is the coreMathematical Numeric Reasoning

The clean test is: can you name the reusable string relationship that prevents the next comparison from restarting?

What This Family Is / Is Not

Use this family when repeated string relationships are the bottleneck.

Do not use it when:

  • the problem only needs a simple frequency count
  • the answer is about arbitrary graph reachability
  • the main relation is numeric rather than textual
  • the string is just incidental input for a different pattern

Universal Template

identify the string unit
define what it means to match
build reusable memory
scan or query using that memory
apply the mismatch or transition rule
return the required match, count, length, or decision

Detailed Translation Workflow

Translate every string-structure problem in this order:

1. Is the match contiguous or gapped?
2. What exact relationship repeats?
3. What unit is being compared?
4. What memory avoids repeated comparison?
5. What is the mismatch or transition rule?
6. What endpoint convention prevents off-by-one mistakes?
7. What does the answer ask for: index, count, length, boolean, or construction?

If the match is gapped, think subsequence/DP before substring algorithms. If the match is contiguous and repeated many times, think borders, hashes, tries, or suffix indexing.

Strategy Map

StrategyUse whenReusable memory
KMP / Zprefix-pattern reuse mattersborder table or Z-box
Rolling hashmany substring equality checksprefix hash arrays
Triemany words share prefixesprefix tree nodes
Aho-Corasick / multi-patternmany patterns scan one texttrie plus failure links
Palindrome structuremirror equality matterscenters, radii, intervals
Subsequence/edit matchinggapped alignment matters(i, j) DP state
Suffix indexingduplicate/distinct substring structure matterssuffix order, LCP, hash+binary search

Problem Translation Table

Problem typeUnit / stateReuse ruleEngine
Find First Occurrencepattern indexKMP border fallbackKMP
Repeated Substring Patternprefix/suffix borderlongest border testKMP
Shortest Palindromeprefix palindrome borderKMP on combined stringKMP
Sum of Scores of Built Stringsprefix match lengthZ-box reuseZ algorithm
Repeated DNA Sequencesfixed substringrolling hash / encoded windowrolling hash
Longest Duplicate Substringsubstring lengthhash equality plus binary searchrolling hash
Implement Trienode per charactershared prefix pathtrie
Word Breakstring indexdictionary prefix transitionstrie or DP
Word Search IIgrid path and trie nodeprune missing prefixestrie + backtracking
Aho-Corasick Searchautomaton statefailure linksmulti-pattern matching
Longest Palindromic Substringcenter or intervalexpand/mirrorpalindrome
Palindromic Substringscenterexpand around centerpalindrome
Longest Palindromic Subsequencei, jshrink intervalDP
Longest Common Subsequencei, jadvance one or both stringstwo-string DP
Edit Distancei, jinsert/delete/replacetwo-string DP
Distinct Subsequencesi, jmatch or skip source charDP
Suffix Array Searchsuffix rankneighboring suffix comparisonssuffix indexing

Canonical Examples

Invariant:

lps[i] is the length of the longest proper prefix that is also a suffix ending
at pattern[i].

Skeleton:

def build_lps(pattern):
    lps = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
            lps[i] = j
    return lps

Rolling Hash: Repeated DNA

Invariant:

The rolling window fingerprint represents the current fixed-length substring.

Use direct encoding for small alphabets when possible. It avoids collision risk from modular hashing.

Invariant:

Every path from root to a node represents exactly one prefix.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.end = False

Palindrome: Expand Around Center

Invariant:

While left and right match, s[left:right + 1] is a palindrome.
def longest_palindrome(s):
    best = ""

    def expand(left, right):
        nonlocal best
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        candidate = s[left + 1:right]
        if len(candidate) > len(best):
            best = candidate

    for i in range(len(s)):
        expand(i, i)
        expand(i, i + 1)

    return best

Edit Distance: Two-String DP

Invariant:

dp[i][j] is the minimum edits to convert the first i characters of A into the
first j characters of B.

Decision Rules

Use KMP/Z when mismatch reuse in a single pattern matters. Use rolling hash when many substring equality checks are needed. Use a trie when many words share prefixes or prefix queries are frequent. Use multi-pattern automata when one text must be scanned against many patterns. Use palindrome center/interval logic when mirror symmetry is the structure. Use two-string DP when matching is gapped or edit-like. Use suffix indexing when the problem is about duplicate, distinct, or ordered substrings at scale.

Common Failure Modes

  • Restarting pattern search from scratch after a mismatch.
  • Mixing inclusive and half-open substring endpoints.
  • Ignoring rolling-hash collision risk.
  • Forgetting terminal markers in tries.
  • Not pruning trie branches in grid word search.
  • Treating subsequence matching as substring matching.
  • Using center expansion for a subsequence problem.

Performance Notes

KMP/Z are linear. Rolling hash is usually linear for fixed windows and O(n log n) when paired with binary search on length. Trie memory can be large because each node stores children. Two-string DP is usually O(mn) and can often compress rows. Suffix arrays are powerful but implementation-heavy.

Cheat Sheet

Pattern search:
    What can I reuse after mismatch?

Hashing:
    What substring equality checks repeat?

Trie:
    Which prefixes are shared?

Palindrome:
    What mirrors what?

Two-string DP:
    What do i and j mean?

Suffix indexing:
    Do I need global substring order or duplicates?

Extended Translation Notes

String structure problems should start with the relationship being reused.

Step 1: Identify Match Type

Match typeMeaningTypical tools
exact substringcontiguous pattern occurrenceKMP, Z, rolling hash
prefix/suffix borderprefix equals suffixKMP, Z
dictionary prefixmany words share startstrie
many patternsmany words against one texttrie, Aho-Corasick
palindromemirrored equalitycenter expansion, DP, Manacher
subsequenceordered but not contiguoustwo-pointer, DP
edit alignmentinsert/delete/replacetwo-string DP
duplicate substringrepeated contiguous blockrolling hash, suffix array

Do not choose an algorithm before deciding whether the match is contiguous.

Step 2: Define the Unit

The unit may be:

character
substring
prefix
suffix
dictionary word
pattern state
palindrome center
two-string index pair
suffix rank

Step 3: Define Reuse After Mismatch

This is the heart of string matching.

Examples:

KMP:
    fallback to the longest border

Z algorithm:
    reuse the current prefix-match box

Trie:
    stop when a prefix edge is missing

Aho-Corasick:
    follow failure links

Edit distance:
    try insert, delete, replace

Step 4: Choose Endpoint Convention

Use half-open ranges when possible:

s[left:right]
left included
right excluded

This reduces off-by-one bugs in substring DP and hashing.

Strategy Invariants

KMP

Invariant:

j is the number of pattern characters currently matched.

On mismatch, KMP does not reset to zero unless no border remains.

Z Algorithm

Invariant:

The Z-box [left, right] is the farthest known interval matching the prefix.

Z is often better when every position needs a prefix-match length.

Rolling Hash

Invariant:

hash(left, right) is computed from prefix hashes in O(1).

Use double hashing or direct encoding when collision risk matters.

Trie

Invariant:

Every node represents the prefix formed by the path from root.

Terminal markers distinguish complete words from prefixes.

Palindrome

Invariant:

Expansion is valid while left and right characters mirror each other.

Interval DP uses:

s[i] == s[j] and inside interval is palindrome

Two-String DP

Invariant:

dp[i][j] describes the relationship between prefixes or suffixes of two
strings.

Define whether i and j mean prefix lengths or current suffix indexes before coding.

Worked Problem Snapshots

Repeated Substring Pattern

State:

longest border length of the whole string

Rule:

if n % (n - border) == 0, the string repeats a smaller unit

Engine:

KMP prefix table

Shortest Palindrome

Goal:

find longest palindromic prefix

Technique:

run KMP-style border on s + "#" + reversed(s)

Trap:

Use a separator that cannot appear as a normal bridge between the two strings.

Sum of Scores of Built Strings

State:

Z[i] = prefix match length starting at i

Answer:

sum of all Z values plus full string length

Engine:

Z algorithm

Longest Duplicate Substring

State:

candidate length

Check:

does any substring of this length have a repeated hash?

Engine:

binary search on length plus rolling hash

Word Search II

State:

grid position, trie node, visited path

Invariant:

If the current trie node has no edge for a character, no word under that prefix
can be found on this branch.

Engine:

trie plus backtracking

Longest Palindromic Subsequence

State:

i, j interval

Transition:

if s[i] == s[j]: 2 + solve(i + 1, j - 1)
else max(solve(i + 1, j), solve(i, j - 1))

Engine:

interval DP

Distinct Subsequences

State:

i in source, j in target

Choices:

skip source[i]
use source[i] if it matches target[j]

Engine:

two-string counting DP

Longest Common Substring

State:

dp[i][j] = length of common suffix ending at i and j

Trap:

Substring resets to zero on mismatch; subsequence does not.

Dry Runs

Dry Run: KMP Fallback

Pattern:

ababaca

When matching ababa and seeing a mismatch after length 5:

matched prefix = ababa
fallback length = lps[4] = 3
next comparison resumes as if aba is already matched

KMP reuses the border instead of restarting at the next text position.

Dry Run: Rolling Hash Window

String:

AAAAACCCCCAAAAA

For fixed length 10:

window 0: AAAAACCCCC -> store hash
window 1: AAAACCCCCA -> roll hash
...
later window: AAAAACCCCC -> hash repeats

The repeated hash identifies a repeated fixed-length substring.

Dry Run: Palindrome Center

String:

babad

Center at a:

left b, center a, right b -> bab
next expansion fails

Candidate:

bab

Another center gives aba; both length 3 answers are valid.