001 Notes

String Structure and Matching Professional Playbook

Overview and Reading Path

String structure problems become efficient when repeated character comparisons are replaced with reusable memory.

Use this family when the problem asks about:

finding patterns
prefix/suffix borders
repeated substrings
dictionary prefixes
many patterns at once
palindromes
subsequences and edit distance
suffixes, duplicate substrings, or substring fingerprints

Read in this order:

Identify the string relationship.
Decide whether matches are contiguous or gapped.
Choose the reusable memory structure.
Define the mismatch or transition rule.
Verify boundary handling.

Core Vocabulary

Term	Meaning
Pattern	String being searched for or matched
Text	String being scanned
Prefix	Beginning segment of a string
Suffix	Ending segment of a string
Border	A substring that is both prefix and suffix
Fingerprint	Hash or encoded representation of a substring
Trie node	A shared prefix state for dictionary words
Alignment state	Pair of positions in two strings

Beginner Mental Model

Think of searching a word in a book. The slow way restarts from scratch after every mismatch.

String structure matching asks:

What did the previous comparisons already prove?
Can I reuse that proof instead of comparing characters again?

Sometimes the reusable memory is a prefix table. Sometimes it is a hash, trie, palindrome radius, or DP table.

From Brute Force To This Pattern

Brute force compares characters again and again:

try a start position
compare characters
on mismatch, move one step
compare the same characters again

String structure matching changes the question:

What did the previous comparisons prove, and how can I reuse that proof?

The shift is from repeated character comparison to reusable string memory.

What Information Becomes Reusable

Reusable information depends on the string relationship:

KMP stores reusable prefix/suffix borders
Z stores prefix-match lengths at every position
rolling hash stores substring fingerprints
tries store shared dictionary prefixes
failure links store fallback states for many patterns
palindrome centers store mirror expansion facts
two-string DP stores alignment subproblems

What This Mental Model Prevents

This model prevents:

restarting a pattern match from scratch after every mismatch
confusing substring with subsequence
recomputing substring equality character by character
searching every dictionary word independently
using floating or ad hoc string keys when a structural index is needed
losing endpoint correctness in substring ranges

Small Visual Model

text:    a b a b a c
pattern: a b a b x
              ^
              mismatch

reuse border:
matched "abab" has border "ab"
continue from the border instead of restarting from zero

Core Mental Model

The central question is:

What string relationship can I remember so future comparisons do less work?

String structure problems are not solved by comparing characters harder. They are solved by naming a repeated relationship and storing enough memory to avoid restarting from scratch.

String algorithms are usually built from four pieces:

unit:
    character, substring, prefix, suffix, word, or alignment state

memory:
    border table, hash prefix, trie node, palindrome radius, DP cell, suffix rank

fallback or transition:
    what happens when the next character does or does not match

answer surface:
    index, count, length, reconstructed string, set of matches, or decision

How To Read This Family Visually

Use the same symbols throughout this section:

[]    active substring, matched prefix, trie path, DP cell, or suffix interval
{}    reusable memory: border, hash, trie node, radius, DP table, suffix order
^     current character, center, fallback point, or alignment state
-->   match extension, fallback, trie edge, or DP transition
x     mismatch, impossible alignment, or discarded comparison
==    proven equal string relation
...   unchecked text or remaining suffix

One-minute mental model:

see repeated string comparison
    -> ask what relationship can be remembered
    -> store border, hash, trie path, radius, DP cell, or suffix order
    -> extend on match and fall back/transition on mismatch
    -> invariant: memory represents the longest/exact reusable relation so far

Read every diagram as:

shape:
    what string unit is being compared?
memory:
    what relation is remembered from previous characters?
action:
    does the next character extend, fall back, or branch?
invariant:
    what equality or alignment fact remains true?

ASCII Visual Map

String structure should feel like saving a relationship so the next mismatch does not erase all previous work.

KMP border fallback:

pattern:  a b a b a c
text:     a b a b a x
                  ^
                  mismatch

matched prefix "ababa" has border "aba"

fallback:
    keep "aba" as useful matched work
    continue from that border length

rolling hash interval:

string:     a  b  c  d  e  f
prefix:  H0 H1 H2 H3 H4 H5 H6

hash [l, r):

         l        r
         v        v
         c   d   e

hash = H[r] - H[l] * base^(r-l)

trie shared prefixes:

          root
          /  \
         c    d
        /      \
       a        o
      / \        \
     r   t        g

"car" and "cat" reuse the path c -> a before splitting.

palindrome center expansion:

chars:    a   b   a   b   a
                  ^
                center

compare left/right mirrors:

        a   b   a   b   a
        ^       |       ^
        left  center  right

two-string DP grid:

             t index ->
          +---+---+---+---+
s index   |   |   |   |   |
   |      +---+---+---+---+
   v      |   |(i,j)      |
          +---+---+---+---+

Cell (i, j) means the answer for two prefixes or two suffixes, depending on
the chosen convention.

Unit of Comparison

Choose what one step compares.

Problem shape	Unit
exact pattern search	pattern character against text character
repeated substring	prefix/suffix border
substring equality	interval hash or suffix rank
dictionary matching	trie edge or word end
palindrome	mirrored character pair around a center
edit distance	`(i, j)` alignment state
subsequence matching	next character position with gaps allowed
suffix indexing	suffix order and common prefix length

Problem Shape Diagrams

exact pattern search:

text:     a b c a b c a b
pattern:      a b c
              ^ ^ ^

unit = compare pattern[j] with text[i]
invariant: matched prefix equals the current text suffix

repeated substring:

string:  a b a b a b
prefix:  a b a b
suffix:      a b a b

border = prefix that is also suffix
invariant: border can be reused after a mismatch

substring equality:

string:  a b c d a b c x
         [a b c]   [a b c]
          l1 r1     l2 r2

unit = interval hash or suffix-rank evidence
invariant: equal interval evidence implies equal substring under the model

dictionary matching:

          root
           |
           c
          / \
         a   o
        /     \
       t       d

unit = trie edge or terminal word marker
invariant: current trie path equals the scanned prefix

palindrome:

chars:  r a c e c a r
              ^
            center
        ^           ^
       mirror pair compare
invariant: all matched mirror pairs around the center are equal

edit distance:

          target index ->
        +---+---+---+
source  |   |   |   |
index   +---+---+---+
  |     |   | i,j   |
  v     +---+---+---+

unit = alignment state (i, j)
invariant: cell (i, j) owns the answer for chosen prefixes or suffixes

subsequence matching:

text:    a x b y c z d
pattern: a   b   c
         ^   ^   ^

Gaps are allowed; order must be preserved.
invariant: matched pattern characters appear in increasing text positions

suffix indexing:

string: banana

suffixes:
0 banana
1 anana
2 nana
3 ana
4 na
5 a

sorted suffix order exposes common prefixes and duplicates.
invariant: adjacent sorted suffixes reveal reusable common-prefix structure

The unit decides the state. A substring problem can be about ranges, hashes, suffixes, or DP intervals depending on what relationship is reused.

Reusable Memory

The memory structure stores a proof about earlier characters.

Strategy	Reusable memory
KMP	longest proper prefix that is also suffix for each pattern prefix
Z algorithm	longest match from current position to global prefix
rolling hash	numeric fingerprint of every prefix
trie	shared prefix path across many words
Aho-Corasick	trie plus failure links to next valid border
palindrome expansion	center and radius or DP interval truth
two-string DP	best answer for prefixes or suffixes `(i, j)`
suffix indexing	sorted suffixes, LCPs, or hash+binary-search evidence

Good memory answers the next comparison question directly. Bad memory stores raw substrings repeatedly and recreates the same comparisons.

Fallback or Transition

The fallback rule is what makes the algorithm professional.

match:
    extend the current relationship

mismatch:
    reuse the largest still-valid relationship instead of restarting blindly

complete match:
    record an answer and fall back to allow overlapping matches

For KMP, fallback follows the prefix table. For Aho-Corasick, failure links move to the longest suffix that is also a trie prefix. For edit DP, mismatch chooses insert, delete, or replace transitions.

Equality Representation

String equality can be represented in several ways.

Need	Representation
exact online pattern match	KMP or Z
many substring equality queries	rolling hash or suffix structure
many dictionary prefixes	trie
many patterns in one scan	Aho-Corasick
mirror equality	palindrome center/radius
gapped equality	subsequence or edit DP

Rolling hash is a probabilistic shortcut unless collision handling or double hashing is used. KMP/Z are exact but specialized for prefix-style reuse.

Endpoint and Interval Meaning

String bugs often come from unclear interval ownership.

prefix length k:
    characters [0, k)

substring start, length:
    characters [start, start + length)

DP state (i, j):
    usually prefixes s[0:i], t[0:j] or suffixes s[i:], t[j:]

State the convention before mixing hashes, DP tables, or substring boundaries.

Universal String Structure Template

identify the string unit and match relation
choose exact reusable memory for that relation
build or update the memory with a clear endpoint convention

scan, query, or transition through the string:
    compare the current unit
    extend on match
    fall back, branch, or update DP on mismatch
    record answer when the memory proves the requested relation

return index, count, length, reconstruction, or decision

Strategy-Specific Mental Model

Strategy	Mental model	Main risk
KMP/Z	prefix relationships prevent restart	wrong fallback after full match
rolling hash	substring equality becomes prefix arithmetic	collision or modulo mistakes
trie	words share prefix paths	forgetting terminal word state
Aho-Corasick	many patterns share trie and fallback links	missing output links
palindrome	mirror equality expands around centers	even/odd center mismatch
two-string DP	prefixes/suffixes encode alignment choices	wrong base row/column
suffix indexing	suffix order exposes duplicate/distinct substrings	unclear LCP semantics

Mini Translation Table

Statement signal	Likely memory
”find all occurrences of pattern”	KMP/Z
”many words with shared prefixes”	trie
”many patterns in one text”	Aho-Corasick
”repeated/duplicate substring”	suffix structure or rolling hash
”palindromic substring/subsequence”	center/radius or interval DP
”minimum edits”	two-string DP
”subsequence count”	alignment DP or next-position table

Mini Traces

Trace 1: KMP fallback.

matched:
pattern: a b a b a c
text:    a b a b a x
                 ^
                 mismatch

step 1:
    matched length = 5 ("ababa")
    invariant: first 5 pattern chars matched text suffix

step 2:
    fallback to border length 3 ("aba")
    invariant: "aba" is still a valid matched suffix, so work is reused

Trace 2: trie dictionary scan.

words:
cat, car, dog

scan prefix "ca":
    root -> c -> a

step:
    next char t reaches terminal "cat"

invariant:
    current trie node represents exactly the scanned prefix path

Contrast Diagrams

Trie vs KMP vs rolling hash:

trie:
    many dictionary words share prefix paths
    invariant: node path equals a word prefix

KMP:
    one pattern reuses its own borders during text scan
    invariant: border length is the longest reusable matched prefix

rolling hash:
    many substring equality checks become numeric interval checks
    invariant: hash interval represents substring content under the hash model

Palindrome DP vs expand-around-center:

expand center:
    left <- center -> right
    invariant: mirrored chars matched for this center radius

interval DP:
    dp[l][r] depends on dp[l+1][r-1]
    invariant: cell records whether substring [l, r] is palindromic

Misclassification Checks

Use this family when repeated textual relationships are the bottleneck. Be careful when:

Symptom	Better family may be
string is only scanned with counts	Sequential Aggregation
problem is mostly graph reachability	Connectivity Discovery
choices over string cuts repeat	State Space Exploration
range queries over character counts dominate	Range Query & Indexed Aggregation
numeric parsing/formula is the core	Mathematical Numeric Reasoning

The clean test is: can you name the reusable string relationship that prevents the next comparison from restarting?

What This Family Is / Is Not

Use this family when repeated string relationships are the bottleneck.

Do not use it when:

the problem only needs a simple frequency count
the answer is about arbitrary graph reachability
the main relation is numeric rather than textual
the string is just incidental input for a different pattern

Universal Template

identify the string unit
define what it means to match
build reusable memory
scan or query using that memory
apply the mismatch or transition rule
return the required match, count, length, or decision

Detailed Translation Workflow

Translate every string-structure problem in this order:

1. Is the match contiguous or gapped?
2. What exact relationship repeats?
3. What unit is being compared?
4. What memory avoids repeated comparison?
5. What is the mismatch or transition rule?
6. What endpoint convention prevents off-by-one mistakes?
7. What does the answer ask for: index, count, length, boolean, or construction?

If the match is gapped, think subsequence/DP before substring algorithms. If the match is contiguous and repeated many times, think borders, hashes, tries, or suffix indexing.

Strategy Map

Strategy	Use when	Reusable memory
KMP / Z	prefix-pattern reuse matters	border table or Z-box
Rolling hash	many substring equality checks	prefix hash arrays
Trie	many words share prefixes	prefix tree nodes
Aho-Corasick / multi-pattern	many patterns scan one text	trie plus failure links
Palindrome structure	mirror equality matters	centers, radii, intervals
Subsequence/edit matching	gapped alignment matters	`(i, j)` DP state
Suffix indexing	duplicate/distinct substring structure matters	suffix order, LCP, hash+binary search

Problem Translation Table

Problem type	Unit / state	Reuse rule	Engine
Find First Occurrence	pattern index	KMP border fallback	KMP
Repeated Substring Pattern	prefix/suffix border	longest border test	KMP
Shortest Palindrome	prefix palindrome border	KMP on combined string	KMP
Sum of Scores of Built Strings	prefix match length	Z-box reuse	Z algorithm
Repeated DNA Sequences	fixed substring	rolling hash / encoded window	rolling hash
Longest Duplicate Substring	substring length	hash equality plus binary search	rolling hash
Implement Trie	node per character	shared prefix path	trie
Word Break	string index	dictionary prefix transitions	trie or DP
Word Search II	grid path and trie node	prune missing prefixes	trie + backtracking
Aho-Corasick Search	automaton state	failure links	multi-pattern matching
Longest Palindromic Substring	center or interval	expand/mirror	palindrome
Palindromic Substrings	center	expand around center	palindrome
Longest Palindromic Subsequence	`i`, `j`	shrink interval	DP
Longest Common Subsequence	`i`, `j`	advance one or both strings	two-string DP
Edit Distance	`i`, `j`	insert/delete/replace	two-string DP
Distinct Subsequences	`i`, `j`	match or skip source char	DP
Suffix Array Search	suffix rank	neighboring suffix comparisons	suffix indexing

Canonical Examples

KMP: Pattern Search

Invariant:

lps[i] is the length of the longest proper prefix that is also a suffix ending
at pattern[i].

Skeleton:

def build_lps(pattern):
    lps = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
            lps[i] = j
    return lps

Rolling Hash: Repeated DNA

Invariant:

The rolling window fingerprint represents the current fixed-length substring.

Use direct encoding for small alphabets when possible. It avoids collision risk from modular hashing.

Trie: Prefix Search

Invariant:

Every path from root to a node represents exactly one prefix.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.end = False

Palindrome: Expand Around Center

Invariant:

While left and right match, s[left:right + 1] is a palindrome.

def longest_palindrome(s):
    best = ""

    def expand(left, right):
        nonlocal best
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        candidate = s[left + 1:right]
        if len(candidate) > len(best):
            best = candidate

    for i in range(len(s)):
        expand(i, i)
        expand(i, i + 1)

    return best

Edit Distance: Two-String DP

Invariant:

dp[i][j] is the minimum edits to convert the first i characters of A into the
first j characters of B.

Decision Rules

Use KMP/Z when mismatch reuse in a single pattern matters. Use rolling hash when many substring equality checks are needed. Use a trie when many words share prefixes or prefix queries are frequent. Use multi-pattern automata when one text must be scanned against many patterns. Use palindrome center/interval logic when mirror symmetry is the structure. Use two-string DP when matching is gapped or edit-like. Use suffix indexing when the problem is about duplicate, distinct, or ordered substrings at scale.

Common Failure Modes

Restarting pattern search from scratch after a mismatch.
Mixing inclusive and half-open substring endpoints.
Ignoring rolling-hash collision risk.
Forgetting terminal markers in tries.
Not pruning trie branches in grid word search.
Treating subsequence matching as substring matching.
Using center expansion for a subsequence problem.

Performance Notes

KMP/Z are linear. Rolling hash is usually linear for fixed windows and O(n log n) when paired with binary search on length. Trie memory can be large because each node stores children. Two-string DP is usually O(mn) and can often compress rows. Suffix arrays are powerful but implementation-heavy.

Cheat Sheet

Pattern search:
    What can I reuse after mismatch?

Hashing:
    What substring equality checks repeat?

Trie:
    Which prefixes are shared?

Palindrome:
    What mirrors what?

Two-string DP:
    What do i and j mean?

Suffix indexing:
    Do I need global substring order or duplicates?

Extended Translation Notes

String structure problems should start with the relationship being reused.

Step 1: Identify Match Type

Match type	Meaning	Typical tools
exact substring	contiguous pattern occurrence	KMP, Z, rolling hash
prefix/suffix border	prefix equals suffix	KMP, Z
dictionary prefix	many words share starts	trie
many patterns	many words against one text	trie, Aho-Corasick
palindrome	mirrored equality	center expansion, DP, Manacher
subsequence	ordered but not contiguous	two-pointer, DP
edit alignment	insert/delete/replace	two-string DP
duplicate substring	repeated contiguous block	rolling hash, suffix array

Do not choose an algorithm before deciding whether the match is contiguous.

Step 2: Define the Unit

The unit may be:

character
substring
prefix
suffix
dictionary word
pattern state
palindrome center
two-string index pair
suffix rank

Step 3: Define Reuse After Mismatch

This is the heart of string matching.

Examples:

KMP:
    fallback to the longest border

Z algorithm:
    reuse the current prefix-match box

Trie:
    stop when a prefix edge is missing

Aho-Corasick:
    follow failure links

Edit distance:
    try insert, delete, replace

Step 4: Choose Endpoint Convention

Use half-open ranges when possible:

s[left:right]
left included
right excluded

This reduces off-by-one bugs in substring DP and hashing.

Strategy Invariants

KMP

Invariant:

j is the number of pattern characters currently matched.

On mismatch, KMP does not reset to zero unless no border remains.

Z Algorithm

Invariant:

The Z-box [left, right] is the farthest known interval matching the prefix.

Z is often better when every position needs a prefix-match length.

Rolling Hash

Invariant:

hash(left, right) is computed from prefix hashes in O(1).

Use double hashing or direct encoding when collision risk matters.

Trie

Invariant:

Every node represents the prefix formed by the path from root.

Terminal markers distinguish complete words from prefixes.

Palindrome

Invariant:

Expansion is valid while left and right characters mirror each other.

Interval DP uses:

s[i] == s[j] and inside interval is palindrome

Two-String DP

Invariant:

dp[i][j] describes the relationship between prefixes or suffixes of two
strings.

Define whether i and j mean prefix lengths or current suffix indexes before coding.

Worked Problem Snapshots

Repeated Substring Pattern

State:

longest border length of the whole string

Rule:

if n % (n - border) == 0, the string repeats a smaller unit

Engine:

KMP prefix table

Shortest Palindrome

Goal:

find longest palindromic prefix

Technique:

run KMP-style border on s + "#" + reversed(s)

Trap:

Use a separator that cannot appear as a normal bridge between the two strings.

Sum of Scores of Built Strings

State:

Z[i] = prefix match length starting at i

Answer:

sum of all Z values plus full string length

Engine:

Z algorithm

Longest Duplicate Substring

State:

candidate length

Check:

does any substring of this length have a repeated hash?

Engine:

binary search on length plus rolling hash

Word Search II

State:

grid position, trie node, visited path

Invariant:

If the current trie node has no edge for a character, no word under that prefix
can be found on this branch.

Engine:

trie plus backtracking

Longest Palindromic Subsequence

State:

i, j interval

Transition:

if s[i] == s[j]: 2 + solve(i + 1, j - 1)
else max(solve(i + 1, j), solve(i, j - 1))

Engine:

interval DP

Distinct Subsequences

State:

i in source, j in target

Choices:

skip source[i]
use source[i] if it matches target[j]

Engine:

two-string counting DP

Longest Common Substring

State:

dp[i][j] = length of common suffix ending at i and j

Trap:

Substring resets to zero on mismatch; subsequence does not.

Dry Runs

Dry Run: KMP Fallback

Pattern:

ababaca

When matching ababa and seeing a mismatch after length 5:

matched prefix = ababa
fallback length = lps[4] = 3
next comparison resumes as if aba is already matched

KMP reuses the border instead of restarting at the next text position.

Dry Run: Rolling Hash Window

String:

AAAAACCCCCAAAAA

For fixed length 10:

window 0: AAAAACCCCC -> store hash
window 1: AAAACCCCCA -> roll hash
...
later window: AAAAACCCCC -> hash repeats

The repeated hash identifies a repeated fixed-length substring.

Dry Run: Palindrome Center

String:

babad

Center at a:

left b, center a, right b -> bab
next expansion fails

Candidate:

bab

Another center gives aba; both length 3 answers are valid.