Leetcode 30: Substring with Concatenation of All Words

1. Problem Statement (Simple Explanation):

You are given:

A string s.
An array of strings words, where:

All words[i] have the same length.

A concatenated string in this problem is:

A string formed by concatenating all the words in words in any order, using each word exactly as many times as it appears in words (i.e., including duplicates).

You must:

Find all starting indices i in s such that the substring:

s[i .. i + totalLen - 1]

is a concatenated string of all words (in some permutation).

Return the list of indices (order doesn’t matter).

Where:

wordLen = length(words[0])
wordCount = len(words)
totalLen = wordLen * wordCount

2. Examples:

Example 1:

Input:

s = "barfoothefoobarman"

words = ["foo","bar"]

wordLen = 3
wordCount = 2
totalLen = 6

Valid substrings:

Starting at index 0: "barfoo" → "bar" + "foo" (a permutation of ["foo","bar"])
Starting at index 9: "foobar" → "foo" + "bar"

Output: [0, 9]

Example 2:

Input:

s = "wordgoodgoodgoodbestword"

words = ["word","good","best","word"]

No substring is a concatenation of all words (with required counts).

Output: []

Example 3:

Input:

s = "barfoofoobarthefoobarman"

words = ["bar","foo","the"]

Valid substrings (length 3 words × 3 = 9):

i = 6: "foobarthe" → ["foo","bar","the"]
i = 9: "barthefoo" → ["bar","the","foo"]
i = 12: "thefoobar" → ["the","foo","bar"]

Output: [6, 9, 12]

Constraints:

1 <= |s| <= 104
1 <= words.length <=5000
1 <= |words[i]| <= 30
All words[i] have the same length.
s and words[i] contain only lowercase English letters.

3. Approach – Sliding Window with Word Frequency (O(n · wordLen)):

This is a classic hard string + hashmap + sliding window problem.

Key Observations:

All words have the same length L = wordLen.
We are looking for substrings of fixed length totalLen = L * wordCount.
Each valid substring can be thought of as wordCount consecutive "blocks" of size L:

word1 = s[i .. i+L-1]
word2 = s[i+L .. i+2L-1]
...

The order of words does not matter, only their frequency matters.

Thus, for each candidate starting index i, we need to check if the wordCount blocks match words in terms of word counts.

Brute force would be O(n.wordCount) per starting index ⇒ too slow.

Instead, we use a sliding window aligned by word boundaries with hashing.

High-level Strategy:

Precompute a frequency map targetCount from words:

targetCount[word] = number of times word appears in words.

Because words are length L, valid window starts are aligned at offsets:

offset = 0, 1, 2, ..., L-1
We use L separate sliding windows, each stepping by L.

For each offset:

Initialize:

left = offset
currentCount = {} (map from word → count in current window)
countWords = 0 (number of valid words in the current window)

For right from offset to len(s)-L in steps of L:

Extract word = s[right .. right+L-1].
If word is in targetCount:

Add it to currentCount[word].
Increment countWords.
If currentCount[word] > targetCount[word]:

Window has too many of this word.
Shrink window from the left:

While currentCount[word] > targetCount[word]:

leftWord = s[left .. left+L-1]
Decrement currentCount[leftWord]
left += L
countWords--

If countWords == wordCount:

We have a valid window of wordCount words.
Record left as an answer.
Then shrink left by 1 word to continue searching the next window:

leftWord = s[left .. left+L-1]
Decrement currentCount[leftWord]
left += L
countWords--

Else (word not in targetCount):

Reset window:

currentCount.clear()
countWords = 0
left = right + L (start after this junk word)

Why Offsets?

We must ensure that words are always taken in aligned chunks of length L. By iterating over starting positions offset in [0..L-1] and then stepping by L, we inspect all possible word-aligned substrings exactly once.

Pseudo-code (Sliding Window by Offset):