
Posts tagged as “substring”

花花酱 LeetCode 1520. Maximum Number of Non-Overlapping Substrings

Given a string s of lowercase letters, you need to find the maximum number of non-empty substrings of s that meet the following conditions:

  1. The substrings do not overlap, that is for any two substrings s[i..j] and s[k..l], either j < k or i > l is true.
  2. A substring that contains a certain character c must also contain all occurrences of c.

Find the maximum number of substrings that meet the above conditions. If there are multiple solutions with the same number of substrings, return the one with minimum total length. It can be shown that there exists a unique solution of minimum total length.

Notice that you can return the substrings in any order.

Example 1:

Input: s = "adefaddaccc"
Output: ["e","f","ccc"]
Explanation: The following are all the possible substrings that meet the conditions:
[
  "adefaddaccc"
  "adefadda",
  "ef",
  "e",
  "f",
  "ccc",
]
If we choose the first string, we cannot choose anything else and we'd get only 1. If we choose "adefadda", we are left with "ccc" which is the only one that doesn't overlap, thus obtaining 2 substrings. Notice also that it's not optimal to choose "ef" since it can be split into two. Therefore, the optimal way is to choose ["e","f","ccc"] which gives us 3 substrings. No other solution with the same number of substrings exists.

Example 2:

Input: s = "abbaccd"
Output: ["d","bb","cc"]
Explanation: Notice that while the set of substrings ["d","abba","cc"] also has length 3, it's considered incorrect since it has larger total length.

Constraints:

  • 1 <= s.length <= 10^5
  • s contains only lowercase English letters.

Solution: Greedy

Observation: If a valid substring contains shorter valid substrings, ignore the longer one and use the shorter ones.
e.g. “abbeefba” is a valid substring, but it contains three shorter valid substrings: “bbeefb”, “ee” and “f”. It won’t be part of the optimal solution, since choosing a shorter one never reduces the count and leaves room for more non-overlapping substrings. “bbeefb” in turn contains “ee” and “f”, so it isn’t optimal either. Thus, the optimal ones are “ee” and “f”.

  1. We just need to record the first and last occurrence of each character.
  2. When we meet a character for the first time, we must include everything from the current position to its last position. e.g. “abbeefba” | ccc, from first ‘a’ to last ‘a’, we need to cover “abbeefba”.
  3. If any character in that range has a larger end position, we must extend the string. e.g. “abcabbcc” | efg, from first ‘a’ to last ‘a’, we have characters ‘b’ and ‘c’, so we have to extend the string to cover all ‘b’s and ‘c’s. Our first valid substring is extended from “abca” to “abcabbcc”.
  4. If any character in the covered range first occurs before the start of the range, the substring is invalid. e.g. ab | “cbc”, from first ‘c’ to last ‘c’, we have ‘b’, but the first ‘b’ lies outside the range, so ‘b’ is not fully covered and “cbc” is an invalid substring.
  5. For the first valid substring, we append it to the ans array. “abbeefba” => ans = [“abbeefba”]
  6. If we find a shorter substring that is fully covered by the previous valid substring, we replace that substring with the shorter one. e.g.
    “abbeefba” | ccc => ans = [“abbeefba”]
    “bbeefb” (inside “abbeefba”) => ans = [“bbeefb”]
    “ee” (inside “bbeefb”) => ans = [“ee”]
  7. If the current substring does not overlap with the previous one, append it to the ans array.
    “ee” => ans = [“ee”]
    “f” (starts after “ee” ends) => ans = [“ee”, “f”]
    “ccc” (starts after “f” ends) => ans = [“ee”, “f”, “ccc”]

Time complexity: O(n)
Space complexity: O(1)
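
The steps above can be sketched in Python as follows. This is a minimal illustration under my own naming, not the original listing: the function max_num_of_substrings and the helper extend are hypothetical names, and the input is assumed to contain only lowercase letters as in the constraints.

```python
from typing import List


def max_num_of_substrings(s: str) -> List[str]:
    # Step 1: record the first and last occurrence of each character.
    first, last = {}, {}
    for i, c in enumerate(s):
        first.setdefault(c, i)
        last[c] = i

    def extend(start: int) -> int:
        """Return the end index of the smallest valid substring that
        starts at `start`, or -1 if no valid substring starts there."""
        end = last[s[start]]
        i = start
        while i <= end:
            c = s[i]
            if first[c] < start:        # step 4: c is not fully covered
                return -1
            end = max(end, last[c])     # step 3: extend to cover all c's
            i += 1
        return end

    ans: List[str] = []
    prev_end = -1
    for i, c in enumerate(s):
        if first[c] != i:               # only start at a first occurrence
            continue
        end = extend(i)
        if end == -1:
            continue
        if i > prev_end:                # step 7: no overlap, start a new slot
            ans.append("")
        ans[-1] = s[i:end + 1]          # steps 5 and 6: fill or replace
        prev_end = end
    return ans
```

Since there are at most 26 distinct first occurrences and each call to extend scans at most n characters, the sketch stays within the stated O(n) bound for a fixed alphabet.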

C++

花花酱 LeetCode 1461. Check If a String Contains All Binary Codes of Size K

Given a binary string s and an integer k.

Return True if every binary code of length k is a substring of s. Otherwise, return False.

Example 1:

Input: s = "00110110", k = 2
Output: true
Explanation: The binary codes of length 2 are "00", "01", "10" and "11". They can all be found as substrings at indices 0, 1, 3 and 2 respectively.

Example 2:

Input: s = "00110", k = 2
Output: true

Example 3:

Input: s = "0110", k = 1
Output: true
Explanation: The binary codes of length 1 are "0" and "1", it is clear that both exist as a substring. 

Example 4:

Input: s = "0110", k = 2
Output: false
Explanation: The binary code "00" is of length 2 and doesn't exist in the string.

Example 5:

Input: s = "0000000001011100", k = 4
Output: false

Constraints:

  • 1 <= s.length <= 5 * 10^5
  • s consists of 0’s and 1’s only.
  • 1 <= k <= 20

Solution: Hashtable

Insert every substring of s of length k into a hashtable. All binary codes of length k are present if and only if the size of the hashtable is 2^k.

Time complexity: O(n*k)
Space complexity: O(2^k*k) -> O(2^k)
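
A minimal Python sketch of this idea, using a built-in set as the hashtable. The function name has_all_codes is my own choice for illustration.

```python
def has_all_codes(s: str, k: int) -> bool:
    # Collect every distinct substring of length k.
    seen = {s[i:i + k] for i in range(len(s) - k + 1)}
    # There are exactly 2^k binary codes of length k.
    return len(seen) == 1 << k
```

If s is shorter than k, the range is empty, the set stays empty, and the function returns False.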

Runtime of the C++ versions:
std::string_view: 484 ms, 40.1 MB
std::string: 644 ms, 58.6 MB

C++

花花酱 LeetCode 87. Scramble String

Given a string s1, we may represent it as a binary tree by partitioning it into two non-empty substrings recursively.

Below is one possible representation of s1 = "great":

    great
   /    \
  gr    eat
 / \    /  \
g   r  e   at
           / \
          a   t

To scramble the string, we may choose any non-leaf node and swap its two children.

For example, if we choose the node "gr" and swap its two children, it produces a scrambled string "rgeat".

    rgeat
   /    \
  rg    eat
 / \    /  \
r   g  e   at
           / \
          a   t

We say that "rgeat" is a scrambled string of "great".

Similarly, if we continue to swap the children of nodes "eat" and "at", it produces a scrambled string "rgtae".

    rgtae
   /    \
  rg    tae
 / \    /  \
r   g  ta  e
       / \
      t   a

We say that "rgtae" is a scrambled string of "great".

Given two strings s1 and s2 of the same length, determine if s2 is a scrambled string of s1.

Example 1:

Input: s1 = "great", s2 = "rgeat"
Output: true

Example 2:

Input: s1 = "abcde", s2 = "caebd"
Output: false

Solution: Recursion

isScramble(s1, s2)
if s1 == s2: return true
if sorted(s1) != sorted(s2): return false
We try all possible partitions, where L = len(s1) and 1 <= l < L:

  1. s1[0:l] vs s2[0:l] && s1[l:] vs s2[l:]
  2. s1[0:l] vs s2[L-l:] && s1[l:] vs s2[0:L-l]

Time complexity: O(n^5)
Space complexity: O(n^4)

C++

Python3
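
A minimal sketch of the recursion described above, with memoization via functools.lru_cache. The name is_scramble is my own, and this is an illustration of the approach rather than the original listing.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_scramble(s1: str, s2: str) -> bool:
    if s1 == s2:
        return True
    # Different character multisets can never be scrambles of each other.
    if sorted(s1) != sorted(s2):
        return False
    n = len(s1)
    for l in range(1, n):
        # Case 1: the children are not swapped at this split.
        if is_scramble(s1[:l], s2[:l]) and is_scramble(s1[l:], s2[l:]):
            return True
        # Case 2: the children are swapped, so the first l characters
        # of s1 match the last l characters of s2.
        if is_scramble(s1[:l], s2[n - l:]) and is_scramble(s1[l:], s2[:n - l]):
            return True
    return False
```

The cache is keyed on the pair of substrings, so each pair is solved once.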

花花酱 LeetCode 1371. Find the Longest Substring Containing Vowels in Even Counts

Given the string s, return the size of the longest substring containing each vowel an even number of times. That is, ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’ must appear an even number of times.

Example 1:

Input: s = "eleetminicoworoep"
Output: 13
Explanation: The longest substring is "leetminicowor" which contains two each of the vowels: e, i and o and zero of the vowels: a and u.

Example 2:

Input: s = "leetcodeisgreat"
Output: 5
Explanation: The longest substring is "leetc" which contains two e's.

Example 3:

Input: s = "bcbcbc"
Output: 6
Explanation: In this case, the given string "bcbcbc" is the longest because all vowels: a, e, i, o and u appear zero times.

Constraints:

  • 1 <= s.length <= 5 x 10^5
  • s contains only lowercase English letters.

Solution: HashTable

Record the first index at which each state occurs. When the same state occurs again at a later index, every vowel appears an even number of times between the two occurrences, so index - first_index is the length of an all-even-vowel substring.

State: {a: odd|even, e: odd|even, …, u: odd|even}.

There are a total of 2^5 = 32 states, each of which can be represented as a 5-bit binary string.

Whenever a vowel occurs, we flip its bit (odd->even, even->odd) using XOR.

Time complexity: O(5*n)
Space complexity: O(32)

C++

Python3
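
A minimal sketch of the state-tracking idea above. The function name find_the_longest_substring and the bit assignment per vowel are my own choices for illustration.

```python
def find_the_longest_substring(s: str) -> int:
    # One bit per vowel; a set bit means that vowel has occurred
    # an odd number of times so far.
    bit = {'a': 1, 'e': 2, 'i': 4, 'o': 8, 'u': 16}
    # The empty prefix has state 0 and "ends" at index -1.
    first = {0: -1}
    state = 0
    best = 0
    for i, c in enumerate(s):
        state ^= bit.get(c, 0)          # flip the bit if c is a vowel
        if state in first:
            # Same state seen before: every vowel count in between is even.
            best = max(best, i - first[state])
        else:
            first[state] = i
    return best
```

Only the first index of each state is stored, because the earliest occurrence yields the longest substring.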

花花酱 LeetCode 1358. Number of Substrings Containing All Three Characters

Given a string s consisting only of characters a, b and c.

Return the number of substrings containing at least one occurrence of all these characters a, b and c.

Example 1:

Input: s = "abcabc"
Output: 10
Explanation: The substrings containing at least one occurrence of the characters a, b and c are "abc", "abca", "abcab", "abcabc", "bca", "bcab", "bcabc", "cab", "cabc" and "abc" (again).

Example 2:

Input: s = "aaacb"
Output: 3
Explanation: The substrings containing at least one occurrence of the characters a, b and c are "aaacb", "aacb" and "acb".

Example 3:

Input: s = "abc"
Output: 1

Constraints:

  • 3 <= s.length <= 5 x 10^4
  • s only consists of a, b or c characters.

Solution

Record the last index of each character.

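At each index i, any index j <= min(last_a, last_b, last_c) can serve as the starting point of a valid substring ending at i, so there are min(last_a, last_b, last_c) + 1 valid substrings ending at i (a character that has not appeared yet counts as last index -1). Summing this over all i gives the answer.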

e.g. aabbabcc…
last_a = 4
last_b = 5
last_c = 7
min(last_a, last_b, last_c) = 4
aabba | bcc
We can choose any char with index <= 4 as the starting point; there are 5 of them:
aabbabcc
abbabcc
bbabcc
babcc
abcc

Time complexity: O(n)
Space complexity: O(1)

C++

Python3