Press "Enter" to skip to content

花花酱 LeetCode 187. Repeated DNA Sequences

The DNA sequence is composed of a series of nucleotides abbreviated as 'A''C''G', and 'T'.

  • For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example 1:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]

Constraints:

  • 1 <= s.length <= 105
  • s[i] is either 'A''C''G', or 'T'.

Solution: Hashtable

Store each subsequence into the hashtable, add it into the answer array when it appears for the second time.

Time complexity: O(n*l)
Space complexity: O(n*l) -> O(n) / string_view

C++

Optimization

There are 4 type of letters, each can be encoded into 2 bits. We can represent the 10-letter-long string using 20 lowest bit of a int32. We can use int as key for the hashtable.

A -> 00
C -> 01
G -> 10
T -> 11

Time complexity: O(n)
Space complexity: O(n)

C++

请尊重作者的劳动成果,转载请注明出处!花花保留对文章/视频的所有权利。
如果您喜欢这篇文章/视频,欢迎您捐赠花花。
If you like my articles / videos, donations are welcome.

Buy anything from Amazon to support our website
您可以通过在亚马逊上购物(任意商品)来支持我们

Paypal
Venmo
huahualeetcode
微信打赏

Be First to Comment

Leave a Reply