{"id":8905,"date":"2021-11-28T20:54:17","date_gmt":"2021-11-29T04:54:17","guid":{"rendered":"https:\/\/zxi.mytechroad.com\/blog\/?p=8905"},"modified":"2021-11-28T20:55:29","modified_gmt":"2021-11-29T04:55:29","slug":"leetcode-187-repeated-dna-sequences","status":"publish","type":"post","link":"https:\/\/zxi.mytechroad.com\/blog\/hashtable\/leetcode-187-repeated-dna-sequences\/","title":{"rendered":"\u82b1\u82b1\u9171 LeetCode 187. Repeated DNA Sequences"},"content":{"rendered":"\n<p>The&nbsp;<strong>DNA sequence<\/strong>&nbsp;is composed of a series of nucleotides abbreviated as&nbsp;<code>'A'<\/code>,&nbsp;<code>'C'<\/code>,&nbsp;<code>'G'<\/code>, and&nbsp;<code>'T'<\/code>.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>For example,&nbsp;<code>\"ACGAATTCCG\"<\/code>&nbsp;is a&nbsp;<strong>DNA sequence<\/strong>.<\/li><\/ul>\n\n\n\n<p>When studying&nbsp;<strong>DNA<\/strong>, it is useful to identify repeated sequences within the DNA.<\/p>\n\n\n\n<p>Given a string&nbsp;<code>s<\/code>&nbsp;that represents a&nbsp;<strong>DNA sequence<\/strong>, return all the&nbsp;<strong><code>10<\/code>-letter-long<\/strong>&nbsp;sequences (substrings) that occur more than once in a DNA molecule. 
You may return the answer in&nbsp;<strong>any order<\/strong>.<\/p>\n\n\n\n<p><strong>Example 1:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted;crayon:false\"><strong>Input:<\/strong> s = \"AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT\"\n<strong>Output:<\/strong> [\"AAAAACCCCC\",\"CCCCCAAAAA\"]\n<\/pre>\n\n\n\n<p><strong>Example 2:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted;crayon:false\"><strong>Input:<\/strong> s = \"AAAAAAAAAAAAA\"\n<strong>Output:<\/strong> [\"AAAAAAAAAA\"]\n<\/pre>\n\n\n\n<p><strong>Constraints:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>1 &lt;= s.length &lt;= 10<sup>5<\/sup><\/code><\/li><li><code>s[i]<\/code>&nbsp;is either&nbsp;<code>'A'<\/code>,&nbsp;<code>'C'<\/code>,&nbsp;<code>'G'<\/code>, or&nbsp;<code>'T'<\/code>.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Solution: Hashtable<\/strong><\/h2>\n\n\n\n<p>Count every 10-letter substring in a hashtable, and add it to the answer array when its count reaches 2, so each repeated sequence is reported exactly once.<\/p>\n\n\n\n<p>Time complexity: O(n*l), where l = 10 is the sequence length (each substring takes O(l) to hash and compare)<br>Space complexity: O(n*l) with std::string keys, or O(n) with std::string_view keys that point into s (excluding the output)<\/p>\n\n\n\n<div class=\"responsive-tabs\">\n<h2 class=\"tabtitle\">C++<\/h2>\n<div class=\"tabcontent\">\n\n<pre lang=\"c++\">\n\/\/ Author: Huahua\nclass Solution {\npublic:\n  vector<string> findRepeatedDnaSequences(string_view s) {\n    constexpr int kLen = 10;\n    const int n = s.length();\n    \/\/ Keys are views into s, so no substring is copied into the table.\n    unordered_map<string_view, int> m;\n    vector<string> ans;\n    \/\/ Count every 10-letter window; record it the first time its count hits 2.\n    for (int i = 0; i + kLen <= n; ++i)\n      if (++m[s.substr(i, kLen)] == 2)\n        ans.emplace_back(s.substr(i, kLen));\n    return ans;\n  }\n};\n<\/pre>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Optimization<\/strong><\/h2>\n\n\n\n<p>There are only 4 types of letters, so each can be encoded in 2 bits, and a 10-letter-long string fits in the lowest 20 bits of an int32. 
We can use that int, instead of the substring, as the hashtable key.<\/p>\n\n\n\n<p>A -&gt; 00<br>C -&gt; 01<br>G -&gt; 10<br>T -&gt; 11<\/p>\n\n\n\n<p>Scanning left to right, each new letter updates the key in O(1): shift the key left by 2 bits, mask it back to 20 bits (dropping the oldest letter), and OR in the new letter's code.<\/p>\n\n\n\n<p>Time complexity: O(n)<br>Space complexity: O(n)<\/p>\n\n\n\n<div class=\"responsive-tabs\">\n<h2 class=\"tabtitle\">C++<\/h2>\n<div class=\"tabcontent\">\n\n<pre lang=\"c++\">\n\/\/ Author: Huahua\nclass Solution {\npublic:\n  vector<string> findRepeatedDnaSequences(string s) {\n    constexpr int kLen = 10;\n    \/\/ Lowest 20 bits set: room for 10 letters x 2 bits.\n    constexpr int mask = (1 << (2 * kLen)) - 1;\n    const int n = s.length();\n    unordered_map<int, int> m;\n\n    \/\/ Map each letter to its 2-bit code (other entries are unused).\n    array<int, 128> km{};\n    km['A'] = 0;\n    km['C'] = 1;\n    km['G'] = 2;\n    km['T'] = 3;\n\n    vector<string> ans;\n    for (int i = 0, key = 0; i < n; ++i) {\n      \/\/ Shift in the new letter and drop the oldest one.\n      key = ((key << 2) &#038; mask) | km[s[i]];\n      \/\/ The first full 10-letter window ends at index kLen - 1.\n      if (i < kLen - 1) continue;\n      if (++m[key] == 2)\n        ans.push_back(s.substr(i - kLen + 1, kLen));\n    }\n    return ans;\n  }\n};\n<\/pre>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>The&nbsp;DNA sequence&nbsp;is composed of a series of nucleotides abbreviated as&nbsp;&#8216;A&#8217;,&nbsp;&#8216;C&#8217;,&nbsp;&#8216;G&#8217;, and&nbsp;&#8216;T&#8217;. For example,&nbsp;&#8220;ACGAATTCCG&#8221;&nbsp;is a&nbsp;DNA sequence. 
When studying&nbsp;DNA, it is useful to identify repeated sequences within&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[70],"tags":[272,82,177,4],"class_list":["post-8905","post","type-post","status-publish","format-standard","hentry","category-hashtable","tag-encoding","tag-hashtable","tag-medium","tag-string","entry","simple"],"_links":{"self":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts\/8905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/comments?post=8905"}],"version-history":[{"count":2,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts\/8905\/revisions"}],"predecessor-version":[{"id":8907,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts\/8905\/revisions\/8907"}],"wp:attachment":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/media?parent=8905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/categories?post=8905"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/tags?post=8905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}