<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>encoding Archives - Huahua&#039;s Tech Road</title>
	<atom:link href="https://zxi.mytechroad.com/blog/tag/encoding/feed/" rel="self" type="application/rss+xml" />
	<link>https://zxi.mytechroad.com/blog/tag/encoding/</link>
	<description></description>
	<lastBuildDate>Mon, 29 Nov 2021 04:55:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.0.8</generator>

<image>
	<url>https://zxi.mytechroad.com/blog/wp-content/uploads/2017/09/cropped-photo-32x32.jpg</url>
	<title>encoding Archives - Huahua&#039;s Tech Road</title>
	<link>https://zxi.mytechroad.com/blog/tag/encoding/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>花花酱 LeetCode 187. Repeated DNA Sequences</title>
		<link>https://zxi.mytechroad.com/blog/hashtable/leetcode-187-repeated-dna-sequences/</link>
					<comments>https://zxi.mytechroad.com/blog/hashtable/leetcode-187-repeated-dna-sequences/#respond</comments>
		
		<dc:creator><![CDATA[zxi]]></dc:creator>
		<pubDate>Mon, 29 Nov 2021 04:54:17 +0000</pubDate>
				<category><![CDATA[Hashtable]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[hashtable]]></category>
		<category><![CDATA[medium]]></category>
		<category><![CDATA[string]]></category>
		<guid isPermaLink="false">https://zxi.mytechroad.com/blog/?p=8905</guid>

					<description><![CDATA[<p>The&#160;DNA sequence&#160;is composed of a series of nucleotides abbreviated as&#160;'A',&#160;'C',&#160;'G', and&#160;'T'. For example,&#160;"ACGAATTCCG"&#160;is a&#160;DNA sequence. When studying&#160;DNA, it is useful to identify repeated sequences within&#8230;</p>
<p>The post <a rel="nofollow" href="https://zxi.mytechroad.com/blog/hashtable/leetcode-187-repeated-dna-sequences/">花花酱 LeetCode 187. Repeated DNA Sequences</a> appeared first on <a rel="nofollow" href="https://zxi.mytechroad.com/blog">Huahua&#039;s Tech Road</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>The&nbsp;<strong>DNA sequence</strong>&nbsp;is composed of a series of nucleotides abbreviated as&nbsp;<code>'A'</code>,&nbsp;<code>'C'</code>,&nbsp;<code>'G'</code>, and&nbsp;<code>'T'</code>.</p>



<ul><li>For example,&nbsp;<code>"ACGAATTCCG"</code>&nbsp;is a&nbsp;<strong>DNA sequence</strong>.</li></ul>



<p>When studying&nbsp;<strong>DNA</strong>, it is useful to identify repeated sequences within the DNA.</p>



<p>Given a string&nbsp;<code>s</code>&nbsp;that represents a&nbsp;<strong>DNA sequence</strong>, return all the&nbsp;<strong><code>10</code>-letter-long</strong>&nbsp;sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in&nbsp;<strong>any order</strong>.</p>



<p><strong>Example 1:</strong></p>



<pre class="wp-block-preformatted;crayon:false"><strong>Input:</strong> s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
<strong>Output:</strong> ["AAAAACCCCC","CCCCCAAAAA"]
</pre>



<p><strong>Example 2:</strong></p>



<pre class="wp-block-preformatted;crayon:false"><strong>Input:</strong> s = "AAAAAAAAAAAAA"
<strong>Output:</strong> ["AAAAAAAAAA"]
</pre>



<p><strong>Constraints:</strong></p>



<ul><li><code>1 &lt;= s.length &lt;= 10<sup>5</sup></code></li><li><code>s[i]</code>&nbsp;is either&nbsp;<code>'A'</code>,&nbsp;<code>'C'</code>,&nbsp;<code>'G'</code>, or&nbsp;<code>'T'</code>.</li></ul>



<h2><strong>Solution: Hashtable</strong></h2>



<p>Store each 10-letter substring in the hashtable and add it to the answer array when it appears for the second time.</p>



<p>Time complexity: O(n*l), where l = 10 is the substring length<br>Space complexity: O(n*l) with string keys, O(n) with string_view keys</p>



<div class="responsive-tabs">
<h2 class="tabtitle">C++</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag">// Author: Huahua
class Solution {
public:
  vector&lt;string&gt; findRepeatedDnaSequences(string_view s) {
    constexpr int kLen = 10;    
    const int n = s.length();
    unordered_map&lt;string_view, int&gt; m;   
    vector&lt;string&gt; ans;
    for (int i = 0; i + kLen &lt;= n; ++i)
      if (++m[s.substr(i, kLen)] == 2)
        ans.emplace_back(s.substr(i, kLen));
    return ans;
  }
};</pre>
</div></div>
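<p>The same counting idea can be sketched in Python. This is a minimal illustration rather than the author's code: the function name <code>find_repeated_dna_sequences</code> and the <code>length</code> parameter are assumptions introduced here.</p>

```python
def find_repeated_dna_sequences(s, length=10):
    """Return every substring of the given length that occurs more than once."""
    seen = {}  # substring -> number of occurrences so far
    ans = []
    for i in range(len(s) - length + 1):
        sub = s[i:i + length]
        seen[sub] = seen.get(sub, 0) + 1
        # Report a substring exactly once: when it shows up the second time.
        if seen[sub] == 2:
            ans.append(sub)
    return ans
```

<p>Each substring is copied into the dictionary, which is where the O(n*l) space of the string-keyed version comes from.</p>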



<h2><strong>Optimization</strong></h2>



<p>There are only 4 types of letters, so each one can be encoded in 2 bits. A 10-letter string then fits in the lowest 20 bits of a 32-bit integer, which we can use as the hashtable key instead of the string itself.</p>



<p>A -&gt; 00<br>C -&gt; 01<br>G -&gt; 10<br>T -&gt; 11</p>



<p>Time complexity: O(n)<br>Space complexity: O(n)</p>



<div class="responsive-tabs">
<h2 class="tabtitle">C++</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag">// Author: Huahua
class Solution {
public:
  vector&lt;string&gt; findRepeatedDnaSequences(string s) {
    constexpr int kLen = 10;
    constexpr int mask = (1 &lt;&lt; (2 * kLen)) -1;
    const int n = s.length();
    unordered_map&lt;int, int&gt; m;

    array&lt;int, 128&gt; km;
    km['A'] = 0;
    km['C'] = 1;
    km['G'] = 2;
    km['T'] = 3;
    
    vector&lt;string&gt; ans;
    for(int i = 0, key = 0; i &lt; n; ++i) {
      key = ((key &lt;&lt; 2) &amp; mask) | km[s[i]];
      if (i &lt; kLen - 1) continue;      
      if (++m[key] == 2)
        ans.push_back(s.substr(i - kLen + 1, kLen));
    }
    return ans;
  }
};</pre>
</div></div>
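<p>The rolling 2-bit key can also be sketched in Python. This is a minimal illustration under the A/C/G/T mapping given above: the function name <code>find_repeated_dna_sequences_bits</code> and the <code>length</code> parameter are assumptions introduced here.</p>

```python
def find_repeated_dna_sequences_bits(s, length=10):
    """Find repeated substrings using a rolling 2-bit-per-letter integer key."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    mask = (1 << (2 * length)) - 1  # keeps only the lowest 2 * length bits
    counts = {}  # integer key -> number of occurrences so far
    ans = []
    key = 0
    for i, ch in enumerate(s):
        # Shift in the new letter and drop the letter that left the window.
        key = ((key << 2) & mask) | code[ch]
        if i < length - 1:
            continue  # the window is not full yet
        counts[key] = counts.get(key, 0) + 1
        if counts[key] == 2:
            ans.append(s[i - length + 1:i + 1])
    return ans
```

<p>Updating the key takes constant time per character, so no substring is built unless it is actually added to the answer.</p>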
]]></content:encoded>
					
					<wfw:commentRss>https://zxi.mytechroad.com/blog/hashtable/leetcode-187-repeated-dna-sequences/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>花花酱 LeetCode 1531. String Compression II</title>
		<link>https://zxi.mytechroad.com/blog/dynamic-programming/leetcode-1531-string-compression-ii/</link>
					<comments>https://zxi.mytechroad.com/blog/dynamic-programming/leetcode-1531-string-compression-ii/#respond</comments>
		
		<dc:creator><![CDATA[zxi]]></dc:creator>
		<pubDate>Sun, 26 Jul 2020 20:01:53 +0000</pubDate>
				<category><![CDATA[Dynamic Programming]]></category>
		<category><![CDATA[dp]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[hard]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[run length]]></category>
		<guid isPermaLink="false">https://zxi.mytechroad.com/blog/?p=7169</guid>

					<description><![CDATA[<p>Run-length encoding&#160;is a string compression method that works by&#160;replacing consecutive identical characters (repeated 2 or more times) with the concatenation of the character and the&#8230;</p>
<p>The post <a rel="nofollow" href="https://zxi.mytechroad.com/blog/dynamic-programming/leetcode-1531-string-compression-ii/">花花酱 LeetCode 1531. String Compression II</a> appeared first on <a rel="nofollow" href="https://zxi.mytechroad.com/blog">Huahua&#039;s Tech Road</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe title="花花酱 LeetCode 1531. String Compression II - 刷题找工作 EP347" width="500" height="281" src="https://www.youtube.com/embed/UIK00l_AiPQ?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div></figure>



<p><a href="http://en.wikipedia.org/wiki/Run-length_encoding">Run-length encoding</a>&nbsp;is a string compression method that works by&nbsp;replacing consecutive identical characters (repeated 2 or more times) with the concatenation of the character and the number marking the count of the characters (length of the run). For example, to compress the string&nbsp;<code>"aabccc"</code>&nbsp;we replace&nbsp;<code>"aa"</code>&nbsp;by&nbsp;<code>"a2"</code>&nbsp;and replace&nbsp;<code>"ccc"</code>&nbsp;by&nbsp;<code>"c3"</code>. Thus the compressed string becomes&nbsp;<code>"a2bc3"</code>.</p>



<p>Notice that in this problem, we are not adding&nbsp;<code>'1'</code>&nbsp;after single characters.</p>
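<p>As a quick illustration of the encoding rule described above, here is a minimal Python sketch. The function name <code>run_length_encode</code> is an assumption introduced here and is not part of the problem statement.</p>

```python
def run_length_encode(s: str) -> str:
    """Encode runs of identical characters; single characters get no count."""
    out = []
    i = 0
    while i < len(s):
        j = i
        # Advance j to the end of the current run of s[i].
        while j < len(s) and s[j] == s[i]:
            j += 1
        run = j - i
        out.append(s[i])
        if run > 1:
            out.append(str(run))  # no '1' is written after a single character
        i = j
    return "".join(out)
```

<p>A run contributes 1 character when its length is 1, 2 characters for lengths 2 to 9, 3 characters for lengths 10 to 99, and 4 characters for length 100.</p>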



<p>Given a&nbsp;string&nbsp;<code>s</code>&nbsp;and an integer&nbsp;<code>k</code>. You need to delete&nbsp;<strong>at most</strong>&nbsp;<code>k</code>&nbsp;characters from&nbsp;<code>s</code>&nbsp;such that the run-length encoded version of&nbsp;<code>s</code>&nbsp;has minimum length.</p>



<p>Find the&nbsp;<em>minimum length of the run-length encoded&nbsp;version of&nbsp;</em><code>s</code><em>&nbsp;after deleting at most&nbsp;</em><code>k</code><em>&nbsp;characters</em>.</p>



<p><strong>Example 1:</strong></p>



<pre class="wp-block-preformatted;crayon:false"><strong>Input:</strong> s = "aaabcccd", k = 2
<strong>Output:</strong> 4
<strong>Explanation: </strong>Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length of the compressed string to 5, for instance delete 2 'a' then we will have s = "abcccd" which compressed is abc3d. Therefore, the optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" of length 4.</pre>



<p><strong>Example 2:</strong></p>



<pre class="wp-block-preformatted;crayon:false"><strong>Input:</strong> s = "aabbaa", k = 2
<strong>Output:</strong> 2
<strong>Explanation: </strong>If we delete both 'b' characters, the resulting compressed string would be "a4" of length 2.
</pre>



<p><strong>Example 3:</strong></p>



<pre class="wp-block-preformatted;crayon:false"><strong>Input:</strong> s = "aaaaaaaaaaa", k = 0
<strong>Output:</strong> 3
<strong>Explanation: </strong>Since k is zero, we cannot delete anything. The compressed string is "a11" of length 3.
</pre>



<p><strong>Constraints:</strong></p>



<ul><li><code>1 &lt;= s.length &lt;= 100</code></li><li><code>0 &lt;= k &lt;= s.length</code></li><li><code>s</code>&nbsp;contains only lowercase English letters.</li></ul>



<h2><strong>Solution 0: Brute Force DFS (TLE)</strong></h2>



<p>Time complexity: O(C(n,k))<br>Space complexity: O(k)</p>



<div class="responsive-tabs">
<h2 class="tabtitle">C++</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag">class Solution {
public:
  int getLengthOfOptimalCompression(string s, int k) {
    const int n = s.length();
    auto encode = [&amp;]() -&gt; int {
      char p = '$';
      int count = 0;
      int len = 0;
      for (char c : s) {
        if (c == '.') continue;
        if (c != p) {
          p = c;
          count = 0;
        }
        ++count;
        if (count &lt;= 2 || count == 10 || count == 100)
          ++len;               
      }
      return len;
    };
    function&lt;int(int, int)&gt; dfs = [&amp;](int start, int k) -&gt; int {
      if (start == n || k == 0) return encode();
      int ans = n;
      for (int i = start; i &lt; n; ++i) {
        char c = s[i];
        s[i] = '.'; // delete
        ans = min(ans, dfs(i + 1, k - 1));
        s[i] = c;
      }
      return ans;
    };
    return dfs(0, k);
  }
};</pre>
</div></div>



<h2><strong>Solution 1: DP</strong></h2>



<figure class="wp-block-image size-large"><img width="960" height="540" src="https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-2.png" alt="" class="wp-image-7175" srcset="https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-2.png 960w, https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-2-300x169.png 300w, https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-2-768x432.png 768w" sizes="(max-width: 960px) 100vw, 960px" /></figure>



<p>State: <br>i: the start index of the substring<br>last: last char<br>len: run-length<br>k: # of chars that can be deleted.<br><br>base case:<br>1. k &lt; 0: return inf # invalid <br>2. i &gt;= s.length(): return 0 # done<br></p>



<p>Transition:<br>1. if s[i] == last: return carry + dp(i + 1, last, len + 1, k), where carry = 1 if len is 1, 9 or 99 (the encoded count gains a digit) and 0 otherwise.</p>



<p>2. if s[i] != last:<br>  return min(1 + dp(i + 1, s[i], 1, k), # start a new group with s[i]<br>     dp(i + 1, last, len, k - 1)) # delete / skip s[i], keep the current run as is.</p>



<p>Time complexity: O(n^3*26)<br>Space complexity: O(n^3*26) </p>



<div class="responsive-tabs">
<h2 class="tabtitle">C++</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag">int cache[101][27][101][101];
class Solution {
public:
  int getLengthOfOptimalCompression(string s, int k) {    
    memset(cache, -1, sizeof(cache));
    // Min length of the compressed string of s[i:]
    // 1. last char is |last|
    // 2. current run-length is len
    // 3. we can delete k chars.
    function&lt;int(int, int, int, int)&gt; dp = 
      [&amp;](int i, int last, int len, int k) {
      if (k &lt; 0) return INT_MAX / 2;
      if (i &gt;= s.length()) return 0;      
      int&amp; ans = cache[i][last][len][k];
      if (ans != -1) return ans;
      if (s[i] - 'a' == last) { 
        // same as the previous char, no need to delete.
        int carry = (len == 1 || len == 9 || len == 99);
        ans = carry + dp(i + 1, last, len + 1, k);
      } else {
        ans = min(1 + dp(i + 1, s[i] - 'a', 1, k),  // keep s[i]
                      dp(i + 1, last, len, k - 1)); // delete s[i]
      }
      return ans;
    };
    return dp(0, 26, 0, k);
  }
};</pre>
</div></div>
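<p>The four-state transition above can be sketched in Python with memoization. This is a minimal illustration, not the author's code: the function name <code>min_encoded_length</code>, the use of an empty string as the initial "no previous char" marker, and the <code>INF</code> constant are assumptions introduced here.</p>

```python
import functools


def min_encoded_length(s: str, k: int) -> int:
    """Min run-length-encoded length of s after deleting at most k characters."""
    n = len(s)
    INF = 10 ** 9  # marks an invalid state (more than k deletions)

    @functools.lru_cache(maxsize=None)
    def dp(i, last, run, k):
        if k < 0:
            return INF
        if i >= n:
            return 0
        if s[i] == last:
            # The encoding grows only when the count gains a digit:
            # 1 -> 2 adds the count itself, 9 -> 10 and 99 -> 100 add a digit.
            carry = 1 if run in (1, 9, 99) else 0
            return carry + dp(i + 1, last, run + 1, k)
        keep = 1 + dp(i + 1, s[i], 1, k)       # start a new group with s[i]
        delete = dp(i + 1, last, run, k - 1)   # delete s[i], keep the current run
        return min(keep, delete)

    return dp(0, "", 0, k)
```

<p>With n at most 100 the recursion depth stays well within Python's default limit.</p>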



<h2><strong>State compression</strong></h2>



<figure class="wp-block-image size-large"><img width="960" height="540" src="https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-3.png" alt="" class="wp-image-7174" srcset="https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-3.png 960w, https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-3-300x169.png 300w, https://zxi.mytechroad.com/blog/wp-content/uploads/2020/07/1531-ep347-3-768x432.png 768w" sizes="(max-width: 960px) 100vw, 960px" /></figure>



<p>dp[i][k] := min length of the encoding of s[i:] after deleting at most k characters.</p>



<p>dp[i][k] = min(dp[i+1][k-1], # delete s[i]<br>encode_len(count(s[i~j] == s[i])) + dp[j+1][k - count(s[i~j] != s[i])] for j in range(i, n)) # keep s[i] and delete every different char in s[i~j]</p>



<p>Time complexity: O(n^2*k)<br>Space complexity: O(n*k)</p>



<div class="responsive-tabs">
<h2 class="tabtitle">C++</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag">// Author: Huahua
class Solution {
public:
  int getLengthOfOptimalCompression(string s, int k) {    
    const int n = s.length();
    vector&lt;vector&lt;int&gt;&gt; cache(n, vector&lt;int&gt;(k + 1, -1));
    function&lt;int(int, int)&gt; dp = [&amp;](int i, int k) -&gt; int {
      if (k &lt; 0) return n;
      if (i + k &gt;= n) return 0;
      int&amp; ans = cache[i][k];
      if (ans != -1) return ans;
      ans = dp(i + 1, k - 1); // delete      
      int len = 0;
      int same = 0;
      int diff = 0;
      for (int j = i; j &lt; n &amp;&amp; diff &lt;= k; ++j) {
        if (s[j] == s[i] &amp;&amp; ++same) {
          if (same &lt;= 2 || same == 10 || same == 100) ++len;
        } else {
          ++diff;
        }
        ans = min(ans, len + dp(j + 1, k - diff));
      }
      return ans;
    };
    return dp(0, k);
  }
};</pre>

</div><h2 class="tabtitle">Java</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag">// Author: Huahua
class Solution {
  private int[][] dp;
  private char[] s;
  private int n;
  
  public int getLengthOfOptimalCompression(
    String s, int k) {
    this.s = s.toCharArray();
    this.n = s.length();
    this.dp = new int[n][k + 1];
    for (int[] row : dp)
      Arrays.fill(row, -1);
    return dp(0, k);
  }  
  
  private int dp(int i, int k) {
    if (k &lt; 0) return this.n;
    if (i + k &gt;= n) return 0; // done or delete all.    
    int ans = dp[i][k];
    if (ans != -1) return ans;
    ans = dp(i + 1, k - 1); // delete s[i]
    int len = 0;
    int same = 0;    
    int diff = 0;
    for (int j = i; j &lt; n &amp;&amp; diff &lt;= k; ++j) {
      if (s[j] == s[i]) {
        ++same;
        if (same &lt;= 2 || same == 10 || same == 100) ++len;
      } else {
        ++diff;
      }      
      ans = Math.min(ans, len + dp(j + 1, k - diff)); 
    }
    dp[i][k] = ans;
    return ans;
  }
}</pre>

</div><h2 class="tabtitle">Python3</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag"># Author: Huahua
class Solution:
  def getLengthOfOptimalCompression(self, s: str, k: int) -&gt; int:
    n = len(s)
    @functools.lru_cache(maxsize=None)
    def dp(i, k):
      if k &lt; 0: return n
      if i + k &gt;= n: return 0
      ans = dp(i + 1, k - 1)
      l = 0
      same = 0
      for j in range(i, n):
        if s[j] == s[i]:
          same += 1
          if same &lt;= 2 or same == 10 or same == 100:
            l += 1
        diff = j - i + 1 - same
        if diff &gt; k: break
        ans = min(ans, l + dp(j + 1, k - diff))
      return ans
    return dp(0, k)</pre>
</div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://zxi.mytechroad.com/blog/dynamic-programming/leetcode-1531-string-compression-ii/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>花花酱 LeetCode 393. UTF-8 Validation</title>
		<link>https://zxi.mytechroad.com/blog/bit/leetcode-393-utf-8-validation/</link>
					<comments>https://zxi.mytechroad.com/blog/bit/leetcode-393-utf-8-validation/#respond</comments>
		
		<dc:creator><![CDATA[zxi]]></dc:creator>
		<pubDate>Sat, 21 Mar 2020 05:37:59 +0000</pubDate>
				<category><![CDATA[Bit]]></category>
		<category><![CDATA[bit]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[medium]]></category>
		<guid isPermaLink="false">https://zxi.mytechroad.com/blog/?p=6507</guid>

					<description><![CDATA[<p>A character in UTF8 can be from&#160;1 to 4 bytes&#160;long, subjected to the following rules: For 1-byte character, the first bit is a 0, followed&#8230;</p>
<p>The post <a rel="nofollow" href="https://zxi.mytechroad.com/blog/bit/leetcode-393-utf-8-validation/">花花酱 LeetCode 393. UTF-8 Validation</a> appeared first on <a rel="nofollow" href="https://zxi.mytechroad.com/blog">Huahua&#039;s Tech Road</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe title="花花酱 LeetCode 393. UTF-8 Validation - 刷题找工作 EP316" width="500" height="375" src="https://www.youtube.com/embed/0s4M9Y1ue5o?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div></figure>



<p>A character in UTF8 can be from&nbsp;<strong>1 to 4 bytes</strong>&nbsp;long, subjected to the following rules:</p>



<ol><li>For 1-byte character, the first bit is a 0, followed by its unicode code.</li><li>For n-bytes character, the first n-bits are all one&#8217;s, the n+1 bit is 0, followed by n-1 bytes with most significant 2 bits being 10.</li></ol>



<p>This is how the UTF-8 encoding would work:</p>



<pre class="crayon-plain-tag">   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</pre>



<p>Given an array of integers representing the data, return whether it is a valid utf-8 encoding.</p>



<p><strong>Note:</strong><br>The input is an array of integers. Only the&nbsp;<strong>least significant 8 bits</strong>&nbsp;of each integer are used to store the data. This means each integer represents only 1 byte of data.</p>



<p><strong>Example 1:</strong></p>



<pre class="wp-block-preformatted;crayon:false">data = [197, 130, 1], which represents the octet sequence: <strong>11000101 10000010 00000001</strong>.

Return <strong>true</strong>.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.
</pre>



<p><strong>Example 2:</strong></p>



<pre class="wp-block-preformatted;crayon:false">data = [235, 140, 4], which represents the octet sequence: <strong>11101011 10001100 00000100</strong>.

Return <strong>false</strong>.
The first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.</pre>



<h2><strong>Solution: Bit Operation</strong></h2>



<p>Check the first byte of a character to find out how many bytes (from 0 to 3) are left to check. Each of the remaining bytes must start with 0b10.</p>



<p>Time complexity: O(n)<br>Space complexity: O(1)</p>



<div class="responsive-tabs">
<h2 class="tabtitle">C++</h2>
<div class="tabcontent">

<pre class="crayon-plain-tag">// Author: Huahua
class Solution {
public:
  bool validUtf8(vector&lt;int&gt;&amp; data) {
    int left = 0;
    for (int d : data) {      
      if (left == 0) {
        if ((d &gt;&gt; 3) == 0b11110) left = 3;
        else if ((d &gt;&gt; 4) == 0b1110) left = 2;
        else if ((d &gt;&gt; 5) == 0b110) left = 1;
        else if ((d &gt;&gt; 7) == 0b0) left = 0;
        else return false;
      } else {
        if ((d &gt;&gt; 6) != 0b10) return false;
        --left;
      }
    }
    return left == 0;
  }
};</pre>
</div></div>
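<p>The same byte-by-byte check can be sketched in Python. This is a minimal illustration: the function name <code>valid_utf8</code> is an assumption introduced here, and the explicit masking to the low 8 bits follows the note in the problem statement.</p>

```python
def valid_utf8(data):
    """Return True if the low 8 bits of each integer form valid UTF-8."""
    left = 0  # continuation bytes still expected for the current character
    for d in data:
        d &= 0xFF  # only the least significant 8 bits carry data
        if left == 0:
            if d >> 3 == 0b11110:
                left = 3  # 4-byte character
            elif d >> 4 == 0b1110:
                left = 2  # 3-byte character
            elif d >> 5 == 0b110:
                left = 1  # 2-byte character
            elif d >> 7 == 0:
                left = 0  # 1-byte character
            else:
                return False  # invalid leading byte
        else:
            if d >> 6 != 0b10:
                return False  # continuation bytes must start with 10
            left -= 1
    return left == 0  # no character may be cut off at the end
```

<p>Like the solution above, this sketch only checks the bit patterns given in the problem statement and does not reject overlong encodings or out-of-range code points.</p>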
]]></content:encoded>
					
					<wfw:commentRss>https://zxi.mytechroad.com/blog/bit/leetcode-393-utf-8-validation/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>花花酱 LeetCode 443. String Compression</title>
		<link>https://zxi.mytechroad.com/blog/string/leetcode-443-string-compression/</link>
					<comments>https://zxi.mytechroad.com/blog/string/leetcode-443-string-compression/#respond</comments>
		
		<dc:creator><![CDATA[zxi]]></dc:creator>
		<pubDate>Sat, 24 Mar 2018 19:50:55 +0000</pubDate>
				<category><![CDATA[String]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[in place]]></category>
		<category><![CDATA[run length]]></category>
		<guid isPermaLink="false">http://zxi.mytechroad.com/blog/?p=2354</guid>

					<description><![CDATA[<p>Problem summary: run-length encode a string in place. https://leetcode.com/problems/string-compression/description/ Given an array of characters, compress it in-place. The length after compression must always be smaller than or equal to the&#8230;</p>
<p>The post <a rel="nofollow" href="https://zxi.mytechroad.com/blog/string/leetcode-443-string-compression/">花花酱 LeetCode 443. String Compression</a> appeared first on <a rel="nofollow" href="https://zxi.mytechroad.com/blog">Huahua&#039;s Tech Road</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h1><strong>Problem</strong></h1>
<p>Problem summary: run-length encode a string in place.</p>
<p><a href="https://leetcode.com/problems/string-compression/description/">https://leetcode.com/problems/string-compression/description/</a></p>
<p>Given an array of characters, compress it <a href="https://en.wikipedia.org/wiki/In-place_algorithm" target="_blank" rel="noopener"><b>in-place</b></a>.</p>
<p>The length after compression must always be smaller than or equal to the original array.</p>
<p>Every element of the array should be a <b>character</b> (not int) of length 1.</p>
<p>After you are done <b>modifying the input array <a href="https://en.wikipedia.org/wiki/In-place_algorithm" target="_blank" rel="noopener">in-place</a></b>, return the new length of the array.</p>
<p><b>Follow up:</b><br />
Could you solve it using only O(1) extra space?</p>
<p><b>Example 1:</b></p>
<pre class="crayon:false"><b>Input:</b>
["a","a","b","b","c","c","c"]

<b>Output:</b>
Return 6, and the first 6 characters of the input array should be: ["a","2","b","2","c","3"]

<b>Explanation:</b>
"aa" is replaced by "a2". "bb" is replaced by "b2". "ccc" is replaced by "c3".
</pre>
<p><b>Example 2:</b></p>
<pre class="crayon:false"><b>Input:</b>
["a"]

<b>Output:</b>
Return 1, and the first 1 characters of the input array should be: ["a"]

<b>Explanation:</b>
Nothing is replaced.
</pre>
<p><b>Example 3:</b></p>
<pre class="crayon:false  "><b>Input:</b>
["a","b","b","b","b","b","b","b","b","b","b","b","b"]

<b>Output:</b>
Return 4, and the first 4 characters of the input array should be: ["a","b","1","2"].

<b>Explanation:</b>
Since the character "a" does not repeat, it is not compressed. "bbbbbbbbbbbb" is replaced by "b12".
Notice each digit has its own entry in the array.
</pre>
<p><b>Note:</b></p>
<ol>
<li>All characters have an ASCII value in <code>[35, 126]</code>.</li>
<li><code>1 &lt;= len(chars) &lt;= 1000</code>.</li>
</ol>
<h1><strong>Solution</strong></h1>
<p>Time complexity: O(n)</p>
<p>Space complexity: O(1)</p>
<p>C++</p><pre class="crayon-plain-tag">// Author: Huahua
// Running time: 9 ms
class Solution {
public:
  int compress(vector&lt;char&gt;&amp; chars) {
    const int n = chars.size();
    int p = 0;
    for (int i = 1; i &lt;= n; ++i) {
      int count = 1;
      while (i &lt; n &amp;&amp; chars[i] == chars[i - 1]) { ++i; ++count; }
      chars[p++] = chars[i - 1];
      if (count == 1) continue;
      for (char c : to_string(count))
        chars[p++] = c;
    }
    return p;
  }
};</pre>
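<p>The in-place, two-pointer idea can be sketched in Python as well. This is a minimal illustration rather than the author's code: the function name <code>compress</code> mirrors the C++ method, and the input is assumed to be a list of one-character strings.</p>

```python
def compress(chars):
    """Run-length encode chars in place and return the new length."""
    n = len(chars)
    p = 0  # write position, never ahead of the read position
    i = 0  # start of the current run
    while i < n:
        j = i
        # Advance j to the end of the run that starts at i.
        while j < n and chars[j] == chars[i]:
            j += 1
        chars[p] = chars[i]
        p += 1
        count = j - i
        if count > 1:
            # Each digit of the count takes its own slot in the array.
            for digit in str(count):
                chars[p] = digit
                p += 1
        i = j
    return p
```

<p>A run of length 2 or more never needs more slots than it occupies, so writing at <code>p</code> cannot overwrite characters that have not been read yet.</p>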
]]></content:encoded>
					
					<wfw:commentRss>https://zxi.mytechroad.com/blog/string/leetcode-443-string-compression/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
