{"id":6507,"date":"2020-03-20T22:37:59","date_gmt":"2020-03-21T05:37:59","guid":{"rendered":"https:\/\/zxi.mytechroad.com\/blog\/?p=6507"},"modified":"2020-03-21T12:03:34","modified_gmt":"2020-03-21T19:03:34","slug":"leetcode-393-utf-8-validation","status":"publish","type":"post","link":"https:\/\/zxi.mytechroad.com\/blog\/bit\/leetcode-393-utf-8-validation\/","title":{"rendered":"\u82b1\u82b1\u9171 LeetCode 393. UTF-8 Validation"},"content":{"rendered":"\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"\u82b1\u82b1\u9171 LeetCode 393. UTF-8 Validation - \u5237\u9898\u627e\u5de5\u4f5c EP316\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/0s4M9Y1ue5o?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>A character in UTF8 can be from&nbsp;<strong>1 to 4 bytes<\/strong>&nbsp;long, subjected to the following rules:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>For 1-byte character, the first bit is a 0, followed by its unicode code.<\/li><li>For n-bytes character, the first n-bits are all one&#8217;s, the n+1 bit is 0, followed by n-1 bytes with most significant 2 bits being 10.<\/li><\/ol>\n\n\n\n<p>This is how the UTF-8 encoding would work:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>   Char. number range  |        UTF-8 octet sequence\n      (hexadecimal)    |              (binary)\n   --------------------+---------------------------------------------\n   0000 0000-0000 007F | 0xxxxxxx\n   0000 0080-0000 07FF | 110xxxxx 10xxxxxx\n   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx\n   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n<\/code><\/pre>\n\n\n\n<p>Given an array of integers representing the data, return whether it is a valid utf-8 encoding.<\/p>\n\n\n\n<p><strong>Note:<\/strong><br>The input is an array of integers. Only the&nbsp;<strong>least significant 8 bits<\/strong>&nbsp;of each integer is used to store the data. This means each integer represents only 1 byte of data.<\/p>\n\n\n\n<p><strong>Example 1:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted;crayon:false\">data = [197, 130, 1], which represents the octet sequence: <strong>11000101 10000010 00000001<\/strong>.\n\nReturn <strong>true<\/strong>.\nIt is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.\n<\/pre>\n\n\n\n<p><strong>Example 2:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted;crayon:false\">data = [235, 140, 4], which represented the octet sequence: <strong>11101011 10001100 00000100<\/strong>.\n\nReturn <strong>false<\/strong>.\nThe first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.\nThe next byte is a continuation byte which starts with 10 and that's correct.\nBut the second continuation byte does not start with 10, so it is invalid.<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Solution: Bit Operation<\/strong><\/h2>\n\n\n\nCheck the first byte of a character and find out the number of bytes (from 0 to 3) left to check. The left bytes must start with 0b10.\n\n\n\n<p>Time complexity: O(n)<br>Space complexity: O(1)<\/p>\n\n\n\n<div class=\"responsive-tabs\">\n<h2 class=\"tabtitle\">C++<\/h2>\n<div class=\"tabcontent\">\n\n<pre lang=\"C++\">\n\/\/ Author: Huahua\nclass Solution {\npublic:\n  bool validUtf8(vector<int>& data) {\n    int left = 0;\n    for (int d : data) {      \n      if (left == 0) {\n        if ((d >> 3) == 0b11110) left = 3;\n        else if ((d >> 4) == 0b1110) left = 2;\n        else if ((d >> 5) == 0b110) left = 1;\n        else if ((d >> 7) == 0b0) left = 0;\n        else return false;\n      } else {\n        if ((d >> 6) != 0b10) return false;\n        --left;\n      }\n    }\n    return left == 0;\n  }\n};\n<\/pre>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>A character in UTF8 can be from&nbsp;1 to 4 bytes&nbsp;long, subjected to the following rules: For 1-byte character, the first bit is a 0, followed&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[126],"tags":[16,272,177],"class_list":["post-6507","post","type-post","status-publish","format-standard","hentry","category-bit","tag-bit","tag-encoding","tag-medium","entry","simple"],"_links":{"self":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts\/6507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/comments?post=6507"}],"version-history":[{"count":4,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts\/6507\/revisions"}],"predecessor-version":[{"id":6516,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/posts\/6507\/revisions\/6516"}],"wp:attachment":[{"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/media?parent=6507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/categories?post=6507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zxi.mytechroad.com\/blog\/wp-json\/wp\/v2\/tags?post=6507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}