Commit cfe8809

New Problem Solution "UTF-8 Validation"
1 parent ea1a213 commit cfe8809

2 files changed: +87 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ LeetCode
 
 | # | Title | Solution | Difficulty |
 |---| ----- | -------- | ---------- |
+|393|[UTF-8 Validation](https://leetcode.com/problems/utf-8-validation/) | [C++](./algorithms/cpp/UTF8Validation/UTF8Validation.cpp)|Medium|
 |392|[Is Subsequence](https://leetcode.com/problems/is-subsequence/) | [C++](./algorithms/cpp/isSubsequence/IsSubsequence.cpp)|Medium|
 |391|[Perfect Rectangle](https://leetcode.com/problems/perfect-rectangle/) | [C++](./algorithms/cpp/perfectRectangle/PerfectRectangle.cpp)|Hard|
 |390|[Elimination Game](https://leetcode.com/contest/2/problems/elimination-game/) | [C++](./algorithms/cpp/eliminationGame/EliminationGame.cpp)|Medium|
algorithms/cpp/UTF8Validation/UTF8Validation.cpp

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+// Source : https://leetcode.com/problems/utf-8-validation/
+// Author : Hao Chen
+// Date   : 2016-09-08
+
+/***************************************************************************************
+ *
+ * A character in UTF8 can be from 1 to 4 bytes long, subject to the following rules:
+ *
+ * For a 1-byte character, the first bit is a 0, followed by its unicode code.
+ * For an n-byte character, the first n bits are all ones, the (n+1)th bit is 0, followed by
+ * n-1 bytes with the most significant 2 bits being 10.
+ *
+ * This is how the UTF-8 encoding would work:
+ *
+ *    Char. number range  |        UTF-8 octet sequence
+ *    --------------------+---------------------------------------------
+ *    0000 0000-0000 007F | 0xxxxxxx
+ *    0000 0080-0000 07FF | 110xxxxx 10xxxxxx
+ *    0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
+ *    0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
+ *
+ * Given an array of integers representing the data, return whether it is a valid utf-8
+ * encoding.
+ *
+ * Note:
+ * The input is an array of integers. Only the least significant 8 bits of each integer
+ * are used to store the data. This means each integer represents only 1 byte of data.
+ *
+ * Example 1:
+ *
+ * data = [197, 130, 1], which represents the octet sequence: 11000101 10000010
+ * 00000001.
+ *
+ * Return true.
+ * It is a valid utf-8 encoding for a 2-byte character followed by a 1-byte character.
+ *
+ * Example 2:
+ *
+ * data = [235, 140, 4], which represents the octet sequence: 11101011 10001100
+ * 00000100.
+ *
+ * Return false.
+ * The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character.
+ * The next byte is a continuation byte which starts with 10 and that's correct.
+ * But the second continuation byte does not start with 10, so it is invalid.
+ ***************************************************************************************/
+
+
+class Solution {
+public:
+    bool validUtf8(vector<int>& data) {
+        size_t i = 0;
+        while ( i < data.size() ) {
+            if ( (data[i] & 0x80) == 0 ) { // checking 0xxxxxxx
+                i++;
+                continue;
+            }
+
+            size_t len = 0;
+            if ( (data[i] & 0xE0) == 0xC0 ) { // checking 110xxxxx
+                len = 2;
+            }else if ( (data[i] & 0xF0) == 0xE0 ) { // checking 1110xxxx
+                len = 3;
+            }else if ( (data[i] & 0xF8) == 0xF0 ) { // checking 11110xxx
+                len = 4;
+            }else {
+                return false;
+            }
+
+            // the whole character must fit in the array before we read its continuation bytes
+            if ( len > data.size() - i ) {
+                return false;
+            }
+
+            for (size_t j=i+1; j < i+len; j++) { // checking 10xxxxxx
+                if ( (data[j] & 0xC0) != 0x80 ) {
+                    return false;
+                }
+            }
+
+            i += len;
+
+        }
+        return true;
+    }
+};
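
The rules in the header comment can also be exercised outside the LeetCode harness. What follows is a minimal sketch, assuming C++11, and it is not part of this commit: `sequenceLength`, `validUtf8Sketch`, and `checkExamples` are hypothetical names introduced only for illustration. The sketch checks that a multi-byte character fits in the remaining input before any continuation byte is read, so truncated input such as `[197]` is rejected without indexing past the end of the array. Like the problem statement, it validates only the bit patterns; it does not reject overlong encodings or code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the commit): total length in bytes announced by
// a leading byte, or 0 if the byte cannot start a character.
std::size_t sequenceLength(int byte) {
    if ((byte & 0x80) == 0x00) return 1; // 0xxxxxxx
    if ((byte & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((byte & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((byte & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // 10xxxxxx or 11111xxx
}

// Hypothetical free-function variant of the validator: only the low 8 bits of
// each integer are inspected, as the problem statement requires.
bool validUtf8Sketch(const std::vector<int>& data) {
    std::size_t i = 0;
    while (i < data.size()) {
        const std::size_t len = sequenceLength(data[i]);
        // Reject invalid leading bytes and characters that run past the end.
        if (len == 0 || len > data.size() - i) return false;
        // Every byte after the leading one must look like 10xxxxxx.
        for (std::size_t j = i + 1; j < i + len; ++j) {
            if ((data[j] & 0xC0) != 0x80) return false;
        }
        i += len;
    }
    return true;
}

// The two examples from the problem statement, plus two edge cases.
void checkExamples() {
    assert(validUtf8Sketch({197, 130, 1}));   // 2-byte character + 1-byte character
    assert(!validUtf8Sketch({235, 140, 4}));  // second continuation byte is wrong
    assert(!validUtf8Sketch({197}));          // truncated 2-byte character
    assert(!validUtf8Sketch({130}));          // lone continuation byte
}
```

Calling `checkExamples()` from a `main` function runs all four assertions; separating the length lookup from the loop keeps the four leading-byte masks in one place.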
