Press "Enter" to skip to content

Posts published in “Hashtable”

花花酱 LeetCode 609. Find Duplicate File in System



Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txtf2.txt … fn.txt with content f1_contentf2_content … fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:


Example 1:

["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]


  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?

Solution: HashTable

Key: content, Value: Array of filenames

Time complexity: O(n)

Space complexity: O(n)



花花酱 LeetCode 594. Longest Harmonious Subsequence



We define a harmonious array is an array where the difference between its maximum value and its minimum value is exactly 1.

Now, given an integer array, you need to find the length of its longest harmonious subsequence among all its possible subsequences.

Example 1:

Input: [1,3,2,2,5,2,3,7]
Output: 5
Explanation: The longest harmonious subsequence is [3,2,2,2,3].

Note: The length of the input array will not exceed 20,000.

Solution1: HashTable

Time complexity: O(n)
Space complexity: O(n)



花花酱 LeetCode 383. Ransom Note



Given an arbitrary ransom note string and another string containing letters from all the magazines, write a function that will return true if the ransom note can be constructed from the magazines ; otherwise, it will return false.

Each letter in the magazine string can only be used once in your ransom note.

You may assume that both strings contain only lowercase letters.

canConstruct("a", "b") -> false
canConstruct("aa", "ab") -> false
canConstruct("aa", "aab") -> true

Solution: HashTable

Time complexity: O(n + m)

Space complexity: O(128)





花花酱 LeetCode 349. Intersection of Two Arrays



Given two arrays, write a function to compute their intersection.

Given nums1 = [1, 2, 2, 1]nums2 = [2, 2], return [2].


  • Each element in the result must be unique.
  • The result can be in any order.

C++ using std::set_intersection

C++ hashtable


花花酱 LeetCode 454. 4Sum II

题目大意:给你4个组数:A, B, C, D。问有多少组组合的A[i] + B[j] + C[k] + D[l] 的和为0。

Given four lists A, B, C, D of integer values, compute how many tuples (i, j, k, l) there are such that A[i] + B[j] + C[k] + D[l]is zero.

To make problem a bit easier, all A, B, C, D have same length of N where 0 ≤ N ≤ 500. All integers are in the range of -228 to 228 – 1 and the result is guaranteed to be at most 231 – 1.


Solution 1: HashTable

Split the arrays into two groups: (A, B), (C, D)

Create the Minkowski sum of two groups and count the occurrence of each element.

e.g. A = [1, 2, 3], B = [0, -1, 1]

Minkowski sum(A, B) SAB= [0, 1, 2, 3, 4] => [0:1, 1:2, 2:3, 3:2, 4:1]

for each element s in SAB, check whether -s is in Minkowski sum(C, D) SCD

ans = sum(SAB[s] * SCD[-s]), for all s, that s in SAB and -s in SCD

Time complexity: O(n^2)

Space complexity: O(n^2)




Related Problems: