窗口类
Window Sum
Longest Substring Without Repeating Characters
- 分析
- code
Minimum Size Subarray Sum
Minimum Window Substring
Longest Substring with At Most K Distinct Characters

11-30 Bobst

窗口类

// 模板
/** 优化思想：
 *  通过两层for循环优化得来
 *  外层指针i依然是依次遍历
 *  内层指针j证明不需要回退
 */
for(int i = 0; i < n; i++) {
    while(j < n) {
        if(condition) {
            j++;
            update j;
        } else {
            break;
        }
    }
    update i;
}

// 和sliding window的区别， sliding window的窗口大小是固定的。

Window Sum

这题其实不是two pointer, 但是也有window的思想在里面就放这儿吧。

A simpler way to solve this is instead of doing two loops to get the partial sum in the window, you can save each sum result in a variable called current. Then every time you shift the window to the right, one element is shifted out and a new element is shifted in. So you can just update the windowed sum by adding the new shift-in element and subtracting the old shift-out element. This way you save some computation. The overall time complexity is O(N).

public class WindowSum {
    public static int[] getWindowSum(int[] nums, int k) {
        //input check
        if(nums == null || nums.length == 0 || k <= 0) {
            return new int[0];
        }

        int[] result = new int[nums.length - k + 1];
        int sum = 0;
        for(int i = 0; i < k; i++) {
            sum += nums[i];
        }
        result[0] = sum;
        // add the shift in element, substract the shift out element
        for(int i = 1; i < nums.length - k + 1; i++) {
            result[i] = sum + nums[i + k - 1] - nums[i - 1];
        }

        return result;
    }
}

Longest Substring Without Repeating Characters

Given a string, find the length of the longest substring without repeating characters.

分析

If we already know a substring S has duplicate characters, there’s no need to check any substring that contains S.

解法：基本思路是维护一个窗口，每次关注窗口中的字符串，在每次判断中，左窗口和右窗口选择其一向前移动。同时维护一个HashSet, 正常情况下移动右窗口，如果没有出现重复则继续移动右窗口，如果发现重复字符，则说明当前窗口中的串已经不满足要求，继续移动右窗口不可能得到更好的结果，此时移动左窗口， 直到不再有重复字符为止，中间跳过的这些串中不会有更好的结果，因为他们不是重复就是更短。

当然，我们可以扫一遍，找到最长的substring: Obviously, we can go over all the substrings and find the longest one.

但是这个会导致O(n^4)的时间复杂度（O(n^2)去得到所有的substring，O(n^2) 去检查是否重复）。 This will give us O(N^4) time complexity (O(N^2) to get all substrings, O(N^2) to check if each substring has dups) and O(1) space complexity.

很显然这个解法太慢了 It's clear that the solution is too slow.

常见的加速的方法是用更多的空间。 And the most common approach to speeding up the program is to use more memory.

我们可以用空间换时间 we can sacrifice the space complexity to improve the time complexity.

哈希set是查重的常用数据结构，通过hashset我们可以在O(N)的时间里面检查substring是否有重复。 A hashset is the most common data structure to detect dups and with a hashset, we can check if a substring has dups in O(N) time.

现在，瓶颈是遍历所有的substring Now, the bottleneck is at iterating over all substrings, which has O(N^2) complexity.

实际上，这里面有很多不必要的检查。 In fact, there are lots of unnecessary checks.

如果我们已经知道substring s有重复的char了，那就没必要检查所有包含s的substring了。 If we already know a substring S has duplicate characters, there’s no need to check any substring that contains S.

以。。。为例 Take string ... as an example.

当我们意识到 "abcc" 有重复，我们可以马上弃掉整个substring，从c开始检查。因此，我们就有了下面的方式： When we realize that substring “abcc” has dups, we can immediately abandon the whole substring and check substrings from “cdefgh”. Thus, we have the following approach:

用两个指针去标记现在在检查的substring，并都初始化为0. Keep two pointers (start, end) that denotes the current substring we’re checking and both has an initial value of 0.

每次，向前移动end的指针一个字符，用hashset去检查是否新加入的字符是已经存在的。 Each time, move forward end pointer by 1 character and use the hashset to check if the newly added character has already existed.

如果没有，向前移动end指针 If not, keep moving end pointer forward.

如果新字符是重复的，移动start指针一直到现在end指针的位置，从hashset中remove所有的char。 If the new character is a dup one, move the start pointer all the way to the current end pointer and pop out all these chars from the hashset.

重复步骤二，在完成整个string后打印结果。 Repeating step 2 and output the result after finishing the whole string.

本质上，我们知道处理所有的substring是没有必要的，sliding window 是一个很好的方法帮助我们提升。 In essence, as we know processing all substrings are unnecessary, sliding window can be a great technique to improve that.

The final solution has O(N) time complexity and O(N) space complexity.

code

public class Solution {
    public int lengthOfLongestSubstring(String s) {
        // It’s important to validate the input in the beginning.
        if(s == null || s.length() == 0) {
            return 0;
        }

        int i = 0, j = 0;
        HashSet<Character> set = new HashSet<>();
        int max = 1;

        for(i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            while(j < s.length() && !set.contains(s.charAt(j))) {
                set.add(s.charAt(j));
                j++;
            }
            max = Math.max(max, j - i);
            set.remove(c);

        } 
        return max;
    }
}

Minimum Size Subarray Sum

解法：我们用两个指针维护一个窗口，保证这个窗口的内的和是小于目标数的。如果新来的数加上后，和大于目标，则比较下当前窗口长度和最短窗口长度，窗口左边界右移。如果和仍小于目标数，则将窗口右边界右移。

注意这里退出的条件，右边界是小于等于长度，因为我们窗口到了最右侧时，还需要继续左移左边界来看有没有更优的解法。另外，如果左边界大于右边界时，说明最短子串的长度已经小于等于1，我们就不用再查找了。

public class Solution {
    public int minSubArrayLen(int s, int[] nums) {
        if(nums == null || nums.length == 0 || s <= 0) {
            return 0;
        }

        int left = 0, right = 0;
        int sum = 0;
        int min = Integer.MAX_VALUE;

        for(left = 0; left < nums.length; left++) {
            while(right < nums.length) {
                if(sum < s) {
                    sum += nums[right];
                    right++;
                } else {
                    break;
                }
            }

            // 注意此时right已经是++过的right
            if(sum >= s) min = Math.min(min, right - left);
            sum -= nums[left];
        }

        // if can't find such sum, we wouldn't update min
        return min == Integer.MAX_VALUE ? 0 : min;
    }
}

Minimum Window Substring

找到S里面最小的window含有T的所有的字符

思路：还是维护一个 window, 如果里面没有T的所有字符，就把右边界++，如果含有所有字符，那么移动左边界不会有更小的结果，这时候我们移动右边界。

如何快速检查string str 含有 string t的所有字符：用两个hash array,如果在str里面每个字符的个数都大于t的字符数，那么就是返回true.

public class Solution {
    public String minWindow(String s, String t) {
        if (s == null || t == null) {
            return null;
        }

        int[] source = new int[256];
        int[] target = new int[256];

        initTarget(t, target);

        String min = "";
        int minSize = Integer.MAX_VALUE;
        int j = 0;
        for(int i = 0; i < s.length(); i++) {
            // error-prone
            while(j < s.length() && !valid(source, target)) {
                source[s.charAt(j)]++;
                j++;
            }

            // error-prone
            if(valid(source, target)) {
                if(minSize > j - i) {
                    minSize = Math.min(minSize, j - i);
                    min = s.substring(i, j);
                }    
            }

            source[s.charAt(i)]--; 
        }

        return min;
    }

    public void initTarget(String t, int[] target) {
        for(char c :t.toCharArray()) {
            target[c]++;
        }
    }

    public boolean valid(int[] source, int[] target) {
        for(int i = 0; i < 256; i++) {
            if(target[i] > source[i]) {
                return false;
            }
        }

        return true;
    }
}

Longest Substring with At Most K Distinct Characters

因为是维护一个k个不同的char，什么数据结构可以判重并且能方便的控制大小? HashMap Key 是distinct char, value 是个数。

maintain a hashmap size() not larger than k and a corresponding window labeled by i and j. If the window is valid, we expand the window by moving forward j. else remove the element from window.

public class Solution {
    public int lengthOfLongestSubstringKDistinct(String s, int k) {
        if(s == null || s.length() == 0 || k <= 0) {
            return 0;
        } 

        if(k > s.length()) {
            return s.length();
        }

        HashMap<Character,Integer> map = new HashMap<>();

        int maxLen = 1;

        int j = 0;
        for(int i = 0; i < s.length(); i++) {
            while(j < s.length()) {
                char c = s.charAt(j);
                if(map.containsKey(c)) {
                    map.put(c, map.get(c) + 1);
                } else {
                    // error-prone
                    // important
                    // keep the size not larger than k
                    if(map.size() == k) {
                        break;
                    } else {
                        map.put(c, 1);
                    }
                }
                j++;
            }

            maxLen = Math.max(maxLen, j - i);
            // remove
            char c = s.charAt(i);
            if(map.containsKey(c)) {
                if(map.get(c) == 1) {
                    map.remove(c);
                } else {
                    map.put(c, map.get(c) - 1);
                }
            }
        }

        return maxLen;
    }
}

窗口类

窗口类