692. Top K Frequent Words

Hello, reader 👋🏽 ! Welcome to day 19 of the series on Problem Solving. Through this series, I aim to pick up at least one question everyday and share my approach for solving it.

Today, I will be picking up LeetCode's daily challenge problem: 692. Top K Frequent Words.

🤔 Problem Statement

Given a list of words, return the top K frequent words from the list. The words should be returned in the descending order of their frequency.
If two or more words have the same frequency, then they should be returned in the lexicographical order.
E.g.:
- words = ["i","love","leetcode","i","love","coding"], k = 2
  - Output: ["i", "love"]
- words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
  - Output: ["the", "is", "sunny", "day"]

💬 Thought Process - Sorting I

The first way that I could think of to solve the problem was to:
1. Find frequencies of all words, and keep track of the highest frequency
2. Sort the words list
3. For every possible frequency from the highest to 0, traverse the sorted array and add the word if not added before. Repeat this until all k words fount/
A very naive approach, but this does the work! Let's look at the code for the same now.

👩🏽‍💻 Solution - Sorting I

Below is the code for the above approach.

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        List<String> frequentWords = new ArrayList();


        HashMap<String, Integer> map = new HashMap();
        if(words == null || words.length == 0) {
            return frequentWords;
        }

        int mostFreq = 0;
        for(String word: words) {
            if(!map.containsKey(word)) {
                map.put(word, 0);
            }
            map.put(word, map.get(word)+1);
            mostFreq = Math.max(mostFreq, map.get(word));
        }

        Arrays.sort(words);

        // This is needed to ensure that we don't add duplicate words
        // that appear multiple times in the word
        String prevWord = "";

        while(mostFreq > 0 && frequentWords.size() < k) {
            for(String word: words) {
                if(map.get(word) == mostFreq && 
                   !prevWord.equals(word)) {
                    frequentWords.add(word);
                    prevWord = word;
                    if(frequentWords.size() == k) {
                        break;
                    }
                }
            }
            mostFreq--;
        }

        return frequentWords;
    }
}

Time Complexity: O(n*k)
    - n = number of words
    - k = number of unique words in the list
Space Complexity: O(n)

Although very easy to come up with, the above approach in the worst case runs in O(n^2) when all the words are unique and we have to find k (=n) frequent words.
We can improve the complexity using a custom comparator to sort.

💬 Thought Process - Sorting II using Custom Comparator

In the last approach, we handled adding the top k frequent element and sorting words in lexicographical order separately.
In this approach, we can combine the two steps using a custom comparator.
1. Count the frequency of all words in the list
2. Create list pairs of <word, frequency> and sort this in the decreasing order using a custom comparator such that:
  - If two words have same frequency, the smaller string appears before larger string
  - If two words have different frequency, the string with larger frequency appears before the string with smaller frequency
3. Traverse the sorted list until the top k frequent words are found.
Let's look into the code now.

👩🏽‍💻 Solution - Sorting II using Custom Comparator

Below is the code for the above approach using custom comparator.

class Pair {
    String word;
    int frequency;
    Pair(String word, int frequency) {
        this.word = word;
        this.frequency = frequency;
    }
}

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        List<String> frequentWords = new ArrayList();

        HashMap<String, Integer> map = new HashMap();
        if(words == null || words.length == 0) {
            return frequentWords;
        }

        int mostFreq = 0;
        for(String word: words) {
            if(!map.containsKey(word)) {
                map.put(word, 0);
            }
            map.put(word, map.get(word)+1);
            mostFreq = Math.max(mostFreq, map.get(word));
        }

        List<Pair> sortedPairs = new ArrayList();
        for(String word: map.keySet()) {
            int frequency = map.get(word);
            sortedPairs.add(new Pair(word, frequency));
        }

        Collections.sort(sortedPairs, new CustomComparator());

        for(int i = 0; i<k; i++) {
            Pair p = sortedPairs.get(i);
            String word = p.word;
            frequentWords.add(word);
        }


        return frequentWords;
    }
}

class CustomComparator implements Comparator<Pair> {
    public int compare(Pair pOne, Pair pTwo) {
        int freqOne = pOne.frequency;
        int freqTwo = pTwo.frequency;

        String wordOne = pOne.word;
        String wordTwo = pTwo.word;

        // If frequencies are the same, we return the smaller string
        if(freqOne == freqTwo) {
            return wordOne.compareTo(wordTwo);
        }

        // else we return the string with a greater frequency
        return freqTwo - freqOne;
    }
}

Time Complexity: O(n* log n)
    - n = number of words
Space Complexity: O(n)
    - n = number of words
    - We use extra space for map, and maintaining sorted pairs

💬 Thought Process - Max Heap

Instead of sorting the elements, we could make use of a heap (priority queue), that helps us maintain items such that the root value is greater than children in the left and right subtrees.
This way, we can maintain all the top frequently appearing elements.
1. Create a max heap of pairs <word, frequency> such that a pair p1 appears before pair p2 if:
  - frequency of p1 is greater than frequency p2, or
  - frequency of p1 and p2 are same, but p2 is lexicographically smaller than p2
2. Pop k elements from the priority queue, which will remove the top k frequent elements.
We will create custom comparator that allows us to maintain the above ordering.
Now, let's look into the code.

👩🏽‍💻 Solution - Max Heap

Below is the code for the above approach using max heap and custom comparator.

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        List<String> frequentWords = new ArrayList();

        HashMap<String, Integer> map = new HashMap();
        if(words == null || words.length == 0) {
            return frequentWords;
        }

        int mostFreq = 0;
        for(String word: words) {
            if(!map.containsKey(word)) {
                map.put(word, 0);
            }
            map.put(word, map.get(word)+1);
            mostFreq = Math.max(mostFreq, map.get(word));
        }

        PriorityQueue<Pair> pq = 
            new PriorityQueue<>(new CustomComparator());

        for(String word: map.keySet()) {
            int frequency = map.get(word);
            Pair p = new Pair(word, frequency);

            pq.offer(p);
        }

        while(frequentWords.size() < k) {
            Pair p = pq.poll();
            frequentWords.add(p.word);
        }

        return frequentWords;
    }
}

class CustomComparator implements Comparator<Pair> {
    public int compare(Pair pOne, Pair pTwo) {
        int freqOne = pOne.frequency;
        int freqTwo = pTwo.frequency;

        String wordOne = pOne.word;
        String wordTwo = pTwo.word;

        if(freqOne == freqTwo) {
            return wordOne.compareTo(wordTwo);
        }

        return freqTwo - freqOne;
    }
}

class Pair {
    String word;
    int frequency;
    Pair(String word, int frequency) {
        this.word = word;
        this.frequency = frequency;
    }
}

Time Complexity: O(n* log n + k* lon)
    - n = number of words
    - n * log n -> to build the PQ for an items.
    - k * log n -> to pop k elements from PQ.
Space Complexity: O(n)
    - n = number of words
    - We use extra space for map, and maintaining priority queue

💬 Thought Process - Min Heap

Instead of maintaining all n elements in the heap, we can use a min heap to only maintain k elements.
And each time the heap becomes full, we compare the new pair of <word, frequency> with the smallest element.
1. Keep adding each pair <word, frequency> to the priority queue such that the length is always 1 and:
  - Pair p1 is the root if the frequency of word in p1 is less than the frequency of word in p2.
  - Pair p2 is the root if the frequency of the word in p1 is same as frequency of word in p2, but the word in p1 is lexicographically larger than the word in p2.
2. All the k elements are popped and returned.
We will use a custom comparator to maintain the above ordering.

👩🏽‍💻 Solution - Min Heap

Below is the code for the above approach using min heap and custom comparator.

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        List<String> frequentWords = new ArrayList();

        HashMap<String, Integer> map = new HashMap();
        if(words == null || words.length == 0) {
            return frequentWords;
        }

        int mostFreq = 0;
        for(String word: words) {
            if(!map.containsKey(word)) {
                map.put(word, 0);
            }
            map.put(word, map.get(word)+1);
            mostFreq = Math.max(mostFreq, map.get(word));
        }

        PriorityQueue<Pair> pq = 
            new PriorityQueue<>(new CustomComparator());

        for(String word: map.keySet()) {
            int frequency = map.get(word);
            Pair p = new Pair(word, frequency);

            if(pq.size() < k) {
                pq.offer(p);
            }
            else {
                Pair root = pq.peek();
                if(root.frequency < frequency) {
                    pq.poll();
                    pq.offer(p);
                }
                else if(root.frequency == frequency && root.word.compareTo(word) > 0) {
                    pq.poll();
                    pq.offer(p);
                }
            }
        }

        while(!pq.isEmpty()) {
            Pair p = pq.poll();
            frequentWords.add(0,p.word);
        }

        return frequentWords;
    }
}

class CustomComparator implements Comparator<Pair> {
    public int compare(Pair pOne, Pair pTwo) {
        int freqOne = pOne.frequency;
        int freqTwo = pTwo.frequency;

        String wordOne = pOne.word;
        String wordTwo = pTwo.word;

        if(freqOne == freqTwo) {
            return wordTwo.compareTo(wordOne);
        }

        return freqOne - freqTwo;
    }
}

class Pair {
    String word;
    int frequency;
    Pair(String word, int frequency) {
        this.word = word;
        this.frequency = frequency;
    }
}

Time Complexity: O(n * log k)
    - n = number of words
    - n * log k -> we are only maintaining a PQ of size k
Space Complexity: O(n)
    - n = number of words
    - We use extra space for map, and maintaining priority queue

There are so many other solutions as well, and I'm not going to mention them here. In an interview, if you specify till min-heap it's going to be more than enough probably.
But a good resource to check out other solutions are the discussion forum.

You can find the code for this problem on GitHub repo: 692. Top K Frequent Words.

Conclusion

That's a wrap for today's problem. If you liked my explanation then please do drop a like/ comment. Also, please correct me if I've made any mistakes or if you want me to improve something!

Thank you for reading!