Robin on Linux – Page 5 – All about technology

Efficiently fetch the smallest element of a heap in Python

To solve LeetCode #1675, I wrote the code below with the help of the hints:

from typing import List
import heapq

class Solution:
    def minimumDeviation(self, nums: List[int]) -> int:
        n = len(nums)
        heapq.heapify(nums)
        _max = max(nums)
        ans = max(nums) - min(nums)
        while True:
            item = heapq.heappop(nums)
            if item % 2 == 1:
                heapq.heappush(nums, item * 2)
                _max = max(_max, item * 2)
                ans = min(ans, _max - heapq.nsmallest(1, nums)[0])
            else:
                heapq.heappush(nums, item)
                break
        print("stage1:", nums)
        nums = [-item for item in nums]
        heapq.heapify(nums)
        _max = max(nums)
        while True:
            item = heapq.heappop(nums)
            if item % 2 == 0:
                heapq.heappush(nums, item // 2)
                _max = max(_max, item // 2)
                ans = min(ans, _max - heapq.nsmallest(1, nums)[0])
            else:
                break
                
        return ans

I know the code looks quite messy. But the more annoying problem is that it exceeded the time limit.

After studying the solutions from these smart guys, I realized how stupid I was: to get the smallest element of a heap, I can just use nums[0] instead of heapq.nsmallest(1, nums)[0], just as the official documentation says. nums[0] is a constant-time lookup, while heapq.nsmallest(1, nums) scans the whole list on every call.

I then changed just two lines of my code, and it passed all the test cases in time:

from typing import List
import heapq

class Solution:
    def minimumDeviation(self, nums: List[int]) -> int:
        n = len(nums)
        heapq.heapify(nums)
        _max = max(nums)
        ans = max(nums) - min(nums)
        while True:
            item = heapq.heappop(nums)
            if item % 2 == 1:
                heapq.heappush(nums, item * 2)
                _max = max(_max, item * 2)
                ans = min(ans, _max - nums[0])
            else:
                heapq.heappush(nums, item)
                break
        print("stage1:", nums)
        nums = [-item for item in nums]
        heapq.heapify(nums)
        _max = max(nums)
        while True:
            item = heapq.heappop(nums)
            if item % 2 == 0:
                heapq.heappush(nums, item // 2)
                _max = max(_max, item // 2)
                ans = min(ans, _max - nums[0])
            else:
                break
                
        return ans
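The difference between the two lookups can be sketched as follows. This is only a minimal illustration with made-up sample data, not part of the submitted solution. In CPython, heapq.nsmallest(1, heap) falls back to a linear scan of the list, while heap[0] simply reads the first slot.

```python
import heapq

heap = [9, 4, 7, 1, 8, 2]   # made-up sample data
heapq.heapify(heap)

# The heap invariant guarantees the smallest element sits at index 0.
smallest_fast = heap[0]                       # O(1) index lookup
smallest_slow = heapq.nsmallest(1, heap)[0]   # O(n): scans the whole list

# Peeking with heap[0] does not remove anything from the heap.
size_after_peek = len(heap)
```

Both expressions give the same value, so swapping one for the other inside a loop changes only the running time, not the result.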

A tip for the time complexity of LeetCode #127

The first idea that came to mind when I looked at LeetCode #127 was to use a graph algorithm. But to build the graph first, I would need to compare every pair of words in wordList, which takes O(n²) comparisons.

Here comes the time complexity analysis: the length of wordList is up to 5000, so O(n²) means 5000*5000=25*10⁶ operations. As a rough rule of thumb, a Python script on LeetCode takes about 1 second for every 10⁶ operations. Thus 25*10⁶ operations would take about 25 seconds, which is far too long for a LeetCode question.

Therefore the best way to build the graph is not to traverse wordList multiple times, but to iterate over all lowercase letters at each position of each word (note the constraint that beginWord, endWord, and wordList[i] consist of lowercase English letters). Since the maximum word length is 10, each word has at most 26*10=260 candidates, which reduces the work to 260*5000=1.3*10⁶ set lookups.
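As a rough sketch of the idea above, the helper below generates the one-letter variations of a single word and keeps those found in the word set. The function name neighbors and the small sample word list are my own choices for illustration.

```python
from string import ascii_lowercase
from typing import List, Set

def neighbors(word: str, words_set: Set[str]) -> List[str]:
    """Return the words in words_set that differ from word by exactly one letter."""
    found = []
    for index in range(len(word)):
        for cand in ascii_lowercase:
            if cand == word[index]:
                continue  # same letter, so this is the word itself
            candidate = word[:index] + cand + word[index + 1:]
            if candidate in words_set:  # O(1) average set lookup
                found.append(candidate)
    return found

words_set = {"hot", "dot", "dog", "lot", "log", "cog"}  # small sample word list
hit_neighbors = neighbors("hit", words_set)

# Worst-case work for the two ways of building the graph
pairwise_checks = 5000 * 5000        # compare every pair of words
candidate_checks = 5000 * 10 * 26    # 26 letters at each of 10 positions, per word
```

The second count is roughly twenty times smaller than the first, which is the whole point of iterating over letters instead of over word pairs.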

The code below also uses my old trick of tracking visited nodes.

from collections import defaultdict
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        words_set = set(wordList)
        conns = defaultdict(set)
        for word in wordList + [beginWord]:
            for index in range(len(word)):
                for cand in "abcdefghijklmnopqrstuvwxyz":
                    if cand == word[index]:
                        continue
                    # replace one letter and keep the result only if it is a known word
                    neighbor = word[:index] + cand + word[index+1:]
                    if neighbor in words_set:
                        conns[word].add(neighbor)
        # bfs
        queue = {beginWord}
        already = set()
        ans = 1
        while queue:
            new_queue = set()
            for node in queue:
                for _next in conns[node]:
                    if _next == endWord:
                        return ans + 1
                    new_queue.add(_next)
            already |= queue
            queue = new_queue - already
            ans += 1
        return 0

Divide and Conquer solution for LeetCode #494

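The popular solution for LeetCode #494 is dynamic programming. But my first idea was divide and conquer: split the array into two halves, enumerate every signed sum of each half, and then count the pairs of sums that add up to the target. Although it’s not very fast, it’s another way to think about the problem: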

from collections import Counter
from typing import List

class Solution:
    def get_results(self, nums: List[int]) -> Counter:
        layer = [nums[0], -nums[0]]
        for num in nums[1:]:
            new_layer = []
            for item in layer:
                for val in [-num, num]:
                    new_layer.append(item + val)
            layer = new_layer
        return Counter(layer)
    
    def findTargetSumWays(self, nums: List[int], target: int) -> int:
        n = len(nums)
        if n == 1:
            # count +nums[0] and -nums[0] separately, so nums == [0] with target == 0 gives 2
            return int(nums[0] == target) + int(-nums[0] == target)
        half = n // 2
        left = self.get_results(nums[:half])
        right = self.get_results(nums[half:])
        ans = 0
        for lkey, lcnt in left.items():
            rcnt = right[target - lkey]
            ans += rcnt * lcnt
        return ans
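The same split-and-combine idea (often called "meet in the middle") can be written more compactly and checked against a brute-force enumeration. This is a sketch rather than the code I submitted: the names half_sums, count_ways, and brute_force are made up for this example, and each half is kept as a Counter of sums instead of a full list.

```python
from collections import Counter
from itertools import product
from typing import List

def half_sums(nums: List[int]) -> Counter:
    """Count how often each signed sum occurs for one half of the array."""
    sums = Counter({0: 1})  # one way to reach 0 with no numbers
    for num in nums:
        nxt = Counter()
        for total, cnt in sums.items():
            nxt[total + num] += cnt
            nxt[total - num] += cnt
        sums = nxt
    return sums

def count_ways(nums: List[int], target: int) -> int:
    """Combine the two halves: each left sum needs a matching right sum."""
    half = len(nums) // 2
    left, right = half_sums(nums[:half]), half_sums(nums[half:])
    return sum(cnt * right[target - total] for total, cnt in left.items())

def brute_force(nums: List[int], target: int) -> int:
    """Try every sign assignment; only usable for short arrays."""
    return sum(
        1
        for signs in product((1, -1), repeat=len(nums))
        if sum(s * n for s, n in zip(signs, nums)) == target
    )
```

For an array of length n, each half produces at most 2^(n/2) signed sums, compared with 2^n sign assignments for the brute force, which is why the split helps even without dynamic programming.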

An improvement that passes LeetCode #2359

The first idea that came to mind was to use Sets to track the nodes reachable from the two starting nodes and pick the first node where the two Sets intersect. That gave me the first solution:

from collections import defaultdict
from typing import List

class Solution:
    def bfs(self, node1: int, node2: int, conns, length) -> int:
        set1 = {node1}
        set2 = {node2}
        step = 0
        while step <= length:
            inter = set1 & set2
            if len(inter) > 0:
                return min(list(inter))
            new_set1 = set()
            new_set2 = set()
            for node in set1:
                new_set1 |= conns[node]
            for node in set2:
                new_set2 |= conns[node]
            if len(new_set1 - set1) <= 0 and len(new_set2 - set2) <= 0:
                return -1
            set1 |= new_set1
            set2 |= new_set2
            step += 1
    
    def closestMeetingNode(self, edges: List[int], node1: int, node2: int) -> int:
        conns = defaultdict(set)
        for index, edge in enumerate(edges):
            if edge >= 0:
                conns[index].add(edge)
        return self.bfs(node1, node2, conns, len(edges))

I was pretty satisfied with the simplicity of the above code. But unfortunately, it exceeded the time limit.

Sometimes we don’t need to start over with a new solution before optimising the first one. Maybe I don’t need Sets at all, since they are expensive in Python. Instead, I can use an array to track all visited nodes, so that stepping onto a node already visited from the other side means “intersection”. To distinguish the visits from the two starting nodes, I let node1 mark “1” in the array and node2 mark “2”. This leads to my second solution. It’s a little longer but uses an array instead of Sets:

class Solution:
    def closestMeetingNode(self, edges: List[int], node1: int, node2: int) -> int:
        if node1 == node2:
            return node1
        n = len(edges)
        visited = [0] * n
        step = 0
        visited[node1] = 1
        visited[node2] = 2
        while True:
            ans = []
            old_node1 = node1
            nxt = edges[node1]
            if nxt >= 0:
                if visited[nxt] == 0:
                    visited[nxt] = 1
                    node1 = nxt
                elif visited[nxt] == 2:
                    ans.append(nxt)
            old_node2 = node2
            nxt = edges[node2]
            if nxt >= 0:
                if visited[nxt] == 0:
                    visited[nxt] = 2
                    node2 = nxt
                elif visited[nxt] == 1:
                    ans.append(nxt)
            if len(ans) > 0:
                return min(ans)
            if old_node1 == node1 and old_node2 == node2:
                return -1
        return -1

As shown above, I use “old_node1” and “old_node2” to detect the case where neither pointer can advance any more, which would otherwise become a dead loop. It beats 97% of submissions on runtime. Not bad.
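For cross-checking results, a different and more direct technique is to compute the distance from each starting node to every node, then pick the node with the smallest maximum distance. This is not the solution I submitted, only a sketch of an alternative; the names distances and closest_meeting_node are made up for this example.

```python
from typing import List

def distances(edges: List[int], start: int) -> List[int]:
    """Distance from start to every node; -1 means unreachable."""
    dist = [-1] * len(edges)
    node, step = start, 0
    # Each node has at most one outgoing edge, so just follow the chain
    # until it ends (-1) or revisits a node (cycle).
    while node != -1 and dist[node] == -1:
        dist[node] = step
        node = edges[node]
        step += 1
    return dist

def closest_meeting_node(edges: List[int], node1: int, node2: int) -> int:
    d1 = distances(edges, node1)
    d2 = distances(edges, node2)
    best, best_cost = -1, None
    for i in range(len(edges)):  # ascending index, so ties keep the smallest index
        if d1[i] >= 0 and d2[i] >= 0:
            cost = max(d1[i], d2[i])
            if best_cost is None or cost < best_cost:
                best, best_cost = i, cost
    return best
```

It walks each chain once, so it is also linear in the number of nodes, and it is handy for testing the array-marking version on small inputs.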

The road to solving LeetCode #322

My first solution used dynamic programming. Then I also wanted to try breadth-first search. Here is the first version of my BFS:

class Solution:
    def coinChange(self, coins: List[int], amount: int) -> int:
        if amount == 0:
            return 0
        # bfs
        depth, ans = 1, -1
        queue = [amount]
        while len(queue) > 0:
            new_queue = []
            for node in queue:
                for coin in coins:
                    if coin > node:
                        continue
                    if coin == node:
                        return depth
                    else:
                        new_queue.append(node-coin)
            queue = new_queue
            depth += 1
        return ans

It gets the correct answer but hits a TLE (Time Limit Exceeded) error when running the case below:

[2,3,5,7,11,13,17,19,21]
9999

After checking the variables “queue” and “new_queue”, I noticed that they contain many duplicate values. Therefore I changed them from “list” to “set”:

class Solution:
    def coinChange(self, coins: List[int], amount: int) -> int:
        if amount == 0:
            return 0
        # bfs
        depth, ans = 1, -1
        queue = {amount}
        while len(queue) > 0:
            new_queue = set()
            for node in queue:
                for coin in coins:
                    if coin > node:
                        continue
                    if coin == node:
                        return depth
                    else:
                        new_queue.add(node-coin)
            queue = new_queue
            depth += 1
        return ans

It’s faster but still takes about 5 seconds, which is too long for a LeetCode contest.

Actually, there are still many duplicated values: not within one “queue”, but across different depths. Let me take this case:

Input: coins = [1,2,5], amount = 11

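as an example. In its BFS tree, the first layer is 11, the second layer is 10, 9, and 6 (11 minus each coin), and the third layer contains 9 again (10 - 1).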

The program already starts to check “9” at the second layer, so checking “9” again in the third layer is just a waste of time. Thus, we can drop any number in “new_queue” that has already appeared in one of the previous “queues”.
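The repeated value is easy to see by listing the first few layers. The helper below is only a small sketch for inspection (the name bfs_layers is made up); it expands each layer the same way as the solution above but never returns early.

```python
from typing import List, Set

def bfs_layers(coins: List[int], amount: int, max_layers: int) -> List[Set[int]]:
    """Return the first max_layers layers of remaining amounts.

    Values are deduplicated within a layer, but not across layers.
    """
    layers = [{amount}]
    while len(layers) < max_layers:
        nxt = {node - coin for node in layers[-1] for coin in coins if coin < node}
        layers.append(nxt)
    return layers

layers = bfs_layers([1, 2, 5], 11, 3)
repeated = layers[1] & layers[2]   # values that appear in both the 2nd and 3rd layer
```

For this input, the only value shared by the second and third layers is 9, which is exactly the duplicate work described above.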

I will add a new set called “already” to record all the values the program has ALREADY PROCESSED, and subtract it from “new_queue” before searching the next depth. The new code only takes 90ms:

class Solution:
    def coinChange(self, coins: List[int], amount: int) -> int:
        if amount == 0:
            return 0
        # bfs
        depth, ans = 1, -1
        queue = {amount}
        already = set()
        while len(queue) > 0:
            new_queue = set()
            for node in queue:
                for coin in coins:
                    if coin > node:
                        continue
                    if coin == node:
                        return depth
                    else:
                        new_queue.add(node-coin)
            already |= queue
            queue = new_queue - already
            depth += 1
        return ans

———- 2023.01.24 ———-

It seems other BFS problems could also borrow this idea, like this submission of mine: https://leetcode.com/submissions/detail/884138668/

A BigQuery error about the partition

We were using client.query() (from the Python API of BigQuery) to insert selected data into a specific partition of a table. But the script reported errors like:

google.api_core.exceptions.BadRequest: 400 Some rows belong to different partitions rather than destination partition

This note suggested that an incorrect date format for the partition might be the cause. I checked the code, but the partition format was correct.

The real reason was the input: the “selected data”. The data to be inserted came from this SQL:

SELECT col1, col2, "2023-01-06" as partition_date FROM my_table;

The partition date that the Python script set for the destination table, via bigquery.QueryJobConfig(destination="new_table$20230103"), was “2023-01-03”, but the partition date in the source data was “2023-01-06”. This mismatch caused the above error.
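One way to avoid this kind of mismatch is to derive both the partition decorator and the date literal in the SQL from a single date value. The sketch below assumes a daily-partitioned table; the helper name partition_decorator and the variable run_date are hypothetical, and the BigQuery calls themselves are only mentioned in a comment.

```python
from datetime import date

def partition_decorator(table: str, partition_date: date) -> str:
    """Build a "table$YYYYMMDD" destination for a daily partition."""
    return f"{table}${partition_date:%Y%m%d}"

# Hypothetical value: one date drives both the destination and the SQL literal.
run_date = date(2023, 1, 6)
destination = partition_decorator("new_table", run_date)
sql = f'SELECT col1, col2, "{run_date.isoformat()}" AS partition_date FROM my_table;'

# These would then be used as in the original script, for example
# bigquery.QueryJobConfig(destination=destination) passed to client.query(sql, ...).
```

With this arrangement, the destination partition and the partition_date column cannot drift apart when the date changes.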

Books I read in the year 2022

Showing just three books above does not mean I only read three books in the whole of 2022. I also read some short history stories and books about finance, but I don’t think those were vital to my life experience.

As a software developer, why did I read a book about the semiconductor material Silicon Carbide? Frankly speaking, just out of curiosity. In recent years, a lot of news and articles have said that SiC (Silicon Carbide) will be the future material of EVs (Electric Vehicles), and that all the companies producing SiC will become extremely popular and rich. But after skimming this book (or maybe a long paper), I realized that SiC was discovered and has been produced for many years, and that in the early days scientists only thought of it as a material for lighting and sensing. If even the most intelligent people considered SiC “not very popular” about 10 years ago, why should I believe it will become “the future of EV” in the next 10 years just because some financial guys said so? Inflating a financial bubble is easy, but science is all about hard work and frequent failures.

“A Song of Ice and Fire” is a great novel. The only drawback of reading it (I mean, the first 5 volumes) is that I couldn’t get interested in other novels for a long time afterwards. Even “A Knight of the Seven Kingdoms”, written by George R. R. Martin himself, can’t match it. The first story about Dunk the hedge knight is wonderful, but the other two are ordinary.

The reason I put the algorithm book here (as I also did in 2021) is that I was still trying to revisit and learn algorithms in 2022. After graduating from school, I spent quite a lot of time learning about machine learning, compilers, operating systems, and even semiconductors, but not algorithms. Since learning new algorithms is terribly tedious, I avoided touching them for a long time. How could a software engineer try to learn everything but algorithms? I felt a little regret. So, I will do it now.

Using pendulum in Python

pendulum is a widely used Python library at my company. For example, if I want to get the date and time of the previous Monday, I can write:

import pendulum

pendulum.today("US/Pacific").previous(pendulum.MONDAY)
# return type: pendulum.datetime.DateTime

But this returns an instance of pendulum.datetime.DateTime. What if I want a Date instead of a pendulum.datetime.DateTime? Actually, pendulum.datetime.DateTime inherits from datetime.datetime, so we can use .date():

pendulum.today("US/Pacific").previous(pendulum.MONDAY).date()
# return type: pendulum.date.Date
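For comparison, the same date arithmetic can be sketched with only the standard library. This is a swapped-in technique, not pendulum itself; it assumes that “previous Monday” means the most recent Monday strictly before the given day, and it ignores time zones. The function name previous_monday is made up for this example.

```python
from datetime import date, timedelta

def previous_monday(day: date) -> date:
    """Most recent Monday strictly before the given day."""
    # date.weekday() is 0 for Monday; if the day is itself a Monday, go back a full week.
    days_back = day.weekday() or 7
    return day - timedelta(days=days_back)

wednesday = date(2023, 1, 11)
monday = date(2023, 1, 9)
from_wednesday = previous_monday(wednesday)
from_monday = previous_monday(monday)
```

The pendulum version is still shorter and handles the time zone for me, which is why it is the one I use at work.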

Resolve dependencies in Argo workflow

The DAG configuration of my Argo workflow looked like this:

{
  name: "my-workflow",
  dag: {
    tasks: [
      {
        name: "step1",
        template: "template1",
      },
      {
        name: "manual-check",
        template: "template-manual-check",
        depends: "step1.Failed",
      },
      {
        name: "step2",
        template: "template2",
        depends: "manual-check",
      },
    ],
  },
}

If “step1” fails, “manual-check” suspends the whole pipeline and lets users (or customers) decide whether the pipeline can continue. But I ran into a funny situation when “step1” succeeded: the “manual-check” step was omitted by the workflow, and “step2” was never executed because it depends on “manual-check”!

The correct behaviour is to let “step2” also run when “step1” succeeds. Therefore I needed to change the configuration to this:

{
  ......
      {
        name: "step2",
        template: "template2",
        depends: "manual-check || step1.Succeeded",
      },
    ],
  },
}

Thanks to this reference: “Enhanced Depends Logic”.

Using Python to run BigQuery job with project id

Here is the code I used to query a BigQuery table:

from google.cloud import bigquery
from google.cloud.bigquery_storage import BigQueryReadClient

client = bigquery.Client()
storage_client = BigQueryReadClient()
df = client.query("select * from my_table1").to_dataframe(bqstorage_client=storage_client)

Then it reported the error:

“Access Denied: Project PRJ_B: User does not have bigquery.jobs.create permission in project PRJ_B.”

But actually, I wanted to launch the job in project PRJ_A. So I added the shell command “gcloud config set project PRJ_A” before running this Python script. But the error persisted.

After searching the API docs of the Python BigQuery client, I found that “bigquery.Client()” accepts a “project” argument:

client = bigquery.Client(project="PRJ_A")

Now the script works well.