First, I used a function to transform words into word embeddings:

import numpy as np

# EMBEDDING_LENGTH and MAX_TEXT_LENGTH are module-level constants defined elsewhere.
def text_to_array(text, embeddings_index):
    empty_embed = np.zeros(EMBEDDING_LENGTH, dtype=np.float32)
    # Drop the trailing newline, split into words, truncate to MAX_TEXT_LENGTH tokens.
    text = text[:-1].split()[:MAX_TEXT_LENGTH]
    embeds = []
    for x in text:
        em = embeddings_index.get(x)  # per-word dictionary lookup on the CPU
        if em is not None:
            embeds.append(em)
    # Pad short texts with zero vectors up to the fixed length.
    embeds += [empty_embed] * (MAX_TEXT_LENGTH - len(embeds))
    return np.array(embeds, dtype=np.float32)
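
For example (a minimal sketch: the two-word embeddings_index and the constant values below are made up for illustration), the model input is built one text at a time:

EMBEDDING_LENGTH = 300   # illustrative values only
MAX_TEXT_LENGTH = 30

embeddings_index = {
    "hello": np.random.rand(EMBEDDING_LENGTH).astype(np.float32),
    "world": np.random.rand(EMBEDDING_LENGTH).astype(np.float32),
}

# text_to_array strips the last character, so append the newline it expects.
batch = np.array([text_to_array(t + "\n", embeddings_index)
                  for t in ["hello world", "hello there"]])
print(batch.shape)  # (2, 30, 300)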

But I noticed that this costs quite a lot of CPU while GPU usage stays low. The reason is simple: doing per-word dictionary lookups in single-threaded Python is inefficient. Instead, we should use the Embedding layer in Keras to put the whole word-embedding table into GPU memory.
The code is not difficult to understand:

...
    # word_index = tokenizer.word_index
    # emb_mean and emb_std are the mean and std of the pretrained vectors, computed
    # beforehand, so out-of-vocabulary words get a similarly scaled random init.
    nb_words = max(MAX_WORDS, len(word_index)) + 1  # word_index is 1-based, so keep one extra row
    embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, EMBEDDING_LENGTH))
    for word, i in tqdm(word_index.items()):
        if i >= nb_words:
            continue
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector  # copy the pretrained vector into row i
...
    inp = Input(shape=(MAX_TEXT_LENGTH,))
    net = Embedding(embedding_matrix.shape[0], EMBEDDING_LENGTH,
                    weights=[embedding_matrix], trainable=False)(inp)
...
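
With the Embedding layer in place, the expensive per-word preprocessing goes away: the model now consumes integer word indices, and the heavy lookup happens on the GPU. As a minimal sketch (the texts here are placeholders), the input side reduces to a cheap tokenize-and-pad:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ["hello world", "hello there"]  # placeholder corpus

tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index

# Each text becomes a padded row of integer indices; the Embedding layer then
# maps index i to row i of embedding_matrix in GPU memory.
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_TEXT_LENGTH)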

This time, the program ran about twice as fast as before. Looking up word embeddings in GPU memory (GDDR) is the right approach.