Building a Python Project with Hugging Face Translation Model Using FastAPI and React JS Frontend: Performance Analysis and Estimation
Quick summary:
This article outlines the process of creating a client (using React JS, Vite, and TypeScript) and a server (utilizing Python FastAPI, the Transformers library, and the Helsinki-NLP translation model) to demonstrate and analyze the performance of the translation model in real-time applications.
About the model:
The Helsinki-NLP translation models are part of the Hugging Face model hub, created by the University of Helsinki’s Natural Language Processing (NLP) group. These models are based on the MarianMT framework, a highly efficient machine translation model designed for multiple languages.
These models, like opus-mt-en-fr (English to French) or opus-mt-fr-en (French to English), are trained on the OPUS corpus, a multilingual dataset containing parallel text data from various sources. They support translations between many languages, especially low-resource ones, and are pre-trained using a transformer architecture.
Key aspects:
- Wide language support: Can handle many language pairs, including low-resource languages.
- Performance: Optimized for translation tasks in real-time applications.
- Model availability: These models are available for free on the Hugging Face Model Hub, where users can easily integrate them into their projects using libraries like transformers.
These models are commonly used in academic research, multilingual NLP applications, and real-time translation systems.
For more details, you can explore the models on the Hugging Face platform: https://huggingface.co/Helsinki-NLP.
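As a quick taste before building the server, here is a minimal sketch of using one of these models directly with the transformers pipeline (model names follow the opus-mt-{src}-{tgt} pattern; the first run downloads the weights):

from transformers import pipeline

# Load the English-to-French model (downloaded on first use, then cached locally)
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
print(translator("Hello, world!")[0]["translation_text"])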
Let’s start with the FastAPI backend (server side):
mkdir huggingface_translation
cd huggingface_translation
Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install the required packages:
pip install fastapi uvicorn transformers torch sentencepiece sacremoses psutil
Now, let’s create a simple Python script that uses a Hugging Face translation model with some additional performance metrics (memory and execution time):
from fastapi import FastAPI
from pydantic import BaseModel
from fastapi.concurrency import run_in_threadpool
import torch
import psutil
import os
import time
from transformers import pipeline

app = FastAPI()

def get_size(num_bytes, suffix="B"):
    # Convert a raw byte count into a human-readable string, e.g. "12.34MB"
    factor = 1024
    for unit in ["", "K", "M", "G", "T", "P"]:
        if num_bytes < factor:
            return f"{num_bytes:.2f}{unit}{suffix}"
        num_bytes /= factor

def translate(text, src_lang="en", tgt_lang="fr"):
    device = 0 if torch.cuda.is_available() else -1
    start_time = time.time()
    # Snapshot memory before translating: GPU stats if available, otherwise process RSS
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
        start_mem = torch.cuda.memory_allocated()
    else:
        process = psutil.Process(os.getpid())
        start_mem = process.memory_info().rss
    translator = pipeline("translation", model=f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}", device=device)
    # max_length caps the translated output at 40 tokens
    translated = translator(text, max_length=40)[0]['translation_text']
    end_time = time.time()
    if torch.cuda.is_available():
        end_mem = torch.cuda.max_memory_allocated()
        mem_diff = end_mem - start_mem
    else:
        end_mem = process.memory_info().rss
        mem_diff = end_mem - start_mem
    computation_time = end_time - start_time
    return translated, get_size(mem_diff), computation_time

class TranslationRequest(BaseModel):
    text: str
    src_lang: str = "en"
    tgt_lang: str = "fr"

@app.post("/translate")
async def translate_text(req: TranslationRequest):
    # Run the blocking translate() in a worker thread so the event loop stays free
    translated_text, memory_used, comp_time = await run_in_threadpool(translate, req.text, req.src_lang, req.tgt_lang)
    response = {
        "original_text": req.text,
        "translated_text": translated_text,
        "memory_used": memory_used,
        "computation_time": f"{comp_time:.2f} seconds",
        "device": "GPU" if torch.cuda.is_available() else "CPU"
    }
    return response
This code demonstrates a FastAPI-based backend service that translates text using the Hugging Face transformers library. It uses the Helsinki-NLP translation models, allowing translation between different language pairs. Here’s a breakdown:
- Memory Tracking and Performance Monitoring: The translate function measures memory usage and execution time for the translation operation, supporting both GPU and CPU processing.
- FastAPI POST Route: The /translate endpoint accepts a POST request with text, src_lang, and tgt_lang. It calls the translation function and reports the memory used, the translation time, and whether the operation used GPU or CPU.
- Dependencies: torch for GPU computation, psutil for memory usage stats, and transformers for the translation pipeline.
- run_in_threadpool: Executes synchronous functions within an asynchronous context without blocking the event loop. It moves the CPU-bound translate function to a separate thread.
- Async Route: The route itself remains asynchronous, allowing FastAPI to continue handling other requests while the translation runs.
This API is built for performance tracking, returning data to clients about the translation, the computation time, and the device used (GPU/CPU).
To run the backend (assuming the script above is saved as huggingface_translation.py):
uvicorn huggingface_translation:app --reload
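Once the server is up, you can sanity-check the endpoint without the frontend, for example with curl (sample payload, adjust as needed):

curl -X POST http://127.0.0.1:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "src_lang": "en", "tgt_lang": "fr"}'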
Let’s proceed with the frontend (TypeScript, client side):
npm create vite@latest my-react-app -- --template react-ts
cd my-react-app
npm install
Update App.tsx (in the src directory):
import { useState } from 'react';
import './App.css';

// Define the interface for the request data structure
interface TranslateRequest {
  text: string;
  src_lang: string;
  tgt_lang: string;
}

// Define an array of language options
const languageOptions = [
  { code: 'en', name: 'English' },
  { code: 'fr', name: 'French' },
  { code: 'es', name: 'Spanish' },
  { code: 'de', name: 'German' },
  { code: 'zh', name: 'Chinese' },
  { code: 'ru', name: 'Russian' },
  // Add more languages as needed
];

const App = () => {
  const [text, setText] = useState<string>('');
  const [srcLang, setSrcLang] = useState<string>('en'); // State for source language
  const [tgtLang, setTgtLang] = useState<string>('fr'); // State for target language
  const [translated, setTranslated] = useState<string>('');

  const handleSubmit = async (event: React.FormEvent) => {
    event.preventDefault();
    console.log("Form submitted with text:", text); // Log the input text

    // Create the request data based on the interface
    const requestData: TranslateRequest = {
      text,
      src_lang: srcLang, // Use the source language from state
      tgt_lang: tgtLang, // Use the target language from state
    };

    const response = await fetch('/api/translate', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(requestData), // Use requestData here
    });

    if (!response.ok) {
      console.error("Error in response:", response); // Log response if not ok
      return;
    }

    const data = await response.json();
    console.log("Translated text received:", data.translated_text);
    setTranslated(data.translated_text);
  };

  return (
    <div className="App">
      <form onSubmit={handleSubmit}>
        <input
          type="text"
          value={text}
          onChange={(e) => setText(e.target.value)}
          placeholder="Enter text to translate"
        />
        <br />
        {/* Dropdown for source language selection */}
        From: <select value={srcLang} onChange={(e) => setSrcLang(e.target.value)}>
          {languageOptions.map((lang) => (
            <option key={lang.code} value={lang.code}>
              {lang.name}
            </option>
          ))}
        </select>
        <br />
        {/* Dropdown for target language selection */}
        To: <select value={tgtLang} onChange={(e) => setTgtLang(e.target.value)}>
          {languageOptions.map((lang) => (
            <option key={lang.code} value={lang.code}>
              {lang.name}
            </option>
          ))}
        </select>
        <br />
        <button type="submit">Translate</button>
      </form>
      <p>{translated}</p>
    </div>
  );
};

export default App; // Exporting as default
This React app allows users to input text, select source and target languages from dropdowns, and submit the text for translation. It uses useState hooks to manage the input text, source language, target language, and translated text. When the form is submitted, a POST request is sent to the backend API (/api/translate), which processes the translation and returns the result. The app displays the translated text after receiving the response.
The code includes:
- TranslateRequest Interface: Defines the structure of the data sent to the API (text, source language, target language).
- Language Selection: Dropdown menus allow users to choose the source and target languages dynamically.
- State Management: The useState hook is used for handling text input, language selection, and translated text.
Now just add a proxy in vite.config.ts so requests to the /api path reach the backend correctly (the rewrite strips the /api prefix, so /api/translate maps to the FastAPI /translate endpoint):
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

// https://vitejs.dev/config/
export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      '/api': {
        target: 'http://127.0.0.1:8000',
        changeOrigin: true,
        rewrite: (path) => path.replace(/^\/api/, '')
      }
    }
  }
});
Run the app:
npm run dev
What you should see in the browser window (mind the port number shown in the terminal window):
Let’s check the performance (MacBook Pro M1 Max; CUDA is not available here, so the model runs on the CPU):
Use http://127.0.0.1:8000/docs#/default for Swagger UI (FastAPI serves Swagger by default, so no extra code is needed):
And the result is not too bad, under 2 seconds:
Once the translation model is loaded, subsequent translations might not require loading the full model into memory again. This means that only minimal memory is allocated during translation, which results in low memory usage for each translation task after the initial load.
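Strictly speaking, the code above constructs a new pipeline inside translate on every request; after the first call the model files come from the local Hugging Face cache on disk rather than the network, but the weights are still re-instantiated. To guarantee the model stays resident in memory, the pipeline could be cached per language pair. A minimal sketch (get_translator is a hypothetical helper, not part of the code above):

from functools import lru_cache
from transformers import pipeline
import torch

@lru_cache(maxsize=4)  # keep up to 4 language-pair pipelines alive in memory
def get_translator(src_lang: str, tgt_lang: str):
    device = 0 if torch.cuda.is_available() else -1
    return pipeline("translation",
                    model=f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}",
                    device=device)

# translate() would then call get_translator(src_lang, tgt_lang)
# instead of building a new pipeline on every request.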
Execution time, raw data:
Basic visualisation and analysis:
As you can see, execution time is close to a linear function of text length, which is expected from the Helsinki-NLP translation model: the transformer decoder generates output tokens sequentially, so longer texts take roughly proportionally longer to translate.
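If you want to quantify that trend from your own measurements, a quick least-squares fit will do; a sketch with placeholder numbers (replace lengths and times with the raw data above):

import numpy as np

# Placeholder measurements: input length (characters) and execution time (seconds)
lengths = np.array([20, 50, 100, 200, 400])
times = np.array([0.4, 0.7, 1.1, 1.9, 3.5])

# Fit time = slope * length + intercept
slope, intercept = np.polyfit(lengths, times, 1)
print(f"time ≈ {slope * 1000:.2f} ms per character + {intercept:.2f} s overhead")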
Summary:
In this article we created a client (React JS, Vite, TypeScript) and a server (Python FastAPI, Transformers library, Helsinki-NLP translation model) and analyzed the performance of this solution.