Streaming NLP infrastructure on Dataflow

Trustpilot is an e-commerce reviews platform delivering millions of new reviews to businesses each week. We are using Apache Beam on GCP Dataflow to deliver real-time streaming inferences with the latest NLP transformer models.

Our talk will touch on:

  • Infrastructure setup to enable Python Beam to interface with Kafka for streaming data
  • Taking advantage of Beam’s unified programming model to enable batch jobs for backfilling via BigQuery
  • Working with GPUs on Dataflow to speed up local model inference
  • MLOps: Using Dataflow as part of a continuous evaluation model monitoring setup