enrichMobile Pub/Sub sample

A minimal Python sample showing how to publish enrichMobile job requests to Modigie's Pub/Sub interface and consume the job responses through a pull subscription (long-polling).

It consists of two scripts and a CSV file. Download all files by clicking on each link:

publisher.py: reads a CSV of input rows and publishes one request per row
subscriber.py: pulls responses from the response subscription and writes completed ones to disk
sample_input.csv: a CSV file with sample inputs that you can safely use with a synthetic repository

Requirements

Python 3.9+
google-cloud-pubsub
A GCP identity (user account or service account) with permission to publish to the request topic and pull from the response subscription

Install the dependency:

pip install google-cloud-pubsub

Authenticating to GCP

The scripts use Application Default Credentials (ADC). If you have never set up ADC on this machine, both scripts fail with an authentication error on their first API call.

Option 1 — user account (easiest for trying out the sample):

The first command opens a browser for you to sign in.

gcloud auth application-default login

The second tells GCP which project to bill the API calls against — without it you'll see a UserWarning on every run (harmless, but noisy).

Important

Before running the second command, replace PROJECT_ID with the configuration value you received from Modigie.

gcloud auth application-default set-quota-project PROJECT_ID  # (1)!

Replace PROJECT_ID with the exclusive tenant project ID of your Modigie Org Account. Example: modigie-c-q8irqgvocgg8lt4dwa3e.

Option 2 — service account (recommended for unattended use):

Download the service account key JSON from GCP, then point ADC at it:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json  # (1)!

Replace with the actual path to your service account key JSON file.

The service account needs these IAM roles:

roles/pubsub.publisher on the request topic
roles/pubsub.subscriber on the response subscription

By default, this is the Application Account service account that Modigie shared with you, and the necessary IAM roles are already granted to it.
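If either script fails with an authentication error, it can help to check which credential source ADC is likely to pick up. The following is a stdlib-only approximation, not part of the sample: the helper name adc_source is hypothetical, it checks GOOGLE_APPLICATION_CREDENTIALS first and then the file written by gcloud auth application-default login, and it does not model the metadata server used when running on Google Cloud.

```python
import os
import sys
from pathlib import Path


def adc_source(environ=None, home=None):
    """Describe the ADC source that is likely to be used (simplified check).

    Assumption: mirrors only the first two ADC lookup steps; the metadata
    server used on Google Cloud is not checked.
    """
    env = os.environ if environ is None else environ

    # Step 1: an explicit service account key file (Option 2).
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        if Path(key_path).is_file():
            return f"service account key: {key_path}"
        return f"GOOGLE_APPLICATION_CREDENTIALS is set but {key_path} does not exist"

    # Step 2: the user credentials file written by gcloud (Option 1).
    if env.get("CLOUDSDK_CONFIG"):
        config_dir = Path(env["CLOUDSDK_CONFIG"])
    elif sys.platform.startswith("win"):
        config_dir = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        config_dir = Path(home or Path.home()) / ".config" / "gcloud"
    user_file = config_dir / "application_default_credentials.json"
    if user_file.is_file():
        return f"user credentials: {user_file}"
    return "no ADC file found; run: gcloud auth application-default login"
```

Call print(adc_source()) before running the scripts. The real lookup is performed by the google-auth library, so treat the result as a diagnostic hint only.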

Configuration

Each script hardcodes its IDs as constants at the top of the file: the project and topic IDs in publisher.py, the project and subscription IDs in subscriber.py.

Important

You are running against a different environment than the one the hardcoded values point to (your own project, a different account, and so on), so replace them with the configuration values you received from Modigie.

PROJECT_ID and TOPIC_ID in publisher.py

# TODO: Replace with GCP project that owns the Pub/Sub topic.
PROJECT_ID = "modigie-c-q8irqgvocgg8lt4dwa3e"  # (1)!

# TODO: Replace with Pub/Sub topic that receives enrichMobile JobV2 request messages.
TOPIC_ID = "inpubsub-job-request-enrichmobile-2678a46bd60d84c"  # (2)!

Replace PROJECT_ID with the exclusive tenant project ID of your Modigie Org Account.
Replace TOPIC_ID with your job request topic for the job type enrichMobile. Use only the part after /topics/ in the full topic name.

PROJECT_ID and SUBSCRIPTION_ID in subscriber.py

# GCP project that owns the Pub/Sub subscription.
PROJECT_ID = "modigie-c-q8irqgvocgg8lt4dwa3e"  # (1)!

# Pub/Sub subscription that receives enrichMobile lifecycle/job response events.
SUBSCRIPTION_ID = "inpubsub-job-response-enrichmobile-pull-all-externalendpoint-2678a46bd60d84c"  # (2)!

Replace PROJECT_ID with the exclusive tenant project ID of your Modigie Org Account.
Replace SUBSCRIPTION_ID with your job response pull subscription for the job type enrichMobile. Use only the part after /subscriptions/ in the full subscription name.
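The constants take the short IDs, not the full resource names. If you received full names such as projects/PROJECT_ID/topics/TOPIC_ID, this sketch shows how the two forms relate. The function names are illustrative; in the client library, PublisherClient.topic_path and SubscriberClient.subscription_path build the same full names from the short IDs.

```python
def topic_path(project_id: str, topic_id: str) -> str:
    """Full topic resource name built from the short IDs."""
    return f"projects/{project_id}/topics/{topic_id}"


def subscription_path(project_id: str, subscription_id: str) -> str:
    """Full subscription resource name built from the short IDs."""
    return f"projects/{project_id}/subscriptions/{subscription_id}"


def short_id(resource_name: str, collection: str) -> str:
    """Return the part after /<collection>/ ("topics" or "subscriptions").

    Input that is already a short ID is returned unchanged.
    """
    marker = f"/{collection}/"
    if marker in resource_name:
        return resource_name.split(marker, 1)[1]
    return resource_name
```

For example, short_id("projects/modigie-c-q8irqgvocgg8lt4dwa3e/topics/inpubsub-job-request-enrichmobile-2678a46bd60d84c", "topics") returns the value to use for TOPIC_ID.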

Important

If your repository allows multiple job types, make sure that you pick the correct topic and subscription. Topics and subscriptions are job-type specific.

Publisher

Reads a CSV and publishes one job request per row.

python publisher.py sample_input.csv
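Each row becomes one Pub/Sub message whose attributes carry the request metadata described under Notes (modigieJobRequestId and modigieJobExpireAfter). The sketch below shows that step for a single row. It is not the code of publisher.py: the JSON body schema is defined by Modigie and is simply passed in as payload, the function name is hypothetical, and omitting modigieJobExpireAfter when the column is empty is an assumption.

```python
import json


def publish_job_request(publisher, topic_path, request_id, payload, expire_after=""):
    """Publish one job request and return the client's publish future.

    `publisher` is expected to be a google.cloud.pubsub_v1.PublisherClient;
    any object with a compatible publish(topic, data, **attrs) method works.
    """
    # Pub/Sub attribute values must be strings.
    attributes = {"modigieJobRequestId": request_id}
    if expire_after:  # optional column; assumption: leave the attribute out when empty
        attributes["modigieJobExpireAfter"] = expire_after
    data = json.dumps(payload).encode("utf-8")
    return publisher.publish(topic_path, data, **attributes)


# Usage with the real client (requires google-cloud-pubsub and ADC):
# from google.cloud import pubsub_v1
# publisher = pubsub_v1.PublisherClient()
# path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
# future = publish_job_request(publisher, path, "req_b01_email_only", payload, "P2D")
# print(future.result())  # blocks until Pub/Sub returns the message ID
```

Passing the client in rather than creating it inside the function keeps one client for the whole CSV and makes the function easy to exercise with a stand-in object.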

CSV input format

The publisher expects a CSV file with a header row.

Required columns:

requestId
firstName
lastName
At least one of email or linkedinUrl (both may be provided on the same row)

Optional columns:

jobTitle
company
expireAfter

Expected header row:

requestId,firstName,lastName,jobTitle,company,email,linkedinUrl,expireAfter
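The column rules above can be checked locally before anything is published. This is a presence-only sketch with hypothetical function names: it mirrors the required and optional columns listed here and deliberately does not check formats, which stay the downstream service's job.

```python
import csv
import io

REQUIRED_COLUMNS = ("requestId", "firstName", "lastName")


def _value(row, column):
    # DictReader yields None for columns missing from a short row.
    return (row.get(column) or "").strip()


def row_problems(row):
    """Return a list of presence problems for one CSV row (empty if fine)."""
    problems = [f"missing {col}" for col in REQUIRED_COLUMNS if not _value(row, col)]
    if not (_value(row, "email") or _value(row, "linkedinUrl")):
        problems.append("need email or linkedinUrl")
    return problems


def check_csv(text):
    """Return (line_number, problems) for every row that fails the checks."""
    reader = csv.DictReader(io.StringIO(text))
    results = []
    # Line 1 is the header; this assumes no field spans multiple lines.
    for line_number, row in enumerate(reader, start=2):
        problems = row_problems(row)
        if problems:
            results.append((line_number, problems))
    return results
```

For the included sample_input.csv this reports no problems: every row has the required columns, and the malformed address in req_b05_bad_email is left for the service to reject.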

Example CSV

The included sample_input.csv exercises every code path the subscriber handles: successful completion for each of the three input shapes (email-only, URL-only, both), two job-level rejections, and a request-level discard.

requestId,firstName,lastName,jobTitle,company,email,linkedinUrl,expireAfter
req_b01_email_only,John,Doe,Software Engineer,Acme Inc.,jdoe@acme.com,,P2D
req_b02_url_only,Kelly,Sample,Engineer,Acme Inc.,,https://www.linkedin.com/in/kelly-location,P2D
req_b03_email_and_url,Jane,Smith,Product Manager,Globex,jsmith@globex.com,https://www.linkedin.com/in/kelly-country-location,P2D
req_b04_missing_company,Kelly,Sample,,,kelly@example.com,https://www.linkedin.com/in/kelly-email-country-location,
req_b05_bad_email,Bob,Test,Engineer,Acme Inc.,not-a-valid-email,,P2D
req_b01_email_only,Different,Person,,,other@example.com,,

What each row demonstrates:

Row	Demonstrates
`req_b01_email_only` (1st)	Successful completion with email-only input
`req_b02_url_only`	Successful completion with LinkedIn URL only
`req_b03_email_and_url`	Successful completion with both inputs
`req_b04_missing_company`	Job-level rejection (missing `employment.company.title`)
`req_b05_bad_email`	Job-level rejection (no usable identifier)
`req_b01_email_only` (2nd)	Request-level discard (`DuplicateJobIdError`)
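The sample repeats req_b01_email_only on purpose to trigger the discard. With your own data you may prefer to catch accidental repeats before publishing. A minimal sketch (hypothetical function name) that lists every requestId occurring more than once:

```python
import csv
import io
from collections import Counter


def duplicate_request_ids(text):
    """Return requestIds that occur more than once, in first-seen order."""
    ids = [row["requestId"] for row in csv.DictReader(io.StringIO(text))]
    # Counter keeps first-insertion order, so the result follows the CSV.
    return [rid for rid, count in Counter(ids).items() if count > 1]
```

This local check flags both kinds of reuse described under Notes, whether the repeated row would be discarded (different payload) or answered with a replayed response (identical payload).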

Notes

requestId must be unique per row. It is sent on the job request as the Pub/Sub attribute modigieJobRequestId. Reusing a requestId with a different payload triggers a DuplicateJobIdError discard; reusing one with an identical payload makes the service replay the previous final response instead of creating a new job.
expireAfter must be an ISO 8601 duration, for example P1D (1 day) or PT4H (4 hours). It is sent on the job request as the Pub/Sub attribute modigieJobExpireAfter. See also: Duration | ISO 8601 | Wikipedia.
The script does not fully validate business rules before publishing.
Validation is mainly handled by the downstream service.
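Because validation happens downstream, a malformed expireAfter only surfaces after publishing. A lenient local pre-check is easy to add. The sketch below uses a simplified regular expression (an assumption, not the service's own rule): it accepts forms such as P1D, PT4H, or P1DT12H30M with integer components and optional fractional seconds. The service remains the authority on which durations it accepts.

```python
import re

# Simplified ISO 8601 duration: PnYnMnWnDTnHnMnS, every part optional,
# but at least one component must follow P, and at least one must follow T.
_DURATION = re.compile(
    r"P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?"
)


def is_iso8601_duration(value: str) -> bool:
    """Return True if value looks like an ISO 8601 duration (simplified)."""
    # fullmatch rejects trailing characters, including a trailing newline.
    return _DURATION.fullmatch(value) is not None
```

An empty expireAfter is allowed by the CSV format, so skip this check when the column is blank.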

Subscriber

Pulls responses from the response subscription and processes them based on what the service sent. The service emits two kinds of messages:

Request discards: signaled by the modigieJobRequestDiscardReasonType Pub/Sub attribute. The request was malformed (missing/invalid/duplicate modigieJobRequestId, or invalid JSON body) and no job was created. There will be no further messages for the requestId. In a production app, mark the requestId as terminal in your own tracking so you don't wait for responses that will never come.
Job lifecycle messages: created, validated, enqueued, dispatching, processing, completed, rejected, failed, etc. Multiple messages may be emitted per job as it progresses. Only completed messages are written to disk; the others are logged.
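The two message kinds above boil down to a three-way decision per message. This sketch (hypothetical function name) encodes that decision. How the lifecycle status is encoded in a response message is not covered on this page, so the status is passed in as an argument that your own code extracts.

```python
DISCARD_ATTRIBUTE = "modigieJobRequestDiscardReasonType"


def classify(attributes, status=None):
    """Return "discard", "persist", or "log" for one response message.

    `attributes` is the message's Pub/Sub attribute mapping; `status` is the
    lifecycle status (created, ..., completed) extracted by your own code.
    """
    if DISCARD_ATTRIBUTE in attributes:
        return "discard"  # no job was created for this request
    if status == "completed":
        return "persist"  # only completed responses are written to disk
    return "log"  # every other lifecycle message is just logged
```

Checking the discard attribute first matters: a discard has no job and therefore no lifecycle status to inspect.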

python subscriber.py            # writes to ./responses
python subscriber.py /tmp/out   # custom output dir

Each completed response is written to {outputDir}/{requestId}.json, so you can join input CSV rows to output files by requestId.
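Joining inputs to outputs is a file-existence check per requestId. A minimal sketch (hypothetical function name): a requestId without a file is either still in progress or ended without a completed response (rejected, failed, or discarded).

```python
import csv
from pathlib import Path


def join_outputs(csv_path, output_dir="responses"):
    """Map each requestId in the input CSV to its output file, or None if absent."""
    out = Path(output_dir)
    result = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rid = row["requestId"]
            candidate = out / f"{rid}.json"
            result[rid] = candidate if candidate.is_file() else None
    return result
```

Repeated requestIds collapse into one entry, since they map to the same output file.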

The subscriber runs until you stop it with Ctrl-C.
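A run-until-Ctrl-C loop can be built on the client library's streaming pull API. This is a sketch, not the code of subscriber.py (which may use a different pull style); make_callback, run, and the handle function are hypothetical names. The callback acknowledges a message only after the handler succeeds, so a crash before a response is written leads to redelivery instead of a lost response.

```python
import logging


def make_callback(handle):
    """Wrap handle(attributes, data) so a message is acked only on success."""
    def callback(message):
        try:
            handle(dict(message.attributes), message.data)
        except Exception:
            logging.exception("handler failed; message will be redelivered")
            message.nack()
        else:
            message.ack()
    return callback


def run(project_id, subscription_id, handle):
    # Imported here so make_callback can be used without the client installed.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)
    future = subscriber.subscribe(path, callback=make_callback(handle))
    with subscriber:
        try:
            future.result()  # blocks until cancelled or an error occurs
        except KeyboardInterrupt:  # Ctrl-C
            future.cancel()
            future.result()  # wait for the shutdown to finish
```

Pub/Sub delivers at least once by default, so the handler may see the same message twice; rewriting {requestId}.json with the same content is harmless.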

Running end-to-end

Use two terminals. Start the subscriber first — Pub/Sub will hold messages on the subscription either way, but starting the subscriber first lets you watch responses arrive in real time instead of wondering whether anything is happening.

Terminal 1 — subscriber:

python subscriber.py

Terminal 2 — publisher:

python publisher.py sample_input.csv

You'll see lifecycle messages flow through the subscriber:

[created] req_b01_email_only
[validated] req_b01_email_only
[enqueued] req_b01_email_only
[dispatching] req_b01_email_only
[processing] req_b01_email_only
[completed] req_b01_email_only -> /path/to/responses/req_b01_email_only.json

Rejected jobs (the request became a job, then validation failed) include the reason:

[rejected] req_b04_missing_company - The job payload must contain `employment.company.title` of type `string`

Discarded requests (the request never became a job — typically a duplicate or malformed modigieJobRequestId) include the discard reason type:

[discarded] req_b01_email_only - no job was created due to DuplicateJobIdError - {"message":"..."}

When you receive a discard, your app should mark the requestId as terminal in its own tracking, because no further messages will arrive for it.
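One subtlety shows up in the sample itself: req_b01_email_only is published twice, and the second publish is discarded while the first job still completes. A single done/not-done flag per requestId would stop waiting too early. This sketch (hypothetical class) counts open publishes per requestId instead. Which statuses you treat as final is your call; completed responses and discards are the ones this page describes as final.

```python
from collections import Counter


class RequestTracker:
    """Track which published requestIds are still awaiting a final outcome."""

    def __init__(self):
        self._open = Counter()  # requestId -> publishes without a final outcome
        self._outcomes = {}     # requestId -> list of final outcomes seen

    def published(self, request_id):
        self._open[request_id] += 1

    def finished(self, request_id, outcome):
        # Record a final outcome such as "completed" or "discarded".
        if self._open[request_id] > 0:
            self._open[request_id] -= 1
        self._outcomes.setdefault(request_id, []).append(outcome)

    def pending(self):
        return sorted(rid for rid, n in self._open.items() if n > 0)

    def outcomes(self, request_id):
        return list(self._outcomes.get(request_id, []))
```

Publishing the sample's two req_b01_email_only rows and then receiving one discard leaves that requestId pending until the original job's completed response arrives.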