Ecommerce Product Catalog Optimization at Scale with AI: Python, OpenAI, and Azure's Bing Search API

Written by Max E. Sequeira Garza | Oct 2, 2024 4:57:54 AM

In today’s e-commerce landscape, product descriptions and SEO metadata are crucial to engaging customers and improving search visibility. At Wain.cr, a wine importer and online distributor, we started with only 180 products out of 786 having any meaningful product page descriptions, metadata, or proper categorization. To rectify this, we implemented a scalable solution using Anaconda Python, OpenAI’s API, and Azure's Bing Search API to generate full descriptions, optimize metadata, and categorize the entire product catalog.

This article outlines our journey in enriching Wain.cr's product catalog, from conceptualizing the approach to deploying Python scripts that leveraged AI to automate and scale the process. By the end, you’ll have a step-by-step guide, along with Python code examples, to DIY this process in your own business.

IMPORTANT: Not a single platform mentioned in this article is sponsoring or paying me for referrals. These are tools I have used and suggest in a professional manner for your own business benefit.

I. Phase 1: Product Title Optimization and Categorization
II. Phase 2: Product Description Generation
III. Conclusion: Scaling Ecommerce Content Enrichment with AI

I. Product Title Optimization and Categorization

Problem: Missing Product Descriptions and Metadata

Initially, Wain.cr's catalog had limited information—about 180 products had proper descriptions, SEO titles, and metadata. The rest were blank, hindering both customer experience and SEO performance. Our goal was to enrich the catalog in a way that provided real value to customers while improving search engine rankings.

Solution: DIY Step-by-Step Guide

Step 1: Defining the Strategy

The first task was to optimize the product titles and categories for 786 products. We began by gathering essential product details such as wine variety, region, country, and brand, all of which we could extract programmatically using a combination of OpenAI’s GPT model and Bing Search API.

Step 2: Python Script to Fetch Information

We used Python to automate the extraction of relevant product information. First, we created a script that would pull relevant data from both OpenAI and Bing Search API to match each product title with its correct metadata. The script split the process into:
- Structured Information: Extracting wine name, brand, region, and country.
- Bing Search Results: Using Bing Search API to gather additional context and verify the information.
This structured data allowed us to generate optimized titles, enriched product metadata, and better categorize each product.

Here’s an example of the Python code used for this phase:

import pandas as pd import openai import requests import time # Load your product catalog CSV df = pd.read_csv('wain-cr-product-catalog-titles.csv', encoding='UTF-8') # Set OpenAI API key openai.api_key = 'your-openai-api-key' # Set Bing Search API key bing_api_key = 'your-bing-api-key' # Function to search Bing API def search_bing(product_title): search_url = "https://api.bing.microsoft.com/v7.0/search" headers = {"Ocp-Apim-Subscription-Key": bing_api_key} params = {"q": product_title, "count": 1} response = requests.get(search_url, headers=headers, params=params) if response.status_code == 200: search_results = response.json() if 'webPages' in search_results: return search_results['webPages']['value'][0]['snippet'] return "No relevant data found" # Function to generate OpenAI enriched data def generate_openai_enrichment(product_title): prompt = f"""You are tasked with optimizing a product catalog for wine. Based on the product title: {product_title}, return a structured description including: - Grape Varietals - Alcohol Content - Professional Ratings - Tasting Notes (Appearance, Nose, Palate) - Recommended Pairings.""" response = openai.Completion.create( model="gpt-3.5-turbo", prompt=prompt, max_tokens=300, temperature=0.5 ) return response['choices'][0]['text'] # Create a new DataFrame for storing results output_data = [] for index, row in df.iterrows(): product_title = row['Title'] # Get Bing search result search_result = search_bing(product_title) # Get OpenAI enriched details enriched_details = generate_openai_enrichment(product_title) # Append results output_data.append({ 'Handle': row['Handle'], 'Title': product_title, 'Bing Search Result': search_result, 'Enriched Details': enriched_details }) # Sleep to avoid hitting API rate limits time.sleep(30) # Save the enriched data to CSV output_df = pd.DataFrame(output_data) output_df.to_csv('enriched_product_catalog.csv', index=False, encoding='UTF-8')

Step 3: Processing and Organizing Data

After running the script, we used the enriched data to create accurate and SEO-optimized product titles and metadata. This not only improved search engine visibility but also provided valuable context for customers browsing the site.

TOP

II. Product Description Generation

Problem: Generic and Missing Product Descriptions

Having optimized the product titles and categories, we moved on to generating detailed product descriptions for each wine in the full catalog (only 180 products out of 786 had any meaningful product descriptions showcasing any value to potential consumers -especially those not familiar with top rated / high priced wines). We needed descriptions that were:

Structured and informative for customers seeking detailed information.
Conversational and persuasive for customers looking for a quick, engaging read.

Solution: DIY Step-by-Step Guide

Step 1: Defining the Prompts

We set in place a process I learnt from the AI Exchange's Prompting for AI Operations Certification Program which aims to get the BEST prompt for a task through testing and iterating with Evals. An evals process in the domain of AI consists of creating and running small evaluation tests to improve prompt performance, by clearly defining prompt alternatives, examples, and definining evaluation criteria to choose the prompt with optimal results. In this case, the evals process actually resulted in actually generating two different prompts which generated two versions of product descriptions:

Structured and Detailed: To offer clear, fact-based information.
Conversational and Persuasive: To entice the reader with a more friendly and engaging tone.

Step 2: Python Script to Generate Descriptions

We used OpenAI to generate two types of descriptions for each product and a short summary for metadata. The script pulled product titles from a CSV file and generated descriptions in bulk.

import pandas as pd import openai import time # Load your product catalog CSV df = pd.read_csv('wain-cr-product-catalog-titles.csv', encoding='UTF-8') # Set OpenAI API key openai.api_key = 'your-openai-api-key' # Define the prompts prompt_1 = """ You are tasked with enriching and optimizing a product catalog for wine. Based on the wine title '{product_title}', generate an enriched, structured description: - Grape Varietals - Alcohol Content - Professional Ratings - Tasting Notes (Appearance, Nose, Palate) - Recommended Pairings """ prompt_2 = """ You are an AI copywriter for a high-end e-commerce wine store. Based on the wine title '{product_title}', write a persuasive, conversational description that speaks directly to the customer, highlighting: - Unique characteristics of the wine - Region and winery history - Food pairings - Notable awards or ratings End with a subtle call to action encouraging purchase. """ prompt_summary = """ Summarize the following product description into a 145-character short description: '{conversational_description}' """ # Function to generate OpenAI response def generate_gpt_response(prompt): response = openai.Completion.create( model="gpt-3.5-turbo", prompt=prompt, max_tokens=1000, temperature=0.4 ) return response['choices'][0]['text'] # Initialize an empty list to store output data output_data = [] # Loop through each product for index, row in df.iterrows(): product_title = row['Title'] # Generate structured details prompt_1_filled = prompt_1.format(product_title=product_title) structured_details = generate_gpt_response(prompt_1_filled) # Generate conversational description prompt_2_filled = prompt_2.format(product_title=product_title) conversational_description = generate_gpt_response(prompt_2_filled) # Generate short description prompt_summary_filled = prompt_summary.format(conversational_description=conversational_description) short_description = generate_gpt_response(prompt_summary_filled) # Store outputs output_data.append({ 'Handle': row['Handle'], 'Title': product_title, 'Structured Details': structured_details, 'Conversational Description': conversational_description, 'Short Description': short_description }) # Add a delay to avoid rate limits time.sleep(30) # Convert the data to DataFrame and save as CSV output_df = pd.DataFrame(output_data) output_df.to_csv('enriched_wain_cr_descriptions.csv', index=False, encoding='UTF-8')

Step 3: Results

By the end of this phase, all 786 products had fully enriched, categorized, inter-linked, SEO-optimized product descriptions with short summaries for both customer to quickly read through and for meta content purposes (check it out). The descriptions were a blend of factual, structured information and persuasive content, providing value to customers of varying preferences.
Rich content and categorization on product pages now also allows Wain.cr to develop more accurate applications such as RAG chatbots, predictive and personalization models based on matching customer preferences, as well as richer BI reporting over visitation and purchase events happening across their ecommerce product catalog.
Optimizing an ecommerce catalog at scale using AI is hands-down a winning move! That said though, a Word of Caution: The evals process proved that the LLM model used [gpt-3.5-turbo] resulted in accurate results due to the nature of the product category (wines), but this might not be the case for all product categories. In some (if not most) cases, a Retreival Augmented Generation (RAG) process might be best advised, where you would firstly encode data you already have at hand about your product catalog, and then request the LLM to paraphrase, categorize, structure and/or enrich it for you. This will provide results grounded on veritable data related to your actual product catalog and eliminate any potential hallucinations on the results.

TOP

III. Conclusion: Scaling Ecommerce Content Enrichment with AI

In just two phases, we transformed Wain.cr's product catalog from 180 incomplete listings to 786 enriched and optimized products. This not only improved SEO rankings but also enhanced the customer experience, driving more informed purchasing decisions.

If you're looking to implement something similar for your business, feel free to follow the steps outlined in this article and use the Python code provided. Should you need assistance, don’t hesitate to reach out to me—I'd be happy to help you scale your content optimization efforts.

TOP

View full post

Ecommerce Product Catalog Optimization at Scale with AI: Python, OpenAI, and Azure's Bing Search API

IMPORTANT: Not a single platform mentioned in this article is sponsoring or paying me for referrals. These are tools I have used and suggest in a professional manner for your own business benefit.

TABLE OF CONTENTS

I. Product Title Optimization and Categorization

Problem: Missing Product Descriptions and Metadata

Solution: DIY Step-by-Step Guide

II. Product Description Generation

Problem: Generic and Missing Product Descriptions

Solution: DIY Step-by-Step Guide

III. Conclusion: Scaling Ecommerce Content Enrichment with AI