Static Jekyll sites can leverage API-driven content to combine the performance of static generation with the dynamism of real-time data. By using Ruby for sophisticated API integration and Cloudflare Workers for edge API handling, you can build hybrid sites that fetch, process, and cache external data while maintaining Jekyll's simplicity. This guide explores advanced patterns for integrating APIs into Jekyll sites, including data fetching strategies, cache management, and real-time updates through WebSocket connections.
API integration for Jekyll requires a layered architecture that separates data fetching, processing, and rendering while maintaining site performance and reliability. The system must handle API failures gracefully, transform heterogeneous responses into consistent structures, and cache results efficiently.
The architecture employs three main layers: the data source layer (external APIs), the processing layer (Ruby clients and Workers), and the presentation layer (Jekyll templates). Ruby handles complex data transformations and business logic, while Cloudflare Workers provide edge caching and API aggregation. Data flows through a pipeline that includes validation, transformation, caching, and finally integration into Jekyll's static output.
# API Integration Architecture:
# 1. Data Sources:
# - External REST APIs (GitHub, Twitter, CMS, etc.)
# - GraphQL endpoints
# - WebSocket streams for real-time data
# - Database connections (via serverless functions)
#
# 2. Processing Layer (Ruby):
# - API client abstractions with retry logic
# - Data transformation and normalization
# - Cache management and invalidation
# - Error handling and fallback strategies
#
# 3. Edge Layer (Cloudflare Workers):
# - API proxy with edge caching
# - Request aggregation and batching
# - Authentication and rate limiting
# - WebSocket connections for real-time updates
#
# 4. Jekyll Integration:
# - Data file generation during build
# - Liquid filters for API data access
# - Incremental builds for API data updates
# - Preview generation with live data
# Data Flow:
# External API → Cloudflare Worker (edge cache) → Ruby processor →
# Jekyll data files → Static site generation → Edge delivery
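For orientation, here is a hypothetical `_config.yml` excerpt showing the kind of settings the build-time generator described later in this guide consumes. The endpoint names, paths, and URLs are illustrative, but the keys match what the plugin reads:
# _config.yml (illustrative sketch)
api_integration:
  enabled: true
  generate_pages: false
  endpoints:
    github_repo:
      enabled: true
      type: github
      path: /repos/jekyll/jekyll
      ttl: 3600
      fallback: _data/fallbacks/github_repo.json
    status:
      enabled: true
      type: custom
      base_url: https://status.example.com
      path: /api/v1/summary
      cache: true
      ttl: 300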
Ruby API clients provide robust external API integration with features like retry logic, rate limiting, and data transformation. These clients abstract API complexities and provide clean interfaces for Jekyll integration.
# lib/api_integration/clients/base.rb
require 'faraday'
require 'faraday/retry' # faraday-retry gem, needed for the :retry middleware on Faraday 2.x
require 'json'
module ApiIntegration
class Client
include Retryable # provides #with_retries (defined elsewhere in lib/api_integration)
include Cacheable # provides Cache and #generate_cache_key (defined elsewhere)
def initialize(config = {})
@config = default_config.merge(config)
@connection = build_connection
@cache = Cache.new(namespace: self.class.name.downcase)
end
# Options are keywords so call sites can write fetch('/path', cache: true)
def fetch(endpoint, params = {}, **options)
cache_key = generate_cache_key(endpoint, params)
# Try cache first
if options[:cache] != false
cached = @cache.get(cache_key)
return cached if cached
end
# Fetch from API with retry logic
response = with_retries do
@connection.get(endpoint, params)
end
# Process response
data = process_response(response)
# Cache if requested
if options[:cache] != false
ttl = options[:ttl] || @config[:default_ttl]
@cache.set(cache_key, data, ttl: ttl)
end
data
rescue => e
handle_error(e, endpoint, params, options)
end
protected
def default_config
{
base_url: nil,
headers: {},
default_ttl: 300,
retry_count: 3,
retry_delay: 1,
timeout: 10
}
end
def build_connection
Faraday.new(url: @config[:base_url], headers: @config[:headers]) do |conn|
conn.request :retry, max: @config[:retry_count],
interval: @config[:retry_delay]
conn.options.timeout = @config[:timeout] # Faraday has no :timeout request middleware
conn.request :authorization, auth_type, auth_token if auth_token
conn.response :json, content_type: /\bjson$/
conn.response :raise_error
conn.adapter Faraday.default_adapter
end
end
def process_response(response)
# Override in subclasses for API-specific processing
response.body
end
def handle_error(error, endpoint, params, options)
# Log and degrade gracefully; callers receive the fallback (nil by default)
warn "[api] #{self.class.name} #{endpoint} failed: #{error.message}"
options[:fallback]
end
def auth_token
nil # subclasses override when the API requires authentication
end
def auth_type
:bearer
end
end
# GitHub API client
class GitHubClient < Client
def initialize(token = nil)
@token = token || ENV['GITHUB_TOKEN'] # set before super so build_connection sees it
super(
base_url: 'https://api.github.com',
default_ttl: 600
)
end
def repository(repo_name)
fetch("/repos/#{repo_name}", cache: true, ttl: 3600)
end
def recent_commits(repo_name, limit = 10)
fetch("/repos/#{repo_name}/commits",
{ per_page: limit, page: 1 },
cache: true, ttl: 300)
end
def issues(repo_name, state = 'open', labels = nil)
params = { state: state }
params[:labels] = labels if labels
fetch("/repos/#{repo_name}/issues", params, cache: true, ttl: 180)
end
protected
def auth_token
@token
end
def auth_type
:token
end
end
# Twitter API client (using v2)
class TwitterClient < Client
def initialize(bearer_token = nil)
@bearer_token = bearer_token || ENV['TWITTER_BEARER_TOKEN'] # before super, as above
super(base_url: 'https://api.twitter.com/2')
end
def user_tweets(username, max_results = 10)
# Get user ID first
user = user_by_username(username)
return [] unless user
fetch("/users/#{user['id']}/tweets", {
max_results: max_results,
'tweet.fields': 'created_at,public_metrics',
expansions: 'author_id'
}, cache: true, ttl: 300)
end
def user_by_username(username)
# GET /2/users/by/username/:username returns { "data" => { "id" => ..., ... } }
response = fetch("/users/by/username/#{username}", {}, cache: true, ttl: 3600)
response && response['data']
end
protected
def auth_token
@bearer_token
end
def auth_type
:bearer
end
end
# Data processor for API responses
class DataProcessor
def initialize(transformations = {})
@transformations = transformations
end
def process(data, type = nil)
return data unless type && @transformations[type]
transformation = @transformations[type]
apply_transformation(data, transformation)
end
private
def apply_transformation(data, transformation)
case transformation
when Proc
transformation.call(data)
when Symbol
send(transformation, data)
when Hash
transform_with_mapping(data, transformation)
else
data
end
end
def transform_with_mapping(data, mapping)
result = {}
mapping.each do |source_key, target_config|
next unless data.key?(source_key)
if target_config.is_a?(Hash)
# Nested transformation
result[target_config[:as]] = apply_transformation(data[source_key], target_config[:transform])
else
# Direct mapping
result[target_config] = data[source_key]
end
end
result
end
end
end
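As a build-time usage sketch, the clients can be driven from a Rake task before `jekyll build`; the repository name and output paths here are placeholders.
# Rakefile (usage sketch)
require 'json'
require_relative 'lib/api_integration/clients/base'

desc 'Fetch external API data into _data before building'
task :fetch_api_data do
  client = ApiIntegration::GitHubClient.new # reads GITHUB_TOKEN when no token is given
  repo = client.repository('jekyll/jekyll')
  commits = client.recent_commits('jekyll/jekyll', 5)

  File.write('_data/api_github_repo.json', JSON.pretty_generate(repo))
  File.write('_data/api_github_commits.json', JSON.pretty_generate(commits))
end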
Cloudflare Workers act as an API proxy that provides edge caching, request aggregation, and security features for external API calls from Jekyll sites.
// workers/api-proxy.js
// API proxy with edge caching and request aggregation
export default {
async fetch(request, env, ctx) {
const url = new URL(request.url)
const apiEndpoint = extractApiEndpoint(url)
// Check for cached response
const cacheKey = generateCacheKey(request)
const cached = await getCachedResponse(cacheKey, env)
if (cached) {
return new Response(cached.body, {
headers: cached.headers,
status: cached.status
})
}
// Forward to actual API
const apiRequest = buildApiRequest(request, apiEndpoint)
const response = await fetch(apiRequest)
// Cache successful responses
if (response.ok) {
// Cache in the background without delaying the response
ctx.waitUntil(cacheResponse(cacheKey, response.clone(), env, ctx))
}
return response
}
}
async function getCachedResponse(cacheKey, env) {
// Check KV cache
const cached = await env.API_CACHE_KV.get(cacheKey, { type: 'json' })
if (cached && !isCacheExpired(cached)) {
return {
body: cached.body,
headers: new Headers(cached.headers),
status: cached.status
}
}
return null
}
async function cacheResponse(cacheKey, response, env, ctx) {
const responseClone = response.clone()
const body = await responseClone.text()
const headers = Object.fromEntries(responseClone.headers.entries())
const status = responseClone.status
const cacheData = {
body: body,
headers: headers,
status: status,
cachedAt: Date.now(),
ttl: calculateTTL(responseClone)
}
// Store in KV with expiration
await env.API_CACHE_KV.put(cacheKey, JSON.stringify(cacheData), {
expirationTtl: cacheData.ttl
})
}
function extractApiEndpoint(url) {
// Extract actual API endpoint from proxy URL
const path = url.pathname.replace('/api/proxy/', '')
return `${url.protocol}//${path}${url.search}`
}
function generateCacheKey(request) {
const url = new URL(request.url)
// Include method, path, query params, and auth headers in cache key
const components = [
request.method,
url.pathname,
url.search,
request.headers.get('authorization') || 'no-auth'
]
return hashComponents(components)
}
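The proxy references a few helpers that are not defined above. One plausible minimal implementation follows; the synchronous FNV-1a hash is a deliberate assumption so that generateCacheKey can stay synchronous, and the 300-second TTL default mirrors the Ruby client.
// Helper sketch for the functions referenced above (assumed implementations)
function buildApiRequest(request, apiEndpoint) {
  // Re-target the incoming request at the real API, keeping method/headers/body
  return new Request(apiEndpoint, request)
}

function isCacheExpired(cached) {
  // Defensive check; KV's expirationTtl already evicts stale entries
  return Date.now() - cached.cachedAt > cached.ttl * 1000
}

function calculateTTL(response) {
  // Honor Cache-Control: max-age when present, otherwise default to 5 minutes
  const match = (response.headers.get('cache-control') || '').match(/max-age=(\d+)/)
  return match ? parseInt(match[1], 10) : 300
}

function hashComponents(components) {
  return hashString(components.join('|'))
}

// Also used by ApiAggregator below; FNV-1a keeps key generation synchronous
function hashString(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16)
}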
// API aggregator for multiple endpoints
export class ApiAggregator {
constructor(state, env) {
this.state = state
this.env = env
}
async fetch(request) {
const url = new URL(request.url)
if (url.pathname === '/api/aggregate') {
return this.handleAggregateRequest(request)
}
return new Response('Not found', { status: 404 })
}
async handleAggregateRequest(request) {
const { endpoints } = await request.json()
// Execute all API calls in parallel
const promises = endpoints.map(endpoint =>
this.fetchEndpoint(endpoint)
)
const results = await Promise.allSettled(promises)
// Process results
const data = {}
const errors = {}
results.forEach((result, index) => {
const endpoint = endpoints[index]
if (result.status === 'fulfilled') {
data[endpoint.name || `endpoint_${index}`] = result.value
} else {
errors[endpoint.name || `endpoint_${index}`] = result.reason.message
}
})
return new Response(JSON.stringify({
data: data,
errors: Object.keys(errors).length > 0 ? errors : undefined,
timestamp: new Date().toISOString()
}), {
headers: { 'Content-Type': 'application/json' }
})
}
async fetchEndpoint(endpoint) {
const cacheKey = `aggregate_${hashString(JSON.stringify(endpoint))}`
// Check cache first
const cached = await this.env.API_CACHE_KV.get(cacheKey, { type: 'json' })
if (cached) {
return cached
}
// Fetch from API
const response = await fetch(endpoint.url, {
method: endpoint.method || 'GET',
headers: endpoint.headers || {}
})
if (!response.ok) {
throw new Error(`API request failed: ${response.status}`)
}
const data = await response.json()
// Cache response
await this.env.API_CACHE_KV.put(cacheKey, JSON.stringify(data), {
expirationTtl: endpoint.ttl || 300
})
return data
}
}
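A client (build script or browser) then aggregates several upstream calls with a single POST; the worker hostname and endpoint names below are placeholders.
// Example aggregate request (usage sketch)
const response = await fetch('https://your-worker.workers.dev/api/aggregate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    endpoints: [
      { name: 'repo', url: 'https://api.github.com/repos/jekyll/jekyll', ttl: 600 },
      { name: 'issues', url: 'https://api.github.com/repos/jekyll/jekyll/issues' }
    ]
  })
})
const { data, errors } = await response.json()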
Jekyll integrates external API data through generators that fetch data during build time and plugins that provide Liquid filters for API data access.
# _plugins/api_data_generator.rb
require 'fileutils'
require 'json'
module Jekyll
class ApiDataGenerator < Generator
priority :high
def generate(site)
@site = site
@config = site.config['api_integration'] || {}
return unless @config['enabled']
# Fetch and process API data
fetch_api_data
# Generate data files
generate_data_files
# Create API-driven pages
generate_api_pages if @config['generate_pages']
end
private
def fetch_api_data
@api_data = {}
(@config['endpoints'] || {}).each do |endpoint_name, endpoint_config|
next unless endpoint_config['enabled']
begin
data = fetch_endpoint(endpoint_config)
@api_data[endpoint_name] = process_api_data(data, endpoint_config)
rescue => e
Jekyll.logger.error "API Error (#{endpoint_name}): #{e.message}"
# Use fallback data if configured
if endpoint_config['fallback']
@api_data[endpoint_name] = load_fallback_data(endpoint_config['fallback'])
end
end
end
end
def fetch_endpoint(config)
# Use appropriate client based on configuration
client = build_client(config)
client.fetch(
config['path'],
config['params'] || {},
cache: config.fetch('cache', true), # `config['cache'] || true` could never disable caching
ttl: config['ttl'] || 300
)
end
def build_client(config)
case config['type']
when 'github'
ApiIntegration::GitHubClient.new(config['token'])
when 'twitter'
ApiIntegration::TwitterClient.new(config['bearer_token'])
when 'custom'
ApiIntegration::Client.new(
base_url: config['base_url'],
headers: config['headers'] || {}
)
else
raise "Unknown API type: #{config['type']}"
end
end
def process_api_data(data, config)
processor = ApiIntegration::DataProcessor.new(config['transformations'] || {})
processor.process(data, config['processor'])
end
def generate_data_files
@api_data.each do |name, data|
data_file_path = File.join(@site.source, '_data', "api_#{name}.json")
FileUtils.mkdir_p(File.dirname(data_file_path))
File.write(data_file_path, JSON.pretty_generate(data))
Jekyll.logger.debug "Generated API data file: #{data_file_path}"
end
end
def generate_api_pages
@api_data.each do |name, data|
next unless data.is_a?(Array)
data.each_with_index do |item, index|
create_api_page(name, item, index)
end
end
end
def create_api_page(collection_name, data, index)
page = ApiPage.new(@site, @site.source, collection_name, data, index)
@site.pages << page
end
def load_fallback_data(path)
# Fallback files live in the site source, e.g. _data/fallbacks/github.json
JSON.parse(File.read(File.join(@site.source, path)))
rescue Errno::ENOENT, JSON::ParserError => e
Jekyll.logger.warn "API Fallback:", "could not load #{path}: #{e.message}"
nil
end
end
# Custom page class for API-generated content
class ApiPage < Page
def initialize(site, base, collection, data, index)
@site = site
@base = base
# Generate slug from data
slug = data['slug'] || data['title']&.downcase&.gsub(/[^\w]+/, '-') || "item-#{index}"
@dir = "#{collection}/#{slug}"
@name = 'index.html'
self.process(@name)
self.data = {
'layout' => 'api_item',
'title' => data['title'] || "Item #{index + 1}",
'api_data' => data,
'collection' => collection
}
# Generate content from template
self.content = generate_content(data)
end
def generate_content(data)
# The api_item layout (when present) renders page.api_data itself,
# so the page body only needs a minimal HTML fallback
<<~HTML
<h1>#{data['title']}</h1>
#{data['content'] || data['body'] || ''}
HTML
end
end
# Liquid filters for API data access
module ApiFilters
def api_data(name, key = nil)
data = @context.registers[:site].data["api_#{name}"]
if key
data[key] if data.is_a?(Hash)
else
data
end
end
def api_item(collection, identifier)
data = @context.registers[:site].data["api_#{collection}"]
return nil unless data.is_a?(Array)
if identifier.is_a?(Integer)
data[identifier]
else
data.find { |item| item['id'] == identifier || item['slug'] == identifier }
end
end
def api_first(collection)
data = @context.registers[:site].data["api_#{collection}"]
data.is_a?(Array) ? data.first : nil
end
def api_last(collection)
data = @context.registers[:site].data["api_#{collection}"]
data.is_a?(Array) ? data.last : nil
end
end
end
Liquid::Template.register_filter(Jekyll::ApiFilters)
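In templates, the filters read the generated `_data/api_*.json` files. Assuming endpoints named `github_repo` and `github_commits` were configured, usage looks like this:
<!-- _includes/repo-stats.html (usage sketch) -->
{% assign repo = 'github_repo' | api_data %}
<p>{{ repo.full_name }} has {{ repo.stargazers_count }} stars.</p>

{% assign latest = 'github_commits' | api_first %}
<p>Latest commit: {{ latest.commit.message }}</p>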
Real-time updates keep API data fresh between builds using WebSocket connections and incremental data updates through Cloudflare Workers.
# lib/api_integration/realtime.rb
require 'json'
require 'net/http'
require 'time'
require 'uri'
require 'websocket-client-simple'
module ApiIntegration
class RealtimeUpdater
def initialize(config)
@config = config
@connections = {}
@subscriptions = {}
@data_cache = {}
end
def start
# Start WebSocket connections for each real-time endpoint
(@config['realtime_endpoints'] || []).each do |endpoint|
start_websocket_connection(endpoint)
end
# Start periodic data refresh
start_refresh_timer
end
def subscribe(channel, &callback)
@subscriptions[channel] ||= []
@subscriptions[channel] << callback
end
def update_data(channel, data)
@data_cache[channel] = data
# Notify subscribers
notify_subscribers(channel, data)
# Persist to storage
persist_data(channel, data)
end
private
def start_websocket_connection(endpoint)
updater = self # the ws.on handler blocks below do not run with the updater as self
Thread.new do
begin
ws = WebSocket::Client::Simple.connect(endpoint['url'])
ws.on :message do |msg|
data = JSON.parse(msg.data)
updater.send(:process_websocket_message, endpoint['channel'], data)
end
ws.on :open do
# Send subscription message if required
ws.send(JSON.generate(endpoint['subscribe'])) if endpoint['subscribe']
end
ws.on :close do |_e|
# Reconnect after delay
sleep endpoint['reconnect_delay'] || 5
updater.send(:start_websocket_connection, endpoint)
end
@connections[endpoint['channel']] = ws
rescue => e
log("WebSocket error for #{endpoint['channel']}: #{e.message}")
sleep 10
retry
end
end
end
def process_websocket_message(channel, data)
# Transform data based on endpoint configuration
transformed = transform_realtime_data(data, channel)
# Update cache and notify
update_data(channel, transformed)
end
def start_refresh_timer
Thread.new do
loop do
sleep 60 # Refresh every minute
(@config['refresh_endpoints'] || []).each do |endpoint|
refresh_endpoint(endpoint)
end
end
end
end
def refresh_endpoint(endpoint)
client = build_client(endpoint)
begin
data = client.fetch(endpoint['path'], endpoint['params'] || {})
update_data(endpoint['channel'], data)
rescue => e
log("Refresh error for #{endpoint['channel']}: #{e.message}")
end
end
def notify_subscribers(channel, data)
return unless @subscriptions[channel]
@subscriptions[channel].each do |callback|
begin
callback.call(data)
rescue => e
log("Subscriber error: #{e.message}")
end
end
end
def transform_realtime_data(data, channel)
# Hook for per-channel reshaping; passthrough by default
transformation = @config.dig('transformations', channel)
transformation.respond_to?(:call) ? transformation.call(data) : data
end
def build_client(endpoint)
Client.new(base_url: endpoint['base_url'])
end
def log(message)
warn "[realtime] #{message}"
end
def persist_data(channel, data)
# Save to Cloudflare KV via Worker
uri = URI.parse("https://your-worker.workers.dev/api/data/#{channel}")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Put.new(uri.path)
request['Authorization'] = "Bearer #{@config['worker_token']}"
request['Content-Type'] = 'application/json'
request.body = data.to_json
http.request(request)
end
end
# Jekyll integration for real-time data
class RealtimeDataGenerator < Jekyll::Generator
def generate(site)
return unless site.config['realtime_updates']
# Initialize real-time updater
@updater = RealtimeUpdater.new(site.config['realtime_updates'])
# Subscribe to data updates
site.config['realtime_updates']['channels'].each do |channel|
@updater.subscribe(channel) do |data|
update_site_data(site, channel, data)
end
end
# Start the updater
@updater.start
end
private
def update_site_data(site, channel, data)
# Update site data
site.data["realtime_#{channel}"] = data
# Trigger incremental rebuild if enabled
trigger_incremental_rebuild(site, channel) if site.config['incremental_rebuild']
end
def trigger_incremental_rebuild(site, channel)
# Send webhook to trigger rebuild (site must be passed in; it is not otherwise in scope)
uri = URI.parse(site.config['realtime_updates']['rebuild_webhook'])
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true if uri.scheme == 'https'
request = Net::HTTP::Post.new(uri.path)
request['Content-Type'] = 'application/json'
request.body = { channel: channel, timestamp: Time.now.utc.iso8601 }.to_json
http.request(request)
end
end
end
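The updater and generator above expect a configuration block along these lines; the channel names, URLs, and webhook are placeholders:
# _config.yml (illustrative sketch)
realtime_updates:
  channels:
    - commits
    - status
  worker_token: "<from environment or secret store>"
  rebuild_webhook: https://ci.example.com/hooks/jekyll-rebuild
  realtime_endpoints:
    - channel: commits
      url: wss://your-worker.workers.dev/realtime
      subscribe: { type: subscribe, channel: commits }
      reconnect_delay: 5
  refresh_endpoints:
    - channel: status
      base_url: https://status.example.com
      path: /api/v1/summary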
A companion Cloudflare Worker, implemented as a Durable Object with the WebSocket hibernation API, terminates client connections and broadcasts channel updates:
// workers/realtime.js
export class RealtimeWorker {
constructor(state, env) {
this.state = state
this.env = env
}
async fetch(request) {
const upgradeHeader = request.headers.get('Upgrade')
if (upgradeHeader === 'websocket') {
return this.handleWebSocket(request)
}
return new Response('Expected WebSocket upgrade', { status: 426 })
}
async handleWebSocket(request) {
const pair = new WebSocketPair()
const [client, server] = Object.values(pair)
this.state.acceptWebSocket(server)
return new Response(null, {
status: 101,
webSocket: client
})
}
async webSocketMessage(ws, message) {
// Handle incoming WebSocket messages
const data = JSON.parse(message)
switch (data.type) {
case 'subscribe':
this.handleSubscribe(ws, data.channel)
break
case 'unsubscribe':
this.handleUnsubscribe(ws, data.channel)
break
case 'update':
this.handleUpdate(ws, data)
break
}
}
async handleSubscribe(ws, channel) {
// Track subscriptions in the socket's serialized attachment so they survive
// hibernation; the runtime has no built-in per-socket subscription registry
const attachment = ws.deserializeAttachment() || { channels: [] }
if (!attachment.channels.includes(channel)) {
attachment.channels.push(channel)
ws.serializeAttachment(attachment)
}
// Send current data for channel (KV reads are async)
const currentData = await this.env.REALTIME_KV.get(channel)
if (currentData) {
ws.send(JSON.stringify({
type: 'data',
channel: channel,
data: JSON.parse(currentData)
}))
}
}
handleUnsubscribe(ws, channel) {
const attachment = ws.deserializeAttachment() || { channels: [] }
attachment.channels = attachment.channels.filter(c => c !== channel)
ws.serializeAttachment(attachment)
}
async handleUpdate(ws, data) {
// Update data in KV
await this.env.REALTIME_KV.put(data.channel, JSON.stringify(data.data))
// Broadcast to every connected socket subscribed to this channel
for (const subscriber of this.state.getWebSockets()) {
const attachment = subscriber.deserializeAttachment() || { channels: [] }
if (attachment.channels.includes(data.channel)) {
subscriber.send(JSON.stringify({
type: 'update',
channel: data.channel,
data: data.data,
timestamp: new Date().toISOString()
}))
}
}
}
}
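On the page itself, a few lines of client-side JavaScript subscribe to a channel; the worker URL, channel name, and element ID below are placeholders.
// Browser-side subscription (usage sketch)
const ws = new WebSocket('wss://your-worker.workers.dev/realtime')

ws.addEventListener('open', () => {
  ws.send(JSON.stringify({ type: 'subscribe', channel: 'commits' }))
})

ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data)
  if (msg.type === 'data' || msg.type === 'update') {
    document.querySelector('#latest-commits').textContent =
      JSON.stringify(msg.data, null, 2)
  }
})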
API security protects against abuse and unauthorized access while rate limiting ensures fair usage and prevents service degradation.
# lib/api_integration/security.rb
require 'json'
require 'time'
module ApiIntegration
class AuthenticationError < StandardError; end
class RateLimitError < StandardError; end
class SecurityManager
def initialize(config)
@config = config
@rate_limiters = {}
@api_keys = load_api_keys
end
def authenticate(request)
api_key = extract_api_key(request)
unless api_key && valid_api_key?(api_key)
raise AuthenticationError, 'Invalid API key'
end
# Check rate limits (rate_limit raises RateLimitError when exceeded)
rate_limit(api_key, request.path)
true
end
def rate_limit(key, endpoint, cost = 1)
limiter = rate_limiter_for(key)
limiter.record_request(endpoint, cost)
unless limiter.within_limits?(endpoint)
raise RateLimitError, "Rate limit exceeded for #{endpoint}"
end
end
private
def extract_api_key(request)
request.headers['X-API-Key'] ||
request.params['api_key'] ||
request.env['HTTP_AUTHORIZATION']&.gsub(/^Bearer /, '')
end
def valid_api_key?(api_key)
@api_keys.key?(api_key) && !api_key_expired?(api_key)
end
def api_key_expired?(api_key)
expires_at = @api_keys[api_key]['expires_at']
expires_at && Time.parse(expires_at) < Time.now
end
def rate_limiter_for(key)
@rate_limiters[key] ||= RateLimiter.new(@config['rate_limits'])
end
def load_api_keys
# Load from environment or external source
keys_json = ENV['API_KEYS'] || '{}'
JSON.parse(keys_json)
end
end
class RateLimiter
def initialize(config)
@config = config
@requests = Hash.new { |h, k| h[k] = [] }
end
def record_request(endpoint, cost = 1)
now = Time.now
# Clean old requests outside window
cleanup_old_requests(now)
# Record new request
@requests[endpoint] << {
time: now,
cost: cost
}
end
def within_limits?(endpoint)
config = @config[endpoint] || @config['default']
return true unless config
window = config['window'] || 3600
limit = config['limit'] || 100
# Calculate total cost in current window
window_start = Time.now - window
total_cost = @requests[endpoint].select do |req|
req[:time] >= window_start
end.sum { |req| req[:cost] }
total_cost <= limit
end
private
def cleanup_old_requests(current_time)
@requests.each_key do |endpoint|
config = @config[endpoint] || @config['default']
window = config ? (config['window'] || 3600) : 3600
window_start = current_time - window
@requests[endpoint].reject! { |req| req[:time] < window_start }
end
end
end
end
At the edge, a companion Worker applies the same API-key validation and rate limiting before requests ever reach the origin:
// workers/api-security.js
export class ApiSecurityWorker {
constructor(state, env) {
this.state = state
this.env = env
this.rateLimiter = new RateLimiter(env)
}
async fetch(request) {
// Extract API key
const apiKey = this.extractApiKey(request)
if (!apiKey) {
return new Response('API key required', { status: 401 })
}
// Validate API key
const isValid = await this.validateApiKey(apiKey)
if (!isValid) {
return new Response('Invalid API key', { status: 401 })
}
// Check rate limits
const endpoint = new URL(request.url).pathname
const isRateLimited = await this.rateLimiter.check(apiKey, endpoint)
if (isRateLimited) {
return new Response('Rate limit exceeded', {
status: 429,
headers: {
'Retry-After': '60'
}
})
}
// Forward request to origin
return fetch(request)
}
extractApiKey(request) {
return request.headers.get('X-API-Key') ||
new URL(request.url).searchParams.get('api_key')
}
async validateApiKey(apiKey) {
// Check against stored API keys
const storedKey = await this.env.API_KEYS_KV.get(apiKey)
return storedKey !== null
}
}
class RateLimiter {
constructor(env) {
this.env = env
}
async check(apiKey, endpoint) {
const key = `rate_limit:${apiKey}:${endpoint}`
// Get current count
let count = await this.env.RATE_LIMIT_KV.get(key)
count = count ? parseInt(count) : 0
// Check if over limit (100 requests per hour)
if (count >= 100) {
return true
}
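// Note: the get-then-put below is not atomic, so concurrent bursts can
// slightly exceed the limit; use a Durable Object counter for strict limits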
// Increment count
await this.env.RATE_LIMIT_KV.put(key, (count + 1).toString(), {
expirationTtl: 3600 // 1 hour
})
return false
}
}
This API-driven architecture transforms Jekyll sites into dynamic platforms that can integrate with any external API while maintaining the performance benefits of static site generation. The combination of Ruby for data processing and Cloudflare Workers for edge API handling creates a powerful, scalable solution for modern web development.