Import & Scraper Integration for the Laravel Event Portal
📌 Overview
The app supports several integration options for importing events:
- Commands - manual, one-off imports via the Artisan CLI
- Queue Jobs - asynchronous, queue-based imports
- Scheduler - scheduled, recurring imports (e.g. daily)
- Webhooks/Events - real-time updates from external sources
🔧 Setup Steps
1. Install dependencies
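# For HTTP requests (external APIs): Laravel's HTTP client ships with the framework
# and only needs Guzzle, which new Laravel apps already include
composer require guzzlehttp/guzzle
# For web scraping (optional); css-selector is required for Crawler::filter()
composer require symfony/dom-crawler symfony/css-selector symfony/http-client
# For extended logging/monitoring (optional)
composer require sentry/sentry-laravel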
2. Queue configuration
Edit .env:
QUEUE_CONNECTION=database # or redis, beanstalkd, etc.
Create the queue table:
php artisan queue:table
php artisan migrate
3. Create sources
Add Source records via a seeder or the admin interface:
// database/seeders/SourceSeeder.php
namespace Database\Seeders;
use App\Models\Source;
use Illuminate\Database\Seeder;
class SourceSeeder extends Seeder
{
    public function run(): void
    {
        Source::create([
            'name' => 'Stadt Dresden',
            'description' => 'Offizielle Veranstaltungen der Landeshauptstadt Dresden',
            'url' => 'https://stadt-dresden.de/veranstaltungen',
            'status' => 'active',
        ]);
        Source::create([
            'name' => 'Kulturzentrum Hellerau',
            'description' => 'Veranstaltungen des Kulturzentrums Hellerau',
            'url' => 'https://hellerau.org',
            'status' => 'active',
        ]);
    }
}
Run the seeder:
php artisan db:seed --class=SourceSeeder
👨‍💻 Usage
Option 1: Manual import via command
# Import all active sources (asynchronously)
php artisan events:import
# Only one specific source (by ID)
php artisan events:import --source=1
# Or by name
php artisan events:import --source="Stadt Dresden"
# Run synchronously (blocking)
php artisan events:import --sync
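The `--source` option accepts either an ID or a name. How the command tells them apart is not shown in this document; the following is only a sketch of one plausible approach in plain PHP, with sources as arrays instead of Eloquent models. The `resolveSource()` helper is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pick a source by numeric ID or by exact name.
 * Returns null when nothing matches. Sources are plain arrays here.
 */
function resolveSource(array $sources, string $needle): ?array
{
    foreach ($sources as $source) {
        // Purely numeric input is treated as an ID, anything else as a name.
        $matches = ctype_digit($needle)
            ? $source['id'] === (int) $needle
            : $source['name'] === $needle;

        if ($matches) {
            return $source;
        }
    }

    return null;
}

$sources = [
    ['id' => 1, 'name' => 'Stadt Dresden'],
    ['id' => 2, 'name' => 'Kulturzentrum Hellerau'],
];

$byId = resolveSource($sources, '1');
$byName = resolveSource($sources, 'Kulturzentrum Hellerau');
$missing = resolveSource($sources, '99');
```

One consequence of this approach: a source whose name consists only of digits could never be selected by name, so a real command may prefer separate options for ID and name.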
Option 2: Programmatically in code
// In a controller, service, or command:
use App\Jobs\ImportEventsJob;
use App\Models\Source;
use App\Services\EventImportService;
// Via the service (false = queue the imports instead of running them synchronously)
$importService = app(EventImportService::class);
$importService->importFromAllSources(false);
// Or dispatch the job directly
$source = Source::findOrFail(1); // fails loudly if the source does not exist
ImportEventsJob::dispatch($source); // asynchronous
ImportEventsJob::dispatchSync($source); // synchronous
Option 3: Run a queue worker
Queued jobs are only processed while a worker is running:
# Development: a single worker with verbose output
php artisan queue:work --verbose
# Production: queue:work is already a long-running process, so the deprecated --daemon flag is not needed
php artisan queue:work --tries=3 --timeout=120
# A worker does not restart itself; use Supervisor to keep workers running (production)
# See: https://laravel.com/docs/queues#supervisor-configuration
⏰ Scheduler Integration
Daily import via the scheduler
Edit app/Console/Kernel.php:
<?php
namespace App\Console;
use App\Models\Event;
use App\Models\EventOccurrence;
use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;
use Illuminate\Support\Facades\Log;
class Kernel extends ConsoleKernel
{
    /**
     * Register the commands for the application.
     */
    protected function commands()
    {
        $this->load(__DIR__.'/Commands');
        require base_path('routes/console.php');
    }
    /**
     * Define the application's command schedule.
     */
    protected function schedule(Schedule $schedule)
    {
        // ===== EVENT IMPORTS =====
        // Daily import at 03:00
        $schedule->command('events:import')
            ->dailyAt('03:00')
            ->name('events.daily_import')
            ->onFailure(function () {
                Log::error('Daily event import failed');
            })
            ->onSuccess(function () {
                Log::info('Daily event import completed');
            });
        // Additionally: hourly imports (e.g. for frequently updated sources)
        $schedule->command('events:import --source="Stadt Dresden"')
            ->hourly()
            ->name('events.hourly_import_dresden');
        // ===== CLEANUP & MAINTENANCE =====
        // Mark occurrences that have already ended as completed (daily)
        $schedule->call(function () {
            EventOccurrence::where('status', 'scheduled')
                ->where('end_datetime', '<', now())
                ->update(['status' => 'completed']);
        })
            ->dailyAt('04:00')
            ->name('events.mark_completed');
        // Archive orphaned events that have no occurrences (weekly)
        $schedule->call(function () {
            Event::doesntHave('occurrences')
                ->where('status', 'published')
                ->where('created_at', '<', now()->subMonths(1))
                ->update(['status' => 'archived']);
        })
            ->weekly()
            ->name('events.cleanup_orphaned');
        // Optional: smoke-test the scheduler configuration locally
        if (app()->environment('local')) {
            $schedule->command('inspire')->hourly();
        }
    }
    /**
     * Get the timezone that should be used by default for scheduled events.
     */
    protected function scheduleTimezone(): string
    {
        return 'Europe/Berlin';
    }
}
Setting up the scheduler in production
In production you need a cron job that invokes the scheduler every minute:
# Edit the crontab
crontab -e
# Add the following line:
* * * * * cd /path/to/app && php artisan schedule:run >> /dev/null 2>&1
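Or with a systemd timer (a modern alternative to cron). The service unit runs the scheduler once; a companion timer unit triggers it every minute:
# /etc/systemd/system/laravel-scheduler.service
[Unit]
Description=Laravel Artisan Scheduler
Requires=laravel-scheduler.timer
[Service]
Type=oneshot
User=www-data
ExecStart=/usr/bin/php /path/to/app/artisan schedule:run
# /etc/systemd/system/laravel-scheduler.timer
[Unit]
Description=Run the Laravel scheduler every minute
[Timer]
OnCalendar=minutely
[Install]
WantedBy=timers.target
# Enable and start the timer
sudo systemctl enable --now laravel-scheduler.timer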
🔌 API Integration: Examples for External Sources
Stadt Dresden API
// In ImportEventsJob::fetchExternalEvents()
use Illuminate\Support\Facades\Http;
$response = Http::withHeaders([
    'Accept' => 'application/json',
    'User-Agent' => 'Dresden-EventPortal/1.0',
])->get('https://api.stadt-dresden.de/v1/events', [
    'limit' => 1000,
    'filter[status]' => 'published',
]);
// Throw on 4xx/5xx responses so the job fails and can be retried
$response->throw();
$events = $response->json('data');
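iCal feed (e.g. from Google Calendar)
// Needs an iCalendar parser; this example uses sabre/vobject (composer require sabre/vobject)
use Sabre\VObject\Reader;
$feed = file_get_contents('https://calendar.google.com/calendar/ical/.../public/basic.ics');
$calendar = Reader::read($feed);
$events = [];
// select() returns an empty array when the feed contains no VEVENT components
foreach ($calendar->select('VEVENT') as $entry) {
    $events[] = [
        'external_id' => (string) $entry->UID,
        'title' => (string) $entry->SUMMARY,
        'location' => isset($entry->LOCATION) ? (string) $entry->LOCATION : 'TBD',
        'description' => isset($entry->DESCRIPTION) ? (string) $entry->DESCRIPTION : null,
        'occurrences' => [
            [
                'start_datetime' => $entry->DTSTART->getDateTime(),
                'end_datetime' => isset($entry->DTEND) ? $entry->DTEND->getDateTime() : null,
            ]
        ]
    ];
}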
Web scraping with DomCrawler
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpClient\HttpClient;
$client = HttpClient::create();
$response = $client->request('GET', 'https://example.com/events');
$html = $response->getContent();
$crawler = new Crawler($html);
$events = [];
// Note: text() and attr() throw if a selector matches nothing inside a card
$crawler->filter('.event-card')->each(function (Crawler $event) use (&$events) {
    $events[] = [
        'external_id' => $event->filter('[data-event-id]')->attr('data-event-id'),
        'title' => $event->filter('.event-title')->text(),
        'description' => $event->filter('.event-desc')->text(),
        'location' => $event->filter('.event-location')->text(),
        'occurrences' => [
            [
                'start_datetime' => $event->filter('[data-date]')->attr('data-date'),
            ]
        ]
    ];
});
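Scraped values tend to carry stray whitespace, empty fields, and date strings in varying formats, so it can help to normalize each record before it reaches the upsert step. The following is a minimal sketch in plain PHP. The `normalizeScrapedEvent()` function and the `Europe/Berlin` default are assumptions, not part of the app.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical normalizer for one scraped record.
 * Returns null when a required field is missing or the date cannot be parsed.
 */
function normalizeScrapedEvent(array $raw, string $timezone = 'Europe/Berlin'): ?array
{
    $externalId = trim((string) ($raw['external_id'] ?? ''));
    $title = trim((string) ($raw['title'] ?? ''));
    $start = trim((string) ($raw['occurrences'][0]['start_datetime'] ?? ''));

    if ($externalId === '' || $title === '' || $start === '') {
        return null;
    }

    try {
        // The timezone only applies when the string itself carries none.
        $startAt = new DateTimeImmutable($start, new DateTimeZone($timezone));
    } catch (Exception $e) {
        return null;
    }

    return [
        'external_id' => $externalId,
        // Collapse line breaks and repeated spaces left over from the HTML.
        'title' => preg_replace('/\s+/', ' ', $title),
        'location' => trim((string) ($raw['location'] ?? '')) ?: 'TBD',
        'occurrences' => [
            ['start_datetime' => $startAt->format('Y-m-d H:i:s')],
        ],
    ];
}

$normalized = normalizeScrapedEvent([
    'external_id' => ' 42 ',
    'title' => "Jazz \n Night",
    'location' => '',
    'occurrences' => [['start_datetime' => '2025-05-01 19:30']],
]);
$rejected = normalizeScrapedEvent(['title' => 'No ID or date']);
```

Returning null for unusable records lets the import loop skip and log them instead of failing the whole job.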
🔄 Upsert Logic Explained
The app uses Laravel's updateOrCreate() to handle duplicate events:
// Look up the event by (source_id, external_id)
// If it exists: update it with the new data
// If not: create a new record
$event = Event::updateOrCreate(
    [
        'source_id' => $source->id,
        'external_id' => $externalData['external_id'],
    ],
    [
        'title' => $externalData['title'],
        'description' => $externalData['description'] ?? null,
        'location' => $externalData['location'],
        // ... more fields
    ]
);
if ($event->wasRecentlyCreated) {
    // New event
} else {
    // Existing event updated
}
Advantages:
- ✅ Prevents duplicates, provided there is a unique index on [source_id, external_id]
- ✅ Updates existing events
- ✅ Makes repeated imports straightforward
Note: updateOrCreate() is not atomic on its own. It runs a lookup followed by an insert or update, so rely on the unique index to catch concurrent inserts and wrap the import in a transaction if you need all-or-nothing behavior.
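To make the match-then-write behavior concrete, here is a small in-memory sketch of the same idea in plain PHP. It is not Laravel's implementation. The `InMemoryEventStore` class is hypothetical and keys rows by the (source_id, external_id) pair.

```php
<?php
declare(strict_types=1);

/**
 * In-memory stand-in for the events table, keyed by "source_id:external_id".
 * Illustrative only; the class name and shape are assumptions.
 */
final class InMemoryEventStore
{
    /** @var array<string, array<string, mixed>> */
    private array $rows = [];

    /**
     * @return array{0: array<string, mixed>, 1: bool} the stored row and a "was created" flag
     */
    public function updateOrCreate(array $match, array $values): array
    {
        $key = $match['source_id'] . ':' . $match['external_id'];
        $created = !isset($this->rows[$key]);

        // New rows start from the match attributes; existing rows keep theirs.
        $this->rows[$key] = array_merge($this->rows[$key] ?? $match, $values);

        return [$this->rows[$key], $created];
    }

    public function count(): int
    {
        return count($this->rows);
    }
}

$store = new InMemoryEventStore();
$match = ['source_id' => 1, 'external_id' => 'abc'];

[$first, $createdFirst] = $store->updateOrCreate($match, ['title' => 'Jazz Night']);
[$second, $createdSecond] = $store->updateOrCreate($match, ['title' => 'Jazz Night (sold out)']);
$rowCount = $store->count();
```

Importing the same external event twice leaves a single row with the latest title, which is the property the importer depends on.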
📊 Monitoring & Logging
Job overview
# Run a worker and print each job as it is processed (this consumes jobs, it does not just list them)
php artisan queue:work --verbose
# Follow the log output for failures
tail -f storage/logs/laravel.log | grep ImportEventsJob
Custom queue monitor dashboard
// Example: dashboard endpoint for pending and failed imports
Route::get('/admin/imports', function () {
    // failed_jobs has no created_at column, so order by failed_at explicitly
    $failed = \Illuminate\Support\Facades\DB::table('failed_jobs')
        ->where('queue', 'default')
        ->latest('failed_at')
        ->limit(20)
        ->get();
    $pending = \Illuminate\Support\Facades\DB::table('jobs')
        ->where('queue', 'default')
        ->count();
    return response()->json([
        'pending_jobs' => $pending,
        'failed_jobs' => $failed,
    ]);
});
🚀 Best Practices
1. Scaling with many events
For large volumes of events (1000+) per import:
- Use chunking: $externalEvents->chunk(100)
- Batch writes with Eloquent's upsert() instead of one query per event
- Disable query logging in the job
// In handle(); assumes $externalEvents is a Collection:
\Illuminate\Support\Facades\DB::disableQueryLog();
foreach ($externalEvents->chunk(100) as $chunk) {
    foreach ($chunk as $event) {
        $this->upsertEvent($event);
    }
}
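Chunking pays off most when each chunk is written in a single call instead of row by row. The sketch below shows the batching itself in plain PHP with `array_chunk()`. The `importInBatches()` function and its writer callback are hypothetical stand-ins for a bulk insert or upsert.

```php
<?php
declare(strict_types=1);

/**
 * Split records into fixed-size batches and hand each batch to a writer callback.
 * Returns the number of batches written. The writer stands in for a bulk write.
 */
function importInBatches(array $records, int $batchSize, callable $writeBatch): int
{
    $batches = 0;

    foreach (array_chunk($records, $batchSize) as $batch) {
        $writeBatch($batch);
        $batches++;
    }

    return $batches;
}

// 250 fake records, written in batches of 100.
$records = array_map(fn (int $i) => ['external_id' => "evt-$i"], range(1, 250));

$sizes = [];
$batchCount = importInBatches($records, 100, function (array $batch) use (&$sizes) {
    $sizes[] = count($batch); // a real writer would issue one query per batch here
});
```

With 250 records and a batch size of 100, the writer is called three times, the last time with the remaining 50 records.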
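2. Error handling & retries
// ImportEventsJob is attempted up to 3 times in total:
class ImportEventsJob implements ShouldQueue
{
    public $tries = 3;
    // Delay before each retry: 1 min, 5 min, 15 min
    // (with $tries = 3 there are only two retries, so the third value is never used)
    public $backoff = [60, 300, 900];
}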
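How a backoff array lines up with the number of tries is easy to get wrong. The sketch below models the lookup in plain PHP, on the assumption that the worker picks the delay by attempt number and falls back to the last entry once the array runs out. Both function names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Delay in seconds before the next attempt, given how many attempts have run.
 * Assumed rule: index by attempt number, fall back to the last configured value.
 */
function backoffDelay(array $backoff, int $attemptsSoFar): int
{
    return (int) ($backoff[$attemptsSoFar - 1] ?? $backoff[array_key_last($backoff)]);
}

/** Delays that are actually used when a job fails on every attempt. */
function retrySchedule(array $backoff, int $tries): array
{
    $delays = [];

    // After the final attempt the job is marked as failed, so no delay follows it.
    for ($attempt = 1; $attempt < $tries; $attempt++) {
        $delays[] = backoffDelay($backoff, $attempt);
    }

    return $delays;
}

$threeTries = retrySchedule([60, 300, 900], 3);
$fiveTries = retrySchedule([60, 300, 900], 5);
```

Under this model three tries only ever wait 60 and 300 seconds; using the full 1/5/15 minute ladder would need `$tries = 4`.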
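3. Rate limiting for external APIs
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\RateLimiter;
protected function fetchExternalEvents()
{
    // attempt() returns false once the limit is exhausted; otherwise it returns
    // the callback's result (or true if that result is falsy)
    return RateLimiter::attempt(
        'dresden-api-import',
        10, // max attempts per decay window
        function () {
            return Http::get('https://api.stadt-dresden.de/events')->json();
        },
        60 // decay window in seconds
    );
}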
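The caller has to handle the case where the limiter refuses a call. The sketch below is a simplified fixed-window limiter in plain PHP that shows this contract. It is not Laravel's RateLimiter; the `FixedWindowLimiter` class is hypothetical and takes the current time as a parameter so the behavior is deterministic.

```php
<?php
declare(strict_types=1);

/**
 * Minimal in-memory fixed-window rate limiter (illustration only).
 * attempt() runs the callback and returns its result, or false once the
 * budget of the current window is used up.
 */
final class FixedWindowLimiter
{
    private int $windowStart = 0;
    private int $hits = 0;

    public function __construct(
        private int $maxAttempts,
        private int $decaySeconds
    ) {
    }

    public function attempt(callable $callback, int $now): mixed
    {
        // Start a fresh window once the previous one has expired.
        if ($now - $this->windowStart >= $this->decaySeconds) {
            $this->windowStart = $now;
            $this->hits = 0;
        }

        if ($this->hits >= $this->maxAttempts) {
            return false;
        }

        $this->hits++;

        return $callback();
    }
}

$limiter = new FixedWindowLimiter(2, 60);

$results = [
    $limiter->attempt(fn () => 'ok', 1000),
    $limiter->attempt(fn () => 'ok', 1010),
    $limiter->attempt(fn () => 'ok', 1020), // budget exhausted within the window
    $limiter->attempt(fn () => 'ok', 1060), // a new window has started
];
```

In a queued job, a refused call is usually a reason to release the job back onto the queue with a delay instead of treating it as an empty result.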
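4. Transaction for atomicity
use Illuminate\Support\Facades\DB;
// Either all events of this run are written or none;
// $externalEvents must be imported into the closure with use
DB::transaction(function () use ($externalEvents) {
    foreach ($externalEvents as $externalEvent) {
        $this->upsertEvent($externalEvent);
    }
});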
🔍 Troubleshooting
Queue jobs are not being processed
# 1. Check the queue configuration
php artisan config:show queue
# 2. Start an Artisan queue worker
php artisan queue:work
# 3. Check the failed_jobs table
php artisan queue:failed
Import fails: external API unreachable
// Use Http::withoutVerifying() to bypass TLS certificate errors (dev only!)
Http::withoutVerifying()->get('https://...');
// Or set a custom timeout (in seconds)
Http::timeout(30)->get('https://...');
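Duplicate key errors
// Check for the unique index (MySQL/MariaDB syntax); DB::raw() alone would not run the query:
DB::select('SHOW INDEX FROM events');
// If it is missing, add it in a migration:
Schema::table('events', function (Blueprint $table) {
    $table->unique(['source_id', 'external_id']);
});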