# Import & Scraper Integration for the Laravel Event Portal

## 📌 Overview

The app supports several integration options for importing events:

1. **Commands** - Manual, one-off imports via the Artisan CLI
2. **Queue Jobs** - Asynchronous, queue-based imports
3. **Scheduler** - Scheduled, recurring imports (e.g. daily)
4. **Webhooks/Events** - Real-time updates from external sources

---

## 🔧 Setup Steps

### 1. Install dependencies

```bash
# For HTTP requests to external APIs: Laravel's HTTP client ships with the
# framework and only needs Guzzle (already present in a default installation)
composer require guzzlehttp/guzzle

# For web scraping (optional)
composer require symfony/dom-crawler symfony/http-client

# For extended logging/monitoring (optional)
composer require sentry/sentry-laravel
```

### 2. Queue configuration

Edit `.env`:

```env
QUEUE_CONNECTION=database  # or redis, beanstalkd, etc.
```

Create the queue table:

```bash
# Laravel 11 and later: php artisan make:queue-table
php artisan queue:table
php artisan migrate
```

### 3. Create sources

Add source records via a seeder or the admin interface:

```php
// database/seeders/SourceSeeder.php
namespace Database\Seeders;

use App\Models\Source;
use Illuminate\Database\Seeder;

class SourceSeeder extends Seeder
{
    public function run(): void
    {
        Source::create([
            'name' => 'Stadt Dresden',
            'description' => 'Official events of the state capital Dresden',
            'url' => 'https://stadt-dresden.de/veranstaltungen',
            'status' => 'active',
        ]);

        Source::create([
            'name' => 'Kulturzentrum Hellerau',
            'description' => 'Events of the Kulturzentrum Hellerau',
            'url' => 'https://hellerau.org',
            'status' => 'active',
        ]);
    }
}
```

Run it:

```bash
php artisan db:seed --class=SourceSeeder
```

---

## 👨‍💻 Usage

### Option 1: Manual import via command

```bash
# Import all active sources (asynchronously)
php artisan events:import

# Only one specific source (by ID)
php artisan events:import --source=1

# Or by name
php artisan events:import --source="Stadt Dresden"

# Run synchronously (blocking)
php artisan events:import --sync
```

### Option 2: Programmatically in code

```php
// In a controller, service, or command:
use App\Jobs\ImportEventsJob;
use App\Models\Source;
use App\Services\EventImportService;

// Via the service (false = queue the imports instead of running them synchronously)
$importService = app(EventImportService::class);
$importService->importFromAllSources(false);

// Or dispatch the job directly
$source = Source::findOrFail(1);
ImportEventsJob::dispatch($source);      // asynchronous
ImportEventsJob::dispatchSync($source);  // synchronous
```

### Option 3: Run a queue worker

A worker must be running for queued jobs to be processed:

```bash
# Development: a single worker with verbose output
php artisan queue:work --verbose

# Production: queue:work is already a long-running process (the old --daemon
# flag is deprecated); limit attempts and runtime per job
php artisan queue:work --tries=3 --timeout=120

# Use Supervisor to keep workers alive and restart them automatically (production)
# See: https://laravel.com/docs/queues#supervisor-configuration
```

---

## ⏰ Scheduler Integration

### Daily import via the scheduler

Edit `app/Console/Kernel.php`:

```php
<?php

namespace App\Console;

use App\Models\Event;
use App\Models\EventOccurrence;
use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;
use Illuminate\Support\Facades\Log;

class Kernel extends ConsoleKernel
{
    /**
     * Register the commands for the application.
     */
    protected function commands()
    {
        $this->load(__DIR__.'/Commands');

        require base_path('routes/console.php');
    }

    /**
     * Define the application's command schedule.
     */
    protected function schedule(Schedule $schedule)
    {
        // ===== EVENT IMPORTS =====

        // Daily import at 03:00
        $schedule->command('events:import')
            ->dailyAt('03:00')
            ->name('events.daily_import')
            ->onFailure(function () {
                Log::error('Daily event import failed');
            })
            ->onSuccess(function () {
                Log::info('Daily event import completed');
            });

        // Additionally: hourly imports (e.g. for frequently updated sources)
        $schedule->command('events:import --source="Stadt Dresden"')
            ->hourly()
            ->name('events.hourly_import_dresden');

        // ===== CLEANUP & MAINTENANCE =====

        // Mark past occurrences as completed every day (nothing is deleted)
        $schedule->call(function () {
            EventOccurrence::where('status', 'scheduled')
                ->where('end_datetime', '<', now())
                ->update(['status' => 'completed']);
        })
            ->dailyAt('04:00')
            ->name('events.mark_completed');

        // Archive orphaned events that have no occurrences (nothing is deleted)
        $schedule->call(function () {
            Event::doesntHave('occurrences')
                ->where('status', 'published')
                ->where('created_at', '<', now()->subMonth())
                ->update(['status' => 'archived']);
        })
            ->weekly()
            ->name('events.cleanup_orphaned');

        // Optional: verify the scheduler configuration locally
        if (app()->environment('local')) {
            $schedule->command('inspire')->hourly();
        }
    }

    /**
     * Get the timezone that should be used by default for scheduled events.
     */
    protected function scheduleTimezone(): string
    {
        return 'Europe/Berlin';
    }
}
```

### Setting up the scheduler in production

In production you need a cron job that invokes the scheduler every minute:

```bash
# Edit the crontab
crontab -e

# Add the following line:
* * * * * cd /path/to/app && php artisan schedule:run >> /dev/null 2>&1
```

Or use a systemd timer (a modern alternative). The service unit below must be paired with a `laravel-scheduler.timer` unit that triggers it every minute:

```ini
# /etc/systemd/system/laravel-scheduler.service
[Unit]
Description=Laravel Artisan Scheduler
Requires=laravel-scheduler.timer

[Service]
Type=oneshot
User=www-data
ExecStart=/usr/bin/php /path/to/app/artisan schedule:run
```

---

## 🔌 API Integration: Examples for External Sources

### Stadt Dresden API

```php
// In ImportEventsJob::fetchExternalEvents()
use Illuminate\Support\Facades\Http;

$response = Http::withHeaders([
    'Accept' => 'application/json',
    'User-Agent' => 'Dresden-EventPortal/1.0',
])->get('https://api.stadt-dresden.de/v1/events', [
    'limit' => 1000,
    'filter[status]' => 'published',
]);

// Fail loudly on 4xx/5xx responses instead of importing an empty result
$response->throw();

$events = $response->json('data') ?? [];
```

### iCal feed (e.g. from Google Calendar)

```php
// Parsed here with sabre/vobject (composer require sabre/vobject);
// any iCalendar parser that exposes VEVENT components works the same way.
use Sabre\VObject\Reader;

$feed = file_get_contents('https://calendar.google.com/calendar/ical/.../public/basic.ics');
$calendar = Reader::read($feed);

$events = [];

foreach ($calendar->VEVENT as $entry) {
    $events[] = [
        'external_id' => (string) $entry->UID,
        'title' => (string) $entry->SUMMARY,
        'location' => isset($entry->LOCATION) ? (string) $entry->LOCATION : 'TBD',
        'description' => isset($entry->DESCRIPTION) ? (string) $entry->DESCRIPTION : null,
        'occurrences' => [
            [
                'start_datetime' => $entry->DTSTART->getDateTime(),
                'end_datetime' => isset($entry->DTEND) ? $entry->DTEND->getDateTime() : null,
            ],
        ],
    ];
}
```

### Web scraping with DomCrawler

```php
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create();
$response = $client->request('GET', 'https://example.com/events');
$html = $response->getContent();

$crawler = new Crawler($html);
$events = [];

// Each .event-card element becomes one event record
$crawler->filter('.event-card')->each(function (Crawler $event) use (&$events) {
    $events[] = [
        'external_id' => $event->filter('[data-event-id]')->attr('data-event-id'),
        'title' => $event->filter('.event-title')->text(),
        'description' => $event->filter('.event-desc')->text(),
        'location' => $event->filter('.event-location')->text(),
        'occurrences' => [
            [
                'start_datetime' => $event->filter('[data-date]')->attr('data-date'),
            ],
        ],
    ];
});
```

---

## 🔄 Upsert Logic Explained

The app uses Laravel's `updateOrCreate()` to handle duplicate events:

```php
// Look up the event by (source_id, external_id)
// If it exists: update it with the new data
// If not: create a new record
$event = Event::updateOrCreate(
    [
        'source_id' => $source->id,
        'external_id' => $externalData['external_id'],
    ],
    [
        'title' => $externalData['title'],
        'description' => $externalData['description'] ?? null,
        'location' => $externalData['location'],
        // ... more fields
    ]
);

if ($event->wasRecentlyCreated) {
    // New event
} else {
    // Existing event updated
}
```

**Benefits and limits:**

- ✅ Prevents duplicates, provided there is a unique index on `[source_id, external_id]`
- ✅ Updates existing events
- ✅ Simple handling across repeated imports
- ⚠️ Not atomic on its own: `updateOrCreate()` runs a lookup followed by an insert or update, so concurrent imports rely on the unique index (or a surrounding transaction) for safety

---

## 📊 Monitoring & Logging

### Job overview

```bash
# Run a worker with verbose output to watch jobs as they are processed
php artisan queue:work --verbose

# Follow the log for import failures
tail -f storage/logs/laravel.log | grep ImportEventsJob
```

### Custom queue monitor dashboard

```php
// Example: dashboard endpoint for pending and failed imports
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

Route::get('/admin/imports', function () {
    // failed_jobs has a failed_at column, not created_at
    $failed = DB::table('failed_jobs')
        ->where('queue', 'default')
        ->latest('failed_at')
        ->limit(20)
        ->get();

    $pending = DB::table('jobs')
        ->where('queue', 'default')
        ->count();

    return response()->json([
        'pending_jobs' => $pending,
        'failed_jobs' => $failed,
    ]);
});
```

---

## 🚀 Best Practices

### 1. Scaling with many events

For large imports (1000+ events per run):

- Use **chunking**: `$externalEvents->chunk(100)`
- Use **batch processing** with Eloquent's `upsert()` to write many rows in one query
- Disable **query logging** in the job

```php
// In handle(); assumes $externalEvents is a Collection
use Illuminate\Support\Facades\DB;

DB::disableQueryLog();

foreach ($externalEvents->chunk(100) as $chunk) {
    foreach ($chunk as $event) {
        $this->upsertEvent($event);
    }
}
```

### 2. Error handling & retries

```php
use Illuminate\Contracts\Queue\ShouldQueue;

// ImportEventsJob is attempted up to 3 times in total
class ImportEventsJob implements ShouldQueue
{
    public $tries = 3;

    // Delay before each retry: 1 min, 5 min, 15 min
    // (with 3 attempts only the first two values are used)
    public $backoff = [60, 300, 900];
}
```

### 3. Rate limiting for external APIs

```php
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\RateLimiter;

protected function fetchExternalEvents()
{
    // attempt() returns false once the limit is exhausted,
    // otherwise the return value of the callback
    return RateLimiter::attempt(
        'dresden-api-import',
        $perMinute = 10,
        function () {
            return Http::get('https://api.stadt-dresden.de/events')->json();
        },
        $decay = 60
    );
}
```

### 4. Transactions for atomicity

```php
use Illuminate\Support\Facades\DB;

// The closure must import $externalEvents explicitly; $this is bound automatically
DB::transaction(function () use ($externalEvents) {
    foreach ($externalEvents as $externalEvent) {
        $this->upsertEvent($externalEvent);
    }
});
```

---

## 🔍 Troubleshooting

### Queue jobs are not being processed

```bash
# 1. Check the queue configuration
php artisan config:show queue

# 2. Start a queue worker
php artisan queue:work

# 3. Inspect the failed_jobs table
php artisan queue:failed
```

### Import fails: external API unreachable

```php
use Illuminate\Support\Facades\Http;

// Skip TLS certificate verification for HTTPS errors (development only!)
Http::withoutVerifying()->get('https://...');

// Or set a custom timeout in seconds
Http::timeout(30)->get('https://...');
```

### Duplicate key errors

```php
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

// Check the unique index (MySQL/MariaDB syntax):
$indexes = DB::select('SHOW INDEX FROM events');

// If it is missing, add it in a migration:
Schema::table('events', function (Blueprint $table) {
    $table->unique(['source_id', 'external_id']);
});
```

---

## 📚 Resources

- [Laravel Queue Documentation](https://laravel.com/docs/queues)
- [Laravel Scheduler](https://laravel.com/docs/scheduling)
- [Laravel HTTP Client](https://laravel.com/docs/http-client)
- [Symfony DomCrawler (Web Scraping)](https://symfony.com/doc/current/components/dom_crawler.html)
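
---

## Appendix: systemd timer for the scheduler

The systemd service unit in the scheduler section requires `laravel-scheduler.timer`, but only the service file is listed there. The following is a minimal sketch of a matching timer unit, assuming the same unit names and a once-per-minute cadence that mirrors the cron entry. The file path and the `AccuracySec` value are assumptions for illustration, not part of the original setup.

```ini
# /etc/systemd/system/laravel-scheduler.timer (assumed path, same directory as the service)
[Unit]
Description=Run the Laravel scheduler every minute

[Timer]
# Fire at the start of every minute, like the "* * * * *" cron entry
OnCalendar=minutely
# The default accuracy window is one minute; tighten it so runs stay on schedule
AccuracySec=1s
# Explicitly name the service this timer activates
Unit=laravel-scheduler.service

[Install]
WantedBy=timers.target
```

After placing the file, reload systemd with `systemctl daemon-reload` and activate the timer with `systemctl enable --now laravel-scheduler.timer`. Use either the cron entry or the timer, not both, so that `schedule:run` is not invoked twice per minute.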