# Import & Scraper Integration for the Laravel Event Portal

## 📌 Overview

The app supports several integration options for importing events:

1. **Commands** - Manual, one-off imports via the Artisan CLI
2. **Queue Jobs** - Asynchronous, queue-based imports
3. **Scheduler** - Scheduled, recurring imports (e.g. daily)
4. **Webhooks/Events** - Real-time updates from external sources

---

## 🔧 Setup Steps

### 1. Install dependencies

```bash
# For HTTP requests (external APIs): Laravel's HTTP client ships with the
# framework and only needs Guzzle, which most installations already include
composer require guzzlehttp/guzzle

# For web scraping (optional)
composer require symfony/dom-crawler symfony/http-client

# For extended logging/monitoring (optional)
composer require sentry/sentry-laravel
```

### 2. Queue configuration

Edit `.env`:
```env
QUEUE_CONNECTION=database # or redis, beanstalkd, etc.
```

Create the queue table:
```bash
php artisan queue:table
php artisan migrate
```

### 3. Create sources

Add source records via a seeder or the admin interface:

```php
// database/seeders/SourceSeeder.php

use App\Models\Source;
use Illuminate\Database\Seeder;

class SourceSeeder extends Seeder
{
    public function run()
    {
        Source::create([
            'name' => 'Stadt Dresden',
            'description' => 'Official events of the state capital Dresden',
            'url' => 'https://stadt-dresden.de/veranstaltungen',
            'status' => 'active',
        ]);

        Source::create([
            'name' => 'Kulturzentrum Hellerau',
            'description' => 'Events at the Kulturzentrum Hellerau',
            'url' => 'https://hellerau.org',
            'status' => 'active',
        ]);
    }
}
```

Run the seeder:
```bash
php artisan db:seed --class=SourceSeeder
```

---

## 👨‍💻 Usage

### Option 1: Manual import via command

```bash
# Import all active sources (asynchronously)
php artisan events:import

# Import a single source (by ID)
php artisan events:import --source=1

# Or by name
php artisan events:import --source="Stadt Dresden"

# Run synchronously (blocking)
php artisan events:import --sync
```

### Option 2: Programmatically in code

```php
// In a controller, service, or command:

use App\Jobs\ImportEventsJob;
use App\Models\Source;
use App\Services\EventImportService;

// Via the service
$importService = app(EventImportService::class);
$importService->importFromAllSources($synchronous = false);

// Or dispatch the job directly
$source = Source::findOrFail(1);            // fails early if the source does not exist
ImportEventsJob::dispatch($source);         // asynchronous
ImportEventsJob::dispatchSync($source);     // synchronous
```

### Option 3: Run a queue worker

To have the queued jobs processed:

```bash
# Development: a single worker with verbose output
php artisan queue:work --verbose

# Production: limit attempts and runtime per job
php artisan queue:work --tries=3 --timeout=120

# Use Supervisor to keep workers running and restart them automatically (production)
# See: https://laravel.com/docs/queues#supervisor-configuration
```

---

## ⏰ Scheduler Integration

### Daily import via the scheduler

Edit `app/Console/Kernel.php`:

```php
<?php

namespace App\Console;

use App\Models\Event;
use App\Models\EventOccurrence;
use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;
use Illuminate\Support\Facades\Log;

class Kernel extends ConsoleKernel
{
    /**
     * Register the commands for the application.
     */
    protected function commands()
    {
        $this->load(__DIR__.'/Commands');
        require base_path('routes/console.php');
    }

    /**
     * Define the application's command schedule.
     */
    protected function schedule(Schedule $schedule)
    {
        // ===== EVENT IMPORTS =====

        // Daily import at 03:00.
        // Without --sync the command only dispatches jobs, so these hooks
        // report on dispatching, not on the import itself.
        $schedule->command('events:import')
            ->dailyAt('03:00')
            ->name('events.daily_import')
            ->onFailure(function () {
                Log::error('Daily event import failed');
            })
            ->onSuccess(function () {
                Log::info('Daily event import completed');
            });

        // Additionally: hourly imports (e.g. for frequently updated sources)
        $schedule->command('events:import --source="Stadt Dresden"')
            ->hourly()
            ->name('events.hourly_import_dresden');

        // ===== CLEANUP & MAINTENANCE =====

        // Mark past occurrences as completed (daily at 04:00)
        $schedule->call(function () {
            EventOccurrence::where('status', 'scheduled')
                ->where('end_datetime', '<', now())
                ->update(['status' => 'completed']);
        })
            ->dailyAt('04:00')
            ->name('events.mark_completed');

        // Archive orphaned events that have no occurrences (weekly)
        $schedule->call(function () {
            Event::doesntHave('occurrences')
                ->where('status', 'published')
                ->where('created_at', '<', now()->subMonths(1))
                ->update(['status' => 'archived']);
        })
            ->weekly()
            ->name('events.cleanup_orphaned');

        // Optional: verify the scheduler configuration locally
        if (app()->environment('local')) {
            $schedule->command('inspire')->hourly();
        }
    }

    /**
     * Get the timezone that should be used by default for scheduled events.
     */
    protected function scheduleTimezone(): string
    {
        return 'Europe/Berlin';
    }
}
```

### Setting up the scheduler in production

In production you need a cron job that invokes the scheduler every minute:

```bash
# Edit the crontab
crontab -e

# Add the following line:
* * * * * cd /path/to/app && php artisan schedule:run >> /dev/null 2>&1
```

Or use a systemd timer (a modern alternative). The service below expects a matching `laravel-scheduler.timer` unit, which is not shown here:

```ini
# /etc/systemd/system/laravel-scheduler.service
[Unit]
Description=Laravel Artisan Scheduler
Requires=laravel-scheduler.timer

[Service]
Type=oneshot
User=www-data
ExecStart=/usr/bin/php /path/to/app/artisan schedule:run
```

---

## 🔌 API Integration: Examples for External Sources

### Stadt Dresden API

```php
// In ImportEventsJob::fetchExternalEvents()

use Illuminate\Support\Facades\Http;

$response = Http::withHeaders([
    'Accept' => 'application/json',
    'User-Agent' => 'Dresden-EventPortal/1.0',
])->get('https://api.stadt-dresden.de/v1/events', [
    'limit' => 1000,
    'filter[status]' => 'published',
]);

// Decoded JSON: the "data" key as a plain PHP array
$events = $response->json('data');
```

### iCal feed (e.g. from Google Calendar)

```php
// This parser class is not covered by the setup steps above;
// make sure a package providing it is installed before using this snippet.
use Spatie\IcalendarParser\InvitationParser;

$feed = file_get_contents('https://calendar.google.com/calendar/ical/.../public/basic.ics');
$entries = InvitationParser::parse($feed);

$events = [];

// Map each calendar entry onto the import structure
foreach ($entries as $entry) {
    $events[] = [
        'external_id' => $entry['uid'],
        'title' => $entry['summary'],
        'location' => $entry['location'] ?? 'TBD',
        'description' => $entry['description'] ?? null,
        'occurrences' => [
            [
                'start_datetime' => $entry['dtstart'],
                'end_datetime' => $entry['dtend'] ?? null,
            ],
        ],
    ];
}
```
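
If no parser package is available, the mapping above can be approximated in plain PHP. The following is a minimal sketch, not a full RFC 5545 implementation: `parseIcalEvents()` is a hypothetical helper, it assumes simple, non-recurring `VEVENT` blocks without nested components such as alarms, and it returns date values as raw iCal strings without time zone handling.

```php
<?php

/**
 * Extract VEVENT blocks from raw iCalendar text (sketch).
 * Ignores property parameters (e.g. TZID), recurrence rules and text escaping.
 *
 * @return array<int, array<string, mixed>>
 */
function parseIcalEvents(string $ics): array
{
    // Unfold continuation lines: a line break followed by one space or tab
    // continues the previous line.
    $ics = preg_replace("/\r?\n[ \t]/", '', $ics);

    $events = [];
    $current = null; // properties of the VEVENT being read, or null outside one

    foreach (preg_split("/\r?\n/", $ics) as $line) {
        if ($line === 'BEGIN:VEVENT') {
            $current = [];
            continue;
        }
        if ($line === 'END:VEVENT') {
            // Only keep entries that carry the fields the import relies on.
            if ($current !== null && isset($current['UID'], $current['SUMMARY'], $current['DTSTART'])) {
                $events[] = [
                    'external_id' => $current['UID'],
                    'title'       => $current['SUMMARY'],
                    'location'    => $current['LOCATION'] ?? 'TBD',
                    'description' => $current['DESCRIPTION'] ?? null,
                    'occurrences' => [[
                        'start_datetime' => $current['DTSTART'],
                        'end_datetime'   => $current['DTEND'] ?? null,
                    ]],
                ];
            }
            $current = null;
            continue;
        }
        if ($current === null || strpos($line, ':') === false) {
            continue;
        }
        // "NAME;PARAM=x:value" becomes name "NAME" and value "value".
        [$name, $value] = explode(':', $line, 2);
        $current[strtoupper(explode(';', $name)[0])] = $value;
    }

    return $events;
}
```

For production feeds with recurrence rules or time zones, a dedicated iCal library is the safer choice; this sketch only shows how the fields line up with the import structure.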

### Web scraping with DomCrawler

```php
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create();
$response = $client->request('GET', 'https://example.com/events');
$html = $response->getContent();

$crawler = new Crawler($html);
$events = [];

// One array entry per ".event-card" element.
// text() and attr() throw if a selector matches nothing, so adjust the
// selectors to the actual markup of the source page.
$crawler->filter('.event-card')->each(function (Crawler $event) use (&$events) {
    $events[] = [
        'external_id' => $event->filter('[data-event-id]')->attr('data-event-id'),
        'title' => $event->filter('.event-title')->text(),
        'description' => $event->filter('.event-desc')->text(),
        'location' => $event->filter('.event-location')->text(),
        'occurrences' => [
            [
                'start_datetime' => $event->filter('[data-date]')->attr('data-date'),
            ],
        ],
    ];
});
```
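
Scraped text usually carries stray whitespace, and date attributes vary between sites. A small normalization step before the upsert might look like the sketch below. `normalizeText()` and `normalizeDate()` are hypothetical helpers, and the list of accepted date formats is an assumption about what a source page emits, not something the portal prescribes.

```php
<?php

/**
 * Collapse whitespace runs (including line breaks) into single spaces and trim.
 */
function normalizeText(string $raw): string
{
    return trim(preg_replace('/\s+/u', ' ', $raw));
}

/**
 * Parse a scraped date string into "Y-m-d H:i:s" in the given timezone.
 * Returns null if none of the assumed formats matches.
 */
function normalizeDate(string $raw, string $timezone = 'Europe/Berlin'): ?string
{
    $tz = new DateTimeZone($timezone);
    // Assumed input formats, tried in order.
    $formats = ['Y-m-d\TH:i:sP', 'Y-m-d H:i:s', 'Y-m-d H:i', 'd.m.Y H:i', 'Y-m-d', 'd.m.Y'];

    foreach ($formats as $format) {
        // The leading "!" resets unspecified fields (e.g. the time) to zero
        // instead of filling them with the current time.
        $date = DateTimeImmutable::createFromFormat('!' . $format, trim($raw), $tz);
        $errors = DateTimeImmutable::getLastErrors();
        $clean = $errors === false
            || ($errors['warning_count'] === 0 && $errors['error_count'] === 0);

        if ($date !== false && $clean) {
            return $date->setTimezone($tz)->format('Y-m-d H:i:s');
        }
    }

    return null;
}
```

Returning `null` for unparseable dates lets the import skip or log the entry instead of storing a guessed timestamp.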

---

## 🔄 Upsert Logic Explained

The app uses Laravel's `updateOrCreate()` to handle duplicate events:

```php
// Look up the event by (source_id, external_id)
// If it exists: update it with the new data
// If not: create a new record

$event = Event::updateOrCreate(
    [
        'source_id' => $source->id,
        'external_id' => $externalData['external_id'],
    ],
    [
        'title' => $externalData['title'],
        'description' => $externalData['description'] ?? null,
        'location' => $externalData['location'],
        // ... more fields
    ]
);

if ($event->wasRecentlyCreated) {
    // New event
} else {
    // Existing event updated
}
```

**Advantages:**
- ✅ Prevents duplicates (backed by a unique index on `[source_id, external_id]`)
- ✅ Updates existing events
- ✅ Makes repeated imports straightforward
- ✅ Lookup and write happen in a single call (this is not atomic on its own; the unique index is what guards against concurrent inserts)
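
The create-or-update behaviour can be illustrated without a database. The sketch below is an in-memory stand-in, not Laravel's implementation: `InMemoryEventStore` is a hypothetical class that keys rows by the `(source_id, external_id)` pair and reports whether a call created a row, similar in spirit to `wasRecentlyCreated`.

```php
<?php

/**
 * In-memory stand-in for an events table with a unique key
 * on (source_id, external_id). Illustration only.
 */
final class InMemoryEventStore
{
    /** @var array<string, array<string, mixed>> rows indexed by the composite key */
    private array $rows = [];

    /**
     * @param array{source_id: int, external_id: string} $match  identifying attributes
     * @param array<string, mixed>                        $values attributes to write
     * @return array{row: array<string, mixed>, created: bool}
     */
    public function updateOrCreate(array $match, array $values): array
    {
        $key = $match['source_id'] . '|' . $match['external_id'];
        $created = !isset($this->rows[$key]);

        // New rows start from the match attributes;
        // existing rows keep any field that $values does not overwrite.
        $this->rows[$key] = array_merge($this->rows[$key] ?? $match, $values);

        return ['row' => $this->rows[$key], 'created' => $created];
    }

    public function count(): int
    {
        return count($this->rows);
    }
}

$store = new InMemoryEventStore();
$key = ['source_id' => 1, 'external_id' => 'a-42'];

$first  = $store->updateOrCreate($key, ['title' => 'Jazz Night', 'location' => 'Hellerau']);
$second = $store->updateOrCreate($key, ['title' => 'Jazz Night (moved)']);
```

Importing the same external event twice leaves a single row: the first call creates it, the second only overwrites the supplied fields.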

---

## 📊 Monitoring & Logging

### Job overview

```bash
# Process pending jobs with verbose output
php artisan queue:work --verbose

# Follow the log output for failures
tail -f storage/logs/laravel.log | grep ImportEventsJob
```

### Custom queue monitor dashboard

```php
// Example: dashboard for pending and failed imports

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

Route::get('/admin/imports', function () {
    // failed_jobs has a failed_at column rather than created_at
    $failed = DB::table('failed_jobs')
        ->where('queue', 'default')
        ->latest('failed_at')
        ->limit(20)
        ->get();

    $pending = DB::table('jobs')
        ->where('queue', 'default')
        ->count();

    return response()->json([
        'pending_jobs' => $pending,
        'failed_jobs' => $failed,
    ]);
});
```

---

## 🚀 Best Practices

### 1. Scaling with many events

For large volumes of events (1000+) per import:
- Use **chunking**: `$externalEvents->chunk(100)`
- Use **batch processing**, for example Eloquent's `upsert()` (Laravel 8+), which writes many rows in one query
- Disable **query logging** in the job

```php
// In handle():
\Illuminate\Support\Facades\DB::disableQueryLog();

// chunk() requires a Collection; wrap a plain array with collect() first
foreach ($externalEvents->chunk(100) as $chunk) {
    foreach ($chunk as $event) {
        $this->upsertEvent($event);
    }
}
```
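
`chunk()` is a Collection method. If the imported events arrive as a plain array, as decoded JSON typically does, `array_chunk()` gives the same batching. `processInChunks()` below is a hypothetical helper that sketches this; the handler stands in for whatever the job does per event.

```php
<?php

/**
 * Process events in fixed-size batches and report the size of each batch.
 *
 * @param array<int, array<string, mixed>> $events
 * @param callable $handler called once per event (e.g. an upsert)
 * @return int[] batch sizes in processing order
 */
function processInChunks(array $events, int $chunkSize, callable $handler): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $batchSizes = [];
    foreach (array_chunk($events, $chunkSize) as $chunk) {
        foreach ($chunk as $event) {
            $handler($event);
        }
        // A per-batch hook (logging, memory checks) would go here.
        $batchSizes[] = count($chunk);
    }

    return $batchSizes;
}

// 250 placeholder events, handled in batches of 100
$events = array_map(fn (int $i) => ['external_id' => "ev-$i"], range(1, 250));
$handled = 0;
$sizes = processInChunks($events, 100, function (array $event) use (&$handled) {
    $handled++;
});
```

Batching by itself does not reduce the number of queries; it mainly gives a natural point to log progress or release memory between batches.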
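
### 2. Error handling & retries

```php
use Illuminate\Contracts\Queue\ShouldQueue;

// In ImportEventsJob: allow up to 3 attempts in total
class ImportEventsJob implements ShouldQueue
{
    public $tries = 3;

    // Delay before each retry: 1 min, 5 min, 15 min.
    // With 3 tries there are only two retries, so the 15 min value
    // only comes into play if $tries is raised.
    public $backoff = [60, 300, 900];
}
```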
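
To see which delay applies to which attempt, the retry schedule can be modelled in a few lines. This sketch assumes the worker uses the backoff entry at index `attempt - 1` and falls back to the last entry when the list is shorter; `retryDelay()` is a hypothetical helper for illustration, not a Laravel API.

```php
<?php

/**
 * Delay in seconds before the next attempt, after attempt number $attempt
 * (1-based) has failed. Returns null when no further attempt is allowed.
 *
 * @param int[] $backoff delays in seconds, one per retry
 */
function retryDelay(int $attempt, int $tries, array $backoff): ?int
{
    if ($attempt >= $tries) {
        return null; // attempts exhausted: the job would be marked as failed
    }
    if ($backoff === []) {
        return 0; // no backoff configured: retry immediately
    }

    // Use the entry for this attempt, or the last entry if the list is shorter.
    return $backoff[$attempt - 1] ?? $backoff[count($backoff) - 1];
}

// Schedule for the configuration shown above
$schedule = [];
for ($attempt = 1; $attempt <= 3; $attempt++) {
    $schedule[$attempt] = retryDelay($attempt, 3, [60, 300, 900]);
}
```

Under these assumptions the third value is never reached with three tries: the job waits 60 seconds after the first failure, 300 seconds after the second, and is marked as failed after the third.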

### 3. Rate limiting for external APIs

```php
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\RateLimiter;

protected function fetchExternalEvents()
{
    // Returns the callback's result, or false once the limit is exhausted
    return RateLimiter::attempt(
        'dresden-api-import',
        $perMinute = 10,
        function () {
            return Http::get('https://api.stadt-dresden.de/events')->json();
        },
        $decay = 60
    );
}
```
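
Because `attempt()` returns `false` when the limit is hit, callers have to tell "rate limited" apart from a real response. The sketch below models that contract with a simple fixed-window counter. `FixedWindowLimiter` is a hypothetical class written for illustration; it is not how Laravel's `RateLimiter` is implemented, and the injectable clock exists only to make the behaviour easy to test.

```php
<?php

/**
 * Fixed-window limiter: at most $maxAttempts calls per $decaySeconds window.
 */
final class FixedWindowLimiter
{
    private int $maxAttempts;
    private int $decaySeconds;
    /** @var callable returns the current Unix time */
    private $clock;
    private int $windowStart = 0;
    private int $hits = 0;

    public function __construct(int $maxAttempts, int $decaySeconds, ?callable $clock = null)
    {
        $this->maxAttempts = $maxAttempts;
        $this->decaySeconds = $decaySeconds;
        $this->clock = $clock ?? 'time';
    }

    /**
     * Run $callback if the current window has capacity.
     *
     * @return mixed the callback's result, or false when the limit is exhausted
     */
    public function attempt(callable $callback)
    {
        $now = ($this->clock)();

        // Start a fresh window once the previous one has elapsed.
        if ($now - $this->windowStart >= $this->decaySeconds) {
            $this->windowStart = $now;
            $this->hits = 0;
        }

        if ($this->hits >= $this->maxAttempts) {
            return false;
        }

        $this->hits++;
        return $callback();
    }
}

// Usage: a strict comparison separates "limited" from an empty result.
$limiter = new FixedWindowLimiter(10, 60);
$result = $limiter->attempt(fn () => ['events' => []]);
if ($result === false) {
    // Rate limited: release the job back onto the queue or skip this run.
}
```

Comparing with `=== false` matters here, since an empty event list is a valid response and must not be mistaken for a rejected call.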

### 4. Transactions for atomicity

```php
use Illuminate\Support\Facades\DB;

// Either all events of this import are written, or none.
// The closure must import $externalEvents explicitly via use().
DB::transaction(function () use ($externalEvents) {
    foreach ($externalEvents as $externalEvent) {
        $this->upsertEvent($externalEvent);
    }
});
```

---

## 🔍 Troubleshooting

### Queue jobs are not being processed

```bash
# 1. Check the queue configuration
php artisan config:show queue

# 2. Start an Artisan queue worker
php artisan queue:work

# 3. Inspect the failed_jobs table
php artisan queue:failed
```

### Import fails: external API unreachable

```php
// Skip certificate verification to work around HTTPS errors (dev only!)
Http::withoutVerifying()->get('https://...');

// Or set a custom timeout (in seconds)
Http::timeout(30)->get('https://...');
```

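### Duplicate key errors

```php
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

// Check the unique index (MySQL/MariaDB syntax).
// DB::raw() only builds an expression; DB::select() actually runs the query.
$indexes = DB::select('SHOW INDEX FROM events');

// If it is missing:
Schema::table('events', function (Blueprint $table) {
    $table->unique(['source_id', 'external_id']);
});
```

---
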
## 📚 Resources

- [Laravel Queue Documentation](https://laravel.com/docs/queues)
- [Laravel Scheduler](https://laravel.com/docs/scheduling)
- [Laravel HTTP Client](https://laravel.com/docs/http-client)
- [Symfony DomCrawler (Web Scraping)](https://symfony.com/doc/current/components/dom_crawler.html)