For DevOps engineers, Database Administrators (DBAs), and IT systems architects, the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are more than business jargon; they are hard engineering constraints. When you manage mission-critical databases, failing to accurately calculate, plan for, and validate these metrics can lead to catastrophic data loss and prolonged downtime.

In modern enterprise environments, calculating RTO and RPO requires a deep understanding of database internals, storage I/O, network throughput, and transaction log mechanics. This guide examines technical methods for calculating, testing, and optimizing the RTO and RPO of production database systems.

Breaking Down RPO (Recovery Point Objective) in Database Systems

RPO defines the maximum acceptable amount of data loss, measured in time. If your RPO is 15 minutes, a disaster at 12:00 PM means you must be able to recover every committed transaction up to at least 11:45 AM.

In database systems, RPO is determined by your transaction log management strategy (WAL in PostgreSQL, Redo Logs in Oracle, Transaction Logs in SQL Server).

Data Loss Mechanics and Log Generation

To calculate an achievable RPO, you must first understand how fast your database generates transaction logs. If you ship logs to a backup repository every 15 minutes, but your network cannot transfer 15 minutes' worth of logs within that window, a backlog builds up and your effective RPO keeps slipping further behind the target.

You can baseline the log generation rate with native SQL. For example, in PostgreSQL 10 and later, you can measure the Write-Ahead Log (WAL) generation rate over a fixed window:

```sql
-- psql on the primary. T=0: store the starting LSN in the psql variable :start_lsn
SELECT pg_current_wal_lsn() AS start_lsn \gset

-- Wait exactly 5 minutes (300 seconds)
SELECT pg_sleep(300);

-- WAL generated during the window, and the average rate
SELECT pg_current_wal_lsn() AS end_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn')) AS wal_generated_size,
       pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn') / 300 AS bytes_per_second;
```

If this query shows that you generate 50 MB/s of WAL at peak load, a 15-minute RPO means shipping 45 GB of log data (50 MB/s × 900 s) to your backup repository in every window. Your network and storage must sustain more than 50 MB/s of write throughput to hold that RPO.
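The arithmetic above can be checked with a short Python sketch. It assumes decimal units (1 GB = 1000 MB) and a constant peak WAL rate. The function names and the optional `headroom` factor are illustrative and do not belong to any tool.

```python
def wal_per_window_gb(wal_mb_per_s: float, rpo_minutes: float) -> float:
    """WAL volume (decimal GB) generated during one RPO window at a steady rate."""
    return wal_mb_per_s * rpo_minutes * 60 / 1000


def shipping_keeps_up(wal_mb_per_s: float, ship_mb_per_s: float,
                      headroom: float = 1.0) -> bool:
    """True if the archive path drains WAL at least as fast as it is produced.

    If this returns False, the unshipped backlog grows every window and the
    effective RPO drifts further behind the target.
    """
    return ship_mb_per_s >= wal_mb_per_s * headroom


if __name__ == "__main__":
    print(wal_per_window_gb(50, 15))   # 45.0
    print(shipping_keeps_up(50, 40))   # False
```

In practice you would pass a `headroom` above 1.0 so that bursts and retries do not immediately push shipping behind.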

The Impact of Synchronous vs. Asynchronous Replication

Many DBAs rely on High Availability (HA) replication to meet their RPO. However, replication is not a backup: a dropped table (DROP TABLE users;) is replicated to the standby immediately.

When you use replication for Disaster Recovery (DR), the replication mode directly affects RPO:
* Synchronous Replication: Guarantees a zero RPO (RPO=0). The primary does not acknowledge a commit until the standby confirms it has received the change. The trade-off is added latency on every write to the primary.
* Asynchronous Replication: Introduces replication lag. Your RPO equals your current replication lag.

To check asynchronous replication lag in PostgreSQL, run this on the primary:

```sql
SELECT application_name,
       client_addr,
       state,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replication_lag_bytes,
       replay_lag  -- time-based lag (PostgreSQL 10+); may be NULL when the primary is idle
FROM pg_stat_replication;
```
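`replication_lag_bytes` is measured in bytes, while RPO is stated in time. A rough conversion divides the byte lag by the WAL generation rate you measured earlier. This is an approximation that assumes the rate stays roughly constant, and the helper names are hypothetical.

```python
def lag_bytes_to_seconds(lag_bytes: int, wal_bytes_per_s: float) -> float:
    """Approximate seconds of committed work at risk on an asynchronous standby."""
    if wal_bytes_per_s <= 0:
        raise ValueError("WAL rate must be positive")
    return lag_bytes / wal_bytes_per_s


def within_rpo(lag_bytes: int, wal_bytes_per_s: float, rpo_seconds: float) -> bool:
    """True if the estimated exposure fits inside the RPO target."""
    return lag_bytes_to_seconds(lag_bytes, wal_bytes_per_s) <= rpo_seconds


# 3 GB of lag at 50 MB/s is about 60 seconds of exposure
print(lag_bytes_to_seconds(3_000_000_000, 50_000_000))  # 60.0
```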

Breaking Down RTO (Recovery Time Objective) in Large Database Systems

RTO is the maximum acceptable downtime. Calculating database RTO is complicated because it is not simply the time it takes to copy backup files back to a server.

A Mathematical Model for Calculating RTO

A realistic database RTO calculation must account for four distinct phases:

RTO = T(infra) + T(transfer) + T(restore) + T(recovery)

  1. T(infra) – Infrastructure Provisioning: The time to bring up compute and storage to replace what was lost. (This can be close to zero with pre-provisioned DR environments or Infrastructure-as-Code pipelines.)
  2. T(transfer) – Data Transfer: The time to move the backup from the backup repository to the database server.
  3. T(restore) – Physical Restore: The time to write the data files to the target disk.
  4. T(recovery) – Database Crash Recovery: The time for the database engine to replay transaction logs, roll forward committed transactions, and roll back uncommitted ones.
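Here is a minimal Python sketch of the four-phase model. The phase durations in the example are placeholders, not measurements. Because RTO is a sum, the slowest phase dominates, and that is where optimization pays off first.

```python
from dataclasses import dataclass


@dataclass
class RtoEstimate:
    """Four-phase RTO model; all durations in seconds."""
    infra_s: float      # T(infra): provisioning compute and storage
    transfer_s: float   # T(transfer): moving the backup to the DB server
    restore_s: float    # T(restore): writing data files to disk
    recovery_s: float   # T(recovery): replaying logs, roll forward/back

    @property
    def total_s(self) -> float:
        return self.infra_s + self.transfer_s + self.restore_s + self.recovery_s

    def dominant_phase(self) -> str:
        """Name of the phase that contributes the most time."""
        phases = {
            "infra": self.infra_s,
            "transfer": self.transfer_s,
            "restore": self.restore_s,
            "recovery": self.recovery_s,
        }
        return max(phases, key=phases.get)


est = RtoEstimate(infra_s=600, transfer_s=3600, restore_s=10_000, recovery_s=5400)
print(est.total_s / 3600, est.dominant_phase())
```

If your tooling streams the transfer and the restore concurrently, those two phases overlap, so summing them gives a conservative upper bound.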

Calculating Transfer and Restore Times

To calculate T(transfer) and T(restore), you need baselines for your network bandwidth and your disk IOPS and throughput. Do not rely on theoretical peak numbers; test your actual infrastructure.

Use iperf3 to measure network throughput between your backup repository and the database server:

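```bash
# On the backup repository (server)
iperf3 -s

# On the database server (client); -R makes the repository send and the
# database server receive, which matches the direction of restore traffic
iperf3 -c <backup_repo_ip> -t 60 -P 4 -R
```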
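iperf3 can emit machine-readable results with `-J`. The sketch below turns a measured rate into a T(transfer) estimate. It assumes the TCP JSON layout, where the receiver-side total is stored under `end.sum_received.bits_per_second`. The backup size and the sample values are hypothetical.

```python
import json


def received_bytes_per_s(iperf3_json: str) -> float:
    """Receiver-side throughput in bytes/s from `iperf3 -J` output (TCP tests)."""
    result = json.loads(iperf3_json)
    bits_per_s = result["end"]["sum_received"]["bits_per_second"]
    return bits_per_s / 8


def transfer_seconds(backup_bytes: int, bytes_per_s: float) -> float:
    """Lower bound on T(transfer): backup size divided by sustained throughput."""
    return backup_bytes / bytes_per_s


# Hypothetical, trimmed iperf3 output: 8 Gbit/s received
sample = '{"end": {"sum_received": {"bits_per_second": 8e9}}}'
rate = received_bytes_per_s(sample)        # 1e9 bytes/s
print(transfer_seconds(2 * 10**12, rate))  # 2000.0 s for a 2 TB backup
```

For example, capture the input with `iperf3 -c <backup_repo_ip> -t 60 -P 4 -R -J > iperf3.json` and pass the file contents to `received_bytes_per_s`.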

Use fio to measure the sequential write performance of your database storage volumes, simulating a database restore:

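```bash
# Four jobs each write 10 GiB with direct I/O, reported as one aggregate; delete testfile afterwards
fio --name=restore_sim --ioengine=libaio --rw=write --bs=1M --size=10G --numjobs=4 --iodepth=32 --direct=1 --group_reporting --filename=/var/lib/postgresql/data/testfile
```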

If your database is 5 TB and your fio tests show a peak write throughput of 500 MB/s, your minimum T(restore) is roughly 2.8 hours (5,000,000 MB ÷ 500 MB/s = 10,000 s). If your business SLA requires a one-hour RTO, conventional streaming restores will not meet it. You must move to storage-level snapshots or block-level replication.
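The same check expressed in code uses decimal units to match the 2.8-hour figure. The throughput is the sustained value measured with fio. The function names are illustrative.

```python
def min_restore_hours(db_size_tb: float, write_mb_per_s: float) -> float:
    """Lower bound on T(restore): bytes to write / sustained write throughput."""
    size_mb = db_size_tb * 1_000_000  # decimal: 1 TB = 1,000,000 MB
    return size_mb / write_mb_per_s / 3600


def streaming_restore_fits(db_size_tb: float, write_mb_per_s: float,
                           rto_hours: float) -> bool:
    """True if a streaming restore could finish within the RTO (restore phase only)."""
    return min_restore_hours(db_size_tb, write_mb_per_s) <= rto_hours


hours = min_restore_hours(5, 500)
print(round(hours, 2))                      # 2.78
print(streaming_restore_fits(5, 500, 1.0))  # False
```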

The Hidden Trap: T(recovery)

The most frequently underestimated variable is T(recovery). If you restore a weekly full backup and must apply six days of transaction logs to reach your recovery point, the database engine has to replay every one of those transactions in order.

Replaying 500 GB of transaction logs can take hours, and the process is often bottlenecked by single-threaded CPU performance and storage IOPS. To reduce T(recovery), take full or differential backups more frequently so there is less log to replay.
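The rough model below shows why full-backup frequency matters: the log volume to replay grows linearly with the number of days since the last full backup. The daily log volume and replay rate are assumed placeholders. Measure the replay rate on your own hardware, because single-threaded replay often runs well below raw disk throughput.

```python
def replay_hours(daily_log_gb: float, days_since_full: float,
                 replay_mb_per_s: float) -> float:
    """Approximate T(recovery) for replaying logs accumulated since the last full backup."""
    log_mb = daily_log_gb * 1000 * days_since_full
    return log_mb / replay_mb_per_s / 3600


# A weekly full means up to ~6 days of logs; daily fulls cap it near 1 day.
for days in (6, 1):
    print(days, round(replay_hours(daily_log_gb=80, days_since_full=days, replay_mb_per_s=40), 1))
```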

Closing the Gap: Practical Steps to Validate RTO and RPO

Calculating RTO and RPO on paper is only the first step. Mission-critical environments require continuous validation.

Step 1: Implement Continuous Archiving

To achieve sub-minute RPOs without the write-latency penalty of synchronous replication, implement continuous log archiving. Do not wait for a log file to fill up, which can take hours during low-traffic periods. Instead, force log switches at fixed intervals.
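For periodic log shipping, you can approximate the worst case as the forced switch interval plus the time needed to ship one log segment off the host. In PostgreSQL, the switch interval corresponds to `archive_timeout`. The sketch uses the default 16 MB WAL segment size and an assumed upload rate of 10 MB/s. It ignores archive failures and retries, so treat the result as an estimate rather than a guarantee.

```python
def worst_case_shipping_rpo_s(switch_interval_s: float,
                              segment_bytes: int = 16 * 1024 * 1024,
                              upload_bytes_per_s: float = 10_000_000) -> float:
    """Approximate worst-case data-loss window for periodic log shipping.

    A commit made just after a log switch waits a full interval before its
    segment is closed, then the segment must be uploaded before it is safe off-host.
    """
    return switch_interval_s + segment_bytes / upload_bytes_per_s


print(round(worst_case_shipping_rpo_s(60), 1))  # 61.7 s with a 60 s switch interval
```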

In SQL Server, you can automate frequent transaction log backups:

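```sql
-- One timestamped file per run, so frequent backups never append to a single growing file
DECLARE @file NVARCHAR(260) = N'\\BackupRepo\SQL\MissionCriticalDB_Log_'
    + FORMAT(SYSUTCDATETIME(), 'yyyyMMdd_HHmmss') + N'.trn';
BACKUP LOG [MissionCriticalDB]
TO DISK = @file
WITH NOFORMAT, INIT, SKIP, CHECKSUM, COMPRESSION, STATS = 10,
     NAME = N'MissionCriticalDB-Transaction Log Backup';
```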

Best practice: Schedule this job (for example, with SQL Server Agent) to run every 1 to 5 minutes, depending on your RPO requirements.
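Shorter intervals tighten RPO but lengthen the restore chain. A point-in-time restore has to apply, in sequence, every log backup taken since the last full (or differential) backup. The quick count below uses hypothetical intervals.

```python
def log_backups_in_chain(interval_min: int, hours_since_base: float) -> int:
    """Number of log backups to restore after the last full/differential backup."""
    return int(hours_since_base * 60 // interval_min)


for interval in (1, 5, 15):
    per_day = log_backups_in_chain(interval, 24)
    worst_week = log_backups_in_chain(interval, 6 * 24)
    print(f"every {interval} min: {per_day}/day, up to {worst_week} files after a weekly full")
```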

Step 2: Automate Restore Testing

An untested backup is only a theory. To validate your calculated RTO, you must run automated restore tests.

Enterprise backup platforms such as CloudSave simplify this with automated, self-service restore testing. CloudSave can automatically spin up a sandbox environment, mount the latest backup, perform a full database restore, and run custom validation scripts (for example, DBCC CHECKDB for SQL Server) to measure the actual RTO and verify data integrity. This turns RTO from a calculated estimate into a proven, reportable metric.
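Whatever platform runs the drill, the measurement itself is straightforward: time each phase from start to finish. The sketch below is tool-agnostic. Its phase commands are placeholders that exit immediately. Replace them with your own provisioning, restore, and validation scripts, for example a hypothetical script that runs DBCC CHECKDB.

```python
from __future__ import annotations

import subprocess
import sys
import time


def run_restore_drill(phases: dict[str, list[str]]) -> dict[str, float]:
    """Run each phase command in order and record wall-clock seconds per phase.

    Raises CalledProcessError if any phase fails, so a broken restore
    never reports a misleadingly short RTO.
    """
    timings: dict[str, float] = {}
    for name, cmd in phases.items():
        start = time.monotonic()
        subprocess.run(cmd, check=True)
        timings[name] = time.monotonic() - start
    timings["total"] = sum(timings.values())
    return timings


if __name__ == "__main__":
    # Placeholder commands that succeed immediately; swap in real scripts.
    noop = [sys.executable, "-c", "pass"]
    print(run_restore_drill({"infra": noop, "restore": noop, "validate": noop}))
```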

Step 3: Monitor and Alert on SLA Breaches

Your monitoring stack (Prometheus, Datadog, Zabbix) should actively track the metrics that threaten your RTO/RPO SLAs. Configure alert rules for the following:
* Backup Job Failures: An immediate threat to RPO.
* Log Shipping Delays: Log transfer is taking longer than the interval in which the logs are generated.
* Storage IOPS Throttling: Cloud providers throttle IOPS once burst credits run out (for example, on AWS EBS gp2 volumes), which will silently wreck your RTO during a real emergency.
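The three alert conditions can be written as plain threshold checks. This sketch assumes you already collect three inputs: the time of the last successful backup, the log shipping lag, and, for burst-based volumes, the remaining burst balance as a percentage (AWS reports this as the EBS `BurstBalance` metric). The field names and the 20% threshold are illustrative, not any vendor's schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DrMetrics:
    seconds_since_last_backup: float  # age of the newest successful backup or log backup
    log_shipping_lag_s: float         # how far archived logs trail the primary
    burst_balance_pct: float          # remaining burst credits, 0-100


def sla_alerts(m: DrMetrics, rpo_s: float, ship_interval_s: float,
               min_burst_pct: float = 20.0) -> list[str]:
    """Return a human-readable alert for each condition that is currently breached."""
    alerts: list[str] = []
    if m.seconds_since_last_backup > rpo_s:
        alerts.append("backup: last success is older than the RPO")
    if m.log_shipping_lag_s > ship_interval_s:
        alerts.append("log shipping: lag exceeds the shipping interval")
    if m.burst_balance_pct < min_burst_pct:
        alerts.append("storage: burst credits low, IOPS throttling likely")
    return alerts


print(sla_alerts(DrMetrics(1200, 30, 15.0), rpo_s=900, ship_interval_s=60))
```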

Optimizing Database Backup Infrastructure to Meet Strict SLAs

When the math shows that your current infrastructure cannot meet business SLAs, you must upgrade your backup strategy.

1. Use Block-Level Incremental Backups

Traditional database dumps (logical backups such as pg_dump or mysqldump) are far too slow for mission-critical RTOs, because restoring them means re-inserting every row and rebuilding every index. Use physical, block-level backups instead. Block-level incremental backups copy only the disk blocks that have changed since the last backup, which greatly reduces the data transferred in each backup cycle and the load on the network.
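The sketch below illustrates the idea behind block-level incrementals. It splits two versions of a file image into fixed-size blocks and reports which blocks differ. Production tools usually track changes more efficiently (for example, with changed-block tracking) rather than hashing everything. The 4 KiB block size and the function names are assumptions made for the example.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumed block size for the illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` that differ from `old` or did not exist before."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [i for i, d in enumerate(new_d) if i >= len(old_d) or d != old_d[i]]


base = bytes(4096 * 4)                       # four zeroed blocks
current = bytearray(base)
current[4096 * 2] = 1                        # modify one byte in block 2
print(changed_blocks(base, bytes(current)))  # [2]
```

Only block 2 would be copied in the incremental backup; the other three are unchanged.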

2. Use Storage Snapshots

For multi-terabyte databases that need an RTO under 15 minutes, conventional file copies are physically impossible over standard networks. Integrating with SAN or cloud storage snapshots (for example, AWS EBS Snapshots or Pure Storage) makes T(restore) nearly instantaneous. The database engine then only has to perform crash recovery on the snapshot image. Note that an EBS volume created from a snapshot loads its blocks lazily on first access unless Fast Snapshot Restore is enabled, so early I/O can be slower than expected.
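Working backwards from the target shows why. The sketch computes the sustained network rate a full copy would need to fit inside the RTO budget. It uses decimal units and ignores every other phase and all protocol overhead, so the real requirement is higher.

```python
def required_gbit_per_s(db_size_tb: float, rto_minutes: float) -> float:
    """Sustained network rate (Gbit/s) needed to copy the whole database within the RTO."""
    bits = db_size_tb * 1e12 * 8
    return bits / (rto_minutes * 60) / 1e9


print(round(required_gbit_per_s(5, 15), 1))  # 44.4
```

For a 5 TB database and a 15-minute RTO, that is more than four times the line rate of a 10 Gbit/s link.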

3. Use Parallelism

Make sure your backup and restore tools run multi-threaded. When restoring a PostgreSQL database with pgBackRest, or a SQL Server database, explicitly configure parallel workers so that you saturate your available network and disk bandwidth. For SQL Server, one common way to do this is to stripe the backup across multiple files.

```bash
# Example of parallel restore in pgBackRest (PostgreSQL must be stopped first)
pgbackrest --stanza=prod_db --process-max=8 restore
```
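The sketch below shows in miniature why more workers help. When a restore consists of many independent files, copying them concurrently keeps the network and disk busy instead of waiting on one stream at a time. It uses Python's `concurrent.futures` and `shutil.copyfile` to illustrate the principle and does not reflect how pgBackRest is implemented. `max_workers=8` simply mirrors `--process-max=8` above.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir: Path, dst_dir: Path, max_workers: int = 8) -> int:
    """Copy every regular file in src_dir into dst_dir concurrently; return the file count."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    files = [p for p in src_dir.iterdir() if p.is_file()]

    def copy_one(p: Path) -> None:
        shutil.copyfile(p, dst_dir / p.name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every copy and re-raises the first error, if any
        list(pool.map(copy_one, files))
    return len(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for i in range(16):
            (Path(src) / f"segment_{i:02d}").write_bytes(b"x" * 1024)
        print(parallel_copy(Path(src), Path(dst), max_workers=8))  # 16
```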

Conclusion

Calculating RTO and RPO for mission-critical databases is a demanding systems-engineering task. It requires DBAs to go beyond default backup configurations and to quantify their storage I/O, network capacity, and database recovery mechanics.

By baselining log generation rates, understanding the distinct phases of database recovery, and automating restore tests with robust platforms such as CloudSave, IT teams can guarantee their disaster recovery SLAs with confidence. Remember: in database management, hope is not a strategy, and untested backups are a liability.