DevOps vs SRE

JayBeeDe edited this page Jan 6, 2024 · 1 revision

DevOps

A software development methodology that combines software development and operations in order to shorten the system development life cycle while delivering features, fixes and updates.

Process

DevOps uses practices that automate processes which were previously manual and slow. Teams rely on a technology stack and tools that help them operate and manage infrastructure that keeps evolving.

Two possible setups:

  • Developers act as DevOps: freedom over deployments, but no administrator support for monitoring, security, redundancy and other production-related services.
  • Administrators (operations) act as DevOps: they do not necessarily have the programming skills and/or experience required to automate.

DevOps Toolchain

A combination of tools that supports software development, delivery and management:

  • Plan
    • Define indicators, requirements, measurements, metrics, analysis and security policies

  • Create
    • Design and software configuration
    • Coding
    • Software build
    • Delivery preparation
  • Verify
    • Ensure quality:
    • Acceptance Criteria
    • Regression tests
    • Security & vulnerability analysis
    • Load & performance tests
    • Resilience Tests
    • Configuration Tests
  • Package
    • Delivery Configuration
    • Delivery packaging
  • Release
    • Scheduling, orchestrating, provisioning and deploying software:
    • Coordinating releases
    • Deploying and promoting applications
    • Rolling back
    • Scheduled releases
  • Configure
    • Operations side of DevOps
    • Application configuration
    • Storage infrastructure
  • Monitor
    • Infrastructure performance
    • User feedback

SRE

Site reliability engineers (SREs) create a bridge between development and operations by applying a software engineering mindset to system administration topics.

  1. SRE encourages infrastructure and product reliability, efficiency, scalability, accountability, and innovation.

  2. SRE encourages highly motivated, dedicated and effective teamwork.

  3. SREs also work with release engineers to ensure that the software delivery pipeline is as efficient as possible.

SREs split their time between operations/on-call duties and developing systems and software. Google puts a lot of emphasis on its SREs not spending more than 50% of their time on operations work such as issues and on-call duties. The other 50% is reserved for development work.

SREs aim to automate their way out of a job.

They build self-service tools so that colleagues can rely on their service:

  • automatic provisioning of test environments ("provisioning" originally comes from telecoms and means the act of acquiring a service; here it is done automatically)

  • log and statistics visualization

SRE is what happens when a software engineer is tasked with what is called operations. SRE aims to prevent production incidents.

Example: an SLA (service-level agreement) of 99.99% availability. This means that some time with errors or outages is tolerated.

If the error/outage budget has been exceeded, deployments are frozen until SLA quota is available again.
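The budget arithmetic and the freeze rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the 30-day window and the function names `error_budget` and `deployments_frozen` are assumptions made for the example.

```python
from datetime import timedelta


def error_budget(target: float, window: timedelta) -> timedelta:
    """Downtime tolerated over `window` for an availability target such as 0.9999."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    return window * (1.0 - target)


def deployments_frozen(downtime: timedelta, target: float, window: timedelta) -> bool:
    """True when the observed downtime has used up more than the budget."""
    return downtime > error_budget(target, window)


# Assumed 30-day window: 99.99% leaves about 4 minutes 19 seconds of budget.
window = timedelta(days=30)
budget = error_budget(0.9999, window)
print(budget.total_seconds())

print(deployments_frozen(timedelta(minutes=2), 0.9999, window))   # still within budget
print(deployments_frozen(timedelta(minutes=10), 0.9999, window))  # budget exceeded
```

Expressing the budget as a duration keeps the freeze decision to a single comparison against the downtime measured so far.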

-> SREs can be moved to the deployment team if fewer SREs are needed.

Developing automated solutions for operational aspects such as:

  • on-call monitoring,
  • performance,
  • capacity planning,
  • disaster response.

SRE is an interesting and competitive career that lets you experience the full power of DevOps.

If you are a system engineer and want to improve your programming skills, or if you are a software engineer and want to learn how to manage large-scale systems, this role is for you. The ideal SRE candidate is a highly skilled system administrator with knowledge of code and automation.

__

  • Safety (French: sécurité) = accidents (unintentional)
  • Security (French: sûreté) = malicious acts (intentional)

SRE vs DevOps

SRE is a specific implementation of DevOps with some extensions. DevOps defines 5 key pillars of success, and SRE satisfies them as follows:

  1. Reduce organizational silos
    • Shares ownership with developers to create shared responsibility
    • Uses the same tools that developers use, and vice versa
  2. Accept failure as normal
    • Takes risks
    • Quantifies failure and availability (using SLIs and SLOs)
  3. Implement gradual changes
    • Encourages developers and product owners to move quickly by reducing the cost of failure
  4. Leverage tooling and automation
    • Automates menial tasks
  5. Measure everything
    • Defines prescriptive ways to measure values
    • Believes that systems operation is a software problem
__
  • SLI = Service Level Indicator: X should be true
  • SLO = Service Level Objective: Y proportion of the time
  • SLA = Service Level Agreement: Or else
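
The three definitions above can be read as one measurement and two thresholds. The sketch below assumes a request-based SLI (good requests over total requests); the `Window` type, the function names and the sample numbers are hypothetical and only illustrate the relationship.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one measurement window (hypothetical shape)."""
    good_requests: int   # requests for which "X was true"
    total_requests: int


def sli(window: Window) -> float:
    """SLI: the measured proportion of good requests."""
    if window.total_requests == 0:
        return 1.0  # no traffic: treat the window as fully good (an assumption)
    return window.good_requests / window.total_requests


def meets_slo(window: Window, slo_target: float) -> bool:
    """SLO: the SLI should stay at or above the internal target."""
    return sli(window) >= slo_target


def breaches_sla(window: Window, sla_target: float) -> bool:
    """SLA: falling below the contractual target is the 'or else' case."""
    return sli(window) < sla_target


w = Window(good_requests=999_500, total_requests=1_000_000)
print(sli(w))
print(meets_slo(w, slo_target=0.9999))    # internal objective missed
print(breaches_sla(w, sla_target=0.999))  # contractual agreement still honored
```

In this sketch the SLO is set stricter than the SLA, so a missed objective signals a problem before the agreement itself is broken.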

https://blog.newrelic.com/engineering/best-practices-for-setting-slos-and-slis-for-modern-complex-systems/

What SREs do

Responsibilities:

  • Ensure the uptime of systems and services.
  • Develop and test infrastructure to keep up with rapid system growth.
  • Maintain infrastructure monitoring and alerting.
  • Improve and automate configuration, deployment, monitoring and incident management.
  • Solve complex problems and implement solutions to recurring problems (=> process).
  • Automate complex problems (=> script).
  • Build internal tools.
  • Configure infrastructure.
  • Work cross-functionally with service and engineering teams.

Requirements:

  • Proven experience deploying, managing and operating scalable, fault-tolerant systems.
  • Configuration and operation of distributed systems.
  • Expertise in Linux operating systems.
  • Administration, configuration and troubleshooting of Linux environments.
  • Excellent problem-solving and communication skills.
  • Rigor in code quality, automated testing and other good engineering practices.
  • Performance benchmarking and diagnostic tools.
  • Experience with CloudFormation, Kubernetes and Docker.
  • Support experience as a DevOps or system administrator for commercial SaaS solutions.
  • Experience with real-time streaming infrastructure.
  • Data ingestion, queue processing, computation.
  • Experience with Ruby/Python; scripting in Ruby, Python or Bash.
  • System automation and monitoring.
  • Experience with monitoring and alerting platforms.
  • Experience with Cassandra (or another NoSQL alternative).
  • Working knowledge of relational databases / SQL.
  • Experience with Puppet or Chef.
  • Amazon Web Services.
  • Proven experience operating JVM-based infrastructure.
  • Optional: experience with Java / Scala.