DevSecOps

Development and operations

Dev (Development Team)

Primary Goal: Deliver new features, fix bugs, and meet product deadlines (speed & change).

Responsibilities:

Writing Code: Building applications, APIs, and services.

Adding Features: Implementing business requirements and user stories.

Unit Testing: Ensuring their code works in isolation.

Environment: Primarily worked on local machines and shared development servers.

Handoff: Threw the code "over the wall" to Ops with a deployment guide (or just a binary).

Mindset: "It works on my machine." If it breaks in production, it's likely an environment problem.

Ops (Operations Team)

Primary Goal: Keep the production environment stable, secure, and available (stability & control).

Responsibilities:

Infrastructure: Managing servers, storage, networking, and data centers (or early cloud VMs).

Deployments: Manually pushing code to production (often during strict change windows, e.g., "Tuesday at 2 AM").

Configuration: Managing OS settings, web servers (Apache/Nginx), application servers, and databases.

Monitoring: Watching CPU, memory, disk space, and uptime (infrastructure health, not application health).

Security & Backups: Patching OS vulnerabilities and restoring data from backups.

Mindset: "Don't touch the running system." Change is the enemy of uptime.

What is DevOps?

DevOps is a set of practices and culture that combines Development (the people who write code) and Operations (the people who run and maintain systems). The goal is simple but powerful: deliver software faster, more reliably, and continuously.

Let's look at the core principles:

Collaboration: Developers and Operations teams stop being separated. They work as one team, sharing responsibility for the entire lifecycle.

Continuous Everything:
- Integration: code is added frequently
- Delivery: always ready to release
- Deployment: automatically released
- Monitoring: always tracking system performance

Feedback Loops: Problems are reported immediately so they can be fixed quickly.

Automation: Everything from building code, to testing it, to deploying it is automated. No more manual clicking, no more human error in deployments.

What is DevSecOps?

DevSecOps is the natural evolution of DevOps.

DevOps = Speed + Automation: The focus is on delivering features quickly and reliably.

DevSecOps = Speed + Automation + Security: We add security as a core component, not an afterthought. The goal is to deliver features quickly, reliably, and securely.

The key concept here is Shift Left Security.

CI/CD Pipeline

What is the basic problem CI/CD solves?

When people write software, they make many small changes over time. Each change can break the software. Without automation, people have to manually test and move each change to the live system. This is slow and error-prone. CI/CD automates these steps.

Step 1: Understand the pieces of software development

Before CI/CD, you need to know these terms:

Source code: The text instructions written by developers.

Repository: A folder where source code is stored, usually on a server so multiple people can share it.

Build: The act of converting source code into a runnable program. For some languages (like Python), this step is minimal. For others (like C++), it creates executable files.

Test: A small program that checks if a specific part of the software works correctly.

Deployment: The act of moving the built software onto a server where users can access it.

Production: The live environment where real users interact with the software.

Step 2: What is Continuous Integration (CI)?

Continuous Integration means: every time a developer saves a new change to the shared repository, an automated system immediately does three things:

Gets the latest code from the repository.

Builds the software from that code.

Runs all tests on the built software.

If the build fails or any test fails, the system sends an alert to the developers. The developers must fix the problem before moving forward.

Why this matters: Without CI, developers might work for weeks before combining their changes. When they finally combine them, many conflicts and failures appear at once, and it takes a long time to find the cause. With CI, problems are found within minutes of being added.

Step 3: What is Continuous Delivery (CD)?

Continuous Delivery means: after the CI steps pass successfully, the system automatically prepares the software for release to production. This includes:

Creating a deployable package (a single file or folder containing everything needed to run the software).

Deploying that package to a staging environment (a copy of production used only for final testing).

Running additional tests on the staging environment (e.g., tests that check how the software behaves with many users, or tests that check security).

Storing the package in an artifact repository (a storage system for deployable packages).

At this point, the package is ready. A human must manually click a button to send it to production. That manual step is the difference between Continuous Delivery and Continuous Deployment.

Step 4: What is Continuous Deployment (also CD)?

Continuous Deployment removes the manual button. If all CI and staging tests pass, the system automatically deploys the package to production immediately.

This requires more trust in the tests and better monitoring, because there is no human to stop a bad change before it reaches users.
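
The difference between Continuous Integration, Continuous Delivery, and Continuous Deployment can be sketched as one small gate function. This is only an illustrative model: the function name, its flags, and the returned strings are assumptions made for this example and do not belong to any real CI system.

```python
def pipeline_outcome(build_ok: bool, tests_ok: bool, staging_ok: bool,
                     auto_deploy: bool, human_approved: bool = False) -> str:
    """Return what happens to a change in a simplified CI/CD pipeline."""
    # Continuous Integration: a broken build or failing test stops everything.
    if not (build_ok and tests_ok):
        return "rejected: fix the build or tests"
    # Continuous Delivery: the package must also pass the staging checks.
    if not staging_ok:
        return "rejected: staging checks failed"
    # Continuous Deployment: no human gate, release immediately.
    if auto_deploy:
        return "deployed automatically"
    # Continuous Delivery: the package waits for the manual button.
    if human_approved:
        return "deployed after approval"
    return "ready, waiting for approval"
```

With auto_deploy=False the function models Continuous Delivery; switching it to True removes the human step and models Continuous Deployment.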

Test Types

1. Unit Tests

What they check: Individual small pieces of code, such as a single function or a single method in a class. Unit tests verify that each piece works correctly in isolation, separate from the rest of the system.

When they run: Very early in the CI pipeline, usually within seconds of the build completing.

How they work: The test calls a function with specific inputs and checks that the output matches the expected value.

Example: A function that adds two numbers. A unit test calls it with inputs 2 and 3 and checks that the result is 5.

Who writes them: Developers, at the same time they write the code.

Failure meaning: A specific small part of the code is logically wrong.
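
The addition example above could look like this with Python's built-in unittest module. The function name add and the test class name are chosen for this sketch.

```python
import unittest


def add(a, b):
    """Return the sum of two numbers."""
    return a + b


class AddTest(unittest.TestCase):
    def test_adds_two_numbers(self):
        # Known inputs, known expected output.
        self.assertEqual(add(2, 3), 5)
```

Saved in a file, the test can be run with python -m unittest, which is also what a CI job would typically call.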

2. Integration Tests

What they check: Whether multiple pieces of code work correctly when combined. For example, does the database access code correctly talk to the actual database? Does the payment module correctly call the shipping module?

When they run: After unit tests pass. They run in the CI pipeline but require external systems (like a database or a message queue) to be available.

How they work: The test sets up real dependencies (or lightweight versions of them), runs a sequence of operations across multiple components, and verifies the final state.

Example: A test that creates a user account in the database using the user registration function, then logs in with that account, and checks that the login succeeds.

Failure meaning: Components are not communicating correctly. The interfaces between them have mismatches.
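
A minimal sketch of the register-then-login test, assuming an in-memory SQLite database as the lightweight stand-in for the real one. The table layout and the register and login functions are hypothetical, and plain SHA-256 is used only to keep the example short.

```python
import hashlib
import sqlite3


def _hash(password: str) -> str:
    # Plain SHA-256 keeps the sketch short; real systems need a salted, slow hash.
    return hashlib.sha256(password.encode()).hexdigest()


def register(conn: sqlite3.Connection, email: str, password: str) -> None:
    conn.execute("INSERT INTO users (email, pw_hash) VALUES (?, ?)",
                 (email, _hash(password)))
    conn.commit()


def login(conn: sqlite3.Connection, email: str, password: str) -> bool:
    row = conn.execute("SELECT pw_hash FROM users WHERE email = ?",
                       (email,)).fetchone()
    return row is not None and row[0] == _hash(password)


def test_register_then_login() -> None:
    # The test exercises two components together: app code and the database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, pw_hash TEXT)")
    register(conn, "alice@example.com", "s3cret")
    assert login(conn, "alice@example.com", "s3cret")
    assert not login(conn, "alice@example.com", "wrong")
    conn.close()
```

Unlike a unit test, this one fails if the SQL statements and the table schema disagree, which is exactly the kind of interface mismatch integration tests are meant to catch.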

3. Static Analysis Tests

What they check: The source code itself, without running the program. These tests look for style violations, potential bugs, security issues, or code that does not follow team rules.

When they run: Before or during the build step, because they do not require the program to run.

How they work: A tool reads the source code files and checks them against a set of rules.

Examples of rules:

No unused variables

No hardcoded passwords in code

Consistent indentation and spacing

Functions are not too long

Failure meaning: The code is poorly formatted or contains a potential risk, even if it would run correctly.
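
As an illustration of one such rule, here is a small sketch of a "functions are not too long" check built on Python's standard ast module (Python 3.8 or newer). The function name long_functions and the default limit of 20 lines are assumptions for this example, not the behavior of a particular linter.

```python
import ast


def long_functions(source: str, max_lines: int = 20) -> list:
    """Return the names of functions longer than max_lines, without running the code."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.end_lineno - node.lineno + 1 > max_lines
    ]


SAMPLE = "def f():\n    a = 1\n    return a\n\ndef g():\n    return 2\n"
print(long_functions(SAMPLE, max_lines=2))
```

The source is only parsed, never executed, which is why this kind of check can run before the build.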

4. Security Tests (SAST and DAST)

These come in two forms:

SAST (Static Application Security Testing):

Runs on source code without executing the program.

Looks for known insecure patterns, such as SQL injection vulnerabilities or unsafe handling of user input.

DAST (Dynamic Application Security Testing):

Runs on a running application (usually in staging).

Sends malicious or malformed inputs to the application and checks if it responds insecurely.

When they run: SAST runs early in CI (similar to static analysis). DAST runs later, after the application is deployed to a staging environment.

Failure meaning: The software contains a security vulnerability that could be exploited by an attacker.

Where Security Risks Appear - Code

Now let's look at the specific security risks that can appear at each stage of this pipeline. First, at the Code stage.

The biggest risks here are things developers do accidentally or without thinking about security:

Hardcoded credentials: This is a classic, and it's still incredibly common.
A developer needs to connect to a database, so they write the username and password directly into the code.
Now that secret is in the codebase, accessible to anyone who can see the code.
If the code is in a public repository, it's instantly compromised.

Insecure coding practices: This includes things like not validating user input, which can lead to SQL injection or cross-site scripting attacks.

Dangerous functions: Some programming functions are inherently dangerous because they can execute arbitrary code. The example here is eval() in Python.

So what's the right way to handle secrets?

You never hardcode them. Instead, you load them from a secret manager or environment variables.
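
A minimal sketch of loading a secret from an environment variable in Python. The variable name DB_PASSWORD and the function name are assumptions for this example; a secret manager would replace the os.environ lookup.

```python
import os


def get_db_password() -> str:
    """Read the database password from the environment, never from source code."""
    password = os.environ.get("DB_PASSWORD")  # variable name is an assumption
    if not password:
        # Fail loudly instead of falling back to a hardcoded default.
        raise RuntimeError("DB_PASSWORD is not set")
    return password
```

The secret now lives in the deployment environment, so it never appears in the repository or its history.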

Where Security Risks Appear - Build

Next, risks appear at the Build stage. This is when we pull in dependencies, the external libraries that our code relies on.

The risks here are:

Malicious or compromised dependencies: An attacker could compromise a popular open-source library and inject malicious code into it. When your build pulls in that library, you're unknowingly including the malicious code.

Package hijacking / typosquatting: This is a sneaky attack. An attacker publishes a package with a name very similar to a popular one. For example, if you meant to install requests, but you accidentally type requestz (with a 'z'), you might install the attacker's malicious package. This is called typosquatting.
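
One way to catch a typosquatted name is to compare every requested package against a list of approved names and flag near misses. The sketch below uses difflib from the standard library (Python 3.9 or newer for the type hints); the APPROVED set, the 0.8 similarity cutoff, and the function name are assumptions for this example.

```python
import difflib

# Hypothetical allowlist of packages the team has approved.
APPROVED = {"requests", "flask", "pyyaml", "numpy"}


def suspicious_names(requirements: list[str]) -> dict[str, str]:
    """Map each unapproved package name to the approved name it resembles."""
    findings = {}
    for line in requirements:
        name = line.split("==")[0].strip().lower()
        if not name or name in APPROVED:
            continue
        close = difflib.get_close_matches(name, sorted(APPROVED), n=1, cutoff=0.8)
        if close:
            findings[name] = close[0]
    return findings


print(suspicious_names(["requestz==2.0", "flask==2.3", "leftpad==1.0"]))
```

A name that is unapproved but not similar to anything (leftpad here) is not reported by this check; it would need a separate allowlist rule.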

The code example shows outdated library versions. requests==2.19.0 and pyaml==5.1 are old versions with known vulnerabilities (CVEs). If you don't regularly update your dependencies, you're building known vulnerabilities into your application.

Where Security Risks Appear - Test

Risk 1: No security test coverage

You run unit tests (test business logic) and integration tests (test components).

But you never run security tests – SAST, DAST, dependency scans.

Result: Vulnerabilities sail through because no one looks for them.

Example: A developer adds a new API endpoint that accidentally exposes user emails. Unit tests pass because they only check HTTP 200 status. A SAST tool would have flagged the missing access control.

Risk 2: Bypassing quality/security gates

In many CI systems (GitLab, GitHub Actions, Jenkins), you can add [skip ci] to a commit message.

The pipeline sees that and skips all tests.

An attacker who compromises a developer's account could push code with [skip ci] and the tests would never run.

Example: A malicious insider adds a backdoor and commits with git commit -m "Hotfix [skip ci]". The pipeline does not run SAST. The backdoor reaches production.

Security-relevant observations for the slide’s example:

No security testing steps are present, no SAST, no DAST, no dependency scanning, no container scanning, no secret detection.

The npm test --if-present step may execute nothing if the project lacks a test script definition.

The workflow makes no attempt to prevent or detect [skip ci] usage. Any commit containing [skip ci] in its message will skip this entire workflow execution.

Countermeasure: Make security scans mandatory (not skippable) and enforce that tests must pass before merge.
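
As a rough sketch of the detection side, a check like the following could run somewhere that cannot itself be skipped (for example a server-side hook or a required status check, both assumptions here). The marker list follows the skip keywords documented for GitHub Actions; matching them case-insensitively is a deliberately conservative choice of this example.

```python
import re

# Skip keywords documented for GitHub Actions push and pull_request workflows.
SKIP_MARKERS = re.compile(
    r"\[(skip ci|ci skip|no ci|skip actions|actions skip)\]", re.IGNORECASE
)


def bypasses_ci(commit_message: str) -> bool:
    """Return True if the commit message asks the CI system to skip the run."""
    return SKIP_MARKERS.search(commit_message) is not None
```

A commit such as "Hotfix [skip ci]" would then be flagged for review instead of silently reaching production.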

Where Security Risks Appear - Deploy

The Deploy stage is where configuration mistakes happen.

This is particularly common with containerized environments like Kubernetes and Docker.

Look at this security context.

privileged: true means the container has full, unrestricted access to the host machine.

runAsUser: 0 means it's running as root. If an attacker compromises this container, they own the entire host machine.

Also note the hostPath volume. This mounts a real folder from the host machine into the container.

/etc = very sensitive directory
- Contains configs like:
  - users
  - passwords (hashed)
  - system settings

So the container can:

Read host configs

Modify them
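
The three findings above (privileged mode, root user, hostPath mount) can be checked automatically before deployment. The sketch below works on a simplified pod spec that has already been loaded into a Python dictionary; the function name and the example spec are hypothetical, and real policy tools check far more fields.

```python
def risky_settings(pod_spec: dict) -> list:
    """List risky settings in a simplified Kubernetes pod spec dictionary."""
    findings = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext", {})
        name = container.get("name", "(unnamed)")
        if ctx.get("privileged") is True:
            findings.append(f"{name}: privileged container")
        if ctx.get("runAsUser") == 0:
            findings.append(f"{name}: runs as root (runAsUser: 0)")
    for volume in pod_spec.get("volumes", []):
        host_path = volume.get("hostPath", {}).get("path")
        if host_path:
            findings.append(f"volume {volume.get('name', '(unnamed)')}: hostPath {host_path}")
    return findings


# Hypothetical spec mirroring the settings discussed above.
example_spec = {
    "containers": [
        {"name": "app", "securityContext": {"privileged": True, "runAsUser": 0}}
    ],
    "volumes": [{"name": "host-etc", "hostPath": {"path": "/etc"}}],
}
print(risky_settings(example_spec))
```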

Another deployment risk is exposed cloud storage.

This is a Terraform configuration for AWS S3.

acl = "public-read" makes this bucket readable by anyone on the internet. This is a classic misconfiguration. Someone intended to store logs, but accidentally made them public. Sensitive data gets exposed.

You see headlines about this all the time: 'Company X exposed millions of customer records in an unsecured S3 bucket.' This is how it happens.

Where Security Risks Appear - Monitor

Finally, at the Monitor stage, the risk is that we're logging sensitive information.

logger.info("Login for user", {
 email: req.body.email,
 password: req.body.password, // sensitive!
 token: user.sessionToken // secret!
});

Look at this logging line. It's logging the user's email, password, and session token. The password is sensitive. The session token is a secret that can be used to impersonate the user. Logging this is a massive security violation. Anyone with access to the logs—which might include many developers and operators—now has these secrets.

Good practice: Never log passwords, tokens, or personally identifiable information (PII) like credit card numbers or social security numbers. If you need to log something for debugging, redact it or use a unique identifier instead.
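
Redaction can be sketched as a small helper that masks sensitive fields before they reach the logger. This example is in Python rather than the JavaScript of the snippet above; the key list and the function name are assumptions.

```python
SENSITIVE_KEYS = {"password", "token", "session_token", "authorization"}  # assumed list


def redact(fields: dict) -> dict:
    """Return a copy of the log fields with sensitive values masked."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in fields.items()
    }


print(redact({"user_id": 42, "password": "hunter2", "Token": "abc123"}))
```

Logging a user_id instead of the email follows the advice above to prefer a unique identifier over PII.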

Why SAST is Important

Look at this flow in the slide. A developer writes code and pushes it to the code hosting platform (like GitHub). That push triggers the CI/CD pipeline. As part of that pipeline, a SAST tool runs automatically. Within minutes, the developer gets a report of any vulnerabilities found.

This is powerful for two reasons:

Real-time feedback: Developers get feedback immediately, when the code is still fresh in their minds. They can fix the issue right away.

Prevents security from being an afterthought: When security testing is automated and runs on every commit, security becomes part of the normal development process, not a separate gate at the end.

Basic Techniques to Automate SAST

One of the simplest ways to start automating security checks is using regular expressions, or regex.

This is a simple grep command. grep -ir 'eval(' searches recursively through all files for the string 'eval(', ignoring case.

The example finds the eval() call in the Python file. This is a quick way to find a dangerous pattern.

But this is a very basic approach. It's like using a flashlight to search a dark room. It works, but it's limited.

Can Regex Catch It All?

Let's see the limitations of regex. Here are different uses of eval() in Python.

[Walk through each example]

eval("ls") → This is a real call to eval(), but its argument is a hard-coded string literal rather than user input. Regex flags it exactly like a dangerous call because it cannot tell the two apart.

eval(var) → This is a real call. Regex can catch this.

eval (var), with a space before the parenthesis → This is still a real call, but the literal pattern 'eval(' does not match it, so regex misses it.

safe_eval(var) → This is a different function name. It's safe, but a regex that matches 'eval(' would incorrectly flag it.

A comment with eval(var) → This is in a comment, not actual code. Regex would flag it incorrectly.

A string print("eval(var)") → This is a string being printed, not code execution. Regex would flag it incorrectly.
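
The hits and misses can be reproduced with a few lines of Python. The sample lines are a reading of the cases listed above, and the labels saying which ones are real eval() calls are supplied by hand for this sketch.

```python
import re

PATTERN = re.compile(r"eval\(")  # same literal search as grep 'eval('

# (source line, is it really a call to the built-in eval?)
SAMPLES = [
    ("eval(var)", True),
    ("eval (var)", True),
    ("safe_eval(var)", False),
    ("# eval(var)", False),
    ('print("eval(var)")', False),
]


def classify(samples):
    """Split the regex results into false positives and false negatives."""
    false_positives, false_negatives = [], []
    for line, is_real_call in samples:
        flagged = PATTERN.search(line) is not None
        if flagged and not is_real_call:
            false_positives.append(line)
        elif not flagged and is_real_call:
            false_negatives.append(line)
    return false_positives, false_negatives


false_positives, false_negatives = classify(SAMPLES)
print("false positives:", false_positives)
print("false negatives:", false_negatives)
```

Three of the five lines are flagged wrongly and one real call is missed, so only the plain eval(var) case is handled correctly.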

So, if regex isn't enough, what's the solution? What do SAST tools use to understand code at a deeper level? The answer is the Abstract Syntax Tree, or AST. Let's explore that.

Abstract Syntax Tree (AST) - Introduction

An Abstract Syntax Tree is a data structure that represents the structure of a program. Think of it as a map of your code that shows how all the pieces relate to each other.

Instead of seeing code as a flat sequence of characters, the AST represents it as a tree. Each node in the tree is a piece of the program: a function definition, a variable, a function call, an operator.

The advantages of working with ASTs are:

Contextual Understanding: The AST knows what each part of the code is. It knows if something is a function call, a variable assignment, or just a string. This eliminates the false positives that plague regex.

Understanding Control Flow and Data Flow: The AST can track how data moves through the program. Where does a variable come from? Where does it go? This is essential for detecting complex vulnerabilities.

Comprehensive Code Analysis: With an AST, you can analyze the entire codebase structurally, not just search for patterns.

Abstract Syntax Tree (AST) - Example

Let's look at a concrete example. This is a simple Python function that uses eval().

The AST for this code is a tree structure. At the top is the function definition node. Inside it, there are nodes for the assignment to data, the call to eval(), and the return statement.

With an AST, a SAST tool can see that eval() is being called with data['expressions'] as its argument.

It can then ask: where does data['expressions'] come from? It comes from user input via request.get_json(). Now the tool knows that user-controlled data is flowing into a dangerous eval() function. That's a real vulnerability, not a false positive.

AST - Rule Example

Here's a simple rule in a SAST tool like Semgrep. The rule is: find any call to eval().

The pattern: eval(...) tells the tool to look for function calls where the function name is eval. The tool uses the AST to identify this structure precisely.

This is still a simple rule, but because it's based on the AST, it won't match strings, comments, or different function names. It's more accurate than regex.
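
The same idea can be tried with Python's standard ast module. This is not how Semgrep is implemented; it is only a small sketch showing that a structural match on call nodes ignores strings, comments, and other function names. The function name find_eval_calls is chosen for this example.

```python
import ast


def find_eval_calls(source: str) -> list:
    """Return the line numbers where the name eval is called."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


SOURCE = '''\
result = eval(var)
result = eval (var)
result = safe_eval(var)
# eval(var)
print("eval(var)")
'''
print(find_eval_calls(SOURCE))
```

Only lines 1 and 2 are reported: the call with the extra space is found, and the wrapper function, the comment, and the string are ignored.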

AST - Challenges

Building a SAST tool that uses AST analysis is not trivial. There are significant challenges:

Complexity of AST Structures: Each programming language has its own AST structure. Python's AST is different from JavaScript's, which is different from Java's.

Language Variability: New languages emerge. Existing languages evolve. The AST for Python 3.8 is different from Python 3.12.

Handling Language Evolution: When a language adds a new feature, the AST changes. Your parser needs to keep up.

Scalability Concerns: Analyzing a million-line codebase requires efficient algorithms. AST traversal can be expensive.

Pattern Matching: Writing rules that find vulnerabilities without generating too many false positives is an art. You need to balance detection and noise.

This is why most organizations don't build their own SAST tools. They use existing solutions.

SAST Tools

The key takeaway here is that building a SAST tool from scratch is complex and resource-intensive. It's not something most organizations should attempt.

Instead, we leverage existing solutions. There are many SAST tools available, both open-source and commercial. They differ in:

Language support: Some tools support 10+ languages, others are specialized.

False positive rates: This is a critical metric. A tool with high false positives will be ignored by developers.

Analysis techniques: Some do simple pattern matching. Others do sophisticated taint analysis across multiple files.

Simple Analysis vs Taint Analysis

Simple Analysis: Examines code based on syntax and structure. It identifies common coding flaws using predefined rules. A simple rule might be: 'if you see eval(), flag it.' This is fast but can produce false positives.

Taint Analysis: This is more sophisticated. It tracks the flow of untrusted or tainted data through a program.
The analysis asks: where does this data come from? Is it from a trusted source (like a configuration file) or an untrusted source (like user input or an environment variable)? And where does it go? If untrusted data reaches a dangerous function like eval(), that's a real vulnerability.

Taint analysis is what makes SAST tools truly powerful. It finds issues that simple pattern matching misses.

Simple Analysis vs Taint Analysis - Example

Code example:

import os

def get_hostname():
    expression_from_env = os.environ.get('EXPRESSION')
    return eval(expression_from_env)

Simple analysis:

Sees eval(expression_from_env) and flags it.

It does not care where expression_from_env came from.

Result: Finds the issue. Good enough for many cases.

Taint analysis:

Marks os.environ.get() as a source of untrusted data (environment variables can be set by attackers).

Marks eval() as a sink (dangerous function).

Traces data flow: os.environ.get() → expression_from_env → eval().

No sanitization function (like escape()) in between.

Result: Flags as critical because tainted data reaches a dangerous sink.

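The Bandit output on the slide shows what simple analysis reports for this code:

Issue: [B307:blacklist] Use of possibly insecure function - eval
Severity: Medium
Confidence: High
CWE: CWE-78 (OS Command Injection)
Location: test.py:5

Bandit (a Python SAST tool) inspects the AST and flags every call to eval() through its B307 blacklist check. It does not trace where the argument came from, so it reports the same finding whether the value comes from os.environ.get() or from a constant. Tools with taint analysis are the ones that model os.environ.get() as a source.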

Why taint analysis is more powerful:

Simple analysis only looks at the call itself. It cannot tell eval() on a constant from eval() on attacker-controlled input, and it can miss a sink that is reached indirectly, for example through a wrapper function.

Taint analysis follows the data across assignments, function calls, and even across files (in advanced tools).

Why taint analysis is harder:

You must define sources, sinks, and sanitizers for every language.

You must handle aliasing (y = user_input; x = y; eval(x)).

You must handle control flow (if user_input: eval(user_input) else: safe).

Many SAST tools (including open-source Semgrep) have limited taint analysis in their free version.
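
To make sources, sinks, and propagation concrete, here is a deliberately tiny taint tracker built on Python's ast module (Python 3.9 or newer). It is a teaching sketch under strong assumptions: the SOURCES and SINKS sets are chosen for this example, code is treated as straight-line (no branches, loops, or calls into other functions), and there are no sanitizers. Real tools are far more thorough.

```python
import ast

SOURCES = {"os.environ.get", "input"}  # assumed untrusted sources
SINKS = {"eval", "exec"}               # assumed dangerous sinks


class TaintVisitor(ast.NodeVisitor):
    """Track tainted variable names through straight-line code."""

    def __init__(self) -> None:
        self.tainted = set()
        self.findings = []

    def is_tainted(self, expr: ast.AST) -> bool:
        # Tainted if the expression contains a source call or a tainted name.
        for node in ast.walk(expr):
            if isinstance(node, ast.Call) and ast.unparse(node.func) in SOURCES:
                return True
            if isinstance(node, ast.Name) and node.id in self.tainted:
                return True
        return False

    def visit_Assign(self, node: ast.Assign) -> None:
        self.generic_visit(node)  # inspect calls on the right-hand side first
        value_tainted = self.is_tainted(node.value)
        for target in node.targets:
            if isinstance(target, ast.Name):
                # Taint follows the value; a clean value clears old taint.
                if value_tainted:
                    self.tainted.add(target.id)
                else:
                    self.tainted.discard(target.id)

    def visit_Call(self, node: ast.Call) -> None:
        if ast.unparse(node.func) in SINKS and any(
            self.is_tainted(arg) for arg in node.args
        ):
            self.findings.append(node.lineno)
        self.generic_visit(node)


def find_tainted_sinks(source: str) -> list:
    """Return line numbers where tainted data reaches a sink."""
    visitor = TaintVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


EXAMPLE = '''\
import os

def get_hostname():
    expression_from_env = os.environ.get('EXPRESSION')
    return eval(expression_from_env)
'''
print(find_tainted_sinks(EXAMPLE))
```

The tracker reports line 5 for the example, follows an alias chain such as y = input(); x = y; eval(x), and stays silent for eval('1 + 1') because no tainted data reaches the sink.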

Taint Analysis

(Silent slide - allows students to absorb the previous one.)

SAST Tools Comparison

License Type: Open Source vs Commercial: Open-source tools like Semgrep are free and transparent. Commercial tools like Checkmarx or Fortify offer support and enterprise features. The choice depends on your budget and needs.

Language Coverage: Evaluate based on your application stack. If you use Python, Go, and JavaScript, you need a tool that supports all three. If you use a niche language, you need specific support.

Flexibility: Can the tool allow custom rules? In many cases, you'll want to write rules for organization-specific patterns or banned APIs. Semgrep excels here.

Ease of Use: How easy is it to run the tool? Does it integrate with your IDE? Can developers run it locally? If it's hard to use, developers won't use it.

Integration: Does the tool work with your CI/CD systems? GitHub Actions, GitLab CI, Jenkins? This is critical for automation.

Analysis Method: What kind of analysis does it do? Simple pattern matching? Taint analysis across a single file? Taint analysis across multiple files? Cross-file analysis is the most powerful but also the most resource-intensive.

What is Semgrep?

Semgrep is a fast, open-source static analysis tool.

Originally called sgrep (sgrep = ‘Semantic grep’), written at Facebook in 2009 to enforce nearly 1000 rules.

Supports 30+ languages – Python, JavaScript, Java, Go, Ruby, Rust, etc.

Can run in your IDE (VS Code extension), as a pre-commit hook, or in CI/CD workflows.

Easy to install: pip install semgrep or brew install semgrep.

Actively maintained by Semgrep, Inc. (formerly r2c, short for Return To Corporation) and the community.

Semgrep’s killer feature: custom rules are very easy to write using a YAML syntax that looks like the code you’re searching for.

Semgrep Registry & Rules

The Semgrep Registry is a public collection of ready-made security rules for the static analysis tool Semgrep.

You can browse the registry, find rules relevant to your stack, and start using them immediately. For example, there are rules for finding hardcoded secrets, SQL injection vulnerabilities, dangerous function usage, and many other issues.

It’s basically a library of detection rules that help you find:

Security vulnerabilities (like XSS, SQL injection)

Bad coding practices

Misconfigurations

Instead of writing rules from scratch, you can:

Browse the registry

Pick rules relevant to your language (Python, JavaScript, etc.)

Run them directly on your code

And of course, you can also write your own custom rules for organization-specific needs.

Example (exploring the website)

1. Go to the Semgrep Registry (website)

Search for:

“Python eval”

or browse Python → Security rules

You’ll find rules like:

python.lang.security.audit.eval-detected

2. Open the eval rule

When you click it, you’ll see:

Rule details

What vulnerability it detects (unsafe eval)

Why it’s dangerous

Examples of bad code

Pattern (important part)

You’ll see something like:

pattern: eval(...)

Sometimes more advanced patterns too (taint tracking, etc.)

3. Try it directly in the browser

The website usually has a “Try this rule” or playground editor.

You can paste code like:

user_input = input()
result = eval(user_input)

It will highlight the vulnerable line immediately.

4. Copy the CLI command (Run Locally)

Each rule page gives you a ready command:

semgrep --config r/python.lang.security.audit.eval-detected

So you don’t need to memorize anything.

Example (using a registry rule on your code)

1. Vulnerable Python example

user_input = input("Enter something: ")
result = eval(user_input)
print(result)

If a user enters:

__import__('os').system('rm -rf /')

You have just given them arbitrary code execution on your machine.

2. Run Semgrep using a registry rule

With Semgrep, you don’t need to write anything.

Run this command:

semgrep --config r/python.lang.security.audit.eval-detected .

r/... → pulls a single rule from the registry (p/... pulls a whole rule pack)

eval-detected → targets unsafe eval usage

. → scans your current project

3. Example output

You’ll see something like:

eval-use:
  Dangerous use of eval detected
  --> app.py:2
   result = eval(user_input)

4. What the rule is actually doing

Behind the scenes, the rule looks for patterns like:

eval(...)

But smarter than simple grep—it understands structure (AST), not just text.

5. Pro tip (real-world usage)

You can scan for multiple Python issues at once:

semgrep --config=p/python .

This includes:

eval misuse

subprocess issues

hardcoded secrets

etc.

semgrep scan --config auto

Runs Semgrep with automatic configuration selection.

Semgrep decides which rules to run based on your codebase (language, frameworks, etc.).

Good for general-purpose scanning without specific rule knowledge.

semgrep --config=p/python

Runs Semgrep using the "p/python" rule pack (community rules for Python).

Targets Python-specific security and correctness issues.

More focused and deterministic than auto.

GitHub Workflow

Running Semgrep locally is great, but in DevSecOps, we want automation. We want security tests to run on every code change. That's where CI/CD systems like GitHub Actions come in.

GitHub Actions is a continuous integration and continuous delivery platform built into GitHub. It allows you to automate your build, test, and deployment pipeline.

In GitHub Actions, workflows are defined by YAML files stored in your repository.

Specifically, they go in the .github/workflows directory. A repository can have multiple workflows for different purposes.

Each workflow has basic components:

Events: What triggers the workflow? A push? A pull request? A schedule?

Jobs: A workflow consists of one or more jobs. Jobs run in parallel by default.

Steps: Each job consists of steps. Steps can be either a script (like running a shell command) or an action (a reusable unit of code from the marketplace).

GitHub Workflow - Events

Events are the triggers that cause a workflow to run. Here are some common events:

Push: When code is pushed to a branch.

Pull_Request: When a pull request is opened, reopened, edited, or closed.

Issues: When an issue is opened, deleted, transferred.

Fork: When someone forks the repository.

You can also trigger workflows on a schedule (like daily scans) or manually. The complete list of events is available in the GitHub documentation.

GitHub Workflow – Runner

A runner is a server that executes your workflow jobs.

GitHub provides hosted runners:

Ubuntu Linux (most common)

Windows

macOS

You can also host your own self-hosted runners if you need special hardware or security isolation.

Each runner can run one job at a time. For a small team, GitHub’s free runners are sufficient.

GitHub Workflow - Actions

An action is a custom application for the GitHub Actions platform that performs a complex but frequently repeated task.

For example, there's an action to set up Python, an action to check out your code, and importantly an action to run Semgrep. Instead of writing complex scripts to install and run Semgrep, you can use a pre-built action from the GitHub Marketplace.

The marketplace has thousands of actions. You can find actions for security scanning, code quality, deployment, and much more.

Full Example: GitHub Workflow (.github/workflows/semgrep.yml)

Inside your project folder, create this path:

your-project/
├── .github/
│   └── workflows/
│       └── semgrep.yml   ← HERE
├── app.py
├── requirements.txt

The key rule:

Folder must be named .github/workflows/

File must be a .yml or .yaml

name: Semgrep Security Scan

# -----------------------------
# EVENTS (Triggers)
# -----------------------------
on:
  push:
    branches: ["main", "dev"]
  pull_request:
    branches: ["main"]
  schedule:
    - cron: "0 2 * * *"  # Runs daily at 2 AM UTC
  workflow_dispatch:  # Manual trigger

# -----------------------------
# JOBS
# -----------------------------
jobs:
  semgrep-scan:

# -----------------------------
# RUNNER
# -----------------------------
    runs-on: ubuntu-latest

# -----------------------------
# STEPS
# -----------------------------
    steps:

# Step 1: Checkout repository code
      - name: Checkout code
        uses: actions/checkout@v4

# Step 2: Set up Python (required for Semgrep)
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

# Step 3: Install Semgrep
      - name: Install Semgrep
        run: pip install semgrep

# Step 4: Run Semgrep scan (findings are written to semgrep-report.json for Step 5)
      - name: Run Semgrep scan
        run: semgrep scan --config=auto --json --output semgrep-report.json

# Step 5: Upload results (optional)
      - name: Upload scan results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: semgrep-results
          path: semgrep-report.json

Breakdown (mapped to concepts)

1. Events (Triggers)

on:
  push:
  pull_request:
  schedule:
  workflow_dispatch:

This means the workflow runs when:

You push code

You open/update a PR

Every day automatically (cron job)

You manually click "Run workflow"

2. Runner

runs-on: ubuntu-latest

This is the machine that executes your job

GitHub spins up a fresh Linux VM for you

Other options:

windows-latest

macos-latest

3. Jobs

jobs:
  semgrep-scan:

This defines a job called semgrep-scan

You can add multiple jobs (e.g., build, test, deploy)

Jobs run in parallel by default

4. Steps

Each job is made of steps:

Action example

uses: actions/checkout@v4

This is a prebuilt action

It pulls your repo code into the runner

Script example

run: pip install semgrep

This runs a shell command directly

5. Actions (Marketplace)

Examples used:

actions/checkout → clone repo

actions/setup-python → install Python

actions/upload-artifact → save results

Instead of writing everything manually, you reuse these.

As soon as the file is pushed:

GitHub automatically detects it

Then:

It appears in the Actions tab

It runs based on your triggers (push, PR, etc.)

Real-world mental model

Your repo becomes like this:

Code → Push → GitHub Actions → Security Scan → Result

Demo to try

Step 1: Make sure your structure is correct

Inside your project folder, you should have:

semgrep-demo/
├── demo.py
├── eval_rule.yaml
└── .github/
    └── workflows/
        └── semgrep.yml

GitHub only picks up workflow files under .github/workflows/, so if this structure is wrong, the workflow will not run.
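
A quick way to double-check the layout before pushing is to look for workflow files the same way GitHub does: under .github/workflows/, with a .yml or .yaml extension. The sketch below is a hypothetical helper (find_workflows is not part of any tool) and builds a throwaway copy of the demo layout in a temporary directory just to show the check.

```python
import tempfile
from pathlib import Path

WORKFLOW_SUFFIXES = {".yml", ".yaml"}


def find_workflows(repo_root: Path) -> list[Path]:
    """Return the workflow files located under repo_root/.github/workflows."""
    workflows_dir = repo_root / ".github" / "workflows"
    if not workflows_dir.is_dir():
        return []
    return sorted(
        p for p in workflows_dir.iterdir()
        if p.is_file() and p.suffix in WORKFLOW_SUFFIXES
    )


# Build a temporary copy of the demo layout and run the check on it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "semgrep-demo"
    (root / ".github" / "workflows").mkdir(parents=True)
    (root / ".github" / "workflows" / "semgrep.yml").write_text(
        "name: Semgrep Security Scan\n"
    )
    (root / "demo.py").write_text("print('demo')\n")

    found = [p.name for p in find_workflows(root)]
    # A directory without .github/workflows yields no workflows.
    misplaced = find_workflows(root / "workflows")

print(found, misplaced)
```

On a real project, call find_workflows(Path(".")) from the repo root; an empty list means GitHub will not see any workflow.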

Step 2: Initialize Git (if you haven't already)

In the VS Code terminal:

git init
git add .
git commit -m "Initial commit with Semgrep demo"

Step 3: Create a GitHub repo and connect it

Create a GitHub account if you don't have one, then create a new repo and give it a name (e.g. semgrep-demo).

Once the repo exists on GitHub, link your local project to it:

git remote add origin https://github.com/<your-username>/semgrep-demo.git
git branch -M main
git push -u origin main

Step 4: Trigger the workflow

As soon as you push:

The workflow runs automatically

Go to:

Your repo on GitHub

Click the Actions tab

You should see:

Semgrep Security Scan

Step 5: Check results

Click the workflow run → open the job → look at the logs

You should see a finding for eval(), with a message something like this (the exact wording depends on the rule that matched):

Avoid using eval() - it's unsafe!
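
A finding like this is usually fixed by removing the eval() call. One common replacement, assuming the input is only ever meant to be a plain Python literal (a number, string, list, dict, and so on), is ast.literal_eval from the standard library, which parses literals without executing code. parse_literal below is a hypothetical helper name, not something taken from demo.py.

```python
import ast


def parse_literal(text: str):
    """Parse text as a Python literal; never executes code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError) as exc:
        # Function calls, attribute access, imports, etc. end up here.
        raise ValueError(f"not a plain Python literal: {text!r}") from exc


print(parse_literal("[1, 2, 3]"))

try:
    parse_literal("__import__('os').system('echo hi')")
except ValueError as err:
    print("rejected:", err)
```

If the program genuinely needs to evaluate expressions, literal_eval is not enough, and a dedicated parser for the allowed expressions is the safer route.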

Step 6: Push later changes

To change your code locally and then upload the changes to GitHub:

git add .
git commit -m "Your message"
git push origin main

git add stages your changes, preparing them for a snapshot.

git commit takes the snapshot of the staged changes and saves it to your local project history.

git push uploads your local commits to a remote repository (like GitHub or GitLab) to share them with others.

After each push, the workflow runs again and scans your code automatically.

DevSecOps

Development and operations

What is DevOps?

Expanded automation

What is DevSecOps?

CI/CD Pipeline

Workflow example

Test Types

Where Security Risks Appear - Code

Where Security Risks Appear - Build

Where Security Risks Appear - Test

Where Security Risks Appear - Deploy

Where Security Risks Appear - Monitor

Why SAST is Important

Basic Techniques to Automate SAST

Can Regex Catch It All?

Abstract Syntax Tree (AST) - Introduction

Abstract Syntax Tree (AST) - Example

AST - Rule Example

Slide 30: AST - Challenges

Slide 31: SAST Tools

Simple Analysis vs Taint Analysis

Simple Analysis vs Taint Analysis - Example

Slide 34: Taint Analysis

SAST Tools Comparison

What is Semgrep?

Semgrep Registry & Rules

GitHub Workflow

GitHub Workflow - Events

GitHub Workflow – Runner

GitHub Workflow - Actions

Full Example: GitHub Workflow (.github/workflows/semgrep.yml)

Demo to try

demo.py

eval_rule.yaml

semgrep.yml
