chore(iast): add local debug scripts to find leaks (#9318)
Add to the repository the scripts that @juanjux and I use to debug C++
leaks. This folder (scripts/iast/) contains some scripts to check the
memory usage of native code.

### 1. Build the docker image

```sh
docker build . -f docker/Dockerfile_py311_debug_mode -t python_311_debug
```

### 2. Run the docker container

#### 2.1. Run the container with the script to find references (this script will run the memory usage check)

```sh
docker run --rm -it -v ${PWD}:/ddtrace python_311_debug /bin/bash -c "cd /ddtrace && scripts/iast/run_references.sh"
>> References: 1003
>> References: 2
>> References: 2
>> References: 2
>> References: 2
>> References: 2
```

#### 2.2. Run the container with the memray memory usage check

```sh
docker run --rm -it -v ${PWD}:/ddtrace python_311_debug /bin/bash -c "cd /ddtrace && scripts/iast/run_memray.sh"
google-chrome file://$PWD/memray-flamegraph-lel.html
```

#### 2.3. Run the container with the Max RSS check

```sh
docker run --rm -it -v ${PWD}:/ddtrace python_311_debug /bin/bash -c "cd /ddtrace && scripts/iast/run_memory.sh"
>> Round 0 Max RSS: 41.9453125
>> 42.2109375
```

## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included
in the PR
- [x] Risks are described (performance impact, potential for breakage,
maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified
`@DataDog/apm-tees`.
- [x] If change touches code that signs or publishes builds or packages,
or handles credentials of any kind, I've requested a review from
`@DataDog/security-design-and-guidance`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications
of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: erikayasuda <[email protected]>
avara1986 and erikayasuda authored May 23, 2024
1 parent f471e6b commit 9e3bd1f
Showing 12 changed files with 24,653 additions and 0 deletions.
91 changes: 91 additions & 0 deletions docker/Dockerfile_py311_debug_mode
@@ -0,0 +1,91 @@
# DEV: Use `debian:slim` instead of an `alpine` image to support installing wheels from PyPI
# this drastically improves test execution time since python dependencies don't all
# have to be built from source all the time (grpcio takes forever to install)
FROM debian:buster-20221219-slim

# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

# https://support.circleci.com/hc/en-us/articles/360045268074-Build-Fails-with-Too-long-with-no-output-exceeded-10m0s-context-deadline-exceeded-
ENV PYTHONUNBUFFERED=1
# Configure the source and install paths for the debug Python build
ENV PYTHON_SOURCE=/root/python_source
ENV PYTHON_DEBUG=/root/env/python_debug
ENV PATH=$PATH:${PYTHON_DEBUG}/bin
ENV PYTHON_CONFIGURE_OPTS=--enable-shared

RUN \
# Install system dependencies
apt-get update \
&& apt-get install -y --no-install-recommends \
apt-transport-https \
build-essential \
ca-certificates \
clang-format \
curl \
git \
gnupg \
jq \
libbz2-dev \
libenchant-dev \
libffi-dev \
liblzma-dev \
libmemcached-dev \
libncurses5-dev \
libncursesw5-dev \
libpq-dev \
libreadline-dev \
libsasl2-dev \
libsqlite3-dev \
libsqliteodbc \
libssh-dev \
libssl-dev \
patch \
python-openssl \
unixodbc-dev \
wget \
zlib1g-dev \
valgrind \
# Cleaning up apt cache space
&& rm -rf /var/lib/apt/lists/*

# Build CPython from source in debug mode
# `--with-pydebug`: Build Python in [debug mode](https://docs.python.org/3/using/configure.html#python-debug-build), which adds reference count tracking, sanity checks...
# `--with-valgrind`: Enable Valgrind support (default is no).
# `--without-pymalloc`: Python has a pymalloc allocator optimized for small objects (smaller than or equal to 512 bytes) with a short lifetime. We disable it so that it does not hide errors
RUN git clone --depth 1 --branch v3.11.6 https://github.com/python/cpython/ "${PYTHON_SOURCE}" \
&& cd ${PYTHON_SOURCE} \
&& ./configure --with-pydebug --without-pymalloc --with-valgrind --prefix ${PYTHON_DEBUG} \
&& make OPT=-g \
&& make install \
&& cd -

# Version specifiers are quoted so the shell does not treat `>` as an output redirection
RUN python3.11 -m pip install -U pip \
&& python3.11 -m pip install six cattrs setuptools cython wheel cmake pytest pytest-cov hypothesis pytest-memray \
memray==1.12.0 \
requests==2.31.0 \
"attrs>=20" \
"bytecode>=0.14.0" \
cattrs \
"ddsketch>=3.0.0" \
"envier~=0.5" \
"opentelemetry-api>=1" \
"protobuf>=3" \
"six>=1.12.0" \
typing_extensions \
"xmltodict>=0.12"


CMD ["/bin/bash"]
#docker build . -f docker/Dockerfile_py311_debug_mode -t python_311_debug
#docker run --rm -it -v ${PWD}:/ddtrace python_311_debug
#
# Now, you can check IAST leaks:
#cd /ddtrace
#export PATH=$PATH:$PWD
#export PYTHONPATH=$PYTHONPATH:$PWD
#export PYTHONMALLOC=malloc
#python3.11 ddtrace/appsec/_iast/leak.py
#python3.11 -m memray run --trace-python-allocators --native -o lel.bin -f prueba.py
#python3.11 -m memray flamegraph lel.bin --leaks -f
91 changes: 91 additions & 0 deletions docker/Dockerfile_py312_debug_mode
@@ -0,0 +1,91 @@
# DEV: Use `debian:slim` instead of an `alpine` image to support installing wheels from PyPI
# this drastically improves test execution time since python dependencies don't all
# have to be built from source all the time (grpcio takes forever to install)
FROM debian:buster-20221219-slim

# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

# https://support.circleci.com/hc/en-us/articles/360045268074-Build-Fails-with-Too-long-with-no-output-exceeded-10m0s-context-deadline-exceeded-
ENV PYTHONUNBUFFERED=1
# Configure the source and install paths for the debug Python build
ENV PYTHON_SOURCE=/root/python_source
ENV PYTHON_DEBUG=/root/env/python_debug
ENV PATH=$PATH:${PYTHON_DEBUG}/bin
ENV PYTHON_CONFIGURE_OPTS=--enable-shared

RUN \
# Install system dependencies
apt-get update \
&& apt-get install -y --no-install-recommends \
apt-transport-https \
build-essential \
ca-certificates \
clang-format \
curl \
git \
gnupg \
jq \
libbz2-dev \
libenchant-dev \
libffi-dev \
liblzma-dev \
libmemcached-dev \
libncurses5-dev \
libncursesw5-dev \
libpq-dev \
libreadline-dev \
libsasl2-dev \
libsqlite3-dev \
libsqliteodbc \
libssh-dev \
libssl-dev \
patch \
python-openssl \
unixodbc-dev \
wget \
zlib1g-dev \
valgrind \
# Cleaning up apt cache space
&& rm -rf /var/lib/apt/lists/*

# Build CPython from source in debug mode
# `--with-pydebug`: Build Python in [debug mode](https://docs.python.org/3/using/configure.html#python-debug-build), which adds reference count tracking, sanity checks...
# `--with-valgrind`: Enable Valgrind support (default is no).
# `--without-pymalloc`: Python has a pymalloc allocator optimized for small objects (smaller than or equal to 512 bytes) with a short lifetime. We disable it so that it does not hide errors
RUN git clone --depth 1 --branch v3.12.3 https://github.com/python/cpython/ "${PYTHON_SOURCE}" \
&& cd ${PYTHON_SOURCE} \
&& ./configure --with-pydebug --without-pymalloc --with-valgrind --prefix ${PYTHON_DEBUG} \
&& make OPT=-g \
&& make install \
&& cd -

# Version specifiers are quoted so the shell does not treat `>` as an output redirection
RUN python3.12 -m pip install -U pip \
&& python3.12 -m pip install six cattrs setuptools cython wheel cmake pytest pytest-cov hypothesis pytest-memray \
memray==1.12.0 \
requests==2.31.0 \
"attrs>=20" \
"bytecode>=0.14.0" \
cattrs \
"ddsketch>=3.0.0" \
"envier~=0.5" \
"opentelemetry-api>=1" \
"protobuf>=3" \
"six>=1.12.0" \
typing_extensions \
"xmltodict>=0.12"


CMD ["/bin/bash"]
#docker build . -f docker/Dockerfile_py312_debug_mode -t python_312_debug
#docker run --rm -it -v ${PWD}:/ddtrace python_312_debug
#
# Now, you can check IAST leaks:
#cd /ddtrace
#export PATH=$PATH:$PWD
#export PYTHONPATH=$PYTHONPATH:$PWD
#export PYTHONMALLOC=malloc
#python3.12 ddtrace/appsec/_iast/leak.py
#python3.12 -m memray run --trace-python-allocators --native -o lel.bin -f prueba.py
#python3.12 -m memray flamegraph lel.bin --leaks -f
12 changes: 12 additions & 0 deletions scripts/iast/.env
@@ -0,0 +1,12 @@
export PATH=$PATH:$PWD
export PYTHONPATH=$PYTHONPATH:$PWD
export PYTHON_VERSION=python3.11
export PYTHONMALLOC=malloc
export DD_COMPILE_DEBUG=true
export DD_TRACE_ENABLED=true
export DD_IAST_ENABLED=true
export _DD_IAST_DEBUG=true
export DD_IAST_REQUEST_SAMPLING=100
export _DD_APPSEC_DEDUPLICATION_ENABLED=false
export DD_INSTRUMENTATION_TELEMETRY_ENABLED=true
export DD_REMOTE_CONFIGURATION_ENABLED=false
77 changes: 77 additions & 0 deletions scripts/iast/README
@@ -0,0 +1,77 @@
This folder (scripts/iast/) contains some scripts to check memory usage of native code.

## How to use

### 1. Build the docker image

```sh
docker build . -f docker/Dockerfile_py311_debug_mode -t python_311_debug
```
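
Once the image is built, it can be worth confirming that the interpreter inside the container really is a debug build before trusting any measurement. The following is a minimal sketch that relies only on the standard library; the helper name `describe_interpreter` is made up for illustration and is not part of these scripts, and the `WITH_PYMALLOC` check assumes a POSIX build where `sysconfig` exposes the configure variables.

```python
import sys
import sysconfig


def describe_interpreter():
    """Return a few build facts that matter when hunting native leaks."""
    return {
        # Py_DEBUG is set when CPython was configured with --with-pydebug.
        "debug_build": bool(sysconfig.get_config_var("Py_DEBUG")),
        # sys.gettotalrefcount() only exists in debug builds.
        "has_total_refcount": hasattr(sys, "gettotalrefcount"),
        # WITH_PYMALLOC is unset or 0 when configured with --without-pymalloc.
        "pymalloc_compiled_in": bool(sysconfig.get_config_var("WITH_PYMALLOC")),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this with `python3.11` inside the container should report a debug build without pymalloc if the image was built from the Dockerfile above.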

### 2. Run the docker container

#### 2.1. Run the container with the script to find references (this script will run the memory usage check)

```sh
docker run --rm -it -v ${PWD}:/ddtrace python_311_debug /bin/bash -c "cd /ddtrace && source scripts/iast/.env && \
sh scripts/iast/run_references.sh"
>> References: 1003
>> References: 2
>> References: 2
>> References: 2
>> References: 2
>> References: 2
```
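
The `References:` lines report how many references point at an object from one round to the next; a count that drops back to a small constant suggests nothing is still holding on to it. `test_references.py` is not reproduced here, so the following is only an illustrative sketch of the idea using `sys.getrefcount`. The names `probe`, `holders` and `extra_references` are hypothetical and not taken from the script.

```python
import sys


def extra_references(obj, holders):
    """Return how many references to obj disappear once holders is emptied."""
    before = sys.getrefcount(obj)
    holders.clear()  # drop every reference kept by the container
    after = sys.getrefcount(obj)
    return before - after


if __name__ == "__main__":
    probe = object()
    holders = [probe] * 1001  # simulate code that keeps references alive
    print("References released:", extra_references(probe, holders))
```

Comparing two readings taken the same way avoids depending on the exact baseline that `sys.getrefcount` reports, which includes temporary references created by the call itself.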

#### 2.2. Run the container with the memray memory usage check

```sh
docker run --rm -it -v ${PWD}:/ddtrace python_311_debug /bin/bash -c "cd /ddtrace && source scripts/iast/.env && \
sh scripts/iast/run_memray.sh"
google-chrome file://$PWD/memray-flamegraph-lel.html
```

#### 2.3. Run the container with the Max RSS check

```sh
docker run --rm -it -v ${PWD}:/ddtrace python_311_debug /bin/bash -c "cd /ddtrace && source scripts/iast/.env && \
sh scripts/iast/run_memory.sh"
>> Round 0 Max RSS: 41.9453125
>> 42.2109375
```
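
A Max RSS check of this kind can be sketched with the standard `resource` module (Unix only). This is a hedged illustration, not the contents of `test_leak_functions.py`: the helpers `max_rss_mib` and `run_rounds` are hypothetical, and the output format only imitates the sample above.

```python
import resource
import sys


def max_rss_mib():
    """Peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_rounds(func, iterations, rounds=2):
    """Call func repeatedly and record the peak RSS after each round."""
    readings = []
    for round_number in range(rounds):
        for _ in range(iterations):
            func()
        readings.append(max_rss_mib())
        print(f"Round {round_number} Max RSS: {readings[-1]}")
    return readings


if __name__ == "__main__":
    run_rounds(lambda: "a" * 100, iterations=1000)
```

Because Max RSS is a high-water mark, it never decreases; what hints at a leak is a value that keeps climbing round after round instead of levelling off.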

#### 2.4. Run the container with valgrind

- `--tool`: default: memcheck, other options: cachegrind, callgrind, helgrind, drd, massif, dhat, lackey, none, exp-bbv
  - memcheck:
    - `--leak-check`: options: no/summary/yes/full
  - massif: heap profiler
- `--track-origins`: tracks the origin of uninitialised values; it slows the run down and increases the size of the basic block translations
- `--suppressions`: path to our suppression file: `scripts/iast/valgrind-python.supp`
- `--log-file`: Valgrind reports a lot of information, so we store it in a file to analyze the reports carefully

docker run --rm -it -v ${PWD}:/ddtrace python_311_debug /bin/bash -c "cd /ddtrace && source scripts/iast/.env && \
valgrind --tool=memcheck --leak-check=full --log-file=scripts/iast/valgrind_bench_overload.out --track-origins=yes \
--suppressions=scripts/iast/valgrind-python.supp --show-leak-kinds=all \
python3.11 scripts/iast/test_leak_functions.py 100"

##### Understanding results of memcheck

Valgrind Memcheck reports traces from all C and C++ code. Most of them come from the Python core. Some of these traces
could be memory leaks caused by our Python code, but we can't interpret them at the moment. Therefore, all of them are
listed in the suppression file.

Valid traces from our C files look like this:
```
==324555== 336 bytes in 1 blocks are possibly lost in loss record 4,806 of 5,852
==324555== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==324555== by 0x40149CA: allocate_dtv (dl-tls.c:286)
==324555== by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==324555== by 0x486E322: allocate_stack (allocatestack.c:622)
==324555== by 0x486E322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==324555== by 0xFBF078E: ??? (in /root/ddtrace/native-core.so)
==324555== by 0x19D312C7: ???
==324555== by 0x1FFEFEFAFF: ???
==324555==
```
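
Since the log file can hold thousands of loss records, a small filter helps to keep only those whose stack mentions our native module. The following is a rough sketch that assumes the log layout shown above; the function `records_mentioning` and both regular expressions are illustrative and not part of these scripts.

```python
import re

# A loss record starts with a line such as
# "==324555== 336 bytes in 1 blocks are possibly lost in loss record 4,806 of 5,852".
RECORD_START = re.compile(r"^==\d+== [\d,]+ bytes in [\d,]+ blocks are .* in loss record")
PID_PREFIX = re.compile(r"^==\d+==\s?")


def records_mentioning(log_text, needle):
    """Return the loss records whose stack trace contains needle."""
    records, current = [], []
    for line in log_text.splitlines():
        if RECORD_START.match(line):
            if current:
                records.append(current)
            current = [line]
        elif current:
            if PID_PREFIX.sub("", line).strip() == "":
                # A bare "==PID==" line closes the current record.
                records.append(current)
                current = []
            else:
                current.append(line)
    if current:
        records.append(current)
    return ["\n".join(record) for record in records if any(needle in entry for entry in record)]
```

For example, `records_mentioning(open("scripts/iast/valgrind_bench_overload.out").read(), "native-core.so")` would return only the records that pass through the shared object named in the trace above.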
76 changes: 76 additions & 0 deletions scripts/iast/mod_leak_functions.py
@@ -0,0 +1,76 @@
import os
import random
import subprocess

import requests

from ddtrace.appsec._iast._utils import _is_iast_enabled


if _is_iast_enabled():
    from ddtrace.appsec._iast._taint_tracking import OriginType
    from ddtrace.appsec._iast._taint_tracking import taint_pyobject


def test_doit():
    origin_string1 = "hiroot"

    if _is_iast_enabled():
        tainted_string_2 = taint_pyobject(
            pyobject="1234", source_name="abcdefghijk", source_value="1234", source_origin=OriginType.PARAMETER
        )
    else:
        tainted_string_2 = "1234"

    string1 = str(origin_string1)  # String with 1 propagation range
    string2 = str(tainted_string_2)  # String with 1 propagation range

    string3 = string1 + string2  # 2 propagation ranges: hiroot1234
    string4 = "-".join([string3, string3, string3])  # 6 propagation ranges: hiroot1234-hiroot1234-hiroot1234
    string5 = string4[0:20]  # 1 propagation range: hiroot1234-hiroot123
    string6 = string5.title()  # 1 propagation range: Hiroot1234-Hiroot123
    string7 = string6.upper()  # 1 propagation range: HIROOT1234-HIROOT123
    string8 = "%s_notainted" % string7  # 1 propagation range: HIROOT1234-HIROOT123_notainted
    string9 = "notainted_{}".format(string8)  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted
    string10 = "nottainted\n" + string9  # 2 propagation ranges: notainted\nnotainted_HIROOT1234-HIROOT123_notainted
    string11 = string10.splitlines()[1]  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted
    string12 = string11 + "_notainted"  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted_notainted
    string13 = string12.rsplit("_", 1)[0]  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted

    try:
        # Path traversal vulnerability
        m = open("/" + string13 + ".txt")
        _ = m.read()
    except Exception:
        pass

    try:
        # Command Injection vulnerability
        _ = subprocess.Popen("ls " + string9)
    except Exception:
        pass

    try:
        # SSRF vulnerability
        requests.get("http://" + "foobar")
        # urllib3.request("GET", "http://" + "foobar")
    except Exception:
        pass

    # Weak Randomness vulnerability
    _ = random.randint(1, 10)

    # os path propagation
    string14 = os.path.join(string13, "a")  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted/a
    string15 = os.path.split(string14)[0]  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted
    string16 = os.path.dirname(
        string15 + "/" + "foobar"
    )  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted
    string17 = os.path.basename("/foobar/" + string16)  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted
    string18 = os.path.splitext(string17 + ".jpg")[0]  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted
    string19 = os.path.normcase(string18)  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted
    string20 = os.path.splitdrive(string19)[1]  # 1 propagation range: notainted_HIROOT1234-HIROOT123_notainted

    expected = "notainted_HIROOT1234-HIROOT123_notainted"  # noqa: F841
    # assert string20 == expected
    return string20
1 change: 1 addition & 0 deletions scripts/iast/requirements.txt
@@ -0,0 +1 @@
memray==1.12.0
3 changes: 3 additions & 0 deletions scripts/iast/run_memory.sh
@@ -0,0 +1,3 @@
PYTHON="${PYTHON_VERSION:-python3.11}"
$PYTHON -m pip install -r scripts/iast/requirements.txt
$PYTHON scripts/iast/test_leak_functions.py 1000000
4 changes: 4 additions & 0 deletions scripts/iast/run_memray.sh
@@ -0,0 +1,4 @@
PYTHON="${PYTHON_VERSION:-python3.11}"
$PYTHON -m pip install -r scripts/iast/requirements.txt
$PYTHON -m memray run --trace-python-allocators --aggregate --native -o lel.bin -f scripts/iast/test_leak_functions.py 100
$PYTHON -m memray flamegraph lel.bin --leaks -f
4 changes: 4 additions & 0 deletions scripts/iast/run_references.sh
@@ -0,0 +1,4 @@
PYTHON="${PYTHON_VERSION:-python3.11}"
# $PYTHON setup.py build_ext --inplace
${PYTHON} -m pip install -r scripts/iast/requirements.txt
${PYTHON} -m ddtrace.commands.ddtrace_run ${PYTHON} scripts/iast/test_references.py