Security as a practice, not an afterthought.
Security is embedded from line one. Every contract, every backend service, every deployment pipeline is designed with the assumption that a motivated attacker will read the source, probe the endpoints, and look for the one thing nobody thought to test. The audit at the end is a checkpoint. The discipline that precedes it is the strategy.
Smart contract security
Threat modeling before code
Every engagement starts with a threat model. Before a single function is written, the team maps every asset the contract controls, every actor who can interact with it, and every path through which value can move. The goal is a document that enumerates trust boundaries, privileged roles, external dependencies, and the invariants that must hold under all conditions. STRIDE categorization and attack tree construction surface risks that would otherwise hide until a mainnet incident forces them into the open.
The threat model is a living artifact. It updates as the codebase evolves and as new integrations introduce new attack surfaces. When a new external call is added, the model expands. When an admin function gains a new parameter, the trust boundary is reexamined. This is the single most effective practice in contract security and the one most teams skip.
Common vulnerability classes
Smart contract vulnerabilities are well documented, but they continue to cause losses in the billions because teams underestimate the subtlety of each class. A security review must account for every known category and understand how they interact in combination.
Reentrancy
The classic. An external call hands control to an untrusted contract before state updates are finalized. The attacker reenters the original function and drains funds because the balance has not yet been decremented. The checks-effects-interactions pattern is the baseline defense. Reentrancy guards add a second layer. But cross-function reentrancy and cross-contract reentrancy through callback patterns require deeper analysis that simple modifiers cannot catch.
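The attack can be modeled in a few lines of Python. This is a toy sketch, not the EVM: a hypothetical vault sends funds before updating state, and the attacker's receive callback re-enters. The safe variant applies checks-effects-interactions by zeroing the balance before the external call.

```python
# Toy model of reentrancy (illustrative only -- real reentrancy happens
# through EVM external calls, not Python method calls).

class VulnerableVault:
    def __init__(self):
        self.balances = {}

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount

    def withdraw(self, user):
        amount = self.balances.get(user, 0)
        if amount > 0:
            user.receive(self, amount)   # external call BEFORE state update
            self.balances[user] = 0      # too late: attacker already re-entered

class SafeVault(VulnerableVault):
    def withdraw(self, user):
        amount = self.balances.get(user, 0)  # checks
        if amount > 0:
            self.balances[user] = 0          # effects: zero the balance first
            user.receive(self, amount)       # interactions: external call last

class Attacker:
    def __init__(self):
        self.stolen = 0
        self.depth = 0

    def receive(self, vault, amount):
        self.stolen += amount
        if self.depth < 2:               # re-enter twice for the demo
            self.depth += 1
            vault.withdraw(self)

attacker = Attacker()
vault = VulnerableVault()
vault.deposit(attacker, 100)
vault.withdraw(attacker)
print(attacker.stolen)  # 300: the single 100 deposit was drained three times
```

Running the same sequence against `SafeVault` yields `stolen == 100`: the re-entrant call sees a zeroed balance and does nothing.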
Access control
Missing or misconfigured access control on privileged functions. An unprotected initialize function, an onlyOwner modifier that references a mutable owner variable without a timelock, or a role based system where the admin can grant themselves any role without governance approval. Access control failures are the most frequent critical finding in audit reports.
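The unprotected-initialize failure can be sketched in Python (all names hypothetical): whoever calls initialize first becomes owner, so a deployment that does not initialize atomically lets an attacker claim the contract, after which every onlyOwner check passes for the wrong party.

```python
# Toy sketch of the unprotected-initialize pattern. The one-shot guard below
# is the moral equivalent of OpenZeppelin's `initializer` modifier -- but even
# with the guard, the first caller wins, which is exactly the bug when the
# deployer does not deploy and initialize atomically.

class Token:
    def __init__(self):
        self.owner = None
        self.initialized = False

    def initialize(self, caller):
        if self.initialized:
            raise PermissionError("already initialized")
        self.initialized = True
        self.owner = caller          # first caller claims ownership

    def mint(self, caller, amount):
        if caller != self.owner:     # onlyOwner check
            raise PermissionError("not owner")
        return amount

token = Token()
token.initialize("attacker")     # attacker front-runs the deployer's call
token.mint("attacker", 10**18)   # and now passes every onlyOwner check
```

The deployer's own `initialize` call now reverts: the contract is permanently owned by the attacker unless it can be redeployed.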
Oracle manipulation
Any contract that reads a price feed or external data source is vulnerable to oracle manipulation. Spot price oracles on AMMs can be moved in a single transaction by a well-funded attacker. TWAP oracles reduce this risk but introduce latency. Chainlink feeds are more resistant but have their own failure modes, including stale data and deviations below reporting thresholds. The consuming contract must handle every oracle failure mode gracefully, whether the feed returns zero, reverts, or serves stale values.
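Defensive handling of those failure modes can be sketched as a validation layer in front of the feed. This is an illustrative assumption, not any specific oracle's API: the `(answer, updated_at)` shape, the heartbeat, and the deviation threshold are all placeholder parameters.

```python
import time

# Hedged sketch of defensive oracle consumption: reject zero or negative
# answers, answers older than the feed's heartbeat, and answers that diverge
# too far from a secondary data source. Thresholds are illustrative.

MAX_STALENESS = 3600        # seconds; tie this to the feed's heartbeat
MAX_DEVIATION_BPS = 500     # reject >5% divergence from a secondary source

def validated_price(answer, updated_at, secondary, now=None):
    now = time.time() if now is None else now
    if answer is None or answer <= 0:
        raise ValueError("oracle returned zero or negative price")
    if now - updated_at > MAX_STALENESS:
        raise ValueError("oracle price is stale")
    deviation_bps = abs(answer - secondary) * 10_000 // secondary
    if deviation_bps > MAX_DEVIATION_BPS:
        raise ValueError("oracle deviates from secondary source")
    return answer

# A fresh, sane answer passes; anything else reverts the operation.
print(validated_price(2000, updated_at=990, secondary=1990, now=1000))  # 2000
```

The important design choice is that every rejection path reverts rather than silently substituting a default, so downstream logic never acts on a bad price.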
Front running and MEV
Every pending transaction in the mempool is visible to searchers. Sandwich attacks wrap a victim's swap with a buy before and a sell after. Generalized front running copies profitable transactions and submits them with higher gas. Contracts that rely on transaction ordering for fairness, such as NFT mints, auction bids, and liquidations, need commit-reveal schemes, Flashbots Protect, or private mempools to mitigate MEV extraction.
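The commit-reveal pattern is small enough to sketch directly. In phase one, only a hash of the bid and a secret salt is published; in phase two, after the commitment window closes, the bidder reveals both and the contract verifies them. Nothing in the mempool ever exposes the bid itself. (On chain this would use keccak256; SHA-256 stands in for it here.)

```python
import hashlib
import secrets

# Minimal commit-reveal sketch for an ordering-sensitive action such as a bid.

def commit(bid: int, salt: bytes) -> str:
    # Phase 1: publish only the commitment hash.
    return hashlib.sha256(bid.to_bytes(32, "big") + salt).hexdigest()

def reveal(commitment: str, bid: int, salt: bytes) -> bool:
    # Phase 2: recompute and compare after the commitment window closes.
    return commit(bid, salt) == commitment

salt = secrets.token_bytes(32)
c = commit(1_000, salt)            # broadcast in phase 1
assert reveal(c, 1_000, salt)      # accepted in phase 2
assert not reveal(c, 2_000, salt)  # a different bid cannot match the hash
```

The salt is essential: without it, a searcher could brute-force small bid values against the published hash and recover the bid before the reveal.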
Flash loan attacks
Flash loans give any attacker access to unbounded capital within a single transaction. This amplifies every other vulnerability class. A governance attack that would require millions in tokens becomes free. A price manipulation that requires deep liquidity to move a pool becomes trivial. Every contract that reads balances, prices, or voting power within a single transaction must assume the attacker has unlimited capital for the duration of that transaction.
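The amplification is easy to see on a constant-product pool. In this sketch (pool sizes illustrative), a single flash-loaned swap of ten times the pool's reserves collapses the spot price by two orders of magnitude within one transaction, which is why spot-price reads are unsafe under the unlimited-capital assumption.

```python
# Why flash loans amplify spot-oracle manipulation: on a constant-product
# (x * y = k) pool, one large swap moves the spot price arbitrarily far,
# and a flash loan supplies that size for free within a single transaction.

def swap_in(x_reserve, y_reserve, dx):
    """Swap dx of token X into the pool; return the new reserves."""
    k = x_reserve * y_reserve
    new_x = x_reserve + dx
    new_y = k // new_x           # constant-product invariant (fees ignored)
    return new_x, new_y

x, y = 1_000_000, 1_000_000      # balanced pool, spot price 1.0
spot_before = y / x

# Flash-loaned capital: swap in 10x the pool's reserves in one transaction.
x2, y2 = swap_in(x, y, 10_000_000)
spot_after = y2 / x2

print(round(spot_before, 4), round(spot_after, 4))  # 1.0 vs ~0.0083
```

Any contract reading `spot_after` as a price inside that transaction values the pool's assets at roughly 1% of their true price, and the attacker repays the loan after exploiting the mispricing.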
Integer overflow and precision loss
Solidity 0.8+ includes built in overflow checks, but precision loss from integer division remains a pervasive source of rounding errors, especially in token distribution, fee calculation, and interest accrual logic. Consistently rounding in favor of the protocol rather than the user prevents slow drains. Explicit ordering of multiplication before division preserves precision.
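Both rules are visible in a few lines of integer arithmetic. In this sketch, a 0.30% fee on 9,999 units vanishes entirely if division happens first, floors to 29 with the correct ordering, and rounds up to 30 when the protocol deliberately collects the dust.

```python
# Integer division discards remainders, so operation order matters, and
# rounding direction decides who absorbs the dust.

def fee_divide_first(amount, fee_bps):
    return (amount // 10_000) * fee_bps    # WRONG order: truncates early

def fee_multiply_first(amount, fee_bps):
    return (amount * fee_bps) // 10_000    # rounds down (favors the payer)

def fee_round_up(amount, fee_bps):
    # Ceiling division: rounds in favor of the protocol collecting the fee.
    return -((amount * fee_bps) // -10_000)

amount, fee_bps = 9_999, 30                # 0.30% fee on 9,999 units
print(fee_divide_first(amount, fee_bps))   # 0  -- the fee vanished entirely
print(fee_multiply_first(amount, fee_bps)) # 29 -- floor(29.997)
print(fee_round_up(amount, fee_bps))       # 30 -- ceil(29.997)
```

A one-unit rounding error per call looks harmless until it compounds across millions of calls, which is why the rounding direction on every division is an explicit design decision, not an accident.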
Storage collision and proxy pitfalls
Upgradeable contracts using proxy patterns (UUPS, Transparent, Beacon) introduce storage layout risks. Adding a new state variable between existing ones in an upgraded implementation overwrites live data. Uninitialized implementation contracts can be claimed by attackers who call initialize directly on the logic contract. Storage gaps, OpenZeppelin's Initializable library, and rigorous upgrade testing with layout comparison tools are essential for any proxy based architecture.
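The collision can be modeled with the proxy's storage as a flat array of slots that each implementation reads by position. In this toy sketch, inserting a new variable mid-layout in V2 shifts every later slot, so the upgraded code reads live V1 data through the wrong index.

```python
# Toy model of proxy storage collision: storage lives in the proxy as
# positional slots, and each implementation version reads it by position.

V1_LAYOUT = ["owner", "total_supply"]            # slot 0, slot 1
V2_BAD    = ["owner", "paused", "total_supply"]  # "paused" inserted mid-layout
V2_GOOD   = ["owner", "total_supply", "paused"]  # new variable appended at the end

def read(layout, storage, name):
    # An implementation reads a variable by its slot index in its own layout.
    return storage[layout.index(name)]

storage = ["0xAdmin", 1_000_000]                 # written under the V1 layout

print(read(V2_GOOD, storage + [False], "total_supply"))  # 1000000 -- correct
print(read(V2_BAD,  storage + [False], "total_supply"))  # False   -- wrong slot
```

This is why storage gaps and append-only layout changes matter: the upgraded implementation inherits the proxy's existing slots exactly as V1 wrote them.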
How the security review works
A security review is not a line by line reading of the code. It is a structured adversarial exercise that combines manual analysis with automated tooling. The process follows a repeatable sequence.
- Scope definition. Which contracts, which functions, which external integrations are in scope. What is the expected behavior. What are the known risks the team has already accepted.
- Architecture review. How do the contracts interact. Where are the trust boundaries. Which external protocols does the system depend on. What happens when one of those protocols fails or is exploited.
- Line by line manual review. Every state changing function is traced through its full execution path, including external calls, delegate calls, and callback patterns. Reviewers look for logic errors, edge cases, missing validation, and incorrect assumptions about external contract behavior.
- Automated analysis. Static analyzers and fuzzers run in parallel with the manual review. Their findings feed back into the manual process, surfacing paths the reviewer might not have explored.
- Findings report. Each issue is classified by severity. Critical means immediate fund loss. High means fund loss under specific conditions. Medium means protocol malfunction. Low and informational findings describe code quality, gas inefficiency, or best practice deviations. Every finding includes a proof of concept or a clear description of the attack path.
- Remediation review. After the team fixes identified issues, a second pass confirms that each fix is correct and that no new issues were introduced by the changes.
Testing methodology
Unit testing and integration testing
Unit tests verify individual functions in isolation. Every public and external function gets at least one test for the happy path, one for each revert condition, and one for each boundary value. Foundry's forge test is the primary runner for Solidity contracts because it executes tests natively in the EVM without JavaScript bridging overhead. Hardhat with Chai assertions remains in use for projects with existing TypeScript test suites. Anchor test suites handle Solana program testing.
Integration tests exercise the system as a whole. A deposit followed by a swap followed by a withdrawal, tested end to end against a local fork, confirms that the contracts interact correctly when composed. Integration tests catch errors that unit tests structurally cannot, including incorrect assumptions about return values from external protocols, unexpected state changes from callbacks, and gas limit issues in complex call chains. These tests run against every pull request in CI.
Property based testing and fuzzing
Unit tests verify what the developer thought to check. Fuzzers find what the developer did not think of. Property based testing defines invariants, statements that must always be true, and then generates thousands or millions of random inputs to try to break them. "The total supply must never exceed the cap." "A user's balance after withdrawal must equal their balance before withdrawal minus the withdrawal amount." "No sequence of deposits and withdrawals can result in the contract holding fewer tokens than the sum of all user balances."
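The third invariant above, solvency across arbitrary call sequences, can be checked with a minimal stateful fuzz loop. This sketch uses Python's `random` in place of a real fuzzer's corpus-driven generation, but the shape is the same: random actors, random actions, and an invariant asserted after every call.

```python
import random

# Minimal stateful fuzz sketch: generate random deposit/withdraw sequences
# and check solvency after every call -- the vault must never hold fewer
# tokens than the sum of all user balances.

class Vault:
    def __init__(self):
        self.token_balance = 0
        self.balances = {}

    def deposit(self, user, amount):
        self.token_balance += amount
        self.balances[user] = self.balances.get(user, 0) + amount

    def withdraw(self, user, amount):
        amount = min(amount, self.balances.get(user, 0))
        self.balances[user] = self.balances.get(user, 0) - amount
        self.token_balance -= amount

def solvency_holds(vault):
    return vault.token_balance >= sum(vault.balances.values())

random.seed(0)  # reproducible run; real fuzzers persist a corpus instead
vault = Vault()
for _ in range(10_000):
    user = random.choice(["alice", "bob", "carol"])
    amount = random.randrange(1, 1_000)
    random.choice([vault.deposit, vault.withdraw])(user, amount)
    assert solvency_holds(vault), "invariant violated"
print("invariant held across 10,000 random calls")
```

A dedicated fuzzer improves on this loop in two ways: it biases generation toward inputs that reach unexplored code paths, and it shrinks any violating sequence down to a minimal reproduction.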
Echidna is a grammar based fuzzer specifically designed for Solidity. It generates sequences of function calls with random parameters and checks whether any sequence violates the defined assertions. Foundry fuzz runs property tests as part of the standard test suite, making it easy to add fuzz targets alongside unit tests. Medusa provides coverage guided fuzzing that prioritizes inputs reaching unexplored code paths. For deeper exploration, stateful fuzzing (also called invariant testing in Foundry) generates entire sequences of transactions across multiple actors, simulating realistic usage patterns rather than isolated function calls.
Fuzzing campaigns run for hours or days, not seconds. A short fuzz run provides false confidence. The campaign duration and corpus quality determine whether the fuzzer can reach deep state transitions that only emerge after specific sequences of calls.
Fork testing against mainnet state
Testing against a local devnet tells you whether the logic is correct in isolation. Fork testing tells you whether the logic is correct against real protocol state. Foundry's fork mode pins a test to a specific block number on mainnet, Arbitrum, Optimism, Base, or any other EVM chain. The test executes against real Uniswap pools, real Aave markets, real Chainlink oracles, and real token balances.
This matters because external protocols behave differently in production than in mocks. A Uniswap V3 pool with concentrated liquidity produces slippage curves that flat mocks cannot replicate. An Aave market at 95% utilization rate returns different interest rate values than a freshly deployed mock. Chainlink price feeds have heartbeat intervals and deviation thresholds that affect how stale data propagates into downstream calculations. Fork tests catch these integration errors before deployment.
Every contract that interacts with an external protocol is fork tested. The fork block is pinned to ensure reproducibility. When mainnet state changes in a way that affects the tests, the pin is updated and the new behavior is documented.
Coverage targets and what they actually mean
Line coverage measures how much of the code the test suite executes. Branch coverage measures how many conditional paths are explored. 100% line coverage does not mean the code is correct. It means every line ran at least once. A test that calls a function with one input and asserts that it does not revert achieves line coverage without verifying behavior.
The meaningful metric is branch coverage combined with property coverage. Every conditional branch, every require statement, every ternary expression must have tests that exercise both outcomes. Property based tests add a second dimension by verifying that invariants hold across the entire input space, not just the handful of values a developer chose.
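The gap between the two metrics fits in one function. In this sketch, a single happy-path test executes every line of the success path and reports full line coverage, yet the guard's revert branch is never exercised until a second test deliberately triggers it.

```python
# Line coverage vs branch coverage on a simple guard.

def withdraw(balance, amount):
    if amount > balance:                     # branch with two outcomes
        raise ValueError("insufficient balance")
    return balance - amount

# Test 1 (happy path): executes the function body, inflating line coverage
# while leaving the revert branch untested.
assert withdraw(100, 40) == 60

# Test 2 (revert path): required for 100% branch coverage -- both outcomes
# of the conditional are now exercised.
try:
    withdraw(100, 200)
    raise AssertionError("guard did not fire")
except ValueError:
    pass
```

Multiply this by every `require` statement in a contract and the difference between the two metrics becomes the difference between tested and untested failure handling.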
For contracts managing user funds, the target is 100% branch coverage on all state changing functions plus invariant tests covering every documented property. For peripheral contracts like view helpers and deployment scripts, pragmatic coverage is acceptable. The coverage report is reviewed as part of every pull request, and regressions block merging.
Static analysis and formal verification
Static analysis in the CI pipeline
Static analyzers read the code without executing it and flag patterns that match known vulnerability classes. They run on every commit and every pull request. A developer does not need to remember every reentrancy variant or every unchecked return value. The analyzer catches them automatically.
Slither
The most widely used static analyzer for Solidity. Slither detects over 80 vulnerability classes including reentrancy, unused return values, dangerous delegatecalls, uninitialized storage pointers, and incorrect ERC20 implementations. It also provides printers that output contract inheritance graphs, function summaries, and variable ordering, all useful during manual review. Slither runs in seconds and integrates directly into GitHub Actions.
Mythril
A symbolic execution engine that explores all reachable states of a contract. Where Slither matches patterns, Mythril actually executes paths symbolically and can prove that certain states are reachable or unreachable. It detects integer overflows, assertion violations, ether theft opportunities, and self destruct reachability. Mythril is slower than Slither but finds deeper bugs that pattern matching misses. It is most effective when pointed at specific high risk functions rather than run against an entire codebase.
Aderyn
A newer Rust based static analyzer that complements Slither with additional detectors and faster execution. Aderyn is particularly useful for identifying gas inefficiencies, redundant storage reads, and code patterns that increase attack surface without adding functionality. Its output is clean, low noise, and easy to triage.
All three tools are configured in CI with a baseline of accepted findings. New findings block the merge. False positives are explicitly documented and suppressed with inline annotations so future reviewers understand why the suppression exists.
Formal verification
Formal verification mathematically proves that a contract satisfies a specification. Where testing shows the absence of bugs for the inputs tested, formal verification proves the absence of bugs for all possible inputs within the defined specification. It is the strongest guarantee available for smart contract correctness.
Certora Prover
Certora uses a specification language (CVL) to define rules about contract behavior. "The sum of all user balances must always equal the contract's token balance." "A user cannot withdraw more than their deposited amount." "Only the admin can pause the contract." The Prover then exhaustively checks these rules against the Solidity bytecode using SMT solvers. If a rule can be violated, Certora produces a concrete counterexample showing the exact sequence of calls that breaks it. Certora is appropriate for core financial logic, token contracts, vault contracts, and any function where correctness is existentially important.
Halmos
Halmos performs symbolic testing using the Foundry test format. Developers write tests that look like normal Foundry tests, but Halmos executes them symbolically, exploring all possible input values simultaneously. This bridges the gap between fuzzing (fast, random, incomplete) and full formal verification (slow, exhaustive, complete). Halmos is a practical choice when the team wants stronger guarantees than fuzzing without the overhead of learning a separate specification language.
What formal verification can and cannot prove
Formal verification proves that the implementation matches the specification. It cannot prove that the specification itself is correct. If the spec says "the admin can drain all funds" and the code correctly implements that, the prover will not flag it. Formal verification also cannot catch economic design flaws, governance attacks through legitimate voting, social engineering, or bugs in the compiler or EVM itself. It is the strongest tool in the security toolbox, but it is not a replacement for threat modeling, testing, or external review. It is one layer in a defense in depth strategy.
External audit coordination
Preparing for an audit
An audit firm reviewing messy code with no documentation will spend half their time understanding what the code is supposed to do instead of finding what it does wrong. Preparation directly affects audit quality.
- Freeze the codebase. No commits during the audit window. Every change after the freeze requires a re-review of the affected components.
- Write a comprehensive specification document that describes every contract, every function's intended behavior, every role, every trust assumption, and every known limitation the team has accepted.
- Ensure the test suite passes cleanly with no skipped tests and no intermittent failures. The auditors will run the tests. Broken tests waste their time and yours.
- Run Slither, Mythril, and Aderyn before submission and fix or document every finding. Auditors should not be discovering issues that automated tools catch for free.
- Provide a deployment plan that explains which contracts are proxied, which are immutable, which have admin keys, and what the upgrade path looks like.
- Include a list of known risks and accepted tradeoffs so the auditors focus on unknown risks rather than re-documenting decisions the team already made.
When to schedule and how to read findings
Schedule the audit after the architecture is stable but before mainnet deployment. Too early, and the code will change significantly, invalidating the review. Too late, and the launch pressure creates incentives to dismiss findings. The ideal window is after internal testing is complete and the team believes the code is production ready. The audit should challenge that belief.
Most reputable firms are booked four to eight weeks out. For high value protocols, lead times of three months or more are common. Plan accordingly. A rushed engagement with a less experienced firm provides less value than a well timed engagement with a top tier team.
Reading the findings report
Critical findings mean immediate, exploitable fund loss. These must be fixed before deployment, no exceptions. High severity findings describe fund loss under specific but plausible conditions. These also require fixes. Medium findings describe protocol malfunction, griefing vectors, or denial of service paths. They should be fixed unless the team documents a clear reason for accepting the risk. Low and informational findings cover gas optimization, code clarity, and best practice adherence. They improve code quality but do not represent security risk.
Every finding deserves a written response. "Fixed" with a commit hash. "Acknowledged" with an explanation of why the risk is accepted. "Disputed" with a technical argument. The response document becomes part of the project's permanent security record.
Recommended audit firms and platforms
The right firm depends on the protocol's complexity, the chain it deploys to, the timeline, and the budget. No single firm is best for every project. The following are consistently recommended based on track record, depth of findings, and auditor expertise.
Traditional firms assign a dedicated team who review the codebase over a fixed engagement window. Distributed networks like Spearbit match individual senior security researchers to the project. Competitive audit platforms like Code4rena and Sherlock open the codebase to a pool of independent researchers who compete for bounties based on the severity of their findings. Many protocols benefit from a traditional audit followed by a competitive audit for additional coverage.
The audit is not a guarantee of safety. It is a point in time review by experienced humans who can miss things. Audited contracts have been exploited. The audit is one layer in a defense in depth strategy that includes threat modeling, comprehensive testing, formal verification, monitoring, and incident response.
Backend and infrastructure security
API security and authentication
Smart contract security means nothing if the backend that feeds transactions to the contract is compromised. The API layer is the bridge between users, frontend applications, and on chain logic. Every endpoint is authenticated, rate limited, and validated against a strict input schema.
Authentication uses JWT tokens with short expiration windows and refresh token rotation. API keys for service to service communication are scoped to the minimum required permissions. OAuth 2.0 flows handle third party integrations. Rate limiting is enforced per endpoint, per user, and per IP, with configurable thresholds that tighten automatically when anomalous traffic patterns are detected. Input validation rejects malformed requests before they reach the business logic layer. CORS policies restrict which origins can call the API. TLS termination happens at the load balancer, and all internal service communication uses mTLS.
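The per-client rate limiting described above is commonly implemented as a token bucket. This sketch uses illustrative parameters: each client accrues `rate` tokens per second up to a burst `capacity`, and a request that finds no token is rejected.

```python
import time

# Sketch of per-client token-bucket rate limiting (parameters illustrative).

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.buckets = {}           # client -> (tokens, last_refill_timestamp)

    def allow(self, client, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return True
        self.buckets[client] = (tokens, now)
        return False

limiter = TokenBucket(rate=2, capacity=5)   # 2 req/s sustained, bursts of 5
results = [limiter.allow("1.2.3.4", now=100.0) for _ in range(6)]
print(results)  # first 5 allowed, 6th rejected
```

In production this state lives in Redis or at the gateway rather than in process memory, so limits survive restarts and apply consistently across replicas; the accounting is the same.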
Secrets management and key management
Private keys, API credentials, database passwords, and encryption keys never exist in source code, environment variables on disk, or unencrypted configuration files. They are stored in dedicated secrets management infrastructure and accessed at runtime through authenticated, audited API calls.
HashiCorp Vault provides secrets management for self hosted infrastructure with dynamic secrets that rotate automatically, lease based access control, and a complete audit log of every secret access. AWS Secrets Manager handles secrets for cloud native deployments with automatic rotation, fine grained IAM policies, and encryption at rest via KMS. For blockchain private keys specifically, the key management strategy depends on the risk profile. Hot wallets use encrypted keystores with hardware security module (HSM) backed signing for high value operations. MPC wallets distribute key shares across multiple parties so no single compromised node can sign a transaction. Multisig wallets require multiple independent signers for treasury operations, with timelocks on high value transfers.
Infrastructure hardening and DDoS protection
Production infrastructure follows the principle of least privilege at every layer. Container images are built from minimal base images with no shell access in production. Kubernetes network policies restrict pod to pod communication to only the paths that the architecture requires. Security groups and firewall rules default to deny all and explicitly allow only necessary traffic. SSH access to production nodes is disabled in favor of session based access through AWS Systems Manager or equivalent, providing a full audit trail of every operator session.
DDoS protection operates at multiple layers. Cloudflare or AWS Shield absorbs volumetric attacks at the network edge. Application layer rate limiting handles targeted API abuse. Backend services are deployed across multiple availability zones with automated failover. RPC endpoints for blockchain interaction use dedicated node infrastructure rather than shared public endpoints, avoiding rate limits and reducing exposure to third party outages. Geographic load balancing routes requests to the nearest healthy region.
Monitoring and incident response
Real time monitoring and anomaly detection
Deployment is not the finish line. It is the point at which the attack surface becomes live. Monitoring must detect anomalous behavior within seconds, not hours, because on chain exploits drain funds in minutes.
Forta provides a decentralized monitoring network with detection bots that watch for suspicious on chain activity. Bots can be configured to alert on large transfers, unusual function call patterns, governance proposal submissions, flash loan usage, and known attacker addresses interacting with the contract. Tenderly alerts monitor contract state changes, failed transactions, and gas usage spikes in real time. Custom alerts trigger when contract balances deviate from expected ranges, when admin functions are called outside of scheduled maintenance windows, or when oracle prices diverge from secondary data sources.
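A balance-deviation alert of the kind described above reduces to a small check. The threshold and severity labels here are illustrative assumptions: the live contract balance is compared against an expected value, and a breach produces an alert payload for the paging system.

```python
# Sketch of a custom balance-deviation alert (threshold and severity
# labels are illustrative, not a specific monitoring product's schema).

DEVIATION_THRESHOLD = 0.05   # page the on-call on >5% deviation

def check_balance(expected, observed):
    deviation = abs(observed - expected) / expected
    if deviation > DEVIATION_THRESHOLD:
        return {"severity": "P1", "deviation": round(deviation, 4)}
    return None                              # within tolerance: no alert

assert check_balance(1_000_000, 990_000) is None   # 1% drift: acceptable
alert = check_balance(1_000_000, 700_000)          # 30% drop: page immediately
print(alert)  # {'severity': 'P1', 'deviation': 0.3}
```

The expected value would come from a rolling model of normal inflows and outflows rather than a constant, but the alerting decision at the end is exactly this comparison.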
Backend monitoring uses Grafana dashboards backed by Prometheus metrics for system health, Sentry for application error tracking, and structured logging through the ELK stack (Elasticsearch, Logstash, Kibana) or cloud native equivalents. Every API request, every transaction submission, and every blockchain event is logged with correlation IDs that allow tracing a single user action across every service it touches.
Circuit breakers and emergency response
When an exploit is detected, the first priority is stopping the bleeding. Circuit breakers are emergency mechanisms built into the contract and the backend that can halt operations before an attacker drains the full value at risk.
On chain circuit breakers include pausable contracts (OpenZeppelin Pausable), withdrawal rate limits that cap the amount any address can withdraw per time period, and guardian multisigs that can trigger emergency shutdown without a full governance vote. Off chain circuit breakers include API kill switches that stop the backend from submitting new transactions, RPC failover that redirects traffic away from compromised endpoints, and automated treasury sweeps that move funds to cold storage when anomalous withdrawal patterns are detected.
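The withdrawal rate limit composes naturally with the pause mechanism. In this sketch (window and cap are illustrative), total outflow per window is capped, and hitting the cap trips a pause that a guardian must clear, bounding the value at risk during an ongoing exploit.

```python
# Sketch of a withdrawal rate-limit circuit breaker (limits illustrative):
# cap total outflow per time window and trip a pause when the cap is hit.

class RateLimitedVault:
    WINDOW = 3600               # one-hour accounting window, in seconds
    MAX_OUTFLOW = 100_000       # maximum total withdrawals per window

    def __init__(self):
        self.paused = False
        self.window_start = 0
        self.window_outflow = 0

    def withdraw(self, amount, now):
        if self.paused:
            raise RuntimeError("vault paused")
        if now - self.window_start >= self.WINDOW:
            self.window_start, self.window_outflow = now, 0  # new window
        if self.window_outflow + amount > self.MAX_OUTFLOW:
            self.paused = True   # trip the breaker; guardian must reset
            raise RuntimeError("rate limit exceeded: vault paused")
        self.window_outflow += amount
        return amount

vault = RateLimitedVault()
vault.withdraw(60_000, now=10)       # normal usage passes
try:
    vault.withdraw(60_000, now=20)   # would exceed the hourly cap
except RuntimeError as e:
    print(e)                         # breaker trips and the vault pauses
assert vault.paused
```

The design tradeoff is deliberate: an attacker who finds an exploit can still drain at most one window's cap before the breaker halts everything, at the cost of occasionally inconveniencing a legitimate whale.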
The key is that circuit breakers must be tested before they are needed. A pause function that has never been called in a test environment is a pause function that might not work when the contract is under attack. Every circuit breaker is tested in fork tests, and the emergency response runbook is reviewed quarterly.
Incident response and post incident analysis
Every project with deployed contracts has a written incident response plan. The plan defines roles (who is the incident commander, who communicates externally, who executes the technical response), escalation thresholds (what constitutes a P1 versus a P2), communication channels (a dedicated war room, not a general Slack channel), and a step by step runbook for the most likely incident types.
- Detection. Automated alerts from Forta, Tenderly, or internal monitoring flag the anomaly. The on call engineer confirms it is a real incident, not a false positive.
- Containment. Circuit breakers activate. If the exploit is on chain, the pause function or guardian multisig executes. If the exploit is off chain, the affected service is isolated.
- Assessment. The team determines the scope. What was affected. How much value is at risk. Is the exploit ongoing or has it concluded.
- Remediation. A fix is developed, tested against a fork of the current state, and deployed. For upgradeable contracts, the fix goes through the proxy upgrade path. For immutable contracts, mitigation may require deploying a new contract and migrating state.
- Communication. Users, partners, and the broader community are informed with honest, specific updates. What happened, what was the impact, what is the plan.
- Post incident review. A blameless retrospective documents the root cause, the timeline, what worked, what failed, and what changes will prevent recurrence. The findings feed back into the threat model and the test suite.