Testing in Production (TiP) Approaches and Risks



What & Why of TiP
"Testing in production (TiP) is a set of software testing methodologies that utilizes real users and production environments in a way that both leverages the diversity of production, while mitigating risks to end users. By leveraging the diversity of production we are able to exercise code paths and use cases that we were unable to achieve in our test lab, or did not anticipate in our test planning"
A to Z Testing in Production: TiP Methodologies, Techniques, and Examples

"In today’s world, testing in production is not only important, but also critical to ensure complex applications work as we expect them to. Particularly with the move to devops and continuous delivery, testing in production has become a core part of testing software"
An Introduction to Testing in Production

TiP Approaches
TiP is a growing practice in the industry, mainly due to the growing complexity of applications, the adoption of continuous delivery & deployment, and the drive to reduce the BUFT (Big Up Front Testing) cost. Setting up a production-like environment is another challenge when testing complex services. Some of the prominent TiP approaches are as follows:
    1. Production Data Analysis
    2. Exposure Control
    3. Synthetic Tests in Production
    4. Destructive Testing


Production Data Analysis
In this approach, existing production data is validated and analyzed against real user and system behaviour. The production environment is monitored closely for alarms, and issues are fixed by priority. There is no change to the production environment while using this approach; all testing / operations are strictly read-only. Be cautious about configuring too many monitors, though, as this can cause network congestion. Different types of production data, and examples of what testers can analyze, are noted below (a minimal sketch of one such check follows the list).

Transaction data
»  Are users abandoning the site during the checkout process?
»  Are a significant number of payment transactions failing?
»  Are videos erroring out during playback?
»  Is transaction data being retained as per the retention policy?
»  Do you see stale resources in different regions? Is content being propagated across servers?
»  Are there broken links or unavailable resources such as images / videos on the site?

Logs (application, errors, events, audit, database, network, etc.)
»  Has the number of errors increased after deployment of a new build?
»  Is a DoS attack under way?
»  Are the results returned by the search engine meaningful?
»  Are there lots of connection errors?
»  Are there certificate-related errors?

Monitoring logs
»  Are transactions resulting in errors? From which region? At what time?
»  Are users facing latency issues?
»  Is the service stable, or is it going down frequently?
»  Is production up and running during deployments?
»  Has the volume of logging changed?

Web analytics tools like Google Analytics
»  Is the bounce rate high?
»  Are performance issues visible for particular region, device, or browser combinations?
»  Do you see a sudden increase in 404s or other errors?
»  Which scenarios take the most time?

Feedback loop
»  Is there negative feedback about the service online?
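As a minimal sketch of one such read-only check, the following Python snippet scans a web server access log (combined log format is assumed) and flags any minute in which 5xx errors exceed a threshold. The log path and threshold are illustrative, not prescriptive.

```python
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"   # hypothetical log location
ERROR_THRESHOLD = 50                     # flag any minute with more than 50 5xx responses

# Combined log format, e.g. ... [10/Oct/2023:13:55:36 +0000] "GET /x HTTP/1.1" 502 ...
LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]*\] "[^"]*" (\d{3}) ')

def error_spikes(path=LOG_PATH):
    """Read-only scan: tally 5xx errors per minute and report spikes."""
    per_minute = Counter()
    with open(path) as f:
        for line in f:
            m = LINE.search(line)
            if m and m.group(2).startswith("5"):
                per_minute[m.group(1)] += 1   # key is the timestamp truncated to the minute
    return {minute: n for minute, n in per_minute.items() if n > ERROR_THRESHOLD}

print(error_spikes())
```

The same pattern extends to transaction data or analytics exports: aggregate, compare against a baseline, and alert, without ever writing to production.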


Exposure Control
This method slowly rolls out a software version to a subset of users before rolling it out to the entire infrastructure and making it available to all. The approach ensures that if there is a failure, it fails fast, and that improvements reach the market quickly through feedback from real users. Different ways the exposure of code / features can be controlled before release to all users are listed below.

A/B Testing (Experimentation for Design)
»  It is also known as split testing or bucket testing.
»  It is essentially an experiment where two or more variants of a page are shown to users at random in an unbiased way, and statistical analysis is used to determine which variant performs better for a given conversion goal.
»  Testing one change at a time helps pinpoint which change affected visitor behaviour. Over time, the combined effect of multiple winning changes can demonstrate measurable improvement.
»  It cuts losses by detecting ineffective features early and increases investment in successful features.
»  The new user experience is usually well tested prior to the experiment.
»  While canary releases are a good way to detect problems and regressions, A/B testing is a way to test a hypothesis using variant implementations.
»  There is a possibility of data being compromised during experiments, so it is important to keep redundant stores and consider read-only access while ramping up users. Rollback must be easy and reliable for this approach to succeed.
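The "statistical analysis" step mentioned above is often a simple two-proportion z-test. Here is a minimal, self-contained sketch; the conversion counts are invented purely for illustration.

```python
import math

def z_test_two_proportions(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from variant A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                     # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error of the difference
    return (p_b - p_a) / se                                      # z statistic

# Invented numbers: 4,000 users per variant, 480 vs 540 conversions.
z = z_test_two_proportions(conv_a=480, n_a=4000, conv_b=540, n_b=4000)
print(f"z = {z:.2f} (|z| > 1.96 means significant at the 5% level)")
```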
Canary Deployment
»  It is about deploying new code to a small subset of the machines in production, verifying that the new bits did not cause regressions (functional or performance), and slowly increasing the exposure of the bits to the rest of the machines in production.
»  It starts with deploying the new version of the software to a subset of the infrastructure to which no users are routed. On success, a few selected users are routed to it; as confidence grows, it is released to more servers and users.
»  The migration phase lasts until all users have been routed to the new version. If issues arise, users are re-routed back to the old infrastructure, which is decommissioned only once there is complete confidence in the new version.
»  Conducting capacity testing is easier using canary releases along with a safe rollback strategy.
»  It can take days to gather enough data to demonstrate statistical significance from an A/B test, while a canary rollout can be carried out in minutes or hours.
»  One drawback of canary releases is that multiple versions of the software need to be managed.
»  Canary releases are also hard when the software is distributed and installed on users' computers or mobile devices.
»  Managing database changes also requires attention when doing canary releases.
»  With A/B testing we experiment with multiple experiences to make sure that we build the right thing; with canary releases we check that the thing was built right.
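A minimal sketch of the ramp-up loop described above. The three callables (route_percentage, error_rate, roll_back) are hypothetical hooks into your router and monitoring; the stages, soak time, and error budget are illustrative.

```python
import time

STAGES = [1, 5, 25, 50, 100]   # % of users routed to the canary
ERROR_BUDGET = 0.01            # abort if the canary error rate exceeds 1%
SOAK_SECONDS = 600             # observe each stage before widening exposure

def canary_rollout(route_percentage, error_rate, roll_back):
    """Gradually widen exposure; roll back on the first bad signal."""
    for pct in STAGES:
        route_percentage(pct)          # shift pct% of traffic to the new version
        time.sleep(SOAK_SECONDS)       # let metrics accumulate
        if error_rate() > ERROR_BUDGET:
            roll_back()                # re-route everyone to the old version
            raise RuntimeError(f"canary failed at {pct}% exposure")
    # 100% reached: the old infrastructure can now be decommissioned.
```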
Feature Switches
»  It is about incrementally building features in the production environment without making them available to real users.
»  It helps gradually build larger features into the existing code without slowing the release cycles.
»  The usual battery of tests can be executed without disturbing real user behaviour.
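A minimal feature-switch sketch, assuming a hypothetical in-memory flag store. Hashing the user ID makes exposure deterministic, so a given user always sees the same experience, and a percentage of 0 keeps the shipped code completely dark.

```python
import hashlib

FLAGS = {"new_checkout": 0}   # hypothetical flag store: feature -> % of users exposed

def is_enabled(feature: str, user_id: str) -> bool:
    """Deterministically bucket a user into [0, 100) and compare
    against the feature's exposure percentage."""
    pct = FLAGS.get(feature, 0)
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < pct

# While the feature is still being built, FLAGS["new_checkout"] stays at 0:
# the code ships to production, but no real user ever sees it.
if is_enabled("new_checkout", user_id="u-12345"):
    pass  # render the new checkout flow
```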


Synthetic Tests in Production
Synthetic tests are automated tests running against production instances. They can be divided into two groups: API tests and user-scenario tests. The "write once, test anywhere" principle is preferred, where the same tests can run against both the test and production environments. Production monitors / diagnostics are enabled to assess and report pass / fail. Performance metrics are monitored very closely, and test automation is shut down in the event of any unacceptable impact on the user experience.
End-to-End User Scenario Execution
»  Runs a subset of an application's automated tests against the live production system on a regular basis.
»  Results are pushed into a monitoring service, which triggers alerts in case of failures.
»  Be cautious about the following, though (a sketch of a tagged, self-cleaning test follows this list):
    Test users should not be visible to real users or able to interact with them.
    Tests should be hidden from search engines.
    Tests should not influence the continuity and stability of the real-user experience.
    Test data should be tagged so that it can be isolated and cleaned up easily.
    Only a limited number of test users should be created.
    PII data should be avoided completely for any testing.
    After verification, test transaction data (e.g. test orders) should be cleaned up.
    Real user data should not be modified, and synthetic data must be identified and handled in such a way as to avoid contaminating production data.
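A hypothetical sketch of such a tagged, self-cleaning synthetic test against production. The endpoint, header name, and test-user ID are invented; the point is that every request is tagged so downstream systems can isolate it, and the test order is deleted after verification.

```python
import requests

PROD_URL = "https://example.com/api/orders"    # hypothetical endpoint
SYNTHETIC_TAG = {"X-Synthetic-Test": "true"}   # lets downstream systems isolate test traffic

def synthetic_checkout_test():
    """Place a tagged test order against production, verify it, then clean up."""
    order = requests.post(
        PROD_URL,
        json={"user_id": "synthetic-user-001", "sku": "TEST-SKU"},
        headers=SYNTHETIC_TAG,
        timeout=10,
    )
    assert order.status_code == 201, f"checkout failed: {order.status_code}"
    order_id = order.json()["id"]
    # Clean up so the test order never reaches the real back office.
    requests.delete(f"{PROD_URL}/{order_id}", headers=SYNTHETIC_TAG, timeout=10)

# A scheduler (cron, a CI job) would run this regularly and push the
# pass/fail result into the monitoring service that raises the alerts.
```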
Load
»  Simulating the production environment in a lab is usually very difficult, which is a primary reason for doing load testing in production even though it is inherently risky.
»  Synthetic load is injected into production systems, usually on top of existing real-user load.
»  Real traffic data (user workflows, user behaviours, and resources) should be collected so that the simulated load is realistic.
»  Service virtualization can be used to emulate responses from third-party services / the back office.
»  Tests should be conducted when usage is low.
»  Generating load on an application that involves a third party indirectly generates load on the partner's environment, which is NOT legal unless the said party has been informed in advance.
»  The entire production environment must be monitored constantly, and tests should be stopped immediately to avoid any production issues (a sketch of such a cut-off follows).
»  Avoid steps that generate records in the back office (e.g. avoid validating the order).
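A minimal sketch of that monitoring cut-off: synthetic load is injected against a read-only endpoint and killed the moment p99 latency degrades. The endpoint, thresholds, and the latency_p99 monitoring hook are all hypothetical.

```python
import threading
import time
import requests

TARGET = "https://example.com/api/search?q=test"   # hypothetical read-only endpoint
LATENCY_CEILING = 2.0                              # seconds; abort if real users would suffer
stop = threading.Event()

def worker():
    while not stop.is_set():
        try:
            requests.get(TARGET, timeout=5)        # inject one synthetic request
        except requests.RequestException:
            pass                                   # failures show up in monitoring, not here
        time.sleep(0.1)                            # ~10 requests/sec per worker

def run_load(latency_p99, workers=20, duration=300):
    """Ramp up synthetic load, but kill it the moment p99 latency degrades."""
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    deadline = time.time() + duration
    while time.time() < deadline:
        if latency_p99() > LATENCY_CEILING:        # hypothetical monitoring hook
            break                                  # unacceptable user impact: stop now
        time.sleep(5)
    stop.set()                                     # shut the load generators down
```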


Destructive Testing
In this approach, the goal is to see how quickly the system can recover when failure happens. Infrastructure faults are deliberately injected into the production system (e.g. services, servers, network) to validate service continuity in the event of real faults, or to measure MTTR (mean time to recovery). According to Netflix engineers Cory Bennett and Ariel Tseitlin, “The best defense against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient.” Below are some tools used in industry for destructive testing in the production environment; a minimal sketch of the idea follows the list.
»  Netflix’s "Simian Army", a set of destructive scripts that the company deploys to simulate various failures
»  "Latency Monkey" induces artificial delays
»  "Conformity Monkey" finds instances that do not adhere to best practices and shut them down
»  "Janitor Monkey" searches for and disposes of unused resources
»  "Chaos Monkey" introduces failures on purpose or randomly kills production instance in AWS



