AI for Testing

Winds of Change….

Software testing conventionally focuses on validating requirements while ignoring the correlation of data points that can be extracted from test results, test processes, defect logs, developer & tester performance, production incidents, code coverage, environment factors and the sentiment of product users in real-life situations. RTAP draws insights from these data points and enables AI (machine learning) based decisioning to reduce software defects, thus surpassing traditional testing practices and driving process and performance efficiencies.

Future of Testing….

Companies across the world are prioritizing their strategic technology adoption plans in the face of growing competition and pressure, which is positively contributing to the software testing industry. As the world embraces digital transformation and companies adopt newer techno-business models to sustain competition, the need for faster deployment of IT solutions will continue to rise. As digital transformation gains ground, organizations are investing more in application development to meet the demands of business, which has boosted spending on testing and quality assurance. This will translate into notable changes in the overall software development & testing process.

In this rapidly changing scenario, there is a paradigm shift in the way customers are building & testing products. Traditional methods of testing are gradually being augmented by more controllable approaches like Agile and SDET (Software Development Engineer in Test). But the biggest change comes from the realization that human effort in software testing has limited scalability and reliability, which directly challenges the continuation of the prevalent labor arbitrage model.

The beginning….

Imagine an AI (machine learning) software product that analyzes every stage of the software testing life cycle and helps you make data-driven decisions to reduce defects, optimize test case execution and reduce human effort. RTAP is such a product!

RTAP can deliver improved quality, faster time to market and a significant reduction in effort, with complete end-to-end test coverage. For the first time in the field of testing, it lays the foundation for a data-driven, cognitive approach to reducing software defects.

It is an accepted fact in the technology industry that simply adding more people to the software testing process will not make products better. Increasing head count is not a scalable solution, and heuristic methods of predicting QA outcomes are not reliable given the sheer size and number of data points to be considered. There is also a growing need for progressive “defect reduction” as a key metric for evaluating the performance of test teams. In fact, some customer surveys have gone so far as to suggest that, in the future, testers should be paid based on the defects they fix!

Results to look for….

RTAP improves testing effectiveness by reducing overfit (redundant test cases) and underfit (key features left untested). It uses machine learning models for Test Suite Optimization, Risk-Based Testing, Traceability and Test Coverage.
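RTAP’s internal models are not described here, but as a rough illustration of what coverage-based test suite optimization can look like, the Python sketch below (with hypothetical test names, features and coverage data) uses a greedy set-cover heuristic to drop redundant test cases while flagging features that no test exercises:

```python
# Illustrative sketch only: greedy, coverage-based reduction of a test suite.
# Test names, features and coverage data are hypothetical; RTAP's actual models may differ.

coverage = {                      # test case -> features it exercises
    "test_login":    {"auth", "session"},
    "test_login_v2": {"auth"},                  # redundant with test_login
    "test_checkout": {"cart", "payment"},
    "test_profile":  {"session", "profile"},
}
required = {"auth", "session", "cart", "payment", "profile", "refunds"}  # "refunds" has no test

selected, uncovered = [], set(required)
while uncovered:
    # pick the test that covers the most still-uncovered features (greedy set cover)
    best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
    if not coverage[best] & uncovered:
        break                     # remaining features have no test at all -> underfit signal
    selected.append(best)
    uncovered -= coverage[best]

print("kept tests:", selected)          # redundant tests dropped -> overfit reduced
print("untested features:", uncovered)  # non-empty -> coverage gap (underfit)
```

In practice the coverage data would come from traceability links or code-coverage tooling rather than a hand-written dictionary, but the optimization idea is the same.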

RTAP uses AI models to make predictions related to defects and helps the testing team incorporate the “Shift Left” techniques that are so important for improving product quality.
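As a minimal, hypothetical sketch of the defect-prediction idea (not RTAP’s actual model), a classifier can be trained on historical change metrics so that high-risk modules are reviewed and tested earlier in the cycle:

```python
# Illustrative sketch only: predicting defect-prone modules from change metrics.
# The feature set and data are made up; RTAP's actual models are not public.
from sklearn.linear_model import LogisticRegression

# Each row: [lines changed, number of commits, cyclomatic complexity, past defects]
X_train = [
    [500, 12, 30, 4],
    [ 40,  2,  5, 0],
    [300,  9, 22, 3],
    [ 10,  1,  3, 0],
]
y_train = [1, 0, 1, 0]          # 1 = module had post-release defects

model = LogisticRegression().fit(X_train, y_train)

# Score a module from the current sprint; a high probability would prompt
# earlier ("shift-left") review and testing of that module.
risk = model.predict_proba([[420, 10, 28, 2]])[0][1]
print(f"predicted defect risk: {risk:.2f}")
```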

RTAP also provides “What-If” analysis to simulate various parametric behaviors, thereby helping the testing team take corrective actions in advance to achieve testing objectives. It prescribes corrective actions learned from historical data, which can be used to reduce defects on a continuous basis. RTAP is also a self-learning product and gets smarter (more accurate) over time by fine-tuning its machine learning models.
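A “what-if” analysis of this kind can be pictured as sweeping planned test parameters through a model fitted on historical release data; the sketch below uses made-up coefficients purely for illustration:

```python
# Illustrative "what-if" sketch: vary planned test parameters and observe the
# effect on predicted escaped defects. The relationship below is hypothetical,
# standing in for a model fitted on historical release data.

def predicted_escaped_defects(regression_coverage, review_hours):
    # made-up fitted relationship: more coverage and review -> fewer escapes
    return max(0.0, 60 - 45 * regression_coverage - 0.8 * review_hours)

for cov in (0.6, 0.7, 0.8, 0.9):
    for hours in (10, 20, 30):
        defects = predicted_escaped_defects(cov, hours)
        print(f"coverage={cov:.0%} review={hours:>2}h -> ~{defects:.0f} escaped defects")
```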

Only RTAP….

Backed by over two years of research, product development, data science and collaboration with one of the largest testing services providers in the world, RTAP has emerged as the only product in the market today that can be quickly configured for any customer’s environment to deliver AI-enabled testing of software products.

RTAP is an integrated product with all the data management utilities, machine learning models and visualization elements necessary for this use case. RTAP shortens your “tools to solutions” journey and can quickly help you deliver software products in a smarter way.

At RTAP we still believe that this is only the beginning of better things to come….

SAP HANA – Scale Out Performance Test Results – Early Findings

Customer Performance Requirements

I was recently asked by a customer to recommend a solution that could solve the problem of performing analytics on data being generated at ATMs and POS machines. The problem, in essence, was that while the data was being generated there was no way to query it on the fly. Traditional databases like Oracle, DB2, SQL Server and others could not solve this problem with their current product sets, since these databases were not built for analyzing large amounts of data online. They also required constant tuning, indexing and materialization of the data before you could run any sort of business intelligence query on it. Essentially, someone had to prepare the data and make it query-ready, and to say the least this costs a lot of time and money, which this particular customer was trying to avoid.

Linear Scaling for Big Data in Real Time

In my opinion the only way to do this was with an in-memory database like SAP HANA, which was built for running analytics on live data. I did have some doubts about HANA’s scalability and asked SAP for guidance. They briefed me on a recent scale-out test of HANA in which they simulated 100 TB of actual BW customer raw data on an SAP-certified configuration: a 16-node cluster, each node with 4 IBM X5 CPUs (10 cores each) and 512 GB of memory. The test database held 100 TB of data, with one large fact table (85 TB, 100 billion records) and several dimension tables. A 20x data compression was observed, resulting in a 4 TB HANA instance distributed equally across the 16 nodes (238 GB per node). Without indexing, materializing the data or caching the query results, the queries ran in 300 to 500 milliseconds, which in my opinion is close enough to real time. The tests also included ad-hoc analytic query scenarios where materialized views cannot easily be used, such as listing the top 100 customers in a sliding time window and making year-over-year comparisons for a given month or quarter.

In my opinion these tests demonstrate that SAP HANA offers linear scalability with sustained performance at large data volumes. Very advanced compression methods were applied directly to the columnar database without degrading query performance. The standard BW workload provides validation not only for SAP BW customers but for any data mart use case. This is the first time I have encountered a solution offering BW the potential to access raw transactional ERP data in near real time.

Data Management Architecture for Next-generation of Analytics

Readers of this blog may also be interested to know that new business-intelligence-optimized databases such as HANA have inherent architectural advantages over traditional databases. Old database architectures were optimized for transactional data storage on disk-based systems. These products focused on transactional integrity in the age of single-CPU machines connected over low-bandwidth distributed networks, while optimizing the use of expensive memory. The computing environment has changed significantly over the last decade. With multi-core architectures available on commodity hardware, processing large volumes of data in real time over high-speed distributed networks is becoming a reality with products such as SAP HANA.

All in-memory Database Appliances are Not Created Equal

Apparently, some solutions in the market, like Oracle’s Exadata, also cache data in Exalytics/TimesTen for in-memory acceleration. However, TimesTen is a row-based in-memory database, not a columnar database like HANA, and columnar stores are faster for business intelligence applications. Oracle also uses these databases as an in-memory cache, unlike HANA, which serves as the primary data persistence layer for BW or a data mart. Therefore, in my opinion, Oracle’s solution is better suited to faster transactional performance but creates data latency issues for the real-time data required for analytics. From a cost and effort perspective it will also require a significant amount of tuning and a large database maintenance effort when running ad-hoc queries (sliding time-window or month-to-month comparisons, etc.), because you are trying to reconfigure an architecture meant for transactional systems and deploy it for analytics.
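For readers less familiar with the distinction, the toy Python sketch below (entirely hypothetical data) shows why columnar layouts favor analytics: an aggregate only touches the column it needs, and a contiguously stored column also compresses far better than interleaved rows, which is how in-memory columnar engines keep large data sets small:

```python
# Illustrative sketch of row store vs. column store layouts. Data is made up.

rows = [  # row store: each record kept together (good for OLTP lookups and updates)
    {"txn_id": 1, "atm_id": "A7", "amount": 120.0, "city": "Pune"},
    {"txn_id": 2, "atm_id": "B3", "amount":  80.0, "city": "Mumbai"},
    {"txn_id": 3, "atm_id": "A7", "amount": 200.0, "city": "Pune"},
]

columns = {  # column store: each attribute stored contiguously (good for scans and aggregates)
    "txn_id": [1, 2, 3],
    "atm_id": ["A7", "B3", "A7"],
    "amount": [120.0, 80.0, 200.0],
    "city":   ["Pune", "Mumbai", "Pune"],
}

# Row store: every full record must be touched just to read one field.
total_row = sum(r["amount"] for r in rows)

# Column store: only the "amount" column is scanned.
total_col = sum(columns["amount"])

assert total_row == total_col
print("total withdrawals:", total_col)
```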

I hope this blog is useful and provides general guidelines to people interested in considering new database technologies like SAP HANA.