The V-Model is a software development method often found in domains with strict safety and security requirements, which are typical of highly regulated industries. Combining the traditional V-Model with a disciplined agile approach promises to allow as much agility as possible, while addressing the issues often found in AIoT initiatives: complex dependencies, different speeds of development, and the "first time right" requirements of those parts of the system which can no longer be updated after the Start of Production (SOP).

Foundation: V-Model

The V-model is a systems development lifecycle which has verification and validation "built in". It is often used for the development of mission-critical systems, e.g. in automotive, aviation, energy and military applications. It also tends to be used in hardware-centric domains. The V-model - not surprisingly - uses a V-shaped visual representation, where the left side of the "V" represents the decomposition of requirements, as well as the creation of system specifications ("definition and decomposition"). The right side of the "V" represents the integration and testing of components. Moving up the right side, testing usually starts with basic verification (e.g. unit tests, then integration tests), followed by validation (e.g. user acceptance tests).

When applying the V-model to AIoT, several dimensions need to be taken into consideration, usually including hardware, software, AI and networking. In addition to the standard verification tests (unit tests, integration tests) and validation tests (user acceptance and usability tests), the V-model for AIoT also needs to address interoperability testing, performance testing, scalability testing and reliability testing. The highly distributed nature of AIoT systems will pose specific challenges here.

Test automation is key to ensuring a high level of test efficiency and test coverage. On the software side, there are many tools and techniques available to support this. In the AI world, these kinds of tools and techniques are only just beginning to emerge, which makes it likely that a more custom approach will be required. In the embedded and hardware world, simulation techniques like Hardware-in-the-Loop (HIL), Software-in-the-Loop (SIL) and Model-in-the-Loop (MIL) are well established. However, most AIoT products will also require testing of the actual physical product, and of how it performs in the field in different types of environments. This again will be a challenge, and some ingenuity will be required to automate testing of physical products wherever possible.

Agile V-Model

The AIoT framework tries to strike a good balance between the agile software world and the less agile world of often safety-critical, complex and large(r)-scale AIoT product development, which involves hardware and potentially even manufacturing elements. It is therefore important to understand how an agile method can work together with a V&V-centric (verification and validation) approach like the V-model. The logical consequence is the Agile V-model. Combining agile development with the V-model is no contradiction; the two can work very well together, as shown in the figure below:

Agile methods use story maps including epics, themes, features and user stories for logical decomposition. This maps well to the left side of the V.
Continuous Integration / Continuous Test / Continuous Delivery are inherently agile methods, which map well to the right side of the V.
The key assumption is that the V-model is not used like one large waterfall approach. Instead, the Agile V-model must ensure that the sprints themselves become Vs according to the V-model.

There are two options to implement the latter:

Option 1: Each sprint becomes a complete V, including development and integration / test.
Option 2: The agile schedule introduces the concept of dedicated integration sprints, so that one V becomes two sprints: one development sprint and one integration sprint.
There are pros and cons to both approaches. The complexity and scale of the project will play a role in determining the best setup.
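The difference between the two options can be made concrete with a small scheduling sketch. It is illustrative only: the function names and the textual sprint labels are assumptions, not part of the Agile V-Model itself.

```python
from typing import List


def schedule_combined(num_vs: int) -> List[str]:
    """Option 1: every sprint is a complete V (development plus integration / test)."""
    return [
        f"Sprint {v}: develop + integrate/test (V {v})"
        for v in range(1, num_vs + 1)
    ]


def schedule_alternating(num_vs: int) -> List[str]:
    """Option 2: each V spans two sprints, one development and one integration sprint."""
    sprints: List[str] = []
    for v in range(1, num_vs + 1):
        sprints.append(f"Sprint {2 * v - 1}: develop (V {v})")
        sprints.append(f"Sprint {2 * v}: integrate/test (V {v})")
    return sprints


if __name__ == "__main__":
    for line in schedule_combined(2) + schedule_alternating(2):
        print(line)
```

For the same number of Vs, the alternating schedule needs twice as many sprints, which is one way to see the inefficiency this option can introduce.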

For most projects / product teams, it is recommended that development and integration are combined in a single sprint. Only for projects with a very high level of complexity and dependencies - e.g. between components developed by different organizations - is it recommended to alternate between development and integration sprints. The latter approach is likely to add inefficiencies to the development process, but might be the only way to effectively deal with alignment across organizational boundaries.

The diagram above shows how a sprint is executed in the agile V-Model, including:

Story Map & Definition of Done (DoD): The Story Map is the topmost agile requirements engineering tool (see Product/Solution Design). An AIoT Story Map usually includes Epics, Features and later User Stories. This is usually also where the Definition of Done should reside, defining more general acceptance criteria.
Component Architecture: Even though the Agile Manifesto values "working software over comprehensive documentation", most complex projects (i.e. most AIoT projects) require at least a high-level component architecture in order to ensure alignment between the different teams. The focus here should be on functional components and their interactions (see AIoT Product Architecture).
User Stories + Acceptance Criteria (AC): This is usually the most fine-grained requirements definition in the Story Map. User Stories should be concise, fit onto a sticky note, and be accompanied by acceptance criteria which are specific to the user story.
Mapping: The next step is to create a mapping between each user story worked on in the upcoming sprint and the components which are required in order to implement it. These can be existing components, or components to be newly created. A key result of this mapping is the identification of the required experts to form the feature team responsible for the implementation of the user story.
Coding / Doing: In AIoT, the doing will not always be coding - it can include hardware development or AI/ML-related work as well. The different tasks at hand will often have different development speeds, and it might not always be possible to create a potentially shippable product increment. The Agile V-Model recommends that if this is the case, at least mockup implementations of the public interfaces should be provided for integration testing.
Component Integration: Component integration should focus on integrating and testing the set of components required for a particular user story. This can sometimes mean that these components are embedded into an environment which simulates auxiliary components and/or production data. This should be handled automatically by the CI (Continuous Integration) infrastructure.
Verification: Supported by the CT (Continuous Testing) infrastructure, verification will focus on unit tests, as well as testing the compatibility of the components in scope of the particular user story.
System Integration: All components changed/created by the current sprint then need to be integrated to create the next, potentially shippable increment of the system.
Validation: The validation phase will focus on User Acceptance Tests (UAT). The physical components of the AIoT system should also be tested in the test lab, or undergo field tests. Because field tests can be quite elaborate, they might have dedicated sprints assigned to them.
Production: Finally, if all goes well, the sprint results can be moved to the production system. In AIoT, the real production system will most likely only be available after the Start of Production (SOP) of the required hardware. After SOP, taking Edge components into production will probably require use of OTA capabilities.
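The sequence of steps above can be sketched as an ordered list of stages with optional quality gates. This is a minimal sketch: the step names follow the list above, while `run_sprint`, the `gates` parameter and the rule that a step without a gate counts as passed are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Step names follow the list above; everything else is illustrative.
SPRINT_STEPS: List[str] = [
    "Story Map & Definition of Done",
    "Component Architecture",
    "User Stories + Acceptance Criteria",
    "Mapping",
    "Coding / Doing",
    "Component Integration",
    "Verification",
    "System Integration",
    "Validation",
    "Production",
]


def run_sprint(
    gates: Dict[str, Callable[[], bool]]
) -> Tuple[List[str], Optional[str]]:
    """Walk through the steps in order and stop at the first failing gate.

    Returns the completed steps and the name of the failed step (or None).
    Assumption: a step without a registered gate is treated as passed.
    """
    completed: List[str] = []
    for step in SPRINT_STEPS:
        gate = gates.get(step)
        if gate is not None and not gate():
            return completed, step
        completed.append(step)
    return completed, None


if __name__ == "__main__":
    done, failed = run_sprint({"Verification": lambda: False})
    print("completed:", done)
    print("failed at:", failed)
```

In a real setup, the gates for Component Integration and Verification would be backed by the CI and CT infrastructure rather than by in-process callables.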

Example: The ACME Vacuum Robot

To illustrate the use of the Agile V-Model, a realistic example is discussed in the following. This example is a robot vacuum cleaning system, which combines a smart, connected vacuum robot with a cloud-based backend, as well as a smart app for control. The example will provide a general introduction first, before discussing an example sprint in the system development in more detail.

Vacuum Robot Systems

As introduced earlier, modern robot vacuum cleaners are very intelligent, connected products. Even the most basic versions provide collision, wheel, brush and cliff sensors. More advanced versions use a visual sensor combined with a VSLAM algorithm (Visual Simultaneous Localization and Mapping). The optical system can identify landmarks on the ceiling, as well as judge the distance between walls. The most advanced systems utilize LIDAR technology (Light Detection and Ranging) to map out their environment, identify room layouts and obstacles, and provide input for computing efficient routes and cleaning methods. For example, the robot can decide to make a detour vs. switching into the built-in "climb over obstacle" mode. Another example is the automatic activation of a "carpet boost" mode. IoT connectivity to the cloud enables integration with user interface technology such as smart mobile devices or smart home appliances for voice control ("clean under the dining room table"). Edge AI algorithms deployed on the robot are used to control these processes in advanced models.

Story Map

According to the story map structure proposed by the AIoT Framework, the story map for the vacuum robot system includes epics and features on the top level. The epics include Human / Machine Interfaces, the actual cleaning functions, management of the maps for the areas to be cleaned, navigation / sensing, system configuration, and status / history.

Component Architecture

The Component Architecture for the ACME Vacuum Robot highlights the key functional components in three clusters: the robot itself (Edge), the cloud backend, as well as the smart phone app.

On the robot, two embedded components provide control over the robot functions, as well as access to the sensor data. Higher-level components include the control of the robot movements (based on AI/ML, potentially with dedicated hardware), the robot configuration, as well as remote access to the robot APIs.

The cloud services include basic services for managing map data, status/history data, as well as user and robot configuration data. The robot control component enables remote access to the robot.

Finally, the smart phone / mobile app provides components for robot configuration and map management, all accessible via the main screen of the app.

Note that - in the spirit of the agile principle of "working software over comprehensive documentation" - the component architecture documentation does not have to be very detailed. Only the main functional components are listed here. Depending on the project, usage dependencies can be added, if such a level of detail is deemed relevant and the maintenance of the information is realistic.

User Story & Acceptance Criteria

In this example, we are focusing on the Epic "Configuration". This includes features like "Cleaning Mode" (e.g. silent, standard, or power mode), "Cleaning Schedule Management", "User Account Management", "WiFi Configuration", as well as "Software Update Management". The Definition of Done provides higher level acceptance criteria.

In our example, we will focus on the "Cleaning Mode" feature. This contains a User Story "change cleaning mode", including some acceptance criteria specific to this user story. The idea is that users can select different cleaning modes via their smartphone app, which will then be supported by both the ad-hoc and the scheduled cleaning modes.
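Acceptance criteria become most useful when they can be checked automatically. The following minimal sketch shows what this could look like for the "change cleaning mode" story. The three modes are taken from the text; the class `RobotConfiguration`, its methods and the wording of the criteria are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class CleaningMode(Enum):
    SILENT = "silent"
    STANDARD = "standard"
    POWER = "power"


@dataclass
class RobotConfiguration:
    """Hypothetical stand-in for the robot configuration component."""

    mode: CleaningMode = CleaningMode.STANDARD

    def change_cleaning_mode(self, mode: CleaningMode) -> None:
        self.mode = mode

    def start_adhoc_cleaning(self) -> CleaningMode:
        # Returns the mode the ad-hoc cleaning run is started with.
        return self.mode

    def start_scheduled_cleaning(self) -> CleaningMode:
        # Returns the mode the scheduled cleaning run is started with.
        return self.mode


def check_acceptance(config: RobotConfiguration) -> Dict[str, bool]:
    """Evaluate one illustrative criterion per mode and cleaning type."""
    results: Dict[str, bool] = {}
    for mode in CleaningMode:
        config.change_cleaning_mode(mode)
        results[f"ad-hoc cleaning uses {mode.value} mode"] = (
            config.start_adhoc_cleaning() is mode
        )
        results[f"scheduled cleaning uses {mode.value} mode"] = (
            config.start_scheduled_cleaning() is mode
        )
    return results


if __name__ == "__main__":
    for criterion, passed in check_acceptance(RobotConfiguration()).items():
        print(("PASS" if passed else "FAIL"), criterion)
```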

Mapping User Story to Components and Feature Team

Having defined the user story, the next step is to identify which components are required for the story. The "Change Cleaning Mode" story will require a robot configuration component on the smartphone. This will have to interact with the robot configuration component in the cloud. In order to record the change, an interaction with the status / history component is required. Finally, the remote service component on the robot will receive the selected mode, and configure the robot accordingly.

Based on the analysis of the functional components, the required members of the feature team for this user story can be identified: They include a domain expert, an embedded developer for the components on the robot, a cloud developer, a mobile app developer, and an integration / test expert. Note that some organizations strive to employ full-stack developers. However, in this case it seems unlikely that a single developer will have all the required skills.
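The mapping from user story to components to feature team can be captured in a simple data structure. The sketch below uses the components and roles named above; the component identifiers, the lookup tables and the function `feature_team` are assumptions chosen for illustration.

```python
from typing import Dict, List, Set

# Components required by the "Change Cleaning Mode" story, with their cluster.
# The identifiers are hypothetical; the clusters follow the component architecture.
COMPONENT_CLUSTER: Dict[str, str] = {
    "app/robot-configuration": "mobile app",
    "cloud/robot-configuration": "cloud",
    "cloud/status-history": "cloud",
    "robot/remote-service": "robot",
}

# Which developer role covers which cluster.
CLUSTER_ROLE: Dict[str, str] = {
    "mobile app": "mobile app developer",
    "cloud": "cloud developer",
    "robot": "embedded developer",
}

# Assumption: these roles are needed regardless of the components involved.
ALWAYS_REQUIRED: Set[str] = {"domain expert", "integration / test expert"}


def feature_team(components: List[str]) -> Set[str]:
    """Derive the roles needed to implement a story from its components."""
    roles = set(ALWAYS_REQUIRED)
    for component in components:
        roles.add(CLUSTER_ROLE[COMPONENT_CLUSTER[component]])
    return roles


if __name__ == "__main__":
    print(sorted(feature_team(list(COMPONENT_CLUSTER))))
```

A story that touches only cloud components would yield a smaller team, which makes the staffing impact of a story visible during sprint planning.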

Sprint Planning

To kick off a new sprint, the sprint planning event brings together all relevant stakeholders to agree what can be delivered in the sprint and how that work will be achieved. After having agreed on the user stories for the upcoming sprint, in the Agile V-Model each user story must be mapped to the corresponding components in the component architecture.

A useful tool for documenting the component-related tasks for a user story is the notation in the figure below:

Plan: New Component - this indicates that a new component will be introduced
API Definition - this indicates that an API definition for a new component is planned (important for the "API first" strategy)
Mockup/Test Implementation - indicates that as a first deliverable, a mockup or test implementation will be provided (important for the "divide and conquer" strategy to support development at different speeds and manage overall systems complexity)
Stable Implementation - indicates that a stable component implementation is available
Plan: Add / Change Service - indicates that a component must be changed or a new service will be added to the component

Based on the above notation, the evolution of the "Change Cleaning Mode" user story over multiple sprints can be documented. Please note that it might not be necessary to do all this planning up-front. It should usually be sufficient to plan for the target picture, as well as the results of the currently planned sprint on the way to achieving this target picture.
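The "API Definition" and "Mockup/Test Implementation" entries of the notation can be illustrated with a short sketch: the API is defined first, a mockup implements it, and a client is integration-tested against the mockup before a stable implementation exists. The names `RobotConfigurationApi`, `MockRobotConfiguration` and `change_mode_from_app` are hypothetical and not taken from the framework.

```python
from typing import List, Protocol, Tuple


class RobotConfigurationApi(Protocol):
    """API definition, agreed before any implementation exists ("API first")."""

    def set_cleaning_mode(self, robot_id: str, mode: str) -> bool:
        ...


class MockRobotConfiguration:
    """Mockup implementation: records calls and validates the mode, no real robot."""

    VALID_MODES = {"silent", "standard", "power"}

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def set_cleaning_mode(self, robot_id: str, mode: str) -> bool:
        self.calls.append((robot_id, mode))
        return mode in self.VALID_MODES


def change_mode_from_app(api: RobotConfigurationApi, robot_id: str, mode: str) -> str:
    """Client-side logic that depends only on the API, not on an implementation."""
    return "mode changed" if api.set_cleaning_mode(robot_id, mode) else "mode rejected"


if __name__ == "__main__":
    mock = MockRobotConfiguration()
    print(change_mode_from_app(mock, "robot-1", "silent"))
    print(change_mode_from_app(mock, "robot-1", "turbo"))
```

Because the client depends only on the API definition, the mockup can later be replaced by the stable implementation without changing the client, which supports development at different speeds.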

Details of a Single Sprint in the Agile V-Model

Applying the approach outlined in the previous section to a single sprint, the figure below shows how, at the beginning of the sprint, the "as is" component architecture is reviewed ("sprint start"). Based on this, the goal for this sprint is outlined ("sprint goal"). In this example, a single new component will be introduced (colored dark grey). At the end of the sprint, we have to go back to the original plan and review it as part of the verification process. In this case, the goal was achieved, as indicated by the "new" component colored in green.
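The review of the sprint goal against the actual result can be sketched as a comparison of component states. The states loosely mirror the notation introduced for sprint planning; their strict ordering, the component names and the function `review_sprint` are assumptions made for this illustration.

```python
from enum import Enum
from typing import Dict, List


class ComponentState(Enum):
    # Assumption: states are ordered from least to most mature.
    PLANNED = 1
    API_DEFINED = 2
    MOCKUP = 3
    STABLE = 4


def review_sprint(
    goal: Dict[str, ComponentState], actual: Dict[str, ComponentState]
) -> List[str]:
    """Return the components that did not reach the state set in the sprint goal.

    A component missing from `actual` is treated as still PLANNED.
    """
    return sorted(
        name
        for name, target in goal.items()
        if actual.get(name, ComponentState.PLANNED).value < target.value
    )


if __name__ == "__main__":
    sprint_goal = {"cloud/robot-configuration": ComponentState.STABLE}
    sprint_result = {"cloud/robot-configuration": ComponentState.STABLE}
    missed = review_sprint(sprint_goal, sprint_result)
    print("goal achieved" if not missed else f"not achieved: {missed}")
```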

Of course in a larger organization, multiple teams might work on multiple user stories - as indicated by the figure below. Cross-dependencies between these teams have to be identified on the component level. Also, it is usually a good idea to have a commonly agreed Definition of Done, while each feature team can have individual Acceptance Criteria for the user stories they are working on.
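One simple way to identify such cross-dependencies is to check which components are touched by more than one feature team in the same sprint. The sketch below assumes that each team's planned components are known from the story-to-component mapping; the team and component names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_components(team_components: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return every component planned by more than one team, with those teams."""
    teams_by_component: Dict[str, Set[str]] = defaultdict(set)
    for team, components in team_components.items():
        for component in components:
            teams_by_component[component].add(team)
    return {
        component: sorted(teams)
        for component, teams in teams_by_component.items()
        if len(teams) > 1
    }


if __name__ == "__main__":
    plan = {
        "team cleaning mode": {"cloud/robot-configuration", "cloud/status-history"},
        "team schedule": {"cloud/robot-configuration", "app/schedule"},
    }
    print(shared_components(plan))
```

Each shared component is a candidate for explicit alignment between the teams involved, for example on its API definition.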

At the end of a sprint, the sprint retrospective should summarize any key findings from the sprint's V&V tasks which need to carry over to the next sprint. For example, findings during User Acceptance Tests might require a change to a user story. Or a feature team might have decided on an ad-hoc change to the planned component architecture. These kinds of findings must either be reflected by changes to the backlog (for user story definitions), or by updating the architecture plans and documentation.

Agile V-Model and AIoT

Taking together everything discussed so far for IoT and AI, as well as the Agile V-Model, and combining it with the product organization discussion from the chapter on Product Organization, the complete picture is shown in the diagram below: The AIoT workstreams are synchronized by applying the Agile V-model across all AIoT workstreams. Development and integration sprints alternate (or are integrated into single sprints), in order to ensure that at the end of each agile V-sprint, a potentially shippable increment is achieved (although in reality this will not be achievable for the hardware components - some ingenious approach still has to be applied here).

Agile V-Model, AIoT, and SOP

Finally, this discussion should conclude by mentioning what is probably the biggest differentiator between a "normal" cloud project and an AIoT project: The Start of Production, or SOP. This is the point in time when the mass production of the smart, connected products starts, and they leave the factory to be deployed in the field. Any required changes to the hardware configuration of the product will now be extremely difficult to achieve (usually involving costly product recalls, which nobody wants). In a world where manufacturers want to utilize the AIoT to constantly stay in contact with their products after they have reached the customer, and provide the customer with new digital services and applications - rolled out via OTA - it becomes extremely important to understand that the V&V process does not stop with the SOP - at least not for anything that is software. And with software-defined vehicles, software-defined robots and software-defined everything, this is fast becoming the new normal.