More on modules

IModule and derived interfaces

You already know that each network consists of so-called modules. In this chapter you will learn how a module works and how to write new modules. A module is a class which implements the IModule interface. So let's have a look at the UML diagram of IModule and the derived interfaces:

The following sections present a brief introduction to these interfaces. For a complete description see the OpenVIP API reference documentation.

The internals of a module

Imagine that you prepare a network description and submit it to the core. What does the core do?

First, it asks all the modules to enumerate their input and output connectors. This is done using the EnumConnectors method. Here is the full prototype:

void openvip::IModule::EnumConnectors(out ConnectorDescList in_conns, out ConnectorDescList out_conns)

The module simply assembles a list of its input connectors and puts it into in_conns; the output connectors go into out_conns. A connector is described using a ConnectorDesc structure:

Each connector must be given a name (e.g. audio0, video0 etc.) and a flag indicating whether the module requires the connector to be linked to another module. In addition it is necessary to specify the corresponding stream type. Recall that OpenVIP supports three data types: video frames, audio buffers, and data packets (see Data types in OpenVIP). The stream type STREAM_VIDEO means that the corresponding connector transports video frames, STREAM_AUDIO transports only audio buffers, and finally STREAM_DATA only data packets.
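
The exact declaration can be found in the API reference; pieced together from the fields just described and the three-argument constructor used later in this chapter, the structure looks roughly like this (a sketch - the field names are illustrative):

// A sketch of ConnectorDesc, inferred from the description above;
// the authoritative declaration is in the OpenVIP API reference.
struct ConnectorDesc
{
    StreamType  type;      // STREAM_VIDEO, STREAM_AUDIO or STREAM_DATA
    std::string name;      // e.g. "audio0", "video0"
    bool        required;  // must this connector be linked to another module?

    ConnectorDesc(StreamType t, const std::string& n, bool req)
        : type(t), name(n), required(req) {}
};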

At this moment the core is able to link the modules and check whether the supplied network is correct, i.e. whether all required connectors have a link.

Before the computation starts, all modules have to provide more detailed information about the data they will produce. This is done using the SetStreams method, which has the following prototype:

void openvip::IModule::SetStreams(in StreamInfoList in_streams, out StreamInfoList out_streams)

The module has to create the out_streams list containing as many StreamInfo objects as there are output connectors. The information in StreamInfo is different for different stream types:

For a STREAM_VIDEO stream the module supplies a VideoStreamInfo object with information about the video frames it will produce, including their dimensions, frame rate, aspect ratio and the total number of frames. An AudioStreamInfo object corresponds to a STREAM_AUDIO stream and contains the sample rate, the number of audio channels, and the total number of samples. A DataStreamInfo object simply provides information about the length of a STREAM_DATA stream in seconds.
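
As a rough sketch of the information each of these objects carries (the field names below are illustrative, not the actual OpenVIP declarations):

// Illustrative sketches only; see the API reference for the real interfaces.
struct VideoStreamInfo              // STREAM_VIDEO
{
    unsigned      width, height;    // frame dimensions in pixels
    double        frame_rate;       // frames per second
    double        aspect_ratio;
    unsigned long frame_count;      // total number of frames
};

struct AudioStreamInfo              // STREAM_AUDIO
{
    unsigned      sample_rate;      // samples per second
    unsigned      channels;         // number of audio channels
    unsigned long sample_count;     // total number of samples
};

struct DataStreamInfo               // STREAM_DATA
{
    double length;                  // stream length in seconds
};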

A question arises now: how does a module determine its output stream parameters if they depend on the data it will get from its input connectors? E.g. a simple video filter such as Flip produces video frames of exactly the same type as the input video frames. The answer is that the core first asks the module which precedes our video filter in the network about the data it will produce (using SetStreams() again) and then supplies this information to our video filter using the in_streams parameter of the SetStreams() method. This means that the input parameter in_streams contains one StreamInfo object for each input connector. Our video filter then simply copies the frame width, height etc. to one of the StreamInfo objects in the out_streams list.

This procedure always makes sense - since networks cannot contain cycles, there is always a module which doesn't have any input connectors (these are usually the input modules which read data from files). The core first calls SetStreams() on such a module, then on its successors, and so on.

At this point the core has collected information about all modules in the network and the computation can begin. It is driven by a scheduling algorithm, which decides what to compute at each moment (see Internals of Network Processing). The scheduler asks the modules to produce particular pieces of data.

Imagine we have a simple video filter module which reads a video frame from its input stream, performs some operation on it and passes it to the output stream. Assume that the core asks our module to produce video frame number N. The module replies to the core that in order to produce this frame it needs frame N from the preceding module in the network.

In general a module may require several input frames to produce a single output frame (consider a motion blur module which blends successive frames together). The communication between the core and the module about these requirements proceeds using the method

void QueryRequirements(in RequestList requests, out RequestList requirements);

Its input and output parameters are lists of Request structures:

A request simply identifies the connector number (always an output connector in the case of the requests parameter and always an input connector in the case of the requirements parameter) and the piece of data required (see Data types in OpenVIP for an overview of address types).
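
Pieced together from this description, a Request looks roughly like this (a sketch; the address member is a placeholder for the address types described in Data types in OpenVIP):

// A sketch of Request; the 'connector' member also appears in the
// QueryRequirements example later in this chapter, 'address' is illustrative.
struct Request
{
    unsigned connector;  // output connector index in 'requests',
                         // input connector index in 'requirements'
    Address  address;    // which piece of data is meant, e.g. a frame number
};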

The core calls the module's QueryRequirements method to ask it about the data required to fulfill its requests. The module reads the requests list, creates the requirements list and returns it to the core. Now it is the core's task to obtain the data specified in requirements. It usually generates a request to the preceding module, and so on. Note that each network must contain a module with no input connectors; such a module usually reads data from a file and therefore has no further requirements.
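
To make the motion blur example mentioned above concrete: a module which blends each frame with its predecessor would translate a request for output frame N into requirements for input frames N-1 and N. The sketch below is hypothetical; in particular the frame member of Request stands in for the real address types:

// Hypothetical motion blur filter: producing output frame N requires
// input frames N-1 and N. The 'frame' member is illustrative; real code
// would use one of the address types from "Data types in OpenVIP".
void MotionBlur::QueryRequirements(const RequestList& requests, RequestList& requirements)
{
    for (RequestList::const_iterator i = requests.begin(); i != requests.end(); i++)
    {
        Request prev(*i);
        prev.connector = 0;                   // our single input connector
        if (prev.frame > 0)
            prev.frame--;                     // the preceding frame (clamped at 0)
        requirements.push_back(prev);

        Request curr(*i);
        curr.connector = 0;
        requirements.push_back(curr);         // the requested frame itself
    }
}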

Finally, after the core has obtained the data specified in requirements, it supplies them to the module using the method

void Process(in RequestList requests, in DataFragmentList inputs, out DataFragmentList outputs)

The requests parameter has the same meaning as in QueryRequirements. The inputs list now contains the data the module asked for. After the module has processed them, it returns the output in the outputs list.

A simple module example

Let's put the theory from the previous section into practice by designing a simple module which decomposes video frames into their R, G and B channels. The code presented here is a simplified version of the RGBDecompose module from the OpenVIP distribution.

We start with the class declaration:

class RGBDecompose : public IModule
{
public:
    void EnumConnectors(ConnectorDescList& in_conns, ConnectorDescList& out_conns);
    void SetStreams(const StreamInfoList& in_streams, StreamInfoList& out_streams);
    void QueryRequirements(const RequestList& requests, RequestList& requirements);
    void Process(const RequestList& requests, const DataFragmentList& inputs, DataFragmentList& outputs);

    UPF_DECLARE_CLASS(RGBDecompose)
};

UPF_IMPLEMENT_CLASS(RGBDecompose)
{
    UPF_INTERFACE(IModule)
}

As you can see, it is sufficient to implement only the four methods from the IModule interface. The UPF library learns about the existence of our class through the UPF_DECLARE_CLASS and UPF_IMPLEMENT_CLASS statements. We usually want to build the code in the form of a dynamic library and therefore we use the statement

UPF_DLL_MODULE()
{
    UPF_EXPORTED_CLASS(RGBDecompose)
}

All we have to do now is to write the code for the four methods. The EnumConnectors method will look like this:

void RGBDecompose::EnumConnectors(ConnectorDescList& in_conns, ConnectorDescList& out_conns)
{
    // one video input which must be linked...
    in_conns.push_back(ConnectorDesc(STREAM_VIDEO, "video0", true));
    // ...and three video outputs, one per colour channel
    out_conns.push_back(ConnectorDesc(STREAM_VIDEO, "video_r", false));
    out_conns.push_back(ConnectorDesc(STREAM_VIDEO, "video_g", false));
    out_conns.push_back(ConnectorDesc(STREAM_VIDEO, "video_b", false));
}

The code says that our module has an input connector called video0 of type STREAM_VIDEO - this is the stream for reading input video frames. There are also three output connectors called video_r, video_g and video_b of type STREAM_VIDEO. The images corresponding to the R, G and B channels of the input frames will go there.

The SetStreams method is simple, too:

void RGBDecompose::SetStreams(const StreamInfoList& in_streams, StreamInfoList& out_streams)
{
    // propagate the input stream's parameters unchanged to all three outputs
    Ptr<IStreamInfo> info = in_streams[0];
    out_streams.push_back(info);
    out_streams.push_back(info);
    out_streams.push_back(info);
}

Recall that the SetStreams method is responsible for setting the output video streams' width, height, aspect ratio etc. However, our module won't change any of these properties, so it is sufficient to set the output streams to match the input stream exactly.

Here comes the QueryRequirements method:

void RGBDecompose::QueryRequirements(const RequestList& requests, RequestList& requirements)
{
    for (RequestList::const_iterator i = requests.begin(); i != requests.end(); i++)
    {
        // whatever piece of data is requested on any output connector,
        // we need the same piece from our single input connector (number 0)
        Request r(*i);
        r.connector = 0;
        requirements.push_back(r);
    }
}

The core can ask for video frame number N from any of the three output streams. The code above says that to fulfill such a request the module first needs to get frame number N from its input stream.

The actual RGB decomposition is done in the Process method:

void RGBDecompose::Process(const RequestList& requests, const DataFragmentList& inputs, DataFragmentList& outputs)
{
    // this simplified version assumes that all requests processed in one
    // call refer to the same input frame, so inputs[0] is all we need
    Ptr<IVideoFrame> in(inputs[0]);
    Ptr<IVideoFrame> out;
    unsigned w = in->GetWidth();
    unsigned h = in->GetHeight();
    const pixel_t *in_data = in->GetData(FORMAT_RGB24);

    for (RequestList::const_iterator i = requests.begin(); i != requests.end(); i++)
    {
        // create an output frame of the same size and fill it with the
        // channel corresponding to the requested output connector
        out = upf::create<IVideoFrame>();
        out->Create(w, h, FORMAT_RGB24);
        switch (i->connector)
        {
            case 0:
                copy_channel_R(in_data + 0, out->GetWData(FORMAT_RGB24), w * h);
                break;
            case 1:
                copy_channel_G(in_data + 1, out->GetWData(FORMAT_RGB24), w * h);
                break;
            case 2:
                copy_channel_B(in_data + 2, out->GetWData(FORMAT_RGB24), w * h);
                break;
        }
        outputs.push_back((IDataFragment*)out);
    }
}

First we ask the input video frame instance for its image data in RGB24 format. Then we create a new video frame with the same dimensions and copy the requested channel into it. The copy_channel functions could look like this:

void copy_channel_R(const pixel_t *in, pixel_t *out, size_t len)
{
    for (size_t i = len; i > 0; i--, in += 3, out += 3)
    {
        out[0] = *in;
        out[1] = 0;
        out[2] = 0;
    }
}

(Similarly for copy_channel_G and copy_channel_B).
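
Since the three functions differ only in which channel they keep, one could just as well write a single parameterized helper (a sketch):

// Alternative: one helper parameterized by the channel to keep
// (offset 0 = R, 1 = G, 2 = B in FORMAT_RGB24). The switch in Process
// would then collapse to copy_channel(in_data, ..., i->connector).
void copy_channel(const pixel_t *in, pixel_t *out, size_t len, unsigned offset)
{
    for (size_t i = len; i > 0; i--, in += 3, out += 3)
    {
        out[0] = out[1] = out[2] = 0;
        out[offset] = in[offset];
    }
}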

That's all - just compile the new code and you will be able to use RGBDecompose in a network.

Terminal modules

You might have noticed that the scheme described in the previous sections doesn't work well with terminal modules, i.e. modules with no output connectors (recall that QueryRequirements contains requests on output connectors). Imagine a terminal module which saves audio and video data to a file; how does the core tell the module to save a group of frames or a block of audio? The answer is that it doesn't. It is reasonable to assume that the output module knows best the order in which data should be written (note that different output formats may employ different interleaving schemes).

The problem with terminal modules is solved using the idea of computation steps. Terminal module classes have to implement the ITerminalModule interface, which is derived from IModule and in addition contains the method

unsigned long GetCompStepCount();

A computation step corresponds to a logical unit of output data. The module is free to interpret a computation step any way it wishes. For a video output module, a step may be equal to one frame; a module which outputs both audio and video may interpret it as, say, one second of data (so that it can easily interleave the audio and video data).

Before the whole network processing starts, each terminal module has to decide on the number of computation steps it needs to perform. The core then calls QueryRequirements and Process using the ADDR_COMP_STEPS addressing scheme (see Data types in OpenVIP) repeatedly, with the computation step number equal to 0, 1, 2, ..., GetCompStepCount()-1.
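
For illustration, a video file writer might treat one computation step as one output frame; its GetCompStepCount would then simply return the frame count it remembered from SetStreams (a hypothetical sketch - the class and member names are not from OpenVIP):

// Hypothetical terminal module writing video to a file; one computation
// step corresponds to one output frame.
class VideoFileWriter : public ITerminalModule
{
public:
    // ... EnumConnectors, SetStreams, QueryRequirements, Process ...

    unsigned long GetCompStepCount()
    {
        return m_frame_count;  // frame count stored in SetStreams()
    }

private:
    unsigned long m_frame_count;
};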

Multipass modules

It is possible that a module is not able to compute its outputs in a single pass; it might e.g. first want to scan the whole input stream before it decides what to do. Such modules have to implement the IMultiPassModule interface (see the UML diagram at the beginning of the chapter).

Before the computation starts, each multipass module is asked for the number of passes using the GetPasses method; let us denote this number N. Since the module doesn't produce any output during passes 1, ..., N-1, it behaves as a terminal module during those passes, which means that the core calls its QueryRequirements and Process methods using the computation step scheme (see the previous section). Each of the passes 1, ..., N-1 may consist of a different number of computation steps.

After one pass has finished, the core calls the NextPass method and continues with the next pass. A multipass module may be terminal in the last pass (e.g. a multipass output module), but is not required to be (e.g. a multipass audio filter) - its GetCompStepCount() then returns 0 in the last pass.
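
As an illustration, consider a two-pass audio normalizer which first scans the whole input for the peak level and then scales the samples. The sketch below is hypothetical; only the GetPasses, NextPass and GetCompStepCount methods come from the interfaces described in this chapter:

// Hypothetical two-pass audio normalizer: the first pass scans for the
// peak level (terminal behaviour), the second scales the samples as an
// ordinary filter.
class Normalize : public IMultiPassModule
{
public:
    unsigned GetPasses() { return 2; }

    void NextPass() { m_pass++; }     // called by the core between passes

    unsigned long GetCompStepCount()
    {
        // terminal only during the scanning pass; 0 in the last pass
        return (m_pass == 0) ? m_scan_steps : 0;
    }

    // ... IModule methods; Process() updates m_peak during the first
    // pass and scales samples by 1.0 / m_peak during the second ...

private:
    unsigned      m_pass;
    unsigned long m_scan_steps;
    double        m_peak;
};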

You may object that the whole business with multipass modules seems strange - if a module first needs to scan all video frames from a file, why doesn't it use the ordinary QueryRequirements method with requirements for all the frames? This would do the trick for files with only a few video frames. However, it is impractical to ask for hundreds or thousands of frames at once. A well-behaved module never requires an extraordinarily large amount of data and uses the multipass mechanism instead.