Fluorine kernel projects planning from the DDF


Stephen Kitt <skitt@...>
 

Hi everyone,

During the DDF we had a kernel projects planning session. Here are my
notes:

Fluorine kernel projects planning session


Clustering
==========

Improve clustering: look at the three-node CSIT and fix the
issues. Make it run reliably so it can be used as a stable base.
Distinguish between failures related to the clustering layer itself,
and failures caused by poor clustering support in individual modules.
We know of one Akka issue we're currently tracking but it's hard to
reproduce and difficult to get Akka upstream involved. Start by
getting Vratko's clustering test-suite stable (it doesn't use any
plug-ins), then work on fixing plug-in issues revealed by the full
CSIT suite.

Ask TAC about buying some upstream support from Akka-supporting
companies? (Need to talk to the governing board, add this to the goals
document. Action => Daniel Farrell.)

Use the lab-as-a-service offering from OPNFV, to see whether the
improved reproducibility promised there helps with the issues we're
seeing. However, since we're now seeing the same issues on Vexxhost as
we had with Rackspace, there's a strong possibility that they are
inherent to ODL/Akka and not related to the hosting environment.

Need to reduce the complexity involved in developing plug-ins which
behave correctly in a clustered environment (recover from failed
connections etc.). Cluster singleton is documented but users need to
satisfy every single API requirement to the letter. We also have
conflicting requirements, and recovery has different meanings for
different services (e.g. deleting state from external hardware when a
cluster node is no longer its owner). There is a big mismatch between
cluster singleton requirements and southbound plug-in requirements
when the plug-ins need to synchronise their state across the cluster
and with devices. So at the kernel level we need detailed requirements
describing what other projects want to do; then we could expose new
features if necessary (including Akka features which aren't exposed
currently). The entity ownership service helps a lot.

We will follow up on these points in the kernel call. Action => Anil
Vishnoi to document requirements.

Tell-based: weird behaviour seen in CSIT, need to investigate that
before re-enabling.


Network use
===========

Some plug-ins use the data store for high-volume storage, which
results in heavy network use. The data store isn't designed for this
use case, and perhaps we should define limits beyond which other tools
should be used (Cassandra etc.).

We need performance testing, and work on updating our published
performance numbers. We need to instrument performance testing so we
can spot regressions and measure improvements (see Bitergia).
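
As a rough illustration of what "spotting regressions" could mean in
practice, here is a minimal sketch that compares a run against stored
baseline numbers. The metric names, the 5% tolerance and the function
name are all hypothetical; nothing here reflects an existing ODL tool,
and it assumes higher values are better (throughput-style metrics).

```python
def find_regressions(baseline, current, tolerance=0.05):
    """Return {metric: relative_change} for metrics that dropped by more
    than `tolerance` compared with the baseline (higher is better)."""
    regressions = {}
    for name, base in baseline.items():
        # Skip metrics missing from this run or with an unusable baseline.
        if name not in current or base <= 0:
            continue
        change = (current[name] - base) / base
        if change < -tolerance:
            regressions[name] = change
    return regressions


# Hypothetical numbers, for illustration only.
baseline = {"datastore_writes_per_s": 1000.0, "rpc_calls_per_s": 500.0}
current = {"datastore_writes_per_s": 900.0, "rpc_calls_per_s": 495.0}
regressions = find_regressions(baseline, current)
```

With these made-up numbers only the data store write rate is flagged:
it dropped by 10%, while the 1% RPC drop stays within the tolerance.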


Data store debugging
====================

What is the recommended way to retrieve the data store contents from a
non-shard-leader node? Ask-based Akka limits the amount of data we can
transfer; with tell-based we'll be able to do better. Whether this
makes it into Fluorine is still uncertain.


AAA
===

Removal of web.xml in favour of the code-based configuration
framework.

Jackson migration, done today.

Replace the OLTU-based token server.

More protection, better encryption.


Controller
==========

Fluorine involves a lot of clean-up:
* DataChangeListener is up for removal; it was deprecated in Carbon
and scheduled for removal in Fluorine. All the ODL projects have
been converted away from DCL; downstream projects *will* need to
perform the same migration after Fluorine is
released. DataTreeChangeListener provides a better interface with
better performance.
* The controller implementations of the MD-SAL APIs should be dropped
in favour of the MD-SAL versions. This will take many releases but
we will start deprecating them in Fluorine.
* The Config Sub-System is up for removal. The people who wrote it no
longer maintain it, and all of ODL has already switched to
Blueprint.
* The in-memory datastore will be removed (it's also available in
MD-SAL).


MD-SAL
======

Binding v2 won't be ready in Fluorine.

Binding v1 is being retro-fitted to fix the more grievous issues (see
the mailing list). This will allow v1 to keep us ticking over until v2
is ready.

Clean up the DOM-level APIs to better support new use cases.

Support for actions and nested notifications in Fluorine and possibly
Oxygen SR2.

MD-SAL will probably become release-based at the end of Fluorine.


Infrautils
==========

See the dedicated Infrautils session.


NETCONF
=======

Same trajectory, nothing major (maintenance mode).

We need to start testing the RFC 8040-based version of RESTCONF
(instead of the draft-bierman one).
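
RFC 8040 is the RESTCONF specification, and for anyone updating tests
the most visible change from the draft-bierman style is the URL
layout. Here is a minimal sketch of the difference. RFC 8040 specifies
the "data" resource, the "list=key" encoding and the "content" query
parameter; the root paths used below (/restconf for the draft
endpoint, /rests for the RFC 8040 one), the host and port, and the
example topology path are assumptions about a typical ODL deployment.
The helper names are hypothetical.

```python
from urllib.parse import quote


def bierman_url(base, datastore, steps):
    """Draft-style URL: the datastore is a path segment and list keys
    are separate path segments. `datastore` is "config" or "operational"."""
    parts = []
    for name, keys in steps:
        parts.append(name)
        parts.extend(quote(str(k), safe="") for k in keys)
    return f"{base}/restconf/{datastore}/" + "/".join(parts)


def rfc8040_url(base, datastore, steps, root="rests"):
    """RFC 8040-style URL: a single "data" resource, keys written as
    list=key1,key2, and the datastore chosen via ?content=."""
    content = {"config": "config", "operational": "nonconfig"}[datastore]
    parts = []
    for name, keys in steps:
        if keys:
            name += "=" + ",".join(quote(str(k), safe="") for k in keys)
        parts.append(name)
    return f"{base}/{root}/data/" + "/".join(parts) + f"?content={content}"


# Each step is (node name, list of key values); an empty list means no keys.
steps = [
    ("network-topology:network-topology", []),
    ("topology", ["topology-netconf"]),
    ("node", ["device1"]),
]
old_style = bierman_url("http://localhost:8181", "config", steps)
new_style = rfc8040_url("http://localhost:8181", "config", steps)
```

Because the draft style does not mark which path segments are list
keys, converting existing URLs mechanically needs knowledge of the
YANG model; the sketch sidesteps that by taking the keys explicitly.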


Federation
==========

The project is dead; if anyone wants to revive it, see Robert Varga.




Please join the Kernel Projects Call, Tuesdays at 9am Pacific:
https://wiki.opendaylight.org/view/Kernel_Projects_Call




Regards,

--
Stephen Kitt
Principal Software Engineer, Office of the CTO
Red Hat