Flood Markup Language – A Standards-based Exchange Language for Flood Risk Communication

Flooding is one of the most common natural disasters, and an extensive body of studies addresses understanding and predicting floods to support preparedness and response. It is critical to share and communicate flood forecasting and modeling datasets generated by different systems and organizations. Most organizations share flood risk data for operational purposes with limited metadata and structure, and there is no unified standard for exchanging flood forecast and alert data with various stakeholders and automated systems. This paper proposes a data communication specification, Flood Markup Language (FloodML), to extensively describe and exchange flood forecasts and alerts with corresponding stakeholders. FloodML can be used in a wide variety of data sharing use cases for emergency management, the research and modeling communities, and the general public. This manuscript is an EarthArXiv preprint and has been submitted for possible publication in the International Journal of Disaster Risk Reduction. Please note that it has not been peer-reviewed before and is currently undergoing peer review for the first time. Subsequent versions of this manuscript may have slightly different content. If accepted, the final version of this manuscript will be available via the 'Peer-reviewed publication DOI' link on this webpage. Please feel free to contact the authors; we welcome feedback.

The National Weather Service (NWS), a federal agency and part of the National Oceanic and Atmospheric Administration (NOAA), provides comprehensive model analyses and forecast products for the protection of life and property (NWS, 2017). The Advanced Hydrologic Prediction Service (AHPS) from the NWS provides current water levels and forecast data at selected water gage locations in tabular and XML (Extensible Markup Language) form. The XML format provides mostly numerical data, while the metadata of the stream gage is in plain text. In addition, the NWS website includes historical and recent crests for reference. The NWS is also the largest flood alert service provider in the U.S.; its alerts are normally delivered in an XML-based format using the Common Alerting Protocol (CAP) (NWS, 2013). This system is event-based and mainly contains metadata and condensed information, while other data, including detailed forecast flooding events, stage levels, and times, are in plain-text format.
There are also other regional or local flood forecasting and communication efforts that have been developed and are widely used in their context. The Iowa Flood Information System (IFIS; Demir and Krajewski, 2013) is a web-based platform created by the Iowa Flood Center (IFC; Krajewski et al., 2017). This system provides flood conditions, forecasts, visualization, and other flood-related information. In addition to the NWS forecast, IFIS provides various forecast products under different rainfall scenarios (i.e., with rain forecast, no rain, or a 2" rain scenario). It also includes a flood alert system that issues a flood alert based on the IFC model when the forecast indicates a water level above the action or flood stage.
The National Water Center (NWC) is part of the Office of Water Prediction (OWP) within NOAA's National Weather Service (NWS). The NWC developed an operational flood forecast model, the National Water Model (NWM), which simulates and predicts streamflow for the entire continental U.S. It provides predicted stream flow rates in three forecast ranges: short-range (within 18 hours), medium-range (within 10 days), and long-range (within 30 days). All NWM outputs are in NetCDF format, a self-describing data format widely used for climate model output datasets. The NetCDF model output data and metadata can be visualized on a map-based interface.

Relevant Data Standards
There are hundreds of XML-based markup languages for various scientific fields, some of which relate to geography, weather, or disasters. However, there is a clear gap and need for a standards-based flood risk communication language. These existing markup languages in different domains are informative and inspirational for designing a comprehensive specification and structure for the Flood Markup Language (FloodML).
Cyclone Warning Markup Language (CWML) is a markup language created for cyclone advisories by the National Information and Communications Technology Australia (NICTA) research centre (Sun et al., 2006). It contains the necessary information for forecast events, such as observations, precautions, threat type, event name, time, and category. Every event has an applicable area, which may be a circle with a radius and center, or a list of places (region, country, state, city, station) and distances.
Digital Weather Markup Language (DWML) is a markup language created by NOAA (U.S. DOC et al., 2009), which offers XML services to access the National Digital Forecast Database. The design of DWML is focused on producing XML format of the National Digital Forecast Database, which includes three products: Forecast at a Glance, Digital Tabular Forecast, and Digital Zone Forecast. The design of DWML also tries to accommodate other data types such as weather observations and guidance (U.S. DOC et al., 2009).
Quake Markup Language (QuakeML) is a markup language created by the Southern California Earthquake Center (Schorlemmer et al., 2011). It provides an XML-based data exchange standard for earthquakes, representing waveform data, macroseismic information, probability density functions, slip distributions, and shake maps. In addition, a Python package for QuakeML-style data objects is under development.
WaterML2 is the latest markup language created for exchanging water observation and measurement data. It is based on the Observational Data Model version 2.0 (ODM2; Horsburgh et al., 2016) and uses a Geography Markup Language schema (Taylor, 2012). The main feature of WaterML2 is its foundation on time series, where each observation has a phenomenon time period and the results are listed as data points. With information on the data points and their relationships (how the time series was created), it allows interpreting data products such as statistical summaries (Taylor, 2012). WaterML2 can also be applied to other time-series data products, such as hydrological models, environmental reporting, and hydrological infrastructure, domains that currently lack standards and automation (Taylor et al., 2014).
Common Alerting Protocol (CAP) is an XML-structured format used by the National Weather Service alert system (NWS, 2013). Beyond metadata and alert information, CAP messages can be aggregated on a real-time map and can be originated either automatically by an autonomous sensor system or manually. When there is a false alarm, officials can issue a cancellation message for the prior alert (Jones and Botterell, 2005). CAP is the standard alerting protocol adopted by the Federal Emergency Management Agency (FEMA) and the NWS, and it is widely used for alert data around the world.
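To make the CAP structure concrete, the sketch below builds and parses a minimal CAP 1.2 flood alert with Python's standard library; the identifier, sender, timestamp, and area values are illustrative only, not taken from a real alert.

```python
import xml.etree.ElementTree as ET

CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"

# A minimal CAP 1.2 flood alert; identifier, sender, and area are illustrative.
cap_xml = f"""<alert xmlns="{CAP_NS}">
  <identifier>NWS-2019-0001</identifier>
  <sender>nws-example@noaa.gov</sender>
  <sent>2019-01-03T10:00:00-06:00</sent>
  <status>Actual</status>
  <msgType>Alert</msgType>
  <scope>Public</scope>
  <info>
    <category>Met</category>
    <event>Flood Warning</event>
    <urgency>Expected</urgency>
    <severity>Moderate</severity>
    <certainty>Likely</certainty>
    <area><areaDesc>Illinois River at Havana</areaDesc></area>
  </info>
</alert>"""

root = ET.fromstring(cap_xml)
ns = {"cap": CAP_NS}
event = root.findtext("cap:info/cap:event", namespaces=ns)
severity = root.findtext("cap:info/cap:severity", namespaces=ns)
print(event, severity)  # Flood Warning Moderate
```

Note that the structured numerical forecast that FloodML carries has no counterpart here; CAP confines such details to free-text description and instruction elements.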

Challenges and Needs in Data Exchange
A recent report by the National Academies of Sciences, Engineering, and Medicine (2017) noted that a large variety of information is delivered through social media platforms and private companies, and indicated that the current alert information system needs to evolve to meet changing communication demands.
On the one hand, current alert systems are bloated, so last-mile delivery can be a significant problem (Waidyanatha et al., 2008; Cola et al., 2012). As an OASIS standard, CAP focuses on exchange only and does not take into account delivery capabilities to users (Malizia et al., 2010). For example, a CAP flood alert message can span several kilobytes, which is too much information for last-mile users, so servers have to condense CAP messages into a shorter form while minimizing information loss (Tomaso et al., 2012). On the other hand, next-generation forecast and alert systems should contain more data and support more users, terminal equipment, automated workflow systems (Duffy et al., 2012), and alert generators (Klafft & Ziegler, 2014; Aanandh et al., 2015). Current alerting systems such as CAP contain only decision and action information and do not seek to capture evidence, although an evidencing module may be important for hazard control and post-hazard analysis (Aanandh et al., 2015). At the distribution stage, personalized alert and evidence data should be provided to different users. For translation into other languages, it would be most efficient if alert messages complied with a predefined list of references for event type, target group, severity, etc. (Klafft & Ziegler, 2014). Forecasts and alerts should also contain more supporting information, such as images and videos, which can help the public read and interpret the data easily (Carr et al., 2017). The forecast data and alerts should also be stored long-term so that researchers can perform post-analysis after hazard events.
The proposed standard should also support the latest application scenarios. Smart city projects may contain standardized online equipment, such as automated river gauges, to report flooding (Gaur et al., 2015). Shared data should also be machine-readable so that data-driven methods can be applied to event data for analyzing public response, forecast and alert accuracy, etc. In addition, crowdsourced citizen science projects aim to develop alert systems based on social media platforms such as Facebook, which may enable a new type of alert generator in the future (Moi et al., 2016).
Overall, efficient information sharing and exchange are critical to the disaster and emergency sectors, where a standards-based language for flood forecasts and alerts can support common flood risk information and communication needs and can be used among different flooding stakeholders, services, and systems.

Motivation
As technology advances, flood forecast and alert systems will need to evolve for greater accuracy and support for modern communication platforms. It is important and necessary to share data with other researchers or organizations for real-time warning and for post-modeling evaluation and re-analysis studies. However, reading data from many disparate sources is a complicated task and requires developing a robust and intelligent integration system that can handle changes to the interfaces of the data sources. Moreover, current forecast and alert systems operate separately in each organization (e.g., NWS). As stated in previous sections, there are many data exchange and communication languages for different fields with modern data specifications and formats (XML, JSON). A recent report by the Department of Homeland Security (DHS, 2018) highlights the importance of a standard language to help minimize human error and improve flood-related information access, preparedness, and response for both the public and decision makers, and modeling studies for researchers. Given the need for an efficient and standardized data communication language, we propose the Flood Markup Language (FloodML) for sharing flood forecast model data, alerts, warnings, and advisories to support many public, scientific, and operational use cases.

FloodML Data Specification
The FloodML data specification includes an extensive ontology and metadata on the organization issuing the forecast or alert, event times and locations, and time-series data for water stage and flow rate from both historical observations and model output predictions. FloodML is designed to support a variety of use cases for researchers, decision makers, and the public. In addition, FloodML includes flood alert data, which can be issued automatically when model outputs exceed flooding alert levels. Related instructions and follow-up media broadcasts can be released together with the alert as well. In the following sections, FloodML data models and ontology concepts are explained.

FloodML Ontology
The FloodML ontology has five major sections: sources, products, stations, datasets (which contain models and time-series data), and alerts. The sources section contains basic organization information, including ID, name, website, phone number, and email, which can be connected to the sources. The products section contains the contacts, issue time, start time, end time, and web links of the products. Each source may have different products; for example, NOAA has forecast models from the NWS and the NWC, which are two independent products. The contacts section includes information on the person in charge of the product in the organization, such as name, email, phone, and role. The stations section contains information on the specific monitoring station connected to the forecast. Each FloodML file may contain several datasets, and each dataset contains multiple models and data in time series. FloodML includes elements customized for each use case, allowing the corresponding users to benefit from different aspects of the forecast output. The alerts section contains the main attributes from CAP, which is already an open standard maintained by OASIS (Organization for the Advancement of Structured Information Standards). CAP is a mature alert protocol that supports the exchange of highly summarized alerts, although it does not allow sharing the original forecast data. In addition to posting an alert with summarized information, FloodML allows alerts to be bundled with the forecast data that triggered them. The alerts in FloodML help researchers manage alerts more efficiently and help users visualize them in an interactive web environment. FloodML contains all the data that may interest the public, researchers, and decision makers.
For example, the public is normally interested in water levels and locations, while the flow rates and rating curves of the stations are important for researchers, and historical or recent crests may be important for decision makers. Figure 1 shows the overall structure of the FloodML ontology. Detailed descriptions of each element in the ontology are provided in Appendix 1.
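As a sketch of how these ontology sections could map onto an XML document, the snippet below assembles a skeletal FloodML file with Python's standard library. The element names used here are illustrative assumptions; the normative names are defined by the schema in Appendix 2.

```python
import xml.etree.ElementTree as ET

# Hypothetical FloodML skeleton mirroring the five ontology sections;
# element names are illustrative, not the normative FloodML schema.
floodml = ET.Element("floodML")

source = ET.SubElement(floodml, "source")
ET.SubElement(source, "name").text = "NOAA/NWS"

product = ET.SubElement(floodml, "product")
ET.SubElement(product, "issueTime").text = "2020-10-19T12:00:00Z"

station = ET.SubElement(floodml, "station")
ET.SubElement(station, "name").text = "Mississippi River at Rock Island LD15"

dataset = ET.SubElement(floodml, "dataset")
model = ET.SubElement(dataset, "model")
ET.SubElement(model, "name").text = "AHPS"
data = ET.SubElement(dataset, "data")
obs = ET.SubElement(data, "observation",
                    {"time": "2020-10-19T11:15:00Z", "units": "ft"})
obs.text = "7.2"

alerts = ET.SubElement(floodml, "alerts")

# The top-level children follow the ontology section order.
sections = [child.tag for child in floodml]
print(sections)
```

Because the alerts section can sit next to the dataset that triggered it, a single document can carry both the summarized alert and the underlying forecast time series.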
A schema file is also created for FloodML in the XML Schema Definition (XSD) format, which specifically defines which elements and attributes are permitted and which elements are required or optional (Laranjeiro et al., 2009). Data types and example values for each element and attribute are also defined in the XSD. The order of elements in a FloodML document also follows the XSD file. This description file can be used to validate XML documents and increase the robustness of the system: users will be notified when there is invalid syntax, a wrongly named variable, or a variable assigned a wrong value. It also provides readable instructions for further development. The FloodML XSD schema is available in Appendix 2.
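Full XSD validation requires a schema-aware library (e.g., lxml or xmlschema) applied against the FloodML XSD; the standard-library sketch below merely illustrates the kinds of constraints such validation enforces, namely required elements and numeric values. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for XSD validation: check required elements and
# numeric observation values. Real validation would use a schema-aware
# library (lxml, xmlschema) with the FloodML XSD from Appendix 2.
REQUIRED = ["source", "station", "dataset"]

def check_floodml(xml_text):
    """Return a list of error strings; an empty list means the checks pass."""
    errors = []
    root = ET.fromstring(xml_text)
    for tag in REQUIRED:
        if root.find(tag) is None:
            errors.append(f"missing required element: {tag}")
    for obs in root.iter("observation"):
        try:
            float(obs.text)
        except (TypeError, ValueError):
            errors.append("observation value is not numeric")
    return errors

valid_doc = ("<floodML><source/><station/><dataset><data>"
             "<observation>7.2</observation></data></dataset></floodML>")
print(check_floodml(valid_doc))                       # []
print(check_floodml("<floodML><source/></floodML>"))  # two missing elements
```

Running such checks on ingestion lets a FloodML service reject malformed submissions before they reach the database.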

FloodML Web Framework
The web framework of FloodML is based on a PostGIS database, GeoServer, a proxy server, and web services. The prototype in Figure 2 demonstrates the implementation of the FloodML framework architecture along with other tools and technologies.
GeoServer is an open-source GIS server that can be used to view, edit, and share geospatial data (Jones et al., 2013). FloodML uses GeoServer and several of the services it provides to visualize the data in a map environment. For example, users can publish real-time data in FloodML with the Web Feature Service (WFS). GeoServer can handle additional media and geospatial data in FloodML in shapefile or KML format. GeoServer also connects the web visualization and PostgreSQL, which helps generate flood forecast maps with flooded areas.

Figure 1. The FloodML Ontology Map
GeoSPARQL is a standard adopted by the Open Geospatial Consortium (OGC) in 2012 (Perry & Herring, 2012). It is an extension of SPARQL, a query language for accessing data stored in RDF (Resource Description Framework) format. It provides a standard format for indexing triples and attaches easily to ontologies with spatial information (Koles et al., 2013). In the flood data platform, GeoSPARQL can be used for querying information by coordinates and bounding box, which increases the ability to perform interactive queries on live data streams.

Data Management
For flood data persistence, the PostgreSQL database server is used with the PostGIS extension for geospatial processing. PostgreSQL is a free and open-source object-relational database management system (ORDBMS) developed from the Berkeley POSTGRES code. It supports most of the SQL standard and offers many features such as complex queries, foreign keys, and triggers (PostgreSQL Global Development Group, 2017). In FloodML, geospatial datasets such as points, lines, polygons, and raster data can be stored and analyzed with PostGIS. In addition, PostGIS provides hundreds of GIS processing and analytical functions, such as buffer, union, and clip for creating new geometries, which supports the data management and analysis needs of the platform. FloodML data is stored in the database schema shown in Figure 3, which aligns with the FloodML ontology.

Data Transformation
A data transformation component, the FloodML converter, was developed to automatically convert data from other formats to the FloodML format. Flood information providers use various formats other than FloodML to share data. For example, the NWS uses CAP to provide flood alerts, and the NOAA Advanced Hydrologic Prediction Service provides observations and forecasts in generic XML formats. The FloodML converter recognizes the data format from the header and then converts it to FloodML for storage and further analysis. In this project, a converter was implemented in PHP for CAP 1.2 and the Hydrologic Prediction Service XML, translating flood forecast data from NWS XML and flood alerts in CAP 1.2 directly to FloodML. Due to the high compatibility, most of the information is kept during the conversion, and the converter does not change data that is already in FloodML. One of the FloodML transformation output files from the NWS is shown in Appendix 3. Users can create their own converters from any format based on the FloodML schema.
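The converter idea can be sketched as follows (in Python rather than the project's PHP); both the AHPS-like input tags and the FloodML output tags below are illustrative assumptions, not the exact formats.

```python
import xml.etree.ElementTree as ET

# AHPS-like forecast snippet (illustrative tag names and values).
ahps_like = """<site name="Mississippi River at Rock Island LD15"
      generated="2020-10-19T12:00:00Z">
  <forecast>
    <datum><valid>2020-10-20T00:00:00Z</valid><primary units="ft">7.5</primary></datum>
    <datum><valid>2020-10-21T00:00:00Z</valid><primary units="ft">7.8</primary></datum>
  </forecast>
</site>"""

def ahps_to_floodml(xml_text):
    """Map matching nodes from an AHPS-like document into a FloodML-like tree."""
    src = ET.fromstring(xml_text)
    floodml = ET.Element("floodML")
    station = ET.SubElement(floodml, "station")
    ET.SubElement(station, "name").text = src.get("name")
    data = ET.SubElement(ET.SubElement(floodml, "dataset"), "data")
    for datum in src.iter("datum"):
        value = datum.find("primary")
        point = ET.SubElement(data, "forecastPoint",
                              {"time": datum.findtext("valid"),
                               "units": value.get("units")})
        point.text = value.text
    return floodml

out = ahps_to_floodml(ahps_like)
points = out.findall("dataset/data/forecastPoint")
print(len(points), points[0].get("time"), points[0].text)
```

As in the PHP implementation, each recognized source node is mapped onto its FloodML counterpart, so information is preserved wherever the two vocabularies overlap.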

Data Integration and Processing
Web Coverage Service (WCS): We developed a data coverage service to enable the FloodML framework to share maps in raster format. WCS provides detailed data with a description and returns the data in its original format and semantics (OGC, 2012). For example, WCS allows trimming, to create a sub-area coverage with a bounding box, and slicing, to subset the data at a specific position, which reduces size and improves efficiency. The WCS service can deliver FloodML metadata, the FloodML description, and coverage in its original or an optimized format, since they are all based on XML standards. Extensions of WCS allow communication over protocols such as SOAP, HTTP POST, and HTTP GET, which provides high compatibility and consistency when using WCS in the FloodML framework for requesting coverage.

Figure 3. FloodML Database Schema
Catalog Service for the Web (CSW): CSW was developed to enable clients to search for available services from different providers registered in the local catalog or in other catalogs connected to it (Voges & Senkler, 2007). The catalog service returns FloodML metadata for the available services that match the user's request. These metadata have generalized properties that can be used for resource evaluation, invocation, or retrieval of the referenced sources. In FloodML, the CSW is used to quickly index records from thousands of entries as part of a web service.
Web Processing Service (WPS): We developed the WPS to allow GIS analysis to be applied on the server side. When users request complex geolocation data from FloodML, WPS supports most common geospatial processes, including the area, boundary, and centroid of a geometry, the smallest convex polygon containing a geometry (convex hull), the minimum distance between two geometries, and set operations (i.e., intersection, union, and difference) between two geometries (OGC, 2018). This allows the client application to offload some of the computation and to save or retrieve the results with the most appropriate method when requesting flood data.
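For illustration, two of these operations (bounding box and a simple centroid) can be sketched in plain Python; a real deployment would delegate such computations to PostGIS or the WPS backend rather than reimplement them, and the polygon coordinates below are hypothetical.

```python
# Pure-Python sketch of two geometry operations a WPS might run server-side.
# Real deployments delegate these to PostGIS/WPS; this is illustrative only.

def bounding_box(coords):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a point list."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def centroid(coords):
    """Vertex-average centroid; adequate for small illustrative polygons."""
    n = len(coords)
    return (sum(x for x, _ in coords) / n, sum(y for _, y in coords) / n)

# Hypothetical affected-area polygon as (lon, lat) pairs.
area = [(-90.2, 40.2), (-90.0, 40.2), (-90.0, 40.4), (-90.2, 40.4)]
print(bounding_box(area))  # (-90.2, 40.2, -90.0, 40.4)
print(centroid(area))      # approx (-90.1, 40.3)
```

Results like these can be returned directly to the client, e.g., a bounding box for zooming a map to an alert's affected area.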

Communication and Sharing
Web Feature Service (WFS): We developed the WFS for sharing geographic data at the feature level (rather than the traditional file level). The WFS allows querying features and their properties, as well as data manipulation such as adding new features or updating and deleting existing features (Vretanos, 2010). With these operations, users can interrogate, style, edit, and download individual features of the FloodML data (OGC, 2016). Using the XML-based filter in WFS, users can obtain specific features based on geometric, spatial, logical, or comparison predicates. WFS operations can be requested and responded to over HTTP to edit or download features of flood forecast data in FloodML.
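For example, a standard WFS 2.0 GetFeature request for a hypothetical FloodML feature type might be constructed as follows; the GeoServer endpoint and the layer name `floodml:stations` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Build a WFS 2.0 GetFeature request URL; endpoint and typeNames are
# hypothetical, the query parameters follow the WFS 2.0 standard.
base_url = "http://localhost:8080/geoserver/wfs"
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "floodml:stations",
    "bbox": "-90.2,40.2,-90.0,40.4,EPSG:4326",  # affected-area bounding box
    "outputFormat": "application/json",
}
url = base_url + "?" + urlencode(params)
print(url)
```

A client issuing this request over HTTP GET would receive only the station features inside the given bounding box, which is exactly the feature-level access pattern described above.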
Web Map Service (WMS): The WMS was developed for sharing map images as a data service. When a client sends a request to the WMS for a map image, the server extracts the relevant data from different sources and combines them in the selected format (Beaujardiere, 2006). It supports most geospatial image data formats, including JPEG image files, XML-based FloodML, and KML data. The WMS also allows images from different servers to be overlaid directly in a web-based map environment. WMS helps FloodML visualize data when map images are requested.

Results and Discussion
The proposed FloodML structures can be used for both machine-generated and human-reported data. Fixed data, such as the organization name and station metadata, can be stored as default values, which increases communication efficiency and reduces errors. Several use cases for utilizing FloodML are detailed below.

Format Converter
Due to the lack of a uniform standard and data format, current flood data and service providers use their own formats when sharing data with the public. Researchers and platforms often integrate data from different providers for specific purposes. If FloodML were used as the intermediate file structure for data exchange, existing files would need to be converted to FloodML before exchange. This allows users to keep their own structure or database for storing their data and to use FloodML as an intermediate format for converting from and to different formats, thanks to its high compatibility with existing formats.
For example, the AHPS XML from NOAA contains the main forecast data from AHPS model results, including originator, generation time, time zone, station name, flood stages, rating table, and time-series forecast data. In this use case, as an example, we created a PHP script and a FloodML template; when a node in the AHPS XML matches a node in FloodML, the script converts the node's data to FloodML. Thus, files from the NOAA AHPS model can be read and parsed into FloodML with this format converter script. Appendix 3 shows an example result of automatically converting 7-day flood forecast data from the NWS on Oct 19, 2020, for the station "Mississippi River at Rock Island LD15" to the FloodML format.
The resulting FloodML file covered all the information in the original NOAA AHPS file. Notably, the NOAA AHPS data contains neither the longitude and latitude of the station nor detailed publisher information. These data may be important because it is otherwise hard for end-users to identify what a station code represents when they want to reach out. Appendix 4 shows the same example with all of the expected information, i.e., what would be available if NOAA applied FloodML to its data sharing service. With this detailed, accurate, and comprehensive information, FloodML can be shared with various stakeholders, including the public. Many organizations (e.g., NWS, IFC) may use FloodML as the standard format for flood data exchange in the future, which would help integrate and compare results between different institutions for flood forecasting services.

Data Exchange Service
The data exchange service is the primary use case for FloodML, allowing stakeholders to share and exchange data using a web service. The flood data shared via the web service can include metadata, station information, flood forecast data, alerts, or other relevant flood information. The metadata includes service provider information and model information. The exchange of model metadata is important for researchers to evaluate different models, and it is crucial for end-users to identify which service the data comes from. Station information includes station location, the rating curve table, and historical crests, which are necessary for hydrologic engineering design. The station information also includes upstream and downstream station IDs, which connect the stations in the river network and may help researchers assess hydrological models. The flood forecast data are the numerical time-series data generated by the models. The exchange of real-time forecast data is important for alerts and emergency broadcasts. Forecast data from different models and stations can be integrated to complement each other in a next-generation forecast platform, and past forecast data can be compared against observations for post-analysis and model evaluation to improve model accuracy. With the data exchange service of FloodML, a local organization or institution can easily integrate NWS and NWM data for more reliable in-house model reanalysis and evaluation.
Alert information exchange is also available through FloodML. It allows the combination of time-series forecast data and alert instructions in the same service, which can be used for more effective public communication. The current alert communication system, CAP, provides instructions and descriptions in paragraphs only. The exchange of alerts with structured flood data helps to intuitively display when, where, and how a flood will happen, if the user requires it. On websites or in mobile apps, detailed flood alerts with forecast data can easily be displayed through visualization tools, along with detailed instructions. For time-critical alerts, shortened alert messages with instructions can be distributed to the public directly through other common technologies, e.g., the Short Message Service (SMS) in affected areas.
Appendix 5 shows an example with NWS forecast data for the station "Illinois River at Havana" and the flood warning alert for the station issued on January 3, 2019. The forecast for the following hours shows that a flood is expected, and the file includes the station name, affected area, flooding stages, and forecast river heights at the prediction hours. Although the forecast data with flood alerts are hard to read directly, appropriate visualization helps the public understand them. Users can see how much the river stage exceeds the flood stage and can determine whether further actions are required based on the altitude of their home. This is also useful for researchers developing detailed visualization products, such as pseudo-real-time flood maps with potential flooding areas displayed for specific flood events over time.
With the data exchange service, users can decide what types of data to share or receive, enabling highly personalized exchange through the standard FloodML format.

Data Visualization
GeoServer in the framework allows information in FloodML to be easily displayed through WFS. Individually, the flood forecast data in each FloodML file can be read and visualized as figures rather than text for a better understanding of how the water level will change in the future. Figure 4 shows a simple visualization of data accessed directly from FloodML and integrated into the Google Maps API. It reads the data section in the FloodML file, which contains the time, gauge height, and units. The green line shows the observation data, and the blue line represents the forecast. The time and value for a specific timestep are displayed under the title when the mouse hovers over the figure. More information, such as the flood stage and historical and recent crests, can be displayed on the figure for better visualization when the data are available in FloodML. The visualization scripts can be installed locally, so FloodML can be visualized directly in a mobile application, which is more efficient than compressing and transferring an image. The geolocation of the station from the metadata and the affected area from the alerts in FloodML can be visualized on the map, as shown in Figure 5. It was visualized using the Google Maps API, and such visualization of flood data helps the public better understand what will happen and which areas will be affected during flood events. The FloodML service can also be used in decision support systems to notify the public to avoid flooded roads and possible electrical hazards.

FloodML Service Catalog
For many use cases, important flood data services can be registered in the FloodML catalog for further querying and quick review at the community level. Using the GeoServer CSW, we can view and query the FloodML Service Catalog, as shown in Figure 7. In addition, users can register their own FloodML web services in our catalog.
Users can design their own catalog registration using any of the features in FloodML. For example, as shown in Figure 8, users can easily query FloodML files by model or station name. Due to the simplicity of FloodML, a catalog service can return either the whole file or a subsection of the requested FloodML.

Challenges
Due to the lack of a unified standard, adapting and integrating the many existing formats, standards, and protocols is the biggest challenge for flood risk communication. FloodML is designed to share flood-related data and information resources between institutions, organizations, agencies, researchers, and the public. As shown in this section, FloodML can meet existing data sharing demands and automatically convert current data exchange formats such as NOAA's AHPS and NWM outputs. For alert information, FloodML allows exchanging flood alerts with instructions, descriptions, and structured flood forecast data in a format that is compatible with CAP. In addition, this is the first attempt to integrate a forecast system with an alert system through machine processing. Automated workflows, including alert data acquisition, will greatly improve the system's efficiency. Moreover, the alert levels and types can be determined based on the forecast, and the indication of the alerting stage and timing can point directly to the forecast data. As shown in the data visualization section, the automated integration of FloodML into web-based UI and mobile systems is also an attempt to modernize and simplify the consumption of flood forecast and alert services.

Conclusion
FloodML is a first-generation markup language for flood information, forecast, and alert data. It provides a simple structure and high practicality in integration, visualization, and exchange. It makes the sharing of flooding data easy and efficient for organizations and agencies, helps the aggregation of flood data for future analysis, and provides a standard for organizations to establish their own data platforms. FloodML has a simple structure and is easy to implement, which fills the gap in effective flood data communication. The XML-based portable structure is easy to use and is adaptable to other coding schemes. The message schema of FloodML has high practicality, supporting multiple message types (e.g., alerts, forecasts) and multiple actions (e.g., initial, update, cancellation). It provides supplemental source information, including reference images, videos, original websites, issuer information, and contact details.
This markup language improves the consistency of flood risk communication across institutions by providing a unified standard format and, in particular, improves the readability of alerts by providing the flood forecast data in structured semantics. Researchers can easily compare different forecast products for (re)analysis, and when there is a flood alert, emergency departments can obtain information faster to prepare for a flood. It improves processing efficiency by connecting forecast data and alerts, and it increases the speed at which critical information reaches the public through automated data processing. Through the automated visualization capabilities of FloodML, the public can gain a more intuitive understanding of the changing trend of the water level of nearby rivers and of possible future floods. This standard format for flooding information also benefits the publishing, sharing, exchange, and access of data, which improves collaboration between institutions and services for automated and faster data sharing and analysis. This will help reduce injury and property loss during flooding for the public, local governments, and the federal government.
With the open-source release of FloodML (https://github.com/uihilab/FloodML), we plan to collect community feedback and opinions to improve FloodML in future updates. Follow-up products and services, such as a customized API and supplementary exchange file formats, will be developed as well.