Mobile robots enhance traditional acoustic tracking methods to study the spatiotemporal behavior of deep-sea fishery resources.

Knowing the displacement capacity and mobility patterns of industrially exploited (i.e., fished) marine resources is key to establishing effective conservation management strategies in human-impacted marine ecosystems. Acquiring accurate behavioral information on deep-sea fished ecosystems is necessary to establish the sizes of marine protected areas within the framework of large international societal programs (e.g., European Community H2020, as part of the Blue Growth economic strategy). However, such information is currently scarce, and high-frequency and prolonged data collection is rarely available. Here, we report the implementation of autonomous underwater vehicles and remotely operated vehicles as an aid to acoustic long-baseline localization systems for autonomous tracking of Norway lobster (*Nephrops norvegicus*), one of the key living resources exploited in European waters. In combination with seafloor-moored acoustic receivers, we detected and tracked the movements of 33 tagged lobsters at 400-m depth for more than 3 months. We also identified the best procedures to localize both the acoustic receivers and the tagged lobsters, based on algorithms designed for the identification of off-the-shelf acoustic tags. Autonomous mobile platforms that deliver data on animal behavior beyond traditional fixed-platform capabilities represent an advance for prolonged, in situ monitoring of deep-sea benthic animal behavior at meter spatial scales.

The marine benthic realm is progressively becoming wired with cabled infrastructures in an attempt to transform strategic or protected areas (i.e., those of commercial or ecological value) into robotized laboratories with permanent monitoring functions (*1*, *2*). At the same time, other relevant oceanic networks are being established worldwide, seeking to track the large-scale pelagic movements of species over large geographic areas and durations by animal-borne data loggers (*3*–*6*). Data like these provide essential behavioral information for applying the latest conservation policies (*7*).

Large marine megafauna (e.g., cetaceans, dolphins, elasmobranchs, or sea turtles), which habitually rise to the sea surface, allow the use of data loggers with global positioning system (GPS) and remote communication (e.g., Argos satellite network) to determine the duration and trajectories of their movements (*8*). However, benthic and pelagic species that never emerge at the surface cannot be tracked with such a methodology because electromagnetic waves suffer high attenuation in seawater (*9*).

For those nonemerging benthic species, acoustic positioning methods from fixed platforms can be used alongside acoustic tag sensors deployed on animals (*10*) that are tracked using long baseline (LBL) triangulation techniques (*11*). However, deploying benthic anchored receivers may increase operational complexity (e.g., in terms of spatial precision) and economic costs (*12*), with minimal flexibility (e.g., a single location). Moreover, specific tags are needed to maintain synchronization among the receivers of the listening network, which may increase the complexity of data postprocessing (*13*). Although this technology has proven useful for behavioral tracking in shallow-water scenarios (*14*), its performance has not yet been fully examined in the deep sea. Only a few efforts have been conducted to follow population movements over kilometer scales (*15*) with acoustic receivers mounted on moorings of curtain or gate typologies (*16*).

A complementary strategy to the use of moored devices is to mount acoustic receivers on autonomous underwater vehicles (AUVs), which are used as a virtual LBL, measuring the range to the target with acoustic modems (*17*, *18*). Unlike acoustic tags, these modems have bidirectional communication capabilities, and therefore, the time of flight (TOF) and slant range of an acoustic signal can be measured knowing the sound velocity. Last, triangulation localization techniques are applied to estimate the position of tagged individuals with different algorithms (*19*). In addition, the bearing information estimated by ultrashort baseline (USBL) systems can also be used, which increases the overall speed of response (*20*, *21*). Nevertheless, these investigations are again limited to large animals because of the size of the electronic tags (*22*). Other authors use bearing-only techniques to avoid the use of acoustic modems and overcome the size limitations (*23*, *24*), where an AUV-borne hydrophone array is used to track acoustic tags. Unfortunately, substantial localization uncertainty is produced by the close spacing of the hydrophones, which would require a larger separation than is achievable on AUVs (*25*).

Marine robots have nonetheless been used in recent years to track marine species. For example, AUVs equipped with a single hydrophone were used to track fishes with different error ranges and procedures [e.g., SYNAPS and SPLWCA (*14*, *26*, *27*)]. Some of the studies were conducted in combination with seafloor-moored receivers, allowing records of the presence of tagged animals within the detection area but with high uncertainty in their position (*28*–*31*). Other authors used custom transponders attached to large marine species to increase the efficiency of vehicle tracking capabilities (*20*, *32*), but this approach is impractical for small marine species.

Here, we describe a new procedure for multitracking one of the most important fishery resources in Europe, the Norway lobster (*Nephrops norvegicus* Linnaeus, 1758) (*33*), using a set of moored seabed receivers along with a remotely operated vehicle (ROV) and an AUV (Fig. 1). We developed an area-only target tracking (AOTT) method to achieve active tracking of instrumented individuals, which uses only the detection pings of acoustic tags. Moreover, time difference of arrival (TDOA) algorithms have been adapted and tested to study their accuracy in variable operational scenarios. Specific objectives were (i) comparison of TDOA algorithm performance through Monte Carlo (MC) simulations; (ii) presentation of the capabilities of the new AOTT algorithm, for which both simulations and field tests were conducted; and (iii) presentation of the results of a 3-month campaign using static receivers (i.e., mooring lines with acoustic receivers) and underwater vehicles (an ROV and an AUV), where both TDOA and AOTT algorithms were used. The tracking potential of this combined mobile and moored technology was tested on the Roses continental slope (Fig. 1E), in a deep-sea, no-take fishing zone under restoration, to show how a long-lasting acoustic-based deployment can provide previously unidentified behavioral data that can inform the establishment and spatial extent of conservation areas.

The performance of the different TDOA algorithms tested is presented in Fig. 2, where a set of MC simulations was conducted. We used four receivers to localize a target in a two-dimensional (2D) scenario, because the depth of the targets under study was known and constant (the species is benthic, with no swimming capability). These simulations are important to demonstrate the capability to track tagged benthic animals and to set the appropriate configuration (e.g., number of receivers, receiver positions, or acoustic tag transmission period).

The Cramér-Rao bound (CRB) representation is presented in Fig. 2A, where four receivers with a 200-m baseline distance and a time error of 1 ms have been used (for other array configurations, see fig. S1). The area inside the receivers’ array showed the lowest expected measurement SD error (<1 m), whereas the error increased up to 7 m at 250 m off the receivers’ array center.

To compare the algorithms’ performance, a predefined target trajectory was designed (Fig. 2B and movie S1), according to which the target moves at 1 m s^{−1} among four receivers with a transmission period of 60 s. The root mean square error (RMSE) over time is shown in Fig. 2C, where all the algorithms were iterated 100 times with a Gaussian error of 1-ms SD (fig. S2 shows the RMSE evolution with other errors). This result showed that the error is lower inside the receivers’ array. The exception was the maximum likelihood (ML) estimation algorithm, which registered the greatest error. Because of numerical singularities around the receivers, the ML estimation failed to find the minimum of the cost function and, instead, reached a local minimum near the receiver position. This problem was reduced by choosing a different initial estimation (i.e., closer to the real position), as explored with the weighted least squares ML (WLS-ML) algorithm, where the WLS method is used to initialize the ML estimation algorithm.

The algorithms’ RMSE over the 100 MC iterations with different noise added in the time of arrival (TOA) measurements is presented in Fig. 2D. We simulated the algorithms’ performance with noise SD (σ) equal to 0.5, 1.0, and 1.5 ms. Moreover, additional tests with a Gaussian TOA error with σ = 1.5 ms plus 5% of outliers were simulated to observe the algorithms’ behavior when facing strong multipath scenarios. These simulations showed that the particle filter (PF) had the best performance under different noise conditions; however, it had more difficulties handling scenarios with outlier measurements, whereas the WLS excelled. In addition, the use of the WLS-ML combination slightly improved the algorithm’s performance with different noise configurations. Nonetheless, this benefit was not observed in scenarios with outliers.
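The noise model used in these simulations can be sketched in a few lines. The following Python snippet is our own illustration, not code from the study; the 1500 m/s sound speed and the square 200-m array geometry are assumptions. It draws one set of TOAs with Gaussian timing error (σ = 1.5 ms) plus 5% strong outliers, then differences them against a reference receiver:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 1500.0  # assumed sound speed in water (m/s)

# Square array of four receivers with a 200-m baseline; target offset from center.
receivers = np.array([[-100.0, -100.0], [100.0, -100.0],
                      [100.0, 100.0], [-100.0, 100.0]])
target = np.array([30.0, -20.0])

def noisy_toas(target, receivers, sigma=1.5e-3, outlier_frac=0.05):
    """TOAs (s) with Gaussian timing noise plus a fraction of strong, late
    outliers mimicking multipath-corrupted detections."""
    toa = np.linalg.norm(receivers - target, axis=1) / C
    toa += rng.normal(0.0, sigma, size=toa.shape)
    outliers = rng.random(toa.shape) < outlier_frac
    toa[outliers] += rng.uniform(0.01, 0.05, size=int(outliers.sum()))
    return toa

# One Monte Carlo draw; differencing against receiver 0 removes the unknown t0.
toa = noisy_toas(target, receivers)
tdoa = toa[1:] - toa[0]
```

Repeating this draw and feeding each set of TDOAs to an estimator is the structure of the MC comparison described above.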

Last, the average runtime required to compute one target position is shown in Fig. 2E, where the fastest algorithm was the WLS, with 5 ms per iteration. In contrast, the PF required 977 ms, an increase of more than two orders of magnitude in computational cost.

A set of simulations was conducted to determine the optimal parameters for the AOTT algorithm (Fig. 3A) and the functions to weight the PF’s particles (Fig. 3B). For example, the results showed a relationship between the tracker circumference radius (TCR) and the maximum transmission range (MTR), where the best performance was obtained at a ratio Γ_{radius} = TCR/MTR equal to 0.8 (Fig. 3C). This means that the tracker had to conduct circumference maneuvers over the estimated target position with a radius smaller than the MTR but close to it. Nonetheless, it is difficult to know a priori the MTR achievable by an acoustic tag, which can be affected by different factors such as the sea state or the acoustic noise. Therefore, different in situ tests should be conducted to estimate its value. In our case, those tests pinpointed a maximum range of less than 400 m with only 20% of successful receptions (SRs) (fig. S3). In addition, the MTR is pivotal to spread the PF’s particles, and therefore, different relationships between the MTR and the maximum particle range (MPR) were studied, which allowed us to identify the relation between the ratio Γ_{range} = MPR/MTR and the AOTT’s performance (see Fig. 3C). Moreover, the effect of missing some of the tag’s transmissions could also be observed, where the ratio of SRs over the total transmissions, defined as Γ_{reception} = SR/TT, is presented. Last, random particles were spread around the latest estimated target position (compound resampling method), which helped to increase particle diversity and emphasized the latest time that the tag was detected, yielding an increase in tracking performance (see Fig. 3C).
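The area-only weighting idea can be illustrated with a minimal Python sketch (hypothetical code, not the study’s implementation; the soft 0.9/0.1 likelihood values and the 10% random-particle fraction are our own illustrative choices): a received ping up-weights particles inside the MTR disk around the tracker, a miss down-weights them, and compound resampling re-seeds part of the cloud around the latest fix:

```python
import numpy as np

rng = np.random.default_rng(1)

MTR = 400.0          # maximum transmission range estimated in situ (m)
TCR = 0.8 * MTR      # tracker circumference radius, for Γ_radius = 0.8
MPR = MTR            # maximum particle range (Γ_range = 1 here, for illustration)

def aott_update(particles, weights, tracker_pos, ping_received):
    """Area-only weighting: a detection implies the tag lies inside the MTR
    disk around the tracker; a miss makes the outside more likely."""
    d = np.linalg.norm(particles - tracker_pos, axis=1)
    inside = d <= MTR
    like = np.where(inside, 0.9, 0.1) if ping_received else np.where(inside, 0.3, 0.7)
    w = weights * like
    return w / w.sum()

def compound_resample(particles, weights, last_fix, frac_random=0.1):
    """Resample by weight, then re-seed a fraction of the cloud around the
    latest estimated tag position to keep particle diversity."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)
    out = particles[idx].copy()
    k = int(frac_random * n)
    out[:k] = last_fix + rng.uniform(-MPR, MPR, size=(k, 2))
    return out, np.full(n, 1.0 / n)

particles = rng.uniform(-MPR, MPR, size=(2000, 2))
weights = np.full(len(particles), 1.0 / len(particles))
weights = aott_update(particles, weights, np.array([0.0, 0.0]), ping_received=True)
estimate = (weights[:, None] * particles).sum(axis=0)
particles, weights = compound_resample(particles, weights, estimate)
```

In a full tracker, this update would run once per expected ping interval while the vehicle circles the estimate at radius TCR.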

The AOTT’s performance can be observed in Fig. 3D (and movie S2), where all the recommendations derived from the MC simulations presented above were used, showing an error of ~100 m. After these simulations, a field test was conducted on 27 to 28 June 2018 (Fig. 3E) using the Monterey Bay Aquarium Research Institute (MBARI) coastal profile float (CPF) as a target (Fig. 3F) and a Wave Glider as a tracker (Fig. 3G) in the Monterey Bay area (CA, USA) (Fig. 3H). The test lasted more than 15 hours, during which the CPF conducted three immersions at 60-m depth.

The results of the four-step process to adjust the receiver clocks’ drift and offset are shown in Fig. 4 (A to F), where a resolution better than 2 ms was obtained. Moreover, a small number of outliers were detected during the postprocessing (e.g., Fig. 4E), which had a random nature due to the homogeneous bathymetry of the experiment zone (i.e., a quasi-flat slope ground). In addition, the deployment positions of the mooring lines using the oceanographic vessel’s GPS, the ROV’s USBL, and the positions computed using the acoustic receivers are presented in Fig. 4G. Here, a marked difference between the GPS and ROV positions could be observed, which pinpointed the necessity of using underwater vehicles to determine the final positions of the receivers (Table 1).


After determining the receiver localizations and calibrating their clock offsets, the tagged Norway lobster positions could be tracked using the TDOA algorithms (see the next section). The trajectory shown by each animal can be observed in Fig. 5A (and movie S3), where the localization of the synchronization acoustic tags attached to the mooring lines and of the acoustic tags attached to the 33 individuals is shown. After the canister release in the center of the receivers’ array, the individuals showed a dynamic dispersion and occupation of the monitored area. Furthermore, the cumulative distance of each individual was plotted in Fig. 5B, where we could appreciate how some animals went outside the receivers’ reception range and therefore were not detectable from that moment on.

Using the two underwater robots (an ROV and an AUV), we could track the presence of some of those area-evading animals. The ROV conducted different lawnmower-pattern surveys on the southeast of the area, covering 10 km^{2}, and the AUV conducted a circumference path on the west (see Fig. 5C) with a radius equal to 150 m. During these tests, four tags were localized, and moreover, different images could be obtained, which will be used to study the seabed recovery in the protected areas (Fig. 5D and movie S4).

The algorithms studied to track acoustic tags using TDOA information were compared over the entire Norway lobster tracking experiment. Because the "true" positions of the tagged lobsters were unknown, the synchronization tags attached to the mooring lines were used; in this case, the tags were static rather than moving. Figure 6 (A to E) shows their estimated positions and the error covariance matrices, which are represented as error bars in Fig. 6F (and summarized in table S1). For example, the accuracy obtained in localizing the lobster canister synchronization tag [i.e., base station (BS) D], which was placed in the receiver array center, was similar among all the algorithms (error, <1 m). However, the PF had the poorest performance in localizing the synchronization tags attached to the mooring lines. This low performance was due to the nature of the PF’s particle distribution near the receivers in a TDOA topology (i.e., eccentricity of the hyperbola close to 1). Moreover, we found that both the PF and WLS methods showed higher errors in positioning the moored Vemco acoustic receivers indicated as BS(D) and BS(E), which had a different configuration (smaller dead weights and VR2AR-69k receivers). Taking into consideration the simulations conducted and the runtime required by each method, the WLS-ML offered the best reliability.

Last, although the error of the AOTT (order of tens of meters) is greater than that obtainable with the TDOA algorithms (order of a few meters), the AOTT method has the operational advantage of requiring only a single moving receiver on a mobile vehicle. This strategy markedly reduced infrastructure requirements.

In our study, we acoustically tracked tagged deep-sea benthic species, combining the information provided by underwater vehicles and anchored receivers, with meter spatial resolution. Here, the challenges of accurately positioning the receivers, adjusting the clocks’ drift, and assessing the algorithms’ performance have been analyzed, showing that these factors are the primary determinants of tracking success for an EU-relevant fishery resource such as the Norway lobster in a no-take zone. Thus, we set the basis and procedures that should be followed to obtain the best accuracy possible in similar operative deep-sea scenarios. To achieve such performance, the use of underwater mobile robotic platforms has been crucial; they can substantially boost traditional tracking methods [e.g., (*15*, *16*)] and extend target tracking beyond the limits of current LBL systems. In doing so, we have worked with two methods for target localization, which have been used in combination to extend their capabilities: (i) through static receivers anchored on the seabed and using TDOA algorithms, where 1-m resolution can be achieved, and (ii) using a single receiver installed on an underwater vehicle for dynamic tracking using the AOTT algorithm, which is capable of localizing and tracking acoustic tags by ping detections alone.

Many efforts to study deep-sea species using acoustic target tracking systems have been conducted, and a complete survey of design settings, detection algorithms, and platforms used is presented in Table 2. In this scenario, the strength of our contribution lies in the fact that we have approached the problem from a technological, operational, and scientific point of view, covering different areas of study and shedding light on the difficulties and solutions we encountered. Our results are an important step forward for prolonged, in situ monitoring of deep-sea benthic animal behavior at meter spatial scales. We believe that tracking deep-sea species using both acoustic mooring receivers and underwater robotized platforms, which within the next years could become an important component of fishery resources management, may inspire future scientific discoveries (*34*).


This hyperbolic scheme is the method used when the acoustic target to be localized is not synchronized with the receivers or no bidirectional communication capability is available (*35*). In these cases, the slant range between target and receiver cannot be computed, and therefore, range-based triangulation methods for target localization are not feasible. In a one-way communication scenario, the main problem in computing the TOF is knowing the initial transmission time *t*_{0}. The TDOA was designed to avoid this inconvenience (*36*): by using two synchronized receivers, the unknown *t*_{0} can be eliminated. In (*37*), the authors studied a method that also estimates *t*_{0}; however, this method has limitations when the acoustic tag does not transmit with a specific, fixed period. Moreover, in (*35*), the authors studied, analytically and through simulations, different TDOA target localization algorithms and found that it is not necessary to use the full set of TDOA measurements. In general, a set of *L* well-localized receivers is used, giving *m* = *L*(*L* − 1)/2 distinct TDOA measurements from all possible sensor pairs, which is known as the full TDOA set (*35*). With only a subset, one can achieve the same performance, but not when the target is far from the center of the receivers’ array (see fig. S1). Moreover, we could observe that the WLS had the best performance, being also the fastest method. Target tracking experiments, in general, use a set of receivers anchored on the seabed [e.g., (*38*, *39*)]. These receivers can operate for months, continuously recording information on the tagged animals, and therefore, the number of measurements, and consequently the number of computations required to track each animal, can be notable. Thus, the required runtime is an important factor in obtaining trustworthy tracking data.
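The pair count above is easy to verify; a small sketch (illustrative TOA values, our own) enumerates the full TDOA set for *L* = 4 receivers:

```python
from itertools import combinations

def full_tdoa_set(toas):
    """All m = L(L-1)/2 distinct pairwise TDOA measurements."""
    return {(i, j): toas[j] - toas[i] for i, j in combinations(range(len(toas)), 2)}

toas = [0.120, 0.135, 0.128, 0.142]   # illustrative TOAs (s) at L = 4 receivers
tdoas = full_tdoa_set(toas)           # the full set: 4*3/2 = 6 measurements
```

A subset strategy would simply keep the pairs involving a single reference receiver (here, the three pairs containing index 0).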

The area-based tracking method is used when the only information available to estimate the tag position is the ping received by one receiver. Two sets of simulations with different reception ratios (Γ_{reception}) were conducted, using ratios equal to 100 and 60%. Before and after the target’s right turn (at 67 min from the beginning of the simulation), the error was ~50 m using the ideal reception ratio and ~100 m using the 60% ratio. In this last situation, the AOTT had more problems finding and tracking the real target position, losing it in ~2% of the iterations. Despite that, the tracker in general did not lose the target’s position, and therefore, the capabilities of the AOTT method were demonstrated in relation to previous efforts. For example, in (*23*), the authors used two hydrophones and bearing-only methods to track a tagged animal, with critical consequences for the vehicle’s performance due to the payload’s size and the drag effects of these hydrophones, and with a reported error greater than 40 m. To increase the accuracy, the same authors presented a custom tag design (*32*) integrating an inertial measurement unit, which was used to adjust the velocity and attitude of the species during offline postprocessing. However, this approach considerably increases the tag’s size and, therefore, is not suitable for smaller tags. The tag’s size is also an important constraint in (*20*). In (*14*), the authors developed a method that used the signal strength to infer the tag’s position; nonetheless, its performance and observability studies were not reported. Last, in (*17*), a synthetic LBL was presented, where a constant, precise tag burst rate and a high-resolution tag detection timestamp on the receiver were both necessary for estimating tag positions, which are not always possible.

From the AOTT’s initial field test error, we could pinpoint three elements: (i) The algorithm was notably stable, with the target almost always localized. (ii) During the first CPF immersion, the error was lower than 100 m and then increased up to ~100 m. If we compare this performance with the simulations conducted previously, and take into consideration that the Wave Glider’s path was not optimal, the error values were inside the expected boundaries. (iii) When the CPF was at the surface (i.e., at 5 hours), the error obtained was greater, probably because of poor tag reception.

We efficiently detected the 33 tagged lobsters over several months with high precision (i.e., less than 2 m) using the WLS-ML algorithm (see Table 1). Once the tagged animals were localized, their pattern of displacement could be inferred, e.g., using the joint estimation over multiple individuals method (*40*).

Nevertheless, the reception of the tags using the underwater vehicles was operationally complex. During the cruise conducted 5 months after the release of the tagged individuals, different dives were conducted with both the AUV and the ROV. On the basis of these dives, we were able to detect four Norway lobsters. In addition to the possibility that most of the tagged lobsters were lost or had disappeared from the study area, the small number of detections could be caused by (i) acoustic interference from the thrusters or the equipment installed on the vehicles (e.g., the USBL or the Doppler velocity log) or (ii) the lobsters’ diel burrow emergence patterns (*41*), because the acoustic signal can suffer strong attenuation while an individual is inside its burrow (*42*). For example, in (*31*), the authors used a Wave Glider to track snow crabs (*Chionoecetes opilio*); this vehicle is powered by sea waves and therefore does not use thrusters. Moreover, it does not use any acoustic positioning system but GPS, because it stays permanently on the sea surface. Both aspects help to reduce the noise and interference with the tag’s signal. This was also observed during the AOTT field test, where the reception ratio was greater. One of the main constraints for benthic deep-sea tag tracking is the maximum distance at which an acoustic tag signal can be detected (e.g., less than 300 m for smaller devices), and therefore, the use of surface vehicles such as Wave Gliders is not possible. One solution could be the use of an AUV with "silent" mode capabilities (i.e., dynamic buoyancy control) such as (*43*, *44*) or tethering the receiver at a sufficient distance from the vehicle.

Tracking procedures were conducted during 2019 in a no-take fishing zone established at 380- to 400-m depth on the Roses continental slope, northwestern Mediterranean Sea (42° 00.8006′ N and 03° 31.9723′ E; Fig. 1E). During an oceanographic cruise onboard the R/V García del Cid, on 6 June 2019, we deployed four mooring lines with Vemco receivers (Vemco, Canada): two equipped with VR2W-69k receivers and V7-69k synchronization tags, and two equipped with VR2AR-69k acoustic release receivers. In the middle of these four mooring lines, we simultaneously released 33 Norway lobsters, each with a Vemco V7-69k tag glued dorsally (i.e., with cyanoacrylate), by using release canisters [an adaptation of (*42*)]. All lobsters were captured in the study area with creels during the days before their release. The mooring lines with the receivers were recovered on 23 September 2019 during a second oceanographic cruise onboard the R/V García del Cid.

To complement the four moored receivers in detecting the tagged lobsters, during a third oceanographic cruise onboard the R/V Sarmiento de Gamboa in October 2019, we deployed two underwater vehicles at the same field site: an AUV (Girona 500 AUV, IQUA Robotics, Spain) and an ROV (Super Mohawk II, Forum Energy Technologies, Houston, TX, USA), both equipped with VR2W receivers.

Complementarily, some of these materials and procedures were tested in different preliminary operational calibration trials: (i) at the OBSEA observatory (www.obsea.es), deployed at 20-m depth and 4 km east off the central Catalan coast, Barcelona (Mediterranean Sea), one of the three EMSO testing sites (fig. S4) (*45*, *46*); and (ii) at Monterey Bay, California (USA), using the installations of MBARI.

Four receivers created an acoustic LBL localization system, where each one operated in self-recording mode and was not accessed in real time. The tags periodically transmitted an individualized acoustic ping with a unidirectional communication protocol, which was recorded by the receivers. The tags were programmed to send this ping every 60 s (plus a random value of up to 30 s to avoid multiple tags consistently overlapping in time). Each tag transmits its own identifier using pulse position modulation with a carrier signal frequency of 69 kHz. The Vemco V7 tag has a typical working range of ~250 m, and therefore, the receivers’ baseline was set to 200 m.
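The jittered transmission schedule can be sketched as follows (an illustration of the scheme described, not manufacturer firmware):

```python
import random

random.seed(42)

def ping_times(n_pings, base_period=60.0, max_jitter=30.0):
    """One tag's transmission schedule: a fixed period plus uniform random
    jitter, so co-located tags do not consistently collide in time."""
    t, times = 0.0, []
    for _ in range(n_pings):
        t += base_period + random.uniform(0.0, max_jitter)
        times.append(t)
    return times

times = ping_times(10)
intervals = [b - a for a, b in zip(times, times[1:])]
```

Because each tag draws its jitter independently, two tags that collide on one ping are unlikely to collide on the next.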

In addition, V7 synchronization tags from Vemco were used to correct the receivers’ clock drift and to adjust the final receiver array position using a four-step process described below. These synchronization tags were attached to each mooring (1 m above the receivers) and to the lobster canister. During the experiment, both the ROV and the AUV positions were known using the R/V’s USBL. In addition, the AUV had its own dead reckoning system for autonomous navigation. The final position of the receivers could be computed using the information provided by the ROV’s USBL, which was more exact than the deployment position obtained at the surface with the GPS of the R/V, owing to the drift of the moorings during their 400-m descent. The ROV was piloted above the moorings, and its position was used as the true position of two of them. Then, knowing the TOF among the other lines and the lobster canister through the synchronization tags and the receivers, their relative positions could be determined by simple trigonometric functions and rotation matrices.
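The final trigonometric step can be sketched as a two-circle intersection: given two moorings fixed by the ROV’s USBL and TOF-derived ranges from a third mooring, the third position follows up to a mirror ambiguity, which an approximate prior (e.g., the GPS deployment fix) resolves. The snippet below is our own illustration with synthetic coordinates, not the study’s code:

```python
import numpy as np

def trilaterate(p0, p1, r0, r1, prior):
    """Intersect the two range circles around the known moorings p0 and p1;
    the mirror ambiguity is resolved with the approximate deployment fix."""
    d = np.linalg.norm(p1 - p0)
    u = (p1 - p0) / d
    a = (r0**2 - r1**2 + d**2) / (2.0 * d)    # along-baseline offset
    h = np.sqrt(max(r0**2 - a**2, 0.0))       # off-baseline offset
    base = p0 + a * u
    perp = np.array([-u[1], u[0]])            # u rotated by 90 degrees
    cands = [base + h * perp, base - h * perp]
    return min(cands, key=lambda c: np.linalg.norm(c - prior))

# Synthetic check: two moorings fixed by the ROV's USBL; ranges from TOF.
p0, p1 = np.array([0.0, 0.0]), np.array([200.0, 0.0])
p2_true = np.array([100.0, 170.0])
r0, r1 = np.linalg.norm(p2_true - p0), np.linalg.norm(p2_true - p1)
p2 = trilaterate(p0, p1, r0, r1, prior=np.array([90.0, 150.0]))
```

In practice, the ranges come from TOF multiplied by the local sound velocity, so a sound-speed profile error maps directly into a range error.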

Target localization using TDOA is a well-known problem that has been addressed in both terrestrial and underwater environments over recent decades. TDOA is usually used when no synchronization between transmitters and receivers can be enforced, and even more so when the transmitters’ ping times are irregular (e.g., using Vemco devices). In both cases, the TOF cannot be measured or estimated, and consequently, the TDOA between different pairs of receivers is used.

In general, TDOA algorithms can be divided into two groups, the ML and LS methods (*47*). Using *n* + 1 receivers (where *n* ∈ {2,3} is the space dimensionality of the problem), a set of hyperbolic equations can be obtained to find the coordinates of the target **q**. The TDOA measurement between two receivers *i*, *j* ∈ {0, …, *m*}, *i* ≠ *j*, is

τ^{ij} = (*t*_{i} − *t*_{0}) − (*t*_{j} − *t*_{0}) = (‖**q** − **p**_{i}‖ − ‖**q** − **p**_{j}‖)/*c*

where **p**_{i} is the position of receiver *i*, *c* is the sound velocity in water, and *t*_{0} is the target transmission time, which cancels out of the difference. Assuming a zero-mean white Gaussian distribution of the TDOA measurement errors, i.e., ε^{ij} ~ 𝒩(0, σ^{2}), the unknown parameter **q** can be estimated by maximizing the likelihood of the measured τ^{ij}.

A common practice in ML estimation is to work with the log-likelihood function. Because the logarithm is a strictly increasing function, maximizing the likelihood is equivalent to minimizing the negative log-likelihood, which, under the Gaussian noise assumption, reduces to a nonlinear least squares cost function *f*(**q**) over the TDOA residuals.

In general, there is no closed-form solution to the previous optimization problem. The cost function is relatively complex, nonlinear, and not even differentiable at some points because of the square roots that define the TDOA measurements.

A standard approach for its optimization is to use Newton-Raphson iterative minimization (*48*). To implement gradient and Newton descent algorithms to minimize the cost function, it is necessary to have expressions for its gradient ∇*f*(**q**) and Hessian ∇^{2}*f*(**q**), which are the vector of its first partial derivatives and matrix of its second partial derivatives, respectively. This can be done by resorting to Matrix Differential Calculus; see (*11*, *49*) and the references therein.
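A minimal numerical illustration of this iterative scheme is given below. It is our own sketch, not the study’s implementation: it uses a Gauss-Newton variant, which approximates the Hessian from the Jacobian instead of forming second derivatives, on synthetic noiseless data with an assumed 1500 m/s sound speed:

```python
import numpy as np

C = 1500.0   # assumed sound speed (m/s)

def tdoa_residuals(q, receivers, tdoa):
    """Measured minus predicted TDOAs (vs. receiver 0) at candidate position q."""
    ranges = np.linalg.norm(receivers - q, axis=1)
    return (ranges[1:] - ranges[0]) / C - tdoa

def gauss_newton(q0, receivers, tdoa, iters=20):
    """Iterative minimization of the nonlinear LS cost: the Jacobian rows are
    differences of unit vectors from the receivers to the target."""
    q = q0.astype(float)
    for _ in range(iters):
        diff = q - receivers
        ranges = np.linalg.norm(diff, axis=1)
        grads = diff / ranges[:, None]          # d range_i / d q
        J = (grads[1:] - grads[0]) / C          # d residual / d q
        r = tdoa_residuals(q, receivers, tdoa)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        q = q + step
    return q

receivers = np.array([[-100.0, -100.0], [100.0, -100.0],
                      [100.0, 100.0], [-100.0, 100.0]])
truth = np.array([40.0, 10.0])
ranges = np.linalg.norm(receivers - truth, axis=1)
tdoa = (ranges[1:] - ranges[0]) / C             # noiseless measurements
q_hat = gauss_newton(np.array([0.0, 0.0]), receivers, tdoa)
```

As discussed in the text, the quality of the initial estimate matters; starting near a receiver instead of the array center can drive such an iteration into a local minimum.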

Nonlinear estimation problems are also often addressed using linearized estimators, e.g., the extended Kalman filter (*50*). However, a linearization-based filtering approach marginalizes all but the current state and is hence unable to refine past linearization points. In contrast, a batch maximum a posteriori (MAP) estimator computes the estimates for the states at all time steps using all available measurements (*51*). The difference between MAP and ML estimation lies in the assumption of an appropriate prior distribution of the parameters to be estimated (*52*). The MAP estimator uses all available information to estimate the entire target’s trajectory, which is represented by stacking all states in the time interval [0, *k*] as

**x**_{0:k} = [**x**_{0}^{T}, **x**_{1}^{T}, …, **x**_{k}^{T}]^{T}

with the process model

**x**_{k} = Φ_{k−1}**x**_{k−1} + **w**_{k−1}

where **w**_{k−1} is zero-mean white Gaussian noise with covariance **Q**, and the state transition matrix, Φ_{k−1}, is given (for a constant-velocity target model with inter-ping interval δ*t*) by

Φ_{k−1} = [**I** δ*t***I**; **0** **I**]

Then, the MAP estimator seeks to determine the entire state-space trajectory that maximizes the posterior probability density function *p*(**x**_{0:k} | **z**_{0:k}), where **z**_{0:k} denotes all measurements collected in the interval [0, *k*]. Using the same procedure as in Eq. 3, the cost function is given by the sum of the weighted squared measurement residuals and the weighted squared process residuals, the latter weighted by the inverse of the process noise covariance **Q**.

Last, the solution can also be computed using Newton-Raphson iterative minimization methods; see (*51*) and references therein. However, this solution depends heavily on the quality of the initial estimate, especially if multimodal probability density functions are involved (i.e., the solution may lie in a local minimum instead of at the true target position).

To estimate multimodal distributions, one of the most used methods is the PF (*53*, *54*). The PF solves the probability distribution problem in a nonparametric way using a set of weighted particles, which are propagated with the process model, reweighted with the measurement likelihood at each received ping, and resampled to avoid degeneracy (*55*, *56*).
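A bootstrap PF for a static tag can be sketched as follows (illustrative code, not the study’s implementation; the particle count, noise levels, and the 1-m roughening jitter are our own choices, and a 1500 m/s sound speed is assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
C = 1500.0                      # assumed sound speed (m/s)
sigma = 1e-3                    # TDOA noise SD (s)

receivers = np.array([[-100.0, -100.0], [100.0, -100.0],
                      [100.0, 100.0], [-100.0, 100.0]])
truth = np.array([25.0, -40.0])          # static tag position
ranges = np.linalg.norm(receivers - truth, axis=1)

# Bootstrap PF: diffuse uniform prior, Gaussian TDOA likelihood,
# multinomial resampling plus a small roughening jitter.
n = 5000
particles = rng.uniform(-150.0, 150.0, size=(n, 2))
weights = np.full(n, 1.0 / n)

for _ in range(30):                                  # 30 simulated pings
    tdoa = (ranges[1:] - ranges[0]) / C + rng.normal(0.0, sigma, 3)
    d = np.linalg.norm(particles[:, None, :] - receivers[None, :, :], axis=2)
    pred = (d[:, 1:] - d[:, [0]]) / C                # predicted TDOAs per particle
    loglik = -0.5 * np.sum((pred - tdoa) ** 2, axis=1) / sigma**2
    w = weights * np.exp(loglik - loglik.max())
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)                 # resample by weight
    particles = particles[idx] + rng.normal(0.0, 1.0, (n, 2))
    weights = np.full(n, 1.0 / n)

pf_estimate = particles.mean(axis=0)
```

The roughening step plays the same role as the diversity-preserving resampling discussed for the AOTT: without it, repeated resampling collapses the cloud onto a few identical particles.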

Another method to solve the likelihood function of Eq. 3 is a closed-form LS solution. A widely used closed-form method was developed by Chan and Ho (*36*). They gave an alternative solution for the hyperbolic position fix by using an approximation of the ML estimation when the TDOA estimation errors are small. The original set of TDOA equations is transformed into another set that is linear in **q** by adding an extra variable, *r*_{0}, the range between the target and the reference sensor. Then, the algorithm uses a two-step WLS method to estimate the target position: the first step solves the linearized system for the augmented unknown [**q**^{T}, *r*_{0}]^{T}, and the second step exploits the known relation between *r*_{0} and **q** to refine the estimate.
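The first LS step of this family of solvers can be written compactly. Expanding *r*_{i}² = (*r*_{i0} + *r*_{0})² and *r*_{i}² = ‖**q** − **p**_{i}‖² gives 2(**p**_{i} − **p**_{0})·**q** + 2*r*_{i0}*r*_{0} = *K*_{i} − *K*_{0} − *r*_{i0}², with *K*_{i} = ‖**p**_{i}‖², which is linear in the augmented unknown. The snippet below (our own sketch, noiseless for clarity, and without the second refinement step) solves it directly:

```python
import numpy as np

def wls_first_step(receivers, rdiff):
    """First LS step of a Chan-Ho-style solver: with r_i0 = r_i - r_0 and
    K_i = ||p_i||^2, the identity
        2 (p_i - p_0) . q + 2 r_i0 r_0 = K_i - K_0 - r_i0^2
    is linear in the augmented unknown z = [q, r_0]."""
    p0 = receivers[0]
    K = np.sum(receivers**2, axis=1)
    A = np.hstack([2.0 * (receivers[1:] - p0), 2.0 * rdiff[:, None]])
    b = K[1:] - K[0] - rdiff**2
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z[:2], z[2]       # target estimate and range to the reference receiver

receivers = np.array([[-100.0, -100.0], [100.0, -100.0],
                      [100.0, 100.0], [-100.0, 100.0]])
truth = np.array([-30.0, 55.0])
ranges = np.linalg.norm(receivers - truth, axis=1)
rdiff = ranges[1:] - ranges[0]          # c * TDOA, noiseless here
q_hat, r0_hat = wls_first_step(receivers, rdiff)
```

With noisy measurements, the second WLS step reweights this solution using the constraint that *r*_{0} must equal ‖**q** − **p**_{0}‖.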

Further, different authors have improved this technique; for example, in (*57*), the WLS includes a vertical plane constraint and a cone tangent plane constraint. These two constraints are derived from the initial value and updated again after each iteration.

Last, in (*37*), the authors developed the Yet Another Positioning Solver (YAPS) method, where they used the TOA instead of the TDOA to estimate the target position. Because of that, they also had to estimate the target transmission time *t*_{k}. The modeling follows the state-space paradigm, which uses the process and observation models as in the MAP estimation method. They used stochastic processes to describe the state propagation as a random walk, with different standard deviations for the transmission time and the target position.

Here, all these algorithms have been compared with the CRB (*58*), which sets the lowest bound on the performance of unbiased estimators that use observations drawn from a certain probability density function. This bound is one of the most widely used (*59*–*61*). For a TDOA target localization problem, the covariance of any unbiased estimate of the target position **q** is bounded below by **I**^{−1}, where **I** denotes the Fisher information matrix defined as **I** = E[∇ln *f*(**q**) ∇ln *f*(**q**)^T], and ∇ln *f*(**q**) is the gradient of the log-likelihood function with respect to the unknown parameters, the same function used to compute the target position in the ML estimation. Taking the trace of **I**^{−1}, we obtain a new inequality that sets a fundamental lower bound on the mean square error of any unbiased estimator: E[‖**q̂** − **q**‖²] ≥ tr(**I**^{−1}).
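As a sketch of how such a bound can be evaluated numerically, the following computes tr(**I**^{−1}) for an assumed planar geometry, simplifying the TDOA errors to independent Gaussian range-difference errors (a shared reference receiver actually correlates them, so this is illustrative only):

```python
import numpy as np

def tdoa_crb_trace(q, receivers, sigma_r):
    """Trace of the inverse Fisher information for a planar TDOA problem.

    Assumes independent Gaussian range-difference errors with standard
    deviation sigma_r (metres), measured relative to receiver 0."""
    d = np.linalg.norm(receivers - q, axis=1)
    u = (q - receivers) / d[:, None]            # unit vectors receiver -> target
    J = u[1:] - u[0]                            # gradient of each range difference w.r.t. q
    I = J.T @ J / sigma_r**2                    # Fisher information matrix
    return float(np.trace(np.linalg.inv(I)))   # lower bound on the position MSE
```

The bound scales with the noise variance, so doubling σ quadruples the achievable MSE floor for a fixed geometry.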

Different simulations have been conducted to characterize the TDOA target localization algorithms explained above under different parameters and scenarios. These simulations were carried out using the MC simulation method. For all the simulations, the RMSE was computed using the median and the 5th and 95th percentiles over 100 iterations, where zero-mean Gaussian TDOA noise was added with σ = 0.5, 1, and 1.5 ms. The parameters of the simulated scenario were (i) tag transmission delay = 120 s, (ii) target velocity = 0.2 m/s, and (iii) number of particles (for the PF algorithm) = 6000. Algorithms’ run times were obtained using an Intel Core i7-4760HQ CPU at 2.10 GHz with 8 GB of random-access memory.
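A minimal sketch of this MC procedure, using an assumed 400-m receiver square, the noise levels above, and a hypothetical grid-search estimator standing in for the solvers discussed earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
receivers = np.array([[0.0, 0.0], [400.0, 0.0], [0.0, 400.0], [400.0, 400.0]])
target = np.array([150.0, 220.0])
c = 1500.0                                    # nominal sound speed (m/s), assumed

# Precompute predicted range differences on a coarse candidate grid.
xs = np.linspace(0.0, 400.0, 81)
grid = np.array([(x, y) for x in xs for y in xs])
grid_d = np.linalg.norm(grid[:, None, :] - receivers[None, :, :], axis=2)
grid_pred = grid_d[:, 1:] - grid_d[:, :1]

def estimate(tdoa_ranges):
    # Hypothetical grid-search estimator: pick the grid point whose predicted
    # range differences best match the noisy measurements.
    return grid[np.argmin(((grid_pred - tdoa_ranges) ** 2).sum(axis=1))]

stats = {}
d = np.linalg.norm(receivers - target, axis=1)
for sigma_ms in (0.5, 1.0, 1.5):
    errs = []
    for _ in range(100):                      # MC iterations, as in the text
        noise = rng.normal(0.0, sigma_ms * 1e-3 * c, size=3)   # TDOA noise in metres
        errs.append(np.linalg.norm(estimate(d[1:] - d[0] + noise) - target))
    stats[sigma_ms] = (np.median(errs), *np.percentile(errs, [5, 95]))
```

Reporting the median with 5th/95th percentiles, rather than the mean, keeps the summary robust to the occasional badly conditioned draw.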

Four receivers were used in this study, each with an internal clock that is not periodically synchronized. Consequently, during the campaign, they suffered from drift and misalignment. This behavior introduces an error that must be corrected for two reasons: (i) to be able to associate independent receptions at separate receivers corresponding to the same target and emission time, and (ii) to compute the TDOA accurately. The TDOA between two receivers *i* and *j* (considering their clocks’ drift) can be modeled as the difference of their reception times plus their respective clock misalignments, (*t*_{ik} + *C*_{ik}) − (*t*_{jk} + *C*_{jk}), where *C*_{ik} is the clock’s misalignment of receiver *i* at time step *k*.

The first step adjusts the main drift, which is necessary to associate independent receptions at separate receivers. If the clock drift is greater than the acoustic tag transmission interval, it is not possible to associate the receptions of a single tag transmission at different independent receivers (in long field studies, i.e., more than 1 month, the drift can exceed 30 s). Thus, only the initial points, where the accumulated drift is still small, can be used for this association. Once the receptions are associated, a polynomial fitted curve can be used to eliminate the main clock drift across the entire dataset. In addition, the whole dataset was segmented into small portions (e.g., by weeks), and a second polynomial fitted curve was used for fine-tuning. With this procedure, the drift (i.e., the slope of *C*_{ij}, also known as clock skew) was adjusted.
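The coarse polynomial correction can be sketched as follows, with assumed drift and offset values for a simulated pair of receivers (a second per-week refit would then fine-tune the residual, as described above):

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated reception times of the same pings at two unsynchronized receivers:
# receiver 2's clock runs fast (assumed 5 ppm drift) with a constant offset.
t_rx1 = np.arange(0.0, 90 * 86400.0, 120.0)          # one ping every 120 s for ~3 months
drift_ppm, offset = 5.0, 3.2                         # assumed drift and misalignment
t_rx2 = t_rx1 * (1.0 + drift_ppm * 1e-6) + offset + rng.normal(0.0, 1e-3, t_rx1.size)

# Accumulated drift over the deployment reaches tens of seconds, as noted above.
accumulated = t_rx2[-1] - t_rx1[-1]

# Coarse correction: fit a polynomial to the inter-receiver time difference
# (here degree 1, i.e., skew plus offset) and subtract it.
coeffs = np.polyfit(t_rx1, t_rx2 - t_rx1, deg=1)
t_rx2_corrected = t_rx2 - np.polyval(coeffs, t_rx1)
```

After the fit, the residual misalignment is at the level of the per-ping timing jitter, small enough to associate receptions and compute TDOAs.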

Once the internal clock drift was adjusted, the position of each receiver **b**_{i} could be computed. First, the distance between each pair of receivers was obtained, from which their relative positions could then be derived.

The AOTT method uses a single moving receiver, and therefore, no TDOA information is available. Instead, the tag’s position is estimated using the ping detection/no-detection information provided by the receiver. However, detecting a tag’s transmission is complex due to acoustic noise from the platform’s thrusters, multipath, or the distance between the tag and the receiver. Consequently, AOTT algorithm attributes such as the reception ratio and the MTR were studied through simulations and field tests before the Norway lobsters’ field survey.

Given the acoustic receiver and transmitter tag used for this work, the only information that can be determined is the presence or absence of acoustic tag transmissions in the area of the receiver, without information about the tag’s direction or range of detection. The AOTT method infers the target position by taking the area determined by the maximum reception range as the only filter input (*62*). Two types of areas can be defined: one where the tag is detected and one where the tag is not detected. The estimation of the target’s localization can then be computed by overlapping all of these areas: the zone of maximum overlap is where the target is most likely to be, thereby representing its probability distribution.

The AOTT was implemented using a PF algorithm, in which a set of particles approximates the target’s posterior probability distribution (*63*), following the standard recursion of prediction, update, and resampling steps.

The main difference between the range-only (*19*) and the area-only (AOTT) PF algorithms is how the particles are weighted. In a range-only method, the likelihood is based on the measurement probability function *p*(*z*_{k} ∣ **x**_{k}^{n}), typically a Gaussian centered on the predicted range between particle *n* and the observer, where *n* ∈ {0, …, *N*} indicates the particle number up to *N*.

In the area-only method, by contrast, the measurement probability function is based on the distance between each particle and the observer: particles inside the area defined by the maximum range at which an acoustic tag can be detected receive higher weights than particles outside this area. Conversely, if an acoustic tag detection is missed, particles inside the area receive lower weights than particles outside it. This behavior can be represented using the cumulative distribution function (CDF) (*64*) and its complementary survival function (SF) [also known as the *Q*-function (*65*)], both evaluated at the distance *r* between each particle and the observer, with mean μ equal to the maximum range at which an acoustic tag can be detected and spread σ_{W}. As shown in (*66*), a Compound resampling method can improve the target accuracy. The main idea of the Compound method is to spread a certain number of particles randomly; in this case, the random particles are spread around the latest estimated target position, which helps to increase particle diversity and to emphasize the latest time the tag was detected.
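The CDF/SF weighting can be sketched as follows, assuming μ = 250 m and σ_{W} = 4.5 m (values taken from the simulations described below; `weight_particles` is a hypothetical helper, not the study’s implementation):

```python
import numpy as np
from math import erf, sqrt

MU, SIGMA_W = 250.0, 4.5     # max detection range (m) and weighting spread (assumed)

def _cdf(r):
    # Gaussian CDF evaluated at distance r, centered on the maximum range MU.
    return 0.5 * (1.0 + erf((r - MU) / (SIGMA_W * sqrt(2.0))))

def weight_particles(particles, observer, detected):
    """Area-only weighting: on a detection, particles inside the maximum
    detection range get high weight via the survival function 1 - CDF(r);
    on a missed detection, the CDF favors particles outside the range."""
    r = np.linalg.norm(particles - observer, axis=1)
    w = np.array([(1.0 - _cdf(ri)) if detected else _cdf(ri) for ri in r])
    w += 1e-12                      # guard against degenerate all-zero weights
    return w / w.sum()
```

The smooth transition of width σ_{W} around μ avoids the hard in/out cut of a binary detection disk, which would starve the filter of gradient information near the boundary.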

The idea of observability in target tracking using a single vehicle is of primary importance (*67*–*69*) and is related to the local weak observability properties of a specific nonlinear system. The observability problem is concerned with determining conditions under which knowledge of the input-output data uniquely determines the state of the system (*70*), e.g., the optimal path that the vehicle should follow to maximize the accuracy of the estimated target position (*71*–*73*). These studies pinpointed two basic rules to follow: (i) all measurements should be uniformly distributed on a circumference centered over the target, and (ii) the circumference’s radius must be greater than the target depth and, in some cases, as large as possible. Using these two ideas, we conducted different simulations to characterize the AOTT algorithm under different parameters and scenarios, which were used to optimize the algorithm’s parameters and the tracker’s path. These simulations were carried out using the MC simulation method. For all the simulations, the average result over 30 iterations is presented. Parameters not involved in the current simulation were considered ideal. Two scenarios were developed for each case: (i) localizing a static target and (ii) tracking a target moving at 0.2 m/s. The weight distribution used in the area-only method was computed using σ_{W} = 4.5 m for the SF.

A second set of simulations was carried out to observe the AOTT’s performance using all the results derived from the previous section. In this case, the target was moving at 0.2 m/s and performed a 90° right turn after 67 min; the rest of the parameters were (i) tag transmission delay = 60 s, (ii) maximum tag transmission range = 250 m, (iii) tracker radius = 200 m, (iv) tracker velocity = 1 m/s, (v) number of particles = 10,000, (vi) resampling method = Compound with ratio 1.5%, (vii) MPR = 300 m, and (viii) number of iterations = 50.

Experimental field testing was conducted on 27 to 28 June 2018 using a Wave Glider (Liquid Robotics, USA) as the tracker and MBARI’s CPF (*74*) as the target. The Wave Glider was equipped with a Vemco receiver (VR2C-69 kHz, Vemco, Canada), and two Vemco acoustic tags (V7P-69k, Vemco, Canada) were attached to the CPF. In addition, the CPF was equipped with a Benthos acoustic modem (ATM-900, Teledyne Marine–Benthos, USA) and the Wave Glider with a Benthos DAT (direction acoustic transponder) modem (DAT, Teledyne Marine–Benthos, USA), which is a type of USBL. The test lasted more than 15 hours, during which the CPF conducted three immersions at ~60-m depth. During the tests, the Wave Glider carried out different circumferences around the area (manually piloted), which served two purposes: (i) to perform an acoustic tag detection ratio versus range test, finding the maximum range at which the tags could be detected, and (ii) to compare the accuracy of the USBL, the range-only target tracking, and the AOTT methods.

robotics.sciencemag.org/cgi/content/full/5/48/eabc3701/DC1

Fig. S1. Algorithms’ performance versus receivers’ position.

Fig. S2. TDOA algorithms performance over the time.

Fig. S3. Reception ratio.

Fig. S4. Fieldwork methods evaluation at the OBSEA platform.

Table S1. Algorithms’ performance during the Norway lobster experiment.

Movie S1. Simulation of target tracking using TDOA.

Movie S2. Simulation of target tracking using AOTT.

Movie S3. Norway lobster movements.

Movie S4. Seabed images.

Autonomous sample return and high-resolution depth control enable global-scale biochemical mapping of marine genomics and proteomics.

Vast and diverse microbial communities exist within the ocean. To better understand the global influence of these microorganisms on Earth’s climate, we developed a robot capable of sampling dissolved and particulate seawater biochemistry across ocean basins while still capturing the fine-scale biogeochemical processes therein. Carbon and other nutrients are acquired and released by marine microorganisms as they build and break down organic matter. The scale of the ocean makes these processes globally relevant and, at the same time, challenging to fully characterize. Microbial community composition and ocean biochemistry vary across multiple physical scales up to that of the ocean basins. Other autonomous underwater vehicles are optimized for moving continuously and, primarily, horizontally through the ocean. In contrast, *Clio*, the robot that we describe, is designed to efficiently and precisely move vertically through the ocean, drift laterally in a Lagrangian manner to better observe water masses, and integrate with research vessel operations to map large horizontal scales to a depth of 6000 meters. We present results that show how *Clio* conducts high-resolution sensor surveys and sample return missions, including a mapping of 1144 kilometers of the Sargasso Sea to a depth of 1000 meters. We further show how the returned samples yield filtered biomass from seawater that enables genomic and proteomic measurements not possible through in situ sensing. These results demonstrate a robotic oceanography approach for global-scale surveys of ocean biochemistry.

Warm-starting grasp-optimized motion planning with deep learning reduces computation time by two orders of magnitude.

Robots for picking in e-commerce warehouses require rapid computing of efficient and smooth robot arm motions between varying configurations. Recent results integrate grasp analysis with arm motion planning to compute optimal smooth arm motions; however, computation times on the order of tens of seconds dominate motion times. Recent advances in deep learning allow neural networks to quickly compute these motions; however, they lack the precision required to produce kinematically and dynamically feasible motions. While infeasible, the network-computed motions approximate the optimized results. The proposed method warm starts the optimization process by using the approximate motions as a starting point from which the optimizing motion planner refines to an optimized and feasible motion with few iterations. In experiments, the proposed deep learning–based warm-started optimizing motion planner reduces compute and motion time when compared to a sampling-based asymptotically optimal motion planner and an optimizing motion planner. When applied to grasp-optimized motion planning, the results suggest that deep learning can reduce the computation time by two orders of magnitude (300×), from 29 s to 80 ms, making it practical for e-commerce warehouse picking.

The Coronavirus Disease 2019 pandemic greatly increased demand for e-commerce and reduced the ability of warehouse workers to fill orders in close proximity, driving interest in robots for order fulfillment. However, despite recent advances in grasp planning [e.g., Mahler *et al.* (*1*)], the planning and executing of robot motion remain a bottleneck. To address this, in prior work, we introduced a Grasp-Optimized Motion Planner (GOMP) (*2*) that computes a time-optimized motion plan (see Fig. 1) subject to joint velocity and acceleration limits and allows for degrees of freedom in the pick-and-place frames (see Fig. 2). The motions that GOMP produces are fast and smooth; however, by not taking into account the motion’s jerk (change in acceleration), the robot arm will often rapidly accelerate at the beginning of each motion and rapidly decelerate at the end. In the context of continuous pick-and-place operations in a warehouse, these high-jerk motions could result in wear on the robot’s motors and reduce the overall service life of a robot. In this paper, we introduce jerk limits and find that the resulting sequential quadratic program (SQP) and its underlying quadratic program (QP) require computation on the order of tens of seconds, which is not practical for speeding up the overall pick-and-place pipeline. We then present DJ (Deep-learning Jerk-limited)–GOMP, which uses a deep neural network to learn trajectories that warm start computation, yielding a reduction in computation times from 29 s to 80 ms, making it practical for industrial use.

For a given workcell environment, DJ-GOMP speeds up motion planning for a robot and a repeated task through a three-phase process. The first phase randomly samples tasks from the distribution of tasks the robot is likely to encounter and generates a time- and jerk-minimized motion plan using an SQP. The second phase trains a deep neural network using the data from the first phase to compute time-optimized motion plans for a given task specification (Fig. 3). The third phase, used in pick-and-place, uses the deep network from the second phase to generate a motion plan to warm start the SQP from the first phase. By warm starting the SQP from the deep network’s output, DJ-GOMP ensures that the motion plan meets the constraints of the robot (something the network cannot guarantee) and greatly accelerates the convergence rate of the SQP (something the SQP cannot do without a good initial approximation).

This paper describes the algorithms and training process of DJ-GOMP. In Results, we perform experiments on a physical Universal Robots UR5 manipulator arm, verifying that the trajectories DJ-GOMP generates are executable on a physical robot and result in fast and smooth motion. This paper provides the following contributions: (i) J-GOMP, an extension of GOMP that computes time-optimized jerk-limited motions for pick-and-place operations; (ii) DJ-GOMP, an extension of J-GOMP that uses deep learning of time-optimized motion plans to empirically speed up the computation time of the J-GOMP optimization by two orders of magnitude (300×); (iii) a comparison in compute and motion time to the optimally time-parameterized Probabilistic Road Maps “Star” (PRM*) and TrajOpt motion planners, suggesting that DJ-GOMP computes fast motions quickly; and (iv) experiments in simulation and on a physical UR5 robot suggesting that DJ-GOMP can be practical for reducing jerk to acceptable limits.

We consider the problem of automating grasping and placing motions of a manipulator arm while avoiding obstacles and minimizing jerk and time. Minimizing motion time requires incorporating the robot’s velocity and acceleration limits. We cast this as an optimization problem with nonconvex constraints and compute an approximation using an SQP.

To plan a robot’s motion, we compute a trajectory τ as a sequence of configurations (**q**_{0}, **q**_{1}, …, **q**_{n}), in which each configuration **q**_{i} is a vector of the robot’s joint angles.

The motion starts with the robot’s end effector at a grasp frame **g**_{0} ∈ *SE*(3) and ends at a place frame **g**_{H} ∈ *SE*(3).

To be dynamically feasible, trajectories must also remain within the velocity, acceleration, and jerk limits (**v**_{max}, **a**_{max}, and **j**_{max}) of the robot.


We propose a multistep process for computing motion plans quickly. The underlying motion planner is based on an SQP proposed in GOMP (*2*), which is a time-optimizing extension of TrajOpt (*3*) that incorporates a depth map for obstacle avoidance, degrees of freedom at pick and place points, and robot dynamic limits. In GOMP and its extensions in this work, trajectories are discretized into a sequence of *H* + 1 waypoints separated by a fixed time interval *t*_{step}, where *t*_{step} is a tunable parameter, and *H* is the computed (time) horizon of the motion (borrowing the term from receding horizon control methods). In this work, we extend the SQP in GOMP to include jerk limits and minimization to create J-GOMP, a jerk-limited motion planner. J-GOMP produces jerk-limited motion plans but at a substantially increased compute time.
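The discretization can be made concrete: with waypoints separated by *t*_{step}, velocity, acceleration, and jerk are finite differences of the waypoint sequence, and the dynamic limits become per-step bounds. A minimal single-joint feasibility check under this convention (illustrative limits only, not the SQP itself):

```python
import numpy as np

def dynamics_feasible(q, t_step, v_max, a_max, j_max):
    """Check a discretized single-joint trajectory of H+1 waypoints against
    velocity, acceleration, and jerk limits via finite differences."""
    v = np.diff(q, 1) / t_step          # velocity between consecutive waypoints
    a = np.diff(v, 1) / t_step          # acceleration
    j = np.diff(a, 1) / t_step          # jerk
    return bool(np.all(np.abs(v) <= v_max)
                and np.all(np.abs(a) <= a_max)
                and np.all(np.abs(j) <= j_max))
```

In the SQP, these same finite-difference expressions appear as linear constraints on the stacked waypoint variables rather than as a post hoc check.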

To address the slow computation, we train a deep neural network to approximate J-GOMP. Because the network approximates J-GOMP, we use J-GOMP to generate a training dataset consisting of trajectories for random pick and place points likely to be encountered at runtime (e.g., from a location in a picking bin to a location in a placement bin). With GPU (graphics processing unit)–based acceleration, the network can compute approximate trajectories in milliseconds. However, the network cannot guarantee that the trajectories it generates will be kinematically or dynamically feasible or avoid obstacles.

To fix the trajectories generated by the network, we propose using the network’s trajectory to warm start the SQP from J-GOMP. The warm start allows the SQP to start from a trajectory much closer to the final solution and thus allows it to converge to an optimal solution quickly. Because the SQP enforces the pick, place, kinematic, dynamic, and obstacle constraints, the resulting trajectory is valid.

We tested DJ-GOMP on a physical UR5 robot (*4*) fitted with a Robotiq 2F-85 (*5*) parallel gripper. In the experiment setup (see Fig. 4), the robot must move objects from one fixed bin location to another. We set DJ-GOMP to be constrained according to the specified joint configuration and velocity limits of the UR5. We derived an acceleration limit based on the UR5’s documented torque and payload capacity, and we limited the jerk to a multiple of the computed acceleration limit. In practice, we surmise that an operator would define jerk limits by taking into account the desired service life of the robot.

To generate train/test data for the deep neural network, we used all 80 hardware threads of an NVIDIA DGX-1 to compute 100,000 optimized input and trajectory pairs, where the trajectory **x*** is the discretized optimal solution. The J-GOMP optimizer is written in C++ and uses the Operator Splitting solver for Quadratic Programs (OSQP) (*6*) as the underlying QP solver. The inputs it generates consist of random pick (**t**_{0}) and place (**t**_{H}) translations drawn uniformly from the pick and place physical space. For each generated translation, we also generated a top-down rotation angle (θ).

We trained the deep network with the Adadelta (*7*) optimizer for 50 epochs after initializing the weights using a He uniform initializer (*8*). The network architecture and optimization framework were written in Python using PyTorch. All training and deep network computations were accelerated by GPUs on NVIDIA DGX-1’s Tesla V100 SXM2 GPU and Intel Xeon E5-2698 v4 central processing units (CPUs).

To evaluate the ability of the deep-learning approach of DJ-GOMP to speed up motion planning, we computed 1000 random motion plans both without and with deep learning–based warm start and plot the results in Fig. 5. The median compute time without deep learning is 29.0 s. Using a network to estimate the optimal time horizon, but not the trajectory, can speed up computation significantly but at a cost of increased failure rate. Using the network to both predict the time horizon and the warm-start trajectory results in a median with deep learning of 80 ms; when compared to J-GOMP, this shows two orders of magnitude improvement, an approximate 300× speedup.

To evaluate the effect on the optimality of the computed trajectories, we compared the sum-of-squared jerks between trajectories generated with the full SQP versus those generated with a warm-started prediction with the optimal horizon. We observe that more than 99% of the test trajectories are within 10^{−3} of each other, which is an error value that is within the tolerance bounds we set for the QP optimizer. For a small fraction (less than 1%), we observe that the warm-started optimization and the full optimization find different local minima, without clear benefit to either optimization.

Because the optimality of the trajectory and the failure rate depend on accurately predicting the optimal time horizon of a trajectory, we separately evaluated this prediction. We observe that shorter values of the horizon lead inevitably to SQP failures, whereas longer values lead to suboptimal trajectories. Because failures are likely to be more problematic than slightly slower trajectories, we propose a simple heuristic of predicting longer horizons. When the network predicts a horizon longer than the optimal, we observe that the optimization of trajectories with a suboptimal horizon can be faster than that of the optimal horizon (shown in Fig. 5B). This is likely due to the suboptimal trajectory being less constrained and thus faster to converge. In practice, we propose that using a readily available multicore CPU to simultaneously compute multiple SQPs for different horizons around the estimated horizon would be a practical way to address the failures and suboptimal trajectories. However, if constrained to a single-core computation, using a longer horizon may also be practical because the compute time saved may outweigh the additional motion time incurred by the longer horizon.
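The multi-horizon heuristic can be sketched as follows, with a stand-in solver that marks horizons below a hypothetical threshold as infeasible (the real system would run the jerk-limited SQP for each candidate):

```python
from concurrent.futures import ThreadPoolExecutor

def solve_sqp(horizon):
    # Stand-in for the jerk-limited SQP: pretend horizons below 48 steps
    # are infeasible (hypothetical threshold for illustration).
    return horizon, horizon >= 48

def plan_with_margin(predicted_h, spread=2):
    """Solve the SQP for several horizons around the network's prediction in
    parallel and keep the shortest feasible one, trading extra compute for
    robustness against an under-predicted horizon."""
    candidates = range(predicted_h - spread, predicted_h + spread + 1)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(solve_sqp, candidates))
    feasible = [h for h, ok in results if ok]
    return min(feasible) if feasible else None
```

With a multicore CPU, the extra candidates cost little wall-clock time, and an under-prediction by the network no longer causes an outright planning failure.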

To evaluate the effect on failure rate, we recorded the number of failures with both cold-started and warm-started optimization with the optimal horizon (observing that predicting a short horizon is the other source of failures). Cold-started optimizations fail 10.7% of the time, whereas warm-started optimizations fail 5.7% of the time. These failures occur because the optimizer cannot move the trajectory into a feasible region due to the tight constraints. In experiments, the failure rate went down with additional training data and longer network training, suggesting that further improvement is possible.

We compare compute time and motion time performance to PRM* (*9*, *10*) and TrajOpt (*3*). For PRM*, we precompute graphs of 10,000, 100,000, and 1,000,000 vertices over the workspace in front of the robot. Because PRM* is an asymptotically optimal motion planner, graphs with more vertices should produce shorter paths, at the expense of longer graph search time. For TrajOpt, we configure the optimization parameters to match those of DJ-GOMP, observing that this improves success rate over the default. Straight-line initialization in TrajOpt fails in this environment due to the bin wall between the start and end configurations; whereas DJ-GOMP’s specialized obstacle model moves the trajectory out of collision, TrajOpt’s obstacle model results in linearizations that do not push the trajectory out of collision. We thus initialize TrajOpt with a trajectory above the obstacles in the workspace. Because neither PRM* nor TrajOpt directly produces time-parameterized trajectories, we use Kunz *et al.*’s method (*11*) to compute a time-optimal time parameterization. This time parameterization method first “rounds corners” by adding smooth rounded segments to connect the piecewise linear motion plan from PRM* before computing the optimal timing for each waypoint. Without the rounded corners, the robot would have to stop between each linear segment of the motion plan to avoid an instantaneous infinite acceleration. The radius of the corner rounding is tunable; however, rounding corners too much can result in a motion plan that collides with obstacles. This time parameterization also does not minimize or limit jerk and thus produces high-jerk trajectories with peaks in the range 5 × 10^{5} to 8 × 10^{5} rad/s^{3} (Fig. 6A), meaning that they should have an advantage in motion time over jerk-limited motions (Fig. 7). As a final step, because 180°-rotated parallel-jaw grasps are equivalent, we compute trajectories for each pick and place combination and select the fastest motion.
The results for 1000 pick-place pairs are shown in Fig. 6. We observe that PRM* has consistently fast compute times but produces the slowest trajectories. TrajOpt is slower to compute but produces faster trajectories than PRM*. DJ-GOMP, because it directly optimizes for a time-optimal path, produces the fastest motions, whereas the deep-learning horizon prediction and warm start allow it to compute quickly despite complex constraints, resulting in the overall fastest combined compute and motion time.

To evaluate whether motion plans that DJ-GOMP generates work on a physical robot, we have a UR5 follow trajectories that DJ-GOMP generates. An example motion is shown in Fig. 4. The UR5 controller does not allow the robot to exceed joint limits and issues an automated emergency stop when it does. The trajectories that DJ-GOMP generates are constrained to the documented limits and thus do not cause the stop. However, we have observed that, without jerk limits, a high-jerk trajectory can cause the UR5 to overshoot its target and bounce back. With DJ-GOMP’s jerk-limited trajectories, the UR5 empirically does not overshoot.

Experiments suggest that warm starting the J-GOMP optimizing motion planner with an approximation from deep learning can speed up motion planning with J-GOMP by two orders of magnitude, over 300×, and compute time-optimized jerk-limited trajectories with an 80-ms median compute time. The time optimization has the potential to increase picks per hour, an important metric in warehouse packing operations, whereas the jerk limits can reduce wear on robots, leading to longer lifespans and reduced downtime.

The design and training of the deep network that approximates the trajectories of J-GOMP have an important impact on the performance of the overall motion planning system. Trajectories that are closer to the J-GOMP solution will take fewer optimization iterations, whereas trajectories further from the solution will take more. In the method we propose, we use a deep network to predict both the optimal time horizon of the trajectory and the full trajectory for any horizon. In ablation studies, we tried a policy-style network that predicts an action to take based on the current state and the goal state. By feeding each state back into the network, it computes a sequence of states that form a trajectory. This network produced less stable results and took longer to produce a trajectory suitable for warm starting the optimization. Although an 80-ms median compute time may be sufficient for many applications, further improvement may be possible with a different design.

The choice of loss function used in the training strongly influences the warm-start speed. Although a mean squared error (MSE) loss, because it measures the difference between training data and the network’s output, should be sufficient if reduced to 0, we propose using a loss that is a weighted combination of MSE and a loss that encourages the network to produce dynamically feasible motions. Because the dynamics loss is consistent with the generated trajectories, using it did not significantly affect the reported MSE test loss but did result in trajectories that were smoother. The resulting smoothed trajectories were closer to a consistent solution and resulted in the optimizer requiring fewer SQP iterations to complete.
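One way to express such a combined loss, shown here in NumPy with a finite-difference jerk penalty standing in for the dynamics term (the weight α and the exact form are illustrative assumptions, not the trained system’s implementation):

```python
import numpy as np

def trajectory_loss(pred, target, t_step, alpha=0.9):
    """Weighted combination of MSE against the training trajectory and a
    finite-difference jerk penalty that nudges predictions toward smoother,
    more dynamically plausible trajectories."""
    mse = np.mean((pred - target) ** 2)
    jerk = np.diff(pred, 3, axis=0) / t_step**3     # third difference ~ jerk
    return alpha * mse + (1.0 - alpha) * np.mean(jerk ** 2)
```

Because the training trajectories are themselves jerk-minimized, the penalty is near zero on perfect predictions and mainly suppresses the high-frequency wobble in imperfect ones, which is what lets the warm-started SQP converge in fewer iterations.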

Training the network also benefits from a J-GOMP implementation and dataset that approach a continuous function. Experimentally, we found that discontinuities made training the network difficult. To encourage continuity, we took the following steps: (i) the optimization varies smoothly around obstacles by performing continuous collision detection based on the spline between waypoints, (ii) the cold-started optimization starts from a deterministic and smoothly varying interpolation, and (iii) the training dataset includes optimal trajectories computed with suboptimal horizons. We also observe that, for a given start-goal pair, there can be multiple minimum-time trajectories due to the discretization of time. By minimizing jerk as well, J-GOMP provides a consistent mechanism for selecting a trajectory to learn.

In continuous operation, a system will produce trajectories that can be used to train the deep network. When running the experiments, we found that more training data improved the predictions of the network. We hypothesize that we did not reach the limit of improvement and that continuous operation would provide a method by which additional training data can be generated. An additional benefit may come from such a feedback system. The initial training dataset that we propose is drawn from a uniform random distribution over two volumes—the pick bin and place bin (Fig. 4). In practice, the distribution of trajectories is likely to be nonuniform, e.g., based on how objects form piles in each bin. Hence, the initial training distribution will likely be out of distribution with respect to the system during operation, and other precomputation strategies (*12*) may produce better initial results. By leveraging the data from repeated operation, the system should continue to gain data from which it can learn and thus produce better trajectories that will speed up the SQP computation.

We propose a system for speeding up motion planning and execution time and experimented on a UR5 robot in a pick-and-place operation. The kinematic design of this robot has favorable properties for this application and motion planning algorithm. The robot has two joints that can lift the end effector up from any configuration—with the depth map as the obstacle, this means that there will always be an obstacle-free trajectory, provided that a sufficient number of waypoints are allocated to the trajectory to make the traversal. In addition, because of its 6-DOF (degrees of freedom) design, for any end-effector pose there exist eight analytic inverse kinematic solutions (*13*), allowing for rapid computation of multiple initial and final configurations to seed the optimization process.

Application to robots with additional degrees of freedom would not only result in more inverse kinematic solutions but also give the robot more options (in the form of different configurations) to avoid obstacles. In these cases, changes in the initial trajectory seeded to the optimization can result in the robot converging on a different homotopic path. For example, with a different obstacle environment, one seed might lead to the arm going above an obstacle, whereas a different seed might lead the arm to the side of the obstacle. We hypothesize that this could be addressed in the proposed system by having a consistent method for seeding a trajectory—one that results in a smooth function for the deep network to approximate.

Applications to other environments would require an additional data generation and training step specific to the new condition. In the experiments, we generated training and test datasets from the same distribution. If the test dataset were to come from a different (or held out) distribution, then the resulting covariate shift would decrease performance. In practice, however, we would generate training data from the new distribution.

The deep learning–based warm start of the optimization used by DJ-GOMP may also help speed up other optimizing motion planners, such as TrajOpt (*3*), Covariant Hamiltonian Optimization for Motion Planning (CHOMP) (*14*), Stochastic Trajectory Optimization for Motion Planning (STOMP) (*15*), and Incremental Trajectory Optimization for Motion Planning (ITOMP) (*16*), as well as planners based on interior-point optimization (*17*) and gradients (*18*). Many of these planners already compute solutions quickly, although with increased constraints, more complex obstacle environments, or additional waypoints in the discretization, they may slow down to the point where they become impractical to use without something like the deep learning–based warm start proposed in DJ-GOMP.

In this paper, we explore speeding up the computation of jerk-limited motions for the pick-and-place task from GOMP, in which both pick and place frames have an additional degree of freedom. The degree of freedom comes from the four degree–of–freedom representation commonly used by grasp analysis approaches such as Dexterity Network (Dex-Net) (*1*, *19*–*21*), Generative Grasping Convolutional Neural Network (GG-CNN) (*22*), Grasp Pose Detection (GPD) (*23*), and Fully Convolutional GQ-CNN (FC-GQ-CNN) (*24*). These data-driven methods often represent grasps using a center axis (*1*) or rectangle formulation (*25*) in the image plane (e.g., from a depth camera), which results in 4 DOF (a three-dimensional translation plus a rotation about the camera *z* axis). Although we use FC-GQ-CNN (*24*) in experiments, we propose that many grasp analysis algorithms could be incorporated into the computation and learning process. However, depending on the grasp analysis software and gripper, modifications to the network design may be necessary. For example, recent work exploring additional degrees of freedom for grasps (*26*–*29*) and showing that top-down grasps leave out many high-quality grasps on many objects (*30*) may require an alternate formulation of the input to the network used for predicting the warm-start trajectory.

In future work, DJ-GOMP could be integrated with a grasp planner to optimize among multiple grasp configurations. Whether the grasp analysis method is from the first wave of grasping research based on analytic algorithms with physical models of contact dynamics and known geometry (*31*–*34*), the second wave of research based on data-driven learning and neural networks (*25*, *35*–*41*), or the recent wave of research combining the two (*1*), many grasp analysis methods often have the ability to generate multiple ranked candidate grasps. With multiple forward passes using the DJ-GOMP network, grasp candidates from these methods could be rapidly weighted on the basis of their potential execution speed. This would allow the combination of grasp analysis software and DJ-GOMP to rapidly determine which grasp of multiple candidates leads to the fastest motion or to perform a cost-benefit analysis—for example, trading off some reliability of the grasp for speed of motion.
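The cost-benefit selection among ranked grasp candidates can be sketched as a scalar trade-off. This is an illustrative sketch, not the paper's implementation: the candidate list, the `quality` and `time` fields, and the weight `lam` are hypothetical, and in the proposed system, the predicted durations would come from forward passes of the DJ-GOMP network.

```python
def best_grasp(candidates, lam=0.5):
    """Pick the grasp maximizing grasp quality minus a penalty on
    predicted trajectory duration: score = quality - lam * time."""
    return max(candidates, key=lambda c: c["quality"] - lam * c["time"])

# Hypothetical ranked candidates: quality in [0, 1] from a grasp
# analysis method, predicted trajectory duration in seconds.
candidates = [
    {"name": "g1", "quality": 0.95, "time": 1.8},
    {"name": "g2", "quality": 0.90, "time": 1.0},
    {"name": "g3", "quality": 0.70, "time": 0.7},
]
choice = best_grasp(candidates)  # trades grasp reliability for speed
```

Raising `lam` favors faster motions; setting it to 0 recovers the grasp planner's original ranking.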

We propose and experiment with an optimizing motion planning method in the context of a repeated pick-and-place scenario, in which the optimization is slowed down because of constraints on dynamics, obstacle avoidance, and degrees of freedom at pick and place points. We envision that other scenarios may also benefit from the proposed approach, including applications in manufacturing, assembly, painting, welding, inspection, robot-assisted surgery, construction, farming, and recycling. In each of these scenarios, the constraints in the optimization would need to adapt to the task, and the inputs to the system would also vary accordingly.

In future work, we will explore expanding DJ-GOMP to additional robots performing more varied tasks that would include increased variation of start and goal configurations and in more complex environments. We will also explore additional deep-learning approaches to find better approximations of the optimization process and thus allow for faster warm starting of the final optimization step of DJ-GOMP. For systems without access to a GPU or other neural network accelerator, it may be fruitful to explore other routes to compute a warm-start trajectory, e.g., different/smaller network design, or a nearest trajectory from the training dataset (*42*). There may be potential for using a deep learning–based warm start to speed up constrained optimizations in other fields of robotics, e.g., grasp contact models (*43*), task planning (*44*, *45*), and model predictive control (*46*, *47*)—potentially allowing such algorithms to run at interactive rates and enabling new applications.

This section describes the methods in DJ-GOMP. Underlying DJ-GOMP is a jerk- and time-optimizing constrained motion planner based on an SQP. Because of the complexity of solving this SQP, computation time can far exceed the trajectory’s execution time. DJ-GOMP uses this SQP on a random set of pick-and-place inputs to generate training data (trajectories) to train a neural network. During pick-and-place operation, DJ-GOMP uses the neural network to compute an approximate trajectory for the given pick and place frames, which it then uses to warm start the SQP.

To generate a jerk- and time-optimized trajectory, DJ-GOMP extends the SQP formulated in GOMP (*2*). The solver for this SQP, following the method in TrajOpt (*3*) and summarized in Algorithm 1, starts with a discretized estimate of the trajectory τ as a sequence of *H* waypoints after the starting configuration, in which each waypoint represents the robot’s configuration **q**, velocity **v**, acceleration **a**, and jerk **j** at a moment in time. The waypoints are separated by *t*_{step} seconds. This discretization is collected into **x**^{(0)}, where the superscript represents a refinement iteration. Thus

**x**^{(0)} = (**x**_{0}^{(0)}, **x**_{1}^{(0)}, …, **x**_{H}^{(0)}), where **x**_{t} = (**q**_{t}, **v**_{t}, **a**_{t}, **j**_{t})

The choice of *H* and *t*_{step} is application specific, although in physical experiments, we set *t*_{step} to match (an integer multiple of) the control frequency of the robot, and we set *H* such that *H* · *t*_{step} is an estimate of the upper bound of the minimum trajectory time for the workspace and task input distribution.
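The bookkeeping described above can be sketched as follows; the control frequency, waypoint multiple, and time estimate are illustrative assumptions rather than the values used in the experiments.

```python
import math

def discretize(control_hz, multiple, t_max_est):
    """Choose t_step as an integer multiple of the robot's control
    period, then pick H so that H * t_step covers the estimated upper
    bound t_max_est (seconds) of the minimum trajectory time."""
    t_step = multiple / control_hz        # seconds between waypoints
    H = math.ceil(t_max_est / t_step)     # waypoints after the start
    return t_step, H

# Hypothetical setup: a 125-Hz controller, a waypoint every 4 control
# cycles, and a 1.5-s estimated upper bound on minimum trajectory time.
t_step, H = discretize(125.0, 4, 1.5)    # t_step = 0.032 s, H = 47
```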

The initial value of **x**^{(0)} seeds (or warm starts) the SQP computation. Without the approximation generated by the neural network (e.g., for training data set generation), this trajectory can be initialized to all zeros. In practice, the SQP can converge faster by first computing a trajectory between inverse kinematic solutions to **g**_{0} and **g*** _{H}* with only the convex kinematic and dynamic constraints (described below).

In each iteration *k* = 0, 1, 2, …, *m* of the SQP, DJ-GOMP linearizes the nonconvex constraints of obstacles and pick-and-place locations and solves a QP of the following form:

**x**^{(k+1)} = argmin_{**x**} ½ **x**^{T}**Px** + **p**^{T}**x**, subject to **Ax** ≤ **b**

Here, **A** defines constraints enforcing the trust region, joint limits, and dynamics, and **P** is defined such that **x**^{T}**Px** is a sum of squared jerks. To enforce the linearized nonconvex constraints, DJ-GOMP adds constrained nonnegative slack variables penalized using appropriate coefficients in **p**. As DJ-GOMP iterates over the SQP, it increases the penalty term exponentially, terminating on the iteration *m* at which **x**^{(m)} meets the nonconvex constraints.

**Algorithm 1:** Jerk-limited Motion Plan

**Require:** **x**^{(0)} is an initial guess of the trajectory, *H* + 1 is the number of waypoints in **x**^{(0)}, *t*_{step} is the time between each waypoint, **g**_{0} and **g**_{H} are the pick and place frames, β_{shrink} ∈ (0, 1) and β_{grow} > 1 scale the trust region, and γ > 1 grows the penalty multiple

1: μ ← initial penalty multiple

2: ϵ_{trust} ← initial trust region size

3: *k* ← 0

4: **P**, **p**, **A**, **b** ← linearize **x**^{(0)} as a QP

5: **while** μ < μ_{max} **do**

6: **x**^{(k+1)} ← argmin_{**x**} ½ **x**^{T}**Px** + **p**^{T}**x**, subject to **Ax** ≤ **b** /* solve QP */

7: **if** sufficient decrease in trajectory cost **then**

8: *k* ← *k* + 1 /*accept trajectory */

9: ϵ_{trust} ← ϵ_{trust}β_{grow} /* grow trust region */

10: **A**, **b** ← update linearization using **x**^{(k)}

11: **else**

12: ϵ_{trust} ← ϵ_{trust}β_{shrink} /* shrink trust region */

13: **b** ← update trust region bounds only

14: **if** ϵ_{trust} < ϵ_{min_trust} **then**

15: μ ← γμ /* increase penalty */

16: ϵ_{trust} ← initial trust region size

17: **p** ← update penalty in QP to match μ

18: **return x**^{(k)}

To enforce joint limits and dynamic constraints, Algorithm 1 creates a matrix **A** and a vector **b** that enforce the following linear inequalities on joint limits

**q**_{min} ≤ **q**_{t} ≤ **q**_{max}, **v**_{min} ≤ **v**_{t} ≤ **v**_{max}, **a**_{min} ≤ **a**_{t} ≤ **a**_{max}, **j**_{min} ≤ **j**_{t} ≤ **j**_{max}

and the following equalities that enforce the dynamic constraints between waypoints

**q**_{t+1} = **q**_{t} + *t*_{step}**v**_{t} + ½ *t*_{step}^{2}**a**_{t} + ⅙ *t*_{step}^{3}**j**_{t}

**v**_{t+1} = **v**_{t} + *t*_{step}**a**_{t} + ½ *t*_{step}^{2}**j**_{t}

**a**_{t+1} = **a**_{t} + *t*_{step}**j**_{t}
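The constant-jerk dynamics that link consecutive waypoints and the box-style limit checks can be sketched as follows (single joint; the time step and numeric limits are illustrative assumptions):

```python
import numpy as np

def integrate_waypoint(q, v, a, j, dt):
    """Propagate one waypoint under constant jerk j for dt seconds,
    matching the equality constraints tying waypoint t to t + 1."""
    q1 = q + v * dt + 0.5 * a * dt**2 + (1.0 / 6.0) * j * dt**3
    v1 = v + a * dt + 0.5 * j * dt**2
    a1 = a + j * dt
    return q1, v1, a1

def within_limits(v, a, j, v_max, a_max, j_max):
    """Box inequalities on velocity, acceleration, and jerk."""
    return (np.all(np.abs(v) <= v_max) and np.all(np.abs(a) <= a_max)
            and np.all(np.abs(j) <= j_max))

# Apply a constant 5 rad/s^3 jerk from rest for 10 steps of 32 ms.
dt, jerk = 0.032, 5.0
q, v, a = 0.0, 0.0, 0.0
for _ in range(10):
    q, v, a = integrate_waypoint(q, v, a, jerk, dt)

ok = within_limits(v, a, jerk, v_max=1.0, a_max=2.0, j_max=10.0)
```

Because the update is exact for cubic (constant-jerk) motion, the discrete integration reproduces the continuous-time solution at *t* = 0.32 s.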

In addition, Algorithm 1 linearizes nonconvex constraints by adding slack variables to implement *L*_{1} penalties. Thus, for a nonconvex constraint *g*_{j}(**x**) ≤ *c*_{j}, the QP includes the linearization of *g*_{j} with a nonnegative slack variable λ_{j} that allows the constraint to be violated and a penalty term μλ_{j} in the objective that drives the violation toward zero as the penalty multiple μ grows.

In the QP, obstacle avoidance constraints are linearized on the basis of the waypoints of the current iteration’s trajectory (Algorithm 2). To compute these constraints, the algorithm evaluates the spline between each pair of waypoints (**x**_{t}, **x**_{t+1}) to find the configuration closest to, or deepest within, an obstacle and linearizes the signed distance to the obstacle at that configuration.

We observe that linearization at each waypoint will safely avoid obstacles given a sufficient buffer (e.g., via a Minkowski difference with obstacles); however, slight variations in pick or place frames can shift the alignment of waypoints to obstacles. This shift of alignment (see Fig. 8C) can lead to solutions that vary disproportionately to small changes in input. Although this may be acceptable in operation, it can produce training data that are difficult for a neural network to learn.

**Algorithm 2:** Linearize Obstacle-Avoidance Constraint

1: **for** *t* ∈ [0, *H*) **do**

2: (**n**_{min}, *d*_{min}) ← linearize obstacle nearest to *p*(**q*** _{t}*)

3: **q**_{min} ← **q**_{t}

4: **for all** *s* ∈ [0, *t*_{step}) **do** /* line search *s* to desired resolution */

5: **q**_{s} ← spline between (**q**_{t}, **q**_{t+1}) evaluated at *s*

6: (**n**_{s}, *d*_{s}) ← linearize obstacle nearest to *p*(**q**_{s})

7: **if** *d*_{s} < *d*_{min} **then** /* compare signed distance */

8: (**n**_{min}, *d*_{min}, **q**_{min}) ← (**n**_{s}, *d*_{s}, **q**_{s})

9: **J**_{q} ← Jacobian of spline at **q**_{min} with respect to waypoint *t*

10: **J**_{p} ← Jacobian of position at **q**_{min}

11: **ĝ** ← **n**_{min}^{T}**J**_{p}**J**_{q} /* gradient of signed distance */

12: *ĉ* ← *d*_{min} − **ĝx**_{t}

13: Add **ĝx**_{t} + *ĉ* ≥ 0 to the QP as a linearized (slack-penalized) constraint
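A minimal sketch of the inner line search (lines 4 to 8), substituting a point robot in the plane for the forward kinematics, a single circular obstacle for the signed-distance computation, and linear interpolation for the spline; all three are simplifying assumptions.

```python
import numpy as np

def signed_distance(p, center, radius):
    """Signed distance from point p to a circular obstacle
    (negative when p is inside the obstacle)."""
    return np.linalg.norm(p - center) - radius

def deepest_point_on_segment(q0, q1, center, radius, resolution=100):
    """Search along the (here: linear) interpolation between waypoints
    q0 and q1 for the configuration closest to, or deepest inside, the
    obstacle, mirroring the line search over s in Algorithm 2."""
    best_q, best_d = q0, signed_distance(q0, center, radius)
    for s in np.linspace(0.0, 1.0, resolution + 1):
        q_s = (1.0 - s) * q0 + s * q1
        d_s = signed_distance(q_s, center, radius)
        if d_s < best_d:          # compare signed distance
            best_q, best_d = q_s, d_s
    return best_q, best_d

# Segment passing through a unit circle centered at the origin.
q_min, d_min = deepest_point_on_segment(
    np.array([-2.0, 0.5]), np.array([2.0, 0.5]),
    center=np.array([0.0, 0.0]), radius=1.0)
```

In DJ-GOMP itself, the interpolant is the spline between waypoints and distances are computed against the depth-map obstacle, but the search structure is the same.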

As with GOMP, DJ-GOMP allows degrees of freedom in rotation and translation to be added to start and goal grasp frames. Adding this degree of freedom allows DJ-GOMP to take a potentially shorter path when an exact pose of the end effector is unnecessary. For example, a pick point with a parallel-jaw gripper can rotate about the axis defined by antipodal contact points (see Fig. 2), and a pick point with a suction gripper can rotate about the normal of its contact plane. Similarly, a task may allow for a place point anywhere within a bounded box. The degrees of freedom about the pick point can be optionally added as constraints that are linearized as **w**_{min} ≤ **w** ≤ **w**_{max}, where **w** is the twist between the end-effector frame at the first waypoint and the grasp frame **g**_{0}, and **w**_{min} ≤ **w**_{max} bounds the twist allowed about the pick point. Applying a similar set of constraints to **g**_{H} allows degrees of freedom in the place frame as well.

The SQP establishes trust regions to constrain the optimized trajectory to be within a box with extents defined by a shrinking trust region size. Because convex constraints on dynamics enforce the relationship between configuration, velocity, and acceleration of each waypoint, we observe that trust regions only need to be defined as box bounds around one of the three for each waypoint. In experiments, we established trust regions on configurations.

**Algorithm 3:** Time-optimal Motion Plan

**Require**: **g**_{0} and **g*** _{H}* are the start and end frames, γ > 1 is the search bisection ratio

1: *H*_{upper} ← fixed or estimated upper limit of maximum time

2: *H*_{lower} ← 3

3: *v*_{upper} ← ∞ /* constraint violation */

4: **while** *v*_{upper}> tolerance **do** /* find upper limit */

5: (**x**_{upper}, *v*_{upper}) ← call Alg. 1 with cold-start trajectory for *H*_{upper}

6: *H*_{upper} ← max(*H*_{upper} + 1, ⌈γ *H*_{upper}⌉)

7: **while** *H*_{lower} < *H*_{upper} **do** /* search for shortest *H* */

8: *H*_{mid} ← *H*_{lower} + ⌊(*H*_{upper} − *H*_{lower})/γ⌋

9: (**x**_{mid}, *v*_{mid}) ← call Alg. 1 with warm-start trajectory **x**_{upper} interpolated to *H*_{mid}

10: **if** *v*_{mid} ≤ tolerance **then**

11: (*H*_{upper}, **x**_{upper}, *v*_{upper}) ← (*H*_{mid}, **x**_{mid}, *v*_{mid})

12: **else**

13: *H*_{lower} ← *H*_{mid} + 1

14: **return x**_{upper}

To find the minimum-time trajectory, J-GOMP searches for the shortest jerk-minimized trajectory that satisfies all constraints. This search, shown in Algorithm 3, starts with a guess of *H* and then performs an exponential search for the upper bound, followed by a binary search for the shortest *H*, repeatedly solving the SQP of Algorithm 1. The binary search warm starts each SQP with an interpolation of the trajectory of the current upper bound of *H*. The search ends when the upper and lower bounds of *H* are the same.
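The two-phase search of Algorithm 3 can be sketched with the SQP replaced by a hypothetical feasibility oracle (a horizon is "feasible" when Algorithm 1's constraint violation is within tolerance); here γ = 2, and the oracle's threshold of 37 is an arbitrary stand-in for the unknown minimum horizon.

```python
import math

def min_horizon(feasible, H_guess, gamma=2.0):
    """Exponential search for a feasible upper bound on the horizon,
    then a bisection-style search for the shortest feasible H.
    `feasible(H)` stands in for running Algorithm 1 and checking
    that constraint violations are within tolerance."""
    H_upper = H_guess
    while not feasible(H_upper):              # find upper limit
        H_upper = max(H_upper + 1, math.ceil(gamma * H_upper))
    H_lower = 3
    while H_lower < H_upper:                  # search for shortest H
        H_mid = H_lower + (H_upper - H_lower) // int(gamma)
        if feasible(H_mid):
            H_upper = H_mid
        else:
            H_lower = H_mid + 1
    return H_upper

# Toy problem whose true minimum feasible horizon is 37.
H_star = min_horizon(lambda H: H >= 37, H_guess=8)  # finds 37
```

In the real algorithm, each `feasible` query is an SQP solve warm started from the interpolated trajectory of the current upper bound, so the binary search amortizes well.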

To speed up motion planning, we add a deep neural network to the pipeline. This neural network treats the trajectory optimization process as a function *f*_{τ} to approximate, mapping a pair of pick and place frames to a time-optimized trajectory of *H** waypoints, each of which has a configuration, velocity, acceleration, and jerk for all *n* joints of the robot. We assume that the neural network can only approximate *f*_{τ}, thus computing *f*_{τ} plus an error term *E*(τ). Hence, the output of the network is an approximate trajectory that a subsequent pass of the SQP refines into one that satisfies all constraints.

In this section, we describe a proposed neural network architecture, its loss function, training and testing dataset generation, and the training process. Although we posit that a more general approximation could include the amount of pick and place degrees of freedom as inputs, for brevity, we assume that *f*_{τ} and its neural network approximation are parameterized by a preset amount of pick and place degrees of freedom. In practice, it may also be appropriate to train multiple neural networks for different parameterizations of *f*_{τ}.

The deep neural network architecture we propose is depicted in Fig. 3. It consists of an input layer connected through four fully connected blocks to multiple output blocks. The input layer takes in the concatenated pick and place grasp frames **g**_{0} and **g**_{H}. Because the optimal horizon *H** can vary, the network has a separate output head for each of the possible values of *H**. To select which of the outputs to use, we use a separate classification network with two fully connected layers and a one-hot encoding trained using a cross-entropy loss.

We refer to the horizon classification and multiple-output network as a HYDRA (Horizon Yielding Distillation through Retained Activations) network. The network yields both an optimal horizon length and the trajectory for that horizon. To train this network (detailed below), the trajectory output layers’ activation values for horizons not in the training sample are retained with a zero gradient, so that they do not contribute to the weight updates of the shared input layers during backpropagation.

In experiments, a neural network with a single output head was unable to produce consistent results when predicting horizons of varied length. We conjecture that the discontinuity between trajectories of different horizon lengths made the function difficult to learn. In contrast, a network could accurately learn the function for a single horizon length, but training one network per horizon is computationally and space inefficient, because the networks cannot share information about the underlying function across horizons. This led to the proposed design, in which a single network with multiple output heads shares weights through the common input layers.
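The shared-trunk, multi-head idea can be sketched with random (untrained) weights in numpy; the layer sizes, input encoding, and horizon range here are arbitrary assumptions, not the architecture of Fig. 3.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

n_joints, H_max, in_dim, hidden = 6, 60, 14, 64

# Shared trunk: input layer plus fully connected blocks.
W1 = rng.normal(size=(in_dim, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, hidden)); b2 = np.zeros(hidden)

# One output head per candidate horizon H, each predicting H + 1
# waypoints of (q, v, a, j) for every joint.
heads = {H: rng.normal(size=(hidden, (H + 1) * n_joints * 4))
         for H in range(3, H_max + 1)}

# Separate classification head predicting the optimal horizon H*.
Wc = rng.normal(size=(hidden, H_max - 2))

def forward(x):
    h = relu(relu(x @ W1 + b1) @ W2 + b2)          # shared activations
    H_star = int(np.argmax(h @ Wc)) + 3            # one-hot horizon choice
    traj = (h @ heads[H_star]).reshape(H_star + 1, n_joints, 4)
    return H_star, traj

H_star, traj = forward(rng.normal(size=in_dim))
```

A real implementation would train the trunk, the heads, and the classifier jointly, with the heads for horizons absent from a sample masked out of the loss.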

We propose generating a training dataset by randomly sampling start and end pairs from the likely distribution of tasks. For example, in a warehouse pick-and-place operation, the pick frames will be constrained to a volume defined by the picking bin, and the place frames will be constrained to a volume defined by the placement or packing bin. For each random input, we generate optimized trajectories for all time horizons from *H*_{max} to the optimal *H**. Although this process generates more trajectories than necessary, generating each trajectory is efficient because the optimization for a trajectory of size *H* warm starts from the trajectory of size *H* + 1. Overall, this process is efficient and, with parallelization, can quickly generate a large training dataset.

This process can also help detect whether the analysis of the maximum trajectory duration was incorrect. If all trajectories are significantly shorter than *H*_{max}, then one may reduce the number of output heads. If any trajectory exceeds *H*_{max}, then the number of output heads can be increased.

We also note that in the case where the initial training data do not match the operational distribution of inputs, the neural network may produce substantially suboptimal motions that are kinematically and dynamically infeasible. In this case, the subsequent pass through the optimization (see “Fast pipeline for trajectory generation” section) will fix these errors, although with a longer computation time. We propose, in a manner similar to DAgger (*48*), that trajectories from operation be continually added to the training dataset for subsequent training or refinement of the neural network.

To train the network in a way that encourages matching the reference trajectory while keeping the output trajectory kinematically and dynamically feasible, we propose a multipart loss function ℒ. This loss function includes a weighted sum of MSE losses on the configuration, velocity, acceleration, and jerk components of the trajectory, with the configuration term weighted most heavily (e.g., α_{q} = 10 relative to the weights on the remaining terms), together with losses that penalize violations of the boundary and dynamic constraints.

Because the ground-truth trajectories for horizons shorter than *H** are not defined, we must ensure that some finite data are present so that multiplication by an indicator value of 0 results in 0 (and not NaN). Multiplying by the indicator function in this way results in a zero gradient for the parts of the network that are not included in the trajectory data.
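The masking trick can be sketched as follows; the head and batch shapes are illustrative assumptions. Targets for horizons absent from a training sample are filled with zeros (any finite value works), and the indicator mask zeroes both their loss and their gradient.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """MSE over output heads with a per-head indicator mask in {0, 1}.
    Undefined targets must hold finite placeholder values (e.g., 0) so
    that 0 * placeholder = 0 rather than NaN."""
    per_head = np.mean((pred - target) ** 2, axis=1)  # one value per head
    return np.sum(mask * per_head) / np.sum(mask)

pred = np.array([[1.0, 2.0], [3.0, 5.0], [7.0, 7.0]])
target = np.array([[1.0, 2.0], [3.0, 3.0], [0.0, 0.0]])  # last undefined
mask = np.array([1.0, 1.0, 0.0])  # third head not in this sample
loss = masked_mse(pred, target, mask)
```

Because the mask multiplies the per-head loss, backpropagation through a masked head yields exactly zero gradient, matching the retained-activation scheme described above.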

During training, we observed that the network would often exhibit coadaptation, learning either the configurations or their derivatives, but not both. To address this, we used dropout (*49*) with dropout probability *p*_{drop} = 0.5 between each fully connected layer, as shown in Fig. 3. We annealed (*50*) *p*_{drop} to 0 over the course of the training to reduce the loss further.

The end goal of this proposed motion planning pipeline is to generate feasible, time-optimized trajectories quickly. The SQP computes feasible, time-optimized trajectories but is slow when starting from scratch. The HYDRA neural network computes trajectories quickly (e.g., a forward pass on the network in the results section requires ∼1 ms) but without guarantees on correctness. In this section, we propose combining the properties of the SQP and HYDRA into a pipeline (see Fig. 9) that obtains fast computation of correct trajectories by using a forward pass of the neural network to warm start the SQP.

The first step in the pipeline is to compute *H**. This requires a single forward pass through the one-hot classification network. Because predicting a horizon shorter than *H** results in an infeasible SQP, it can be beneficial to either compute multiple SQPs around the predicted horizon or increase the horizon when the difference between the one-hot values of the top candidate horizons is small.
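One way to implement this margin, with assumed names and a hypothetical threshold: when the classifier's top two scores are close, prefer the longer of the two horizons, since underestimating *H** makes the SQP infeasible, whereas overestimating only sacrifices some optimality.

```python
import numpy as np

def choose_horizon(scores, H_min=3, margin=0.1):
    """Pick a horizon from one-hot classifier scores, rounding up to
    the longer of the top two candidates when the call is ambiguous."""
    order = np.argsort(scores)[::-1]       # indices sorted by score, desc
    top, second = order[0], order[1]
    if scores[top] - scores[second] < margin:
        return H_min + max(top, second)    # ambiguous: take longer horizon
    return H_min + int(top)

# Hypothetical scores; indices map to horizons H = 3..7.
scores = np.array([0.01, 0.02, 0.48, 0.45, 0.04])
H = choose_horizon(scores)  # 0.48 vs. 0.45 is ambiguous, so H = 6
```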

The second step in the pipeline is to compute the warm-start trajectory via a forward pass through the output head of the HYDRA network that corresponds to the predicted horizon.

The final step is to compute the trajectory by running the SQP of Algorithm 1 warm started with the network’s predicted trajectory. In this step, the tolerance on constraint satisfaction can be loosened, e.g., requiring the end effector to be within 10^{−3} m of the target frame, instead of the 10^{−6} m used in dataset generation.

We observe that symmetry in grippers, such as found in parallel and multifinger grippers, means that multiple top-down grasps can result in the same contact points [e.g., see Fig. 2 (A and D)]. In this setting, we can use the symmetric grasp frames to compute multiple candidate trajectories and execute the fastest one.

The experimental workspace consists of two bins positioned in front of a UR5 robotic arm fitted with a Robotiq 2F-85 parallel-jaw gripper. DJ-GOMP’s network is trained on inputs in which the gripper picks from the bin in front of it and places in the bin to its right while avoiding the bin wall between the pick and place points. The pick frame allows a degree of freedom in rotation about the grasp axis at the pick point and degrees of freedom in left-right and forward-back translation (but not up-down).

We generate uniform random pick and place points from box volumes bounded by their respective bins and with random top-down rotation of 0° to 180°. For each pick-place pair, we compute a J-GOMP trajectory to all four combinations of their symmetric grasp points. The resulting dataset consists of 100,000 (input, trajectory) pairs × 4 for the symmetric grasps. With this dataset, we train the neural network. In experiments, we use a different set of 1000 random inputs from the same distribution to compare the time to compute an optimized trajectory with and without warm start. We run a subset of these results on the physical robot to spot check that the generated trajectories will run on the UR5.
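The input sampling can be sketched as follows; the bin extents and sample count are placeholder assumptions, while the uniform volume sampling and 0° to 180° top-down rotation mirror the procedure described above.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_frames(n, lo, hi):
    """Uniform random pick (or place) points inside a box volume
    bounded by a bin, each with a random top-down rotation in
    [0, 180) degrees."""
    xyz = rng.uniform(lo, hi, size=(n, 3))            # meters
    theta = rng.uniform(0.0, 180.0, size=(n, 1))      # degrees
    return np.hstack([xyz, theta])

# Hypothetical bin volumes (meters) in the robot frame.
picks = sample_frames(1000, lo=[0.3, -0.2, 0.0], hi=[0.6, 0.2, 0.15])
places = sample_frames(1000, lo=[0.3, 0.3, 0.0], hi=[0.6, 0.7, 0.15])
```

Each sampled pick-place pair would then be run through J-GOMP, once per symmetric grasp combination, to produce the (input, trajectory) training pairs.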

robotics.sciencemag.org/cgi/content/full/5/48/eabd7710/DC1

Movie S1. Example of motions computed by grasp-optimized motion planning with a deep-learning warm start.

Millirobots with magnetically drivable surfaces display a range of locomotive abilities and programmable behavior.

Millirobots that can adapt to unstructured environments, operate in confined spaces, and interact with a diverse range of objects would be desirable for exploration and biomedical applications. The continued development of millirobots, however, requires simple and scalable fabrication techniques. Here, we propose a minimalist approach to construct millirobots by coating inanimate objects with a composited agglutinate magnetic spray. Our approach enables a variety of one-dimensional (1D), 2D, or 3D objects to be covered with a thin magnetically drivable film (~100 to 250 micrometers in thickness). The film is thin enough to preserve the original size, morphology, and structure of the objects while providing actuation of up to hundreds of times its own weight. Under the actuation of a magnetic field, our millirobots are able to demonstrate a range of locomotive abilities: crawling, walking, and rolling. Moreover, we can reprogram and disintegrate the magnetic film on our millirobots on demand. We leverage these abilities to demonstrate biomedical applications, including catheter navigation and drug delivery.