Publications
My contributions to the field of robotics.
- SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics. Lizhi Yang, Blake Werner, Ryan K. Cosner, and 3 more authors. arXiv preprint arXiv:2505.11494, 2025.
Robot learning has produced remarkably effective “black-box” controllers for complex tasks such as dynamic locomotion on humanoids. Yet ensuring dynamic safety, i.e., constraint satisfaction, remains challenging for such policies. Reinforcement learning (RL) embeds constraints heuristically through reward engineering, and adding or modifying constraints requires retraining. Model-based approaches, like control barrier functions (CBFs), enable runtime constraint specification with formal guarantees but require accurate dynamics models. This paper presents SHIELD, a layered safety framework that bridges this gap by: (1) training a generative, stochastic dynamics residual model using real-world data from hardware rollouts of the nominal controller, capturing system behavior and uncertainties; and (2) adding a safety layer on top of the nominal (learned locomotion) controller that leverages this model via a stochastic discrete-time CBF formulation enforcing safety constraints in probability. The result is a minimally-invasive safety layer that can be added to the existing autonomy stack to give probabilistic guarantees of safety that balance risk and performance. In hardware experiments on a Unitree G1 humanoid, SHIELD enables safe navigation (obstacle avoidance) through varied indoor and outdoor environments using a nominal (unknown) RL controller and onboard perception.
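To make the mechanism concrete, here is a minimal deterministic sketch of a discrete-time CBF safety filter posed as a quadratic program; the paper's actual formulation enforces the condition in expectation under a learned stochastic residual model. The single-integrator dynamics, the distance barrier, and all gains below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
import cvxpy as cp

def cbf_filter(x, u_nom, x_obs, r, gamma=0.2, dt=0.02):
    """Minimally modify u_nom so the linearized discrete-time CBF condition
    h(x_{k+1}) >= (1 - gamma) * h(x_k) holds for h(x) = ||x - x_obs|| - r,
    assuming single-integrator dynamics x_{k+1} = x_k + u * dt."""
    d = x - x_obs
    h = np.linalg.norm(d) - r              # barrier value (> 0 means safe)
    grad_h = d / np.linalg.norm(d)         # gradient of h at x
    u = cp.Variable(2)
    # Linearized safety constraint: h + grad_h . (u * dt) >= (1 - gamma) * h
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)),
                      [grad_h @ u * dt >= -gamma * h])
    prob.solve()
    return u.value

u_safe = cbf_filter(x=np.array([0.0, 0.0]), u_nom=np.array([1.0, 0.0]),
                    x_obs=np.array([0.5, 0.0]), r=0.3)
```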
- Bracing for Impact: Robust Humanoid Push Recovery and Locomotion with Reduced Order Models. Lizhi Yang, Blake Werner, Adrian B. Ghansah, and 1 more author. arXiv preprint arXiv:2505.11495, 2025.
Push recovery during locomotion will facilitate the deployment of humanoid robots in human-centered environments. In this paper, we present a unified framework for walking control and push recovery for humanoid robots, leveraging the arms for push recovery while dynamically walking. The key innovation is to use the environment, such as walls, to facilitate push recovery: we combine Single Rigid Body model predictive control (SRB-MPC) with Hybrid Linear Inverted Pendulum (HLIP) dynamics to enable robust locomotion, push detection, and recovery, utilizing the robot’s arms to brace against such walls and dynamically adjusting the desired contact forces and stepping patterns. Extensive simulation results on a humanoid robot demonstrate improved perturbation rejection and tracking performance compared to HLIP alone, with the robot able to recover from pushes of up to 100 N applied for 0.2 s while walking at commanded speeds of up to 0.5 m/s. Robustness is further validated in scenarios with angled walls and multi-directional pushes.
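For intuition, the sketch below shows the stepping core only: the step-to-step dynamics of a linear inverted pendulum with a discrete LQR foot-placement gain. The paper's controller combines this class of model with SRB-MPC and arm bracing; the heights, timings, and weights here are assumed for illustration, and the paper's gains may be chosen differently.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Step-to-step (S2S) dynamics of a Linear Inverted Pendulum:
# state x = [p, v] (COM position/velocity relative to the stance foot)
# sampled just before impact; input u = step length.
g, z0, T = 9.81, 0.6, 0.35              # gravity, COM height, step duration
lam = np.sqrt(g / z0)
c, s = np.cosh(lam * T), np.sinh(lam * T)
A = np.array([[c, s / lam], [lam * s, c]])
B = A @ np.array([[-1.0], [0.0]])       # stepping shifts p by -u at impact

# Discrete LQR gain: drives the pre-impact state toward a desired periodic orbit.
Q, R = np.diag([1.0, 1.0]), np.array([[0.1]])
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

x, x_des, u_des = np.array([0.05, 0.4]), np.array([0.0, 0.5]), 0.25
u = u_des - float(K @ (x - x_des))      # foot placement for the next step
```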
- Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning. Xiaoyu Huang, Zhongyu Li, Yanzhen Xiang, and 6 more authors. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023.
We present a reinforcement learning (RL) framework that enables quadrupedal robots to perform soccer goalkeeping tasks in the real world. Soccer goalkeeping with quadrupeds is a challenging problem that combines highly dynamic locomotion with precise and fast non-prehensile object (ball) manipulation. The robot needs to react to and intercept a potentially flying ball using dynamic locomotion maneuvers in a very short amount of time, usually less than one second. In this paper, we propose to address this problem using a hierarchical model-free RL framework. The first component of the framework contains multiple control policies for distinct locomotion skills, which can be used to cover different regions of the goal. Each control policy enables the robot to track random parametric end-effector trajectories while performing one specific locomotion skill, such as jump, dive, and sidestep. These skills are then utilized by the second part of the framework, a high-level planner that determines a desired skill and end-effector trajectory in order to intercept a ball flying to different regions of the goal. We deploy the proposed framework on a Mini Cheetah quadrupedal robot and demonstrate the effectiveness of our framework for various agile interceptions of a fast-moving ball in the real world.
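A toy rendition of the two-level structure: predict where a ballistic ball crosses the goal plane, then dispatch one of the pretrained skill policies by goal region. The thresholds and skill names are illustrative stand-ins for the learned high-level planner.

```python
import numpy as np

G = 9.81  # gravity

def intercept_point(p0, v0, goal_y):
    """Time and point at which a ballistic ball crosses the goal plane y = goal_y."""
    t = (goal_y - p0[1]) / v0[1]
    x = p0[0] + v0[0] * t
    z = p0[2] + v0[2] * t - 0.5 * G * t**2
    return t, np.array([x, goal_y, max(z, 0.0)])

def select_skill(x, z):
    if z > 0.4:                      # high ball: jump
        return "jump"
    if abs(x) > 0.6:                 # far to the side: dive
        return "dive"
    return "sidestep"                # otherwise shuffle laterally

t, p = intercept_point(np.array([0.0, 3.0, 0.5]),
                       np.array([0.5, -6.0, 2.0]), goal_y=0.0)
skill = select_skill(p[0], p[2])     # feed (skill, p, t) to that skill's policy
```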
- GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots. Gilbert Feng, Hongbo Zhang, Zhongyu Li, and 8 more authors. In Conference on Robot Learning, 2023.
Recent years have seen a surge in commercially available and affordable quadrupedal robots, with many of these platforms being actively used in research and industry. As the availability of legged robots grows, so does the need for controllers that enable these robots to perform useful skills. However, most learning-based frameworks for controller development focus on training robot-specific controllers, a process that needs to be repeated for every new robot. In this work, we introduce a framework for training generalized locomotion (GenLoco) controllers for quadrupedal robots. Our framework synthesizes general-purpose locomotion controllers that can be deployed on a large variety of quadrupedal robots with similar morphologies. We present a simple but effective morphology randomization method that procedurally generates a diverse set of simulated robots for training. We show that by training a controller on this large set of simulated robots, our models acquire more general control strategies that can be directly transferred to novel simulated and real-world robots with diverse morphologies, which were not observed during training.
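The morphology-randomization idea reduces to sampling robot parameters per training episode, roughly as below; the parameter set, the ranges, and the `build_robot` helper are hypothetical, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_morphology():
    """Sample scale factors for a template quadruped so the policy sees
    many 'different robots' during training. Ranges are illustrative."""
    return {
        "body_mass_scale":    rng.uniform(0.5, 2.0),
        "leg_length_scale":   rng.uniform(0.8, 1.3),
        "motor_torque_scale": rng.uniform(0.7, 1.5),
        "hip_spacing_scale":  rng.uniform(0.9, 1.2),
    }

# Per-episode loop: rebuild the simulated robot, then collect a rollout.
for episode in range(3):
    morph = sample_morphology()
    # robot = build_robot(template_urdf, **morph)  # hypothetical builder
    print(episode, morph)
```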
- Generating a Terrain-Robustness Benchmark for Legged Locomotion: A Prototype via Terrain Authoring and Active Learning. Chong Zhang and Lizhi Yang. arXiv preprint arXiv:2208.07681, 2022.
Terrain-aware locomotion has become an emerging topic in legged robotics. However, it is hard to generate diverse, challenging, and realistic unstructured terrains in simulation, which limits the way researchers evaluate their locomotion policies. In this paper, we prototype the generation of a terrain dataset via terrain authoring and active learning, and the learned samplers can stably generate diverse high-quality terrains. We expect the generated dataset to serve as a terrain-robustness benchmark for legged locomotion. The dataset, the code implementation, and some policy evaluations are released at https://bit.ly/3bn4j7f.
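As a stand-in for the representation involved, the snippet below generates a random smooth heightmap; the paper's learned samplers target diverse, realistic terrains, which this simple random-bump model does not attempt.

```python
import numpy as np

def random_terrain(n=128, n_bumps=30, max_h=0.3, seed=0):
    """Heightmap (meters) as a sum of random Gaussian bumps; feed to a
    simulator as a heightfield. All parameters are illustrative."""
    rng = np.random.default_rng(seed)
    xs = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs)
    H = np.zeros((n, n))
    for _ in range(n_bumps):
        cx, cy = rng.uniform(0.0, 1.0, 2)
        amp = rng.uniform(-max_h, max_h)
        sigma = rng.uniform(0.02, 0.1)
        H += amp * np.exp(-((X - cx) ** 2 + (Y - cy) ** 2) / (2 * sigma**2))
    return H

H = random_terrain()
```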
- Collaborative Navigation and Manipulation of a Cable-Towed Load by Multiple Quadrupedal Robots. Chenyu Yang, Guo Ning Sue, Zhongyu Li, and 6 more authors. IEEE Robotics and Automation Letters, 2022.
This paper tackles the problem of robots collaboratively towing a load with cables to a specified goal location while avoiding collisions in real time. The introduction of cables (as opposed to rigid links) enables the robotic team to travel through narrow spaces by changing its intrinsic dimensions through slack/taut switches of the cable. However, this is a challenging problem because of the hybrid mode switches and the dynamical coupling among multiple robots and the load. Previous attempts at addressing such a problem were performed offline and do not consider avoiding obstacles online. In this paper, we introduce a cascaded planning scheme with a parallelized centralized trajectory optimization that deals with hybrid mode switches. We additionally develop a set of decentralized planners per robot, which enables our approach to solve the problem of collaborative load manipulation online. We develop and demonstrate one of the first collaborative autonomy frameworks able to move a cable-towed load, which is too heavy for a single robot to move, through narrow spaces with real-time feedback and reactive planning in experiments.
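The slack/taut hybrid mode at the heart of the problem can be written in a few lines; the spring model and stiffness value are illustrative, since the paper handles the mode switch inside a trajectory optimization rather than in a forward simulation like this.

```python
import numpy as np

def cable_force(p_robot, p_load, rest_len, stiffness=400.0):
    """A cable applies force only when stretched past its rest length.
    A planner can exploit the slack mode to let robots move independently
    through narrow gaps."""
    d = p_load - p_robot
    dist = np.linalg.norm(d)
    if dist <= rest_len:                                # slack: no coupling
        return np.zeros_like(d)
    return stiffness * (dist - rest_len) * d / dist     # taut: pulls robot toward load

f = cable_force(np.array([0.0, 0.0]), np.array([1.5, 0.0]), rest_len=1.0)
# f = [200., 0.]: the taut cable pulls the robot toward the load
```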
- Bayesian Optimization Meets Hybrid Zero Dynamics: Safe Parameter Learning for Bipedal Locomotion Control. Lizhi Yang, Zhongyu Li, Jun Zeng, and 1 more author. In 2022 IEEE International Conference on Robotics and Automation (ICRA), 2022.
In this paper, we propose a multi-domain control parameter learning framework that combines Bayesian Optimization (BO) and Hybrid Zero Dynamics (HZD) for locomotion control of bipedal robots. We leverage BO to learn the control parameters used in the HZD-based controller. The learning process is first deployed in simulation to optimize different control parameters for a large repertoire of gaits. Next, to tackle the discrepancy between simulation and the real world, the learning process is applied on the physical robot to learn corrections to the control parameters learned in simulation while also respecting a safety constraint for gait stability. This method enables an efficient sim-to-real transfer with a small number of samples in the real world, and does not require a valid controller to initialize the training in simulation. Our proposed learning framework is experimentally deployed and validated on the bipedal robot Cassie to perform versatile locomotion skills with improved smoothness of walking gaits and reduced steady-state tracking errors.
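A minimal BO loop over a single control parameter, with expected improvement as the acquisition function; the scalar objective below is a made-up stand-in for a gait-quality measure from simulation or hardware, and the paper's safety constraint on gait stability is omitted.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(theta):                  # hypothetical tracking-cost surrogate
    return (theta - 0.6) ** 2 + 0.05 * np.sin(20 * theta)

X = np.array([[0.1], [0.5], [0.9]])    # initial design points
y = np.array([objective(x[0]) for x in X])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    gp.fit(X, y)
    cand = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    mu, sd = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next[0]))

theta_star = X[np.argmin(y)][0]        # best parameter found so far
```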
- Drone Object Detection Using RGB/IR Fusion. Lizhi Yang, Ruhang Ma, and Avideh Zakhor. Electronic Imaging, 2022.
Object detection using aerial drone imagery has received a great deal of attention in recent years. While visible light images are adequate for detecting objects in most scenarios, thermal cameras can extend the capabilities of object detection to night-time or occluded objects. As such, RGB and Infrared (IR) fusion methods for object detection are useful and important. One of the biggest challenges in applying deep learning methods to RGB/IR object detection is the lack of available training data for drone IR imagery, especially at night. In this paper, we develop several strategies for creating synthetic IR images using the AIRSim simulation engine and CycleGAN. Furthermore, we utilize an illumination-aware fusion framework to fuse RGB and IR images for object detection on the ground. We characterize and test our methods for both simulated and actual data. Our solution is implemented on an NVIDIA Jetson Xavier running on an actual drone, requiring about 28 milliseconds of processing per RGB/IR image pair.
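A sketch of illumination-aware late fusion: gate the two detectors' confidences by scene brightness so IR dominates at night. The sigmoid gate and its constants are assumptions; the paper's fusion framework is learned rather than hand-set like this.

```python
import numpy as np

def illumination_weight(rgb_image):
    """Bright scenes -> trust RGB; dark scenes -> trust IR."""
    brightness = rgb_image.mean() / 255.0
    return 1.0 / (1.0 + np.exp(-12.0 * (brightness - 0.35)))  # sigmoid gate

def fuse_scores(score_rgb, score_ir, rgb_image):
    w = illumination_weight(rgb_image)
    return w * score_rgb + (1.0 - w) * score_ir

night = np.full((480, 640, 3), 20, dtype=np.uint8)    # dark frame
print(fuse_scores(0.3, 0.9, night))                   # ~0.88: IR dominates in the dark
```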
- Sensor-Aware Frontier Exploration and Mapping with Application to Thermal Mapping of Building Interiors. Zixian Zang, Haotian Shen, Lizhi Yang, and 1 more author. Electronic Imaging, 2022.
The combination of simultaneous localization and mapping (SLAM) and frontier exploration enables a robot to traverse and map an unknown area autonomously. Most prior autonomous SLAM solutions utilize information only from depth sensing devices. However, in situations where the main goal is to collect data from auxiliary sensors such as thermal cameras, existing approaches require two passes: one pass to create a map of the environment and another to collect the auxiliary data, which is both time consuming and energy inefficient. We propose a sensor-aware frontier exploration algorithm that enables the robot to perform map construction and auxiliary data collection in one pass. Specifically, our method uses a real-time ray tracing technique to construct a map that encodes unvisited locations from the perspective of auxiliary sensors rather than depth sensors; this encourages the robot to fully explore those areas to complete the data collection task and map making in one pass. Our proposed exploration framework is deployed on a LoCoBot with the task of collecting thermal images from building envelopes. We validate with experiments in a multi-room commercial building. Using a metric that evaluates the coverage of sensor data, our method significantly outperforms the baseline method with a naive SLAM algorithm. The code can be found at https://github.com/lzyang2000/herox
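The key idea, sensor-aware coverage via ray tracing, in miniature: cells are marked as covered only if a ray within the auxiliary sensor's own field of view and range reaches them, and frontiers are then computed against this coverage map rather than depth coverage. The grid setup, FOV, and range below are illustrative.

```python
import numpy as np

def trace_coverage(grid_occ, pose, heading, fov=np.pi / 4, rng_cells=20, n_rays=30):
    """Cast rays from `pose` over the sensor FOV; mark free cells the rays
    reach as covered. Rays stop at occupied cells (walls)."""
    covered = np.zeros_like(grid_occ, dtype=bool)
    angles = heading + np.linspace(-fov / 2, fov / 2, n_rays)
    for a in angles:
        for r in range(1, rng_cells):
            i = int(round(pose[0] + r * np.cos(a)))
            j = int(round(pose[1] + r * np.sin(a)))
            if not (0 <= i < grid_occ.shape[0] and 0 <= j < grid_occ.shape[1]):
                break
            if grid_occ[i, j]:           # hit a wall: ray stops
                break
            covered[i, j] = True
    return covered                       # frontier cells = free cells not yet covered

occ = np.zeros((40, 40), dtype=bool)
occ[35, :] = True                        # a wall blocking the rays
cov = trace_coverage(occ, pose=(20, 10), heading=0.0)
```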
- Autonomous Navigation for Quadrupedal Robots with Optimized Jumping Through Constrained Obstacles. Scott Gilroy, Derek Lau, Lizhi Yang, and 8 more authors. In 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Aug 2021.
Quadrupeds are strong candidates for navigating challenging environments because of their agile and dynamic designs. This paper presents a methodology that extends the range of exploration for quadrupedal robots by creating an end-to-end navigation framework that exploits walking and jumping modes. To obtain a dynamic jumping maneuver while avoiding obstacles, dynamically-feasible trajectories are optimized offline through collocation-based optimization where safety constraints are imposed. This optimization scheme allows the robot to jump through window-shaped obstacles by considering both obstacles in the air and on the ground. The resulting jumping mode is utilized in an autonomous navigation pipeline that leverages a search-based global planner and a local planner to enable the robot to reach the goal location by walking. A state machine together with a decision-making strategy allows the system to switch behaviors between walking around obstacles or jumping through them. The proposed framework is experimentally deployed and validated on a quadrupedal robot, a Mini Cheetah, to enable the robot to autonomously navigate through an environment while avoiding obstacles and jumping over a maximum height of 13 cm to pass through a window-shaped opening in order to reach its goal. (Experimental videos can be found at https://youtu.be/5pzJ8U7YvGc.)
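The walk/jump behavior switch reduces to a small decision rule like the one below; the 13 cm bound comes from the paper's experiments, while the other inputs and thresholds are illustrative.

```python
from enum import Enum, auto

class Mode(Enum):
    WALK = auto()
    JUMP = auto()

MAX_JUMP_HEIGHT = 0.13   # m, the maximum barrier height cleared in the paper

def choose_mode(path_blocked, barrier_height, opening_clear):
    """Jump only when the path is blocked by a barrier the precomputed jump
    can clear and the window-shaped opening is free; otherwise walk around."""
    if path_blocked and barrier_height <= MAX_JUMP_HEIGHT and opening_clear:
        return Mode.JUMP             # execute the offline-optimized jump trajectory
    return Mode.WALK                 # let the local planner walk around

mode = choose_mode(path_blocked=True, barrier_height=0.10, opening_clear=True)
```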
- Robotic Guide Dog: Leading a Human with Leash-Guided Hybrid Physical Interaction. Anxing Xiao, Wenzhe Tong, Lizhi Yang, and 3 more authors. In 2021 IEEE International Conference on Robotics and Automation (ICRA), May 2021.
An autonomous robot that is able to physically guide humans through narrow and cluttered spaces could be a big boon to the visually impaired. Most prior robotic guiding systems are based on wheeled platforms with large bases and actuated rigid guiding canes. The large bases and the actuated arms limit these prior approaches from operating in narrow and cluttered environments. We propose a method that introduces a quadrupedal robot with a leash to enable the robot-guiding-human system to change its intrinsic dimension (by letting the leash go slack) in order to fit into narrow spaces. We propose a hybrid physical human-robot interaction model that involves leash tension to describe the dynamical relationship in the robot-guiding-human system. This hybrid model is utilized in a mixed-integer programming problem to develop a reactive planner that is able to utilize slack-taut switching to guide a blindfolded person to safely travel in a confined space. The proposed leash-guided robot framework is deployed on a Mini Cheetah quadrupedal robot and validated in experiments.
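The leash as a hybrid element, evaluated forward here rather than inside the paper's mixed-integer program: a binary taut/slack mode gates whether the human is pulled along. The gain, rest length, and kinematic human model are made-up assumptions.

```python
import numpy as np

def leash_step(p_robot, p_human, rest_len, pull_gain=0.8, dt=0.1):
    """One step of the hybrid leash model: when taut, the human is pulled
    toward the robot; when slack, the human holds position."""
    d = p_robot - p_human
    dist = np.linalg.norm(d)
    taut = dist > rest_len                               # hybrid mode indicator
    if taut:
        v_h = pull_gain * (dist - rest_len) * d / dist   # pulled along the leash
    else:
        v_h = np.zeros(2)                                # slack: no coupling
    return p_human + v_h * dt, taut

p_h, taut = leash_step(np.array([1.5, 0.0]), np.array([0.0, 0.0]), rest_len=1.0)
```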
- Indoor Query System for the Visually Impaired. Lizhi Yang, Ilian Herzi, Avideh Zakhor, and 3 more authors. In International Conference on Computers Helping People with Special Needs, May 2020.
Scene query is an important problem for the visually impaired population. While existing systems are able to recognize objects surrounding a person, one of their significant shortcomings is that they typically rely on the phone camera with a finite field of view. Therefore, if the object is situated behind the user, it will go undetected unless the user spins around and takes a series of pictures. The recent introduction of affordable, panoramic cameras solves this problem. In addition, most existing systems report all “significant” objects in a given scene to the user, rather than respond to a specific user-generated query as to where an object is located. The recent introduction of text-to-speech and speech recognition capabilities on mobile phones paves the way for such user-generated queries, and for audio response generation to the user. In this paper, we exploit the above advancements to develop a query system for the visually impaired utilizing a panoramic camera and a smartphone. We propose three designs for such a system: the first is a handheld device, and the second and third are a wearable backpack and a wearable ring, respectively. In all three cases, the user interacts with our systems verbally regarding the whereabouts of objects of interest. We exploit deep learning methods to train our system to recognize objects of interest. Accuracy of our system for the disjoint test data from the same buildings in the training set is 99%, and for test data from new buildings not present in the training set it is 53%.
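A toy version of the query-to-answer step: match the spoken label against panoramic detections and respond with a clock direction. The detection format, the pixel-to-bearing mapping, and the phrasing are assumptions for illustration.

```python
def answer_query(query_label, detections, img_width):
    """detections: list of (label, x_center_px) in an equirectangular image,
    with pixel 0 assumed to face the user's forward direction."""
    for label, x in detections:
        if label == query_label:
            bearing = (x / img_width) * 360.0            # degrees clockwise from forward
            hour = int(round(bearing / 30.0)) % 12 or 12
            return f"The {label} is at your {hour} o'clock."
    return f"I could not find a {query_label} nearby."

dets = [("door", 2900), ("chair", 400)]
print(answer_query("door", dets, img_width=3840))        # "The door is at your 9 o'clock."
```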
- Spatio-Temporal Action Detection with Multi-Object Interaction. Huijuan Xu, Lizhi Yang, Stan Sclaroff, and 2 more authors. In EPIC@ECCV2020, May 2020.
Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube". Nowadays, most spatio-temporal action detection datasets (e.g. UCF101-24, AVA, DALY) are annotated with action tubes that contain a single person performing the action, thus the predominant action detection models simply employ a person detection and tracking pipeline for localization. However, when the action is defined as an interaction between multiple objects, such methods may fail since each bounding box in the action tube contains multiple objects instead of one person. In this paper, we study the spatio-temporal action detection problem with multi-object interaction. We introduce a new dataset that is annotated with action tubes containing multi-object interactions. Moreover, we propose an end-to-end spatio-temporal action detection model that performs both spatial and temporal regression simultaneously. Our spatial regression may enclose multiple objects participating in the action. During test time, we connect the regressed bounding boxes within the predicted temporal duration using a simple heuristic. We report the baseline results of our proposed model on this new dataset, and also show competitive results on the standard benchmark UCF101-24 using only RGB input.
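The test-time linking heuristic in miniature: greedily chain the highest-IoU boxes across adjacent frames into a tube. The greedy seed (the first frame's first box) is an assumption; the paper only states that linking uses a simple heuristic.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_tube(per_frame_boxes):
    """Greedy linking: start from the first frame's first box, then pick the
    highest-IoU box in each subsequent frame within the predicted duration."""
    tube = [per_frame_boxes[0][0]]
    for boxes in per_frame_boxes[1:]:
        tube.append(max(boxes, key=lambda b: iou(tube[-1], b)))
    return tube

frames = [[(0, 0, 10, 10)],
          [(1, 1, 11, 11), (50, 50, 60, 60)],
          [(2, 2, 12, 12)]]
tube = link_tube(frames)   # follows the slowly drifting box, ignoring the outlier
```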