
Publish Date Title Authors PDF Code Abstract
2023-07-27 PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking Yang Zheng et.al. 2307.15055v1 null We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks. Our data and code are publicly available at: https://pointodyssey.com
2023-07-27 MapNeRF: Incorporating Map Priors into Neural Radiance Fields for Driving View Simulation Chenming Wu et.al. 2307.14981v1 null Simulating camera sensors is a crucial task in autonomous driving. Although neural radiance fields are exceptional at synthesizing photorealistic views in driving simulations, they still fail in generating extrapolated views. This paper proposes to incorporate map priors into neural radiance fields to synthesize out-of-trajectory driving views with semantic road consistency. The key insight is that map information can be utilized as a prior to guide the training of the radiance fields with uncertainty. Specifically, we utilize the coarse ground surface as uncertain information to supervise the density field and warp depth with uncertainty from unknown camera poses to ensure multi-view consistency. Experimental results demonstrate that our approach can produce semantic consistency in deviated views for vehicle camera simulation.
2023-07-27 GET3D--: Learning GET3D from Unconstrained Image Collections Fanghua Yu et.al. 2307.14918v1 null The demand for efficient 3D model generation techniques has grown exponentially, as manual creation of 3D models is time-consuming and requires specialized expertise. While generative models have shown potential in creating 3D textured shapes from 2D images, their applicability in 3D industries is limited due to the lack of a well-defined camera distribution in real-world scenarios, resulting in low-quality shapes. To overcome this limitation, we propose GET3D--, the first method that directly generates textured 3D shapes from 2D images with unknown pose and scale. GET3D-- comprises a 3D shape generator and a learnable camera sampler that captures the 6D external changes on the camera. In addition, we propose a novel training schedule to stably optimize both the shape generator and camera sampler in a unified framework. By controlling external variations using the learnable camera sampler, our method can generate aligned shapes with clear textures. Extensive experiments demonstrate the efficacy of GET3D--, which precisely fits the 6D camera pose distribution and generates high-quality shapes on both synthetic and realistic unconstrained datasets.
2023-07-27 Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving Peter Bauer et.al. 2307.14889v1 null Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios. Promising results of 3D HPE have been gained in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR to image projections. Our approach outperforms state-of-the-art results by up to $\sim$ 13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.
2023-07-27 Learning Full-Head 3D GANs from a Single-View Portrait Dataset Yiqian Wu et.al. 2307.14770v1 null 3D-aware face generators are commonly trained on 2D real-life face image datasets. Nevertheless, existing facial recognition methods often struggle to extract face data captured from various camera angles. Furthermore, in-the-wild images with diverse body poses introduce a high-dimensional challenge for 3D-aware generators, making it difficult to utilize data that contains complete neck and shoulder regions. Consequently, these face image datasets often contain only near-frontal face data, which poses challenges for 3D-aware face generators to construct full-head 3D portraits. To this end, we first create the dataset $360^{\circ}$-Portrait-HQ ($360^{\circ}$PHQ), which consists of high-quality single-view real portraits annotated with a variety of camera parameters (the yaw angles span the entire $360^{\circ}$ range) and body poses. We then propose 3DPortraitGAN, the first 3D-aware full-head portrait generator that learns a canonical 3D avatar distribution from the body-pose-various $360^{\circ}$PHQ dataset with body pose self-learning. Our model can generate view-consistent portrait images from all camera angles ($360^{\circ}$) with a full-head 3D representation. We incorporate a mesh-guided deformation field into volumetric rendering to produce deformed results to generate portrait images that conform to the body pose distribution of the dataset using our canonical generator. We integrate two pose predictors into our framework to predict more accurate body poses to address the issue of inaccurately estimated body poses in our dataset. Our experiments show that the proposed framework can generate view-consistent, realistic portrait images with complete geometry from all camera angles and accurately predict portrait body pose.
2023-07-27 High Dynamic Range Imaging via Visual Attention Modules Ali Reza Omrani et.al. 2307.14705v1 link Thanks to High Dynamic Range (HDR) imaging methods, the scope of photography has seen profound changes recently. To be more specific, such methods try to reconstruct the lost luminosity of the real world caused by the limitation of regular cameras from the Low Dynamic Range (LDR) images. Additionally, although the State-Of-The-Art methods in this topic perform well, they mainly concentrate on combining different exposures and pay less attention to extracting the informative parts of the images. Thus, this paper aims to introduce a new model capable of incorporating information from the most visible areas of each image extracted by a visual attention module (VAM), which is a result of a segmentation strategy. In particular, the model, based on a deep learning architecture, utilizes the extracted areas to produce the final HDR image. The results demonstrate that our method outperformed most of the State-Of-The-Art algorithms.
2023-07-27 FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene Chengrui Wei et.al. 2307.14624v1 null It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes. We observe that it is essentially due to not only the scale-ambiguous problem but also the focal-ambiguous problem that decreases the generalization ability of monocular depth estimation. That is, images may be captured by cameras of different focal lengths in scenes of different scales. In this paper, we develop a focal-and-scale depth estimation model to well learn absolute depth maps from single images in unseen indoor scenes. First, a relative depth estimation network is adopted to learn relative depths from single images with diverse scales/semantics. Second, multi-scale features are generated by mapping a single focal length value to focal length features and concatenating them with intermediate features of different scales in relative depth estimation. Finally, relative depths and multi-scale features are jointly fed into an absolute depth estimation network. In addition, a new pipeline is developed to augment the diversity of focal lengths of public datasets, which are often captured with cameras of the same or similar focal lengths. Our model is trained on augmented NYUDv2 and tested on three unseen datasets. Our model considerably improves the generalization ability of depth estimation by 41%/13% (RMSE) with/without data augmentation compared with five recent SOTAs and well alleviates the deformation problem in 3D reconstruction. Notably, our model well maintains the accuracy of depth estimation on original NYUDv2.
2023-07-27 White-light superflare and long-term activity of the nearby M7 type binary EI Cnc observed with GWAC system Hua-Li Li et.al. 2307.14594v1 null Stellar white-light flares are believed to play an essential role in the physical and chemical properties of the atmosphere of the surrounding exoplanets. Here we report an optical monitoring campaign on the nearby flaring system EI Cnc carried out by the Ground-based Wide Angle Cameras (GWAC) and its dedicated follow-up telescope. A superflare, coming from the brighter component EI Cnc A, was detected and observed, in which four components are required to properly model the complex decay light curve. The lower limit of flare energy in the $R$-band is estimated to be $3.3\times10^{32}$ erg. 27 flares are additionally detected from the GWAC archive data with a total duration of 290 hours. The inferred cumulative flare frequency distribution follows a quite shallow power-law function with a slope of $\beta=-0.50\pm 0.03$ over the energy range between $10^{30}$ and $10^{33}$ erg, which reinforces the trend that stars cooler than M4 show enhanced superflare activity. The flares identified in EI Cnc enable us to extend the $\tau-E$ relationship previously established in the white-light superflares of solar-type stars down to an energy as low as $\sim10^{30}$ erg (i.e., by three orders): $\tau\propto E^{0.42\pm0.02}$, which suggests a common flare mechanism for stars with a type from M to solar-like, and implies an invariant of $B^{1/3}\upsilon_{\rm A}$ in the white-light flares.
2023-07-27 MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation Liang Xu et.al. 2307.14588v1 link The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by effectively capturing global feature correlations. However, the integration of the Transformer modules may result in the loss of local contextual information during the global feature fusion process. To overcome these challenges, we propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA). The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures the local correlations using multiple Multi-scale Cross Perceptron modules, facilitating the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Furthermore, we introduce a Progressive Dual-branch Structure to address the semantic segmentation of the image involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA network training from large-scale structural features to more sophisticated pixel-level features. We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices, including the open large-scale dataset of CT (Synapse), MRI (ACDC), fundus camera (DRIVE, CHASE_DB1, HRF), and OCTA (ROSE). The experimental results show that our MCPA model achieves state-of-the-art performance. The code is available at https://github.com/simonustc/MCPA-for-2D-Medical-Image-Segmentation.
2023-07-27 A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos Rongqin Liang et.al. 2307.14575v1 null Identifying traffic accidents in driving videos is crucial to ensuring the safety of autonomous driving and driver assistance systems. To address the potential danger caused by the long-tailed distribution of driving events, existing traffic accident detection (TAD) methods mainly rely on unsupervised learning. However, TAD is still challenging due to the rapid movement of cameras and dynamic scenes in driving scenarios. Existing unsupervised TAD methods mainly rely on a single pretext task, i.e., an appearance-based or future object localization task, to detect accidents. However, appearance-based approaches are easily disturbed by the rapid movement of the camera and changes in illumination, which significantly reduce the performance of traffic accident detection. Methods based on future object localization may fail to capture appearance changes in video frames, making it difficult to detect ego-involved accidents (e.g., out of control of the ego-vehicle). In this paper, we propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos. Different from previous approaches, our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames through the collaboration of optical flow reconstruction and future object localization tasks. Further, we introduce a memory-augmented motion representation mechanism to fully explore the interrelation between different types of motion representations and exploit the high-level features of normal traffic patterns stored in memory to augment motion representations, thus enlarging the difference from anomalies. Experimental results on a recently published large-scale dataset demonstrate that our method achieves better performance compared to previous state-of-the-art approaches.
2023-07-26 Patterns of Vehicle Lights: Addressing Complexities in Curation and Annotation of Camera-Based Vehicle Light Datasets and Metrics Ross Greer et.al. 2307.14521v1 null This paper explores the representation of vehicle lights in computer vision and its implications for various tasks in the field of autonomous driving. Different specifications for representing vehicle lights, including bounding boxes, center points, corner points, and segmentation masks, are discussed in terms of their strengths and weaknesses. Three important tasks in autonomous driving that can benefit from vehicle light detection are identified: nighttime vehicle detection, 3D vehicle orientation estimation, and dynamic trajectory cues. Each task may require a different representation of the light. The challenges of collecting and annotating large datasets for training data-driven models are also addressed, leading to the introduction of the LISA Vehicle Lights Dataset and associated Light Visibility Model, which provide light annotations specifically designed for downstream applications in vehicle detection, intent and trajectory prediction, and safe path planning. A comparison of existing vehicle light datasets is provided, highlighting the unique features and limitations of each dataset. Overall, this paper provides insights into the representation of vehicle lights and the importance of accurate annotations for training effective detection models in autonomous driving applications. Our dataset and model are made available at https://cvrr.ucsd.edu/vehicle-lights-dataset
2023-07-26 Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision Jin Wang et.al. 2307.14487v1 null Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources. Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms. The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can contribute to CV research and teaching in the animal science community.
2023-07-26 AutoSourceID-Classifier. Star-Galaxy Classification using a Convolutional Neural Network with Spatial Information F. Stoppa et.al. 2307.14456v1 null Aims. Traditional star-galaxy classification techniques often rely on feature estimation from catalogues, a process susceptible to introducing inaccuracies, thereby potentially jeopardizing the classification's reliability. Certain galaxies, especially those not manifesting as extended sources, can be misclassified when their shape parameters and flux solely drive the inference. We aim to create a robust and accurate classification network for identifying stars and galaxies directly from astronomical images. By leveraging convolutional neural networks (CNN) and additional information about the source position, we aim to accurately classify all stars and galaxies within a survey, particularly those with a signal-to-noise ratio (S/N) near the detection limit. Methods. The AutoSourceID-Classifier (ASID-C) algorithm developed here uses 32x32 pixel single filter band source cutouts generated by the previously developed ASID-L code. ASID-C utilizes CNNs to distinguish these cutouts into stars or galaxies, leveraging their strong feature-learning capabilities. Subsequently, we employ a modified Platt Scaling calibration for the output of the CNN. This technique ensures that the derived probabilities are effectively calibrated, delivering precise and reliable results. Results. We show that ASID-C, trained on MeerLICHT telescope images and using the Dark Energy Camera Legacy Survey (DECaLS) morphological classification, outperforms similar codes like SourceExtractor. ASID-C opens up new possibilities for accurate celestial object classification, especially for sources with a S/N near the detection limit. Potential applications of ASID-C, like real-time star-galaxy classification and transient's host identification, promise significant contributions to astronomical research.
2023-07-26 US & MR Image-Fusion Based on Skin Co-Registration Martina Paccini et.al. 2307.14288v1 null The study and development of innovative solutions for the advanced visualisation, representation and analysis of medical images offer different research directions. Current practice in medical imaging consists in combining real-time US with imaging modalities that allow internal anatomy acquisitions, such as CT, MRI, PET or similar. Application of image-fusion approaches can be found in tracking surgical tools and/or needles, in real-time during interventions. Thus, this work proposes a fusion imaging system for the registration of CT and MRI images with real-time US acquisition leveraging a 3D camera sensor. The main focus of the work is the portability of the system and its applicability to different anatomical districts.
2023-07-26 Probing reflection from aerosols with the near-infrared dayside spectrum of WASP-80b Bob Jacobs et.al. 2307.14399v1 null The presence of aerosols is intimately linked to the global energy budget and composition of planet atmospheres. Their ability to reflect incoming light prevents energy from being deposited into the atmosphere, and they shape spectra of exoplanets. We observed one near-infrared secondary eclipse of WASP-80b with the Wide Field Camera 3 aboard the Hubble Space Telescope to provide constraints on the presence and properties of atmospheric aerosols. We detect a broadband eclipse depth of $34\pm10$ ppm for WASP-80b, making this the lowest equilibrium temperature planet for which a secondary eclipse has been detected so far with WFC3. We detect a higher planetary flux than expected from thermal emission alone at $1.6\sigma$ that hints toward the presence of reflecting aerosols on this planet's dayside. We paired the WFC3 data with Spitzer data and explored multiple atmospheric models with and without aerosols to interpret this spectrum. Albeit consistent with a clear dayside atmosphere, we found a slight preference for near-solar metallicities and for dayside clouds over hazes. We exclude soot haze formation rates higher than $10^{-10.7}$ g cm$^{-2}$s$^{-1}$ and tholin formation rates higher than $10^{-12.0}$ g cm$^{-2}$s$^{-1}$ at $3\sigma$. We applied the same atmospheric models to a previously published WFC3/Spitzer transmission spectrum for this planet and find weak haze formation. A single soot haze formation rate best fits both the dayside and the transmission spectra simultaneously. However, we emphasize that no models provide satisfactory fits in terms of chi-square of both spectra simultaneously, indicating longitudinal dissimilarity in the atmosphere's aerosol composition.
2023-07-26 DisguisOR: Holistic Face Anonymization for the Operating Room Lennart Bastian et.al. 2307.14241v1 link Purpose: Recent advances in Surgical Data Science (SDS) have contributed to an increase in video recordings from hospital environments. While methods such as surgical workflow recognition show potential in increasing the quality of patient care, the quantity of video data has surpassed the scale at which images can be manually anonymized. Existing automated 2D anonymization methods under-perform in Operating Rooms (OR), due to occlusions and obstructions. We propose to anonymize multi-view OR recordings using 3D data from multiple camera streams. Methods: RGB and depth images from multiple cameras are fused into a 3D point cloud representation of the scene. We then detect each individual's face in 3D by regressing a parametric human mesh model onto detected 3D human keypoints and aligning the face mesh with the fused 3D point cloud. The mesh model is rendered into every acquired camera view, replacing each individual's face. Results: Our method shows promise in locating faces at a higher rate than existing approaches. DisguisOR produces geometrically consistent anonymizations for each camera view, enabling more realistic anonymization that is less detrimental to downstream tasks. Conclusion: Frequent obstructions and crowding in operating rooms leaves significant room for improvement for off-the-shelf anonymization methods. DisguisOR addresses privacy on a scene level and has the potential to facilitate further research in SDS.
2023-07-26 The nature of the X-ray sources in dwarf galaxies in nearby clusters from the KIWICS Şeyda Şen et.al. 2307.14230v1 null We present a deep search for and analysis of X-ray sources in a sample of dwarf galaxies ($M_{r} < -15.5$ mag) located within twelve galaxy clusters from the Kapteyn IAC WEAVE INT Cluster Survey (KIWICS) of photometric observations in the $r$ and $g$ bands using the Wide Field Camera (WFC) at the 2.5-m Isaac Newton telescope (INT). We first investigated the optical data, identified 2720 dwarf galaxies in all fields and determined their characteristics; namely, their colors, effective radii, and stellar masses. We then searched the Chandra data archive for X-ray counterparts of optically detected dwarf galaxies. We found a total of 20 X-ray emitting dwarf galaxies, with X-ray flux ranging from $1.7\times10^{-15}$ to $4.1\times10^{-14}$ erg cm$^{-2}$ s$^{-1}$ and X-ray luminosities varying from $2\times10^{39}$ to $5.4\times10^{41}$ erg s$^{-1}$. Our results indicate that the X-ray luminosity of the sources in our sample is larger than the Eddington luminosity limit for a typical neutron star, even at the lowest observed levels. This leads us to conclude that the sources emitting X-rays in our sample are likely black holes. Additionally, we have employed a scaling relation between black hole and stellar mass to estimate the masses of the black holes in our sample, and have determined a range of black hole masses from $4.6\times10^{4}$ to $1.5\times10^{6}$ $M_{\odot}$. Finally, we find a trend between X-ray to optical flux ratio and X-ray flux. We discuss the implications of our findings and highlight the importance of X-ray observations in studying the properties of dwarf galaxies.
2023-07-26 Tackling Scattering and Reflective Flare in Mobile Camera Systems: A Raw Image Dataset for Enhanced Flare Removal Fengbo Lan et.al. 2307.14180v1 null The increasing prevalence of mobile devices has led to significant advancements in mobile camera systems and improved image quality. Nonetheless, mobile photography still grapples with challenging issues such as scattering and reflective flare. The absence of a comprehensive real image dataset tailored for mobile phones hinders the development of effective flare mitigation techniques. To address this issue, we present a novel raw image dataset specifically designed for mobile camera systems, focusing on flare removal. Capitalizing on the distinct properties of raw images, this dataset serves as a solid foundation for developing advanced flare removal algorithms. It encompasses a wide variety of real-world scenarios captured with diverse mobile devices and camera settings. The dataset comprises over 2,000 high-quality full-resolution raw image pairs for scattering flare and 1,100 for reflective flare, which can be further segmented into up to 30,000 and 2,200 paired patches, respectively, ensuring broad adaptability across various imaging conditions. Experimental results demonstrate that networks trained with synthesized data struggle to cope with complex lighting settings present in this real image dataset. We also show that processing data through a mobile phone's internal ISP compromises image quality while using raw image data presents significant advantages for addressing the flare removal problem. Our dataset is expected to enable an array of new research in flare removal and contribute to substantial improvements in mobile image quality, benefiting mobile photographers and end-users alike.
2023-07-26 Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras Kamil Jeziorek et.al. 2307.14124v1 null Recent advances in event camera research emphasize processing data in its original sparse form, which allows the use of its unique features such as high temporal resolution, high dynamic range, low latency, and resistance to image blur. One promising approach for analyzing event data is through graph convolutional networks (GCNs). However, current research in this domain primarily focuses on optimizing computational costs, neglecting the associated memory costs. In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity. For this purpose, we performed a comparative analysis of different graph convolution operations, considering factors such as execution time, the number of trainable model parameters, data format requirements, and training outcomes. Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher compared to the operation used in state-of-the-art approaches. To further evaluate performance, we implemented the object detection architecture and evaluated its performance on the N-Caltech101 dataset. The results showed an accuracy of 53.7 % mAP@0.5 and reached an execution rate of 82 graphs per second.
2023-07-26 Learning heterogeneous delays in a layer of spiking neurons for fast motion detection Antoine Grimaldi et.al. 2307.14077v1 null The precise timing of spikes emitted by neurons plays a crucial role in shaping the response of efferent biological neurons. This temporal dimension of neural activity holds significant importance in understanding information processing in neurobiology, especially for the performance of neuromorphic hardware, such as event-based cameras. Nonetheless, many artificial neural models disregard this critical temporal dimension of neural activity. In this study, we present a model designed to efficiently detect temporal spiking motifs using a layer of spiking neurons equipped with heterogeneous synaptic delays. Our model capitalizes on the diverse synaptic delays present on the dendritic tree, enabling specific arrangements of temporally precise synaptic inputs to synchronize upon reaching the basal dendritic tree. We formalize this process as a time-invariant logistic regression, which can be trained using labeled data. To demonstrate its practical efficacy, we apply the model to naturalistic videos transformed into event streams, simulating the output of the biological retina or event-based cameras. To evaluate the robustness of the model in detecting visual motion, we conduct experiments by selectively pruning weights and demonstrate that the model remains efficient even under significantly reduced workloads. In conclusion, by providing a comprehensive, event-driven computational building block, the incorporation of heterogeneous delays has the potential to greatly improve the performance of future spiking neural network algorithms, particularly in the context of neuromorphic chips.
2023-07-26 Three-year performance of the IceAct telescopes at the IceCube Neutrino Observatory Lars Heuermann et.al. 2307.13969v1 null IceAct is an array of compact Imaging Air Cherenkov Telescopes at the ice surface as part of the IceCube Neutrino Observatory. The telescopes, featuring a camera of 61 silicon photomultipliers and fresnel-lens-based optics, are optimized to be operated in harsh environmental conditions, such as at the South Pole. Since 2019, the first two telescopes have been operating in a stereoscopic configuration in the center of IceCube's surface detector IceTop. With an energy threshold of about 10 TeV and a wide field-of-view, the IceAct telescopes show promising capabilities of improving current cosmic-ray composition studies: measuring the Cherenkov light emissions in the atmosphere adds new information about the shower development not accessible with the current detectors. First simulations indicate that the added information of a single telescope leads, e.g., to an improved discrimination between flux contributions from different primary particle species in the sensitive energy range. We review the performance and detector operations of the telescopes during the past 3 years (2020-2022) and give an outlook on the future of IceAct.
2023-07-26 Towards a cosmic ray composition measurement with the IceAct telescopes at the IceCube Neutrino Observatory Larissa Paul et.al. 2307.13965v1 null The IceCube Neutrino Observatory is equipped with the unique possibility to measure cosmic-ray-induced air showers simultaneously by their particle footprint on the surface with the IceTop detector and by the high-energy muonic shower component at a depth of more than 1.5 km. Since 2019, two additional Imaging Air Cherenkov Telescopes, called IceAct, have measured the electromagnetic component of air showers in the atmosphere above the IceCube detector. This opens the possibility to measure air shower parameters in three independent detectors and allows improved mass composition studies with the IceCube data. One IceAct camera consists of 61 SiPM pixels in a hexagonal grid. Each pixel has a field of view of 1.5 degrees, resulting in an approximately 12-degree field of view per camera. A single telescope tube has a diameter of 50 cm, is built robustly enough to withstand the harsh Antarctic conditions, and is able to detect cosmic ray particles with energies above approximately 10 TeV. A Graph Neural Network (GNN) is trained to determine the air shower properties from IceAct data. The composition analysis is then performed using Random Forest Regression (RF). Since all three detectors have a different energy threshold, we train several RFs with different inputs, combining the different detectors and taking advantage of the lower energy threshold of the IceAct telescopes. This will result in composition measurements for different detector combinations and enable cross-checks of the results in overlapping energy bands. We present the method, parameters for data selection, and the status of this analysis.
2023-07-25 Decisive Data using Multi-Modality Optical Sensors for Advanced Vehicular Systems Muhammad Ali Farooq et.al. 2307.13600v1 null Optical sensors have played a pivotal role in acquiring real-world data for critical applications. This data, when integrated with advanced machine learning algorithms, provides meaningful information, thus enhancing human vision. This paper focuses on various optical technologies for the design and development of state-of-the-art out-cabin forward vision systems and in-cabin driver monitoring systems. The optical sensors in focus include Longwave Thermal Imaging (LWIR) cameras, Near Infrared (NIR) cameras, Neuromorphic/event cameras, Visible CMOS cameras and Depth cameras. Further, the paper discusses different potential applications which can be realized using the unique strengths of each of these optical modalities in a real-time environment.
2023-07-25 HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View Yiming Wu et.al. 2307.13510v1 null Vision-based Bird's Eye View (BEV) representation is an emerging perception formulation for autonomous driving. The core challenge is to construct BEV space with multi-camera features, which is a one-to-many ill-posed problem. Examining previous BEV representation generation methods, we find that most of them fall into two types: modeling depths in image views or modeling heights in the BEV space, mostly in an implicit way. In this work, we propose to explicitly model heights in the BEV space, which needs no extra data like LiDAR and, compared to modeling depths, can fit arbitrary camera rigs and types. Theoretically, we give proof of the equivalence between height-based methods and depth-based methods. Considering the equivalence and some advantages of modeling heights, we propose HeightFormer, which models heights and uncertainties in a self-recursive way. Without any extra data, the proposed HeightFormer can estimate heights in BEV accurately. Benchmark results show that HeightFormer achieves state-of-the-art performance among camera-only methods.
2023-07-25 Prior Based Online Lane Graph Extraction from Single Onboard Camera Image Yigit Baran Can et.al. 2307.13344v1 null The local road network information is essential for autonomous navigation. This information is commonly obtained from offline HD-Maps in terms of lane graphs. However, the local road network at a given moment can be drastically different from the one given in the offline maps, due to construction works, accidents, etc. Moreover, the autonomous vehicle might be at a location not covered in the offline HD-Map. Thus, online estimation of the lane graph is crucial for widespread and reliable autonomous navigation. In this work, we tackle online Bird's-Eye-View lane graph extraction from a single onboard camera image. We propose to use prior information to increase the quality of the estimations. The prior is extracted from the dataset through a transformer-based Wasserstein Autoencoder. The autoencoder is then used to enhance the initial lane graph estimates. This is done through optimization of the latent space vector. The optimization encourages the lane graph estimation to be logical by discouraging it from diverging from the prior distribution. We test the method on two benchmark datasets, NuScenes and Argoverse. The results show that the proposed method significantly improves the performance compared to state-of-the-art methods.
2023-07-25 A Visual Quality Assessment Method for Raster Images in Scanned Document Justin Yang et.al. 2307.13241v1 null Image quality assessment (IQA) is an active research area in the field of image processing. Most prior works focus on visual quality of natural images captured by cameras. In this paper, we explore visual quality of scanned documents, focusing on raster image areas. Different from many existing works which aim to estimate a visual quality score, we propose a machine learning based classification method to determine whether the visual quality of a scanned raster image at a given resolution setting is acceptable. We conduct a psychophysical study to determine the acceptability at different image resolutions based on human subject ratings and use them as the ground truth to train our machine learning model. However, this dataset is unbalanced as most images were rated as visually acceptable. To address the data imbalance problem, we introduce several noise models to simulate the degradation of image quality during the scanning process. Our results show that by including augmented data in training, we can significantly improve the performance of the classifier to determine whether the visual quality of raster images in a scanned document is acceptable or not for a given resolution setting.
2023-07-24 Why Don't You Clean Your Glasses? Perception Attacks with Dynamic Optical Perturbations Yi Han et.al. 2307.13131v1 null Camera-based autonomous systems that emulate human perception are increasingly being integrated into safety-critical platforms. Consequently, an established body of literature has emerged that explores adversarial attacks targeting the underlying machine learning models. Adapting adversarial attacks to the physical world is desirable for the attacker, as this removes the need to compromise digital systems. However, the real world poses challenges related to the "survivability" of adversarial manipulations given environmental noise in perception pipelines and the dynamicity of autonomous systems. In this paper, we take a sensor-first approach. We present EvilEye, a man-in-the-middle perception attack that leverages transparent displays to generate dynamic physical adversarial examples. EvilEye exploits the camera's optics to induce misclassifications under a variety of illumination conditions. To generate dynamic perturbations, we formalize the projection of a digital attack into the physical domain by modeling the transformation function of the captured image through the optical pipeline. Our extensive experiments show that EvilEye's generated adversarial perturbations are much more robust across varying environmental light conditions relative to existing physical perturbation frameworks, achieving a high attack success rate (ASR) while bypassing state-of-the-art physical adversarial detection frameworks. We demonstrate that the dynamic nature of EvilEye enables attackers to adapt adversarial examples across a variety of objects with a significantly higher ASR compared to state-of-the-art physical world attack frameworks. Finally, we discuss mitigation strategies against the EvilEye attack.
2023-07-24 Automatic Infant Respiration Estimation from Video: A Deep Flow-based Algorithm and a Novel Public Benchmark Sai Kumar Reddy Manne et.al. 2307.13110v1 link Respiration is a critical vital sign for infants, and continuous respiratory monitoring is particularly important for newborns. However, neonates are sensitive and contact-based sensors present challenges in comfort, hygiene, and skin health, especially for preterm babies. As a step toward fully automatic, continuous, and contactless respiratory monitoring, we develop a deep-learning method for estimating respiratory rate and waveform from plain video footage in natural settings. Our automated infant respiration flow-based network (AIRFlowNet) combines video-extracted optical flow input and spatiotemporal convolutional processing tuned to the infant domain. We support our model with the first public annotated infant respiration dataset with 125 videos (AIR-125), drawn from eight infant subjects, set in varied pose, lighting, and camera conditions. We include manual respiration annotations and optimize AIRFlowNet training on them using a novel spectral bandpass loss function. When trained and tested on the AIR-125 infant data, our method significantly outperforms other state-of-the-art methods in respiratory rate estimation, achieving a mean absolute error of $\sim$2.9 breaths per minute, compared to $\sim$4.7--6.2 for other public models designed for adult subjects and more uniform environments.
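To make the evaluation metric in the entry above concrete, the following sketch estimates a respiratory rate from a waveform by picking the strongest frequency inside a breathing band, then scores predictions with the mean absolute error in breaths per minute. It is not AIRFlowNet or the paper's spectral bandpass loss: the function names, the plain discrete Fourier transform, and the band limits `low_hz` and `high_hz` are assumptions made for illustration.

```python
import math


def dominant_rate_bpm(waveform, fps, low_hz=0.3, high_hz=1.5):
    """Estimate breaths per minute as the strongest frequency in a band.

    waveform: list of respiration signal samples.
    fps:      sampling rate in samples per second.
    low_hz, high_hz: assumed breathing band (hypothetical defaults).
    """
    n = len(waveform)
    mean = sum(waveform) / n
    centered = [v - mean for v in waveform]  # remove the constant offset
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not (low_hz <= freq <= high_hz):
            continue
        angle = 2.0 * math.pi * k / n
        re = sum(v * math.cos(angle * i) for i, v in enumerate(centered))
        im = sum(v * math.sin(angle * i) for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired rate estimates."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
```

A reported error of about 2.9 breaths per minute corresponds to the value `mean_absolute_error` would return when averaged over all test videos, given per-video predicted and annotated rates.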
2023-07-24 Freeform three-mirror anastigmatic large-aperture telescope and receiver optics for CMB-S4 Patricio A. Gallardo et.al. 2307.12931v1 null CMB-S4, the next-generation ground-based cosmic microwave background (CMB) observatory, will provide detailed maps of the CMB at millimeter wavelengths to dramatically advance our understanding of the origin and evolution of the universe. CMB-S4 will deploy large and small aperture telescopes with hundreds of thousands of detectors to observe the CMB at arcminute and degree resolutions at millimeter wavelengths. Inflationary science benefits from a deep delensing survey at arcminute resolutions capable of observing a large field of view at millimeter wavelengths. This kind of survey acts as a complement to a degree angular resolution survey. The delensing survey requires a nearly uniform distribution of cameras per frequency band across the focal plane. We present a large-throughput, large-aperture (5-meter diameter) freeform three-mirror anastigmatic telescope and an array of 85 cameras for CMB observations at arcminute resolutions, which meets the needs of the delensing survey of CMB-S4. A detailed prescription of this three-mirror telescope and cameras is provided, with a series of numerical calculations that indicate expected optical performance and mechanical tolerance.
2023-07-24 Trust-aware Safe Control for Autonomous Navigation: Estimation of System-to-human Trust for Trust-adaptive Control Barrier Functions Saad Ejaz et.al. 2307.12815v1 null A trust-aware safe control system for autonomous navigation in the presence of humans, specifically pedestrians, is presented. The system combines model predictive control (MPC) with control barrier functions (CBFs) and trust estimation to ensure safe and reliable navigation in complex environments. Pedestrian trust values are computed based on features, extracted from camera sensor images, such as mutual eye contact and smartphone usage. These trust values are integrated into the MPC controller's CBF constraints, allowing the autonomous vehicle to make informed decisions considering pedestrian behavior. Simulations conducted in the CARLA driving simulator demonstrate the feasibility and effectiveness of the proposed system, showcasing more conservative behaviour around inattentive pedestrians and vice versa. The results highlight the practicality of the system in real-world applications, providing a promising approach to enhance the safety and reliability of autonomous navigation systems, especially self-driving vehicles.
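The idea of feeding a trust value into a control barrier function constraint, as in the entry above, can be sketched in one dimension. This is an illustrative simplification, not the paper's MPC formulation: the barrier `h = distance - d_safe`, the discrete-time condition `h_next >= (1 - gamma) * h`, the linear mapping from trust to `gamma`, and all parameter names and default values are assumptions.

```python
def safe_speed(desired_speed, distance, trust,
               d_safe=2.0, dt=0.1, gamma_min=0.02, gamma_max=0.1):
    """Clip the approach speed so a discrete-time barrier condition holds.

    desired_speed: speed the nominal controller asks for (m/s).
    distance:      current gap to the pedestrian (m).
    trust:         value in [0, 1]; 0 = inattentive, 1 = attentive.
    The barrier is h = distance - d_safe. With the gap shrinking by
    speed * dt per step, requiring h_next >= (1 - gamma) * h gives
    speed <= gamma * h / dt. Lower trust means a smaller gamma and
    therefore a tighter speed limit (assumed linear mapping).
    """
    trust = min(max(trust, 0.0), 1.0)
    gamma = gamma_min + trust * (gamma_max - gamma_min)
    h = distance - d_safe
    if h <= 0.0:
        return 0.0  # already at or inside the safety margin: stop
    speed_limit = gamma * h / dt
    return min(desired_speed, speed_limit)
```

At the same distance, a pedestrian with low trust yields a lower allowed speed than one with high trust, which mirrors the more conservative behaviour around inattentive pedestrians described in the abstract.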